LoginSignup
4
4

More than 3 years have passed since last update.

GeoLite2をBigQueryに格納してGeoLocationを取得(IPアドレス→国情報変換)

Last updated at Posted at 2018-11-27

2021-04-11更新

Googleにgeolocationをbigqueryで扱う記事の解説
https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds

  1. サブネット表記から network_bin, maskを抽出し、付き合わせテーブルに仕込んでおく
create table geolite2_city as 
(SELECT *
  , NET.IP_FROM_STRING(REGEXP_EXTRACT(network, r'(.*)/' )) network_bin
  , CAST(REGEXP_EXTRACT(network, r'/(.*)' ) AS INT64) mask
FROM `fh-bigquery.geocode.201806_geolite2_city_ipv4` 
JOIN `fh-bigquery.geocode.201806_geolite2_city_locations_en`
USING(geoname_id))

2.join する時に下記のようにINNER JOINでIPアドレスとそれに一致するネットマスク表記を結合する。

select * from 
(select * , NET.SAFE_IP_FROM_STRING(ip_address) & NET.IP_NET_MASK(4, mask) network_bin from
ip_table ,UNNEST(GENERATE_ARRAY(9,32)) mask
where 
BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip_address)) = 4)
 JOIN
  geolite2_city
using (network_bin, mask)

下記の方法はIPアドレスを数値に戻してIPレンジで指定していた。ネットマスクの方がASNの付き合わせとかにも応用が聞くので上の方法を推奨ですね。

やりたいこと

IPアドレスに対してGeoLocation情報を付与したい。
BigQueryに格納しているIPアドレスにCountry,Cityの情報を付与したい。

必要なもの

GeoLite2データ
フリー版はこちらからダウンロード
https://dev.maxmind.com/geoip/geoip2/geolite2/
GeoLite2 City CSVを使う

手順

1
GeoLite2-City-Locations-en.csv (city_locaion)
GeoLite2-City-Blocks-IPv4.csv (city_blocks_ipv4)
をBigQueryにインポート
2.
突き合せのためのテーブルを作成

create table yumano.geolite2_city as 
select * from
 yumano.city_location as a 
inner join  
(SELECT
network,
NET.IPV4_TO_INT64(NET.IP_FROM_STRING(REGEXP_EXTRACT( network, '(.*)/' ) ))   as startIpNum, 
CAST(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(REGEXP_EXTRACT( network, '(.*)/' ))) + POW(2, 32- CAST(REGEXP_EXTRACT(network, '/(.*)')  as  INT64 )) -1  as Int64)  as   endIpNum ,
CAST(TRUNC(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(REGEXP_EXTRACT( network, '(.*)/' )))/(256*256)) as Int64)   as ClassB, 
ifnull(geoname_id, registered_country_geoname_id  )
as uniq_geoname_id, is_anonymous_proxy,is_satellite_provider,postal_code
  FROM city_blocks_ipv4 )  as b 
 on uniq_geoname_id = a.geoname_id
  1. IPアドレスが入っているテーブルとJOIN
select 
    ACCESS_IP_ADDRESS,
    IFNULL(country_name, 'Other') AS country_name,
    IFNULL(city_name, 'Other') AS city_name
FROM (
  SELECT
    ACCESS_IP_ADDRESS,
    NET.IPV4_TO_INT64(NET.IP_FROM_STRING(access_ip_address)) AS clientIpNum,
    TRUNC(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(access_ip_address))/(256*256)) AS classB
  FROM
    yumano.IP_LIST ) AS a
LEFT OUTER JOIN
  yumano.geolite2_city AS b
ON
  a.classB = b.classB
  AND a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum

参考

4
4
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
4
4