Install Canopy Express
https://store.enthought.com/downloads/
canopy-1.5.2-osx-64.dmg
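About 300 MB.
Install by drag and drop; once you grant it permission to run, the initial setup starts.
It asks whether it may change the default Python; I answered Yes.
But typing python in iTerm, nothing seems to have changed.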
% which python
/usr/bin/python
The installer added the following to .bash_profile:
# Added by Canopy installer on 2015-02-15
# VIRTUAL_ENV_DISABLE_PROMPT can be set to '' to make bashprompt show that Canopy is active, otherwise 1
VIRTUAL_ENV_DISABLE_PROMPT=1 source /Users/takeru/Library/Enthought/Canopy_64bit/User/bin/activate
The reason is that I use zsh, which does not read .bash_profile.
% VIRTUAL_ENV_DISABLE_PROMPT=1 source /Users/takeru/Library/Enthought/Canopy_64bit/User/bin/activate
% which python
/Users/takeru/Library/Enthought/Canopy_64bit/User/bin/python
Now it has changed.
Apparently just sourcing that script is enough to change PATH.
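To double-check from inside Python which interpreter is active after sourcing the activate script, something like the following may help. This is only a sketch: it assumes Python 3 (shutil.which does not exist in the Python 2.7 used in this session), and the function name describe_python is made up for illustration.

```python
import shutil
import sys


def describe_python():
    """Return the running interpreter and the first `python` found on PATH."""
    return {
        "running": sys.executable,          # interpreter executing this code
        "on_path": shutil.which("python"),  # None if no `python` is on PATH
    }


info = describe_python()
print(info["running"])
print(info["on_path"])
```

If the two paths differ, the shell's PATH and the interpreter actually in use disagree, which is the situation seen above before sourcing the script.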
Something showed up!
http://www.usa.gov/About/developer-resources/1usagov.shtml#How_to_Access_The_Data
http://1usagov.measuredvoice.com/2013/
Updates seem to have stopped, but the old data can still be downloaded.
20:29:59 tkrimac2:~/proj/pydata% gunzip usagov_bitly_data2013-05-17-1368832207.gz
20:30:03 tkrimac2:~/proj/pydata% ls
usagov_bitly_data2013-05-17-1368832207
20:30:23 tkrimac2:~/proj/pydata% ipython --pylab
Python 2.7.6 | 64-bit | (default, Sep 15 2014, 17:43:19)
Type "copyright", "credits" or "license" for more information.
IPython 2.3.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: MacOSX
In [1]: path="usagov_bitly_data2013-05-17-1368832207"
In [2]: import json
In [3]: records = [json.loads(line) for line in open(path)]
In [4]: records[0]
Out[4]:
{u'a': u'Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; HTC_PN071 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
u'al': u'en-US',
u'c': u'US',
u'cy': u'Anaheim',
u'g': u'15r91',
u'gr': u'CA',
u'h': u'10OBm3W',
u'hc': 1365701422,
u'hh': u'j.mp',
u'l': u'pontifier',
u'll': [33.816101, -117.979401],
u'nk': 0,
u'r': u'direct',
u't': 1368832205,
u'tz': u'America/Los_Angeles',
u'u': u'http://www.nsa.gov/'}
In [5]: records[0]['tz']
Out[5]: u'America/Los_Angeles'
In [6]: print records[0]['tz']
America/Los_Angeles
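The one-liner in In [3] leaves the file open and fails on blank lines. A slightly more defensive loader might look like the sketch below. It assumes Python 3; the helper name load_json_lines and the two sample lines are made up for illustration (the heartbeat value is the one that appears later in this log).

```python
import io
import json


def load_json_lines(lines):
    """Parse one JSON object per non-blank line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            records.append(json.loads(line))
    return records


# Made-up lines standing in for the real file contents.
sample = io.StringIO(
    '{"tz": "America/Los_Angeles", "c": "US"}\n'
    '\n'
    '{"_heartbeat_": 1368835801}\n'
)
records = load_json_lines(sample)

# Heartbeat records have no 'tz' key, so filter on its presence.
time_zones = [rec["tz"] for rec in records if "tz" in rec]
print(time_zones)
```

With the real file, the same helper could be called as `with open(path) as f: records = load_json_lines(f)`, which also closes the file when done.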
In [11]: time_zones = [rec['tz'] for rec in records if 'tz' in rec]
In [12]: time_zones[:10]
Out[12]:
[u'America/Los_Angeles',
u'',
u'America/Phoenix',
u'America/Chicago',
u'',
u'America/Indianapolis',
u'America/Chicago',
u'',
u'Australia/NSW',
u'']
Create tmp.py:
# tmp.py
print "a"
In [18]: import tmp
a
(edit tmp.py, changing "a" to "aaa")
In [19]: import tmp
(nothing is printed: the module is not loaded again)
In [23]: reload(tmp)
aaa
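The behaviour above (a second import does nothing, reload re-executes the file) can be reproduced in a self-contained way. The sketch below assumes Python 3, where the built-in reload has moved to importlib.reload; the module name tmp_demo and the MESSAGE variable are made up so that nothing real is overwritten.

```python
import importlib
import os
import sys
import tempfile

# Work in a throwaway directory.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "tmp_demo.py")  # hypothetical module name

with open(module_path, "w") as f:
    f.write('MESSAGE = "a"\n')

sys.path.insert(0, workdir)
importlib.invalidate_caches()
import tmp_demo

first = tmp_demo.MESSAGE

# Edit the file on disk.
with open(module_path, "w") as f:
    f.write('MESSAGE = "aaa"\n')

import tmp_demo  # no effect: the module is already cached in sys.modules

after_import = tmp_demo.MESSAGE

importlib.reload(tmp_demo)  # re-executes the source file
after_reload = tmp_demo.MESSAGE

print(first, after_import, after_reload)
```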
In [27]: reload(tmp)
Out[27]: <module 'tmp' from 'tmp.py'>
In [28]: tmp.get_counts(time_zones)
Out[28]:
{u'': 636,
u'Africa/Cairo': 3,
u'Africa/Casablanca': 1,
u'Africa/Ceuta': 6,
u'Africa/Gaborone': 1,
u'Africa/Johannesburg': 2,
u'America/Anchorage': 8,
u'America/Argentina/Buenos_Aires': 11,
...
In [29]: reload(tmp)
Out[29]: <module 'tmp' from 'tmp.py'>
In [30]: tmp.get_counts2(time_zones)
Out[30]: defaultdict(<type 'int'>, {u'': 636, u'Europe/Lisbon': 8, u'America/Bogota': 16, u'America/Edmonton': 9, u'Australia/Tasmania': 1, u'Europe/Tallinn': 1, u'Asia/Calcutta': 6, u'Australia/South': 4, u'Europe/Skopje': 1, u'Europe/Copenhagen': 4, u'America/St_Lucia': 1, u'Europe/Amsterdam': 15, u'Europe/Zaporozhye': 1, u'America/Phoenix': 40, u'Europe/Moscow': 35, u'America/El_Salvador': 2, u'Europe/Madrid': 21,
In [31]: counts = tmp.get_counts(time_zones)
In [32]: counts["Asia/Tokyo"]
Out[32]: 102
In [33]: counts2 = tmp.get_counts2(time_zones)
In [34]: counts2["Asia/Tokyo"]
Out[34]: 102
In [37]: tmp.top_counts(counts)
Out[37]:
[(40, u'America/Phoenix'),
(50, u'America/Indianapolis'),
(85, u'Europe/London'),
(89, u'America/Denver'),
(102, u'Asia/Tokyo'),
(184, u'America/Puerto_Rico'),
(421, u'America/Los_Angeles'),
(636, u''),
(686, u'America/Chicago'),
(903, u'America/New_York')]
# tmp.py
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

from collections import defaultdict

def get_counts2(sequence):
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts

def top_counts(count_dict, n=10):
    value_key_pairs = [(count,tz) for tz,count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

from collections import Counter
In [41]: counts3 = tmp.Counter(time_zones)
In [42]: counts3
Out[42]: Counter({u'America/New_York': 903, u'America/Chicago': 686, u'': 636, u'America/Los_Angeles': 421, u'America/Puerto_Rico': 184, u'Asia/Tokyo': 102, u'America/Denver': 89, u'Europe/London': 85, u'America/Indianapolis': 50, u'America/Phoenix': 40, u'Europe/Moscow': 35, u'America/Rainy_River': 33, u'Australia/NSW': 32, u'America/Sao_Paulo': 29, u'Europe/Paris': 27, u'Europe/Berlin': 24, u'America/Vancouver': 23,
In [43]: counts3.most_common(10)
Out[43]:
[(u'America/New_York', 903),
(u'America/Chicago', 686),
(u'', 636),
(u'America/Los_Angeles', 421),
(u'America/Puerto_Rico', 184),
(u'Asia/Tokyo', 102),
(u'America/Denver', 89),
(u'Europe/London', 85),
(u'America/Indianapolis', 50),
(u'America/Phoenix', 40)]
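The three counting approaches used so far (plain dict, defaultdict, Counter) should agree with one another. The sketch below checks that on a small made-up sample and shows a top-n that avoids building (count, key) pairs by hand. It assumes Python 3 and is not the code from tmp.py.

```python
from collections import Counter, defaultdict

# Made-up sample; '' stands for an empty time zone field.
sample = ["America/New_York", "", "Asia/Tokyo",
          "America/New_York", "", "America/New_York"]

# Plain dict.
plain = {}
for tz in sample:
    plain[tz] = plain.get(tz, 0) + 1

# defaultdict: missing keys start at int() == 0.
dd = defaultdict(int)
for tz in sample:
    dd[tz] += 1

# Counter does the loop internally.
counter = Counter(sample)

same = plain == dict(dd) == dict(counter)
print(same)

# Top 2 by count, largest first.
top2 = sorted(plain.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top2)
print(counter.most_common(2))
```

Note that top_counts in tmp.py returns (count, key) pairs in ascending order, while most_common returns (key, count) pairs in descending order.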
In [44]: from pandas import DataFrame,Series
In [45]: import pandas as pd; import numpy as np
In [46]: frame = DataFrame(records)
In [47]: frame
Out[47]:
_heartbeat_ a \
0 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...
1 NaN Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
2 NaN Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...
3 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...
4 NaN Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...
5 NaN Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...
6 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
7 NaN Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...
8 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
9 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
10 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
11 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
12 NaN Mozilla/5.0 (iPad; CPU OS 6_1_2 like Mac OS X)...
13 NaN Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20...
14 NaN Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
15 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; r...
16 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
17 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
18 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
19 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
20 NaN Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
21 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
22 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
23 NaN Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; L...
24 NaN Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...
25 NaN Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
26 NaN Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
27 NaN Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
28 NaN Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
29 NaN Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
... ... ...
3929 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3930 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; r...
3931 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3932 NaN Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.3...
3933 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
3934 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3935 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3936 NaN Mozilla/5.0 (compatible; Genieo/1.0 http://www...
3937 NaN Mozilla/5.0 (Linux; U; Android 4.0.3; en-gb; H...
3938 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; es-es; G...
3939 NaN ShortLinkTranslate
3940 NaN Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; L...
3941 NaN Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3942 NaN Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3943 NaN Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
3944 NaN Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; D...
3945 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3946 NaN Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
3947 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3948 NaN Mozilla/5.0 (Linux; Android 4.0.4; SO-03D Buil...
3949 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) G...
3950 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3951 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
3952 NaN Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3953 1368835801 NaN
3954 NaN Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; M...
3955 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3956 NaN ShortLinkTranslate
3957 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3958 NaN Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
al c cy g \
0 en-US US Anaheim 15r91
1 en-us None NaN ifIpBW
2 en-US,en;q=0.5 US Fort Huachuca 10DaxOu
3 en-US US Houston TysVFU
4 en None NaN 10IGW7m
5 en-US US Mishawaka 13GrCeP
6 en-US,en;q=0.5 US Hammond YmtpnZ
7 en-us None NaN 13oM0hV
8 en-us AU Sydney 15r91
9 en-US,en;q=0.8 None NaN 109LtDc
10 en-us US Middletown 109ar5F
11 en-us US Germantown 107xZnW
12 en-us US Richmond 19AcekS
13 en-US,en;q=0.5 US Portland 16mY628
14 en-us US Aurora YRyW8K
15 en-US,en;q=0.5 US Houston 18NUp44
16 en-US,en;q=0.8 US Muskego YmtpnZ
17 en-us US Arvada ZPictr
18 en-us US Bend 11C6yJk
19 en-US,en;q=0.5 US Laurel 15RP5hF
20 en-us US Seattle 12yP2Cx
21 en-US,en;q=0.8 None NaN 109ar5F
22 en-US,en;q=0.8 US Durand YmtpnZ
23 en-us,en;q=0.9 None NaN 107xZnW
24 NaN None NaN YmtpnZ
25 en-us,en;q=0.5 None NaN ifIpBW
26 en-us,en;q=0.5 None NaN ifIpBW
27 en-us,en;q=0.5 None NaN ifIpBW
28 en-us,en;q=0.5 None NaN ifIpBW
29 en-us,en;q=0.5 None NaN ifIpBW
... ... ... ... ...
3929 ja,en-US;q=0.8,en;q=0.6 None NaN 10Kc32m
3930 fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3 CH Chambesy 14bmsHn
3931 en-us US Rockville 14bmsHn
3932 en-US,en;q=0.8 None NaN 15r91
3933 en-us US Ann Arbor 1084Psg
3934 en-US,en;q=0.5 US Thomasville 15r91
3935 en-US,en;q=0.8 US Coronado Ntl Forest 186NWQK
3936 NaN US Manhattan Beach YYv1XQ
3937 en-GB, en-US None NaN 12AyUk2
3938 es-ES, en-US None NaN 12AyUk2
3939 NaN JP Kashiwa YPnFn4
3940 en-US None NaN 15r91
3941 en-us US Marshfield YmtpnZ
3942 en-US,en;q=0.8 US Vaughn 16uqtLe
3943 en-US US Orlando 10WMBv9
3944 en-US US Lakewood 14bmsHn
3945 en-US,en;q=0.5 US Boone 10X5IW8
3946 en-US,en;q=0.8 None NaN YmtpnZ
3947 en-US,en;q=0.8 US Logan 107xZnW
3948 ja,en-US;q=0.8,en;q=0.6 JP Tokyo 15TFyGK
3949 en-gb,en;q=0.5 US Castro Valley 11C6yJk
3950 en-us US Fayetteville 10ydrrV
3951 en-us US Salem 107xZnW
3952 en-us US Grayson 107xZnW
3953 NaN NaN NaN NaN
3954 en-US US Mobile 10WWSaR
3955 en-us US Brookfield YmtpnZ
3956 NaN JP Tsukuba YPnFn4
3957 en-us None NaN 17B6VoC
3958 en-us PR Guaynabo 10WMBv9
gr h hc hh kw l \
0 CA 10OBm3W 1365701422 j.mp NaN pontifier
1 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
2 AZ 10DaxOt 1368814585 1.usa.gov NaN jaxstrong
3 TX TChsoQ 1354719206 1.usa.gov NaN o_5004fs3lvd
4 NaN 10IGW7l 1368738258 1.usa.gov NaN peacecorps
5 IN 13GrCeP 1368130510 1.usa.gov NaN bitly
6 WI YmtpnZ 1363711958 1.usa.gov NaN bitly
7 NaN 15PUeH0 1368714329 go.nasa.gov NaN nasatwitter
8 02 10OBm3W 1365701422 j.mp NaN pontifier
9 NaN 109LtDb 1368821840 go.nasa.gov NaN nasatwitter
10 OH 109ar5E 1368803813 1.usa.gov NaN usairforce
11 MD 107xZnW 1368815450 1.usa.gov NaN bitly
12 KY 19AcekR 1368738410 1.usa.gov NaN peacecorps
13 OR 16mY627 1368743779 1.usa.gov NaN pbierce
14 IL YRyW8K 1368475960 1.usa.gov NaN bitly
15 TX 18NUoNR 1368727073 1.usa.gov NaN o_1fs5ea3lim
16 WI YmtpnZ 1363711958 1.usa.gov NaN bitly
17 CO ZPictq 1366901428 1.usa.gov NaN o_d63rn9enb
18 OR 19oVtZN 1368558078 1.usa.gov NaN raylahood
19 MD 16Ewvc4 1368818301 1.usa.gov NaN rebroth
20 WA 12yP2Cw 1368741846 1.usa.gov NaN o_6vo5h05abv
21 NaN 109ar5E 1368803813 1.usa.gov NaN usairforce
22 WI YmtpnZ 1363711958 1.usa.gov NaN bitly
23 NaN 107xZnW 1368815450 1.usa.gov NaN bitly
24 NaN YmtpnZ 1363711958 1.usa.gov NaN bitly
25 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
26 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
27 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
28 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
29 NaN ifIpBW 1302189369 1.usa.gov NaN bitly
... ... ... ... ... ... ...
3929 NaN 10Kc32l 1368809020 go.nasa.gov NaN nasatwitter
3930 07 14bmsHn 1368223558 1.usa.gov NaN bitly
3931 MD 14bmsHn 1368223558 1.usa.gov NaN bitly
3932 NaN 10OBm3W 1365701422 j.mp NaN pontifier
3933 MI 1084Psg 1368755511 j.mp NaN bitly
3934 NC 10OBm3W 1365701422 j.mp NaN pontifier
3935 AZ 186NWQK 1368828881 1.usa.gov NaN bitly
3936 CA YYv1XQ 1368710703 1.usa.gov NaN bitly
3937 NaN 12AyUk1 1368808362 go.nasa.gov NaN nasatwitter
3938 NaN 12AyUk1 1368808362 go.nasa.gov NaN nasatwitter
3939 04 YPnFn3 1368833354 1.usa.gov NaN hayano
3940 NaN 10OBm3W 1365701422 j.mp NaN pontifier
3941 WI YmtpnZ 1363711958 1.usa.gov NaN bitly
3942 WA 16uqtLd 1368455001 1.usa.gov NaN o_33avl0ri1b
3943 FL 10WMBv9 1368826755 1.usa.gov NaN bitly
3944 OH 14bmsHn 1368223558 1.usa.gov NaN bitly
3945 NC 10X5IW7 1368834590 1.usa.gov NaN inws
3946 NaN YmtpnZ 1363711958 1.usa.gov NaN bitly
3947 UT 107xZnW 1368815450 1.usa.gov NaN bitly
3948 40 15TFyGJ 1368810402 go.nasa.gov NaN nasatwitter
3949 CA 19oVtZN 1368558078 1.usa.gov NaN raylahood
3950 GA 10ydrrU 1368806410 1.usa.gov NaN fsanewmedia
3951 VA 107xZnW 1368815450 1.usa.gov NaN bitly
3952 KY 107xZnW 1368815450 1.usa.gov NaN bitly
3953 NaN NaN NaN NaN NaN NaN
3954 AL 10WWSaQ 1368830835 1.usa.gov NaN inws
3955 WI YmtpnZ 1363711958 1.usa.gov NaN bitly
3956 14 YPnFn3 1368833354 1.usa.gov NaN hayano
3957 NaN 16n91ZK 1368746683 go.nasa.gov NaN nasatwitter
3958 00 10WMBv9 1368826755 1.usa.gov NaN bitly
ll nk \
0 [33.816101, -117.979401] 0
1 NaN 0
2 [31.5273, -110.360703] 1
3 [29.7633, -95.363297] 1
4 NaN 0
5 [41.612301, -86.1381] 0
6 [45.007, -92.459099] 1
7 NaN 0
8 [-33.8615, 151.205505] 0
9 NaN 0
10 [39.515099, -84.3983] 1
11 [39.131699, -77.288002] 0
12 [37.766602, -84.303101] 1
13 [45.529499, -122.643204] 1
14 [41.760601, -88.320099] 0
15 [29.7633, -95.363297] 0
16 [42.877602, -88.133797] 0
17 [39.802799, -105.087502] 1
18 [44.074402, -121.257401] 1
19 [39.135799, -76.872002] 0
20 [47.606201, -122.3321] 1
21 NaN 0
22 [44.590698, -91.891197] 0
23 NaN 0
24 NaN 0
25 NaN 0
26 NaN 0
27 NaN 0
28 NaN 0
29 NaN 0
... ... ..
3929 NaN 1
3930 [46.242401, 6.1435] 0
3931 [39.089199, -77.183502] 1
3932 NaN 0
3933 [42.216702, -83.740601] 1
3934 [35.882599, -80.082001] 0
3935 [31.9582, -110.693001] 0
3936 [33.889301, -118.401001] 0
3937 NaN 1
3938 NaN 1
3939 [35.854401, 139.968903] 0
3940 NaN 0
3941 [44.6688, -90.171799] 0
3942 [47.314499, -122.778503] 0
3943 [28.3899, -81.4366] 0
3944 [41.481701, -81.802399] 0
3945 [36.219101, -81.656303] 0
3946 NaN 1
3947 [41.641201, -111.896599] 0
3948 [35.685001, 139.751404] 1
3949 [37.709, -122.088501] 0
3950 [33.481098, -84.479797] 0
3951 [37.2906, -80.101402] 0
3952 [38.336399, -82.992401] 0
3953 NaN NaN
3954 [30.657499, -88.1586] 1
3955 [43.060799, -88.1558] 0
3956 [36.083302, 140.116699] 0
3957 NaN 0
3958 [18.3876, -66.110802] 1
r t \
0 direct 1368832205
1 http://www.usa.gov/ 1368832207
2 http://www.facebook.com/l.php?u=http%3A%2F%2F1... 1368832209
3 http://m.facebook.com/l.php?u=http%3A%2F%2F1.u... 1368832209
4 http://t.co/CDO9hLTtNT 1368832208
5 direct 1368832209
6 http://www.bwsd.k12.wi.us/SitePages/Home.aspx 1368832210
7 http://t.co/YIsVhFDLj2 1368832211
8 direct 1368832213
9 http://t.co/yPSKO2t5v1 1368832215
10 https://m.facebook.com 1368832215
11 http://t.co/u8qVCKx8RK 1368832218
12 http://www.facebook.com/l.php?u=http%3A%2F%2F1... 1368832219
13 http://t.co/T8EyBbUBJ8 1368832219
14 http://www.z2systems.com/np/clients/kca/news.j... 1368832219
15 http://t.co/s307mx2qGk 1368832220
16 http://www.cudahy.k12.wi.us/ 1368832220
17 direct 1368832220
18 direct 1368832222
19 http://t.co/Dv6Jqbwu8H 1368832222
20 http://t.co/7K9urpYyc6 1368832224
21 https://www.facebook.com/ 1368832225
22 http://www.alma.k12.wi.us/ 1368832224
23 http://m.facebook.com/l.php?u=http%3A%2F%2F1.u... 1368832224
24 http://www.wabeno.k12.wi.us/ 1368832226
25 http://addthis.com/hemmings 1368832226
26 http://addthis.com/hemmings 1368832226
27 http://addthis.com/hemmings 1368832227
28 http://addthis.com/hemmings 1368832227
29 http://addthis.com/hemmings 1368832227
... ... ...
3929 http://t.co/HgiLLFRDtE 1368835778
3930 direct 1368835778
3931 direct 1368835778
3932 direct 1368835780
3933 http://t.co/orOTdRX5aF 1368835780
3934 direct 1368835782
3935 https://www.facebook.com/ 1368835783
3936 direct 1368835785
3937 direct 1368835786
3938 http://t.co/q6402O6lFC 1368835786
3939 direct 1368835787
3940 direct 1368835789
3941 http://www.colby.k12.wi.us/ 1368835791
3942 http://fwp.mt.gov/ 1368835793
3943 http://www.elnuevodia.com/brillanteexplosionen... 1368835794
3944 direct 1368835794
3945 http://www.facebook.com/l.php?u=http%3A%2F%2F1... 1368835793
3946 http://www.cudahy.k12.wi.us/ 1368835795
3947 https://www.facebook.com/ 1368835795
3948 direct 1368835795
3949 http://www.cahighspeedrail.ca.gov/ 1368835797
3950 http://t.co/psBn8njvIB 1368835797
3951 http://www.facebook.com/l.php?u=http%3A%2F%2F1... 1368835798
3952 http://www.facebook.com/l.php?u=http%3A%2F%2F1... 1368835801
3953 NaN NaN
3954 http://m.facebook.com/l.php?u=http%3A%2F%2F1.u... 1368835802
3955 http://www.cudahy.k12.wi.us/ 1368835803
3956 direct 1368835804
3957 http://t.co/XLS75r3BCB 1368835805
3958 http://www.elnuevodia.com/brillanteexplosionen... 1368835806
tz u
0 America/Los_Angeles http://www.nsa.gov/
1 http://answers.usa.gov/system/selfservice.cont...
2 America/Phoenix http://www.saj.usace.army.mil/Media/NewsReleas...
3 America/Chicago https://nationalregistry.fmcsa.dot.gov/
4 http://www.peacecorps.gov/learn/howvol/ab530gr...
5 America/Indianapolis https://petitions.whitehouse.gov/petition/repe...
6 America/Chicago http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
7 http://www.nasa.gov/multimedia/imagegallery/im...
8 Australia/NSW http://www.nsa.gov/
9 http://www.nasa.gov/mission_pages/sunearth/new...
10 America/New_York http://www.dodlive.mil/index.php/2013/05/the-2...
11 America/New_York http://doggett.house.gov/index.php/news/571-do...
12 America/New_York http://www.peacecorps.gov/learn/howvol/ab530gr...
13 America/Los_Angeles http://www.fws.gov/cno/press/release.cfm?rid=493
14 America/Chicago http://www.cancer.gov/PublishedContent/Images/...
15 America/Chicago http://www.army.mil/article/103380/
16 America/Chicago http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
17 America/Denver http://www.nws.noaa.gov/com/weatherreadynation...
18 America/Los_Angeles http://fastlane.dot.gov/2013/05/new-locomotive...
19 America/New_York http://apod.nasa.gov/apod/ap130517.html
20 America/Los_Angeles http://www.ice.gov/news/releases/1305/130516sa...
21 http://www.dodlive.mil/index.php/2013/05/the-2...
22 America/Chicago http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
23 http://doggett.house.gov/index.php/news/571-do...
24 http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
25 http://answers.usa.gov/system/selfservice.cont...
26 http://answers.usa.gov/system/selfservice.cont...
27 http://answers.usa.gov/system/selfservice.cont...
28 http://answers.usa.gov/system/selfservice.cont...
29 http://answers.usa.gov/system/selfservice.cont...
... ... ...
3929 http://www.nasa.gov/mission_pages/station/expe...
3930 Europe/Zurich http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3931 America/New_York http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3932 http://www.nsa.gov/
3933 America/New_York http://science.nasa.gov/science-news/science-a...
3934 America/New_York http://www.nsa.gov/
3935 America/Phoenix http://cms3.tucsonaz.gov/files/police/media-re...
3936 America/Los_Angeles http://www.irs.gov/uac/Newsroom/Tax-Relief-for...
3937 http://www.jpl.nasa.gov/news/news.php?release=...
3938 http://www.jpl.nasa.gov/news/news.php?release=...
3939 Asia/Tokyo http://www.doe.gov/articles/energy-department-...
3940 http://www.nsa.gov/
3941 America/Chicago http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3942 America/Los_Angeles http://fwp.mt.gov/hunting/hunterAccess/openFie...
3943 America/New_York http://science.nasa.gov/media/medialibrary/201...
3944 America/New_York http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3945 America/New_York http://inws.wrh.noaa.gov/weather/alertinfo/103...
3946 http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3947 America/Denver http://doggett.house.gov/index.php/news/571-do...
3948 Asia/Tokyo http://www.nasa.gov/mission_pages/mer/news/mer...
3949 America/Los_Angeles http://fastlane.dot.gov/2013/05/new-locomotive...
3950 America/New_York http://studentaid.ed.gov/repay-loans/understan...
3951 America/New_York http://doggett.house.gov/index.php/news/571-do...
3952 America/New_York http://doggett.house.gov/index.php/news/571-do...
3953 NaN NaN
3954 America/Chicago http://inws.wrh.noaa.gov/weather/alertinfo/103...
3955 America/Chicago http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3956 Asia/Tokyo http://www.doe.gov/articles/energy-department-...
3957 http://www.jpl.nasa.gov/news/news.php?release=...
3958 America/Puerto_Rico http://science.nasa.gov/media/medialibrary/201...
[3959 rows x 18 columns]
In [48]: frame['tz'][:10]
Out[48]:
0 America/Los_Angeles
1
2 America/Phoenix
3 America/Chicago
4
5 America/Indianapolis
6 America/Chicago
7
8 Australia/NSW
9
Name: tz, dtype: object
In [49]: tz_counts = frame['tz'].value_counts()
In [50]: tz_counts[:10]
Out[50]:
America/New_York 903
America/Chicago 686
636
America/Los_Angeles 421
America/Puerto_Rico 184
Asia/Tokyo 102
America/Denver 89
Europe/London 85
America/Indianapolis 50
America/Phoenix 40
dtype: int64
In [51]: clean_tz = frame['tz'].fillna("Missing")
In [52]: clean_tz[clean_tz == ""] = "Unknown"
In [53]: tz_counts = clean_tz.value_counts()
In [54]: tz_counts[:10]
Out[54]:
America/New_York 903
America/Chicago 686
Unknown 636
America/Los_Angeles 421
America/Puerto_Rico 184
Missing 120
Asia/Tokyo 102
America/Denver 89
Europe/London 85
America/Indianapolis 50
dtype: int64
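The two cleaning steps distinguish records that have no tz key at all (NaN, relabeled "Missing") from records whose tz is an empty string (relabeled "Unknown"). A minimal sketch on made-up data, assuming a reasonably recent pandas under Python 3:

```python
import pandas as pd

# Made-up sample: None marks a record without a 'tz' key, '' an empty value.
tz = pd.Series(["America/New_York", "", None, "Asia/Tokyo",
                "", "America/New_York", ""])

clean_tz = tz.fillna("Missing")       # NaN/None -> "Missing"
clean_tz[clean_tz == ""] = "Unknown"  # empty string -> "Unknown"

tz_counts = clean_tz.value_counts()   # sorted by count, largest first
print(tz_counts)
```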
In [55]: tz_counts[:10].plot(kind='barh',rot=0)
Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0x10d503810>
http://gyazo.com/aeb47e0bfc1155ea5f68569fbaa861ad
In [56]: frame["a"]
Out[56]:
0 Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...
1 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
2 Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...
3 Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...
4 Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...
5 Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...
6 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
7 Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...
8 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
9 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
10 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
11 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
12 Mozilla/5.0 (iPad; CPU OS 6_1_2 like Mac OS X)...
13 Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20...
14 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
...
3944 Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; D...
3945 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3946 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
3947 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3948 Mozilla/5.0 (Linux; Android 4.0.4; SO-03D Buil...
3949 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) G...
3950 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3951 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
3952 Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3953 NaN
3954 Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; M...
3955 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3956 ShortLinkTranslate
3957 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3958 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
Name: a, Length: 3959, dtype: object
In [57]: frame["a"][30]
Out[57]: u'Mozilla/5.0 (iPod; CPU iPhone OS 6_1_2 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B146 Safari/8536.25'
In [58]: results = Series([x.split()[0] for x in frame.a.dropna()])
In [59]: results[:5]
Out[59]:
0 Mozilla/5.0
1 Mozilla/4.0
2 Mozilla/5.0
3 Mozilla/5.0
4 Opera/9.80
dtype: object
In [61]: results.value_counts()[:8]
Out[61]:
Mozilla/5.0 3251
Mozilla/4.0 322
CakePHP 38
ShortLinkTranslate 36
TVersity 30
Opera/9.80 28
Dalvik/1.6.0 19
Xenu 15
dtype: int64
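The list comprehension in In [58] can also be written with pandas string methods. The sketch below uses made-up user agent strings and assumes a reasonably recent pandas under Python 3. Unlike the list comprehension version, it keeps the original row labels instead of renumbering from 0.

```python
import pandas as pd

# Made-up user agent strings; None stands for a record without an 'a' field.
agents = pd.Series([
    "Mozilla/5.0 (Windows NT 6.1; WOW64)",
    "Opera/9.80 (Android; Opera Mini)",
    None,
    "ShortLinkTranslate",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3)",
])

# Drop missing values, split on whitespace, keep the first token.
first_tokens = agents.dropna().str.split().str[0]
token_counts = first_tokens.value_counts()
print(token_counts)
```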
In [68]: frame
...
[3959 rows x 18 columns]
In [69]: cframe = frame[frame.a.notnull()]
In [70]: cframe
Out[70]:
...
[3839 rows x 18 columns]
In [71]: operating_system = np.where(cframe["a"].str.contains("Windows"), "Windows", "Not Windows")
In [72]: operating_system[:5]
Out[72]:
array(['Not Windows', 'Windows', 'Windows', 'Not Windows', 'Not Windows'],
dtype='|S11')
In [73]: by_tz_os = cframe.groupby(["tz", operating_system])
In [74]: by_tz_os.size()
Out[74]:
tz
Not Windows 484
Windows 152
Africa/Cairo Windows 3
Africa/Casablanca Windows 1
Africa/Ceuta Not Windows 4
Windows 2
Africa/Gaborone Windows 1
Africa/Johannesburg Not Windows 2
America/Anchorage Not Windows 5
Windows 3
America/Argentina/Buenos_Aires Not Windows 4
Windows 7
America/Argentina/Catamarca Not Windows 1
America/Argentina/Cordoba Windows 2
America/Asuncion Windows 1
...
Europe/Sofia Not Windows 1
Europe/Stockholm Not Windows 2
Windows 2
Europe/Tallinn Not Windows 1
Europe/Vienna Not Windows 3
Windows 3
Europe/Warsaw Not Windows 1
Windows 1
Europe/Zaporozhye Windows 1
Europe/Zurich Not Windows 4
Windows 1
Pacific/Auckland Not Windows 1
Windows 8
Pacific/Honolulu Not Windows 7
Windows 5
Length: 170, dtype: int64
In [75]: agg_counts = by_tz_os.size().unstack().fillna(0)
In [76]: agg_counts[:10]
Out[76]:
Not Windows Windows
tz
484 152
Africa/Cairo 0 3
Africa/Casablanca 0 1
Africa/Ceuta 4 2
Africa/Gaborone 0 1
Africa/Johannesburg 2 0
America/Anchorage 5 3
America/Argentina/Buenos_Aires 4 7
America/Argentina/Catamarca 1 0
America/Argentina/Cordoba 0 2
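The groupby above mixes a column name ("tz") with an external array of labels of the same length as the frame; size() counts each (tz, label) pair and unstack() pivots the labels into columns. A small sketch on made-up data, assuming a reasonably recent pandas and NumPy under Python 3:

```python
import numpy as np
import pandas as pd

# Made-up records with a time zone and a user agent string.
frame = pd.DataFrame({
    "tz": ["Asia/Tokyo", "Asia/Tokyo", "America/Chicago",
           "America/Chicago", "Asia/Tokyo"],
    "a": ["Mozilla/5.0 (Windows NT 6.1)", "Mozilla/5.0 (iPhone)",
          "Mozilla/4.0 (Windows NT 5.1)", "Opera/9.80 (Android)",
          "Mozilla/5.0 (Macintosh)"],
})

# One label per row, aligned by position with the frame.
os_label = np.where(frame["a"].str.contains("Windows"),
                    "Windows", "Not Windows")

# Count each (tz, label) pair, then pivot the labels into columns.
agg = frame.groupby(["tz", os_label]).size().unstack().fillna(0)
print(agg)
```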
In [77]: indexer = agg_counts.sum(1).argsort()
In [78]: indexer[:10]
Out[78]:
tz
55
Africa/Cairo 101
Africa/Casablanca 100
Africa/Ceuta 36
Africa/Gaborone 97
Africa/Johannesburg 42
America/Anchorage 43
America/Argentina/Buenos_Aires 44
America/Argentina/Catamarca 47
America/Argentina/Cordoba 50
dtype: int64
In [79]: count_subset = agg_counts.take(indexer)[-10:]
In [80]: count_subset
Out[80]:
Not Windows Windows
tz
America/Phoenix 22 18
America/Indianapolis 29 21
Europe/London 62 23
America/Denver 41 48
Asia/Tokyo 88 14
America/Puerto_Rico 93 91
America/Los_Angeles 207 214
484 152
America/Chicago 343 343
America/New_York 550 353
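The output of In [78] looks odd because argsort returns integer positions (the order that would sort the row totals ascending), while pandas still prints them against the tz labels; the label next to each number is not meaningful. The sketch below replays the idea on four rows taken from agg_counts above, deliberately shuffled, and also shows a label-based alternative. It assumes a reasonably recent pandas under Python 3.

```python
import pandas as pd

# Four rows from agg_counts above, in shuffled order.
agg_counts = pd.DataFrame(
    {"Not Windows": [5, 0, 4, 4], "Windows": [3, 3, 7, 2]},
    index=["America/Anchorage", "Africa/Cairo",
           "America/Argentina/Buenos_Aires", "Africa/Ceuta"],
)

totals = agg_counts.sum(axis=1)

# Integer positions that would sort the totals in ascending order.
indexer = totals.to_numpy().argsort()

# Reorder rows by position and keep the two largest.
top2 = agg_counts.take(indexer)[-2:]
print(top2)

# Label-based alternative giving the same rows.
top2_alt = agg_counts.loc[totals.sort_values().index[-2:]]
```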
In [81]: count_subset.plot(kind="barh", stacked=True)
Out[81]: <matplotlib.axes._subplots.AxesSubplot at 0x10d5095d0>
http://gyazo.com/6aa7c587338d9ca695967a05b5d4baaa
In [82]: normed_subset = count_subset.div(count_subset.sum(1), axis=0)
In [83]: normed_subset.plot(kind="barh", stacked=True)
Out[83]: <matplotlib.axes._subplots.AxesSubplot at 0x112774050>
http://gyazo.com/3035df33d844332cde38c8505d05174d
This differs from the book: here Not Windows is the larger group.
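The normalization in In [82] divides each row by its own total, so every row sums to 1 and the bars become comparable across time zones. A minimal sketch using three rows of count_subset from above, assuming a reasonably recent pandas under Python 3:

```python
import pandas as pd

# Three rows copied from count_subset above.
count_subset = pd.DataFrame(
    {"Not Windows": [88, 343, 550], "Windows": [14, 343, 353]},
    index=["Asia/Tokyo", "America/Chicago", "America/New_York"],
)

row_totals = count_subset.sum(axis=1)

# axis=0 aligns row_totals with the row index, so each row is
# divided by its own total.
normed_subset = count_subset.div(row_totals, axis=0)
print(normed_subset.round(3))
```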