
Introduction to Data Analysis with Python (Day 1)

Posted on 2015-02-15

Install Canopy Express
https://store.enthought.com/downloads/
canopy-1.5.2-osx-64.dmg
About 300 MB

Install it by drag and drop. Once you allow the app to run, the initial setup starts.

It asked whether it could change the default Python, and I answered Yes.

Typing python in iTerm, though, nothing seemed to have changed:

% which python
/usr/bin/python

The installer had added the following to .bash_profile:

# Added by Canopy installer on 2015-02-15
# VIRTUAL_ENV_DISABLE_PROMPT can be set to '' to make bashprompt show that Canopy is active, otherwise 1
VIRTUAL_ENV_DISABLE_PROMPT=1 source /Users/takeru/Library/Enthought/Canopy_64bit/User/bin/activate

That's because my shell is zsh, which does not read .bash_profile. Running the line by hand:

% VIRTUAL_ENV_DISABLE_PROMPT=1 source /Users/takeru/Library/Enthought/Canopy_64bit/User/bin/activate
% which python
/Users/takeru/Library/Enthought/Canopy_64bit/User/bin/python

Now it has changed.
Apparently just sourcing that activate script is enough to update PATH.
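Why sourcing the script is enough: a virtualenv-style `activate` script (Canopy's User environment appears to be one, judging by `VIRTUAL_ENV_DISABLE_PROMPT` and `bin/activate`, but that is an assumption) prepends its own `bin` directory to `PATH`, and the shell runs the first matching executable it finds. A minimal Python 3 sketch of that first-match rule using `shutil.which` with an explicit search path; the two throwaway directories are hypothetical stand-ins for `/usr/bin` and Canopy's `User/bin`, and it relies on the POSIX executable bit.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create a tiny executable file named 'python' inside directory."""
    path = os.path.join(directory, "python")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


system_bin = tempfile.mkdtemp(prefix="system_bin_")  # stands in for /usr/bin
canopy_bin = tempfile.mkdtemp(prefix="canopy_bin_")  # stands in for Canopy's User/bin
make_fake_python(system_bin)
make_fake_python(canopy_bin)

path_before = system_bin
# Roughly what activate does: put its own bin directory in front of PATH
path_after = os.pathsep.join([canopy_bin, system_bin])

found_before = shutil.which("python", path=path_before)
found_after = shutil.which("python", path=path_after)
print(found_before)  # the file inside system_bin
print(found_after)   # the file inside canopy_bin, because it comes first
```

The lookup itself never changes; only the order of directories in `PATH` decides which `python` wins.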

Something popped up!

http://www.usa.gov/About/developer-resources/1usagov.shtml#How_to_Access_The_Data
http://1usagov.measuredvoice.com/2013/
The data no longer seems to be updated, but the older files can still be downloaded.

20:29:59 tkrimac2:~/proj/pydata% gunzip usagov_bitly_data2013-05-17-1368832207.gz
20:30:03 tkrimac2:~/proj/pydata% ls
usagov_bitly_data2013-05-17-1368832207

20:30:23 tkrimac2:~/proj/pydata% ipython --pylab
Python 2.7.6 | 64-bit | (default, Sep 15 2014, 17:43:19)
Type "copyright", "credits" or "license" for more information.

IPython 2.3.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: MacOSX

In [1]: path="usagov_bitly_data2013-05-17-1368832207"

In [2]: import json

In [3]: records = [json.loads(line) for line in open(path)]
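The one-liner above leaves the file handle open until it is garbage-collected, and `json.loads` would raise on a blank line (for example a trailing empty line). A slightly more defensive sketch in Python 3 syntax (the session itself is Python 2.7): `load_records` is a hypothetical helper that closes the file with `with`, skips blank lines, and, as a convenience the original does not use, reads a `.gz` download directly through `gzip.open` so the manual `gunzip` step becomes optional. The sample records are made up.

```python
import gzip
import json
import os
import tempfile


def load_records(path):
    """Parse one JSON object per line from a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    records = []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing in json.loads
                records.append(json.loads(line))
    return records


# Usage: write the same made-up sample as a plain file and as a .gz file
sample = '{"tz": "America/Los_Angeles", "c": "US"}\n\n{"tz": ""}\n'
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "sample.json")
gz_path = os.path.join(tmpdir, "sample.json.gz")
with open(plain_path, "w", encoding="utf-8") as f:
    f.write(sample)
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write(sample)

plain_records = load_records(plain_path)
gz_records = load_records(gz_path)
print(plain_records)  # [{'tz': 'America/Los_Angeles', 'c': 'US'}, {'tz': ''}]
```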

In [4]: records[0]
Out[4]:
{u'a': u'Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; HTC_PN071 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
 u'al': u'en-US',
 u'c': u'US',
 u'cy': u'Anaheim',
 u'g': u'15r91',
 u'gr': u'CA',
 u'h': u'10OBm3W',
 u'hc': 1365701422,
 u'hh': u'j.mp',
 u'l': u'pontifier',
 u'll': [33.816101, -117.979401],
 u'nk': 0,
 u'r': u'direct',
 u't': 1368832205,
 u'tz': u'America/Los_Angeles',
 u'u': u'http://www.nsa.gov/'}

In [5]: records[0]['tz']
Out[5]: u'America/Los_Angeles'

In [6]: print records[0]['tz']
America/Los_Angeles
In [11]: time_zones = [rec['tz'] for rec in records if 'tz' in rec]

In [12]: time_zones[:10]
Out[12]:
[u'America/Los_Angeles',
 u'',
 u'America/Phoenix',
 u'America/Chicago',
 u'',
 u'America/Indianapolis',
 u'America/Chicago',
 u'',
 u'Australia/NSW',
 u'']

Create tmp.py:

#tmp.py
print "a"

In [18]: import tmp
a

(Edit tmp.py, changing "a" to "aaa")

In [19]: import tmp
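(Nothing is printed: tmp was already imported, so Python returns the cached module from sys.modules instead of re-running tmp.py)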

In [23]: reload(tmp)
aaa
In [27]: reload(tmp)
Out[27]: <module 'tmp' from 'tmp.py'>
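The same experiment as a self-contained Python 3 sketch. `reload` is a builtin only in Python 2; in Python 3 it lives in `importlib.reload`. The module name `tmp_demo` and its `VALUE` attribute are hypothetical, and bytecode writing is switched off so a quick rewrite of the file cannot be masked by a cached `.pyc`.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module "tmp_demo" in a fresh temp directory
workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
sys.dont_write_bytecode = True  # no .pyc files, so reload always re-reads the source
module_path = os.path.join(workdir, "tmp_demo.py")


def write_module(value):
    with open(module_path, "w") as f:
        f.write("VALUE = %r\n" % value)


write_module("a")
importlib.invalidate_caches()
import tmp_demo
first_value = tmp_demo.VALUE            # 'a'

write_module("aaa")
import tmp_demo                         # served from sys.modules; the file is not re-run
second_value = tmp_demo.VALUE           # still 'a'

tmp_demo = importlib.reload(tmp_demo)   # re-executes the edited source
reloaded_value = tmp_demo.VALUE         # 'aaa'
print(first_value, second_value, reloaded_value)  # a a aaa
```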

In [28]: tmp.get_counts(time_zones)
Out[28]:
{u'': 636,
 u'Africa/Cairo': 3,
 u'Africa/Casablanca': 1,
 u'Africa/Ceuta': 6,
 u'Africa/Gaborone': 1,
 u'Africa/Johannesburg': 2,
 u'America/Anchorage': 8,
 u'America/Argentina/Buenos_Aires': 11,
...
In [29]: reload(tmp)
Out[29]: <module 'tmp' from 'tmp.py'>

In [30]: tmp.get_counts2(time_zones)
Out[30]: defaultdict(<type 'int'>, {u'': 636, u'Europe/Lisbon': 8, u'America/Bogota': 16, u'America/Edmonton': 9, u'Australia/Tasmania': 1, u'Europe/Tallinn': 1, u'Asia/Calcutta': 6, u'Australia/South': 4, u'Europe/Skopje': 1, u'Europe/Copenhagen': 4, u'America/St_Lucia': 1, u'Europe/Amsterdam': 15, u'Europe/Zaporozhye': 1, u'America/Phoenix': 40, u'Europe/Moscow': 35, u'America/El_Salvador': 2, u'Europe/Madrid': 21,
In [31]: counts = tmp.get_counts(time_zones)

In [32]: counts["Asia/Tokyo"]
Out[32]: 102

In [33]: counts2 = tmp.get_counts2(time_zones)

In [34]: counts["Asia/Tokyo"]
Out[34]: 102
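Both counters agree on keys that occur; the difference shows up on a key that never occurred. A plain dict raises `KeyError`, while `defaultdict(int)` returns 0 and, as a side effect, stores that key. A small Python 3 sketch with made-up time-zone strings (`"Europe/Nowhere"` is deliberately fictitious):

```python
from collections import defaultdict

zones = ["Asia/Tokyo", "America/Chicago", "Asia/Tokyo"]

# Plain dict counted with dict.get (same result as get_counts)
plain = {}
for z in zones:
    plain[z] = plain.get(z, 0) + 1

# defaultdict(int), as in get_counts2
auto = defaultdict(int)
for z in zones:
    auto[z] += 1

plain_missing = None
try:
    plain["Europe/Nowhere"]  # made-up key that never occurs
except KeyError:
    plain_missing = "KeyError"

auto_missing = auto["Europe/Nowhere"]  # returns 0 and inserts the key
print(plain_missing, auto_missing, "Europe/Nowhere" in auto)  # KeyError 0 True
```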
In [37]: tmp.top_counts(counts)
Out[37]:
[(40, u'America/Phoenix'),
 (50, u'America/Indianapolis'),
 (85, u'Europe/London'),
 (89, u'America/Denver'),
 (102, u'Asia/Tokyo'),
 (184, u'America/Puerto_Rico'),
 (421, u'America/Los_Angeles'),
 (636, u''),
 (686, u'America/Chicago'),
 (903, u'America/New_York')]
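
#tmp.py
# Counter is imported here too; the session below calls it as tmp.Counter
from collections import Counter, defaultdict


def get_counts(sequence):
    # Count occurrences with a plain dict and an explicit membership test
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts


def get_counts2(sequence):
    # defaultdict(int) starts every new key at 0, so no membership test is needed
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts


def top_counts(count_dict, n=10):
    # Build (count, tz) pairs, sort ascending, keep the last n (the n largest)
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]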

In [41]: counts3 = tmp.Counter(time_zones)

In [42]: counts3
Out[42]: Counter({u'America/New_York': 903, u'America/Chicago': 686, u'': 636, u'America/Los_Angeles': 421, u'America/Puerto_Rico': 184, u'Asia/Tokyo': 102, u'America/Denver': 89, u'Europe/London': 85, u'America/Indianapolis': 50, u'America/Phoenix': 40, u'Europe/Moscow': 35, u'America/Rainy_River': 33, u'Australia/NSW': 32, u'America/Sao_Paulo': 29, u'Europe/Paris': 27, u'Europe/Berlin': 24, u'America/Vancouver': 23,
In [43]: counts3.most_common(10)
Out[43]:
[(u'America/New_York', 903),
 (u'America/Chicago', 686),
 (u'', 636),
 (u'America/Los_Angeles', 421),
 (u'America/Puerto_Rico', 184),
 (u'Asia/Tokyo', 102),
 (u'America/Denver', 89),
 (u'Europe/London', 85),
 (u'America/Indianapolis', 50),
 (u'America/Phoenix', 40)]
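Note the two output shapes: `top_counts` returns the n largest as `(count, tz)` pairs in ascending order, whereas `Counter.most_common` returns `(tz, count)` pairs in descending order. As a swapped-in alternative to sorting the whole list, `heapq.nlargest` selects the top n directly. A Python 3 sketch on a made-up sample chosen without ties, since tie-breaking differs (`top_counts` falls back to comparing the tz strings):

```python
import heapq
from collections import Counter


def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]


zones = (["America/New_York"] * 5 + ["America/Chicago"] * 3
         + [""] * 4 + ["Asia/Tokyo"] * 1)
counts = Counter(zones)

ascending = top_counts(counts, n=3)
descending = counts.most_common(3)
via_heap = heapq.nlargest(3, counts.items(), key=lambda kv: kv[1])
print(ascending)   # [(3, 'America/Chicago'), (4, ''), (5, 'America/New_York')]
print(descending)  # [('America/New_York', 5), ('', 4), ('America/Chicago', 3)]
```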
In [44]: from pandas import DataFrame,Series

In [45]: import pandas as pd; import numpy as np

In [46]: frame = DataFrame(records)

In [47]: frame
Out[47]:
      _heartbeat_                                                  a  \
0             NaN  Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...
1             NaN  Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
2             NaN  Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...
3             NaN  Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...
4             NaN  Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...
5             NaN  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...
6             NaN  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
7             NaN  Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...
8             NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
9             NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
10            NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
11            NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
12            NaN  Mozilla/5.0 (iPad; CPU OS 6_1_2 like Mac OS X)...
13            NaN  Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20...
14            NaN  Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
15            NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; r...
16            NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
17            NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
18            NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
19            NaN  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
20            NaN  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
21            NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
22            NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
23            NaN  Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; L...
24            NaN  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...
25            NaN  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
26            NaN  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
27            NaN  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
28            NaN  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
29            NaN  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
...           ...                                                ...
3929          NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3930          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; r...
3931          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3932          NaN  Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.3...
3933          NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
3934          NaN  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3935          NaN  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3936          NaN  Mozilla/5.0 (compatible; Genieo/1.0 http://www...
3937          NaN  Mozilla/5.0 (Linux; U; Android 4.0.3; en-gb; H...
3938          NaN  Mozilla/5.0 (Linux; U; Android 4.1.2; es-es; G...
3939          NaN                                 ShortLinkTranslate
3940          NaN  Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; L...
3941          NaN  Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3942          NaN  Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3943          NaN  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
3944          NaN  Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; D...
3945          NaN  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3946          NaN  Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
3947          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3948          NaN  Mozilla/5.0 (Linux; Android 4.0.4; SO-03D Buil...
3949          NaN  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) G...
3950          NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3951          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
3952          NaN  Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3953   1368835801                                                NaN
3954          NaN  Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; M...
3955          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3956          NaN                                 ShortLinkTranslate
3957          NaN  Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3958          NaN  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...

                                       al     c                   cy        g  \
0                                   en-US    US              Anaheim    15r91
1                                   en-us  None                  NaN   ifIpBW
2                          en-US,en;q=0.5    US        Fort Huachuca  10DaxOu
3                                   en-US    US              Houston   TysVFU
4                                      en  None                  NaN  10IGW7m
5                                   en-US    US            Mishawaka  13GrCeP
6                          en-US,en;q=0.5    US              Hammond   YmtpnZ
7                                   en-us  None                  NaN  13oM0hV
8                                   en-us    AU               Sydney    15r91
9                          en-US,en;q=0.8  None                  NaN  109LtDc
10                                  en-us    US           Middletown  109ar5F
11                                  en-us    US           Germantown  107xZnW
12                                  en-us    US             Richmond  19AcekS
13                         en-US,en;q=0.5    US             Portland  16mY628
14                                  en-us    US               Aurora   YRyW8K
15                         en-US,en;q=0.5    US              Houston  18NUp44
16                         en-US,en;q=0.8    US              Muskego   YmtpnZ
17                                  en-us    US               Arvada   ZPictr
18                                  en-us    US                 Bend  11C6yJk
19                         en-US,en;q=0.5    US               Laurel  15RP5hF
20                                  en-us    US              Seattle  12yP2Cx
21                         en-US,en;q=0.8  None                  NaN  109ar5F
22                         en-US,en;q=0.8    US               Durand   YmtpnZ
23                         en-us,en;q=0.9  None                  NaN  107xZnW
24                                    NaN  None                  NaN   YmtpnZ
25                         en-us,en;q=0.5  None                  NaN   ifIpBW
26                         en-us,en;q=0.5  None                  NaN   ifIpBW
27                         en-us,en;q=0.5  None                  NaN   ifIpBW
28                         en-us,en;q=0.5  None                  NaN   ifIpBW
29                         en-us,en;q=0.5  None                  NaN   ifIpBW
...                                   ...   ...                  ...      ...
3929              ja,en-US;q=0.8,en;q=0.6  None                  NaN  10Kc32m
3930  fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3    CH             Chambesy  14bmsHn
3931                                en-us    US            Rockville  14bmsHn
3932                       en-US,en;q=0.8  None                  NaN    15r91
3933                                en-us    US            Ann Arbor  1084Psg
3934                       en-US,en;q=0.5    US          Thomasville    15r91
3935                       en-US,en;q=0.8    US  Coronado Ntl Forest  186NWQK
3936                                  NaN    US      Manhattan Beach   YYv1XQ
3937                         en-GB, en-US  None                  NaN  12AyUk2
3938                         es-ES, en-US  None                  NaN  12AyUk2
3939                                  NaN    JP              Kashiwa   YPnFn4
3940                                en-US  None                  NaN    15r91
3941                                en-us    US           Marshfield   YmtpnZ
3942                       en-US,en;q=0.8    US               Vaughn  16uqtLe
3943                                en-US    US              Orlando  10WMBv9
3944                                en-US    US             Lakewood  14bmsHn
3945                       en-US,en;q=0.5    US                Boone  10X5IW8
3946                       en-US,en;q=0.8  None                  NaN   YmtpnZ
3947                       en-US,en;q=0.8    US                Logan  107xZnW
3948              ja,en-US;q=0.8,en;q=0.6    JP                Tokyo  15TFyGK
3949                       en-gb,en;q=0.5    US        Castro Valley  11C6yJk
3950                                en-us    US         Fayetteville  10ydrrV
3951                                en-us    US                Salem  107xZnW
3952                                en-us    US              Grayson  107xZnW
3953                                  NaN   NaN                  NaN      NaN
3954                                en-US    US               Mobile  10WWSaR
3955                                en-us    US           Brookfield   YmtpnZ
3956                                  NaN    JP              Tsukuba   YPnFn4
3957                                en-us  None                  NaN  17B6VoC
3958                                en-us    PR             Guaynabo  10WMBv9

       gr        h          hc           hh   kw             l  \
0      CA  10OBm3W  1365701422         j.mp  NaN     pontifier
1     NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
2      AZ  10DaxOt  1368814585    1.usa.gov  NaN     jaxstrong
3      TX   TChsoQ  1354719206    1.usa.gov  NaN  o_5004fs3lvd
4     NaN  10IGW7l  1368738258    1.usa.gov  NaN    peacecorps
5      IN  13GrCeP  1368130510    1.usa.gov  NaN         bitly
6      WI   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
7     NaN  15PUeH0  1368714329  go.nasa.gov  NaN   nasatwitter
8      02  10OBm3W  1365701422         j.mp  NaN     pontifier
9     NaN  109LtDb  1368821840  go.nasa.gov  NaN   nasatwitter
10     OH  109ar5E  1368803813    1.usa.gov  NaN    usairforce
11     MD  107xZnW  1368815450    1.usa.gov  NaN         bitly
12     KY  19AcekR  1368738410    1.usa.gov  NaN    peacecorps
13     OR  16mY627  1368743779    1.usa.gov  NaN       pbierce
14     IL   YRyW8K  1368475960    1.usa.gov  NaN         bitly
15     TX  18NUoNR  1368727073    1.usa.gov  NaN  o_1fs5ea3lim
16     WI   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
17     CO   ZPictq  1366901428    1.usa.gov  NaN   o_d63rn9enb
18     OR  19oVtZN  1368558078    1.usa.gov  NaN     raylahood
19     MD  16Ewvc4  1368818301    1.usa.gov  NaN       rebroth
20     WA  12yP2Cw  1368741846    1.usa.gov  NaN  o_6vo5h05abv
21    NaN  109ar5E  1368803813    1.usa.gov  NaN    usairforce
22     WI   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
23    NaN  107xZnW  1368815450    1.usa.gov  NaN         bitly
24    NaN   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
25    NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
26    NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
27    NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
28    NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
29    NaN   ifIpBW  1302189369    1.usa.gov  NaN         bitly
...   ...      ...         ...          ...  ...           ...
3929  NaN  10Kc32l  1368809020  go.nasa.gov  NaN   nasatwitter
3930   07  14bmsHn  1368223558    1.usa.gov  NaN         bitly
3931   MD  14bmsHn  1368223558    1.usa.gov  NaN         bitly
3932  NaN  10OBm3W  1365701422         j.mp  NaN     pontifier
3933   MI  1084Psg  1368755511         j.mp  NaN         bitly
3934   NC  10OBm3W  1365701422         j.mp  NaN     pontifier
3935   AZ  186NWQK  1368828881    1.usa.gov  NaN         bitly
3936   CA   YYv1XQ  1368710703    1.usa.gov  NaN         bitly
3937  NaN  12AyUk1  1368808362  go.nasa.gov  NaN   nasatwitter
3938  NaN  12AyUk1  1368808362  go.nasa.gov  NaN   nasatwitter
3939   04   YPnFn3  1368833354    1.usa.gov  NaN        hayano
3940  NaN  10OBm3W  1365701422         j.mp  NaN     pontifier
3941   WI   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
3942   WA  16uqtLd  1368455001    1.usa.gov  NaN  o_33avl0ri1b
3943   FL  10WMBv9  1368826755    1.usa.gov  NaN         bitly
3944   OH  14bmsHn  1368223558    1.usa.gov  NaN         bitly
3945   NC  10X5IW7  1368834590    1.usa.gov  NaN          inws
3946  NaN   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
3947   UT  107xZnW  1368815450    1.usa.gov  NaN         bitly
3948   40  15TFyGJ  1368810402  go.nasa.gov  NaN   nasatwitter
3949   CA  19oVtZN  1368558078    1.usa.gov  NaN     raylahood
3950   GA  10ydrrU  1368806410    1.usa.gov  NaN   fsanewmedia
3951   VA  107xZnW  1368815450    1.usa.gov  NaN         bitly
3952   KY  107xZnW  1368815450    1.usa.gov  NaN         bitly
3953  NaN      NaN         NaN          NaN  NaN           NaN
3954   AL  10WWSaQ  1368830835    1.usa.gov  NaN          inws
3955   WI   YmtpnZ  1363711958    1.usa.gov  NaN         bitly
3956   14   YPnFn3  1368833354    1.usa.gov  NaN        hayano
3957  NaN  16n91ZK  1368746683  go.nasa.gov  NaN   nasatwitter
3958   00  10WMBv9  1368826755    1.usa.gov  NaN         bitly

                            ll  nk  \
0     [33.816101, -117.979401]   0
1                          NaN   0
2       [31.5273, -110.360703]   1
3        [29.7633, -95.363297]   1
4                          NaN   0
5        [41.612301, -86.1381]   0
6         [45.007, -92.459099]   1
7                          NaN   0
8       [-33.8615, 151.205505]   0
9                          NaN   0
10       [39.515099, -84.3983]   1
11     [39.131699, -77.288002]   0
12     [37.766602, -84.303101]   1
13    [45.529499, -122.643204]   1
14     [41.760601, -88.320099]   0
15       [29.7633, -95.363297]   0
16     [42.877602, -88.133797]   0
17    [39.802799, -105.087502]   1
18    [44.074402, -121.257401]   1
19     [39.135799, -76.872002]   0
20      [47.606201, -122.3321]   1
21                         NaN   0
22     [44.590698, -91.891197]   0
23                         NaN   0
24                         NaN   0
25                         NaN   0
26                         NaN   0
27                         NaN   0
28                         NaN   0
29                         NaN   0
...                        ...  ..
3929                       NaN   1
3930       [46.242401, 6.1435]   0
3931   [39.089199, -77.183502]   1
3932                       NaN   0
3933   [42.216702, -83.740601]   1
3934   [35.882599, -80.082001]   0
3935    [31.9582, -110.693001]   0
3936  [33.889301, -118.401001]   0
3937                       NaN   1
3938                       NaN   1
3939   [35.854401, 139.968903]   0
3940                       NaN   0
3941     [44.6688, -90.171799]   0
3942  [47.314499, -122.778503]   0
3943       [28.3899, -81.4366]   0
3944   [41.481701, -81.802399]   0
3945   [36.219101, -81.656303]   0
3946                       NaN   1
3947  [41.641201, -111.896599]   0
3948   [35.685001, 139.751404]   1
3949     [37.709, -122.088501]   0
3950   [33.481098, -84.479797]   0
3951     [37.2906, -80.101402]   0
3952   [38.336399, -82.992401]   0
3953                       NaN NaN
3954     [30.657499, -88.1586]   1
3955     [43.060799, -88.1558]   0
3956   [36.083302, 140.116699]   0
3957                       NaN   0
3958     [18.3876, -66.110802]   1

                                                      r           t  \
0                                                direct  1368832205
1                                   http://www.usa.gov/  1368832207
2     http://www.facebook.com/l.php?u=http%3A%2F%2F1...  1368832209
3     http://m.facebook.com/l.php?u=http%3A%2F%2F1.u...  1368832209
4                                http://t.co/CDO9hLTtNT  1368832208
5                                                direct  1368832209
6         http://www.bwsd.k12.wi.us/SitePages/Home.aspx  1368832210
7                                http://t.co/YIsVhFDLj2  1368832211
8                                                direct  1368832213
9                                http://t.co/yPSKO2t5v1  1368832215
10                               https://m.facebook.com  1368832215
11                               http://t.co/u8qVCKx8RK  1368832218
12    http://www.facebook.com/l.php?u=http%3A%2F%2F1...  1368832219
13                               http://t.co/T8EyBbUBJ8  1368832219
14    http://www.z2systems.com/np/clients/kca/news.j...  1368832219
15                               http://t.co/s307mx2qGk  1368832220
16                         http://www.cudahy.k12.wi.us/  1368832220
17                                               direct  1368832220
18                                               direct  1368832222
19                               http://t.co/Dv6Jqbwu8H  1368832222
20                               http://t.co/7K9urpYyc6  1368832224
21                            https://www.facebook.com/  1368832225
22                           http://www.alma.k12.wi.us/  1368832224
23    http://m.facebook.com/l.php?u=http%3A%2F%2F1.u...  1368832224
24                         http://www.wabeno.k12.wi.us/  1368832226
25                          http://addthis.com/hemmings  1368832226
26                          http://addthis.com/hemmings  1368832226
27                          http://addthis.com/hemmings  1368832227
28                          http://addthis.com/hemmings  1368832227
29                          http://addthis.com/hemmings  1368832227
...                                                 ...         ...
3929                             http://t.co/HgiLLFRDtE  1368835778
3930                                             direct  1368835778
3931                                             direct  1368835778
3932                                             direct  1368835780
3933                             http://t.co/orOTdRX5aF  1368835780
3934                                             direct  1368835782
3935                          https://www.facebook.com/  1368835783
3936                                             direct  1368835785
3937                                             direct  1368835786
3938                             http://t.co/q6402O6lFC  1368835786
3939                                             direct  1368835787
3940                                             direct  1368835789
3941                        http://www.colby.k12.wi.us/  1368835791
3942                                 http://fwp.mt.gov/  1368835793
3943  http://www.elnuevodia.com/brillanteexplosionen...  1368835794
3944                                             direct  1368835794
3945  http://www.facebook.com/l.php?u=http%3A%2F%2F1...  1368835793
3946                       http://www.cudahy.k12.wi.us/  1368835795
3947                          https://www.facebook.com/  1368835795
3948                                             direct  1368835795
3949                 http://www.cahighspeedrail.ca.gov/  1368835797
3950                             http://t.co/psBn8njvIB  1368835797
3951  http://www.facebook.com/l.php?u=http%3A%2F%2F1...  1368835798
3952  http://www.facebook.com/l.php?u=http%3A%2F%2F1...  1368835801
3953                                                NaN         NaN
3954  http://m.facebook.com/l.php?u=http%3A%2F%2F1.u...  1368835802
3955                       http://www.cudahy.k12.wi.us/  1368835803
3956                                             direct  1368835804
3957                             http://t.co/XLS75r3BCB  1368835805
3958  http://www.elnuevodia.com/brillanteexplosionen...  1368835806

                        tz                                                  u
0      America/Los_Angeles                                http://www.nsa.gov/
1                           http://answers.usa.gov/system/selfservice.cont...
2          America/Phoenix  http://www.saj.usace.army.mil/Media/NewsReleas...
3          America/Chicago            https://nationalregistry.fmcsa.dot.gov/
4                           http://www.peacecorps.gov/learn/howvol/ab530gr...
5     America/Indianapolis  https://petitions.whitehouse.gov/petition/repe...
6          America/Chicago  http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
7                           http://www.nasa.gov/multimedia/imagegallery/im...
8            Australia/NSW                                http://www.nsa.gov/
9                           http://www.nasa.gov/mission_pages/sunearth/new...
10        America/New_York  http://www.dodlive.mil/index.php/2013/05/the-2...
11        America/New_York  http://doggett.house.gov/index.php/news/571-do...
12        America/New_York  http://www.peacecorps.gov/learn/howvol/ab530gr...
13     America/Los_Angeles   http://www.fws.gov/cno/press/release.cfm?rid=493
14         America/Chicago  http://www.cancer.gov/PublishedContent/Images/...
15         America/Chicago                http://www.army.mil/article/103380/
16         America/Chicago  http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
17          America/Denver  http://www.nws.noaa.gov/com/weatherreadynation...
18     America/Los_Angeles  http://fastlane.dot.gov/2013/05/new-locomotive...
19        America/New_York            http://apod.nasa.gov/apod/ap130517.html
20     America/Los_Angeles  http://www.ice.gov/news/releases/1305/130516sa...
21                          http://www.dodlive.mil/index.php/2013/05/the-2...
22         America/Chicago  http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
23                          http://doggett.house.gov/index.php/news/571-do...
24                          http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
25                          http://answers.usa.gov/system/selfservice.cont...
26                          http://answers.usa.gov/system/selfservice.cont...
27                          http://answers.usa.gov/system/selfservice.cont...
28                          http://answers.usa.gov/system/selfservice.cont...
29                          http://answers.usa.gov/system/selfservice.cont...
...                    ...                                                ...
3929                        http://www.nasa.gov/mission_pages/station/expe...
3930         Europe/Zurich  http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3931      America/New_York  http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3932                                                      http://www.nsa.gov/
3933      America/New_York  http://science.nasa.gov/science-news/science-a...
3934      America/New_York                                http://www.nsa.gov/
3935       America/Phoenix  http://cms3.tucsonaz.gov/files/police/media-re...
3936   America/Los_Angeles  http://www.irs.gov/uac/Newsroom/Tax-Relief-for...
3937                        http://www.jpl.nasa.gov/news/news.php?release=...
3938                        http://www.jpl.nasa.gov/news/news.php?release=...
3939            Asia/Tokyo  http://www.doe.gov/articles/energy-department-...
3940                                                      http://www.nsa.gov/
3941       America/Chicago  http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3942   America/Los_Angeles  http://fwp.mt.gov/hunting/hunterAccess/openFie...
3943      America/New_York  http://science.nasa.gov/media/medialibrary/201...
3944      America/New_York  http://gsaauctions.gov/gsaauctions/aucdsclnk?s...
3945      America/New_York  http://inws.wrh.noaa.gov/weather/alertinfo/103...
3946                        http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3947        America/Denver  http://doggett.house.gov/index.php/news/571-do...
3948            Asia/Tokyo  http://www.nasa.gov/mission_pages/mer/news/mer...
3949   America/Los_Angeles  http://fastlane.dot.gov/2013/05/new-locomotive...
3950      America/New_York  http://studentaid.ed.gov/repay-loans/understan...
3951      America/New_York  http://doggett.house.gov/index.php/news/571-do...
3952      America/New_York  http://doggett.house.gov/index.php/news/571-do...
3953                   NaN                                                NaN
3954       America/Chicago  http://inws.wrh.noaa.gov/weather/alertinfo/103...
3955       America/Chicago  http://pld.dpi.wi.gov/files/pld/images/LinkWI.png
3956            Asia/Tokyo  http://www.doe.gov/articles/energy-department-...
3957                        http://www.jpl.nasa.gov/news/news.php?release=...
3958   America/Puerto_Rico  http://science.nasa.gov/media/medialibrary/201...

[3959 rows x 18 columns]

In [48]: frame['tz'][:10]
Out[48]:
0     America/Los_Angeles
1
2         America/Phoenix
3         America/Chicago
4
5    America/Indianapolis
6         America/Chicago
7
8           Australia/NSW
9
Name: tz, dtype: object

In [49]: tz_counts = frame['tz'].value_counts()

In [50]: tz_counts[:10]
Out[50]:
America/New_York        903
America/Chicago         686
                        636
America/Los_Angeles     421
America/Puerto_Rico     184
Asia/Tokyo              102
America/Denver           89
Europe/London            85
America/Indianapolis     50
America/Phoenix          40
dtype: int64

In [51]: clean_tz = frame['tz'].fillna("Missing")

In [52]: clean_tz[clean_tz == ""] = "Unknown"

In [53]: tz_counts = clean_tz.value_counts()

In [54]: tz_counts[:10]
Out[54]:
America/New_York        903
America/Chicago         686
Unknown                 636
America/Los_Angeles     421
America/Puerto_Rico     184
Missing                 120
Asia/Tokyo              102
America/Denver           89
Europe/London            85
America/Indianapolis     50
dtype: int64
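The two placeholder labels cover different cases: `fillna("Missing")` catches records that had no `tz` key at all (NaN after `DataFrame(records)`), while `clean_tz == ""` catches records whose `tz` was present but empty. A small sketch on made-up records, assuming pandas is installed, using `Series.replace` as a non-mutating alternative to the boolean-mask assignment:

```python
import pandas as pd

# Made-up records: a real tz, an empty tz, one with no tz key, and a repeat
records = [
    {"tz": "America/New_York"},
    {"tz": ""},
    {"a": "no tz field here"},
    {"tz": "America/New_York"},
]
frame = pd.DataFrame(records)

# NaN (key absent) -> "Missing"; "" (key present but blank) -> "Unknown"
clean_tz = frame["tz"].fillna("Missing").replace("", "Unknown")
tz_counts = clean_tz.value_counts()
print(tz_counts.to_dict())
```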
In [55]: tz_counts[:10].plot(kind='barh',rot=0)
Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0x10d503810>

http://gyazo.com/aeb47e0bfc1155ea5f68569fbaa861ad

In [56]: frame["a"]
Out[56]:
0     Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...
1     Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
2     Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...
3     Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...
4     Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...
5     Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...
6     Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
7     Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...
8     Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
9     Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
10    Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
11    Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...
12    Mozilla/5.0 (iPad; CPU OS 6_1_2 like Mac OS X)...
13    Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20...
14    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...
...
3944    Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; D...
3945    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...
3946    Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
3947    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3948    Mozilla/5.0 (Linux; Android 4.0.4; SO-03D Buil...
3949    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) G...
3950    Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3951    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
3952    Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X)...
3953                                                  NaN
3954    Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; M...
3955    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)...
3956                                   ShortLinkTranslate
3957    Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like ...
3958    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
Name: a, Length: 3959, dtype: object

In [57]: frame["a"][30]
Out[57]: u'Mozilla/5.0 (iPod; CPU iPhone OS 6_1_2 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B146 Safari/8536.25'

In [58]: results = Series([x.split()[0] for x in frame.a.dropna()])

In [59]: results[:5]
Out[59]:
0    Mozilla/5.0
1    Mozilla/4.0
2    Mozilla/5.0
3    Mozilla/5.0
4     Opera/9.80
dtype: object



In [61]: results.value_counts()[:8]
Out[61]:
Mozilla/5.0           3251
Mozilla/4.0            322
CakePHP                 38
ShortLinkTranslate      36
TVersity                30
Opera/9.80              28
Dalvik/1.6.0            19
Xenu                    15
dtype: int64

In [68]: frame
...
[3959 rows x 18 columns]

In [69]: cframe = frame[frame.a.notnull()]

In [70]: cframe
Out[70]:

...

[3839 rows x 18 columns]

In [71]: operating_system = np.where(cframe["a"].str.contains("Windows"), "Windows", "Not Windows")

In [72]: operating_system[:5]
Out[72]:
array(['Not Windows', 'Windows', 'Windows', 'Not Windows', 'Not Windows'],
      dtype='|S11')
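`np.where(cond, a, b)` picks `a` where the condition is true and `b` everywhere else. Every agent containing `"Windows"` therefore becomes `"Windows"`, and all others become `"Not Windows"`. The classification runs on `cframe`, the rows with a non-null `a`, so each value is a real string. `dtype='|S11'` is NumPy's fixed-width byte-string type, sized to the longest label (`"Not Windows"` is 11 characters).

A plain-Python equivalent follows. The helper name and sample strings are hypothetical.

```python
def classify_os(user_agents):
    """Label each agent "Windows" if it contains "Windows", else "Not Windows"."""
    return ["Windows" if "Windows" in ua else "Not Windows" for ua in user_agents]


# Hypothetical sample strings
agents = [
    "Mozilla/5.0 (Linux; Android 4.1.2)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/5.0 (iPhone)",
]
print(classify_os(agents))  # ['Not Windows', 'Windows', 'Not Windows']
```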

In [73]: by_tz_os = cframe.groupby(["tz", operating_system])

In [74]: by_tz_os.size()
Out[74]:
tz
                                Not Windows    484
                                Windows        152
Africa/Cairo                    Windows          3
Africa/Casablanca               Windows          1
Africa/Ceuta                    Not Windows      4
                                Windows          2
Africa/Gaborone                 Windows          1
Africa/Johannesburg             Not Windows      2
America/Anchorage               Not Windows      5
                                Windows          3
America/Argentina/Buenos_Aires  Not Windows      4
                                Windows          7
America/Argentina/Catamarca     Not Windows      1
America/Argentina/Cordoba       Windows          2
America/Asuncion                Windows          1
...
Europe/Sofia       Not Windows    1
Europe/Stockholm   Not Windows    2
                   Windows        2
Europe/Tallinn     Not Windows    1
Europe/Vienna      Not Windows    3
                   Windows        3
Europe/Warsaw      Not Windows    1
                   Windows        1
Europe/Zaporozhye  Windows        1
Europe/Zurich      Not Windows    4
                   Windows        1
Pacific/Auckland   Not Windows    1
                   Windows        8
Pacific/Honolulu   Not Windows    7
                   Windows        5
Length: 170, dtype: int64
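Grouping by two keys and calling `size()` counts the rows for each (tz, OS) combination that actually occurs. Combinations with no rows are simply absent, which is why `Africa/Cairo` has only a `Windows` row.

The counting itself can be sketched with `Counter` over paired keys. The helper name and sample data are hypothetical.

```python
from collections import Counter


def group_sizes(tzs, oses):
    """Count rows per (tz, os) pair, like groupby([...]).size(); only observed pairs appear."""
    return Counter(zip(tzs, oses))


# Hypothetical sample columns of equal length
tzs = ["Africa/Cairo", "Africa/Ceuta", "Africa/Ceuta", "Africa/Cairo", ""]
oses = ["Windows", "Not Windows", "Windows", "Windows", "Not Windows"]
sizes = group_sizes(tzs, oses)
print(sorted(sizes.items()))
```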

In [75]: agg_counts = by_tz_os.size().unstack().fillna(0)

In [76]: agg_counts[:10]
Out[76]:
                                Not Windows  Windows
tz
                                        484      152
Africa/Cairo                              0        3
Africa/Casablanca                         0        1
Africa/Ceuta                              4        2
Africa/Gaborone                           0        1
Africa/Johannesburg                       2        0
America/Anchorage                         5        3
America/Argentina/Buenos_Aires            4        7
America/Argentina/Catamarca               1        0
America/Argentina/Cordoba                 0        2
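`unstack()` moves the inner index level (the OS label) into columns. Combinations that never occurred become NaN, and `fillna(0)` turns them into zero counts. For example, `Africa/Cairo` ends up with 0 Not Windows and 3 Windows.

Below is a hypothetical plain-Python sketch of that reshaping, using counts from Out[74].

```python
def unstack_counts(pair_counts, columns=("Not Windows", "Windows")):
    """Turn {(tz, os): n} into {tz: [n for each column]}; absent pairs stay 0 (like fillna(0))."""
    table = {}
    for (tz, os_name), n in pair_counts.items():
        row = table.setdefault(tz, [0] * len(columns))
        row[columns.index(os_name)] = n
    return table


# Counts copied from Out[74]
pair_counts = {
    ("Africa/Cairo", "Windows"): 3,
    ("Africa/Ceuta", "Not Windows"): 4,
    ("Africa/Ceuta", "Windows"): 2,
}
print(unstack_counts(pair_counts))
```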


In [77]: indexer = agg_counts.sum(1).argsort()

In [78]: indexer[:10]
Out[78]:
tz
                                   55
Africa/Cairo                      101
Africa/Casablanca                 100
Africa/Ceuta                       36
Africa/Gaborone                    97
Africa/Johannesburg                42
America/Anchorage                  43
America/Argentina/Buenos_Aires     44
America/Argentina/Catamarca        47
America/Argentina/Cordoba          50
dtype: int64


In [79]: count_subset = agg_counts.take(indexer)[-10:]

In [80]: count_subset
Out[80]:
                      Not Windows  Windows
tz
America/Phoenix                22       18
America/Indianapolis           29       21
Europe/London                  62       23
America/Denver                 41       48
Asia/Tokyo                     88       14
America/Puerto_Rico            93       91
America/Los_Angeles           207      214
                              484      152
America/Chicago               343      343
America/New_York              550      353
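The `indexer` in Out[78] is easy to misread. Its labels are the time zones, but its values are positions. `argsort` lists, in order, the row positions that would sort the per-zone totals (`agg_counts.sum(1)`) ascending. So the first value, 55, means row 55 has the smallest total; it does not mean the blank zone ranks 55th.

`take(indexer)` then reorders the rows from smallest to largest total, and `[-10:]` keeps the ten largest. That is why `count_subset` lists `America/New_York` last.

Below is a plain-Python sketch with hypothetical helper names. The totals are row sums from Out[80], plus `Africa/Cairo` from Out[76].

```python
def argsort(values):
    """Positions that would sort values ascending (the idea behind argsort)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def top_n_ascending(labels, totals, n):
    """Reorder labels by total ascending, then keep the last n, like take(indexer)[-n:]."""
    order = argsort(totals)
    return [labels[i] for i in order][-n:]


labels = ["", "Africa/Cairo", "America/Chicago", "America/New_York", "Asia/Tokyo"]
totals = [636, 3, 686, 903, 102]  # Not Windows + Windows per zone
print(argsort(totals))
print(top_n_ascending(labels, totals, 3))
```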

In [81]: count_subset.plot(kind="barh", stacked=True)
Out[81]: <matplotlib.axes._subplots.AxesSubplot at 0x10d5095d0>

http://gyazo.com/6aa7c587338d9ca695967a05b5d4baaa

In [82]: normed_subset = count_subset.div(count_subset.sum(1), axis=0)

In [83]: normed_subset.plot(kind="barh", stacked=True)
Out[83]: <matplotlib.axes._subplots.AxesSubplot at 0x112774050>

http://gyazo.com/3035df33d844332cde38c8505d05174d

This differs from the book: in this data, Not Windows is more common.
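`count_subset.sum(1)` gives each zone's total. `div(..., axis=0)` then divides every row by its own total, so each stacked bar has length 1 and shows the Windows/Not Windows split as proportions rather than raw counts.

Below is a plain-Python sketch using two rows from Out[80]. The helper name is hypothetical.

```python
def normalize_rows(table):
    """Divide each row by its own total so the shares sum to 1 (like df.div(df.sum(1), axis=0)).

    Assumes every row total is nonzero; float() keeps the division correct on Python 2 too.
    """
    return {label: [x / float(sum(row)) for x in row] for label, row in table.items()}


# [Not Windows, Windows] counts copied from Out[80]
subset = {"America/New_York": [550, 353], "Asia/Tokyo": [88, 14]}
shares = normalize_rows(subset)
print({k: [round(x, 3) for x in v] for k, v in shares.items()})
```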
