I compared the six Core ML models (`.mlmodel`) that Apple distributes on the page below.
Machine Learning - Apple Developer
Note that the Inputs and Outputs entries below are written in the format "`name` / type - description". (I looked these up by dropping each `.mlmodel` file into an Xcode project.)
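The same fields can probably be read without Xcode as well. The following is only a minimal sketch, not the method used for this article: it assumes the third-party `coremltools` package is installed, and `describe_model`, `format_feature`, and the file path passed to them are hypothetical names introduced here for illustration. Note that the type names it prints are the raw protobuf field names (for example `imageType`), not the friendlier labels Xcode shows.

```python
from types import SimpleNamespace


def format_feature(feature, type_name):
    """Render one feature as "name / type - description"."""
    return f"{feature.name} / {type_name} - {feature.shortDescription}"


def describe_model(path):
    """List a model's inputs and outputs (assumes coremltools is installed)."""
    import coremltools as ct  # third-party; imported lazily on purpose

    spec = ct.utils.load_spec(path)
    lines = []
    groups = (("Inputs", spec.description.input),
              ("Outputs", spec.description.output))
    for title, features in groups:
        lines.append(f"- {title}")
        for feature in features:
            # "Type" is the oneof that holds the concrete feature type.
            type_name = feature.type.WhichOneof("Type")
            lines.append("  - " + format_feature(feature, type_name))
    return lines


if __name__ == "__main__":
    # Stand-in object so the formatter can be tried without a model file.
    demo = SimpleNamespace(name="image",
                           shortDescription="Input image to be classified")
    print(format_feature(demo, "imageType"))
```

With a downloaded model, `describe_model("MobileNet.mlmodel")` would be the intended call, where the path is an assumption about where the file was saved.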
MobileNet
MobileNets are based on a streamlined architecture that have depth-wise separable convolutions to build lightweight, deep neural networks.
Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.
- Size: 17.1 MB
- License: Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0
- Inputs
  - `image` / Image (Color 224 x 224) - Input image to be classified
- Outputs
  - `classLabelProbs` / Dictionary (String → Double) - Probability of each category
  - `classLabel` / String - Most likely image category
SqueezeNet
Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.
With an overall footprint of only 5 MB, SqueezeNet has a similar level of accuracy as AlexNet but with 50 times fewer parameters.
- Size: 5 MB
- License: BSD License. More information available at https://github.com/DeepScale/SqueezeNet/blob/master/LICENSE
- Inputs
  - `image` / Image (Color 227 x 227) - Input image to be classified
- Outputs
  - `classLabelProbs` / Dictionary (String → Double) - Probability of each category
  - `classLabel` / String - Most likely image category
Places205-GoogLeNet
Detects the scene of an image from 205 categories such as an airport terminal, bedroom, forest, coast, and more.
- Size: 24.8 MB
- License: Creative Common License. More information available at http://places.csail.mit.edu
- Inputs
  - `sceneImage` / Image (Color 224 x 224) - Input image of scene to be classified
- Outputs
  - `sceneLabelProbs` / Dictionary (String → Double) - Probability of each scene
  - `sceneLabel` / String - Most likely scene label
ResNet50
Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.
The top-5 error from the original publication is 7.8%.
- Size: 102.6 MB
- License: MIT License. More information available at https://github.com/fchollet/keras/blob/master/LICENSE
- Inputs
  - `image` / Image (Color 224 x 224) - Input image to be classified
- Outputs
  - `classLabelProbs` / Dictionary (String → Double) - Probability of each category
  - `classLabel` / String - Most likely image category
Inception v3
Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.
The top-5 error from the original publication is 5.6%.
- Size: 94.7 MB
- License: MIT License. More information available at https://github.com/fchollet/keras/blob/master/LICENSE
- Inputs
  - `image` / Image (Color 299 x 299) - Input image to be classified
- Outputs
  - `classLabelProbs` / Dictionary (String → Double) - Probability of each category
  - `classLabel` / String - Most likely image category
VGG16
Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.
The top-5 error from the original publication is 7.4%.
- Size: 553.5 MB
- License: Creative Commons Attribution 4.0 International (CC BY 4.0). More information available at https://creativecommons.org/licenses/by/4.0/
- Inputs
  - `image` / Image (Color 224 x 224) - Input image to be classified
- Outputs
  - `classLabelProbs` / Dictionary (String → Double) - Probability of each category
  - `classLabel` / String - Most likely image category
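Five of the six are ImageNet 1000-class classifiers
I had vaguely assumed that "a wide variety of models is being distributed", but looking at them carefully, no fewer than five of the six turn out to be models that classify the 1000 ImageNet classes of general objects (trees, animals, food, vehicles, people, and so on).
These five differ in model size, input image size, and license, so here is a summary table.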
Model | Model size | Input image size | License |
---|---|---|---|
MobileNet | 17.1 MB | 224 x 224 | Apache 2.0 |
SqueezeNet | 5 MB | 227 x 227 | BSD |
ResNet50 | 102.6 MB | 224 x 224 | MIT |
Inception v3 | 94.7 MB | 299 x 299 | MIT |
VGG16 | 553.5 MB | 224 x 224 | CC BY 4.0 |
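When choosing among these five for an app, file size is often the deciding factor. As a rough sketch, the sizes and input sizes from the table can be put into a small data structure and queried. The names `MODELS`, `by_size`, `size_ratio`, and `fits_budget` are made up for this illustration, and the 20 MB budget in the example is an arbitrary assumption.

```python
# Values copied from the comparison table: (size in MB, input side in pixels).
MODELS = {
    "MobileNet": (17.1, 224),
    "SqueezeNet": (5.0, 227),
    "ResNet50": (102.6, 224),
    "Inception v3": (94.7, 299),
    "VGG16": (553.5, 224),
}


def by_size(models):
    """Return model names ordered from smallest to largest file size."""
    return sorted(models, key=lambda name: models[name][0])


def size_ratio(models, a, b):
    """How many times larger model `a` is than model `b`, to one decimal."""
    return round(models[a][0] / models[b][0], 1)


def fits_budget(models, max_mb):
    """Names of the models whose file size is at most `max_mb`, smallest first."""
    return [name for name in by_size(models) if models[name][0] <= max_mb]


if __name__ == "__main__":
    for name in by_size(MODELS):
        size_mb, side = MODELS[name]
        print(f"{name}: {size_mb} MB, {side} x {side}")
    # Hypothetical 20 MB budget for the bundled model.
    print(fits_budget(MODELS, 20))
```

Sorting this way makes the spread obvious: the largest model (VGG16) is over a hundred times the size of the smallest (SqueezeNet), even though all five solve the same 1000-class task.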
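Places205-GoogLeNet, a scene classifier
The one model that stands apart from the rest is Places205-GoogLeNet (model file name `GoogLeNetPlaces`). It classifies the input image into one of 205 "scenes", such as an airport, a train station, or a forest.
The label file is available at the URL below.
To make it easy to see at a glance which scenes can be recognized, here is the full list.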
abbey
airport_terminal
alley
amphitheater
amusement_park
aquarium
aqueduct
arch
art_gallery
art_studio
assembly_line
attic
auditorium
apartment_building/outdoor
badlands
ballroom
bamboo_forest
banquet_hall
bar
baseball_field
basement
basilica
bayou
beauty_salon
bedroom
boardwalk
boat_deck
bookstore
botanical_garden
bowling_alley
boxing_ring
bridge
building_facade
bus_interior
butchers_shop
butte
bakery/shop
cafeteria
campsite
candy_store
canyon
castle
cemetery
chalet
classroom
closet
clothing_store
coast
cockpit
coffee_shop
conference_center
conference_room
construction_site
corn_field
corridor
cottage_garden
courthouse
courtyard
creek
crevasse
crosswalk
cathedral/outdoor
church/outdoor
dam
dining_room
dock
dorm_room
driveway
desert/sand
desert/vegetation
dinette/home
doorway/outdoor
engine_room
excavation
fairway
fire_escape
fire_station
food_court
forest_path
forest_road
formal_garden
fountain
field/cultivated
field/wild
galley
game_room
garbage_dump
gas_station
gift_shop
golf_course
harbor
herb_garden
highway
home_office
hospital
hospital_room
hot_spring
hotel_room
hotel/outdoor
ice_cream_parlor
iceberg
igloo
islet
ice_skating_rink/outdoor
inn/outdoor
jail_cell
kasbah
kindergarden_classroom
kitchen
kitchenette
laundromat
lighthouse
living_room
lobby
locker_room
mansion
marsh
martial_arts_gym
mausoleum
medina
motel
mountain
mountain_snowy
music_studio
market/outdoor
monastery/outdoor
museum/indoor
nursery
ocean
office
office_building
orchard
pagoda
palace
pantry
parking_lot
parlor
pasture
patio
pavilion
phone_booth
picnic_area
playground
plaza
pond
pulpit
racecourse
raft
railroad_track
rainforest
reception
residential_neighborhood
restaurant
restaurant_kitchen
restaurant_patio
rice_paddy
river
rock_arch
rope_bridge
ruin
runway
sandbar
schoolhouse
sea_cliff
shed
shoe_shop
shopfront
shower
ski_resort
ski_slope
sky
skyscraper
slum
snowfield
staircase
supermarket
swamp
stadium/baseball
stadium/football
stage/indoor
subway_station/platform
swimming_pool/outdoor
television_studio
topiary_garden
tower
train_railway
tree_farm
trench
temple/east_asia
temple/south_asia
track/outdoor
train_station/platform
underwater/coral_reef
valley
vegetable_garden
veranda
viaduct
volcano
waiting_room
water_tower
watering_hole
wheat_field
wind_farm
windmill
yard
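Some labels in the list above contain a slash, such as `desert/sand` or `stadium/baseball`. Reading the part before the slash as a base category and the part after it as a qualifier is my own interpretation, not something documented by the model. Under that assumption, here is a minimal sketch for turning the values that `sceneLabel` returns into something displayable. The helper names and the shortened `SAMPLE_LABELS` list are hypothetical.

```python
from collections import defaultdict

# A few labels copied from the list above, just for the demonstration.
SAMPLE_LABELS = [
    "airport_terminal",
    "desert/sand",
    "desert/vegetation",
    "stadium/baseball",
    "stadium/football",
    "train_station/platform",
]


def split_label(label):
    """Split "base/qualifier" into its parts; qualifier is None if absent."""
    base, _, qualifier = label.partition("/")
    return base, qualifier or None


def group_by_base(labels):
    """Map each base category to the list of qualifiers seen for it."""
    groups = defaultdict(list)
    for label in labels:
        base, qualifier = split_label(label)
        groups[base].append(qualifier)
    return dict(groups)


def display_name(label):
    """Turn a raw label into readable text, e.g. for showing in a UI."""
    base, qualifier = split_label(label)
    text = base.replace("_", " ")
    if qualifier:
        return f"{text} ({qualifier.replace('_', ' ')})"
    return text


if __name__ == "__main__":
    for label in SAMPLE_LABELS:
        print(label, "->", display_name(label))
```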
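Other models
What I compared here is strictly the officially distributed models. Nearly a year has passed since Core ML was announced, and many third-party models have been published as well. Next time I would like to dig up a range of those and compare them.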