iOS
Image Processing
MachineLearning
DeepLearning
Swift

Comparing the 6 Officially Distributed Core ML Models

I compared the six Core ML models (.mlmodel) that Apple distributes on the page below.

Machine Learning - Apple Developer

The Inputs and Outputs entries are written in the format "name / type - description". (I found these by dropping each .mlmodel file into an Xcode project and inspecting it.)
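The same names and types can also be read at runtime instead of from Xcode's model viewer. The following is a minimal sketch, assuming the model has already been compiled (for example with `MLModel.compileModel(at:)`, or bundled by Xcode as `.mlmodelc`). `formatFeature`, `typeName(of:)` and `printFeatures(ofCompiledModelAt:)` are hypothetical helper names, not part of Core ML, and the sketch prints only the name and type of each feature.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Formats one feature in the "name / type - description" style of the listings.
/// Hypothetical helper, not part of Core ML.
func formatFeature(name: String, type: String, description: String? = nil) -> String {
    let head = "\(name) / \(type)"
    guard let description = description else { return head }
    return "\(head) - \(description)"
}

#if canImport(CoreML)
/// Returns a readable type name for a Core ML feature description.
@available(iOS 11.0, macOS 10.13, tvOS 11.0, watchOS 4.0, *)
func typeName(of feature: MLFeatureDescription) -> String {
    switch feature.type {
    case .image:
        if let constraint = feature.imageConstraint {
            return "Image (\(constraint.pixelsWide) x \(constraint.pixelsHigh))"
        }
        return "Image"
    case .dictionary: return "Dictionary"
    case .string: return "String"
    case .double: return "Double"
    case .int64: return "Int64"
    case .multiArray: return "MultiArray"
    default: return "Other"
    }
}

/// Prints the inputs and outputs of a compiled model (.mlmodelc) at `url`.
@available(iOS 11.0, macOS 10.13, tvOS 11.0, watchOS 4.0, *)
func printFeatures(ofCompiledModelAt url: URL) throws {
    let description = try MLModel(contentsOf: url).modelDescription
    let sections = [("Inputs", description.inputDescriptionsByName),
                    ("Outputs", description.outputDescriptionsByName)]
    for (title, features) in sections {
        print(title)
        // Sort by name so the output order is stable.
        for (name, feature) in features.sorted(by: { $0.key < $1.key }) {
            print("  " + formatFeature(name: name, type: typeName(of: feature)))
        }
    }
}
#endif
```

The Core ML part is wrapped in `#if canImport(CoreML)` so that the formatting helper still compiles on platforms without the framework.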

MobileNet

MobileNets are based on a streamlined architecture that have depth-wise separable convolutions to build lightweight, deep neural networks.

Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.

  • Size: 17.1 MB
  • License: Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0
  • Inputs
    • image / Image (Color 224 x 224) - Input image to be classified
  • Outputs
    • classLabelProbs / Dictionary (String → Double) - Probability of each category
    • classLabel / String - Most likely image category
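Since `classLabelProbs` is a plain String → Double dictionary, getting a top-N ranking out of it is just a sort. Here is a small sketch; `topLabels(in:count:)` is a hypothetical helper, and the sample probabilities are made-up values for illustration, not real model output.

```swift
/// Returns the `n` most probable labels, highest probability first.
/// Hypothetical helper for a `classLabelProbs`-style dictionary.
func topLabels(in classLabelProbs: [String: Double],
               count n: Int) -> [(label: String, probability: Double)] {
    return classLabelProbs
        .sorted { $0.value > $1.value }   // descending by probability
        .prefix(n)                        // keep at most n entries
        .map { (label: $0.key, probability: $0.value) }
}

// Made-up probabilities, only to show the call.
let sampleProbs: [String: Double] = [
    "tabby cat": 0.62,
    "tiger cat": 0.21,
    "lynx": 0.09,
    "Egyptian cat": 0.05,
]

for entry in topLabels(in: sampleProbs, count: 3) {
    print("\(entry.label): \(entry.probability)")
}
```

`classLabel` is simply the first entry of this ranking, so the dictionary is only needed when more than the single best guess should be shown.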

SqueezeNet

Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.

With an overall footprint of only 5 MB, SqueezeNet has a similar level of accuracy as AlexNet but with 50 times fewer parameters.

  • Size: 5 MB
  • License: BSD License. More information available at https://github.com/DeepScale/SqueezeNet/blob/master/LICENSE
  • Inputs
    • image / Image (Color 227 x 227) - Input image to be classified
  • Outputs
    • classLabelProbs / Dictionary (String → Double) - Probability of each category
    • classLabel / String - Most likely image category

Places205-GoogLeNet

Detects the scene of an image from 205 categories such as an airport terminal, bedroom, forest, coast, and more.

  • Size: 24.8 MB
  • License: Creative Common License. More information available at http://places.csail.mit.edu
  • Inputs
    • sceneImage / Image (Color 224 x 224) - Input image of scene to be classified
  • Outputs
    • sceneLabelProbs / Dictionary (String → Double) - Probability of each scene
    • sceneLabel / String - Most likely scene label
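Note that Places205-GoogLeNet names its features `sceneImage` / `sceneLabel`, whereas the ImageNet models use `image` / `classLabel`. One way to avoid depending on those names is to run the model through Vision, which returns classification observations. The following is a minimal sketch under that assumption; `Prediction`, `describe(_:)` and `classify(_:with:top:)` are hypothetical names, and the Vision part only compiles on Apple platforms.

```swift
import Foundation
#if canImport(Vision)
import CoreML
import Vision
#endif

/// One classification result (hypothetical type for this sketch).
struct Prediction {
    let label: String
    let confidence: Float
}

/// Formats a prediction as "label: 12.3%".
func describe(_ prediction: Prediction) -> String {
    let percent = String(format: "%.1f", Double(prediction.confidence) * 100)
    return "\(prediction.label): \(percent)%"
}

#if canImport(Vision)
/// Classifies `image` with any image-classifier `MLModel` and returns the top `n` results.
@available(iOS 11.0, macOS 10.13, tvOS 11.0, *)
func classify(_ image: CGImage, with mlModel: MLModel, top n: Int = 5) throws -> [Prediction] {
    let request = VNCoreMLRequest(model: try VNCoreMLModel(for: mlModel))
    // Vision scales the image to the model's input size; crop to the center square.
    request.imageCropAndScaleOption = .centerCrop
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    let observations = request.results as? [VNClassificationObservation] ?? []
    return observations
        .sorted { $0.confidence > $1.confidence }
        .prefix(n)
        .map { Prediction(label: $0.identifier, confidence: $0.confidence) }
}
#endif
```

Assuming the model file has been added to the Xcode project so that a class is generated for it, a call would look something like `try classify(cgImage, with: GoogLeNetPlaces().model)`.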

ResNet50

Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.

The top-5 error from the original publication is 7.8%.

  • Size: 102.6 MB
  • License: MIT License. More information available at https://github.com/fchollet/keras/blob/master/LICENSE
  • Inputs
    • image / Image (Color 224 x 224) - Input image to be classified
  • Outputs
    • classLabelProbs / Dictionary (String → Double) - Probability of each category
    • classLabel / String - Most likely image category

Inception v3

Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.

The top-5 error from the original publication is 5.6%.

  • Size: 94.7 MB
  • License: MIT License. More information available at https://github.com/fchollet/keras/blob/master/LICENSE
  • Inputs
    • image / Image (Color 299 x 299) - Input image to be classified
  • Outputs
    • classLabelProbs / Dictionary (String → Double) - Probability of each category
    • classLabel / String - Most likely image category

VGG16

Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more.

The top-5 error from the original publication is 7.4%.

  • Size: 553.5 MB
  • License: Creative Commons Attribution 4.0 International(CC BY 4.0). More information available at https://creativecommons.org/licenses/by/4.0/
  • Inputs
    • image / Image (Color 224 x 224) - Input image to be classified
  • Outputs
    • classLabelProbs / Dictionary (String → Double) - Probability of each category
    • classLabel / String - Most likely image category

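Five of them are ImageNet 1000-class classifiers

I had vaguely assumed that "a wide variety of models is being distributed", but looking at them properly, no fewer than five of the six turn out to be models that classify general objects (trees, animals, food, vehicles, people, etc.) into the 1000 ImageNet classes.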

https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a

The table below lists where these five differ: model size, input image size, and license.

Model         Model size  Input image size  License
MobileNet     17.1 MB     224 x 224         Apache 2.0
SqueezeNet    5 MB        227 x 227         BSD
ResNet50      102.6 MB    224 x 224         MIT
Inception v3  94.7 MB     299 x 299         MIT
VGG16         553.5 MB    224 x 224         CC BY 4.0
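To make the comparison a bit more concrete, the figures above can be put into a small data structure and filtered, for example by an app-size budget. This is only an illustrative sketch: `ModelInfo` and `models(within:from:)` are hypothetical names, and the top-5 error is left as `nil` for the two models whose descriptions give no figure.

```swift
/// One row of the comparison (hypothetical type for this sketch).
struct ModelInfo {
    let name: String
    let sizeMB: Double
    let inputSide: Int      // input images are square: inputSide x inputSide
    let top5Error: Double?  // percent; nil where the model description gives no figure
}

let imageNetModels = [
    ModelInfo(name: "MobileNet", sizeMB: 17.1, inputSide: 224, top5Error: nil),
    ModelInfo(name: "SqueezeNet", sizeMB: 5, inputSide: 227, top5Error: nil),
    ModelInfo(name: "ResNet50", sizeMB: 102.6, inputSide: 224, top5Error: 7.8),
    ModelInfo(name: "Inception v3", sizeMB: 94.7, inputSide: 299, top5Error: 5.6),
    ModelInfo(name: "VGG16", sizeMB: 553.5, inputSide: 224, top5Error: 7.4),
]

/// Returns the models that fit within `budgetMB`, smallest first.
func models(within budgetMB: Double, from catalog: [ModelInfo]) -> [ModelInfo] {
    return catalog
        .filter { $0.sizeMB <= budgetMB }
        .sorted { $0.sizeMB < $1.sizeMB }
}

for model in models(within: 100, from: imageNetModels) {
    print("\(model.name): \(model.sizeMB) MB, \(model.inputSide) x \(model.inputSide)")
}
```

With a 100 MB budget, for instance, ResNet50 and VGG16 drop out, and Inception v3 is the only remaining model with a published top-5 error figure.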

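"Places205-GoogLeNet" for scene classification

The one model that stands apart from the rest is "Places205-GoogLeNet" (model file name: GoogLeNetPlaces). It detects which of 205 "scenes", such as an airport, a station, or a forest, the input image shows.

The label file is available at the URL below.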

http://places.csail.mit.edu/IndoorOutdoor_places205.csv

So that you can see at a glance which scenes can be recognized, here is the full list.

abbey
airport_terminal
alley
amphitheater
amusement_park
aquarium
aqueduct
arch
art_gallery
art_studio
assembly_line
attic
auditorium
apartment_building/outdoor
badlands
ballroom
bamboo_forest
banquet_hall
bar
baseball_field
basement
basilica
bayou
beauty_salon
bedroom
boardwalk
boat_deck
bookstore
botanical_garden
bowling_alley
boxing_ring
bridge
building_facade
bus_interior
butchers_shop
butte
bakery/shop
cafeteria
campsite
candy_store
canyon
castle
cemetery
chalet
classroom
closet
clothing_store
coast
cockpit
coffee_shop
conference_center
conference_room
construction_site
corn_field
corridor
cottage_garden
courthouse
courtyard
creek
crevasse
crosswalk
cathedral/outdoor
church/outdoor
dam
dining_room
dock
dorm_room
driveway
desert/sand
desert/vegetation
dinette/home
doorway/outdoor
engine_room
excavation
fairway
fire_escape
fire_station
food_court
forest_path
forest_road
formal_garden
fountain
field/cultivated
field/wild
galley
game_room
garbage_dump
gas_station
gift_shop
golf_course
harbor
herb_garden
highway
home_office
hospital
hospital_room
hot_spring
hotel_room
hotel/outdoor
ice_cream_parlor
iceberg
igloo
islet
ice_skating_rink/outdoor
inn/outdoor
jail_cell
kasbah
kindergarden_classroom
kitchen
kitchenette
laundromat
lighthouse
living_room
lobby
locker_room
mansion
marsh
martial_arts_gym
mausoleum
medina
motel
mountain
mountain_snowy
music_studio
market/outdoor
monastery/outdoor
museum/indoor
nursery
ocean
office
office_building
orchard
pagoda
palace
pantry
parking_lot
parlor
pasture
patio
pavilion
phone_booth
picnic_area
playground
plaza
pond
pulpit
racecourse
raft
railroad_track
rainforest
reception
residential_neighborhood
restaurant
restaurant_kitchen
restaurant_patio
rice_paddy
river
rock_arch
rope_bridge
ruin
runway
sandbar
schoolhouse
sea_cliff
shed
shoe_shop
shopfront
shower
ski_resort
ski_slope
sky
skyscraper
slum
snowfield
staircase
supermarket
swamp
stadium/baseball
stadium/football
stage/indoor
subway_station/platform
swimming_pool/outdoor
television_studio
topiary_garden
tower
train_railway
tree_farm
trench
temple/east_asia
temple/south_asia
track/outdoor
train_station/platform
underwater/coral_reef
valley
vegetable_garden
veranda
viaduct
volcano
waiting_room
water_tower
watering_hole
wheat_field
wind_farm
windmill
yard
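These labels use underscores and a "/" qualifier (for example `swimming_pool/outdoor`), so they usually need a little cleanup before being shown in a UI. Here is one possible sketch; `displayName(forSceneLabel:)` is a hypothetical helper, and the "(qualifier)" output style is just my own choice.

```swift
/// Turns a Places205 label such as "swimming_pool/outdoor"
/// into a display string such as "swimming pool (outdoor)".
func displayName(forSceneLabel label: String) -> String {
    // Split off qualifiers after "/" and replace underscores with spaces.
    let parts: [String] = label.split(separator: "/").map { part in
        String(part.map { (c: Character) -> Character in c == "_" ? " " : c })
    }
    guard let base = parts.first else { return label }
    let qualifiers = parts.dropFirst()
    if qualifiers.isEmpty {
        return base
    }
    return "\(base) (\(qualifiers.joined(separator: ", ")))"
}

print(displayName(forSceneLabel: "airport_terminal"))
print(displayName(forSceneLabel: "swimming_pool/outdoor"))
```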

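Other models

What I compared here is strictly the officially distributed models. Nearly a year has passed since Core ML was announced, and many third-party models have been published as well. Next time I would like to dig up a variety of those and compare them.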