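#Purpose
When I look at the results of deep-learning image recognition, **sometimes I think "this is amazing!" and sometimes I think "why (can't it detect this)?!"**
In the former case, the model is often just getting by on part of the object...
Below, I compare how well the following two models hold up in practical, real-world conditions.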
- Vision Transformer (ViT)
- ResNet50
Both are ImageNet-pretrained models.
Comparison | Test image |
---|---|
Round 1 | Overlaid (simulated reflection) image |
Round 2 | Close-up image |
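#Round 1: Overlaid (simulated reflection) image
##Target image
In the following article, I made an image by overlaying two easily recognizable images.
Exploring how the "Positional Encoding" in Attention Is All You Need works (just getting started)
Here I have the models try to detect it.
I think this kind of image can occur in the real world in various situations, such as a reflection in glass.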
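The overlaid image itself comes from the other article, and how it was made is not shown here. As a rough idea only, a simple alpha blend produces this kind of "reflection" image. The function names, the 50% blend ratio, and the 384x384 size below are my assumptions, not the settings actually used.

```python
def blend_pixels(a, b, alpha=0.5):
    """Alpha-blend two same-sized images given as nested lists of (r, g, b) tuples.

    Each output channel is (1 - alpha) * a + alpha * b, rounded to an integer.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return [
        [tuple(round((1 - alpha) * ca + alpha * cb) for ca, cb in zip(pa, pb))
         for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def blend_files(path_a, path_b, out_path, alpha=0.5, size=(384, 384)):
    """Same idea on real image files, using Pillow (imported lazily).

    The paths, alpha, and size are hypothetical values for illustration.
    """
    from PIL import Image

    im_a = Image.open(path_a).convert("RGB").resize(size)
    im_b = Image.open(path_b).convert("RGB").resize(size)
    # Image.blend computes im_a * (1 - alpha) + im_b * alpha
    Image.blend(im_a, im_b, alpha).save(out_path)
```

For example, `blend_files("banana.jpg", "panda.jpg", "mixed.jpg")` would write a half-and-half mix, where the file names are placeholders.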
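##Results
Both failed.
They do technically detect the objects, but only at around rank 100 or 200...
With ViT, banana does come out first, but panda is hopeless.
Worse than I expected; disappointing. (That is not quite the right emoji...) |
---|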
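To say "rank 100 or 200" without counting lines by hand, the rank of a given class can be computed directly from the probability vector. This is a minimal sketch; `rank_of` is a helper name I made up, and ties are broken by class index.

```python
def rank_of(probs, class_index):
    """Return the 1-based rank of class_index when classes are sorted by
    descending probability (ties keep the lower class index first)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order.index(class_index) + 1


# Toy 4-class example (not real model output)
toy_probs = [0.10, 0.60, 0.05, 0.25]
for idx in range(len(toy_probs)):
    print(idx, rank_of(toy_probs, idx))
```

With the real 1000-dimensional output, `rank_of(probs, 954)` and `rank_of(probs, 388)` would give the ranks of banana and giant panda in the ViT class indexing shown in the lists.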
##Vision Transformer (ViT)
Source used:
https://github.com/lukemelas/PyTorch-Pretrained-ViT
Model: B_16_imagenet1k
Note: this is not the larger model. I wonder whether the larger model would do better?
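For reference, a run that prints lines in the `[index] label (xx.xx%)` format might look roughly like the sketch below. It follows my reading of the repository's README; the 384x384 resize, the 0.5 mean/std normalization, and the `labels` list (class index to name) are assumptions to check against the repository, and the helper names are my own.

```python
def format_topk(probs, labels, k=5):
    """Format the k most probable classes as '[index] label (xx.xx%)' lines."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return ["[{}] {} ({:.2f}%)".format(i, labels[i], probs[i] * 100) for i in order]


def classify_with_vit(image_path, labels, k=5):
    """Run B_16_imagenet1k on one image and return formatted top-k lines.

    Heavy dependencies are imported lazily; the pretrained weights are
    downloaded on first use.
    """
    import torch
    from PIL import Image
    from torchvision import transforms
    from pytorch_pretrained_vit import ViT

    model = ViT("B_16_imagenet1k", pretrained=True).eval()
    preprocess = transforms.Compose([
        transforms.Resize((384, 384)),               # assumed input size
        transforms.ToTensor(),
        transforms.Normalize([0.5] * 3, [0.5] * 3),  # assumed normalization
    ])
    batch = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=-1).squeeze(0).tolist()
    return format_topk(probs, labels, k)
```

Passing a larger `k` gives a long list like the one that follows, which is how the low-ranked classes become visible.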
[954] banana ★★ (62.86%)
[955] jackfruit, jak, jack (24.47%)
[956] custard apple (1.86%)
[64] green mamba (1.15%)
[953] pineapple, ananas (0.64%)
[951] lemon (0.43%)
[55] green snake, grass snake (0.27%)
[950] orange (0.22%)
[114] slug (0.20%)
[59] vine snake (0.19%)
[63] Indian cobra, Naja naja (0.18%)
[986] yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum (0.15%)
[952] fig (0.15%)
[47] African chameleon, Chamaeleo chamaeleon (0.14%)
[113] snail (0.14%)
[957] pomegranate (0.14%)
[66] horned viper, cerastes, sand viper, horned asp, Cerastes cornutus (0.11%)
[31] tree frog, tree-frog (0.10%)
[599] honeycomb (0.10%)
[96] toucan (0.09%)
[93] hornbill (0.08%)
[944] artichoke, globe artichoke (0.07%)
[990] buckeye, horse chestnut, conker (0.07%)
[738] pot, flowerpot (0.07%)
[940] spaghetti squash (0.07%)
[38] banded gecko (0.06%)
[948] Granny Smith (0.06%)
[997] bolete (0.06%)
[995] earthstar (0.06%)
[61] boa constrictor, Constrictor constrictor (0.06%)
[949] strawberry (0.06%)
[364] three-toed sloth, ai, Bradypus tridactylus (0.06%)
[380] titi, titi monkey (0.05%)
[68] sidewinder, horned rattlesnake, Crotalus cerastes (0.05%)
[701] parachute, chute (0.05%)
[382] squirrel monkey, Saimiri sciureus (0.05%)
[60] night snake, Hypsiglena torquata (0.04%)
[994] stinkhorn, carrion fungus (0.04%)
[62] rock python, rock snake, Python sebae (0.04%)
[989] hip, rose hip, rosehip (0.04%)
[384] indri, indris, Indri indri, Indri brevicaudatus (0.04%)
[947] mushroom (0.04%)
[988] acorn (0.04%)
[133] bittern (0.03%)
[126] isopod (0.03%)
[998] ear, spike, capitulum (0.03%)
[46] green lizard, Lacerta viridis (0.03%)
[54] hognose snake, puff adder, sand viper (0.03%)
[110] flatworm, platyhelminth (0.03%)
[82] ruffed grouse, partridge, Bonasa umbellus (0.03%)
[410] apiary, bee house (0.03%)
[112] conch (0.03%)
[356] weasel (0.03%)
[387] lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (0.03%)
[95] jacamar (0.03%)
[378] capuchin, ringtail, Cebus capucinus (0.02%)
[993] gyromitra (0.02%)
[117] chambered nautilus, pearly nautilus, nautilus (0.02%)
[306] rhinoceros beetle (0.02%)
[404] airliner (0.02%)
[116] chiton, coat-of-mail shell, sea cradle, polyplacophore (0.02%)
[924] guacamole (0.02%)
[43] frilled lizard, Chlamydosaurus kingi (0.02%)
[5] electric ray, crampfish, numbfish, torpedo (0.02%)
[105] koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus (0.02%)
[48] Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis (0.02%)
[299] meerkat, mierkat (0.02%)
[52] thunder snake, worm snake, Carphophis amoenus (0.02%)
[83] prairie chicken, prairie grouse, prairie fowl (0.02%)
[669] mosquito net (0.02%)
[32] tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui (0.02%)
[16] bulbul (0.02%)
[319] dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk (0.02%)
[417] balloon (0.02%)
[506] coil, spiral, volute, whorl, helix (0.02%)
[325] sulphur butterfly, sulfur butterfly (0.02%)
[86] partridge (0.02%)
[936] head cabbage (0.02%)
[941] acorn squash (0.02%)
[938] cauliflower (0.02%)
[45] Gila monster, Heloderma suspectum (0.02%)
[987] corn (0.02%)
[996] hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa (0.02%)
[390] eel (0.02%)
[81] ptarmigan (0.02%)
[40] American chameleon, anole, Anolis carolinensis (0.02%)
[326] lycaenid, lycaenid butterfly (0.02%)
[331] hare (0.02%)
[88] macaw (0.02%)
[377] marmoset (0.02%)
[843] swing (0.02%)
[73] barn spider, Araneus cavaticus (0.02%)
[130] flamingo (0.02%)
[363] armadillo (0.02%)
[70] harvestman, daddy longlegs, Phalangium opilio (0.02%)
[53] ringneck snake, ring-necked snake, ring snake (0.02%)
[295] American black bear, black bear, Ursus americanus, Euarctos americanus (0.02%)
[972] cliff, drop, drop-off (0.02%)
[21] kite (0.02%)
[392] rock beauty, Holocanthus tricolor (0.02%)
[26] common newt, Triturus vulgaris (0.02%)
[335] fox squirrel, eastern fox squirrel, Sciurus niger (0.02%)
[67] diamondback, diamondback rattlesnake, Crotalus adamanteus (0.02%)
[588] hamper (0.02%)
[908] wing (0.02%)
[367] chimpanzee, chimp, Pan troglodytes (0.02%)
[405] airship, dirigible (0.01%)
[298] mongoose (0.01%)
[728] plastic bag (0.01%)
[65] sea snake (0.01%)
[51] triceratops (0.01%)
[101] tusker (0.01%)
[374] langur (0.01%)
[76] tarantula (0.01%)
[991] coral fungus (0.01%)
[365] orangutan, orang, orangutang, Pongo pygmaeus (0.01%)
[388] giant panda ★★, panda, panda bear, coon bear, Ailuropoda melanoleuca (0.01%)
[943] cucumber, cuke (0.01%)
[30] bullfrog, Rana catesbeiana (0.01%)
[109] brain coral (0.01%)
[80] black grouse (0.01%)
[946] cardoon (0.01%)
[322] ringlet, ringlet butterfly (0.01%)
[315] mantis, mantid (0.01%)
[316] cicada, cicala (0.01%)
[376] proboscis monkey, Nasalis larvatus (0.01%)
[385] Indian elephant, Elephas maximus (0.01%)
[168] redbone (0.01%)
[92] bee eater (0.01%)
[379] howler monkey, howler (0.01%)
[90] lorikeet (0.01%)
[383] Madagascar cat, ring-tailed lemur, Lemur catta (0.01%)
[607] jack-o'-lantern
##ResNet50
Source used: https://keras.io/ja/applications/
Model: ResNet50(weights='imagenet')
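A list of `(wnid, label, score)` tuples like the one here is what `decode_predictions` returns. Below is a rough sketch along the lines of the example on the linked Keras page, not necessarily the exact code I ran. The `top=200` value and the `find_label` helper are my assumptions, and depending on the TensorFlow/Keras version the image utilities may live under a different module.

```python
def find_label(decoded, name):
    """Return (1-based rank, score) of `name` in a decoded prediction list,
    or None if it is not among the returned entries."""
    for rank, (_wnid, label, score) in enumerate(decoded, start=1):
        if label == name:
            return rank, score
    return None


def classify_with_resnet50(image_path, top=200):
    """Run ImageNet-pretrained ResNet50 on one image and decode the top classes.

    TensorFlow is imported lazily; weights are downloaded on first use.
    """
    import numpy as np
    from tensorflow.keras.applications.resnet50 import (
        ResNet50, preprocess_input, decode_predictions)
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")
    img = image.load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    # decode_predictions returns one list of (wnid, label, score) per input image
    return decode_predictions(preds, top=top)[0]
```

With the decoded list in hand, `find_label(decoded, "banana")` and `find_label(decoded, "giant_panda")` report where the two objects ended up.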
[('n03188531', 'diaper', 0.06685701),
('n04209133', 'shower_cap', 0.06336665),
('n01748264', 'Indian_cobra', 0.053833533),
('n03958227', 'plastic_bag', 0.0477978),
('n07880968', 'burrito', 0.0431565),
('n04033995', 'quilt', 0.03513578),
('n07718747', 'artichoke', 0.02602887),
('n04235860', 'sleeping_bag', 0.025143128),
('n07860988', 'dough', 0.024247095),
('n01667778', 'terrapin', 0.022566179),
('n01664065', 'loggerhead', 0.021968575),
('n07714571', 'head_cabbage', 0.021425359),
('n02074367', 'dugong', 0.020652529),
('n01440764', 'tench', 0.019023761),
('n01698640', 'American_alligator', 0.016179482),
('n03888257', 'parachute', 0.013034657),
('n01677366', 'common_iguana', 0.012845791),
('n01749939', 'green_mamba', 0.012838124),
('n01944390', 'snail', 0.010219791),
('n01669191', 'box_turtle', 0.009841825),
('n01695060', 'Komodo_dragon', 0.00868533),
('n01667114', 'mud_turtle', 0.008071985),
('n01737021', 'water_snake', 0.0073151086),
('n03627232', 'knot', 0.006370477),
('n07697313', 'cheeseburger', 0.0062961983),
('n03871628', 'packet', 0.0060379985),
('n03485794', 'handkerchief', 0.006006749),
('n04367480', 'swab', 0.00592959),
('n01697457', 'African_crocodile', 0.005014863),
('n03868863', 'oxygen_mask', 0.004961956),
('n02110958', 'pug', 0.0048995228),
('n04399382', 'teddy', 0.004425834),
('n04133789', 'sandal', 0.004231881),
('n03938244', 'pillow', 0.0041891537),
('n01768244', 'trilobite', 0.004171828),
('n01986214', 'hermit_crab', 0.0040465007),
('n02869837', 'bonnet', 0.003970811),
('n02909870', 'bucket', 0.003894355),
('n01883070', 'wombat', 0.0038887928),
('n07684084', 'French_loaf', 0.003779268),
('n07695742', 'pretzel', 0.0037406099),
('n07579787', 'plate', 0.00372636),
('n07753592', 'banana' ★★, 0.0037171696),
('n01753488', 'horned_viper', 0.0037020848),
('n07717410', 'acorn_squash', 0.0036752466),
('n01755581', 'diamondback', 0.0035176787),
('n01704323', 'triceratops', 0.0033780576),
('n01955084', 'chiton', 0.0032795744),
('n03786901', 'mortar', 0.0032332388),
('n07697537', 'hotdog', 0.003205056),
('n02134084', 'ice_bear', 0.0031007996),
('n02398521', 'hippopotamus', 0.0030010697),
('n09468604', 'valley', 0.0028716042),
('n01744401', 'rock_python', 0.0028035499),
('n02526121', 'eel', 0.0027678132),
('n07760859', 'custard_apple', 0.002687465),
('n01692333', 'Gila_monster', 0.0026803724),
('n04458633', 'totem_pole', 0.0025927152),
('n02129604', 'tiger', 0.0025757342),
('n02093428', 'American_Staffordshire_terrier', 0.0024840261),
('n01943899', 'conch', 0.0024245891),
('n02795169', 'barrel', 0.0023302436),
('n02480495', 'orangutan', 0.0022904908),
('n02093256', 'Staffordshire_bullterrier', 0.0022238945),
('n03788365', 'mosquito_net', 0.0020587973),
('n03887697', 'paper_towel', 0.0019693126),
('n02108089', 'boxer', 0.001964624),
('n04493381', 'tub', 0.0018422267),
('n02640242', 'sturgeon', 0.0018405094),
('n02808440', 'bathtub', 0.0018323634),
('n01729977', 'green_snake', 0.0017895664),
('n09332890', 'lakeside', 0.0017504566),
('n01675722', 'banded_gecko', 0.0017276495),
('n03929855', 'pickelhaube', 0.0017105255),
('n02443114', 'polecat', 0.0016949566),
('n01751748', 'sea_snake', 0.0016778206),
('n02085936', 'Maltese_dog', 0.0016648371),
('n02099601', 'golden_retriever', 0.0016648251),
('n12144580', 'corn', 0.0016592229),
('n01629819', 'European_fire_salamander', 0.0016194135),
('n03935335', 'piggy_bank', 0.00161333),
('n04259630', 'sombrero', 0.0016044596),
('n03014705', 'chest', 0.0016021407),
('n13133613', 'ear', 0.001600591),
('n02837789', 'bikini', 0.001596331),
('n09256479', 'coral_reef', 0.0015835938),
('n07584110', 'consomme', 0.0015825942),
('n02504458', 'African_elephant', 0.0015425562),
('n01491361', 'tiger_shark', 0.0015347875),
('n03792972', 'mountain_tent', 0.0015005224),
('n03991062', 'pot', 0.0014792152),
('n01740131', 'night_snake', 0.0014612696),
('n01819313', 'sulphur-crested_cockatoo', 0.0014487966),
('n02095570', 'Lakeland_terrier', 0.0014417097),
('n03065424', 'coil', 0.0014214374),
('n02090379', 'redbone', 0.001331581),
('n04251144', 'snorkel', 0.0013039334),
('n02107312', 'miniature_pinscher', 0.0012931975),
('n04522168', 'vase', 0.001289689),
('n01990800', 'isopod', 0.0012486882),
('n02098413', 'Lhasa', 0.0012427429),
('n01983481', 'American_lobster', 0.0012367677),
('n01665541', 'leatherback_turtle', 0.0012171009),
('n01978287', 'Dungeness_crab', 0.0012116744),
('n02088466', 'bloodhound', 0.0012091378),
('n02444819', 'otter', 0.0012075476),
('n02786058', 'Band_Aid', 0.0011942227),
('n01945685', 'slug', 0.0011502523),
('n04162706', 'seat_belt', 0.001143836),
('n07715103', 'cauliflower', 0.0011376633),
('n04336792', 'stretcher', 0.0011328654),
('n01689811', 'alligator_lizard', 0.0010975883),
('n04398044', 'teapot', 0.0010951292),
('n01917289', 'brain_coral', 0.0010893045),
('n03724870', 'mask', 0.0010842477),
('n02363005', 'beaver', 0.0010828231),
('n04371774', 'swing', 0.0010819857),
('n02493509', 'titi', 0.0010802051),
('n02106550', 'Rottweiler', 0.0010796139),
('n03710637', 'maillot', 0.0010739941),
('n07930864', 'cup', 0.0010585586),
('n01641577', 'bullfrog', 0.0010573524),
('n03623198', 'knee_pad', 0.0010541431),
('n01644900', 'tailed_frog', 0.001051543),
('n04560804', 'water_jug', 0.0010370798),
('n07754684', 'jackfruit', 0.00102939),
('n07614500', 'ice_cream', 0.0009983778),
('n02319095', 'sea_urchin', 0.000994882),
('n02106382', 'Bouvier_des_Flandres', 0.0009813908),
('n03496892', 'harvester', 0.0009761827),
('n02747177', 'ashcan', 0.00097544893),
('n03388043', 'fountain', 0.00097304664),
('n02097047', 'miniature_schnauzer', 0.0009551877),
('n04229816', 'ski_mask', 0.00093830726),
('n02102480', 'Sussex_spaniel', 0.00093589386),
('n02514041', 'barracouta', 0.00093452935),
('n02808304', 'bath_towel', 0.0009257423),
('n02807133', 'bathing_cap', 0.00092556357),
('n01685808', 'whiptail', 0.0009166785),
('n02097658', 'silky_terrier', 0.0009142907),
('n04613696', 'yurt', 0.0009025597),
('n02951358', 'canoe', 0.0009017497),
('n01914609', 'sea_anemone', 0.0008968752),
('n04201297', 'shoji', 0.00088736176),
('n04090263', 'rifle', 0.00088089885),
('n01756291', 'sidewinder', 0.0008525167),
('n03992509', "potter's_wheel", 0.00085035834),
('n01688243', 'frilled_lizard', 0.0008264672),
('n02096585', 'Boston_bull', 0.0008254309),
('n02098286', 'West_Highland_white_terrier', 0.0008137851),
('n02782093', 'balloon', 0.0008128203),
('n07753275', 'pineapple', 0.000812721),
('n03590841', "jack-o'-lantern", 0.00081251684),
('n01984695', 'spiny_lobster', 0.00080373336),
('n03355925', 'flagpole', 0.0007974661),
('n02481823', 'chimpanzee', 0.0007972197),
('n01818515', 'macaw', 0.0007904845),
('n01980166', 'fiddler_crab', 0.00077911804),
('n07693725', 'bagel', 0.00077835674),
('n07749582', 'lemon', 0.000773825),
('n03127747', 'crash_helmet', 0.0007644384),
('n07742313', 'Granny_Smith', 0.00075247313),
('n04525038', 'velvet', 0.00074524723),
('n03873416', 'paddle', 0.0007378166),
('n02494079', 'squirrel_monkey', 0.0007339725),
('n02107574', 'Greater_Swiss_Mountain_dog', 0.00073071686),
('n01873310', 'platypus', 0.0007291971),
('n09421951', 'sandbar', 0.00072215486),
('n02087394', 'Rhodesian_ridgeback', 0.00072002097),
('n02112137', 'chow', 0.0007164359),
('n04540053', 'volleyball', 0.00069893955),
('n02804414', 'bassinet', 0.000693986),
('n03637318', 'lampshade', 0.0006926213),
('n02504013', 'Indian_elephant', 0.0006885995),
('n01742172', 'boa_constrictor', 0.0006875997),
('n02108915', 'French_bulldog', 0.00068659685),
('n04355933', 'sunglass', 0.0006827656),
('n02088238', 'basset', 0.00068103475),
('n15075141', 'toilet_tissue', 0.00068031746),
('n09246464', 'cliff', 0.00067776826),
('n02606052', 'rock_beauty', 0.0006772798),
('n02281406', 'sulphur_butterfly', 0.00067677),
('n13054560', 'bolete', 0.0006723335),
('n04326547', 'stone_wall', 0.0006707245),
('n01735189', 'garter_snake', 0.00066899415),
('n02105162', 'malinois', 0.0006637893),
('n03998194', 'prayer_rug', 0.00066320336),
('n01496331', 'electric_ray', 0.00066319166),
('n07248320', 'book_jacket', 0.0006618011),
('n01981276', 'king_crab', 0.00066063984),
('n01494475', 'hammerhead', 0.0006595979),
('n02085620', 'Chihuahua', 0.0006546478),
('n03916031', 'perfume', 0.0006438952),
('n02097474', 'Tibetan_terrier', 0.0006358973),
('n02099849', 'Chesapeake_Bay_retriever', 0.00062768563),
('n04462240', 'toyshop', 0.00062643486),
('n02454379', 'armadillo', 0.00062574615),
('n03825788', 'nipple', 0.00062463415),
('n02096177', 'cairn', 0.0006217132),
('n04557648', 'water_bottle', 0.0006131043),
('n01847000', 'drake', 0.00060888304),
('n04532106', 'vestment', 0.00060648005),
('n02492660', 'howler_monkey', 0.00059765484),
('n02361337', 'marmot', 0.0005971751),
('n02510455', 'giant_panda' ★★, 0.0005930799),
('n02104365', 'schipperke', 0.00059168396),
('n02132136', 'brown_bear', 0.00058264274),
('n02088364', 'beagle', 0.00057981495),
('n03131574', 'crib', 0.00056277745),
('n02493793', 'spider_monkey', 0.000553812),
('n01739381', 'vine_snake', 0.00055204815),
('n02910353', 'buckle', 0.00055191625)]
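#Round 2: Close-up image
##Target image
The close-up image I used in the following article.
GradCAM in just 22 lines of code. tf_explain may be easy to use; recommended!
##Results
Both did reasonably well.
With ViT, tabby, tabby cat comes out first, so I suppose that counts as good?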
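Since tabby, Egyptian cat, and tiger cat are all close neighbors, one way to look at "reasonably well" is to add up the probability that each model puts on those three classes. The sketch below only sums the top-3 values copied from the outputs listed in this round; grouping exactly these three classes is my own choice.

```python
# Top-3 entries copied from the outputs in this round (ViT percentages as fractions)
vit_top3 = [("tabby, tabby cat", 0.3192), ("Egyptian cat", 0.2905), ("tiger cat", 0.2846)]
resnet_top3 = [("Egyptian_cat", 0.5787574), ("tiger_cat", 0.32389614), ("tabby", 0.06504008)]


def group_mass(preds):
    """Sum the scores of a list of (label, score) pairs."""
    return sum(score for _label, score in preds)


print("ViT      cat mass: {:.4f}".format(group_mass(vit_top3)))
print("ResNet50 cat mass: {:.4f}".format(group_mass(resnet_top3)))
```

Seen this way, ViT puts about 89.4% and ResNet50 about 96.8% on "some kind of cat", even though their first-place labels differ.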
##Vision Transformer (ViT)
Source used:
https://github.com/lukemelas/PyTorch-Pretrained-ViT
Model: B_16_imagenet1k
[281] tabby, tabby cat (31.92%)
[285] Egyptian cat (29.05%)
[282] tiger cat (28.46%)
[287] lynx, catamount (2.49%)
[283] Persian cat (1.06%)
[284] Siamese cat, Siamese (0.47%)
[478] carton (0.13%)
[728] plastic bag (0.12%)
[678] neck brace (0.09%)
[903] wig (0.08%)
[292] tiger, Panthera tigris (0.08%)
[722] ping-pong ball (0.07%)
[332] Angora, Angora rabbit (0.06%)
[620] laptop, laptop computer (0.06%)
[700] paper towel (0.06%)
##ResNet50
[('n02124075', 'Egyptian_cat', 0.5787574),
('n02123159', 'tiger_cat', 0.32389614),
('n02123045', 'tabby', 0.06504008),
('n02127052', 'lynx', 0.020585762),
('n02123394', 'Persian_cat', 0.0019484512),
('n04589890', 'window_screen', 0.001253177),
('n03958227', 'plastic_bag', 0.0010556503),
('n02123597', 'Siamese_cat', 0.0009966693),
('n02129604', 'tiger', 0.00061496574),
('n04235860', 'sleeping_bag', 0.00025958236),
('n07714990', 'broccoli', 0.00025066984),
('n04023962', 'punching_bag', 0.00023452094),
('n02971356', 'carton', 0.00023311013),
('n03724870', 'mask', 0.00018257847),
('n02808304', 'bath_towel', 0.00015606672),
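#Summary
Nothing in particular.
At this level, these detectors feel like components. Without a fair amount of pre- and post-processing around them, they are not up to real-world use.
(It also leaves me feeling a bit skeptical about things like augmentation...)
Networks or models that are robust to this kind of input probably already exist somewhere.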