
Florence-2

Florence-2 is an open-source vision-language model released by Microsoft under the MIT license.

It can perform a variety of tasks; the Task Prompts below were available to choose from.
image.png

Hugging Face Demo

Sample 1

【Object Detection】
Input:
image1.jpg

Output:

{'<OD>': {'bboxes': [[34.23999786376953, 160.0800018310547, 597.4400024414062, 371.7599792480469], [456.0, 97.68000030517578, 580.1599731445312, 261.8399963378906], [450.8800048828125, 276.7200012207031, 554.5599975585938, 370.79998779296875], [95.68000030517578, 280.55999755859375, 198.72000122070312, 371.2799987792969]], 'labels': ['car', 'door', 'wheel', 'wheel']}}

image (1).jpg
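
The `<OD>` result above is a plain dictionary, so pairing each label with its box takes only a few lines. The sketch below assumes each box is `[x1, y1, x2, y2]` in pixel coordinates (top-left and bottom-right corners), which the values suggest but the demo does not state. `iter_detections` is a hypothetical helper name, not part of Florence-2.

```python
# Output copied from Sample 1 above.
od_result = {'<OD>': {'bboxes': [
    [34.23999786376953, 160.0800018310547, 597.4400024414062, 371.7599792480469],
    [456.0, 97.68000030517578, 580.1599731445312, 261.8399963378906],
    [450.8800048828125, 276.7200012207031, 554.5599975585938, 370.79998779296875],
    [95.68000030517578, 280.55999755859375, 198.72000122070312, 371.2799987792969],
], 'labels': ['car', 'door', 'wheel', 'wheel']}}


def iter_detections(result, task="<OD>"):
    """Yield (label, (x1, y1, x2, y2)) pairs from a detection result dict.

    Assumes boxes are corner coordinates in pixels (an assumption, see above).
    """
    payload = result[task]
    for label, box in zip(payload["labels"], payload["bboxes"]):
        yield label, tuple(box)


if __name__ == "__main__":
    # Print each detection with its top-left corner, width, and height.
    for label, (x1, y1, x2, y2) in iter_detections(od_result):
        print(f"{label:6s} x={x1:6.1f} y={y1:6.1f} w={x2 - x1:6.1f} h={y2 - y1:6.1f}")
```

Keeping the task key as a parameter means the same loop could be pointed at other tasks that return `bboxes` and `labels`, if their output follows the same shape.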

Sample 2

【OCR with Region】
Input:
image2.jpg

Output:

{'<OCR_WITH_REGION>': {'quad_boxes': [[167.0435028076172, 50.25, 375.7974853515625, 50.25, 375.7974853515625, 114.75, 167.0435028076172, 114.75], [144.8784942626953, 120.75, 375.7974853515625, 120.75, 375.7974853515625, 149.25, 144.8784942626953, 149.25], [115.86249542236328, 165.25, 376.6034851074219, 166.25, 376.6034851074219, 184.25, 115.86249542236328, 183.25], [239.9864959716797, 184.25, 376.6034851074219, 186.25, 376.6034851074219, 204.25, 239.9864959716797, 202.25], [266.1814880371094, 441.25, 376.6034851074219, 441.25, 376.6034851074219, 456.25, 266.1814880371094, 456.25], [252.0764923095703, 460.25, 376.6034851074219, 460.25, 376.6034851074219, 475.25, 252.0764923095703, 475.25]], 'labels': ['</s>CUDA', 'FOR ENGINEERS', 'An Introduction to High-Performance', 'Parallel Computing', 'DUANE STORTI', 'METE YURTOGLU']}}

image(2).jpg
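
The `<OCR_WITH_REGION>` result above differs from `<OD>` in two ways: each region is an 8-value quadrilateral rather than a 4-value box, and the first label carries a leftover `</s>` token. The sketch below shows one way to handle both. It assumes each quad is four `(x, y)` corner pairs flattened into one list, which matches the values above but is not stated by the demo. `quad_to_rect`, `clean_label`, and `ocr_lines` are hypothetical helper names.

```python
# First three regions of the Sample 2 output above (excerpt, values unchanged).
ocr_result = {'<OCR_WITH_REGION>': {'quad_boxes': [
    [167.0435028076172, 50.25, 375.7974853515625, 50.25,
     375.7974853515625, 114.75, 167.0435028076172, 114.75],
    [144.8784942626953, 120.75, 375.7974853515625, 120.75,
     375.7974853515625, 149.25, 144.8784942626953, 149.25],
    [115.86249542236328, 165.25, 376.6034851074219, 166.25,
     376.6034851074219, 184.25, 115.86249542236328, 183.25],
], 'labels': ['</s>CUDA', 'FOR ENGINEERS', 'An Introduction to High-Performance']}}


def quad_to_rect(quad):
    """Return the axis-aligned (x_min, y_min, x_max, y_max) enclosing a quad.

    Assumes the quad is [x1, y1, x2, y2, x3, y3, x4, y4].
    """
    xs, ys = quad[0::2], quad[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def clean_label(label, token="</s>"):
    """Remove the stray end-of-sequence token seen in the first label."""
    return label.replace(token, "").strip()


def ocr_lines(result, task="<OCR_WITH_REGION>"):
    """Return a list of (text, rect) pairs in the order the model emitted them."""
    payload = result[task]
    return [
        (clean_label(label), quad_to_rect(quad))
        for label, quad in zip(payload["labels"], payload["quad_boxes"])
    ]


if __name__ == "__main__":
    for text, rect in ocr_lines(ocr_result):
        print(text, rect)
```

Collapsing a quad to its enclosing rectangle loses the slight skew visible in the third region, so keep the original quad if the rotation matters.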

Both samples look pretty good.

Additional OCR sample

【OCR with Region】
hardware_requirements.PNG

{'<OCR_WITH_REGION>': {'quad_boxes': [[6.959499835968018, 18.912500381469727, 233.2554931640625, 18.912500381469727, 233.2554931640625, 48.23750305175781, 6.959499835968018, 48.23750305175781], [13.245499610900879, 69.9124984741211, 432.1625061035156, 69.9124984741211, 432.1625061035156, 86.91250610351562, 13.245499610900879, 86.91250610351562], [28.511499404907227, 91.5875015258789, 165.00750732421875, 91.5875015258789, 165.00750732421875, 108.16250610351562, 28.511499404907227, 108.16250610351562], [13.245499610900879, 119.63750457763672, 433.95849609375, 119.63750457763672, 433.95849609375, 136.6374969482422, 13.245499610900879, 136.6374969482422], [13.245499610900879, 169.78750610351562, 187.45750427246094, 168.9375, 187.45750427246094, 187.63751220703125, 13.245499610900879, 188.4875030517578], [45.57350158691406, 231.83750915527344, 243.58250427246094, 231.83750915527344, 243.58250427246094, 248.41250610351562, 45.57350158691406, 248.41250610351562], [13.245499610900879, 258.6125183105469, 426.7745056152344, 259.4624938964844, 426.7745056152344, 277.3125, 13.245499610900879, 276.4624938964844], [28.511499404907227, 281.13751220703125, 208.5605010986328, 281.13751220703125, 208.5605010986328, 300.26251220703125, 28.511499404907227, 299.4125061035156], [13.245499610900879, 309.1875, 392.2015075683594, 309.1875, 392.2015075683594, 326.6125183105469, 13.245499610900879, 326.6125183105469], [28.511499404907227, 332.9875183105469, 206.76449584960938, 332.9875183105469, 206.76449584960938, 350.8375244140625, 28.511499404907227, 350.8375244140625], [13.245499610900879, 359.76251220703125, 174.885498046875, 359.76251220703125, 174.885498046875, 377.6125183105469, 13.245499610900879, 377.6125183105469], [8.306500434875488, 398.01251220703125, 319.9125061035156, 398.01251220703125, 319.9125061035156, 412.4625244140625, 8.306500434875488, 412.4625244140625]], 'labels': ['</s>ハードドウェア明付', '> NVIDIA Jetson Nano™ 高清港をギャント おまたにはJetson Nano', '2GB用統技行キャンド*', '> MicroSDカード(接設計:64 GB UHS-1, 高小/J: 32 GB UHs-', '> Micro-B USBケープル', '> JetsonNano高把記絲訂キャット', '>カメラ(Logitech C270 USB Webcam おたにおは Raspberry', 'Piカメメラモシュールv2)', '> SDカートドスロッ トを使用スポスペン PC おきたはノード PC', '(Windows, Mac, Linux)', '> 第語: NVIDIA JetBot', '*Jetsón TK1 ブラックであけれは、この Jetsonでも安安限定用品社']}}

image3.jpg
Japanese seems to be a weak point. (I haven't looked at the details of the training data, but I suspect it doesn't contain much Japanese.)

Additional Object Detection sample

【Object Detection】
Input:
e69f59e5c1bd3f695aedd61ca487199f.jpg

Output:

{'<OD>': {'bboxes': [[163.41749572753906, 18.25349998474121, 200.2725067138672, 39.195499420166016], [285.7275085449219, 55.04349899291992, 322.177490234375, 69.47650146484375], [134.66250610351562, 237.8614959716797, 169.4925079345703, 265.02947998046875], [191.3625030517578, 170.5074920654297, 218.09249877929688, 197.1094970703125], [262.6424865722656, 205.31649780273438, 289.37249755859375, 223.42849731445312], [151.2675018310547, 208.14649963378906, 175.16250610351562, 223.14549255371094], [264.26251220703125, 169.09249877929688, 285.7275085449219, 183.52549743652344], [135.87750244140625, 173.0544891357422, 157.34249877929688, 185.78948974609375], [343.23748779296875, 165.9794921875, 363.8924865722656, 178.14849853515625], [213.6374969482422, 137.6794891357422, 232.2675018310547, 149.56549072265625], [152.07749938964844, 147.86749267578125, 171.1125030517578, 158.62149047851562], [277.2225036621094, 143.33949279785156, 292.61248779296875, 153.52749633789062], [134.66250610351562, 102.87049865722656, 149.6475067138672, 111.64349365234375], [270.74249267578125, 254.27549743652344, 295.447509765625, 272.6705017089844], [140.33250427246094, 260.78448486328125, 163.0124969482422, 279.4624938964844], [152.8874969482422, 218.33448791503906, 173.1374969482422, 236.4464874267578], [345.6675109863281, 193.99649047851562, 361.86749267578125, 206.44850158691406], [25.717500686645508, 192.864501953125, 104.2874984741211, 282.57550048828125], [172.7324981689453, 202.76950073242188, 235.1024932861328, 282.57550048828125], [7.087500095367432, 143.62249755859375, 50.82749938964844, 226.2584991455078], [336.3525085449219, 237.0124969482422, 380.9024963378906, 282.57550048828125], [0.20250000059604645, 241.54049682617188, 40.70249938964844, 282.57550048828125], [295.447509765625, 251.72850036621094, 346.4775085449219, 282.57550048828125], [0.20250000059604645, 89.85250091552734, 19.237499237060547, 141.92449951171875], [0.20250000059604645, 6.367499828338623, 404.3924865722656, 282.57550048828125], [118.05750274658203, 208.14649963378906, 193.79249572753906, 282.57550048828125], [107.52750396728516, 173.33749389648438, 167.46749877929688, 246.63449096679688], [235.1024932861328, 235.031494140625, 310.4324951171875, 282.57550048828125]], 'labels': ['bus', 'car', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'helmet', 'human face', 'human face', 'human face', 'human face', 'motorcycle', 'motorcycle', 'motorcycle', 'motorcycle', 'motorcycle', 'motorcycle', 'motorcycle', 'person', 'person', 'person', 'person']}}

image4.jpg
Not bad, I suppose?
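
With 28 detections in a crowded scene, a per-label tally is easier to judge than the raw dictionary. The sketch below counts the labels with `collections.Counter`. The `labels` list restates the `'labels'` entry from the output above in condensed form, and `count_labels` is a hypothetical helper name.

```python
from collections import Counter

# Condensed restatement of the 'labels' list in the <OD> output above.
labels = (
    ['bus', 'car']
    + ['helmet'] * 11
    + ['human face'] * 4
    + ['motorcycle'] * 7
    + ['person'] * 4
)


def count_labels(label_list):
    """Return a Counter mapping each label to the number of detections."""
    return Counter(label_list)


if __name__ == "__main__":
    counts = count_labels(labels)
    # most_common() lists labels from most to least frequent.
    for label, n in counts.most_common():
        print(f"{label:12s} {n}")
    print("total", sum(counts.values()))
```

Comparing these counts against what is actually visible in the photo is a quick way to spot missed or spurious detections.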

Summary

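It's impressive that a single model can handle so many different tasks!
I'd be happy if the accuracy were a bit higher! (Though maybe that just means I should fine-tune it myself.)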
