Azure has turned out to be very convenient, so I want to dig deeper into it.
This time I will run Azure's OCR on Google Colab.
!pip install azure-cognitiveservices-vision-computervision
import os
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from google.colab import files
import time
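# Paste the key and endpoint of your Azure Computer Vision resource here.
# The variable names mention OPENAI, but the values must belong to a Computer Vision resource.
os.environ["AZURE_OPENAI_KEY"] = ""
os.environ["AZURE_OPENAI_ENDPOINT"] = ""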
key = os.environ["AZURE_OPENAI_KEY"]
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
credentials = CognitiveServicesCredentials(key)
client = ComputerVisionClient(endpoint, credentials)
uploaded = files.upload()
for filename in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=filename, length=len(uploaded[filename])))
# Use the uploaded image file
image_path = filename  # Name of the uploaded file (the last one if several were uploaded)
# SDK call
with open(image_path, "rb") as image_stream:
    rawHttpResponse = client.read_in_stream(image_stream, language="en", raw=True)
# Get ID from returned headers
operationLocation = rawHttpResponse.headers["Operation-Location"]
numberOfCharsInOperationId = 36
idLocation = len(operationLocation) - numberOfCharsInOperationId
operationId = operationLocation[idLocation:]
# Wait for the operation to complete (this can be adjusted based on your needs)
time.sleep(10)
# SDK call
result = client.get_read_result(operationId)
# Check the status
if result.status == OperationStatusCodes.succeeded:
    # Print every recognized line on the first page
    for line in result.analyze_result.read_results[0].lines:
        print(line.text)
else:
    print("Operation status:", result.status)
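The fixed 10-second wait can be too short for a large image and needlessly long for a small one. As a rough sketch, the wait could be replaced by a polling loop. The helper names `extract_operation_id` and `wait_for_read_result`, as well as the `interval` and `timeout` defaults, are my own assumptions and not part of the Azure SDK. The status is compared as lowercase text on the assumption that the terminal states are reported as `succeeded` and `failed`.

```python
import time


def extract_operation_id(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    return operation_location.rstrip("/").split("/")[-1]


def wait_for_read_result(get_result, operation_id, interval=1.0, timeout=60.0):
    """Poll get_result(operation_id) until the status is terminal.

    get_result is any callable shaped like client.get_read_result
    (hypothetical helper; interval and timeout are illustrative defaults).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_result(operation_id)
        # The status may be an enum member or a plain string; normalize it to lowercase text.
        status = str(getattr(result.status, "value", result.status)).lower()
        if status in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Read operation {operation_id} is still '{status}' after {timeout}s"
            )
        time.sleep(interval)
```

In the notebook this would be called as `result = wait_for_read_result(client.get_read_result, operationId)`, after which the status check works as before.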
With a clean image of English text, recognition is close to 100% accurate. For Japanese, just change the argument to language="ja". Handwriting was also read reasonably well.
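One thing to watch for: the code above prints only `read_results[0]`, which is the first page. For multi-page input such as a PDF, a small helper along these lines could gather the text from every page. This is a minimal sketch; `collect_text` is a hypothetical name of my own, and it assumes the result object has the same `analyze_result.read_results[...].lines[...].text` shape used above.

```python
def collect_text(result):
    """Join the text of every recognized line on every page, one line per row."""
    lines = []
    for page in result.analyze_result.read_results:
        for line in page.lines:
            lines.append(line.text)
    return "\n".join(lines)
```

Calling `print(collect_text(result))` after a successful read would then print all pages, regardless of which `language` was passed.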