Trying Out Amazon Textract

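I tried extracting text from a PDF with Amazon Textract.
The English lines were extracted correctly, but the Japanese lines were not: in the output below, the four Japanese lines of test.pdf came back as "1", "EXT", "EXT", and "B".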

C:\Users\XX>mkdir testtextract

C:\Users\XX>cd testtextract

C:\Users\XX\testtextract>npm init -y
...(output omitted)...

C:\Users\XX\testtextract>npm install @aws-sdk/client-textract
...(output omitted)...

C:\Users\XX\testtextract>node test.js
(node:8260) NOTE: The AWS SDK for JavaScript (v2) is in maintenance mode.
SDK releases are limited to address critical bug fixes and security issues only.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the blog post at https://a.co/cUPnyil
(Use `node --trace-warnings ...` to show where the warning was created)
抽出されたテキスト:
Hello World!
This is a test document.
Amazon Textract works great!
Testing OCR functionality.
1
EXT
EXT
B
test.js
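// Note: this script uses the AWS SDK for JavaScript v2 (the `aws-sdk` package),
// which is why the v2 maintenance-mode warning appears in the output above.
// It needs `npm install aws-sdk`; the `@aws-sdk/client-textract` package
// installed earlier is the v3 client and is not used by this script.
const AWS = require('aws-sdk');
const fs = require('fs');

// AWS configuration
AWS.config.update({ region: 'us-east-1' }); // Set the region that fits your environment
const textract = new AWS.Textract();

async function extractTextFromPDF(filePath) {
  try {
    // Read the whole PDF into memory and send it as raw bytes
    const fileContent = fs.readFileSync(filePath);

    const params = {
      Document: {
        Bytes: fileContent
      }
    };

    const result = await textract.detectDocumentText(params).promise();

    // Keep only LINE blocks; PAGE and WORD blocks are skipped
    let extractedText = '';
    result.Blocks.forEach(block => {
      if (block.BlockType === "LINE") {
        extractedText += block.Text + '\n';
      }
    });

    console.log('抽出されたテキスト:'); // "Extracted text:"
    console.log(extractedText);
  } catch (error) {
    console.error('エラーが発生しました:', error); // "An error occurred:"
  }
}

// Usage example
const filePath = 'test.pdf';
extractTextFromPDF(filePath);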
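
Since the package installed above is the v3 client (`@aws-sdk/client-textract`), the same call can probably be written against v3 as well. The following is only a sketch under that assumption and has not been run against the test.pdf in this article. The function names `linesFromBlocks` and `extractTextWithV3` are made up for illustration, and the default region simply mirrors the one used in test.js.

```javascript
const fs = require('fs');

// Hypothetical helper: collect the text of every LINE block
// from the Blocks array of a DetectDocumentText response.
function linesFromBlocks(blocks) {
  return (blocks || [])
    .filter((block) => block.BlockType === 'LINE')
    .map((block) => block.Text);
}

// Hypothetical v3 counterpart of extractTextFromPDF in test.js.
async function extractTextWithV3(filePath, region = 'us-east-1') {
  // Loaded lazily so linesFromBlocks can be used without the SDK installed.
  const { TextractClient, DetectDocumentTextCommand } =
    require('@aws-sdk/client-textract');

  const client = new TextractClient({ region });
  const command = new DetectDocumentTextCommand({
    Document: { Bytes: fs.readFileSync(filePath) },
  });

  const result = await client.send(command);
  return linesFromBlocks(result.Blocks).join('\n');
}
```

Calling `extractTextWithV3('test.pdf').then(console.log)` would need AWS credentials configured in the environment, just like the v2 script. Splitting out `linesFromBlocks` keeps the block filtering testable without touching the network.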
test.pdf
Hello World!
This is a test document.
Amazon Textract works great!
Testing OCR functionality.
日本語開始
改行
改行
日本語終了
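
To make the failure easier to see, the lines of test.pdf can be compared position by position with what Textract returned. This is a minimal sketch that uses only the values shown in this article. `findMismatches` is a hypothetical helper, and pairing lines by position assumes Textract returned one LINE block per line of the PDF, which happens to hold for this output.

```javascript
// Lines contained in test.pdf (taken from this article).
const expected = [
  'Hello World!',
  'This is a test document.',
  'Amazon Textract works great!',
  'Testing OCR functionality.',
  '日本語開始',
  '改行',
  '改行',
  '日本語終了',
];

// Lines Textract returned (taken from the console output in this article).
const actual = [
  'Hello World!',
  'This is a test document.',
  'Amazon Textract works great!',
  'Testing OCR functionality.',
  '1',
  'EXT',
  'EXT',
  'B',
];

// Hypothetical helper: pair lines by position and report those that differ.
function findMismatches(expectedLines, actualLines) {
  const length = Math.max(expectedLines.length, actualLines.length);
  const mismatches = [];
  for (let i = 0; i < length; i++) {
    if (expectedLines[i] !== actualLines[i]) {
      mismatches.push({
        line: i + 1, // 1-based line number within the document
        expected: expectedLines[i],
        actual: actualLines[i],
      });
    }
  }
  return mismatches;
}

const mismatches = findMismatches(expected, actual);
console.log(`${mismatches.length} of ${expected.length} lines differ`);
// prints "4 of 8 lines differ"
```

All four mismatches are the Japanese lines (lines 5 to 8), while the English lines match exactly.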