Reading PDF text in Rails: the pdf-reader edition

Installing the gem

  • Add gem 'pdf-reader' to the Gemfile and run bundle install

Reading the PDF

  • Register it as a Rails task and try it out
lib/tasks/read_pdf.rake
namespace :read_pdf do
  desc 'PDF読み込み' # description shown by rake -T ("Read PDF")
  task read: :environment do
    # Load the PDF (e.g. Report.pdf)
    reader = PDF::Reader.new('xxxx.pdf') # name of the PDF to read

    reader.pages.each do |page|
      puts page.text # print the extracted text of each page
    end
  end
end
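
The page loop in the task can also be pulled out into a small helper, which makes it easier to try outside of Rails. The following is a minimal sketch, not code from the original task: `extract_page_texts`, `FakePage`, and `FakeReader` are hypothetical names, and the helper only assumes an object that responds to `pages`, with each page responding to `text`, which are the two calls the task above makes on `PDF::Reader`.

```ruby
# Hypothetical helper: collects the text of every page into an array.
# It relies only on reader.pages and page.text, the two calls used in the task above.
def extract_page_texts(reader)
  reader.pages.map { |page| page.text.to_s }
end

# Stand-ins for PDF::Reader and its pages so the sketch runs without the gem.
FakePage   = Struct.new(:text)
FakeReader = Struct.new(:pages)

reader = FakeReader.new([FakePage.new("page one"), FakePage.new("page two")])

extract_page_texts(reader).each_with_index do |text, index|
  puts "--- page #{index + 1} ---"
  puts text
end
```

With the gem installed, passing `PDF::Reader.new('xxxx.pdf')` instead of the stand-in should work the same way inside the rake task.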

Check with rails -T that the task has been registered.

$ rails -T
~~~~~ (omitted) ~~~~~
rails read_pdf:read                      # PDF読み込み
~~~~~ (omitted) ~~~~~
$

The task is registered correctly, so run it with rails read_pdf:read.
Here I try it on a sample PDF.

$ rails read_pdf:read
1%' αϯϓϧσʔλ




 ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ 1%'

αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧσʔ
λ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ

1%' αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧ

σʔλ͓஌Βͤ 1%' αϯϓϧσʔλ

                    ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ
                    1%' αϯϓϧσʔλ͓஌Βͤ 1%' α

                    ϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧ

                    σʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓
                    ஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ

                    1%' αϯϓϧσʔλ͓஌Βͤ 1%' α

                    ϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧ
                    σʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓

                    ஌Βͤ 1%' αϯϓϧσʔλ




 ͓஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ

1%' αϯϓϧσʔλ͓஌Βͤ 1%' α

ϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧ
σʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓

஌Βͤ 1%' αϯϓϧσʔλ͓஌Βͤ

1%' αϯϓϧσʔλ͓஌Βͤ 1%' α
ϯϓϧσʔλ͓஌Βͤ 1%' αϯϓϧ

σʔλ͓஌Βͤ 1%' αϯϓϧσʔλ͓

஌Βͤ 1%' αϯϓϧσʔλ
$

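The output came out badly garbled...
According to the description on GitHub, the text is converted to UTF-8 regardless of the encoding used inside the PDF...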
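
The output above is valid UTF-8, just not the characters one would expect. One possible explanation, which is an assumption here and not something the article or the pdf-reader description confirms: the code points in the garbled text look like Adobe-Japan1 glyph IDs (CIDs) passed through as if they were Unicode code points. The sketch below tries that guess on two fragments of the output. The offsets (ASCII = CID + 31, katakana starting at CID 925, CID 660 for the long-vowel mark) and the name `guess_from_cid` are assumptions for illustration only.

```ruby
# Assumed mapping (not taken from pdf-reader): treat each code point as an
# Adobe-Japan1 CID and map it back to the character that CID would stand for.
KATAKANA_FIRST_CID = 925   # assumed CID of "ァ" (U+30A1)
KATAKANA_LAST_CID  = 1010  # assumed CID of "ヶ" (U+30F6)
LONG_VOWEL_CID     = 660   # assumed CID of "ー" (U+30FC)

def guess_from_cid(garbled)
  garbled.each_codepoint.map do |cid|
    if cid == LONG_VOWEL_CID
      "ー"
    elsif (KATAKANA_FIRST_CID..KATAKANA_LAST_CID).cover?(cid)
      [0x30A1 + (cid - KATAKANA_FIRST_CID)].pack("U")
    elsif (1..95).cover?(cid)
      (cid + 31).chr   # CIDs 1..95 assumed to be ASCII 0x20..0x7E
    else
      [cid].pack("U")  # unknown value: leave the character unchanged
    end
  end.join
end

# Two fragments as they appear to be in the output above, written as escapes.
puts guess_from_cid("1%'")
puts guess_from_cid("\u03B1\u03EF\u03D3\u03E7\u03C3\u0294\u03BB")
```

Under these assumptions the two fragments come out as "PDF" and "サンプルデータ" (sample data), which would be plausible contents for a sample PDF. The function is only meant for individual fragments: the spaces and line breaks in the output are real layout characters and would be mis-mapped if passed through it.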

There does not seem to be a way to specify the encoding, so I decided to look for another gem...
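
When trying out other gems, it may help to check automatically whether the extracted text contains any Japanese script at all. The following is a rough heuristic sketch: `japanese_text?` is a hypothetical name, the sample strings are illustrative, and the check cannot prove that the text is correct. It can only flag output like the garbled result above.

```ruby
# Hypothetical heuristic: true if the string contains at least one hiragana,
# katakana, or kanji character. It does not verify that the text is correct.
def japanese_text?(text)
  text.match?(/[\p{Hiragana}\p{Katakana}\p{Han}]/)
end

garbled  = "\u03B1\u03EF\u03D3\u03E7\u03C3\u0294\u03BB" # fragment resembling the garbled output
readable = "サンプルデータ"                                # hypothetical correctly extracted text

puts japanese_text?(garbled)
puts japanese_text?(readable)
```

A check like this could be run on `page.text` for each page to compare candidates quickly before reading the output by eye.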
