Building a Document Converter with LibreOffice

More than 1 year has passed since last update.

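Introduction

Converting office files and other document formats on anything other than Windows tends to take a lot of effort.
When conversion is built into a service, you often have to pull in libraries that can handle the various Excel formats, which is a lot of work. This article is the result of my trial and error in looking for the simplest possible way to convert files.

Environment and Tools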

  • OS : CentOS 7.5
  • Tool : LibreOffice 6.1.0.3

Setup

The GUI is not used at all, so neither the X Window System nor a window manager is required.

cd /opt
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm_langpack_ja.tar.gz"
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm_helppack_ja.tar.gz"
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm.tar.gz"
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm_langpack_ja.tar.gz
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm_helppack_ja.tar.gz
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm.tar.gz
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm/RPMS/*.rpm
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm_langpack_ja/RPMS/*.rpm
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm_helppack_ja/RPMS/*.rpm

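Conversion Basics

The basic command form is shown below. The options from --nolockcheck through --nofirststartwizard are order-sensitive.
Note: if they are given in a different order, some of the options are not recognized.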

soffice --nolockcheck --nologo --headless --norestore --language=ja --nofirststartwizard --convert-to pdf --outdir "${DSTPATH}" "${SRCPATH}"
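
The command above can be wrapped in a small shell function so that the option order stays fixed in one place. This is only a sketch: the function name convert_doc and the SOFFICE override variable are hypothetical names introduced here, not part of LibreOffice.

```shell
# Hypothetical wrapper around the article's base command.
# Usage: convert_doc FORMAT DSTPATH SRCPATH
# SOFFICE may name the binary to run; it defaults to "soffice" on PATH.
convert_doc() {
    if [ "$#" -ne 3 ]; then
        echo "usage: convert_doc FORMAT DSTPATH SRCPATH" >&2
        return 2
    fi
    # The option order is kept exactly as in the article's command.
    "${SOFFICE:-soffice}" \
        --nolockcheck --nologo --headless \
        --norestore --language=ja --nofirststartwizard \
        --convert-to "$1" \
        --outdir "$2" \
        "$3"
}
```

For example, convert_doc pdf "${DSTPATH}" "${SRCPATH}" would run the same conversion as the command above.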
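  • When calling from PHP or another interpreter

During conversion, LibreOffice creates temporary files in the home directory of the user running it. When it runs as a non-login user such as apache or nginx, that user has no home directory, so the conversion fails. Setting the HOME environment variable as part of the command line makes it work.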

 export HOME=/tmp;/opt/libreoffice6.1/program/soffice \
 --nolockcheck --nologo --headless \
 --norestore --language=ja --nofirststartwizard \
 --convert-to pdf \
 --outdir /outputpath \
 /input_path/filename.ext
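
As a variation on the command above, HOME can point at a throwaway directory created for each run instead of the shared /tmp. This is a sketch under assumptions: the function name convert_with_private_home and the SOFFICE override are hypothetical, and the default binary path is the one used in this article. A fresh directory also means LibreOffice has to set up its user profile on every run, which probably costs some startup time.

```shell
# Hypothetical variant of the HOME workaround: each run gets its own
# temporary home directory, which is removed afterwards.
# Usage: convert_with_private_home FORMAT DSTPATH SRCPATH
convert_with_private_home() {
    # mktemp -d creates a new, empty directory owned by the calling user.
    tmp_home=$(mktemp -d "${TMPDIR:-/tmp}/soffice-home.XXXXXX") || return 1
    status=0
    # HOME is placed in the environment of this one command.
    HOME="$tmp_home" "${SOFFICE:-/opt/libreoffice6.1/program/soffice}" \
        --nolockcheck --nologo --headless \
        --norestore --language=ja --nofirststartwizard \
        --convert-to "$1" \
        --outdir "$2" \
        "$3" || status=$?
    # Clean up the temporary home whether or not the conversion succeeded.
    rm -rf "$tmp_home"
    return "$status"
}
```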
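  • Meaning of the options
--nolockcheck : Disables the check for remote instances using the same installation
--nologo : Does not show the splash screen
--headless : Headless mode (runs without displaying a GUI)
--norestore : Disables restart and file recovery after a crash
--language=ja : Language to use by default. This causes an error if the matching langpack is not installed; here that is langpack_ja
--nofirststartwizard : Originally suppressed the first-start wizard; according to the 6.1 help text it now does nothing and is accepted only for backward compatibility
--convert-to [target extension]:[filter]:[encoding] --outdir [output path] [full path of source file] : Specifies the file to convert and the target format.
             Unfortunately the output file name cannot be set; the file is written as [input file name without its extension].[target extension]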
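
Since the output name is fixed, a script that needs a different name has to work out where the file landed and rename it. The sketch below computes that path; expected_output_path is a hypothetical helper name, and it assumes that only the last extension of the input file name is replaced.

```shell
# Hypothetical helper: predicts where --convert-to will write its output.
# Usage: expected_output_path OUTDIR SRCPATH EXT
expected_output_path() {
    name=${2##*/}      # strip the directory part of the source path
    stem=${name%.*}    # strip the last extension, if there is one
    printf '%s/%s.%s\n' "${1%/}" "$stem" "$3"
}
```

After a conversion, something like mv "$(expected_output_path /outputpath /input_path/filename.ext pdf)" /outputpath/renamed.pdf would then give the file the name you want.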
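  • Example combinations of [target extension]:[filter]:[encoding]

The available filters are defined at the URL below. Open a .xcu file and look at the <node oor:name="filter name" oor:op="replace"> part.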
https://github.com/LibreOffice/core/tree/master/filter/source/config/fragments/filters

--convert-to pdf ⇒ plain conversion to PDF
--convert-to "pdf:writer_pdf_Export" ⇒ convert to PDF using LibreOffice Writer
--convert-to "pdf:calc_pdf_Export" ⇒ convert to PDF using LibreOffice Calc
--convert-to "html:XHTML Writer File:UTF8" ⇒ convert to HTML (UTF-8) using LibreOffice Writer
--convert-to "html:HTML (StarCalc):UTF8" ⇒ convert to HTML (UTF-8) using LibreOffice Calc
--convert-to "epub" ⇒ convert to the EPUB e-book format
--convert-to "txt:Text (encoded):UTF8" ⇒ convert to a txt file encoded as UTF-8
--convert-to "csv:Text - txt - csv (StarCalc)" ⇒ plain conversion to csv

However, when converting xlsx to csv, cells that contain line breaks seem to come out badly mangled unless you convert to xml or json first and then to csv.
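
When a service accepts several input types, the --convert-to value for PDF output can be chosen from the source file's extension. The sketch below uses only the two PDF filter names listed above. The function name pdf_target_for and the mapping from extensions to Writer or Calc are assumptions made for illustration.

```shell
# Hypothetical helper: picks a --convert-to value for PDF output from the
# source file's extension, using the filter names listed in this article.
# Usage: pdf_target_for SRCPATH
pdf_target_for() {
    name=${1##*/}
    # Lowercase the extension so that .DOCX and .docx are treated alike.
    ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        doc|docx|odt|rtf|txt) echo "pdf:writer_pdf_Export" ;;
        xls|xlsx|ods|csv)     echo "pdf:calc_pdf_Export" ;;
        *)                    echo "pdf" ;;   # let LibreOffice decide
    esac
}
```

The result can be passed straight to the conversion command, for example --convert-to "$(pdf_target_for "${SRCPATH}")".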

LibreOffice Command-Line Help

LibreOffice 6.1.0.3 efb621ed25068d70781dc026f7e9c5187a4decd1

Usage: soffice [argument...]
       argument - switches, switch parameters and document URIs (filenames).

Using without special arguments:
Opens the start center, if it is used without any arguments.
   {file}              Tries to open the file (files) in the components
                       suitable for them.
   {file} {macro:///Library.Module.MacroName}
                       Opens the file and runs specified macros from
                       the file.

Getting help and information:
   --help | -h | -?    Shows this help and quits.
   --helpwriter        Opens built-in or online Help on Writer.
   --helpcalc          Opens built-in or online Help on Calc.
   --helpdraw          Opens built-in or online Help on Draw.
   --helpimpress       Opens built-in or online Help on Impress.
   --helpbase          Opens built-in or online Help on Base.
   --helpbasic         Opens built-in or online Help on Basic scripting
                       language.
   --helpmath          Opens built-in or online Help on Math.
   --version           Shows the version and quits.
   --nstemporarydirectory
                       (MacOS X sandbox only) Returns path of the temporary
                       directory for the current user and exits. Overrides
                       all other arguments.

General arguments:
   --quickstart[=no]   Activates[Deactivates] the Quickstarter service.
   --nolockcheck       Disables check for remote instances using one
                       installation.
   --infilter={filter} Force an input filter type if possible. For example:
                       --infilter="Calc Office Open XML"
                       --infilter="Text (encoded):UTF8,LF,,,"
   --pidfile={file}    Store soffice.bin pid to {file}.
   --display {display} Sets the DISPLAY environment variable on UNIX-like
                       platforms to the value {display} (only supported by a
                       start script).

User/programmatic interface control:
   --nologo            Disables the splash screen at program start.
   --minimized         Starts minimized. The splash screen is not displayed.
   --nodefault         Starts without displaying anything except the splash
                       screen (do not display initial window).
   --invisible         Starts in invisible mode. Neither the start-up logo nor
                       the initial program window will be visible. Application
                       can be controlled, and documents and dialogs can be
                       controlled and opened via the API. Using the parameter,
                       the process can only be ended using the taskmanager
                       (Windows) or the kill command (UNIX-like systems). It
                       cannot be used in conjunction with --quickstart.
   --headless          Starts in "headless mode" which allows using the
                       application without GUI. This special mode can be used
                       when the application is controlled by external clients
                       via the API.
   --norestore         Disables restart and file recovery after a system crash.
   --safe-mode         Starts in a safe mode, i.e. starts temporarily with a
                       fresh user profile and helps to restore a broken
                       configuration.
   --accept={UNO-URL}  Specifies an UNO-URL connect-string to create an UNO
                       acceptor through which other programs can connect to
                       access the API. UNO-URL is string the such kind
                   uno:connection-type,params;protocol-name,params;ObjectName.
   --unaccept={UNO-URL} Closes an acceptor that was created with --accept. Use
                       --unaccept=all to close all open acceptors.
   --language={lang}   Uses specified language, if language is not selected
                       yet for UI. The lang is a tag of the language in IETF
                       language tag.

Developer arguments:
   --terminate_after_init
                       Exit after initialization complete (no documents loaded).
   --eventtesting      Exit after loading documents.

New document creation arguments:
The arguments create an empty document of specified kind. Only one of them may
be used in one command line. If filenames are specified after an argument,
then it tries to open those files in the specified component.
   --writer            Creates an empty Writer document.
   --calc              Creates an empty Calc document.
   --draw              Creates an empty Draw document.
   --impress           Creates an empty Impress document.
   --base              Creates a new database.
   --global            Creates an empty Writer master (global) document.
   --math              Creates an empty Math document (formula).
   --web               Creates an empty HTML document.

File open arguments:
The arguments define how following filenames are treated. New treatment begins
after the argument and ends at the next argument. The default treatment is to
open documents for editing, and create new documents from document templates.
   -n                  Treats following files as templates for creation of new
                       documents.
   -o                  Opens following files for editing, regardless whether
                       they are templates or not.
   --pt {Printername}  Prints following files to the printer {Printername},
                       after which those files are closed. The splash screen
                       does not appear. If used multiple times, only last
                       {Printername} is effective for all documents of all
                       --pt runs. Also, --printer-name argument of
                       --print-to-file switch interferes with {Printername}.
   -p                  Prints following files to the default printer, after
                       which those files are closed. The splash screen does
                       not appear. If the file name contains spaces, then it
                       must be enclosed in quotation marks.
   --view              Opens following files in viewer mode (read-only).
   --show              Opens and starts the following presentation documents
                       of each immediately. Files are closed after the showing.
                       Files other than Impress documents are opened in
                       default mode , regardless of previous mode.
   --convert-to OutputFileExtension[:OutputFilterName]
     [--outdir output_dir] [--convert-images-to]
                       Batch convert files (implies --headless). If --outdir
                       isn't specified, then current working directory is used
                       as output_dir. If --convert-images-to is given, its
                       parameter is taken as the target MIME format for *all*
                       images written to the output format. If --convert-to is
                       used more than once, the last value of OutputFileExtension
                       [:OutputFilterName] is effective. If --outdir is used more
                       than once, only its last value is effective. For example:
                   --convert-to pdf *.odt
                   --convert-to epub *.doc
                   --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
                   --convert-to "html:XHTML Writer File:UTF8" *.doc
                   --convert-to "txt:Text (encoded):UTF8" *.doc
   --print-to-file [--printer-name printer_name] [--outdir output_dir]
                       Batch print files to file. If --outdir is not specified,
                       then current working directory is used as output_dir.
                       If --printer-name or --outdir used multiple times, only
                       last value of each is effective. Also, {Printername} of
                       --pt switch interferes with --printer-name.
   --cat               Dump text content of the following files to console
                       (implies --headless). Cannot be used with --convert-to.
   --script-cat        Dump text content of any scripts embedded in the files to console
                       (implies --headless). Cannot be used with --convert-to.
   -env:<VAR>[=<VALUE>] Set a bootstrap variable. For example: to set
                       a non-default user profile path:
                       -env:UserInstallation=file:///tmp/test

Ignored switches:
   -psn                Ignored (MacOS X only).
   -Embedding          Ignored (COM+ related; Windows only).
   --nofirststartwizard Does nothing, accepted only for backward compatibility.
   --protector {arg1} {arg2}
                       Used only in unit tests and should have two arguments.

Conclusion

I tried converting data with LibreOffice, and conversions to anything other than PDF or EPUB seem to need a fair amount of trial and error.
