Introduction
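Converting Office files and other document formats on anything other than Windows usually takes a lot of effort.
When the conversion has to be built into a service, you often need extra libraries just to handle the various Excel formats, which makes the job even more tedious. So I experimented to find out whether I could put together a way to convert files as simply as possible, and this post is the result.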
Environment and tools
- OS : CentOS 7.5
- Tool : LibreOffice 6.1.0.3
Setup
The GUI is not used at all, so neither the X Window System nor a window manager is required.
cd /opt
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm_langpack_ja.tar.gz"
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm_helppack_ja.tar.gz"
curl -L -O "http://download.documentfoundation.org/libreoffice/stable/6.1.0/rpm/x86_64/LibreOffice_6.1.0_Linux_x86-64_rpm.tar.gz"
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm_langpack_ja.tar.gz
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm_helppack_ja.tar.gz
tar zxf LibreOffice_6.1.0_Linux_x86-64_rpm.tar.gz
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm/RPMS/*.rpm
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm_langpack_ja/RPMS/*.rpm
rpm -Uvh LibreOffice_6.1.0.3_Linux_x86-64_rpm_helppack_ja/RPMS/*.rpm
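Conversion basics
The basic command looks like this. Note that the options from --nolockcheck through --nofirststartwizard must be given in this order: when I changed the order, some of the options were no longer recognized.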
soffice --nolockcheck --nologo --headless --norestore --language=ja --nofirststartwizard --convert-to pdf --outdir "${DSTPATH}" "${SRCPATH}"
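To avoid retyping the options (and accidentally reordering them), they can be wrapped in a small shell function. This is a minimal sketch, assuming bash and soffice on PATH; the function name lo_convert and the SOFFICE override variable are hypothetical and exist only for illustration and testing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: lo_convert FORMAT OUTDIR SRC...
# Always emits the options in the same order as the command above.
lo_convert() {
  if [ "$#" -lt 3 ]; then
    echo "usage: lo_convert FORMAT OUTDIR SRC..." >&2
    return 2
  fi
  local format="$1" outdir="$2"
  shift 2
  # SOFFICE is an assumption of this sketch; it defaults to soffice on PATH.
  "${SOFFICE:-soffice}" --nolockcheck --nologo --headless --norestore \
    --language=ja --nofirststartwizard \
    --convert-to "$format" --outdir "$outdir" "$@"
}
```

Usage: lo_convert pdf /outputpath /input_path/filename.ext. Because the remaining arguments are passed through with "$@", several source files can be given at once, which --convert-to accepts according to the help output.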
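- When calling soffice from another interpreter such as PHP
During conversion, LibreOffice creates files such as its user profile and temporary files under the executing user's home directory. If it runs as a non-login user, such as the user that apache or nginx runs as, there is no usable home directory and the conversion fails. Setting the HOME environment variable as part of the command, as shown below, makes it work.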
export HOME=/tmp;/opt/libreoffice6.1/program/soffice \
--nolockcheck --nologo --headless \
--norestore --language=ja --nofirststartwizard \
--convert-to pdf \
--outdir /outputpath \
/input_path/filename.ext
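If several conversions may run at the same time, a commonly used approach is to give each call its own throwaway profile through the -env:UserInstallation bootstrap variable listed in the help output, in addition to setting HOME. This is a sketch under assumptions: bash and mktemp, the /opt/libreoffice6.1 install path used above, and a hypothetical helper name convert_isolated (the SOFFICE override exists only to make testing easy).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert_isolated FORMAT OUTDIR SRC
# Runs soffice with a private HOME and user profile, then deletes both.
convert_isolated() {
  local format="$1" outdir="$2" src="$3"
  local workdir rc
  workdir="$(mktemp -d /tmp/lo-convert.XXXXXX)" || return 1
  # HOME must be writable; UserInstallation must be a file:// URL.
  HOME="$workdir" "${SOFFICE:-/opt/libreoffice6.1/program/soffice}" \
    "-env:UserInstallation=file://${workdir}/profile" \
    --nolockcheck --nologo --headless --norestore \
    --language=ja --nofirststartwizard \
    --convert-to "$format" --outdir "$outdir" "$src"
  rc=$?
  rm -rf "$workdir"   # remove the per-call profile
  return "$rc"
}
```

Creating a fresh profile on every call makes each conversion slower, so this trade-off only makes sense when parallel requests are expected.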
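- What the options mean
--nolockcheck : disables the check for remote instances using the same installation
--nologo : does not show the splash screen
--headless : headless mode (runs without displaying any GUI)
--norestore : disables restart and file recovery after a crash, so previously damaged files are not reopened
--language=ja : the UI language to use; note that it causes an error if the matching langpack is not installed (here, langpack_ja)
--nofirststartwizard : suppresses the first-start wizard; according to the help output below, in 6.1 this switch does nothing and is accepted only for backward compatibility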
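--convert-to [target extension]:[filter]:[filter options, e.g. encoding] --outdir [output path] [full path of the source file] : specifies the file to convert and the target format.
Unfortunately, the output file name cannot be set; the output is always written as [input file name without its extension].[target extension]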
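Since the output name cannot be chosen, one workaround is to compute the name soffice will produce and rename the file afterwards. A minimal sketch, assuming bash and GNU coreutils; expected_output and convert_as are hypothetical helpers, and the sketch assumes the target extension is the part of the --convert-to value before the first colon.

```shell
#!/usr/bin/env bash
# expected_output OUTDIR SRC CONVERT_TO
# Prints OUTDIR/<input name without its last extension>.<target extension>.
expected_output() {
  local outdir="$1" src="$2" convert_to="$3"
  local base ext
  base="$(basename -- "$src")"
  base="${base%.*}"          # drop only the last extension
  ext="${convert_to%%:*}"    # "pdf:writer_pdf_Export" -> "pdf"
  printf '%s/%s.%s\n' "${outdir%/}" "$base" "$ext"
}

# convert_as CONVERT_TO SRC DEST
# Converts into DEST's directory, then renames the produced file to DEST.
convert_as() {
  local convert_to="$1" src="$2" dest="$3"
  local outdir produced
  outdir="$(dirname -- "$dest")"
  produced="$(expected_output "$outdir" "$src" "$convert_to")"
  "${SOFFICE:-soffice}" --nolockcheck --nologo --headless --norestore \
    --language=ja --nofirststartwizard \
    --convert-to "$convert_to" --outdir "$outdir" "$src" || return
  [ "$produced" = "$dest" ] || mv -f -- "$produced" "$dest"
}
```

Note that an existing file with the intermediate name in the destination directory will be overwritten.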
The filters are defined in the files below; open a .xcu file and look at the <node oor:name="filter name" oor:op="replace"> part.
https://github.com/LibreOffice/core/tree/master/filter/source/config/fragments/filters
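Instead of opening each file by hand, the filter names can be pulled out with sed. A sketch, assuming you have a local copy of the filter/source/config/fragments/filters directory (for example from a clone of the core repository) and that each node tag sits on a single line with the attribute order shown above; list_filter_names is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_filter_names DIR
# Prints the oor:name of every <node ... oor:op="replace"> found in DIR/*.xcu.
list_filter_names() {
  local dir="$1"
  find "$dir" -maxdepth 1 -name '*.xcu' -type f \
    -exec sed -n 's/.*<node oor:name="\([^"]*\)" oor:op="replace">.*/\1/p' {} + |
    LC_ALL=C sort -u   # C locale keeps the ordering stable
}
```

Usage: list_filter_names ./filter/source/config/fragments/filters | grep -i pdf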
--convert-to pdf ⇒ plain conversion to PDF
--convert-to "pdf:writer_pdf_Export" ⇒ convert to PDF using LibreOffice Writer
--convert-to "pdf:calc_pdf_Export" ⇒ convert to PDF using LibreOffice Calc
--convert-to "html:XHTML Writer File:UTF8" ⇒ convert to HTML (UTF-8) using LibreOffice Writer
--convert-to "html:HTML (StarCalc):UTF8" ⇒ convert to HTML (UTF-8) using LibreOffice Calc
--convert-to "epub" ⇒ convert to the EPUB e-book format
--convert-to "txt:Text (encoded):UTF8" ⇒ convert to a UTF-8 encoded txt file
--convert-to "csv:Text - txt - csv (StarCalc)" ⇒ plain conversion to CSV
However, when converting xlsx to csv, cells containing line breaks seem to come out badly unless you first convert to xml or json and then convert that to csv.
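If you prefer to pin the PDF export filter explicitly rather than relying on plain pdf, the Writer and Calc filters listed above can be chosen from the input file's extension. A sketch assuming bash 4 or later (for the ${var,,} lower-casing); pdf_convert_to is a hypothetical name, and the extension lists are assumptions to adjust for your own inputs.

```shell
#!/usr/bin/env bash
# pdf_convert_to SRC
# Prints a --convert-to value: Calc export for spreadsheets,
# Writer export for text documents, plain "pdf" for everything else.
pdf_convert_to() {
  local ext="${1##*.}"
  case "${ext,,}" in    # compare the extension in lower case
    xls|xlsx|ods|csv) echo "pdf:calc_pdf_Export" ;;
    doc|docx|odt|rtf|txt) echo "pdf:writer_pdf_Export" ;;
    *) echo "pdf" ;;
  esac
}
```

Usage: soffice --headless --convert-to "$(pdf_convert_to "$src")" --outdir /outputpath "$src"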
LibreOffice command-line help (output of soffice --help)
LibreOffice 6.1.0.3 efb621ed25068d70781dc026f7e9c5187a4decd1
Usage: soffice [argument...]
argument - switches, switch parameters and document URIs (filenames).
Using without special arguments:
Opens the start center, if it is used without any arguments.
{file} Tries to open the file (files) in the components
suitable for them.
{file} {macro:///Library.Module.MacroName}
Opens the file and runs specified macros from
the file.
Getting help and information:
--help | -h | -? Shows this help and quits.
--helpwriter Opens built-in or online Help on Writer.
--helpcalc Opens built-in or online Help on Calc.
--helpdraw Opens built-in or online Help on Draw.
--helpimpress Opens built-in or online Help on Impress.
--helpbase Opens built-in or online Help on Base.
--helpbasic Opens built-in or online Help on Basic scripting
language.
--helpmath Opens built-in or online Help on Math.
--version Shows the version and quits.
--nstemporarydirectory
(MacOS X sandbox only) Returns path of the temporary
directory for the current user and exits. Overrides
all other arguments.
General arguments:
--quickstart[=no] Activates[Deactivates] the Quickstarter service.
--nolockcheck Disables check for remote instances using one
installation.
--infilter={filter} Force an input filter type if possible. For example:
--infilter="Calc Office Open XML"
--infilter="Text (encoded):UTF8,LF,,,"
--pidfile={file} Store soffice.bin pid to {file}.
--display {display} Sets the DISPLAY environment variable on UNIX-like
platforms to the value {display} (only supported by a
start script).
User/programmatic interface control:
--nologo Disables the splash screen at program start.
--minimized Starts minimized. The splash screen is not displayed.
--nodefault Starts without displaying anything except the splash
screen (do not display initial window).
--invisible Starts in invisible mode. Neither the start-up logo nor
the initial program window will be visible. Application
can be controlled, and documents and dialogs can be
controlled and opened via the API. Using the parameter,
the process can only be ended using the taskmanager
(Windows) or the kill command (UNIX-like systems). It
cannot be used in conjunction with --quickstart.
--headless Starts in "headless mode" which allows using the
application without GUI. This special mode can be used
when the application is controlled by external clients
via the API.
--norestore Disables restart and file recovery after a system crash.
--safe-mode Starts in a safe mode, i.e. starts temporarily with a
fresh user profile and helps to restore a broken
configuration.
--accept={UNO-URL} Specifies an UNO-URL connect-string to create an UNO
acceptor through which other programs can connect to
access the API. UNO-URL is string the such kind
uno:connection-type,params;protocol-name,params;ObjectName.
--unaccept={UNO-URL} Closes an acceptor that was created with --accept. Use
--unaccept=all to close all open acceptors.
--language={lang} Uses specified language, if language is not selected
yet for UI. The lang is a tag of the language in IETF
language tag.
Developer arguments:
--terminate_after_init
Exit after initialization complete (no documents loaded).
--eventtesting Exit after loading documents.
New document creation arguments:
The arguments create an empty document of specified kind. Only one of them may
be used in one command line. If filenames are specified after an argument,
then it tries to open those files in the specified component.
--writer Creates an empty Writer document.
--calc Creates an empty Calc document.
--draw Creates an empty Draw document.
--impress Creates an empty Impress document.
--base Creates a new database.
--global Creates an empty Writer master (global) document.
--math Creates an empty Math document (formula).
--web Creates an empty HTML document.
File open arguments:
The arguments define how following filenames are treated. New treatment begins
after the argument and ends at the next argument. The default treatment is to
open documents for editing, and create new documents from document templates.
-n Treats following files as templates for creation of new
documents.
-o Opens following files for editing, regardless whether
they are templates or not.
--pt {Printername} Prints following files to the printer {Printername},
after which those files are closed. The splash screen
does not appear. If used multiple times, only last
{Printername} is effective for all documents of all
--pt runs. Also, --printer-name argument of
--print-to-file switch interferes with {Printername}.
-p Prints following files to the default printer, after
which those files are closed. The splash screen does
not appear. If the file name contains spaces, then it
must be enclosed in quotation marks.
--view Opens following files in viewer mode (read-only).
--show Opens and starts the following presentation documents
of each immediately. Files are closed after the showing.
Files other than Impress documents are opened in
default mode , regardless of previous mode.
--convert-to OutputFileExtension[:OutputFilterName]
[--outdir output_dir] [--convert-images-to]
Batch convert files (implies --headless). If --outdir
isn't specified, then current working directory is used
as output_dir. If --convert-images-to is given, its
parameter is taken as the target MIME format for *all*
images written to the output format. If --convert-to is
used more than once, the last value of OutputFileExtension
[:OutputFilterName] is effective. If --outdir is used more
than once, only its last value is effective. For example:
--convert-to pdf *.odt
--convert-to epub *.doc
--convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
--convert-to "html:XHTML Writer File:UTF8" *.doc
--convert-to "txt:Text (encoded):UTF8" *.doc
--print-to-file [--printer-name printer_name] [--outdir output_dir]
Batch print files to file. If --outdir is not specified,
then current working directory is used as output_dir.
If --printer-name or --outdir used multiple times, only
last value of each is effective. Also, {Printername} of
--pt switch interferes with --printer-name.
--cat Dump text content of the following files to console
(implies --headless). Cannot be used with --convert-to.
--script-cat Dump text content of any scripts embedded in the files to console
(implies --headless). Cannot be used with --convert-to.
-env:<VAR>[=<VALUE>] Set a bootstrap variable. For example: to set
a non-default user profile path:
-env:UserInstallation=file:///tmp/test
Ignored switches:
-psn Ignored (MacOS X only).
-Embedding Ignored (COM+ related; Windows only).
--nofirststartwizard Does nothing, accepted only for backward compatibility.
--protector {arg1} {arg2}
Used only in unit tests and should have two arguments.
Closing thoughts
I tried converting data with LibreOffice, and it seems that converting to formats other than PDF and EPUB takes a fair amount of trial and error.