Proofreading the English in a LaTeX manuscript with Grammarly (1/2)

I want to proofread the English in a LaTeX manuscript with Grammarly.

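Step 1

Feed the LaTeX source code to Grammarly as-is. About 99% of the math appears to be ignored, but the contents of labels do seem to be read, so the checker flags many places. My impression is that the overall proofreading is only about 60% complete this way. Hmm.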

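Step 2

Convert the LaTeX to PDF and open that PDF in Word, which converts it to a Word file. Run Grammarly on the resulting Word file. Because labels are replaced by numbers, the checker no longer trips on label contents. This time, however, many of the equations are not converted correctly, and the checker flags the broken equations. My impression is that the overall proofreading is about 80% complete. I looked for freeware that converts PDF to Word, but found nothing with good conversion accuracy; every tool mangles the equations. I'm stuck.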

Step 3

Use Pandoc to convert the LaTeX source code to a Word file. This seems almost good enough, but I am still not satisfied with how equations are handled.

Step 4

Consider converting the LaTeX source code directly to plain text. I looked for freeware but found nothing suitable. So I decided to write the code myself, even if it is quick and dirty.

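  • Policy 1: Convert each \eqref{*}, \label{*}, and \ref{*} to a (number) so that the proofreader does not flag it. The numbers need not be correct.
  • Policy 2: Support only $*$ and \begin{align}*\end{align} as math commands, and wrap the content in double quotes (") as "*". Math commands inside the quotes appear to be almost never flagged by the proofreader.
  • Policy 3: To improve readability, convert \mathbb{*}, \mathcal{*}, \bm{*}, and {\bfseries *} to Unicode characters. The code points come from the Unicode Mathematical Alphanumeric Symbols block. Grammarly displays them without garbling.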
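
Policies 1 and 2 can be sketched in a few lines of Ruby. This is a simplified illustration, not the script given later in this article: the helper names `number_references` and `quote_inline_math` are hypothetical, and all three reference commands are treated the same way here.

```ruby
# Sketch of policies 1 and 2 (hypothetical helper names, not the article's script).

# Policy 1: map every \eqref{...}, \ref{...} and \label{...} to a placeholder number.
# The same label always gets the same number; the numbers themselves are arbitrary.
def number_references(text, labels = {})
  text.gsub(/\\(eqref|ref|label)\{([^}]*)\}/) do
    labels[$2] ||= labels.size + 1
    "(#{labels[$2]})"
  end
end

# Policy 2: wrap inline math $...$ in double quotes.
def quote_inline_math(text)
  text.gsub(/\$([^$]*)\$/) { "\"#{$1}\"" }
end

sample = 'By \eqref{eq:energy}, $E=mc^2$ holds; see also \eqref{eq:energy} and \ref{sec:intro}.'
puts quote_inline_math(number_references(sample))
# prints: By (1), "E=mc^2" holds; see also (1) and (2).
```

Repeated references to the same label share one number, which is all the proofreader needs to see a well-formed sentence.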

Creating latex_grammarly.rb

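Save the Ruby code below as latex_grammarly.rb in Unicode (UTF-8), then run

cat filename.tex | ruby latex_grammarly.rb

The converted text is written to standard output. Copy this output into Grammarly and proofread it. My impression is that the overall proofreading is about 95% complete. I'm mostly satisfied. Only the LaTeX commands I needed are implemented; more can be added when the need arises.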
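
As an illustration of adding commands later, the sketch below applies a few extra symbol replacements using the same negative-lookahead pattern as the script's cmdalphabet table. The table `extra_commands` and the helper `replace_commands` are hypothetical names chosen for this example, and unlike the script, this version does not consume a space following the command.

```ruby
# Hypothetical additions, following the pattern of the script's cmdalphabet table.
extra_commands = { '\neq' => '≠', '\approx' => '≈', '\subset' => '⊂' }

def replace_commands(line, table)
  table.each do |command, symbol|
    # The negative lookahead keeps \subset from matching inside \subseteq.
    line = line.gsub(/#{Regexp.escape(command)}(?![a-zA-Z])/, symbol)
  end
  line
end

puts replace_commands('a \neq b, A \subset B, C \subseteq D', extra_commands)
# prints: a ≠ b, A ⊂ B, C \subseteq D
```

In the script itself, the equivalent change would presumably be to add the new pairs to the cmdalphabet hash.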

latex_grammarly.rb
#!/usr/bin/ruby

# Serif Bold "𝐚-𝐳𝐀-𝐙𝟎-𝟗"
serif_bold_hex_codes=[ "1d41a", "1d41b", "1d41c", "1d41d", "1d41e", "1d41f", "1d420", "1d421", "1d422", "1d423", "1d424", "1d425", "1d426", "1d427", "1d428", "1d429", "1d42a", "1d42b", "1d42c", "1d42d", "1d42e", "1d42f", "1d430", "1d431", "1d432", "1d433", "1d400", "1d401", "1d402", "1d403", "1d404", "1d405", "1d406", "1d407", "1d408", "1d409", "1d40a", "1d40b", "1d40c", "1d40d", "1d40e", "1d40f", "1d410", "1d411", "1d412", "1d413", "1d414", "1d415", "1d416", "1d417", "1d418", "1d419", "1d7ce", "1d7cf", "1d7d0", "1d7d1", "1d7d2", "1d7d3", "1d7d4", "1d7d5", "1d7d6", "1d7d7" ]
serif_bold_decimal_codes=serif_bold_hex_codes.map { |hex| hex.to_i(16) }
serif_bold_unicode_string=serif_bold_decimal_codes.pack("U*")

# Serif Italic "𝑎-ℎ-𝑧𝐴-𝑍"
serif_italic_hex_codes=[ "1d44e", "1d44f", "1d450", "1d451", "1d452", "1d453", "1d454", "210e", "1d456", "1d457", "1d458", "1d459", "1d45a", "1d45b", "1d45c", "1d45d", "1d45e", "1d45f", "1d460", "1d461", "1d462", "1d463", "1d464", "1d465", "1d466", "1d467", "1d434", "1d435", "1d436", "1d437", "1d438", "1d439", "1d43a", "1d43b", "1d43c", "1d43d", "1d43e", "1d43f", "1d440", "1d441", "1d442", "1d443", "1d444", "1d445", "1d446", "1d447", "1d448", "1d449", "1d44a", "1d44b", "1d44c", "1d44d" ]
serif_italic_decimal_codes=serif_italic_hex_codes.map { |hex| hex.to_i(16) }
serif_italic_unicode_string=serif_italic_decimal_codes.pack("U*")

# Serif Bold Italic "𝒂-𝒛𝑨-𝒁"
serif_bolditalic_hex_codes=[ "1d482", "1d483", "1d484", "1d485", "1d486", "1d487", "1d488", "1d489", "1d48a", "1d48b", "1d48c", "1d48d", "1d48e", "1d48f", "1d490", "1d491", "1d492", "1d493", "1d494", "1d495", "1d496", "1d497", "1d498", "1d499", "1d49a", "1d49b", "1d468", "1d469", "1d46a", "1d46b", "1d46c", "1d46d", "1d46e", "1d46f", "1d470", "1d471", "1d472", "1d473", "1d474", "1d475", "1d476", "1d477", "1d478", "1d479", "1d47a", "1d47b", "1d47c", "1d47d", "1d47e", "1d47f", "1d480", "1d481" ]
serif_bolditalic_decimal_codes=serif_bolditalic_hex_codes.map { |hex| hex.to_i(16) }
serif_bolditalic_unicode_string=serif_bolditalic_decimal_codes.pack("U*")

# Script Normal "𝒶𝒷𝒸𝒹ℯ𝒻ℊ𝒽𝒾𝒿𝓀𝓁𝓂𝓃ℴ𝓅𝓆𝓇𝓈𝓉𝓊𝓋𝓌𝓍𝓎𝓏𝒜ℬ𝒞𝒟ℰℱ𝒢ℋℐ𝒥𝒦ℒℳ𝒩𝒪𝒫𝒬ℛ𝒮𝒯𝒰𝒱𝒲𝒳𝒴𝒵"
script_normal_hex_codes=[ "1d4b6", "1d4b7", "1d4b8", "1d4b9", "212f", "1d4bb", "210a", "1d4bd", "1d4be", "1d4bf", "1d4c0", "1d4c1", "1d4c2", "1d4c3", "2134", "1d4c5", "1d4c6", "1d4c7", "1d4c8", "1d4c9", "1d4ca", "1d4cb", "1d4cc", "1d4cd", "1d4ce", "1d4cf", "1d49c", "212c", "1d49e", "1d49f", "2130", "2131", "1d4a2", "210b", "2110", "1d4a5", "1d4a6", "2112", "2133", "1d4a9", "1d4aa", "1d4ab", "1d4ac", "211b", "1d4ae", "1d4af", "1d4b0", "1d4b1", "1d4b2", "1d4b3", "1d4b4", "1d4b5" ]
script_normal_decimal_codes=script_normal_hex_codes.map { |hex| hex.to_i(16) }
script_normal_unicode_string=script_normal_decimal_codes.pack("U*")

# Double-Struck Bold "𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ𝟘-𝟡"
dstruck_bold_hex_codes=["1d552", "1d553", "1d554", "1d555", "1d556", "1d557", "1d558", "1d559",  "1d55a", "1d55b", "1d55c", "1d55d", "1d55e", "1d55f", "1d560", "1d561", "1d562", "1d563", "1d564", "1d565", "1d566", "1d567", "1d568", "1d569",  "1d56a", "1d56b", "1d538", "1d539", "2102", "1d53b", "1d53c", "1d53d", "1d53e", "210d", "1d540", "1d541", "1d542", "1d543", "1d544", "2115", "1d546", "2119", "211a", "211d", "1d54a", "1d54b", "1d54c", "1d54d", "1d54e", "1d54f", "1d550", "2124", "1d7d8", "1d7d9", "1d7da", "1d7db", "1d7dc", "1d7dd", "1d7de", "1d7df", "1d7e0", "1d7e1" ]
dstruck_bold_decimal_codes=dstruck_bold_hex_codes.map { |hex| hex.to_i(16) }
dstruck_bold_unicode_string=dstruck_bold_decimal_codes.pack("U*")

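# Convert ASCII letters and digits to the given Unicode alphabet.
# unicode_string is laid out as a-z, then A-Z, then (optionally) 0-9.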
def convert_unicode(str, unicode_string)
  converted_str=str.chars.map do |char|
    index=-1
    if char.ord>='a'.ord && char.ord<='z'.ord
      index=char.ord-'a'.ord
    elsif char.ord>='A'.ord && char.ord<='Z'.ord
      index=char.ord-'A'.ord+26
    elsif char.ord>='0'.ord && char.ord<='9'.ord
      index=char.ord-'0'.ord+26*2
    end
    if index>=0 && index<unicode_string.length
      unicode_string[index]
    else
      char
    end
  end.join
  converted_str
end

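# Split a string around its first LaTeX command.
# Returns [text before the command, [command, {arg1}, {arg2}, ...], remaining text].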
def parse_latex(str)
  ret=Array.new(3)
  ret[0]=""
  ret[1]=Array.new
  ret[2]=""
  if match=/^(.*?)(\\[a-zA-Z]+)(.*)$/.match(str)
    ret[0]=match[1]
    ret[1].push(match[2])
    str=match[3]
  else
    ret[0]=str
    str=''
  end
  while str.length>0 && match=/^(\s*)(\{.*)$/.match(str)
    str=match[2]
    c=0
    n=str.length
    i=0
    for i in 0..(n-1) do
      if str[i]=='{'
        c=c+1
      elsif str[i]=='}'
        c=c-1
      end
      if c==0
        break
      end
    end
    ret[1].push(str[0..i])
    if match=/^(#{Regexp.escape(str[0..i])})(.*)$/.match(str)
      str=match[2]
    end
  end
  ret[2]=str
  ret
end

# Lines that start with any of these commands are deleted entirely
purgeline=[
  '\documentclass', '\usepackage', '\begin{document}', '\end{document}',
  '\begin{thebibliography}', '\end{thebibliography}',
  '\begin{center}', '\end{center}',
  '\begin{figure}', '\end{figure}',
  '\begin{table}', '\end{table}',
  '\begin{enumerate}', '\end{enumerate}',
  '\begin{itemize}', '\end{itemize}',
  '\end{abstract}', '\end{theorem}', '\end{algorithm}',
  '\includegraphics', '\setlength',
  '\newcommand', '\renewcommand*', '\renewcommand',
  '\newtheorem',
  '\date'
]

# Literal command replacements
cmd={
  '\begin{pmatrix}'=>'[', '\end{pmatrix}'=>']',
  "\\'e"=>'é', "\\'a"=>'á', '\`e'=>'è', '\`a'=>'à', '\"e'=>'ë', '\"a'=>'ä',
  '\@'=>'', '\,'=>' ', '---'=>'ー', '--'=>'-', '~'=>' ',
  '\{'=>'{', '\}'=>'}'
}

# Replacements for commands of the form backslash + letters
cmdalphabet={'\alpha'=>'α', '\beta'=>'β', '\gamma'=>'γ', '\delta'=>'δ', '\epsilon'=>'ϵ', '\zeta'=>'ζ', '\eta'=>'η', '\theta'=>'θ', '\iota'=>'ι', '\kappa'=>'κ', '\lambda'=>'λ', '\mu'=>'μ', '\nu'=>'ν', '\xi'=>'ξ', '\pi'=>'π', '\rho'=>'ρ', '\sigma'=>'σ', '\tau'=>'τ', '\upsilon'=>'υ', '\phi'=>'ϕ', '\chi'=>'χ', '\psi'=>'ψ', '\omega'=>'ω', '\varepsilon'=>'ε', '\vartheta'=>'ϑ', '\varrho'=>'ϱ', '\varsigma'=>'ς', '\varphi'=>'φ', '\Gamma'=>'Γ', '\Delta'=>'Δ', '\Theta'=>'Θ', '\Lambda'=>'Λ', '\Xi'=>'Ξ', '\Pi'=>'Π', '\Sigma'=>'Σ', '\Upsilon'=>'Υ', '\Phi'=>'Φ', '\Psi'=>'Ψ', '\Omega'=>'Ω', '\hbar'=>'ℏ', '\partial'=>'∂', '\infty'=>'∞', '\in'=>'∊', '\ni'=>'∍', '\simeq'=>'≃', '\pm'=>'±', '\times'=>'×', '\sum'=>'Σ', '\prod'=>'Π', '\int'=>'∫', '\sqrt'=>'√', '\top'=>'T', '\mapsto'=>'↦', '\to'=>'→', '\rightarrow'=>'→', '\Rightarrow'=>'⇒', '\leftarrow'=>'←', '\Leftarrow'=>'⇐', '\left'=>'', '\right'=>'', '\le'=>'≤', '\ge'=>'≥', '\quad'=>' ', '\dots'=>'…', '\ldots'=>'…', '\cdot'=>'・', '\cdots'=>'…', '\vdots'=>'⋮', '\ddots'=>'⋱', '\maketitle'=>'', '\par'=>'', '\centering'=>'', '\upshape'=>'', '\item'=>"\n"+'・' }

# Replace \cmd{*} with * rendered in Unicode characters
fonts1={
  '\bm'=>serif_bolditalic_unicode_string,
  '\mathbb'=>dstruck_bold_unicode_string,
  '\mathcal'=>script_normal_unicode_string
}

# Replace {\cmd *} with * rendered in Unicode characters
fonts2={
  '\bfseries'=>serif_bold_unicode_string
}

# Replace \cmd{*} with *
plainitems=[ '\text' ]

# Replace $*$ with "*"
dquoteitems={ '$'=>'$', '$$'=>'$$', '\('=>'\)' }

# Replace \cmd{*} with [*]
sblacketitems=[ '\bibitem' ]

# Replace a line that starts with one of these commands by a bold heading
boldline={
  '\begin{abstract}'=>'Abstract', '\begin{theorem}'=>'Theorem', '\begin{algorithm}'=>'Algorithm'
}

# Replace a command with "BoldName: arg1 (arg2) ..."
bolditems={
  '\acknowledgments'=>'Acknowledgments',
  '\references'=>'References',
  '\email'=>'E-mail',
  '\group'=>'Activity Group',
  '\affiliation'=>'Affiliation',
  '\author'=>'Author',
  '\authorinfo'=>'Author',
  '\abstract'=>'Abstract',
  '\keywords'=>'Keywords',
  '\title'=>'Title',
  '\chapter'=>'Chapter',
  '\section'=>'Section',
  '\subsection'=>'Subsection',
  '\subsubsection'=>'Subsubsection',
  '\caption'=>'Caption',
}

# Read from standard input and process line by line
eqnumber=0
labels=Hash.new
lines=STDIN.readlines
n=lines.length
mode=0
for i in 0..(n-1) do
  # Strip the trailing newline
  lines[i].chomp!
  # Delete lines that start with an unneeded command
  purgeline.each do |a|
    if /^#{Regexp.escape(a)}.*$/.match(lines[i])
      lines[i]=''
    end
  end
  
  
  # Turn a line that starts with a listed command into a bold heading
  boldline.keys.each do |a|
    if /^#{Regexp.escape(a)}.*$/.match(lines[i])
      lines[i]='['+convert_unicode(boldline[a], serif_bold_unicode_string)+']'+"\n"
    end
  end
 
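  # Strip everything from % to the end of the line (LaTeX comments)
  lines[i].gsub!(/%.*$/,'')
  # Remove ~\\[*]
  lines[i].gsub!(/~\\\\\[.+\]/,'')
  # Remove ~\\
  lines[i].gsub!(/~\\\\/,'')
  # Remove \\[*]
  lines[i].gsub!(/\\\\\[.+\]/,'')
  # Replace \\ with ;
  lines[i].gsub!(/\\\\/,';')
  # Replace commands of the form backslash + letters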
  cmdalphabet.keys.each do |key|
    lines[i].gsub!(/#{Regexp.escape(key)} /,cmdalphabet[key])
    lines[i].gsub!(/#{Regexp.escape(key)}(?![a-zA-Z])/,cmdalphabet[key])
  end
  # Apply the literal command replacements
  cmd.keys.each do |key|
    lines[i].gsub!(/#{Regexp.escape(key)}/,cmd[key])
  end
  # Replace \eqref{*} with (number)
  while match=/\\eqref\{([^\}]*)\}/.match(lines[i])
    if !labels.key?(match[1])
      eqnumber=eqnumber+1
      labels[match[1]]=eqnumber
    end
    lines[i].sub!(match[0],"(#{labels[match[1]]})")
  end
  # Replace \ref{*} with its number (no parentheses)
  while match=/\\ref\{([^\}]*)\}/.match(lines[i])
    if !labels.key?(match[1])
      eqnumber=eqnumber+1
      labels[match[1]]=eqnumber
    end
    lines[i].sub!(match[0],"#{labels[match[1]]}")
  end
  # Register \label{*} and remove it (the commented-out line below would emit "………(number)" instead)
  while match=/\\label\{([^\}]*)\}/.match(lines[i])
    if !labels.key?(match[1])
      eqnumber=eqnumber+1
      labels[match[1]]=eqnumber
    end
#    lines[i].sub!(match[0],"………(#{labels[match[1]]})")
    lines[i].sub!(match[0],'')
  end
  # Replace \cite{*} with [*], and \cite[note]{*} with [* (note)]
  lines[i].gsub!(/\\cite\{([^\}]*)\}/,'[\1]')
  lines[i].gsub!(/\\cite\[([^\]]*)\]\{([^\}]*)\}/,'[\2 (\1)]')
  # Replace \cmd{*} with *
  plainitems.each do |cmd|
    lines[i].gsub!(/#{Regexp.escape(cmd)}\{([^\}]*)\}/,'\1')
  end
  # Replace \cmd{*} with [*]
  sblacketitems.each do |cmd|
    lines[i].gsub!(/#{Regexp.escape(cmd)}\{([^\}]*)\}/,'[\1]')
  end
  # Replace $*$ with "*"
  dquoteitems.keys.each do |cmd|
    lines[i].gsub!(/#{Regexp.escape(cmd)}([^\$]*)#{Regexp.escape(dquoteitems[cmd])}/,'"\1"')
  end
  # Replace \cmd{*} with * rendered in Unicode characters
  fonts1.keys.each do |cmd|
    while match=lines[i].match(/#{Regexp.escape(cmd)}\{([^\{\}]*)\}/)
      a=convert_unicode(match[1], fonts1[cmd])
      lines[i].sub!(match[0],a)
    end
  end
  # Replace {\cmd *} with * rendered in Unicode characters
  fonts2.keys.each do |cmd|
    while match=lines[i].match(/\{#{Regexp.escape(cmd)}(\s*)([^\{\}]*)\}/)
      a=convert_unicode(match[2], fonts2[cmd])
      lines[i].sub!(match[0],a)
    end
  end
  # Look for \cmd{*}{*}...
  str=lines[i]
  lines[i]=''
  while str.length>0
    # Split at the next command
    a=parse_latex(str)
    # Emit the text before the command
    lines[i]=lines[i]+a[0]
    # Replace the command with "BoldName: arg1 (arg2) ..."
    if a[1].length>0 && bolditems.keys.include?(a[1][0])
      b=convert_unicode(bolditems[a[1][0]], serif_bold_unicode_string)+':'
      if a[1].length>=2
        b=b+' '+a[1][1][1..-2]
      end
      if a[1].length>=3
        for j in 2..(a[1].length-1)
          b=b+' ('+a[1][j][1..-2]+')'
        end
      end
      str=b+a[2]
    # Replace \frac{p}{q} with (p/q), parenthesizing p or q unless it is a single character or alphanumeric token
    elsif a[1].length>0 && (a[1][0]=='\frac' || a[1][0]=='\dfrac')
      b='('+a[1][1..-1].map do |c|
        c=c[1..-2]
        if /^([0-9a-zA-Z]+)$/.match(c) || c.length==1
          c
        else
          '('+c+')'
        end
      end.join('/')+')'
      str=b+a[2]
    else
      # Unsupported command: emit it unchanged
      lines[i]=lines[i]+a[1].join
      str=a[2]
    end
  end
  # Handle \begin{align}***\end{align}
  if mode==0 && /^\\begin\{align.*$/=~lines[i]
      mode=1
      lines[i].gsub!(/^\\begin\{align.*$/,'[BEGIN]')
  end
  if mode==1 && /^\\end\{align.*$/=~lines[i]
      mode=0
      lines[i].gsub!(/^\\end\{align.*$/,'[END]')
  elsif mode==1 && (lines[i]=='' || /^\s*$/=~lines[i])
    lines[i]='[DEL]'
  elsif mode==1 && lines[i]!='[DEL]' && lines[i]!='[BEGIN]' && lines[i]!='[END]'
    lines[i].gsub!('&','')
    lines[i]='"'+lines[i]+'"'
  end
#  puts(lines[i])
end

# Drop the marker lines
lines.delete("[DEL]")
lines.delete("[BEGIN]")
lines.delete("[END]")

# Join each paragraph into a single line
output=Array.new
n=lines.length
mode=0
for i in 0..(n-1) do
  if mode==0 && lines[i]==""
    #output.append(lines[i])
    mode=0
  elsif mode==0 && lines[i]!=""
    output.append(lines[i])
    mode=1
  elsif mode==1 && lines[i]==""
    output.append(lines[i])
    mode=0
  elsif mode==1 && lines[i]!=""
    if output[-1][-1]=='"' && lines[i][0]=='"'
      output[-1]=output[-1].chop+" "+lines[i][1..-1]
    else
      output[-1]=output[-1]+" "+lines[i]
    end
    mode=1
  else
    #output.append(lines[i])
    #output[-1]=output[-1]+" "+lines[i]
  end
end

# Output
m=output.length
for i in 0..(m-1) do
  puts(output[i])
end

#EOF
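
The font tables at the top of the script list every code point explicitly. For Serif Bold the code points happen to be contiguous, so the same alphabet can also be built from ranges. The sketch below shows this together with a lookup that follows the a-z, A-Z, 0-9 layout used by convert_unicode; the names `ASCII_ALPHABET` and `to_math_alphabet` are hypothetical and not part of the script.

```ruby
# Build the Serif Bold alphabet (a-z, A-Z, 0-9) from contiguous code point ranges.
bold = [*0x1d41a..0x1d433, *0x1d400..0x1d419, *0x1d7ce..0x1d7d7].pack("U*")

# ASCII characters in the same order as the Unicode alphabet above.
ASCII_ALPHABET = [*'a'..'z', *'A'..'Z', *'0'..'9'].join

# Replace each ASCII letter or digit by the character at the same position
# in the given alphabet; every other character is passed through unchanged.
def to_math_alphabet(str, alphabet)
  str.chars.map do |ch|
    index = ASCII_ALPHABET.index(ch)
    index ? alphabet[index] : ch
  end.join
end

puts to_math_alphabet("Theorem 1", bold)
# prints: 𝐓𝐡𝐞𝐨𝐫𝐞𝐦 𝟏
```

The range shortcut does not carry over to every style. Italic h, for example, lives at U+210E rather than in the contiguous run, and several script and double-struck capitals sit in the Letterlike Symbols block, which is why the explicit tables in the script are the safer choice for those alphabets.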