

Notes on max_quoted_size_limit, introduced in Embulk 0.5.0


About the CSV parser's max_quoted_size_limit

max_quoted_size_limit, added to the CSV parser in Embulk 0.5.0

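This option applies when a column value in a row is quoted and a delimiter character (such as ,) appears inside the quotes.
It specifies how many bytes the parser reads ahead while checking whether the closing quote was forgotten.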
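
To make the idea concrete, here is a minimal Python sketch of such a check. It is not Embulk's implementation (Embulk's CSV tokenizer is written in Java); the names split_line and QuotedSizeLimitExceeded are hypothetical, escape characters and multi-line values are ignored, and counting UTF-8 bytes with a strict "greater than" comparison is an assumption rather than something confirmed from Embulk's source.

```python
class QuotedSizeLimitExceeded(Exception):
    """Hypothetical error type; the name only mirrors the idea, not Embulk's API."""


def split_line(line, delimiter=",", quote='"', max_quoted_size_limit=6):
    """Split one CSV line, giving up on quoted values larger than the limit.

    Simplified sketch: no escape characters, no doubled quotes,
    no quoted values spanning several lines.
    """
    fields = []
    pos = 0
    while True:
        if line.startswith(quote, pos):
            # Quoted value: scan forward for the closing quote, counting
            # bytes so that a forgotten closing quote cannot swallow the
            # rest of the input.
            start = end = pos + 1
            size = 0
            while end < len(line) and line[end] != quote:
                size += len(line[end].encode("utf-8"))
                if size > max_quoted_size_limit:
                    raise QuotedSizeLimitExceeded(
                        f"quoted value exceeds the limit ({max_quoted_size_limit})"
                    )
                end += 1
            if end == len(line):
                raise ValueError("closing quote not found")
            fields.append(line[start:end])
            pos = end + 1  # step over the closing quote
        else:
            # Unquoted value: runs up to the next delimiter or end of line.
            end = line.find(delimiter, pos)
            if end == -1:
                end = len(line)
            fields.append(line[pos:end])
            pos = end
        if pos >= len(line):
            return fields
        if line[pos] != delimiter:
            raise ValueError(f"expected delimiter at position {pos}")
        pos += 1  # step over the delimiter


print(split_line('a,"b,c",d'))  # ['a', 'b,c', 'd']
```

With the default limit of 6 in this sketch, a quoted value of up to 6 bytes is accepted and a 7-byte one raises QuotedSizeLimitExceeded, so a line with a missing closing quote fails quickly instead of absorbing the following columns.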

Configuration file

max_quoted_size_limit is set to 6.

in:
  type: file
  path_prefix: /path/to/test
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: ''
    header_line: false
    columns:
    - {name: c0, type: string}
    - {name: c1, type: string}
    - {name: c2, type: string}
    - {name: c3, type: string}
    - {name: c4, type: string}
    max_quoted_size_limit: 6
exec: {}
out: {type: stdout}

Test data

1,b,c,d,e
2,",123456",b,c,error line
3,",12345",b,c,safe line
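  • Line 1 is normal data (no quotes).
  • Line 2: inside the quotes, 6 bytes of data follow the , and the closing quote comes at the 7th byte after it, so the quoted value (,123456) is 7 bytes.
    • This line is expected to be an error.
    • The data is laid out this way because it makes the result easy to verify.
  • Line 3: the closing quote comes at the 6th byte after the , so the quoted value (,12345) is 6 bytes.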
Running it

The error is detected on line 2, which is skipped; lines 1 and 3 appear in the preview.

% embulk preview config.yml
2015-03-05 09:47:07.679 +0900: Embulk v0.5.0
2015-03-05 09:47:09.060 +0900 [INFO] (preview): Listing local files at directory '/path/to' filtering filename by prefix 'test'
2015-03-05 09:47:09.069 +0900 [INFO] (preview): Loading files [/path/to/test.csv]
2015-03-05 09:47:09.203 +0900 [WARN] (preview): Skipped (line 2): 2,",123456",b,c,error line
org.embulk.standards.CsvTokenizer$QuotedSizeLimitExceededException: The size of the quoted value exceeds the limit size (6)
    at org.embulk.standards.CsvTokenizer.nextColumn(CsvTokenizer.java:278)
    at org.embulk.standards.CsvParserPlugin.nextColumn(CsvParserPlugin.java:216)
    at org.embulk.standards.CsvParserPlugin.access$000(CsvParserPlugin.java:30)
    at org.embulk.standards.CsvParserPlugin$1.stringColumn(CsvParserPlugin.java:175)
    at org.embulk.spi.Column.visit(Column.java:57)
    at org.embulk.spi.Schema.visitColumns(Schema.java:48)
    at org.embulk.standards.CsvParserPlugin.run(CsvParserPlugin.java:132)
    at org.embulk.spi.FileInputRunner.run(FileInputRunner.java:145)
    at org.embulk.exec.PreviewExecutor$2$1.run(PreviewExecutor.java:106)
    at org.embulk.spi.util.Filters$RecursiveControl.transaction(Filters.java:83)
    at org.embulk.spi.util.Filters.transaction(Filters.java:36)
    at org.embulk.exec.PreviewExecutor$2.run(PreviewExecutor.java:96)
    at org.embulk.spi.FileInputRunner$RunnerControl$1$1.run(FileInputRunner.java:117)
    at org.embulk.standards.CsvParserPlugin.transaction(CsvParserPlugin.java:89)
    at org.embulk.spi.FileInputRunner$RunnerControl$1.run(FileInputRunner.java:111)
    at org.embulk.spi.util.Decoders$RecursiveControl.transaction(Decoders.java:77)
    at org.embulk.spi.util.Decoders.transaction(Decoders.java:33)
    at org.embulk.spi.FileInputRunner$RunnerControl.run(FileInputRunner.java:108)
    at org.embulk.standards.LocalFileInputPlugin.resume(LocalFileInputPlugin.java:80)
    at org.embulk.standards.LocalFileInputPlugin.transaction(LocalFileInputPlugin.java:70)
    at org.embulk.spi.FileInputRunner.transaction(FileInputRunner.java:63)
    at org.embulk.exec.PreviewExecutor.doPreview(PreviewExecutor.java:93)
    at org.embulk.exec.PreviewExecutor.access$000(PreviewExecutor.java:27)
    at org.embulk.exec.PreviewExecutor$1.run(PreviewExecutor.java:67)
    at org.embulk.exec.PreviewExecutor$1.run(PreviewExecutor.java:63)
    at org.embulk.spi.Exec.doWith(Exec.java:21)
    at org.embulk.exec.PreviewExecutor.preview(PreviewExecutor.java:63)
    at org.embulk.command.Runner.preview(Runner.java:240)
    at org.embulk.command.Runner.main(Runner.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:470)
    at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:328)
    at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:71)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:346)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:204)
    at org.jruby.ast.CallTwoArgNode.interpret(CallTwoArgNode.java:59)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
    at org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.ast.CaseNode.interpret(CaseNode.java:138)
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
    at classpath_3a_embulk.command.embulk.__file__(classpath:embulk/command/embulk.rb:43)
    at classpath_3a_embulk.command.embulk.load(classpath:embulk/command/embulk.rb)
    at org.jruby.Ruby.runScript(Ruby.java:866)
    at org.jruby.Ruby.runScript(Ruby.java:859)
    at org.jruby.Ruby.runNormally(Ruby.java:728)
    at org.jruby.Ruby.runFromMain(Ruby.java:577)
    at org.jruby.Main.doRunFromMain(Main.java:395)
    at org.jruby.Main.internalRun(Main.java:290)
    at org.jruby.Main.run(Main.java:217)
    at org.jruby.Main.main(Main.java:197)
    at org.embulk.cli.Main.main(Main.java:13)
+-----------+-----------+-----------+-----------+-----------+
| c0:string | c1:string | c2:string | c3:string | c4:string |
+-----------+-----------+-----------+-----------+-----------+
|         1 |         b |         c |         d |         e |
|         3 |    ,12345 |         b |         c | safe line |
+-----------+-----------+-----------+-----------+-----------+