Using ChatGPT to scrape a structured form.
Case 1: Form
Answer
This code creates an HTML table with several rows and columns. The table has a border of 2px solid #808080 color. There are several rows with cells that span across multiple columns or rows using the rowspan and colspan attributes. The table contains several cells with text and empty cells with no content.
The table element is used to create a table in HTML. The tr element is used to create a row in the table and the td element is used to create a cell in the table. The th element is used to create a table header cell. The style attribute is used to apply CSS styles to an element. In this case, the style attribute is used to set the border of the table to 2px solid #808080 color.
Case 2: Normalized table output (key-value)
Question
Can the html table above be normalized? All data is presented as a key-value table with key and value columns. All headers and values must be normalized. Nested headers must be concatenated with '_'.
Answer
key | value
----------------|----------------
name |
lastname_firstname | John Doe
address | somewhere in this world
email | hogehoge@hogehoge.org
phone | 012-3456-7890
id |
typeA_x | 1
typeA_y | 2
typeA_Z | 4
typeB | 3
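Evaluation
Close! It handles the parts of the form where each character sits in its own cell, but it gets the nested headers slightly wrong: the parent headers name and id end up as empty rows of their own instead of being prefixed to their children. Is there some way to force it to recognize the headers?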
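To make the rule "nested headers must be concatenated with '_'" concrete, here is a minimal Python sketch of the flattening the prompt asks for. The nested dictionary `form` is an assumption: the original HTML is not reproduced in this post, so the hierarchy is inferred from the keys and values that appear in the answers, and `flatten_headers` is a hypothetical helper name.

```python
def flatten_headers(nested, parent=""):
    """Flatten nested headers into a flat dict, joining levels with '_'."""
    flat = {}
    for header, value in nested.items():
        key = f"{parent}_{header}" if parent else header
        if isinstance(value, dict):
            # A dict means this header has child headers: recurse with the prefix.
            flat.update(flatten_headers(value, key))
        else:
            flat[key] = value
    return flat


# Assumed header hierarchy of the form (inferred, not taken from the HTML).
form = {
    "name": {"lastname": "John", "firstname": "Doe"},
    "address": "somewhere in this world",
    "email": "hogehoge@hogehoge.org",
    "phone": "012-3456-7890",
    "id": {"typeA": {"x": "1", "y": "2", "z": "4"}, "typeB": "3"},
}

flat = flatten_headers(form)
for key, value in flat.items():
    print(f"{key} | {value}")
```

Under this assumed hierarchy, the parent headers never appear as rows of their own; they only show up as prefixes such as `name_` and `id_typeA_`.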
Case 3: Manually specifying the generated keys
Question
Can this html table above be normalized ? All data is presented as a key-value table which has column name_lastname, name_firstname, address, email, phone, id_typeA_x, id_typeA_y, id_typeB, id_typeA_z.
Answer
key | value
------------------------|----------------
name_lastname | John
name_firstname | Doe
address | somewhere in this world
email | hogehoge@hogehoge.org
phone | 012-3456-7890
id_typeA_x | 1
id_typeA_y | 2
id_typeB | 3
id_typeA_z | 4
Evaluation
Specifying the generated columns manually works well. Is there some way to get a result this clean automatically?
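The question above is left open here. As a small step toward automation, whether an answer contains the intended keys can at least be checked mechanically. The following is a rough Python sketch under stated assumptions: the answer is assumed to arrive as the pipe-delimited text shown in this post, and `parse_key_value_table`, `check_keys`, and `EXPECTED` are hypothetical names introduced only for illustration.

```python
def parse_key_value_table(text):
    """Parse 'key | value' lines into a dict, skipping header and separator."""
    rows = {}
    for line in text.strip().splitlines():
        if "|" not in line:
            continue
        key, value = (part.strip() for part in line.split("|", 1))
        if key == "key" or set(key) <= {"-"}:
            continue  # header row or dashed separator row
        rows[key] = value
    return rows


def check_keys(rows, expected):
    """Return (missing, unexpected) keys relative to the expected list."""
    missing = [k for k in expected if k not in rows]
    unexpected = [k for k in rows if k not in expected]
    return missing, unexpected


# Keys listed in the Case 3 prompt.
EXPECTED = [
    "name_lastname", "name_firstname", "address", "email", "phone",
    "id_typeA_x", "id_typeA_y", "id_typeB", "id_typeA_z",
]

# The Case 2 answer, copied from this post.
case2_answer = """
key | value
----------------|----------------
name |
lastname_firstname | John Doe
address | somewhere in this world
email | hogehoge@hogehoge.org
phone | 012-3456-7890
id |
typeA_x | 1
typeA_y | 2
typeA_Z | 4
typeB | 3
"""

missing, unexpected = check_keys(parse_key_value_table(case2_answer), EXPECTED)
print("missing:", missing)
print("unexpected:", unexpected)
```

Run against the Case 2 answer, the check reports the six nested keys as missing and flags rows such as `name`, `id`, and `typeA_Z` as unexpected. A check like this could decide when to re-ask with an explicit key list, though that loop was not tried here.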