
Digging into machine learning tools, by Nikkei xTECH Business AI ③ Advent Calendar 2019

How natural is the text that the "too dangerous" AI GPT-2 can actually generate?

Last updated at 2020-03-09, posted at 2019-12-25

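Introduction

Since BERT in 2018, high-performance natural language processing models have appeared one after another. One that drew particular attention is GPT-2 (Generative Pre-trained Transformer 2), announced by OpenAI in February 2019.

GPT-2 is the successor to GPT, also developed by OpenAI: a larger language model trained on a larger web corpus to predict the next word, which makes it well suited to text generation. At the time of the announcement, because it could generate text that was considered too natural and therefore risky (fake news and the like), only the small model (which still has 124 million parameters) was released, and larger models were released in stages afterwards.

All GPT-2 models are now public, and Dockerfiles are even provided, so anyone can try it easily. So I feed it a few passages and check how natural the generated text really is.

For reference, the release history is as follows.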

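Release history

February 2019: GPT-2 announced and small model released

Parameters: 124 million

AIによる自動文章作成ツールがあまりにも高精度のテキストを簡単に作り出してしまうため開発陣から「危険過ぎる」と問題視される - GIGAZINE

May 2019: medium model released

Parameters: 355 million

August 2019: large model released

Parameters: 774 million

GPT-2: 6-Month Follow-Up
「危険すぎる」と言われたAIの自動文章作成ツール「GPT-2」のモデルが新たに公開へ - GIGAZINE

November 2019: largest model released

Parameters: 1,558 million (about 1.5 billion)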

GPT-2: 1.5B Release
悪用の懸念もあるOpenAIの文章生成言語モデル「GPT-2」最終版リリース - ZDNet Japan
「危険すぎる」と話題の文章生成AI「GPT-2」の最新版がリリース、実際に使えるデモサイトも登場 - GIGAZINE
OpenAI、まことしやかなフェイクニュースも簡単生成の言語モデル「GPT-2」の完全版を公開 - ITmedia NEWS

Experiment

Installation

I follow the instructions here:
https://github.com/openai/gpt-2/blob/master/DEVELOPERS.md

Clone the repository.

$ git clone https://github.com/openai/gpt-2.git && cd gpt-2

Build the Docker image. For the GPU version:

$ docker build --tag gpt-2 -f Dockerfile.gpu .

For the CPU version:

$ docker build --tag gpt-2 -f Dockerfile.cpu .

Run the container. The official example adds the --runtime=nvidia option, but since I have no NVIDIA GPU this time, I start the GPT-2 container with the following command.

$ docker run -it gpt-2 bash

The following steps are performed inside the container.

$ export PYTHONIOENCODING=UTF-8

This alone still produced an error, so I upgraded tensorflow.

$ pip install -U tensorflow
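On the PYTHONIOENCODING export: the article only shows that the variable was set, so the following is a minimal sketch of the likely reason, assuming that Python's standard output in the container defaults to ASCII. Prompts and generated text can contain non-ASCII characters (for example the curly quotes in the translated prompt used in Test 2), and those cannot be written with an ASCII encoding. The helper name `can_encode` is made up for illustration.

```python
# Sketch: why an ASCII-only stdout breaks on GPT-2 style text.
# Assumption: without PYTHONIOENCODING, stdout in the container uses ASCII.
text = "\u201cCan Robots Enter the University of Tokyo\u201d"  # curly quotes


def can_encode(s, encoding):
    """Return True if s can be written with the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(text, "ascii"))  # False
print(can_encode(text, "utf-8"))  # True
```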

How to run

The container ships with a script that generates text from scratch and a script that generates a continuation of arbitrary text.

The former can be run as follows.

$ python3 src/generate_unconditional_samples.py | tee /tmp/samples

The latter can be run as follows.

$ python3 src/interactive_conditional_samples.py

The available command-line arguments can be listed with:

$ python3 src/interactive_conditional_samples.py -- --help

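According to the help output, --top_k 40 appears to be the recommended setting. There is also a --top_p option. The help text gives no explanation for it and I was not sure what it does, but the name suggests nucleus sampling, which samples from the smallest set of most probable tokens whose cumulative probability reaches p.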
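To make the two sampling options more concrete, here is a toy sketch of top-k and top-p (nucleus) filtering over a next-token distribution. It illustrates the general technique only and is not taken from the GPT-2 repository; the function names, the four-word vocabulary, and the probabilities are all made up.

```python
# Toy sketch of top-k and top-p (nucleus) filtering over a next-token distribution.
# The vocabulary and probabilities are invented for illustration.


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


probs = {"snow": 0.5, "train": 0.25, "tunnel": 0.125, "cat": 0.125}
print(sorted(top_k_filter(probs, 2)))    # ['snow', 'train']
print(sorted(top_p_filter(probs, 0.7)))  # ['snow', 'train']
```

With top-k the number of candidates is fixed, while with top-p it grows or shrinks depending on how peaked the distribution is.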

The model is selected with --model_name.

Selecting a model

$ python3 src/interactive_conditional_samples.py --model_name=124M --top_k=40 # small
$ python3 src/interactive_conditional_samples.py --model_name=355M --top_k=40 # medium
$ python3 src/interactive_conditional_samples.py --model_name=774M --top_k=40 # large
$ python3 src/interactive_conditional_samples.py --model_name=1558M --top_k=40 # extra large
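When comparing all four sizes it can be convenient to build the commands programmatically. The following is a small sketch that only assembles the argument lists from the script path and flags shown above; it does not run anything, and `MODEL_NAMES` and `build_command` are names invented for this example.

```python
# Sketch: build the argument list for each released model size.
# The script path and flags are the ones shown in the commands above.
MODEL_NAMES = ["124M", "355M", "774M", "1558M"]


def build_command(model_name, top_k=40):
    """Return the argv list for one interactive run."""
    return [
        "python3",
        "src/interactive_conditional_samples.py",
        f"--model_name={model_name}",
        f"--top_k={top_k}",
    ]


for name in MODEL_NAMES:
    print(" ".join(build_command(name)))
```

Inside the container, each list could be passed to `subprocess.run` to start the corresponding session.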

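Tests

I enter a few passages and generate their continuations. The pretrained models are of course English-only, so I run each Japanese example through Google Translate, use the result as the GPT-2 input, then run the generated English through Google Translate again and check the Japanese text.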
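The round trip described above can be sketched as a small pipeline. In the article these steps were done by hand, so `translate` and `generate` below are hypothetical callables supplied by the caller, and the stand-in implementations exist only so that the sketch runs without any external service.

```python
# Sketch of the round trip used in these tests:
# Japanese -> English -> GPT-2 continuation -> Japanese.
# translate and generate are hypothetical callables; the article used
# Google Translate and the GPT-2 interactive script manually.


def round_trip(japanese_prompt, translate, generate):
    """Return (english_prompt, english_output, japanese_output)."""
    english_prompt = translate(japanese_prompt, "ja", "en")
    english_output = generate(english_prompt)
    japanese_output = translate(english_output, "en", "ja")
    return english_prompt, english_output, japanese_output


# Stand-in functions so the sketch is runnable on its own.
def fake_translate(text, source, target):
    return f"[{source}->{target}] {text}"


def fake_generate(prompt):
    return prompt + " ..."


print(round_trip("国境の長いトンネルを抜けると雪国であった。", fake_translate, fake_generate))
```

Because the text passes through machine translation twice, oddities in the final Japanese may come from either the model or the translator.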

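Note: this is not a quantitative evaluation. In place of evaluation results, please accept emoji expressing how I felt when I saw each output.

Test 1: Can it continue Yasunari Kawabata's "Snow Country"?

This is the extremely famous opening line.

Target sentence:
国境の長いトンネルを抜けると雪国であった。
After translation:
It was a snowy country after passing through a long border tunnel.

Small (124M) model

Result: 😅

Output (English)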

`It was a snowy country after passing through a long border tunnel.` "I looked at them," says Brienne. "And to say that you're not allowed to walk home from work just because you were wearing red. That's wrong. People are not going to give you anything."

When she looked at her husband he smiled. "As if that is the normal thing that we take issue with. I don't like that. I love that. I like that we're allowed here and that you can be free to do whatever you want."

She's not just proud of him, of course. She has never let the idea of a family stay with her in her life. "I would never tell my kids, 'What are you gonna do next year?'" Brienne says. "If I'd told them then, 'You're gonna have to take care of your kids.' My kids would say, 'Oh, I didn't tell you to take care of your kids.' And if I had said that at a certain time, I know I would have died as a parent. And I would've done anything for my grandchildren."

Her husband and daughter live in St. Paul with their son, Mike. They take her for walks, they see her on other occasions and are happy to do so. They see her to get along with every other family member they know. She has her fair share of disagreements with them. "I was raised by three different gay parents in Minnesota. I've been told to be more conservative by the people I interact with that I would go to school with. That I would always go to my homie school with my mom and Dad, and I was taught to be my own worst enemy."

They're so happy to work together as a team and a family.

"This year we did an interdisciplinary program together. For me, this year was a little different. I felt like an outsider who felt like I was going to be seen. I'd rather be alone with the people I have the most respect for."

Her husband says that he and her are glad they did something together. "Just having that sense of independence, because we're friends and we're very connected to one another."<|endoftext|>A man was killed at a party in eastern Quebec Friday night after shooting at a woman he was trying to arrest, while another man died on Tuesday after being pulled out of the path by an explosive.

Léon Toussaint, a 20-year-old Montreal man, was shot

Output translated back into Japanese

`国境の長いトンネルを抜けると雪国であった。` 「私はそれらを見た」とブリエンヌは言う。「そして、あなたが赤い服を着ていたからといって、仕事から家に帰ることは許されていないと言います。それは間違っています。人々はあなたに何も与えません。」

彼女が夫を見たとき、彼は微笑んだ。「あれは私たちが問題にしている通常のことです。私はそれが好きではありません。私はそれが大好きです。

彼女はもちろん彼を誇りに思っているだけではありません。彼女は、家族の考えを彼女の人生にとどめたことはありません。「私は子供たちに、「来年は何をするつもりですか？」とは決して言いません」とブリエンヌは言います。「私が彼らに言ったなら、「あなたはあなたの子供の世話をしなければならないだろう。」私の子供たちは「ああ、私はあなたの子供たちの世話をするようにあなたに言わなかった」と言うでしょう。そして、もしある時期にそれを言っていたら、私は親として死んだことを知っています。そして、孫のために何でもしました。」

彼女の夫と娘は、息子のマイクと一緒にセントポールに住んでいます。彼らは彼女を散歩に連れて行き、他の機会に彼女を見て、喜んでそうします。彼らは、彼女が知っている他のすべての家族と仲良くするために彼女を見る。彼女は彼らとかなり意見の相違がある。「ミネソタ州の3人の同性愛者の両親に育てられました。私は、学校に通う人と交流することでより保守的であると言われました。そして、私は自分自身の最悪の敵であると教えられました。」

彼らはチームと家族として一緒に仕事ができてとても幸せです。

「今年、私たちは学際的なプログラムを一緒にやりました。私にとって、今年は少し違っていました。自分が見られるように感じた部外者のように感じました。」

彼女の夫は、彼と彼女が一緒に何かをしたことがうれしいと言っています。「私たちは友人であり、お互いにとてもつながっているので、ちょうどその独立感を持っています。」火曜日に別の男が爆発物で道から引き出された後に死亡した。

20歳のモントリオール人、レオン・トゥーサンが撃たれた

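Summary of the result

Something that looks like plausible text is generated. However, neither the tunnel nor the snow country ever comes up again. In fact, the text would arguably read more naturally without the first sentence.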

彼女の夫と娘は、息子のマイクと一緒にセントポールに住んでいます。彼らは彼女を散歩に連れて行き、他の機会に彼女を見て、喜んでそうします。

The topic does seem to stay more or less consistent, but then:

火曜日に別の男が爆発物で道から引き出された後に死亡した。

＿人人人人人人人人＿
＞　Sudden death　＜
￣Y^Y^Y^Y^Y^Y^Y^Y^￣

Medium (355M) model

Result: 😮🤔😭

Output (English)

`It was a snowy country after passing through a long border tunnel.` The wind and snow had stopped flowing so the road was frozen solid by the time it arrived at the exit of the tunnel.

Luo Feng carefully walked into the tunnel and made a quick left.

The light snow on the snowy ground outside of the snow cave wasn't thick at all.

A pair of white haired people in the white snow walking snow covered carriages were very tall, and appeared to be very agile.

At the beginning, Luo Feng didn't know their identity, but after hearing Luo Qian Qian's voice, it became clear.

"We are Yan Guang Wen." Luo Qian Qian replied.

"Mn?" Luo Feng's eyes lit up. This was the woman he first came across when he stumbled upon that black cat outside of the snow cave, yet she could be seen wearing the same clothes? This sort of scene was extremely intriguing.

"Wen Wen." The woman in white was clearly the Yan Guang Wen. Just after arriving at the entrance of his cold white tent, Luo Feng turned around and said to her with a smile…. "You must know me…. Wen, what are you doing here right now?"

"Wang Guang Wen." The woman's eyes widened as she stared at him, "Mn? You didn't come over to pick me up? Are you from the same family as Yan Hong Ling or Wen Yan? I never heard from him….."

While listening to her in shock, Luo Feng walked slowly forward towards her.

"Mn. What's right with you?" The woman looked at Luo Feng with pity on her faces and smiled bitterly….. "Luo Feng is indeed a great star. But he did not bring back those people from Yan Guang Wen, they have to make sure they meet him."

Luo Feng was at a loss of words, "How can I meet someone from Yan Guang Wen?"

"Then we should go." Said the woman with a smile as she smiled at Lei Feng. Although she wasn't the most knowledgeable with the Yan family, she wasn't here to speak. "Wen Yan, I have an idea, let's take a long walk."

"Wen Yan!" She exclaimed in a light tone as she spoke. At that moment, a loud howl came from outside…..

Previous Chapter | Index | Next Chapter<|endoftext|>From Hearthstone Wiki

Cobalt Axeboom is a common shaman spell card, from the

Output translated back into Japanese

`国境の長いトンネルを抜けると雪国であった。` 風と雪が流れなくなったため、トンネルの出口に到着するまでに道路は凍りついていました。

Luo Fengは慎重にトンネルに入り、すぐに左に曲がりました。

雪の洞窟の外側の雪地の小雪は、まったく厚くありませんでした。

雪に覆われた馬車を歩く白い雪の中の一対の白い髪の人々は非常に背が高く、非常に機敏であるように見えました。

当初、羅豊は彼らの身元を知りませんでしたが、羅銭銭の声を聞いた後、それは明らかになりました。

「私たちはYan光文です。」羅銭銭は答えた。

「ん？」羅豊の目が光った。これは彼が雪の洞窟の外で黒い猫につまずいたときに彼が最初に出会った女性でしたが、彼女は同じ服を着ているのを見ることができましたか？この種のシーンは非常に興味深いものでした。

「ウェン・ウェン」白の女性は明らかにヤングァンウェンでした。冷たい白いテントの入り口に着いた直後、羅Fは振り向いて笑顔で彼女に言いました…。「あなたは私を知らなければならない…。ウェン、今ここで何をしているの？」

「王光文」彼女を見つめると、女性の目が広がりました。

ショックで彼女の話を聞きながら、羅Luはゆっくりと彼女に向かって歩きました。

「ミスター、あなたはどうした？」女性は顔に同情して羅峰を見て、微笑んだ…..「羅峰は実に偉大な星です。しかし、彼はヤングァンウェンから人々を連れ戻さなかった、彼らは彼に会わなければなりません。」

Luo Fengは「どうしてYan Guang Wenの人に会うことができますか？」

「じゃあ行きましょう。」彼女はレイ・フェンで微笑んだように笑顔で女性を言った。彼女はヤン家に最も精通していませんでしたが、話すためにここにいませんでした。「ウェン・ヤン、私は考えがあります。長い散歩をしましょう。」

「ウェン・ヤン！」彼女は話しながら軽い口調で叫んだ。その瞬間、外から大きなhowが聞こえてきた…..

前の章|インデックス|次の章<| endoftext |> Hearthstone Wikiより

Cobalt Axeboomは、一般的なシャーマンスペルカードです。

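Summary of the result

It is fine up to the tunnel, but the story seems to have turned into one about traveling by road. Still, in the early part the model correctly picks up the situation that it is snowing.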

風と雪が流れなくなったため、トンネルの出口に到着するまでに道路は凍りついていました。
Luo Fengは慎重にトンネルに入り、すぐに左に曲がりました。

雪の洞窟の外側の雪地の小雪は、まったく厚くありませんでした。

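From the middle onward, perhaps because the text derives from Chinese-language material in the corpus (I cannot tell), Google Translate does not handle it well (lol).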

「私たちはYan光文です。」羅銭銭は答えた。

And to top it off, a wiki copy-paste...

前の章|インデックス|次の章<| endoftext |> Hearthstone Wikiより

Cobalt Axeboomは、一般的なシャーマンスペルカードです。

Being able to guess the corpus from the output leaves a somewhat mixed feeling.
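The jump to unrelated text happens right after the `<|endoftext|>` marker, which separates documents in the outputs above (the same pattern appears in the small model's output, where a news item follows the marker). A minimal sketch for keeping only the first generated document is shown below; `truncate_at_end_of_text` is a helper name invented for this example, not part of the GPT-2 scripts.

```python
# Sketch: keep only the text before the first end-of-text marker.
END_OF_TEXT = "<|endoftext|>"  # marker string as it appears in the outputs above


def truncate_at_end_of_text(sample):
    """Return the part of sample that precedes the first end-of-text marker."""
    head, _, _ = sample.partition(END_OF_TEXT)
    return head.rstrip()


sample = "Previous Chapter | Index | Next Chapter<|endoftext|>From Hearthstone Wiki"
print(truncate_at_end_of_text(sample))  # Previous Chapter | Index | Next Chapter
```

Applying this before translation would drop the Hearthstone Wiki fragment, although the "Previous Chapter | Index | Next Chapter" footer would remain.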

Large (774M) model

Result: 😄🤣

Output (English)

`It was a snowy country after passing through a long border tunnel.` At the end of the tunnel, the train had stopped, and I was taken to another section of the tunnel. I was told to stand on the left side of the tunnel, and to wait for the train to come round again. I waited for five minutes, when the train arrived and the whole tunnel was flooded. The water rose high enough so that it covered everything.

The last time I saw that train there, it had left a trail of destruction behind it. One of the tracks was twisted. The train, at least two of them, were completely submerged. I could see one of these trains moving through the tunnel. To put it lightly, it was a horrific sight.

I think this train was the third most famous disaster to happen in a rail tunnel in Europe in the last decade. The accident that took place in Tijuana just over two years ago and the flooding that caused the collapse of a tunnel in Germany are the two other big ones.

All I want now is to be able to go back and visit the last bridge. It was a beautiful sight. It was the bridge that had opened up the entire country, the northern border. It was also the place I decided to live, the house I lived in for many years. If I could go back in time and go into the house that I lived in before I got my diploma, I would like to see the inside of the tunnel.

You'd never believe how far it can be.

I've visited several of the bridges across that vast expanse of the sea. It's amazing how much a bridge can extend across. It's no wonder that they had to build bridges.

Some of the most spectacular and amazing bridges are the ones that cross the oceans. Some of the ships that sailed over the water would be like living in a fairy tale.

There's one of the oldest surviving bridges of the World in the Caribbean, the Panama Canal.

Here's the bridge over the Bosphorus.

As you're doing this, you're looking at the historic bridge of Alexandria, Egypt. If you walk down there and just look straight down, you will see the entire city with little bits and bobs on it. This was not just a bridge at all. This was a city built out of the remains that had been carved out of the ground during the construction of the canal.

Many people walk along the historic bridge, but for us, we're here to see the sea. Not

Output translated back into Japanese

`国境の長いトンネルを抜けると雪国であった。`

トンネルの終わりに、列車が停止し、トンネルの別のセクションに連れて行かれました。トンネルの左側に立ち、列車が再びくるのを待つように言われました。電車が到着し、トンネル全体が浸水したとき、私は5分間待ちました。水はすべてを覆うように十分に高くなった。

その列車を最後に見たとき、その後ろに破壊の跡が残っていました。トラックの1つがねじれていました。列車は、少なくとも2人が完全に水没しました。これらの列車の1つがトンネル内を移動しているのが見えました。軽く言えば、それは恐ろしい光景でした。

この列車は、過去10年間にヨーロッパの鉄道トンネルで発生した3番目に有名な災害だったと思います。ティファナで2年ほど前に起こった事故と、ドイツでトンネルの崩壊を引き起こした洪水は、他の2つの大きな事故です。

私が今望んでいるのは、戻って最後の橋を訪れることです。それは美しい光景でした。国全体、北の国境を開いたのは橋でした。また、私が住むことにした場所、私が長年住んでいた家でもありました。時間をさかのぼって卒業証書を取得する前に住んでいた家に行くことができたら、トンネルの内部を見たいです。

あなたはそれがどれほど遠くにあると信じることは決してないでしょう。

その広大な海を渡るいくつかの橋を訪れました。橋がどれだけ広がっているかは驚くべきことです。彼らが橋を架けなければならなかったのも不思議ではない。

最も壮観で素晴らしい橋のいくつかは、海を渡る橋です。水上を航行した船の中には、おとぎ話に住んでいるようなものがあります。

カリブ海には、パナマ運河という世界最古の橋の1つがあります。

ボスポラス海峡に架かる橋です。

これを行っている間、エジプトのアレクサンドリアの歴史的な橋を見ています。そこを歩いて真っ直ぐ見下ろすと、街全体が少しずつ見えます。これは単なる橋ではありませんでした。これは、運河の建設中に地面から切り出された遺跡で建てられた都市でした。

多くの人が歴史的な橋を歩いていますが、私たちは海を見るためにここにいます。ない

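Summary of the result

Remarkably, the generated text matches the situation of a train passing through a tunnel!

トンネルの終わりに、列車が停止し、トンネルの別のセクションに連れて行かれました。

This looks quite promising. Or so I thought, until: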

トンネルの左側に立ち、列車が再びくるのを待つように言われました。電車が到着し、トンネル全体が浸水したとき、私は5分間待ちました。水はすべてを覆うように十分に高くなった。

その列車を最後に見たとき、その後ろに破壊の跡が残っていました。トラックの1つがねじれていました。列車は、少なくとも2人が完全に水没しました。これらの列車の1つがトンネル内を移動しているのが見えました。軽く言えば、それは恐ろしい光景でした。

It ran into an unexpected accident (lol).
Since the output depends heavily on the corpus, I think it is unavoidable that it drifts toward news-like or Wikipedia-like content.

Extra-large (1558M) model

Result: 😞

Output (English)

`It was a snowy country after passing through a long border tunnel.` "Let's split up. The two of us are still going to fight alone."

"I don't know what you said."

Ares was surprised because of that phrase.

"What the hell are you talking about?"

"I'm gonna be the one to start, I'm the one to finish, you're gonna fall in the end."

"Is that for real?"

"That's what I just said."

In the end, just as he thought.

Even though she seemed to think that the idea of starting with the enemy, defending from the front and finishing with the enemy was a good idea, he felt that this plan would not work.

"Okay. You start."

"What? It's too soon already! I don't have any experience yet!"

But Ares was not convinced.

"That's why, as I said last time, you and me will fight together for a longer time. That way you will learn."

"I know… Then when you're done?"

"It will probably seem like you've been taking my words for granted, but that's what I mean. This is just where you're at now. I can't promise you that you'll be able to win in the end. To begin with, we've been fighting for over a year already."

Saying that, he showed his hand.

"There's something I found in there."

While Ares didn't think it was worth it, she accepted it without hesitation.

"Something you wanted to keep secret from me?"

"Yes. So let's go in."

The two walked on the snowy ground.

"I'm sorry, Ares. Are you really okay?"

"Of course. I have a bit of experience already."

"I see."

Their destination was a huge rock formation in the middle of the cliff.

The two entered the cave and found an opening like a tunnel.

"It's so dark…"

"What do you mean?"

"The entrance is really dark."

"But you can see the cave's end."

"That's because the cave is so deep. Also, you don't have a torch."

"I don't, but there isn't a problem."

"I think you'll be able to see through the darkness

Output translated back into Japanese

`国境の長いトンネルを抜けると雪国であった。` 「分割してみましょう。私たち二人はまだ一人で戦うつもりです。」

「あなたの言ったことはわかりません。」

アレスはそのフレーズに驚いた。

「一体何を言っているの？」

「私が開始する人になり、終了する人になります。あなたは最後に落ちます。」

「それは本当ですか？」

「それは私がちょうど言ったことです。」

最後に、彼が思った通りに。

彼女は、敵から始めて、正面から防御し、敵で終わるというアイデアは良いアイデアであると考えているように見えたが、彼はこの計画がうまくいかないと感じた。

「わかりました。始めます。」

「なに？もうすぐだよ！まだ経験がない！」

しかし、アレスは確信していませんでした。

「だから、私が前回言ったように、あなたと私はより長い時間一緒に戦う。そうすればあなたは学ぶだろう。」

「わかってる…それで、終わったら？」

「おそらくあなたは私の言葉を当たり前のように思っているように思えますが、それは私が言っていることです。これはまさにあなたが今いる場所です。私はあなたが最終的に勝つことができると約束することはできません。そもそも、私たちはすでに1年以上も戦ってきました。」

そう言って、彼は手を見せた。

「そこに私が見つけたものがあります。」

アレスはそれが価値があるとは思わなかったが、ためらうことなく受け入れた。

「あなたが私から秘密にしたいことは何か？」

「はい。だから入ろう。」

二人は雪の地面を歩いた。

「すみません、アレス。本当に大丈夫ですか？」

「もちろん。もう少し経験があります。」

"そうですか。"

彼らの目的地は、崖の真ん中にある巨大な岩の形成でした。

二人は洞窟に入り、トンネルのような開口部を見つけました。

「とても暗い…」

"どういう意味ですか？"

「入り口は本当に暗い。」

「しかし、あなたは洞窟の終わりを見ることができます。」

「それは洞窟がとても深いからです。また、トーチを持っていません。」

「私はしませんが、問題はありません。」

「あなたは暗闇の中を見ることができると思う

Summary of the result

For some reason it turned into a fantasy-style conversation.

「分割してみましょう。私たち二人はまだ一人で戦うつもりです。」
「あなたの言ったことはわかりません。」

The words "snow" and "tunnel" do appear at key points, but...

二人は雪の地面を歩いた。
二人は洞窟に入り、トンネルのような開口部を見つけました。

The result was not as good as I had expected.

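Test 2: Can it generate fake news?

Given the results of Test 1, I began to feel that GPT-2 may not be suited to generating fiction in the first place. It is also possible that the input sentence was too short, which led to off-target text.

So next I try fake news generation, which GPT-2 seems likely to be good at.
Since the opportunity is here, I generate a continuation from the opening of the Nikkei xTECH article 「グーグルの最新AI「BERT」のインパクト、ついに読解力も人間超え | 日経 xTECH（クロステック）」.

Target passage: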
AIに文章読解は不可能――。数年前までこれが常識だった。日本の国立情報学研究所（NII）が2011年に始めた「ロボットは東大に入れるか（東ロボ）」プロジェクトでも、AIは大学入試センター試験の英語試験で長文読解問題や単語の並べ替え問題に全く歯が立たなかった。東ロボのAIが2016年にセンター試験の英語問題を解いた際の偏差値は「45.1」。東大合格はとうてい不可能なレベルだった。ところがグーグルが2018年10月に発表した新技術「BERT」で常識が一変した。BERTが文章読解問題を人間より高い正答率で解いたからだ。
After translation:
AI cannot read texts. This was common sense a few years ago. Even in the “Can Robots Enter the University of Tokyo (Tokyo Robo)” project, launched in 2011 by the National Institute of Informatics (NII) in Japan, AI also used long-term reading and word sorting problems in the English test of the University Entrance Examination Center Test. Had no teeth at all. The deviation value when the East Robot AI solved the English test for the center test in 2016 was "45.1". Passing the University of Tokyo was an impossible level. However, the new technology "BERT" announced by Google in October 2018 changed the common sense. Because BERT solved the sentence reading problem with a higher correct answer rate than humans.

Let's see what happens!

Small (124M) model

Result: 🤔

Output (English)

The paper by Daxhayu and fellow researcher Shihiro Yamamoto at NII, titled "A new neural network for text reading and word sorting at the University," was presented at Computational Brain and Language (CBL) 2017 in Beijing, China. In it, they present the computational algorithm that can translate and write text into a language that is not written in a language. The paper is a follow-up to the first paper (2016) led by Nakadou Sotoki, director of the Center for Computer Science and Engineering and Director of Computer Science and Technology at the NII.

The two-part program has been organized for this article. The first part consists of a series of papers summarizing some of the main developments in computational literature and of various results reported in these papers.

It is noteworthy that the papers published in these publications are very similar to the most recent work of Takuya Takashima, head of the NII Institute of Informatics, in March last year. In order to clarify this, the NII started working on an algorithm (also known as DeepMind) that can translate and write text into a language of English that is not written in English, but rather translated into other languages through neural network. Based on this work it was found the NII could have a number of new developments for reading and Word reading at the University of Tokyo in 2020.

Here, the authors present their results for the next three years in the open edition of Computational Brain and Language. The results, which are expected to remain open for several more years, will have important implications for the future of computing in general and for the field of computer science from a computational perspective.

Explore further: Computer-science students use novel algorithm to read, understand text

More information: Takuya Takashima et al. The study of machine learning in computational linguistics and computer science in

Output translated back into Japanese

NIIのDaxhayuと仲間の研究者である山本志裕による「大学におけるテキスト読み取りと単語分類のための新しいニューラルネットワーク」という論文が、中国北京で開催されたComputational Brain and Language（CBL）2017で発表されました。その中で、彼らは言語で書かれていない言語にテキストを翻訳して書くことができる計算アルゴリズムを提示します。この論文は、NIIのコンピューター科学工学センターのディレクターであり、コンピューター科学技術のディレクターである中堂外時が率いる最初の論文（2016年）のフォローアップです。

この記事では、2部構成のプログラムが編成されています。最初の部分は、計算文献の主要な開発のいくつかを要約した一連の論文と、これらの論文で報告されたさまざまな結果で構成されています。

これらの出版物で発表された論文は、昨年3月にNII情報学研究所の所長である高島拓也の最新の研究と非常に類似していることは注目に値します。これを明確にするために、NIIは、英語で書かれていないが、ニューラルネットワークを介して他の言語に翻訳される英語の言語にテキストを翻訳および書き込むことができるアルゴリズム（DeepMindとも呼ばれる）の作業を開始しました。この研究に基づいて、NIIは2020年に東京大学で読書と単語読書のための多くの新しい開発を行うことができることがわかった。

ここで、著者は、Computational Brain and Languageの公開版で今後3年間の結果を発表します。この結果は、今後数年間はオープンのままであると予想されますが、コンピューティングの一般的な将来およびコンピューターの観点からコンピューターサイエンスの分野に重要な意味を持ちます。

さらに探索する：コンピューターサイエンスの学生は、新しいアルゴリズムを使用してテキストを読み、理解します

詳細：高島拓也他計算言語学とコンピューターサイエンスにおける機械学習の研究

Summary of the result

Overall the topic is coherent. However, it has little connection with the source passage. I expected the word "BERT" to be picked up, but it never appears.

これらの出版物で発表された論文は、昨年3月にNII情報学研究所の所長である高島拓也の最新の研究と非常に類似していることは注目に値します。

This time it seems to be drawing on Japan-related text in the corpus. Perhaps the word "Japan" in the source passage had an influence.

Medium (355M) model

Result: 😉😮

Output (English)

A number of papers have been published on how "bognitive" AI systems are able to improve the quality of the literature. One of these papers, "Intelligent Knowledge Machines", by the team of Peter Fodor, Andrew Ng, Paul Davies and Andrew Ng, shows how to use a BERT system to analyze the literature for relevant text in an interactive form. "Briefly", the system searches through a selection of documents and generates an overview of the document based on a short list of attributes of each author. Then the system creates a summary for the author using several algorithms, such as word frequency or word depth, based on a textual representation derived from the given documents. By simply looking at a summary, human readers can evaluate the content accurately through an artificial intelligence approach. However, the authors explain that "The system generates a detailed abstract of the document based on what can be read by users." In other words "Bert provides an intuitive, and easily understandable way to read text."

The BERT system used by the authors works similarly to how most AI systems are made today. The BERTs take the short input/short output and convert it into a "language representation" using "grammatical constraints" (the word limits) in English using English's grammar criteria. This is done using a computer (or a computer system) to create what the computer calls a "language model". This is an abstract representation that is used to form the text in a given document. The document is then loaded into the machine that then translates and understands the "language model" into English using English grammar criteria. Then there is a translation between English and BERT. This translation will be in English, meaning that the machine understands the English language based on English grammatical properties as written in English. With a BERT system, the authors show that any text that has grammatical limits is very difficult to parse and is almost never read by anyone, much less the humans. This is very similar to how a human can be a fool in a language that they have never learned. The human could be capable of speaking English like a very advanced chess player, and not be a fool, with no understanding of what the humans are reading.

So what is the use of having a BERT system for answering Japanese questions, which are supposed to make the most sense or to

Output translated back into Japanese

「ボグニティブ」AIシステムがどのように文献の品質を改善できるかについて、多くの論文が発表されています。これらの論文の1つである、Peter Fodor、Andrew Ng、Paul Davies、およびAndrew Ngのチームによる「Intelligent Knowledge Machines」は、BERTシステムを使用して、インタラクティブな形式で関連テキストの文献を分析する方法を示しています。「ブリーフフライ」、システムは選択されたドキュメントを検索し、各著者の属性の短いリストに基づいてドキュメントの概要を生成します。次に、システムは、指定されたドキュメントから派生したテキスト表現に基づいて、単語の頻度や単語の深さなどのいくつかのアルゴリズムを使用して、作成者の要約を作成します。要約を見るだけで、人間の読者は人工知能アプローチを通じてコンテンツを正確に評価できます。しかし、著者は「システムはユーザーが読むことができるものに基づいて文書の詳細な要約を生成する」と説明しています。言い換えると、「Bertは直感的で簡単に理解できる方法でテキストを読むことができます。」

著者が使用するBERTシステムは、今日のほとんどのAIシステムの製造方法と同様に機能します。 BERTは短い入力/短い出力を受け取り、英語の文法基準を使用して英語の「文法上の制約」（単語の制限）を使用して「言語表現」に変換します。これはコンピューター（またはコンピューターシステム）を使用して行われ、コンピューターが「言語モデル」と呼ぶものを作成します。これは、特定のドキュメントでテキストを形成するために使用される抽象的な表現です。次に、ドキュメントはマシンにロードされ、マシンは英語の文法基準を使用して「言語モデル」を英語に翻訳して理解します。次に、英語とBERTの間の翻訳があります。この翻訳は英語で行われます。つまり、機械は英語で書かれた英語の文法特性に基づいて英語を理解します。著者は、BERTシステムを使用して、文法的な制限があるテキストは解析するのが非常に難しく、ほとんど人間には読まれないことを示しています。これは、人間が決して学んだことのない言語で愚か者になり得ることに非常に似ています。人間は、非常に高度なチェスプレーヤーのように英語を話すことができ、馬鹿ではなく、人間が何を読んでいるかを理解できません。

だから、最も理にかなっているか、またはすることが想定されている日本の質問に答えるためにBERTシステムを持つことの使用は何ですか

Summary of the result

Compared with the small model, the text is considerably more natural. It properly picks up on the word "BERT".

人間の最大精度は、英語テストで90.7％でした。このように、BERTは、人間が質問に答えることに間違いがないことを証明しました。

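It also coins the word "bognitive" (ボグニティブ). I had assumed the model learns purely at the word level, which would make a coined word surprising. GPT-2 actually works on subword units (byte-pair encoding), so it can assemble words that never appear in its training data, which I find very interesting. Then again, the corpus may simply have contained a typo...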

「ボグニティブ」AIシステムがどのように文献の品質を改善できるかについて、多くの論文が発表されています。
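How a subword model can produce a word like "bognitive" can be illustrated with a toy segmentation. The sketch below uses greedy longest-match against a tiny vocabulary; both are simplifying assumptions, since `TOY_VOCAB` is hypothetical and real byte-pair encoding applies learned merge rules over a vocabulary of tens of thousands of pieces.

```python
# Toy sketch of subword segmentation: greedy longest-match against a small vocabulary.
# TOY_VOCAB is hypothetical and far smaller than GPT-2's real byte-pair vocabulary,
# and real BPE applies learned merges rather than longest-match lookup.
TOY_VOCAB = {"b", "c", "og", "cog", "nit", "ive", "o", "g", "n", "i", "t", "v", "e"}


def segment(word, vocab):
    """Split word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {start}")
    return pieces


print(segment("cognitive", TOY_VOCAB))  # ['cog', 'nit', 'ive']
print(segment("bognitive", TOY_VOCAB))  # ['b', 'og', 'nit', 'ive']
```

Because the model emits pieces rather than whole words, nothing prevents it from chaining pieces into a string that is not a dictionary word.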

Large (774M) model

Result: 🤔

Output (English)

On the other hand, BERT was tested against human students in Japan. It was the biggest problem to solve. The number of words in 1.0% and 0.5% problem is quite similar to the number of words in the SAT.

According to the University of Tokyo, the AI has learned to predict the scores of the Japanese university entrance exams correctly and the scores of the American SAT exam correctly. It means that by 2018 they should have learned their way to the new level and they should be able to score higher than human students.

In July, BERT was also shown in Singapore. It used natural language processing (NLP). It made predictions only after reading 2 billion webpages that have been categorized by keywords, and it used its understanding abilities to guess at the word meanings and meanings of other words in the web.

Here follows a list of the top BERT predictions.

1, 0.9% in the English exam:

"If you eat too large of a pizza you risk falling down the stairs" (4 million words)

"For every month my sister has to go back to school I've got less time than a month ago" (4 million words)

"It's impossible to read this book that does not have many numbers in it" (4 million words)

"A little bit of extra blood on the leg will make you feel sick" (4 million words)

2, 0.9% in the English exam:

"When you say "I love you" to yourself it's like calling the weather" [4 million words]

"You know the difference between the first and last person in a list" [2 million words]

"The person whom you choose when you enter a job interview is not the only one" [4 million words]

3.2, 0.1% in the test:

"I'd like you to tell me about your past life" [4 million words]

"I'm not a robot you know" [2 million words]

"If our eyes were closed we could

Output translated back into Japanese

一方、BERTは日本の学生に対してテストされました。それが解決すべき最大の問題でした。 1.0％と0.5％の問題の単語数は、SATの単語数と非常に似ています。

東京大学によると、AIは日本の大学入試のスコアとアメリカのSAT試験のスコアを正しく予測することを学んだ。つまり、2018年までに、彼らは新しいレベルへの道を学び、人間の学生よりも高い得点を得ることができるはずです。

7月には、BERTもシンガポールで公開されました。自然言語処理（NLP）を使用しました。キーワードで分類された20億のWebページを読んだ後にのみ予測を行い、その理解能力を使用して単語の意味とWeb内の他の単語の意味を推測しました。

上位BERT予測のリストを次に示します。

1、英語の試験で0.9％：

「ピザを食べすぎると、階段から落ちる危険性があります」（400万語）

「毎月、妹は学校に行かなければならないので、1か月前よりも時間がありません」（400万語）

「数字が多くないこの本を読むことは不可能です」（400万語）

「足に少し余分な血が出ると気分が悪くなります」（400万語）

2、英語試験の0.9％：

「自分に「私はあなたを愛しています」と言うとき、それは天気を呼ぶようなものです」[400万語]

「リストの最初と最後の人の違いを知っている」[200万語]

「就職の面接に参加するときに選択する人だけが一人ではない」[400万語]

3.2、テストでは0.1％：

「あなたの過去の生活について教えてほしい」[400万語]

「私はあなたが知っているロボットではない」[200万語]

「目を閉じれば

Summary of the result

In addition to "BERT", words such as "students" and "test" appear, but overall the result is underwhelming.

一方、BERTは日本の学生に対してテストされました。それが解決すべき最大の問題でした。 1.0％と0.5％の問題の単語数は、SATの単語数と非常に似ています。

Extra-large (1558M) model

Result: 😲😲

Output (English)

`AI cannot read texts. This was common sense a few years ago. Even in the “Can Robots Enter the University of Tokyo (Tokyo Robo)” project, launched in 2011 by the National Institute of Informatics (NII) in Japan, AI also used long-term reading and word sorting problems in the English test of the University Entrance Examination Center Test. Had no teeth at all. The deviation value when the East Robot AI solved the English test for the center test in 2016 was "45.1". Passing the University of Tokyo was an impossible level. However, the new technology "BERT" announced by Google in October 2018 changed the common sense. Because BERT solved the sentence reading problem with a higher correct answer rate than humans.` BERT is an android app of the Alphabet system developed by Baidu, the main company of Google, as the Google AI Platform. BERT is currently an internal project of Google, so it is not yet open to open data access. In addition to the English test, BERT was also tested by a university professor and a teacher of English. On the English test, the average answers were 62.3 out of 100 (100 is an average out of a possible total of 100 in the test). On the question grading of BERT, the results were even more interesting. The BERT team received an average grade of 7.0 out of 100 in the English exam. This seems like a high grade, considering that almost all the university's students are now tested in the test. The BERT solution was not able to understand the meaning of the questions correctly. The test results gave a low percentage of correct answers (only 9%). BERT was in the middle of the table with the human scores: 3.3%, 3.2%. This result suggests that Google's Google Assistant is not only better at solving the natural language task in English (as was reported by the "Google Pixel" in 2014), but also at interpreting questions in other languages other than English. "BERT" is also able to answer questions in Chinese, Korean and Vietnamese, as well as in Spanish. 
Even though the AI system "BERT" solves the word reading task with a lower accuracy rate, it is much faster and more effective at answering questions.

The BERT team (which includes the two developers who received the "Best Student in the World" and "Google Pixel award" at the 2017 IWJ Future Tech Awards 2017) developed the BERT for the University Entrance Examination Center. One of the most important things is that the BERT team developed a BERT app through the "Baidu AI platform". This means that the software of this BERT app will be available both in Japan (on Android), and in China, Korea, Singapore and Indonesia (on iOS) as well. Also, it is available for Windows, OS X in development and Linux. The BERT AI can also read other foreign languages and is programmed with the "Korean language model". Also, the BERT Baidu app, with its language model, has also been tested by a German teacher. The teacher's assessment was a score of 7.0 out of 100, indicating that the Korean language model is an

Output translated back into Japanese

`AIに文章読解は不可能――。数年前までこれが常識だった。日本の国立情報学研究所（NII）が2011年に始めた「ロボットは東大に入れるか（東ロボ）」プロジェクトでも、AIは大学入試センター試験の英語試験で長文読解問題や単語の並べ替え問題に全く歯が立たなかった。東ロボのAIが2016年にセンター試験の英語問題を解いた際の偏差値は「45.1」。東大合格はとうてい不可能なレベルだった。ところがグーグルが2018年10月に発表した新技術「BERT」で常識が一変した。BERTが文章読解問題を人間より高い正答率で解いたからだ。` BERTは、Googleのメイン企業であるBaiduがGoogle AIプラットフォームとして開発したAlphabetシステムのAndroidアプリです。現在、BERTはGoogleの内部プロジェクトであるため、まだデータアクセスを開くことができません。英語のテストに加えて、BERTは大学の教授と英語の教師によってもテストされました。英語のテストでは、平均回答数は100のうち62.3でした（100はテストの合計100のうちの平均です）。 BERTの質問の採点については、結果はさらに興味深いものでした。 BERTチームは、英語試験で100点満点中7.0点の平均点を獲得しました。大学のほとんどすべての学生がテストでテストされていることを考えると、これは高い成績のようです。 BERTソリューションは、質問の意味を正しく理解できませんでした。テスト結果の正解率は低かった（9％のみ）。 BERTはテーブルの中央にあり、人間のスコアは3.3％、3.2％でした。この結果は、GoogleのGoogle Assistantが英語の自然言語タスクの解決（2014年の「Google Pixel」で報告された）だけでなく、英語以外の言語での質問の解釈にも優れていることを示唆しています。「BERT」は、スペイン語だけでなく、中国語、韓国語、ベトナム語の質問にも答えることができます。 AIシステム「BERT」は単語の読解タスクを低い精度で解決しますが、質問への回答ははるかに高速で効果的です。

BERTチーム（2017 IWJ Future Tech Awards 2017で「世界で最も優秀な学生」と「Google Pixel賞」を受賞した2人の開発者を含む）は、大学入試センターのBERTを開発しました。最も重要なことの1つは、BERTチームが「Baidu AIプラットフォーム」を通じてBERTアプリを開発したことです。つまり、このBERTアプリのソフトウェアは、日本（Android）と、中国、韓国、シンガポール、インドネシア（iOS）の両方で利用可能になります。また、Windows、開発中のOS X、Linuxでも使用できます。 BERT AIは他の外国語も読むことができ、「韓国語モデル」でプログラムされています。また、BERT Baiduアプリは、言語モデルとともに、ドイツ語の教師によってテストされています。教師の評価は100点中7.0点で、韓国語モデルが

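Summary of the result

First, this sentence is unmistakably fake, which is startling.

BERTは、Googleのメイン企業であるBaiduがGoogle AIプラットフォームとして開発したAlphabetシステムのAndroidアプリです。

What also decisively sets this apart from all the previous outputs is that it stays on topic to the very end.
Looked at closely, however, it says some fairly nonsensical things.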

大学入試センターのBERTを開発しました。最も重要なことの1つは、BERTチームが「Baidu AIプラットフォーム」を通じてBERTアプリを開発したことです。つまり、このBERTアプリのソフトウェアは、日本（Android）と、中国、韓国、シンガポール、インドネシア（iOS）の両方で利用可能になります。また、Windows、開発中のOS X、Linuxでも使用できます。 BERT AIは他の外国語も読むことができ、「韓国語モデル」でプログラムされています。

It reads like an already finished article whose proper nouns have been swapped at random.

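Conclusion

I actually ran GPT-2 and found that, depending on the input, it can produce reasonably good output. Still, if asked whether these results could pass as fake news as they are, my impression is that it is still some way off.

That said, the first one or two generated sentences are reasonably natural. Tuning the sampling parameters or adjusting how the model is applied might yield more natural text.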
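One parameter commonly tuned in sampling is temperature. As a hedged illustration (assuming the GPT-2 script's temperature setting behaves like standard temperature scaling, which this article did not test), the sketch below shows how dividing the logits by a temperature sharpens or flattens the next-token distribution. The logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, which tends to give safer but more repetitive text, while higher temperatures spread it out.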

This time I was only able to experiment with two inputs, but the results make me want to try many more (if I have time...!).
