突然なのですが、実は私は(到底信じてもらえないでしょうが)ChatGPTが流行る前から機械学習をやりたい思いはあったのですが、当時はプログラミングの知識がほとんどなく、結局Webアプリケーションに興味移ってしまったためしばらくやってませんでした。。。
ただ最近GPT絡みでAIブームが来ているので(?)簡易的な機械学習と言語モデル生成をやってみます。
趣味の範疇なので悪しからず。
コード
1.あらかじめ指定した文字列を整理(?)するやつ
import random
# function to generate n-grams from text
def generate_ngrams(text, n):
ngrams = []
words = text.split()
for i in range(len(words)-n+1):
ngrams.append(' '.join(words[i:i+n]))
return ngrams
# function to generate a language model from text
def generate_language_model(text, n):
ngrams = generate_ngrams(text, n)
model = {}
for i in range(len(ngrams)-1):
if ngrams[i] not in model:
model[ngrams[i]] = []
model[ngrams[i]].append(ngrams[i+1])
return model
# function to generate text using language model
def generate_text(model, n, length):
current_ngram = random.choice(list(model.keys()))
text = current_ngram
for i in range(length):
if current_ngram not in model:
break
possible_words = model[current_ngram]
next_word = random.choice(possible_words)
text += ' ' + next_word.split()[n-1]
current_ngram = ' '.join(text.split()[-n:])
return text
# example usage
text = 'The quick brown fox jumps over the lazy dog'
n = 2
model = generate_language_model(text, n)
generated_text = generate_text(model, n, 10)
print(generated_text)
2.標準入力をもとにJSONでモデル生成するやつ
import random
import json
# function to generate n-grams from text
def generate_ngrams(text, n):
ngrams = []
words = text.split()
for i in range(len(words)-n+1):
ngrams.append(' '.join(words[i:i+n]))
return ngrams
# function to generate a language model from text
def generate_language_model(text, n):
ngrams = generate_ngrams(text, n)
model = {}
for i in range(len(ngrams)-1):
if ngrams[i] not in model:
model[ngrams[i]] = []
model[ngrams[i]].append(ngrams[i+1])
return model
# function to generate text using language model
def generate_text(model, n, length):
current_ngram = random.choice(list(model.keys()))
text = current_ngram
for i in range(length):
if current_ngram not in model:
break
possible_words = model[current_ngram]
next_word = random.choice(possible_words)
text += ' ' + next_word.split()[n-1]
current_ngram = ' '.join(text.split()[-n:])
return text
# example usage
text1 = str(input())
text2 = str(input())
text = text1 + text2
n = 2
model = generate_language_model(text, n)
generated_text = generate_text(model, n, 10)
print(generated_text)
with open('model.json', 'w') as f:
json.dump(model, f)
生成したモデル
モデル1
{
"The quick": [
"quick brown",
"quick brown"
],
"quick brown": [
"brown fox",
"brown fox",
"brown fox",
"brown fox"
],
"brown fox": [
"fox jumps",
"fox jumps",
"fox jumps",
"fox jumps"
],
"fox jumps": [
"jumps over",
"jumps over",
"jumps over",
"jumps over"
],
"jumps over": [
"over the",
"over the",
"over the",
"over the"
],
"over the": [
"the lazy",
"the lazy",
"the lazy",
"the lazy"
],
"the lazy": [
"lazy dog",
"lazy dogThe",
"lazy dogThe",
"lazy dog"
],
"lazy dog": [
"dog The"
],
"dog The": [
"The quick"
],
"lazy dogThe": [
"dogThe quick",
"dogThe quick"
],
"dogThe quick": [
"quick brown",
"quick brown"
]
}
モデル2
{
"1 2": [
"2 3",
"2 3"
],
"2 3": [
"3 4",
"3 4",
"3 4",
"3 4"
],
"3 4": [
"4 5",
"4 51",
"4 51",
"4 5"
],
"4 5": [
"5 1"
],
"5 1": [
"1 2"
],
"4 51": [
"51 2",
"51 2"
],
"51 2": [
"2 3",
"2 3"
]
}
モデル3(標準入力から生成するタイプ)
{
"22233 3332231313131": [
"3332231313131 daiwduai"
]
}
ノリと勢いでやりましたが案外いけますね
タノシイ