Searching Twitter for "人工地震" (artificial earthquake), generating random text with a Markov chain, and tweeting it

Last updated at 2022-08-02, posted at 2021-03-01

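Code

Main libraries used
- tweepy
- janome
- markovify

Collecting the source text by searching tweets

Search for the tweets that serve as the source for text generation.
- URLs are removed because they often cause an obscure KeyError when passed to the Markov chain function. Retweets are excluded to avoid duplicate content, and replies are excluded to prevent accidents.
- If a tweet's URL ends up in the generated tweet, the post becomes a quote retweet. Quote retweets are, in a sense, typical of this kind of tweet, so I switch the URL filter on and off depending on my mood.
- count=100 asks for up to 100 tweets in a single search; the value was chosen for no particular reason.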

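import tweepy

# consumer_key, consumer_secret, access_key and access_secret are assumed to be
# defined elsewhere (for example, loaded from environment variables).

def tweet_search(search_words):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    set_count = 100  # number of tweets requested per search
    # The caller passes a list of words; join them into one query string.
    word = " ".join(search_words) if isinstance(search_words, (list, tuple)) else search_words
    # Note: Tweepy 4.x renamed API.search to API.search_tweets.
    results = api.search(q=word, count=set_count)
    strResult = ""
    for result in results:
        # Alternative filter that keeps tweets containing URLs:
        # if "RT" not in result.text and "@" not in result.text:
        # Skip retweets, replies/mentions, and tweets containing URLs.
        if "RT" not in result.text and "@" not in result.text and "https://t.co/" not in result.text:
            strResult += result.text
    print("-----Generate text based on this text-----")
    print(strResult)
    return strResult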
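
The filtering rule inside tweet_search can also be written as a small standalone function, which makes the "toggle the URL filter" idea explicit. This is only a sketch: the name is_usable and the allow_urls flag are hypothetical and do not appear in the original script. It mirrors the same substring checks, so it also drops any tweet that merely contains the letters "RT".

```python
def is_usable(text, allow_urls=False):
    """Return True if a tweet's text should be kept as source material.

    Hypothetical helper that mirrors the substring checks in tweet_search.
    """
    # Drop retweets ("RT") and replies/mentions ("@").
    if "RT" in text or "@" in text:
        return False
    # Drop tweets containing shortened URLs unless explicitly allowed.
    if not allow_urls and "https://t.co/" in text:
        return False
    return True


samples = [
    "人工地震だと思う。",
    "RT @someone: 人工地震",
    "@someone それは違う",
    "詳細はこちら https://t.co/abc",
]
kept = [t for t in samples if is_usable(t)]
kept_with_urls = [t for t in samples if is_usable(t, allow_urls=True)]
print(kept)
print(kept_with_urls)
```

Passing allow_urls=True corresponds to the commented-out condition in tweet_search.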

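Splitting the collected tweets into words with janome

For an explanation of what Janome is, this article was helpful. I use this library without really thinking about it, so honestly I do not understand it well. The code is copied wholesale from here.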

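from janome.tokenizer import Tokenizer

def wakati(text):
    # Remove line breaks so they do not end up as tokens
    text = text.replace('\n','')
    text = text.replace('\r','')
    t = Tokenizer()
    # wakati=True yields surface forms only; list() is needed because
    # recent Janome versions return a generator from tokenize()
    result = list(t.tokenize(text, wakati=True))
    print("-----Generate words list-----")
    print(result)
    return result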

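Automatically generating text from the split words

I had copied the code from here wholesale, but got stuck in an infinite loop of KeyErrors, so I changed it to exit with break.
- Honestly, I only vaguely understand why it works.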

import random

def generate_text(words_list):
    num_sentence = 5  # number of sentences (each ending in "。") to generate

    # Transition table: (w1, w2) -> list of words that followed that pair
    markov = {}
    w1 = ""
    w2 = ""
    for word in words_list:
        if w1 and w2:
            if (w1, w2) not in markov:
                markov[(w1, w2)] = []
            markov[(w1, w2)].append(word)
        w1, w2 = w2, word

    count_kuten = 0  # number of "。" generated so far
    sentence = ""
    w1, w2 = random.choice(list(markov.keys()))
    while count_kuten < num_sentence:
        try:
            tmp = random.choice(markov[(w1, w2)])
            sentence += tmp
        except KeyError:  # trouble spot: this pair has no recorded successor
            break
        except Exception as e:
            print(e)
            print(type(e))
            break  # tmp is not valid here, so stop instead of continuing
        if tmp == '。':
            count_kuten += 1
            sentence += '\n'
        w1, w2 = w2, tmp

    print("-----Generated Text-----")
    print(sentence)

    return sentence
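
To see what the table in generate_text looks like, here is a minimal sketch on a hand-tokenized word list. The names build_table and walk are hypothetical and the input is made up for illustration. It builds the same pair-to-successor mapping, and also shows why a KeyError can occur: the final pair of the source has no successor unless the same pair appeared earlier. Unlike generate_text, walk includes the starting pair in its output and caps the length instead of counting "。".

```python
import random


def build_table(words):
    """Map each consecutive word pair to the list of words that followed it."""
    table = {}
    for w1, w2, nxt in zip(words, words[1:], words[2:]):
        table.setdefault((w1, w2), []).append(nxt)
    return table


def walk(table, start, max_words=20):
    """Follow the table from a starting pair until a dead end or max_words."""
    w1, w2 = start
    out = [w1, w2]
    while len(out) < max_words and (w1, w2) in table:
        nxt = random.choice(table[(w1, w2)])
        out.append(nxt)
        w1, w2 = w2, nxt
    return "".join(out)


words = ["今日", "は", "晴れ", "。", "今日", "は", "雨", "。"]
table = build_table(words)

print(table[("今日", "は")])   # ['晴れ', '雨']
print(("雨", "。") in table)   # False: the last pair has no successor
print(walk(table, ("今日", "は")))
```

Checking `(w1, w2) in table` before the lookup plays the same role as the `except KeyError: break` in generate_text.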

Main routine

To keep the script general-purpose, it accepts the search words as an argument and simply passes them to the functions above. Run it with -w "人工地震".

import argparse
import sys

import tweepy

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='search words(space separated)')
    parser.add_argument('-w','--words',metavar='words',type=str,help='search words(space separated)',default="")
    args = parser.parse_args()
    print(args)
    if args.words == "":
        print("No Argument")
        sys.exit()
    # Split the space-separated argument into a list of search words
    search_words = args.words.split()
    print(search_words)
    # Search for tweets data and generate text
    tweet_text = generate_text(wakati(tweet_search(search_words)))
    # Trim to at most 140 characters
    tweet_text_140 = tweet_text[:140]
    print("-----Text trimmed to 140 characters-----")
    print(tweet_text_140)
    # consumer_key etc. are assumed to be defined elsewhere (e.g. environment variables)
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    # Post the tweet; print the error instead of crashing if it fails
    try:
        api.update_status(tweet_text_140)
    except Exception as e:
        print(e)
        print(type(e))
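
The argument handling can be tried without touching the Twitter API. The sketch below wraps the same argparse setup in a hypothetical function, parse_words, and passes the argument list explicitly instead of reading it from the command line.

```python
import argparse


def parse_words(argv):
    """Parse -w/--words and return the search words as a list."""
    parser = argparse.ArgumentParser(description="search words (space separated)")
    parser.add_argument("-w", "--words", metavar="words", type=str, default="",
                        help="search words (space separated)")
    args = parser.parse_args(argv)
    # str.split() with no argument splits on whitespace and drops empty strings
    return args.words.split()


print(parse_words(["-w", "人工地震"]))        # ['人工地震']
print(parse_words(["-w", "人工地震 地震"]))   # ['人工地震', '地震']
print(parse_words([]))                        # []
```

On the command line this corresponds to something like `python markov_tweet.py -w "人工地震"`, where markov_tweet.py is a placeholder file name, not the author's.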

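Articles I referred to

Searching Twitter for words with Tweepy
Generating text with a Markov chain in Python
- My script is pretty much an improved version of the code there, so I am deeply indebted to it
Managing API keys and similar values as environment variables
- Hard-coding them is apparently not recommended
Checking and creating a swap area
- When I ran this on an underpowered virtual machine, memory ran out almost instantly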

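Impressions

My only understanding of Markov chains was "something like the しゅうまいくん bot", so I was impressed that I could build something similar with nothing but the search skills needed to find code. I am well aware that I should learn to read the code properly, but it ran, so there is not much I can do about that.

Things I want to do next
- It just runs from cron with no logging at all, so add logging.
- Accumulate the source text over time so that it looks as if the bot is learning.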
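
Both items could be approached with the standard library alone. The following is a rough sketch under assumptions, not the author's implementation: the function names, the logger name, and the file names bot.log and corpus.txt are all hypothetical. It logs to a file and appends each run's collected text to a corpus file, so later runs could build the Markov table from everything gathered so far.

```python
import logging
import tempfile
from pathlib import Path


def setup_logger(log_path):
    """Create a file logger (hypothetical name "markov_bot")."""
    logger = logging.getLogger("markov_bot")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def append_to_corpus(corpus_path, new_text):
    """Append this run's text to the corpus file and return the whole corpus."""
    path = Path(corpus_path)
    with path.open("a", encoding="utf-8") as f:
        f.write(new_text + "\n")
    return path.read_text(encoding="utf-8")


# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    logger = setup_logger(Path(tmp) / "bot.log")
    corpus_file = Path(tmp) / "corpus.txt"
    append_to_corpus(corpus_file, "一回目の検索結果。")
    corpus = append_to_corpus(corpus_file, "二回目の検索結果。")
    logger.info("corpus size: %d characters", len(corpus))
    # Close the handler so the log file can be read and cleaned up
    for h in list(logger.handlers):
        h.close()
        logger.removeHandler(h)
    log_text = (Path(tmp) / "bot.log").read_text(encoding="utf-8")

print(corpus)
print(log_text)
```

In the real script, the string returned by append_to_corpus would be passed to wakati in place of the single search result.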
