
An Amateur's Language Processing 100 Knocks: 05

Posted at 2016-09-14, last updated at 2017-05-03

This is a record of my attempt at Language Processing 100 Knocks 2015. My environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of my previous knocks is available here.

Chapter 1: Warm-up

05. n-gram

Write a function that builds n-grams from a given sequence (a string, a list, etc.). Using this function, obtain the word bi-grams and the character bi-grams of the sentence "I am an NLPer".

The finished code:

main.py

# coding: utf-8


def n_gram(target, n):
	'''Create n-grams from the given sequence

	Arguments:
	target -- target sequence (a list, a string, etc.)
	n -- the n in n-gram (1 for uni-gram, 2 for bi-gram, ...)
	Returns:
	list of grams
	'''
	result = []
	for i in range(0, len(target) - n + 1):
		result.append(target[i:i + n])

	return result


target = 'I am an NLPer'
words_target = target.split(' ')

# word bi-gram
result = n_gram(words_target, 2)
print(result)

# character bi-gram
result = n_gram(target, 2)
print(result)

Output:

Terminal

[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
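Note that the word bi-grams come out as lists and the character bi-grams as strings, because slicing a list returns a list and slicing a string returns a string. For comparison, here is a minimal sketch of a different technique, not the approach used above: it builds the n shifted slices of the sequence and lets `zip` pair them up. `n_gram_zip` is a hypothetical name. Because `zip` stops at the shortest input, it yields exactly len(seq) - n + 1 grams, the same count as the `range()` bound in `n_gram`, but it always returns tuples, so character grams have to be joined back into strings.

```python
def n_gram_zip(seq, n):
    """Hypothetical zip-based variant: returns the n-grams of seq as tuples."""
    # seq[0:], seq[1:], ..., seq[n-1:] are the n shifted copies of the sequence;
    # zip stops at the shortest one, so len(seq) - n + 1 tuples are produced
    return list(zip(*(seq[i:] for i in range(n))))


target = 'I am an NLPer'

# word bi-gram: [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]
print(n_gram_zip(target.split(' '), 2))

# character bi-gram, joined back into strings so it matches the output above
print([''.join(gram) for gram in n_gram_zip(target, 2)])
```

The slicing version in main.py keeps the input's own type, which is why I stuck with it.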

Uni-grams and tri-grams, while I'm at it

To double as a test of the function, I also checked uni-grams and tri-grams.

main.py (continued)

# word uni-gram
result = n_gram(words_target, 1)
print(result)

# character uni-gram
result = n_gram(target, 1)
print(result)

# word tri-gram
result = n_gram(words_target, 3)
print(result)

# character tri-gram
result = n_gram(target, 3)
print(result)

Output:

Terminal

[['I'], ['am'], ['an'], ['NLPer']]
['I', ' ', 'a', 'm', ' ', 'a', 'n', ' ', 'N', 'L', 'P', 'e', 'r']
[['I', 'am', 'an'], ['am', 'an', 'NLPer']]
['I a', ' am', 'am ', 'm a', ' an', 'an ', 'n N', ' NL', 'NLP', 'LPe', 'Per']

The results look correct.
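The checks above only cover n = 1, 2 and 3, so a few edge cases are worth confirming too. The sketch below restates `n_gram` as a list comprehension with the same slicing logic and adds a guard for n < 1. That guard is my own assumption and is not in main.py: without it, n = 0 makes the original return len(target) + 1 empty slices instead of failing. The assertions check n equal to and longer than the sequence, a tuple input, and the gram counts for the example sentence (13 characters give 12 bi-grams and 11 tri-grams; 4 words give 3 bi-grams).

```python
def n_gram(target, n):
    """Same slicing approach as main.py, plus an added guard for n < 1 (assumption)."""
    if n < 1:
        raise ValueError('n must be 1 or greater')
    # the last valid start index is len(target) - n
    return [target[i:i + n] for i in range(len(target) - n + 1)]


# n equal to the length yields the whole sequence as a single gram
assert n_gram('NLPer', 5) == ['NLPer']
# n longer than the sequence: range() is empty, so no grams are produced
assert n_gram('NLPer', 6) == []
# tuples also support slicing, so the grams come back as tuples
assert n_gram(('I', 'am', 'an'), 2) == [('I', 'am'), ('am', 'an')]
# gram counts follow len(target) - n + 1
assert len(n_gram('I am an NLPer', 2)) == 12
assert len(n_gram('I am an NLPer', 3)) == 11
assert len(n_gram('I am an NLPer'.split(' '), 2)) == 3
```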

　
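That's it for the sixth knock (problem 05, since the numbering starts at 00). If you spot any mistakes, I'd appreciate it if you let me know.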
