0. Brief overview
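This article explains how to use Tabelog reviews of ramen shops in Tokyo to dig up hidden gems with a recommender.
I love ramen myself, and I am a self-proclaimed hardcore ramen fan who used to eat my way through more than 100 bowls a year. However, my most recent health checkup flagged a problem, and my doctor ordered me to stop...
To vent the ramen passion that now had nowhere to go, I decided to try ramen recommendation with machine learning (digging up hidden gems with a recommender).
This time, as the culmination of the series, I use a model trained with Word2vec to dig up a hidden gem by recommendation, and go as far as actually visiting the shop to check it!
The idea is to search for ramen shops that are highly similar to the ramen of a certain famous shop.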
This article was featured on techgym's blog! Thank you very much.
"A waste of artificial intelligence?" A roundup of entertaining AI programming articles.
1. A hardcore ramen fan's problem
Running out of new "delicious" ramen shops to discover
A pressing problem for hardcore fans who eat more than 100 bowls a year: once you have been to every popular ramen shop in Tokyo, from the ones with guaranteed queues to the ones in the media spotlight, you run out of leads for new "delicious" shops to try. It happens a lot, right?
So,
"Let's find, on my own, a hidden gem whose taste is close to my favorite ramen shop"
is the conclusion I arrived at.
2. Flow of the logic for digging up hidden gems
Definition of a "hidden gem"
I define a "hidden gem" as a shop whose reviews are highly similar to those of a top-ranked Tabelog shop and which has only a small number of reviews.
High review similarity → the taste and quality of the ramen are close.
Few reviews → low name recognition.
With this definition it should be possible to find so-called "hidden gems": shops that have the potential of a top-ranked Tabelog shop but are still little known!
So let's actually try it!
Roughly, the approach in three lines is:
  ① Obtain the training data
  ② Model with word2vec
  ③ Build the recommendation logic
① Obtain the training data
Details are explained here ↓↓
Part 1: [Python] Tabelog scraping by a hardcore ramen fan, for hardcore fans
To gather good-quality review data, I scraped Tabelog pages and extracted the information I needed.
The key point when scraping is to restrict the targets to
"ramen shops only, and only highly rated ones"
so that the collected reviews are of good quality.
The scraping collected the following information:
・Shop name: store_name
・Tabelog score: score
・Number of reviews: review_cnt
・Review text: review
After fetching the reviews one at a time, I gathered them into a DataFrame.
* In accordance with Tabelog's terms of use, the parts that show review text are blurred out. Thank you for your understanding.
② Model with word2vec
Details are here ↓↓
Part 2: [Python] Natural language processing with Word2vec by a hardcore ramen fan, for hardcore fans
Using the scraped reviews as training data and modeling them with word2vec, I was able to build a model specialized for ramen.
Below are the words most similar to 「二郎」 (Jiro), a chain known for its distinctive ramen.
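from gensim.models import word2vec
# Load the trained model
word2vec_ramen_model = word2vec.Word2Vec.load("../model/word2vec_ramen_model.model")
# Words closest to 「二郎」 (Jiro) in the embedding space
word2vec_ramen_model.wv.most_similar("二郎")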
>>>
[('ラーメン二郎', 0.7518627643585205),
('二郎系', 0.7041865587234497),
('インスパイア', 0.6942269802093506),
('上野毛', 0.6394986510276794),
('メグジ', 0.6040332317352295),
('ヤサイ', 0.5899537205696106),
('乳化', 0.5867205858230591),
('直系', 0.5784134268760681),
('è±äº', 0.5678684711456299),
('一之江', 0.567740261554718)]
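Words close to 「二郎」 line up nicely!
To explain: 「上野毛」 refers to the Ramen Jiro Kaminoge branch, and 「メグジ」 is the Ramen Jiro Meguro branch. The ninth entry in the list is also a Jiro-style shop.
Now that the model is ready, the last step is to build the recommendation logic.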
③ Build the recommendation logic
Summarized roughly in four lines:
  Ⅰ. Cluster the contents of the corpus with kmeans
  Ⅱ. Extract the characteristic words of each text with TF-IDF
  Ⅲ. Calculate the similarity between shops
  Ⅳ. Turn the degree of "hidden gem" into a score
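Ⅰ. Cluster the contents of the corpus with kmeans
Why is clustering with kmeans necessary? Because reviews contain all sorts of comments, not only about the taste of the ramen. The goal is to weed out words that have nothing to do with ramen beforehand.
If you proceed without narrowing down the words by clustering, then, for example, shops that share the same nearest station end up highly similar and get recommended near the top.
Example review
* An original text written for this experiment.
A few minutes' walk from 〇〇 station on the Toei Mita Line. The queue served as a landmark, so I found it right away. I arrived around 17:50 on a Wednesday evening and was 10th in line.
After a 20-minute wait, I finally went in. The interior was slightly dim, with about 6 seats.
At the ticket machine I bought the tanrei chuka soba for 800 yen, an ae-dama (extra noodles) for 200 yen, and an ajitama (seasoned egg) for 100 yen.
A tanrei ramen worthy of the evening! This is delicious. I can taste the refined umami of niboshi (dried sardines). It is tasty, and the aroma that passes through the nose is wonderful too.
The onion does not assert itself too much and does solid work behind the scenes. The seasoned egg is not soft-boiled, and yet it is this good.
The pork chashu (kakuni?) is tender, a braised pork with outstanding umami. It is a perfect match for this soup. The saltiness is not strong, and the soup is overflowing with niboshi umami without any harshness. Soup and toppings are all in wonderful balance; the umami, the richness, and the aftertaste are all wonderful.
You don't find niboshi soba of this quality elsewhere, so I want to come back again and again.
As you would expect from a shop that scores above 3.8 on Tabelog..
Also, the owner is a really big guy and looks intimidating at first glance, but his service was very polite, which was a bit of a surprise.
Looking at the review above, most of it consists of words that have nothing to do with ramen.
Since the goal here is to measure purely the similarity of the "ramen", I wanted to narrow things down to ramen-related words only.
That is where "kmeans" comes in.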
from collections import defaultdict
from gensim.models.keyedvectors import KeyedVectors
from sklearn.cluster import KMeans
# Load the trained word2vec model
model = KeyedVectors.load('../model/word2vec_ramen_model.model')
max_vocab = 30000  # using 40000 gave the same result
# gensim 3.x API; in gensim 4 the vocabulary is model.wv.key_to_index
vocab = list(model.wv.vocab.keys())[:max_vocab]
vectors = [model.wv[word] for word in vocab]
n_clusters = 6  # number of clusters, chosen arbitrarily
kmeans_model = KMeans(n_clusters=n_clusters, verbose=0, random_state=42)
kmeans_model.fit(vectors)
cluster_labels = kmeans_model.labels_
# Group the vocabulary by the cluster each word was assigned to
cluster_to_words = defaultdict(list)
for cluster_id, word in zip(cluster_labels, vocab):
    cluster_to_words[cluster_id].append(word)
# Show the first 20 words of each cluster
for words in cluster_to_words.values():
    print(words[:20])
The text above is one example of a review. Classifying the review words into six groups with kmeans produced the clusters below.
20 words are shown for each cluster.
It is a bit of a stretch, but I gave each cluster a name.
import pandas as pd
def change_dict_key(d, old_key, new_key, default_value=None):
    # Rename a dictionary key in place
    d[new_key] = d.pop(old_key, default_value)
# Words about dates, shop ratings, and internet slang
change_dict_key(cluster_to_words, 0, '日付、お店の評価、ネット用語に関するワード')
# Words about people, service, and interior
change_dict_key(cluster_to_words, 1, '人、接客、内装に関するワード')
# Other words
change_dict_key(cluster_to_words, 2, 'その他のワード')
# Words about the ticket machine and ordering
change_dict_key(cluster_to_words, 3, '券売機、注文に関するワード')
# Words about days, hours, and shop location
change_dict_key(cluster_to_words, 4, '曜日時間、店舗の地理的なワード')
# Words about the ramen itself
change_dict_key(cluster_to_words, 5, 'ラーメンの中身に関するワード')
df_dict = pd.DataFrame.from_dict(cluster_to_words, orient="index").T
# Reorder the columns by position for display
df_dict.iloc[:, [5, 3, 1, 4, 0, 2]]
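① Words about the ramen itself
② Words about the ticket machine and ordering
③ Words about people, service, and interior
④ Words about days, hours, and shop location
⑤ Words about dates, shop ratings, and internet slang
⑥ Other words
After some trial and error, I found that narrowing the words down to only
  ① Words about the ramen itself
  ② Words about the ticket machine and ordering
gave the best recommendation results.
Here is the result of filtering the earlier review this way.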
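The filtering step described above can be sketched as follows. This is only a minimal illustration, assuming the k-means result is available as a plain word-to-cluster dict; the toy words, the cluster ids, and the helper name `keep_ramen_words` are made up for the example and do not come from the original code.

```python
# Toy stand-in for the k-means result: word -> cluster id (made-up data).
word_to_cluster = {
    "煮干し": 5, "スープ": 5, "チャーシュー": 5,  # ramen itself
    "券売機": 3, "味玉": 3,                       # ticket machine / ordering
    "店主": 1, "接客": 1,                         # people / service
    "駅": 4, "水曜日": 4,                         # days / location
}
# Clusters to keep, mirroring "ramen itself" and "ticket machine / ordering".
KEEP_CLUSTERS = {5, 3}


def keep_ramen_words(tokens, word_to_cluster, keep_clusters):
    """Return only the tokens whose cluster is in keep_clusters, order preserved."""
    return [t for t in tokens if word_to_cluster.get(t) in keep_clusters]


review_tokens = ["駅", "券売機", "煮干し", "スープ", "店主", "接客", "味玉"]
filtered = keep_ramen_words(review_tokens, word_to_cluster, KEEP_CLUSTERS)
print(filtered)
```

Tokens that are missing from the dict are dropped as well, which matches the idea of keeping only words known to be ramen-related.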
Ⅱ. Extract the characteristic words of each text with TF-IDF
From the reviews, I keep only the ramen-related words selected in Ⅰ, and among them extract the words with high TF-IDF values.
# Reference: https://qiita.com/tatsuya-miyamoto/items/f1539d86ad4980624111
import pickle
import pandas as pd
from gensim import corpora
from gensim import models
# Keep only the "ramen itself" and "ticket machine / ordering" clusters
taste_words = cluster_to_words['ラーメンの中身に関するワード']
kenbaiki_words = cluster_to_words['券売機、注文に関するワード']
taste_words.extend(kenbaiki_words)
ramen_word = set(taste_words)  # set for fast membership tests
cluster_to_words.keys()
# Documents: one tokenized shop per line, filtered to ramen-related words
trainings = []
with open('../work/ramen_corpus.txt', 'r', encoding="utf-8") as f:
    for data in f:
        word = data.replace("'",'').replace('[','').replace(']','').replace(' ','').replace('\n','').split(",")
        trainings.append([w for w in word if w in ramen_word])
# Build the word -> id dictionary
dictionary = corpora.Dictionary(trainings)
# Convert the texts into a bag-of-words corpus
corpus = list(map(dictionary.doc2bow, trainings))
# Build the tfidf model
test_model = models.TfidfModel(corpus)
# Apply the model to the corpus
corpus_tfidf = test_model[corpus]
# Convert id -> word
texts_tfidf = []  # per-document TF-IDF with ids replaced by words
for doc in corpus_tfidf:
    text_tfidf = []
    for word in doc:
        text_tfidf.append([dictionary[word[0]], word[1]])
    texts_tfidf.append(text_tfidf)
from operator import itemgetter
texts_tfidf_sorted_top20 = []
# Sort by TF-IDF value in descending order and keep the top 20 words
for i in range(len(texts_tfidf)):
    soted = sorted(texts_tfidf[i], key=itemgetter(1), reverse=True)
    soted_top20 = soted[:20]
    word_list = []
    for k in range(len(soted_top20)):
        word = soted_top20[k][0]
        word_list.append(word)
    texts_tfidf_sorted_top20.append(word_list)
# Add the result to the DataFrame
df = pd.read_csv('../output/tokyo_ramen_review.csv')
df_ramen = df.groupby(['store_name','score','review_cnt'])['review'].apply(list).apply(' '.join).reset_index().sort_values('score', ascending=False)
df_ramen['texts_tfidf_sorted_top20'] = texts_tfidf_sorted_top20
df_ramen['id'] = ['ID-' + str(i + 1).zfill(6) for i in range(len(df_ramen.index))]
# Column order: id, store_name, score, review_cnt, texts_tfidf_sorted_top20
df_ramen_texts_tfidf_sorted_top20 = df_ramen.iloc[:,[5,0,1,2,4]].reset_index(drop=True)
df_ramen_texts_tfidf_sorted_top20
with open('../work/df_ramen_texts_tfidf_sorted_top20', 'wb') as out:
    pickle.dump(df_ramen_texts_tfidf_sorted_top20, out)
With this, I was able to extract the words that characterize the reviews of ramen shop X.
* The image shows 7 words, but 20 words are actually used.
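To make the idea of "words with high TF-IDF values" concrete, here is a small standard-library sketch. It uses the plain weighting tf × log(N / df), which is an assumption for illustration and is not necessarily identical to the weighting and normalization that gensim's `TfidfModel` applies; the function name `top_tfidf_words` and the toy documents are also made up.

```python
import math
from collections import Counter


def top_tfidf_words(docs, top_n):
    """For each tokenized document, return its top_n words by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scored = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scored, key=lambda w: (-scored[w], w))
        result.append(ranked[:top_n])
    return result


# Toy "shops", each already reduced to ramen-related tokens (made-up data).
docs = [
    ["煮干し", "煮干し", "スープ", "淡麗"],
    ["豚", "ヤサイ", "スープ", "乳化"],
    ["味噌", "スープ", "バター"],
]
top_words = top_tfidf_words(docs, top_n=2)
print(top_words[0])
```

A word such as 「スープ」 that appears in every document gets a weight of zero, so it never ranks as a characteristic word of any single shop.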
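Ⅲ. Calculate the similarity between shops
Now that the words characterizing the reviews have been extracted, the next step is to compute the word-to-word similarity for every pair of words and take the average. As an example, let's compute the similarity between two niboshi-style ramen shops.
* The image shows 7 words, but 20 words are actually used.
The calculation above is carried out for every word, from 「煮干し」 → 「角煮」 ... all the way to 「淡麗」.
Once a similarity has been obtained for each word, the average is taken once more.
This is computed for every pair of shops obtained by scraping, and the pairs are sorted in descending order of similarity.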
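The averaging described above can be sketched in a few lines. This is a minimal illustration with made-up 2-dimensional vectors standing in for the word2vec embeddings; the names `store_similarity` and `vectors` are hypothetical. Because every word of shop x is compared with the same number of words of shop y, the "average of averages" equals the plain mean over all word pairs, which is what the sketch computes.

```python
import math
from itertools import product


def cos_sim(v1, v2):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def store_similarity(words_x, words_y, vectors):
    """Mean cosine similarity over every (word of x, word of y) pair."""
    sims = [cos_sim(vectors[a], vectors[b]) for a, b in product(words_x, words_y)]
    return sum(sims) / len(sims)


# Made-up 2-D vectors in place of the real word2vec embeddings.
vectors = {
    "煮干し": [1.0, 0.0],
    "淡麗": [0.8, 0.6],
    "乳化": [0.0, 1.0],
    "ヤサイ": [0.6, 0.8],
}
niboshi_shop = ["煮干し", "淡麗"]
jiro_style_shop = ["乳化", "ヤサイ"]

sim_same = store_similarity(niboshi_shop, niboshi_shop, vectors)
sim_diff = store_similarity(niboshi_shop, jiro_style_shop, vectors)
print(sim_same, sim_diff)
```

Note that even a shop compared with itself does not score exactly 1, because the mean also includes pairs of different words.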
import pickle
import pandas as pd
from itertools import product
with open('../work/df_ramen_texts_tfidf_sorted_top20', 'rb') as f:
    store_df = pickle.load(f)
# Build every (shop x, shop y) id pair
store_cross = []
for ids in product(store_df['id'], repeat=2):
    store_cross.append(ids)
store_cross_df = pd.DataFrame(store_cross, columns=['id_x', 'id_y'])
# Attach the shop details for both sides of each pair
store_cross_detail = store_cross_df.merge(
    store_df[['id','store_name','score','review_cnt','texts_tfidf_sorted_top20']], how='inner', left_on='id_x', right_on='id'
).drop(columns='id').merge(
    store_df[['id','store_name','score','review_cnt','texts_tfidf_sorted_top20']], how='inner', left_on='id_y', right_on='id'
).drop(columns='id')
# Use only the first rows of store_df (index labels 0 to 50) as shop x
store_cross_detail = store_cross_detail[store_cross_detail['id_x'].isin(store_df['id'].loc[0:50])]
store_cross_detail = store_cross_detail.reset_index(drop=True).sort_values(['id_x'])
Compute the similarity between ramen shop x and ramen shop y
## Compute the similarity of ramen shop y with respect to ramen shop x
import itertools
import numpy as np
from tqdm import tqdm
# Define a function that computes cosine similarity
def cos_sim(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
# Compute the cosine similarity of every word combination once up front
# (the same word pairs come up repeatedly)
# Flatten the 2-D list to 1-D and remove duplicates with set
uniq_words = list(set(itertools.chain.from_iterable(store_df['texts_tfidf_sorted_top20'].values)))
scores = {}
for word1, word2 in product(uniq_words, repeat=2):
    scores[(word1, word2)] = cos_sim(word2vec_ramen_model.wv[word1], word2vec_ramen_model.wv[word2])
x_col = store_cross_detail['texts_tfidf_sorted_top20_x']
y_col = store_cross_detail['texts_tfidf_sorted_top20_y']
avg_avg_scores = []
for i in tqdm(range(len(store_cross_detail))):
    # Positional access, so the scores line up with the row order used by insert()
    words_x = x_col.iloc[i]
    words_y = y_col.iloc[i]
    avg_scores = []
    for word_a in words_x:
        # Similarity between word_a and each of the 20 words of shop y
        word_cross_scores = [scores[(word_a, word_b)] for word_b in words_y]
        avg_scores.append(np.mean(word_cross_scores))  # mean of the 20 word-pair scores
    avg_avg_scores.append(np.mean(avg_scores))  # mean of those means
store_cross_detail.insert(6, 'avg_cos_sim_rate', avg_avg_scores)
# Show the ramen shops most similar to 「二郎」 in descending order
store_cross_detail = store_cross_detail.sort_values(['id_x', 'avg_cos_sim_rate'], ascending=[True, False])
df_sim_x = store_cross_detail[store_cross_detail['store_name_x'].str.contains('二郎')]
df_sim_x.reset_index(drop=True)
def min_max(x, axis=None):
    # Min-max normalization to the range [0, 1]
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    result = (x - x_min) / (x_max - x_min)
    return result
b = df_sim_x['avg_cos_sim_rate']
c = min_max(b.values)
# '正規化' means "normalized"
df_sim_x.insert(7, '正規化', c)
df_sim_x
Shops most similar to ラーメン二郎 ひばりヶ丘駅前店 (in descending order)
The ramen shops (y) most similar to ラーメン二郎 ひばりヶ丘駅前店 (x) were:
1st: "ラーメン二郎 ひばりヶ丘駅前店"
2nd: "らーめん 陸"
3rd: "ラーメン二郎 品川店"
4th: "ラーメン富士丸 明治通り都電梶原店"
5th: "ラーメン二郎 桜台駅前店"
"ラーメン二郎 ひばりヶ丘駅前店" comes first because it is being compared against its own reviews.
Let's compare the 1st to 3rd shops by photo.
Photos, left to right: 二郎 ひばりヶ丘駅前店 / らーめん 陸 / 二郎 品川店
The high similarity is visible even from the photos.
miko kindly left the following comment!
Jiro shops can be classified into emulsified and non-emulsified soups. The Hibarigaoka Jiro used as input should be emulsified, and the Shinagawa and Sakuradai shops in the output are emulsified too, so I thought it was working well!
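Ⅳ. Dig up hidden gems (scoring the degree of "hidden gem")
Finally, to actually dig up "hidden gems", I assign a number to the degree of "gem-ness" and turn it into a score. This time I look for shops that are highly similar to 『蔦』 (Tsuta) near Sugamo station, the world's first Michelin one-star ramen shop, and that are still hidden.
If you are not a hardcore fan and do not know 「蔦」, please read up on it in the article below!
https://icotto.jp/presses/1708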
store_cross_detail = store_cross_detail.sort_values(['id_x', 'avg_cos_sim_rate'], ascending=[True, False])
# Rows where shop x is 「蔦」; copy so that new columns can be added safely
df_sim_x = store_cross_detail[store_cross_detail['store_name_x'].str.contains('蔦')].copy()
df_sim_x.reset_index(drop=True)
def min_max(x, axis=None):
    # Min-max normalization to the range [0, 1]
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    result = (x - x_min) / (x_max - x_min)
    return result
# Normalized similarity ('正規化' means "normalized")
b = df_sim_x['avg_cos_sim_rate']
c = min_max(b.values)
df_sim_x.insert(7, '正規化', c)
# Normalized review count, inverted so that fewer reviews gives a value closer to 1
d = df_sim_x['review_cnt_y']
e = 1 - min_max(d.values)
df_sim_x.insert(11, 'レビュー数_正規化', e)
# Hidden-gem score = normalized similarity * inverted normalized review count
f = df_sim_x['正規化'] * (df_sim_x['レビュー数_正規化'])
df_sim_x.insert(9, '隠れた名店_score', f)
df_kakureta_meiten = df_sim_x.sort_values('隠れた名店_score', ascending=False)
# Keep only shops with fewer than 100 reviews
df_kakureta_meiten[df_kakureta_meiten['review_cnt_y'] < 100]
![x_y_隠れた名店度.jpg](https://qiita-image-store.s3.amazonaws.com/0/327405/66e75204-2d00-f399-d62e-f01ed492633e.jpeg)
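To score the degree of "hidden gem", I normalized the review count so that shops with few reviews approach 1 and shops with many reviews approach 0.
Then,
(normalized similarity score) × (normalized review-count score) = hidden-gem score
In this way the degree of "hidden gem" is expressed as a number.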
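The scoring formula above can be checked on toy numbers. The following is a minimal sketch, assuming the candidate values are not all identical (otherwise min-max normalization would divide by zero); the shop names, similarities, review counts, and the helper name `hidden_gem_scores` are invented for illustration.

```python
def min_max(values):
    """Min-max normalize a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def hidden_gem_scores(similarities, review_counts):
    """Normalized similarity times (1 - normalized review count), per shop."""
    sim_norm = min_max(similarities)
    few_reviews = [1.0 - r for r in min_max(review_counts)]
    return [s * f for s, f in zip(sim_norm, few_reviews)]


# Made-up candidates: similarity to the target shop and number of reviews.
candidates = ["shop_a", "shop_b", "shop_c"]
similarities = [0.60, 0.58, 0.40]
review_counts = [900, 100, 50]

scores = hidden_gem_scores(similarities, review_counts)
best = max(zip(scores, candidates))[1]
print(best, scores)
```

In this toy data the most similar shop scores zero because it also has the most reviews, and the least similar shop scores zero despite having the fewest reviews, so the shop that is both similar and little reviewed comes out on top. That multiplicative behavior is the point of the formula.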
The hidden gems that are highly similar to 『蔦』, the world's first Michelin one-star ramen shop, and that also have few reviews are:
1st: "麺屋 中川會 住吉店"  73.1 points
2nd: "らーめん MAIKAGURA"  64.1 points
3rd: "麵屋 西川"  63.2 points
Photos: 蔦 (top) / 麺屋 中川會 住吉店, MAIKAGURA, 麵屋 西川 (bottom, left to right)
First place went to 『麺屋 中川會 住吉店』!
The top three shops all serve tanrei (light, clear) style ramen, and the photos also show how alike they are.
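3. Visiting the hidden gem!
To check whether it really is a hidden gem, I actually went there.
It is a 7-minute walk from Kinshicho station. It was lunchtime, but I got in without queuing.
A notice recommended the shoyu (soy sauce) ramen, so I obediently ordered the "special shoyu ramen" for 1,100 yen. After a short wait the bowl arrived. Just from the look of it, this is a high-quality tanrei-style ramen. It is impossible not to have high expectations. Now for the first sip.. Delicious! The mellowness of the soy sauce and the umami of foie gras spread through the mouth in an exquisite soup that is irresistible! It was without doubt a delicious ramen. I was very satisfied.
Looking around the shop, I saw rows of autographs from celebrities and ramen critics.
It has apparently been featured on TV too, and I only learned afterwards that it was a well-known shop.
The reason it had few Tabelog reviews seems to be that only two years had passed since its reopening after renewal.
To those in the know, calling it a "hidden gem" may be an exaggeration, but I think there is no doubt that it is a little-known spot where you can eat high-level ramen without queuing.
麺屋 中川會 住吉店：https://tabelog.com/tokyo/A1312/A131201/13205611/
Ramen column：https://www.syokuraku-web.com/column/3161/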
4. Issues
『麺屋 中川會 住吉店』, the shop introduced this time, struck me personally as a ramen shop that really is similar to 『蔦』, but looking at all the results in detail, the results for some shops were questionable.
For example, when toppings get extracted by TF-IDF as the characteristic words, ramen shops with different kinds of soup end up with high similarity to each other. Also, when a single shop serves different kinds of ramen and the reviews are evenly split between them, for example a shop whose shio, shoyu, and miso ramen are all of high quality, the words extracted by TF-IDF may line up contradictory terms such as 「濃厚・淡麗・塩・味噌・醤油」 (rich, light, salt, miso, soy sauce).
How to resolve these problems is a topic for future work.
5. Summary
Although my doctor had ordered a stop and told me to cut back on ramen, this one time I could not hold back and ate it...
This time I took on machine learning with the aim of digging up a hidden gem, and although issues remain, I think I broadly achieved the original goal.
『麺屋 中川會 住吉店』, the hidden gem I dug up, is outside the area where I usually spend my time, and without this project I would probably never have eaten there in my life.
For me, because the road to get here was harder than I expected, the "hidden gem" I found on my own became a memorable "bowl".
Next time, as a bonus episode, I plan to try extracting "ramen shops with cute staff" from Tabelog reviews with natural language processing.