@15698-sai (開基斎藤) posted at 2022-11-04

Common terms between two files

Q&A

Closed

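What I want to solve

I want terms to be counted as common terms regardless of upper or lower case.
This program compares a CSV file with a text file and counts every common term it finds.
I would like the output to show a count of 2 for both China and acute, but I could not work out how, so I would appreciate your help.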

Source code

with open('1.csv','r') as f:
    rows = f.readlines()

with open('1.txt','r') as f:
    text = f.read()

id_count = {}

with open('re_1.txt','w') as f:
    # header: term, count, id
    f.write('用語, 回数, id\n')
    for row in rows:
        tmp = row.split(',')
        id = tmp[0]
        word = tmp[1].strip()
        # case-sensitive substring count
        count = text.count(word)
        if count==0:
            pass
        else:
            f.write('%s,%d,%s\n' % (word, count, id))
            if id in id_count:
                id_count[id] += count
            else:
                id_count[id] = count

1.txt

Severe Acute respiratory distress syndrome due to acute coronavirus (SARS-CoV-2), which was first diagnosed in china, China in December 2019.

1.csv

0000,acute
0000,distress
1111,coronavirus
1111,China

Execution result

用語, 回数, id
acute,1,0000
distress,1,0000
coronavirus,1,1111
China,1,1111

0 likes

2 Answers

@tonberry1050 posted at 2022-11-04

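Convert every extracted word to all uppercase or all lowercase before comparing and saving it.
With str.lower(), China and china both become china, so they can be treated as the same string.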

2 Likes

Comments

@tonberry1050
It is a rough fix, but in this case `word = tmp[1].strip().lower()` and `count = text.lower().count(word)` should do it.
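A minimal sketch of the two changes suggested above, assuming the sample data from the question is held in memory instead of being read from 1.txt and 1.csv. The names `count_terms`, `TEXT`, and `CSV_ROWS` are introduced here for illustration only.

```python
# Sample data copied from the question (1.txt and 1.csv).
TEXT = ("Severe Acute respiratory distress syndrome due to acute coronavirus "
        "(SARS-CoV-2), which was first diagnosed in china, China in December 2019.")
CSV_ROWS = ["0000,acute\n", "0000,distress\n", "1111,coronavirus\n", "1111,China\n"]


def count_terms(text, rows):
    """Return (word, count, id) tuples, ignoring letter case."""
    lowered_text = text.lower()  # lowercase the text once, not once per row
    results = []
    for row in rows:
        term_id, word = row.strip().split(",", 1)
        word = word.strip().lower()  # lowercase the dictionary term as well
        count = lowered_text.count(word)  # substring count, as in the question
        if count:
            results.append((word, count, term_id))
    return results


for word, count, term_id in count_terms(TEXT, CSV_ROWS):
    print("%s,%d,%s" % (word, count, term_id))
# acute,2,0000
# distress,1,0000
# coronavirus,1,1111
# china,2,1111
```

Note that the terms are written out in lowercase, and that `str.count` counts substrings, so a term would also match inside a longer word.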

It is also a good idea to convert uppercase to lowercase at the point where terms are entered into the dictionary file.
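One way to apply that advice is to normalize the dictionary once, before any counting. The sketch below assumes the dictionary uses the same `id,term` layout as 1.csv; `normalize_dictionary` is a hypothetical helper, and `io.StringIO` objects stand in for real files.

```python
import csv
import io


def normalize_dictionary(src, dst):
    """Copy id,term rows from src to dst with every term lowercased."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        writer.writerow([row[0], row[1].strip().lower()])


# In-memory stand-ins for the dictionary file and its normalized copy.
src = io.StringIO("0000,acute\n1111,China\n")
dst = io.StringIO()
normalize_dictionary(src, dst)
print(dst.getvalue())
```

With the dictionary already in lowercase, only the text side needs `lower()` at comparison time.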

@HalHarada posted at 2022-11-04

with open(...) as f: closes the file when the with block ends, so using f = open(...) and closing it yourself is another option. Also, yield can be handy instead of return.

Now,
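To make the two points concrete, here is a small sketch that shows a file closed by `with`, the same thing written with explicit `open`/`close`, and a generator that uses `yield`. The helper `read_lines` and the temporary file are assumptions made for this example, not part of the question.

```python
import os
import tempfile


def read_lines(path):
    """Yield stripped lines one at a time; the file closes when iteration ends."""
    with open(path, "r") as f:
        for line in f:
            yield line.strip()


# A temporary file stands in for 1.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("0000,acute\n1111,China\n")
    path = tmp.name

# Explicit open/close: the same effect as `with`, written out by hand.
f = open(path, "r")
try:
    first_line = f.readline().strip()
finally:
    f.close()

lines = list(read_lines(path))  # the generator produces rows lazily
os.remove(path)
print(first_line, lines)
```

The `with` form is usually the safer choice because the file is closed even if an exception is raised partway through.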

import csv

def getCount(text, word):
    # .......
    # regular expression (body elided in this answer)
    # .......
    return count

def setCount(txt, csv2):
    # Read the whole text file up front, closing it by hand.
    f1 = open(txt, 'r')
    text = f1.read()
    f1.close()

    # Yield one (word, count, id) tuple per CSV row.
    with open(csv2, 'r') as f:
        line = csv.reader(f, delimiter=',')
        for rows in line:
            id = rows[0]
            word = rows[1]
            count = getCount(text, word)
            yield word, count, id

with open('re_1.txt', 'w') as f:
    # header: term, count, id
    f.write('用語, 回数, id\n')
    for word, count, id in setCount('1.txt', '1.csv'):
        f.write('%s,%d,%s\n' % (word, count, id))
    f.flush()

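First, get the input and output into shape.
Separating responsibilities into functions makes the programming much easier!
As I recall, there is a regular-expression function that returns the positions of the matches and the number of matches.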
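The answer does not name the function it has in mind, and the body of `getCount` is elided. As one possible reading, the standard library's `re.finditer` yields a match object for every hit, from which both the positions and the count can be taken. The helper `find_matches` below is a hypothetical name, and ignoring case with `re.IGNORECASE` is an assumption based on the question.

```python
import re


def find_matches(text, word):
    """Return (positions, count) for case-insensitive matches of word in text."""
    # re.escape keeps characters such as '(' or '.' in a term from being
    # interpreted as regular-expression syntax.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    positions = [m.start() for m in pattern.finditer(text)]
    return positions, len(positions)


text = "first diagnosed in china, China in December 2019."
positions, count = find_matches(text, "China")
print(positions, count)  # [19, 26] 2
```

Like `str.count`, this matches substrings. Wrapping the escaped term in `\b...\b` would restrict it to whole words, at the cost of behaving differently from the original program.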

An idle person x, at an izakaya

1 Like
