Removing comments with GCC and counting words
https://researchmap.jp/joo4thhg9-1778110/
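To gather material for analysis, I strip the comments from TOPPERS/ssp and count the instructions, variable names, and function names that make up the effective program. The steps are:
1 Remove comments with GCC
2 Count words with a script
3 Analyze
I looked for a program that removes comments but could not find a suitable one,
so I decided to strip them with GCC.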
1
gcc-4.9 -fpreprocessed -dD -E input-file >> output-file
gcc-4.9 -fpreprocessed -dD -E alarm.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E alarm.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E allfunc.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E arm_m.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E banner.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E banner.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E cfg1_out.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E check.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E cy8c5xlp.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E cyclic.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E cyclic.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E dataqueue.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E dataqueue.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E eventflag.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E eventflag.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E exception.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E exception.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E interrupt.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E interrupt.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E itron.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_cfg.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_cfg.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_impl.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_int.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_rename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E kernel_unrename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E log_output.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E log_output.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E logtask.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E logtask.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_cfg1_out.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_config.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_config.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_insn.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_kernel.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_rename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_sil.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_stddef.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_test.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_timer.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_timer.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E prc_unrename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E queue.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E rx600_uart.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E rx600_uart.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E sample1.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E sample1.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E serial.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E serial.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E sil.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E startup.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E sys_manage.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E syslog.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E syslog.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E t_stddef.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E t_syslog.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_cfg1_out.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_config.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_config.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_kernel.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_rename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_serial.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_serial.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_sil.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_stddef.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_syssvc.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_test.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_timer.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E target_unrename.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E task.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E task.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E task_manage.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E time_event.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E time_event.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E time_manage.c >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E tool_stddef.h >> ../cmt/text
gcc-4.9 -fpreprocessed -dD -E vasyslog.c >> ../cmt/text
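Typing one command per file is easy to get wrong, so the same batch could be driven from a script. The following is a minimal sketch in Ruby, assuming the compiler is invoked as `gcc-4.9` with the flags shown above and that the output is appended to a single file. The helper names `strip_comment_command` and `strip_comments` are hypothetical and do not come from the original work.

```ruby
# Sketch: run the comment-stripping command over many files.
# The helper names are hypothetical; only the gcc flags come from the text.

# Build the argument vector for one source file.
def strip_comment_command(src, cc = "gcc-4.9")
  [cc, "-fpreprocessed", "-dD", "-E", src]
end

# Append the comment-free text of every file in sources to out_path.
# Returns the files for which the compiler failed or could not be started.
def strip_comments(sources, out_path, cc = "gcc-4.9")
  sources.reject do |src|
    # out: [path, "a"] opens the file in append mode, like >> in the shell.
    system(*strip_comment_command(src, cc), out: [out_path, "a"])
  end
end
```

A call such as `strip_comments(Dir.glob("*.[ch]").sort, "../cmt/text")` would then cover every `.c` and `.h` file in the current directory, and a non-empty return value lists the files that need a second look.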
The processing above produced the following warnings.
sample1.h:58:0: warning: "ERRORTSK_PRIORITY" redefined
#define ERRORTSK_PRIORITY (2)
^
sample1.h:47:0: note: this is the location of the previous definition
#define ERRORTSK_PRIORITY (6)
^
sample1.h:59:0: warning: "MAIN_PRIORITY" redefined
#define MAIN_PRIORITY (3)
^
sample1.h:48:0: note: this is the location of the previous definition
#define MAIN_PRIORITY (7)
^
sample1.h:60:0: warning: "TASK1_PRIORITY" redefined
#define TASK1_PRIORITY (4)
^
sample1.h:49:0: note: this is the location of the previous definition
#define TASK1_PRIORITY (8)
^
sample1.h:61:0: warning: "TASK2_PRIORITY" redefined
#define TASK2_PRIORITY (5)
^
sample1.h:50:0: note: this is the location of the previous definition
#define TASK2_PRIORITY (9)
^
sample1.h:62:0: warning: "TASK3_PRIORITY" redefined
#define TASK3_PRIORITY (6)
^
sample1.h:51:0: note: this is the location of the previous definition
#define TASK3_PRIORITY (10)
^
sample1.h:63:0: warning: "TASK3_EXEPRIORITY" redefined
#define TASK3_EXEPRIORITY (5)
^
sample1.h:52:0: note: this is the location of the previous definition
#define TASK3_EXEPRIORITY (9)
^
t_stddef.h:234:0: warning: "assert" redefined
#define assert(exp) ((void) 0)
^
t_stddef.h:231:0: note: this is the location of the previous definition
#define assert(exp) ((void)((exp) ? 0 : (TOPPERS_assert_fail(#exp, \
^
t_stddef.h:260:0: warning: "MERCD" redefined
#define MERCD(ercd) ((ER)((((uint_t) (ercd))) | ~0xffU))
^
t_stddef.h:258:0: note: this is the location of the previous definition
#define MERCD(ercd) ((ER)((int8_t)(ercd)))
^
t_stddef.h:274:0: warning: "TMAX_RELTIM" redefined
#define TMAX_RELTIM ((RELTIM) LONG_MAX)
^
t_stddef.h:272:0: note: this is the location of the previous definition
#define TMAX_RELTIM ((RELTIM) UINT_MAX)
^
#2
###2.1 First version, taken almost unchanged from the URL below
#!/usr/bin/ruby
# -*- mode:ruby; coding:utf-8 -*-
# 2014.12.29 Edited by Dr. OGAWA Kiyoshi
words = Hash.new(0)
File.open("full.txt","r") do |file|
  file.read.downcase.scan(/\p{Letter}+/) do |word|
    words[word] += 1
  end
end
print "WORD\tFREQUENCY\n"
words.sort_by{|word,count| [-count,word]}.each do |word,count|
  print "#{word}\t#{count}\n"
end
#https://sites.google.com/…/ruby-guan…/tango-hindo-wo-kazoeru
###2.2 Version supporting standard input and output
Version 0.2 dropped `_`.
The results showed that this may be fine for literature, but for programs it wastes effort unless words containing `_` are examined as whole words, so version 0.3 keeps `_`.
The source is:
#!/usr/bin/ruby
# -*- mode:ruby; coding:utf-8 -*-
# Word Counter for source code in programming language without comment
# ver.0.1 2014.12.29, ver.0.2 2014.12.29, ver.0.3 2014.12.30
# Edited by Dr. OGAWA Kiyoshi
words = Hash.new(0)
while buf = STDIN.gets
  break if buf.chomp == "exit"
  buf.scan(/\w+/) do |word|
    words[word] += 1
  end
end
print "WORD\tFREQUENCY\n"
words.sort_by{|word,count| [-count,word]}.each do |word,count|
  print "#{word}\t#{count}\n"
end
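The effect of keeping `_` is easiest to see on a single line of C. The sketch below assumes that version 0.2 tokenized with the `\p{Letter}+` pattern of the first version. The sample line and the method names `split_letters` and `split_identifiers` are made up for illustration.

```ruby
# Compare the two tokenizers on a made-up line of C.

# Letters only: a word ends at `_` and at digits.
def split_letters(line)
  line.scan(/\p{Letter}+/)
end

# \w+: letters, digits and `_` stay inside one word.
def split_identifiers(line)
  line.scan(/\w+/)
end

line = "ER ercd = act_tsk(TASK1); uint8_t n;"
p split_letters(line)      # ["ER", "ercd", "act", "tsk", "TASK", "uint", "t", "n"]
p split_identifiers(line)  # ["ER", "ercd", "act_tsk", "TASK1", "uint8_t", "n"]
```

With the first pattern, `act_tsk` and `uint8_t` fall apart into fragments that never appear as names in the program, which is the waste mentioned above.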
ps. 2015/04/06
$ ./wc.rb < n1570.txt > n1570wc.csv
./wc.rb:12:in `scan': invalid byte sequence in UTF-8 (ArgumentError)
from ./wc.rb:12:in `<main>'
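The `ArgumentError` above is raised when `scan` meets a string tagged as UTF-8 whose bytes are not valid UTF-8, which suggests that `n1570.txt` contains such bytes. One possible workaround, sketched below, is to replace the offending bytes with `String#scrub` (available since Ruby 2.1) before scanning. The method name `count_words` and the choice of a space as the replacement are assumptions for illustration, not the author's fix.

```ruby
require "stringio"

# Count \w+ tokens from an IO, tolerating bytes that are not valid UTF-8.
def count_words(io)
  words = Hash.new(0)
  io.each_line do |buf|
    # Tag the line as UTF-8, then replace invalid byte sequences with a space.
    clean = buf.force_encoding(Encoding::UTF_8).scrub(" ")
    clean.scan(/\w+/) { |word| words[word] += 1 }
  end
  words
end

# The byte 0xE9 followed by a space is not a valid UTF-8 sequence.
sample = StringIO.new("caf\xE9 int x;\nint y;\n".b)
p count_words(sample)  # {"caf"=>1, "int"=>2, "x"=>1, "y"=>1}
```

Passing `STDIN` instead of the `StringIO` sample would give the same interface as the script above. If the real encoding of the input is known, converting it properly is likely a better choice than scrubbing.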
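3 Analysis
Processed data (from an early stage of the work)
https://researchmap.jp/mu5yptkmp-45645/
Columns: word, frequency, category, full spelling
I have started sorting the words into these categories:
1 C reserved words and similar
2 Definitions needed in a C program (not OS-specific)
3 TOPPERS OS proper names (functions, variables)
4 Constants and similar
I am still unsure about some of them, so once everything has been classified I will consult an expert.
Words that occur only once get a separate classification:
1 Defined in assembler and called from C
2 Defined but unused (intended to be called from applications and the like)
3 Defined but unused (other than the above)
4 Constants and similar
Category 3 will be examined closely. Once that work is done, I will consult an expert.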
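Separating the once-only words from the rest is mechanical and could be scripted before the manual classification starts. The following is a minimal sketch, assuming the tab-separated output with a header line that the script in 2.2 prints. The method name `split_by_frequency` and the sample rows are hypothetical.

```ruby
# Split "WORD\tFREQUENCY" text into words seen once and words seen more often.
def split_by_frequency(tsv)
  rows = tsv.lines.drop(1).map(&:chomp).reject(&:empty?)  # skip header and blanks
  counts = rows.map do |row|
    word, count = row.split("\t")
    [word, Integer(count)]
  end
  once, repeated = counts.partition { |_word, count| count == 1 }
  [once.map(&:first), repeated.map(&:first)]
end

# Made-up sample rows, not taken from the real data.
sample = "WORD\tFREQUENCY\nint\t42\nact_tsk\t7\nfoo_handler\t1\n"
once, repeated = split_by_frequency(sample)
p once      # ["foo_handler"]
p repeated  # ["int", "act_tsk"]
```

The first list would feed the four once-only categories and the second the four general ones. Assigning a category to each word remains manual work.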
----wc.rb
Version that folds upper and lower case together
#!/usr/bin/ruby
# Word Counter for source code in programming language without comment or standard document without 0C .
# ver.0.1 2014.12.29,
# ver.0.2 2014.12.29,
# ver.0.3 2014.12.30 standard I/O
# ver.0.4 2015.4.13 downcase
# https://sites.google.com/site/rubycocoamemo/Home/ruby-guan-lian/tango-hindo-wo-kazoeru
# Edited by Dr. OGAWA Kiyoshi
words = Hash.new(0)
while buf = STDIN.gets
  break if buf.chomp == "exit"
  buf.downcase.scan(/\w+/) do |word|
    words[word] += 1
  end
end
print "WORD\tCOUNT\n"
words.sort_by{|word,count| [-count,word]}.each do |word,count|
  print "#{word}\t#{count}\n"
end
p.s. Added 2017-07-09
$ chmod 0777 wc.rb
$ ./wc.rb pcd2.txt
$ ./wc.rb: line 11: syntax error near unexpected token `('
$ ./wc.rb: line 11: `words = Hash.new(0)'
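The "syntax error near unexpected token" message comes from the shell, not from Ruby. That suggests the file was run as a shell script, which typically happens when `#!/usr/bin/ruby` is not the very first line of the file. Separately, the script reads only standard input, so a file name given as an argument would be ignored. The sketch below shows one way to accept both forms, assuming the counting logic of ver.0.4. The method names `count_words_downcase` and `print_table` are hypothetical.

```ruby
#!/usr/bin/ruby
# The shebang line above has to be the first line of the script file.
require "stringio"

# Count case-folded \w+ tokens from any object that responds to each_line.
def count_words_downcase(io)
  words = Hash.new(0)
  io.each_line do |buf|
    buf.downcase.scan(/\w+/) { |word| words[word] += 1 }
  end
  words
end

# Print the table in the order wc.rb uses: count descending, then word.
def print_table(words, out = $stdout)
  out.print "WORD\tCOUNT\n"
  words.sort_by { |word, count| [-count, word] }.each do |word, count|
    out.print "#{word}\t#{count}\n"
  end
end

# Demonstration on an in-memory sample instead of a real file.
print_table(count_words_downcase(StringIO.new("Task task TASK_ID\n")))
```

In a real script the last line could become `print_table(count_words_downcase(ARGF))`. `ARGF` reads the files named on the command line, or standard input when none are given, so both `./wc.rb pcd2.txt` and `./wc.rb < pcd2.txt` would then work.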
p.s. 2017
# Python 3 syntax
with open('sample.txt') as f:
    data = f.read()
# counting
words = {}
for word in data.split():
    words[word] = words.get(word, 0) + 1
# sort by count, highest first
d = [(v, k) for k, v in words.items()]
d.sort(reverse=True)
# print the 1000 most frequent words
for count, word in d[:1000]:
    print(count, word)