Trying to predict the answers to the 2020 Center Test Japanese exam

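Background

I'm studying for the "Python 3 Engineer Certification Data Analysis Exam", so I decided to try analyzing some real data.
I previously solved Center Test math problems with Python, so this time I'm staying with the same exam and predicting the answers to the Japanese (国語) section.
I skipped math because its answers can be digits, signs, or symbols, which makes them hard to narrow down.
I want to try this while the Center Test is still a mark-sheet (multiple-choice) exam.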

Environment

  • Python 3
  • Jupyter Notebook

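Preparation

I collected as many past answer keys as I could find online.
The values are listed in answer order, but some questions ask for two choices, so the column numbers do not match the question numbers exactly.
The number of answers also differs from year to year, so missing cells are filled with the column mean.
`add` in the index means a makeup exam (追試).
For now, the prediction is simply the mode of each column.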


year,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
2019,3,2,4,3,2,4,2,2,2,4,2,3,1,2,3,5,2,1,4,6,2,4,5,4,5,3,1,2,3,4,2,5,3,5,2,3,,,
2018,2,3,5,5,2,2,5,3,1,4,4,2,5,5,3,1,4,3,3,6,1,3,5,3,2,3,4,4,3,1,3,4,3,3,2,4,,,
2017,5,5,3,1,4,5,4,3,4,3,1,1,2,1,4,5,2,4,4,5,3,3,4,5,2,2,4,1,5,2,2,3,2,4,2,1,,,
2016,3,5,5,3,5,1,2,4,2,1,3,5,3,2,1,4,3,2,1,4,3,5,1,1,4,4,4,3,5,4,1,1,4,4,3,5,,,
2015,5,5,2,4,4,3,2,4,2,3,4,3,5,3,2,3,2,3,1,2,1,5,4,2,1,5,3,3,4,5,5,3,4,4,4,3,4,5,2
2014,2,3,4,2,3,4,2,4,3,2,1,5,4,1,3,4,3,1,4,6,5,1,4,5,3,2,1,4,4,3,5,1,5,3,1,5,,,
2013,1,3,1,1,5,1,5,2,5,3,1,1,5,4,2,4,2,1,2,4,2,4,3,1,5,4,2,5,3,2,1,3,3,2,4,3,5,,
2012,3,2,5,3,5,4,5,1,3,4,1,4,3,,15,1,3,2,3,5,4,,25,2,1,4,3,5,2,1,2,1,4,5,3,4,,,
2011,5,4,2,4,1,5,4,3,1,4,3,4,5,2,4,1,4,5,3,5,2,4,5,1,4,3,5,2,5,4,2,4,3,1,5,1,4,,
2010,3,1,1,2,2,3,5,5,3,4,3,2,4,3,1,1,2,4,3,4,4,3,2,1,3,5,1,5,4,5,5,3,4,1,5,4,,,
2009,1,5,3,1,5,3,3,2,4,3,6,1,3,5,3,5,1,4,4,5,3,1,4,4,3,2,1,5,2,3,1,4,3,5,5,4,,,
2008,4,3,2,1,3,1,2,3,1,3,4,4,4,2,2,5,2,1,2,4,1,3,4,5,2,4,2,3,4,3,4,3,2,3,5,4,2,5,
2007,1,4,5,5,3,2,4,5,3,2,3,5,5,1,4,2,2,2,5,4,3,2,5,1,3,4,1,5,2,5,1,5,3,3,4,3,,,
2006,2,2,2,4,5,1,4,5,3,1,1,3,5,1,4,5,3,2,3,1,4,1,2,1,1,2,4,1,3,4,5,5,2,1,2,,,,
2005_1_2_add,1,2,1,2,4,5,2,4,3,2,5,4,5,2,3,2,1,3,6,2,4,1,5,3,4,4,3,4,1,5,2,3,5,4,,,,,
2005_1_add,4,5,2,1,5,3,1,4,5,1,2,2,5,3,1,2,3,4,1,5,2,2,1,2,5,3,3,4,2,5,5,1,2,4,,,,,
2005_1_2,5,1,3,4,4,4,3,3,2,4,4,2,5,4,1,3,2,4,4,5,3,1,2,5,3,1,4,1,4,5,5,3,2,2,3,,,,
2004_1_2,1,1,4,5,3,1,5,3,2,4,5,2,4,1,5,1,5,4,3,3,4,1,2,4,5,2,3,2,1,5,3,5,2,1,1,6,,,
2004_1_add,5,3,4,2,2,2,1,3,3,4,1,4,2,2,5,1,3,1,6,3,4,1,1,5,5,2,4,4,3,4,2,1,4,5,3,4,,,
2004_1_2_add,5,4,4,5,3,5,2,3,1,4,6,5,4,1,3,4,4,1,2,5,3,2,1,4,3,5,5,3,1,4,5,3,4,4,1,2,1,2,5
2003_1_2,2,5,3,4,1,4,3,5,2,1,3,4,2,5,4,5,3,1,2,2,5,1,5,4,4,3,2,2,4,2,3,3,5,4,1,5,,,
2003_1_2_add,1,5,3,2,4,2,5,4,3,1,6,2,5,3,4,5,1,3,2,4,2,1,5,3,2,3,1,4,4,3,1,2,5,2,3,1,5,,
2003_1_add,5,3,4,1,2,4,3,1,1,3,5,4,3,2,2,4,5,1,3,6,4,4,2,3,4,1,6,5,3,5,1,5,4,3,2,4,7,,
2002_1_add,2,5,1,4,4,4,5,3,1,1,5,1,2,5,2,4,4,3,3,5,2,2,5,3,4,3,1,5,2,5,1,4,4,1,3,,,,
2002_1_2_add,2,1,4,3,5,4,5,1,4,2,6,4,3,3,3,1,5,3,4,1,3,5,5,5,5,2,3,4,3,4,4,2,4,1,3,2,3,,
2001_1_add,3,2,4,4,5,1,3,2,2,5,4,3,1,5,1,2,4,5,5,3,1,2,1,4,5,3,2,4,2,2,4,5,1,3,,,,,
2001_1_2_add,3,1,2,5,4,2,4,4,5,1,3,1,3,2,4,1,5,4,3,4,1,1,3,2,2,3,1,4,2,3,5,3,5,2,2,3,4,,
2000,1,3,5,4,3,3,4,1,5,1,6,2,5,3,4,2,5,3,4,5,5,2,5,1,4,3,2,5,1,3,6,2,4,3,2,1,5,,
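
As a quick check of the layout described above, the following sketch loads three rows copied verbatim from the data, counts how many answers each exam has, and flags makeup exams by the `add` marker. The variable names (`csv_text`, `answers_per_exam`, `is_makeup`) are my own and not part of the original program.

```python
import io
import pandas as pd

# Three rows copied verbatim from the data above, with the header row
csv_text = """year,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
2019,3,2,4,3,2,4,2,2,2,4,2,3,1,2,3,5,2,1,4,6,2,4,5,4,5,3,1,2,3,4,2,5,3,5,2,3,,,
2015,5,5,2,4,4,3,2,4,2,3,4,3,5,3,2,3,2,3,1,2,1,5,4,2,1,5,3,3,4,5,5,3,4,4,4,3,4,5,2
2005_1_add,4,5,2,1,5,3,1,4,5,1,2,2,5,3,1,2,3,4,1,5,2,2,1,2,5,3,3,4,2,5,5,1,2,4,,,,,
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="year")

# Number of recorded answers per exam (empty cells are read as NaN)
answers_per_exam = df.notna().sum(axis=1)

# Flag makeup exams by the "add" marker in the index label
is_makeup = pd.Series(
    df.index.astype(str).str.contains("add", regex=False),
    index=df.index,
)

print(answers_per_exam)
print(is_makeup)
```

Counting the answers per exam makes it easy to see which trailing columns are sparsely populated before deciding how to fill them.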

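Program

import pandas as pd
# Imputer was removed from sklearn.preprocessing in newer scikit-learn; SimpleImputer replaces it
from sklearn.impute import SimpleImputer

# Load the past answers, using the 'year' column as the index
df = pd.read_csv(r"C:\source\python\file/center_Japanese.csv", index_col='year')
# Fill missing values with the mean of each column
imp = SimpleImputer(strategy='mean')
fill_df = pd.DataFrame(imp.fit_transform(df), columns=list(range(1, len(df.columns) + 1)))
# Take the mode of each column; columns with fewer tied modes are padded with 0
fill_df.mode().fillna(0).to_csv(r"C:\source\python\file/output_center_Japanese.csv")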

Results

The output is shown below.

,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
0,1.0,5.0,4.0,4.0,5.0,4.0,5.0,3.0,3.0,4.0,1.0,4.0,5.0,2.0,4.0,1.0,2.0,1.0,3.0,5.0,3.0,1.0,5.0,1.0,4.0,3.0,1.0,4.0,2.0,5.0,5.0,3.0,4.0,3.0,2.0,4.0,4.0,4.0,3.5
1,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,4.0,3.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

Answer no.: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
Predicted: 1,5 5 4 4 5 4 5 3 3 4 1,3 4 5 2 4 1 2 1 3 5 3,4 1 5 1 4,5 3 1 4 2,3,4 5 5 3 4 3,4 2,3 4 4 4 3.5

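For column 39, the imputed mean (3.5) ended up as the mode, so the result is not great.
That said, recent exams have around 36 answers, so I think it can be ignored.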
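
One possible way to keep a filled-in mean from becoming the mode is to skip the imputation step and take the mode of the raw data, since `DataFrame.mode` ignores missing values by default (`dropna=True`). The sketch below uses a small hypothetical table (`toy`), not the real answer data, where column `c` is mostly empty like column 39.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): column "c" is mostly missing, like column 39
toy = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": [4, 5, 5, np.nan],
    "c": [2, np.nan, np.nan, 5],
})

# Mean-fill first: the mean of "c" (3.5) is inserted twice and wins the mode
filled_mode = toy.fillna(toy.mean()).mode()

# Mode of the raw data: NaN is ignored, so only real answers can win
raw_mode = toy.mode()

print(filled_mode)
print(raw_mode)
```

With the raw data, column `c` reports its two observed values as tied modes instead of the artificial 3.5, so sparse columns show up as ties rather than as non-integer predictions.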
I'm also still working out how to narrow things down where a column has more than one mode.
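
As one idea for the tie problem, here is a minimal sketch that breaks ties by recency: among the tied values, it picks the one that appears in the newest exam. This rule is my own assumption, not something the data justifies, and `pick_mode` is a hypothetical helper. It assumes rows are ordered newest first, as in the CSV, and that each column has at least one answer.

```python
import pandas as pd

def pick_mode(column):
    """Return the mode of a column, breaking ties by the newest occurrence.

    Assumes rows are ordered newest first and the column is not all NaN.
    """
    values = column.dropna()
    counts = values.value_counts()
    tied = counts[counts == counts.max()].index
    # The first (newest) row whose value is one of the tied modes wins
    return values[values.isin(tied)].iloc[0]

# Toy data (hypothetical), newest exam first
toy = pd.DataFrame(
    {"q1": [5, 1, 1, 5, 3], "q2": [2, 2, 4, 4, 4]},
    index=[2019, 2018, 2017, 2016, 2015],
)

prediction = toy.apply(pick_mode)
print(prediction)
```

In `q1` the values 1 and 5 both appear twice, and 5 is chosen because it appears in the newest row; `q2` has a single mode, so no tie-break is needed.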

Thoughts

When I was a student I was told to pick 3, but looking at this, maybe 4 and 5 are actually more common.
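
The impression about which choices are common can be checked by counting every answer across all exams and columns. The sketch below uses only the first five answers of 2019, 2018, and 2017 from the data, so its counts are not representative; running the same lines on the full table would give the real distribution.

```python
import pandas as pd

# First five answers of 2019, 2018 and 2017, copied from the data (truncated sample)
sample = pd.DataFrame(
    [[3, 2, 4, 3, 2],
     [2, 3, 5, 5, 2],
     [5, 5, 3, 1, 4]],
    index=["2019", "2018", "2017"],
    columns=[1, 2, 3, 4, 5],
)

# Flatten every cell into one Series, drop missing cells, count each choice
all_answers = pd.Series(sample.to_numpy().ravel()).dropna()
counts = all_answers.value_counts().sort_index()
share = counts / counts.sum()

print(counts)
print(share.round(2))
```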
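I'm still studying statistics, so I hope to be able to make better predictions later.
If I have the time and the data, I may try other subjects too.
I'd like to compare the prediction with the actual answers once the Center Test results are out.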
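
Such a comparison could be as simple as counting matching positions. In the sketch below, `score` is a hypothetical helper, `predicted` takes the first listed mode for columns 1 to 5 of the result above, and `actual` is a made-up placeholder, not the real 2020 answer key.

```python
def score(predicted, actual):
    """Return (number of hits, hit rate) for two answer lists.

    Positions are compared pairwise; zip stops at the shorter list.
    """
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits, hits / len(actual)

# First listed mode for columns 1-5 of the prediction
predicted = [1, 5, 4, 4, 5]
# Placeholder values only (hypothetical), NOT the real answers
actual = [1, 2, 4, 3, 5]

hits, rate = score(predicted, actual)
print(hits, rate)
```

With five choices per question, blind guessing would be expected to hit about one answer in five, which gives a baseline to compare the hit rate against.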
