Analyzing Europe's Big Five Leagues in the 2021-2022 Season

Posted at 2023-01-16

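Overview of this article

The leagues analyzed here are the domestic leagues of Italy, Spain, France, England, and Germany, the so-called Big Five European leagues.
Using machine learning, I analyze what characterizes forwards who score a lot of goals.

Dataset

The dataset is 2021-2022 Football Player Stats, published on Kaggle.

Main analysis

Loading the data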

import pandas as pd

data = pd.read_csv('2021-2022_Football_Player_Stats.csv',sep = ';', encoding = 'ISO-8859-1')

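The data covers players in every position, but this analysis focuses on attackers, so only the attackers are extracted.

pos = ['FW', 'MFFW', 'FWMF']
# Keep only rows whose position is one of the attacking positions,
# without hard-coding the number of rows in the file.
football_stats = data[data['Pos'].isin(pos)]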

The features in the dataset are listed below. It contains per-90-minute statistics for football players from the 2021-2022 season.

Rk : Rank
Player : Player's name
Nation : Player's nation
Pos : Position
Squad : Squad’s name
Comp : League that squad occupies
Age : Player's age
Born : Year of birth
MP : Matches played
Starts : Matches started
Min : Minutes played
90s : Minutes played divided by 90
Goals : Goals scored or allowed
Shots : Shots total (Does not include penalty kicks)
SoT : Shots on target (Does not include penalty kicks)
SoT% : Shots on target percentage (Does not include penalty kicks)
G/Sh : Goals per shot
G/SoT : Goals per shot on target (Does not include penalty kicks)
ShoDist : Average distance, in yards, from goal of all shots taken (Does not include penalty kicks)
ShoFK : Shots from free kicks
ShoPK : Penalty kicks made
PKatt : Penalty kicks attempted
PasTotCmp : Passes completed
PasTotAtt : Passes attempted
PasTotCmp% : Pass completion percentage
PasTotDist : Total distance, in yards, that completed passes have traveled in any direction
PasTotPrgDist : Total distance, in yards, that completed passes have traveled towards the opponent's goal
PasShoCmp : Passes completed (Passes between 5 and 15 yards)
PasShoAtt : Passes attempted (Passes between 5 and 15 yards)
PasShoCmp% : Pass completion percentage (Passes between 5 and 15 yards)
PasMedCmp : Passes completed (Passes between 15 and 30 yards)
PasMedAtt : Passes attempted (Passes between 15 and 30 yards)
PasMedCmp% : Pass completion percentage (Passes between 15 and 30 yards)
PasLonCmp : Passes completed (Passes longer than 30 yards)
PasLonAtt : Passes attempted (Passes longer than 30 yards)
PasLonCmp% : Pass completion percentage (Passes longer than 30 yards)
Assists : Assists
PasAss : Passes that directly lead to a shot (assisted shots)
Pas3rd : Completed passes that enter the 1/3 of the pitch closest to the goal
PPA : Completed passes into the 18-yard box
CrsPA : Completed crosses into the 18-yard box
PasProg : Completed passes that move the ball towards the opponent's goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area
PasAtt : Passes attempted
PasLive : Live-ball passes
PasDead : Dead-ball passes
PasFK : Passes attempted from free kicks
TB : Completed pass sent between back defenders into open space
PasPress : Passes made while under pressure from opponent
Sw : Passes that travel more than 40 yards of the width of the pitch
PasCrs : Crosses
CK : Corner kicks
CkIn : Inswinging corner kicks
CkOut : Outswinging corner kicks
CkStr : Straight corner kicks
PasGround : Ground passes
PasLow : Passes that leave the ground, but stay below shoulder-level
PasHigh : Passes that are above shoulder-level at the peak height
PaswLeft : Passes attempted using left foot
PaswRight : Passes attempted using right foot
PaswHead : Passes attempted using head
TI : Throw-Ins taken
PaswOther : Passes attempted using body parts other than the player's head or feet
PasCmp : Passes completed
PasOff : Offsides
PasOut : Out of bounds
PasInt : Intercepted
PasBlocks : Blocked by the opponent who was standing in the path
SCA : Shot-creating actions
ScaPassLive : Completed live-ball passes that lead to a shot attempt
ScaPassDead : Completed dead-ball passes that lead to a shot attempt
ScaDrib : Successful dribbles that lead to a shot attempt
ScaSh : Shots that lead to another shot attempt
ScaFld : Fouls drawn that lead to a shot attempt
ScaDef : Defensive actions that lead to a shot attempt
GCA : Goal-creating actions
GcaPassLive : Completed live-ball passes that lead to a goal
GcaPassDead : Completed dead-ball passes that lead to a goal
GcaDrib : Successful dribbles that lead to a goal
GcaSh : Shots that lead to another goal-scoring shot
GcaFld : Fouls drawn that lead to a goal
GcaDef : Defensive actions that lead to a goal
Tkl : Number of players tackled
TklWon : Tackles in which the tackler's team won possession of the ball
TklDef3rd : Tackles in defensive 1/3
TklMid3rd : Tackles in middle 1/3
TklAtt3rd : Tackles in attacking 1/3
TklDri : Number of dribblers tackled
TklDriAtt : Number of times dribbled past plus number of tackles
TklDri% : Percentage of dribblers tackled
TklDriPast : Number of times dribbled past by an opposing player
Press : Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball
PresSucc : Number of times the squad gained possession within five seconds of applying pressure
Press% : Percentage of time the squad gained possession within five seconds of applying pressure
PresDef3rd : Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the defensive 1/3
PresMid3rd : Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the middle 1/3
PresAtt3rd : Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the attacking 1/3
Blocks : Number of times blocking the ball by standing in its path
BlkSh : Number of times blocking a shot by standing in its path
BlkShSv : Number of times blocking a shot that was on target, by standing in its path
BlkPass : Number of times blocking a pass by standing in its path
Int : Interceptions
Tkl+Int : Number of players tackled plus number of interceptions
Clr : Clearances
Err : Mistakes leading to an opponent's shot
Touches : Number of times a player touched the ball. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch
TouDefPen : Touches in defensive penalty area
TouDef3rd : Touches in defensive 1/3
TouMid3rd : Touches in middle 1/3
TouAtt3rd : Touches in attacking 1/3
TouAttPen : Touches in attacking penalty area
TouLive : Live-ball touches. Does not include corner kicks, free kicks, throw-ins, kick-offs, goal kicks or penalty kicks.
DriSucc : Dribbles completed successfully
DriAtt : Dribbles attempted
DriSucc% : Percentage of dribbles completed successfully
DriPast : Number of players dribbled past
DriMegs : Number of times a player dribbled the ball through an opposing player's legs
Carries : Number of times the player controlled the ball with their feet
CarTotDist : Total distance, in yards, a player moved the ball while controlling it with their feet, in any direction
CarPrgDist : Total distance, in yards, a player moved the ball while controlling it with their feet towards the opponent's goal
CarProg : Carries that move the ball towards the opponent's goal at least 5 yards, or any carry into the penalty area
Car3rd : Carries that enter the 1/3 of the pitch closest to the goal
CPA : Carries into the 18-yard box
CarMis : Number of times a player failed when attempting to gain control of a ball
CarDis : Number of times a player loses control of the ball after being tackled by an opposing player
RecTarg : Number of times a player was the target of an attempted pass
Rec : Number of times a player successfully received a pass
Rec% : Percentage of time a player successfully received a pass
RecProg : Completed passes that move the ball towards the opponent's goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area
CrdY : Yellow cards
CrdR : Red cards
2CrdY : Second yellow card
Fls : Fouls committed
Fld : Fouls drawn
Off : Offsides
Crs : Crosses
TklW : Tackles in which the tackler's team won possession of the ball
PKwon : Penalty kicks won
PKcon : Penalty kicks conceded
OG : Own goals
Recov : Number of loose balls recovered
AerWon : Aerials won
AerLost : Aerials lost
AerWon% : Percentage of aerials won

For this analysis, I apply the following processing before using the data.

df_football_stats = pd.DataFrame()

# Identification columns
df_football_stats['Player'] = football_stats['Player']
df_football_stats['Nation'] = football_stats['Nation']
df_football_stats['Squad'] = football_stats['Squad']
df_football_stats['League'] = football_stats['Comp']
df_football_stats['Age'] = football_stats['Age']
df_football_stats['MP'] = football_stats['MP']
# The dataset stores per-90 values, so 'Goals' in the source is goals per 90 minutes.
df_football_stats['G/90'] = football_stats['Goals']
df_football_stats['G/Sh'] = football_stats['G/Sh']
# Season totals are recovered as (per-90 value * minutes played) / 90.
df_football_stats['PKGoals'] = ((football_stats['ShoPK'] * football_stats['Min']) / 90).round(0).astype(int)
df_football_stats['shots'] = football_stats['Shots']
df_football_stats['Goals'] = ((football_stats['Goals'] * football_stats['Min'])/90).round(0).astype(int)
df_football_stats['Pass'] = football_stats['PasTotAtt']
df_football_stats['PassCompleted'] = football_stats['PasTotCmp']
# NOTE: as written this is a copy of 'PassCompleted', not a percentage
# (the dataset's own rate is in 'PasTotCmp%').
df_football_stats['PassComp%'] = df_football_stats['PassCompleted']
# NOTE: multiplied by minutes without dividing by 90, unlike the totals above.
df_football_stats['Pass3rd'] = football_stats['Pas3rd'] * football_stats['Min']
df_football_stats['Assist'] = ((football_stats['Assists'] * football_stats['Min']) / 90).round(0).astype(int)
df_football_stats['Assist/90'] = football_stats['Assists']
df_football_stats['Cross'] = football_stats['PasCrs']
df_football_stats['CrossCompleted'] = football_stats['CrsPA']
# NOTE: as written this is a copy of 'CrossCompleted', not a percentage.
df_football_stats['CrossComp%'] = df_football_stats['CrossCompleted']
df_football_stats['Tackle_Won'] = football_stats['TklWon']
df_football_stats['SucDribble'] = ((football_stats['DriSucc'] * football_stats['Min']) / 90).round(0).astype(int)
df_football_stats['Dribble'] = football_stats['DriAtt']
# NOTE: as written this is a copy of 'SucDribble' (the dataset's own rate is in 'DriSucc%').
df_football_stats['DribbleComp%'] = df_football_stats['SucDribble']
df_football_stats['TouAttPen'] = football_stats['TouAttPen']
df_football_stats['Fls'] = football_stats['Fls']
df_football_stats['Fld'] = football_stats['Fld']
df_football_stats['AerWon'] = football_stats['AerWon']
df_football_stats['AerLost'] = football_stats['AerLost']
# NOTE: as written this is a copy of 'AerWon' (the dataset also has an 'AerWon%' column).
df_football_stats['AerWon%'] = football_stats['AerWon']
df_football_stats['Car3rd'] = football_stats['Car3rd']
# 'TouAttPen' is already assigned above, so the duplicate assignment is dropped.
df_football_stats['GCA'] = football_stats['GCA']
df_football_stats['Touches'] = football_stats['Touches']
df_football_stats['ShoDist'] = football_stats['ShoDist']
df_football_stats['CarMis'] = football_stats['CarMis']
df_football_stats['CPA'] = football_stats['CPA']
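
The conversion from a per-90 value back to a season total, used for columns such as Goals and Assist, can be sketched in isolation as follows. The helper name `per90_to_total` and the toy numbers are assumptions made for illustration; they are not taken from the dataset.

```python
import pandas as pd


def per90_to_total(per90: pd.Series, minutes: pd.Series) -> pd.Series:
    """Convert a per-90-minute rate into a rounded season total."""
    return (per90 * minutes / 90).round(0).astype(int)


# Toy values (not from the dataset): per-90 goal rates and minutes played.
toy = pd.DataFrame({"Goals": [1.0, 0.5], "Min": [2700, 1800]})
toy["GoalsTotal"] = per90_to_total(toy["Goals"], toy["Min"])
print(toy)
```

A player with 1.0 goals per 90 over 2700 minutes has played 30 "nineties", which gives 30 goals; rounding is needed because the per-90 values in a published dataset are typically stored with limited precision.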

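Correlation coefficients

The features most strongly correlated with Goals are shown below.
A strong positive correlation means that as one variable increases, the other tends to increase as well.
The stronger the positive correlation, the closer the coefficient is to 1.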
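
As a small illustration of what the coefficient measures, the sketch below computes the Pearson correlation by hand with NumPy. The function name `pearson` and the toy series are assumptions for illustration only; pandas' `DataFrame.corr` uses the Pearson coefficient by default.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # deviations from the mean
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Toy data (not from the dataset).
shots = [1.0, 2.0, 3.0, 4.0]
goals_up = [2.0, 4.0, 6.0, 8.0]    # rises together with shots
goals_down = [8.0, 6.0, 4.0, 2.0]  # falls as shots rise

r_up = pearson(shots, goals_up)
r_down = pearson(shots, goals_down)
print(r_up, r_down)
```

A perfectly increasing linear relationship gives a coefficient of 1 and a perfectly decreasing one gives -1; real features fall somewhere in between.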

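# numeric_only=True skips the text columns (Player, Nation, Squad, League),
# which recent pandas versions no longer drop silently.
corr_df = df_football_stats.corr(numeric_only=True)
corr_df.sort_values('Goals', ascending=False).head(15).style.background_gradient(axis=None)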

Output

Top 10 goal scorers

df_football_stats.sort_values('Goals', ascending = False).head(10).iloc[:, [0,1,2,3,4,5,10]]
Player               Nation  Squad          League          Age  MP  Goals
Robert Lewandowski   POL     Bayern Munich  Bundesliga      33   34  35
Kylian Mbappé        FRA     Paris S-G      Ligue 1         23   35  28
Karim Benzema        FRA     Real Madrid    La Liga         34   32  27
Ciro Immobile        ITA     Lazio          Serie A         32   31  27
Wissam Ben Yedder    FRA     Monaco         Ligue 1         31   37  25
Patrik Schick        CZE     Leverkusen     Bundesliga      26   27  24
Son Heung-min        KOR     Tottenham      Premier League  29   35  23
Mohamed Salah        EGY     Liverpool      Premier League  29   35  23
Erling Haaland       NOR     Dortmund       Bundesliga      21   24  22
Lautaro Martínez     ARG     Inter          Serie A         24   35  21

The top scorer in this ranking is Robert Lewandowski.

Top 10 assist providers

df_football_stats.sort_values('Assist', ascending = False).head(10).iloc[:, [0,1,2,3,4,5,15]]
Player               Nation  Squad        League          Age  MP  Assist
Kylian Mbappé        FRA     Paris S-G    Ligue 1         23   35  18
Lionel Messi         ARG     Paris S-G    Ligue 1         34   26  14
Domenico Berardi     ITA     Sassuolo     Serie A         27   33  14
Ousmane Dembélé      FRA     Barcelona    La Liga         25   21  13
Mohamed Salah        EGY     Liverpool    Premier League  29   35  13
Christopher Nkunku   FRA     RB Leipzig   Bundesliga      24   34  13
Marco Reus           GER     Dortmund     Bundesliga      32   29  13
Karim Benzema        FRA     Real Madrid  La Liga         34   32  12
Benjamin Bourigeaud  FRA     Rennes       Ligue 1         28   38  12
Moussa Diaby         FRA     Leverkusen   Bundesliga      22   32  12

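In the assist ranking, first place goes to Kylian Mbappé.
Mbappé is also second in the goal ranking, so it is clear at a glance that he is heavily involved in his team's goals.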

Top 10 by successful dribbles

df_football_stats.sort_values('SucDribble', ascending = False).head(10).iloc[:, [0,1,2,3,4,5,21]]
Player               Nation  Squad           League          Age  MP  SucDribble
Allan Saint-Maximin  FRA     Newcastle Utd   Premier League  25   35  140
Kylian Mbappé        FRA     Paris S-G       Ligue 1         23   35  103
Vinicius Júnior      BRA     Real Madrid     La Liga         21   35  101
Rafael Leão          POR     Milan           Serie A         22   34  95
Sofiane Boufal       MAR     Angers          Ligue 1         28   29  95
Houssem Aouar        FRA     Lyon            Ligue 1         23   36  75
Patrick Wimmer       AUT     Arminia         Bundesliga      20   31  75
Lucas Paquetá        BRA     Lyon            Ligue 1         24   35  75
Emmanuel Dennis      NGA     Watford         Premier League  24   33  73
Wilfried Zaha        CIV     Crystal Palace  Premier League  29   33  73

First place in this ranking goes to Allan Saint-Maximin.

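Machine learning

The model used here is LightGBM.
Categorical features are label-encoded (with scikit-learn's OrdinalEncoder in the code below), because one-hot encoding would produce too many features.
The target variable is the number of goals, and the remaining columns are used as features.
No hyperparameter tuning was done.
The code is as follows.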

# Label encoding
from sklearn.preprocessing import OrdinalEncoder
dummies_col = ['Nation', 'Squad', 'League']
oe = OrdinalEncoder()
encoded = oe.fit_transform(df_football_stats[dummies_col].values)
decoded = oe.inverse_transform(encoded)  # recovers the original labels if needed
# Reuse the original index so the encoded values line up row by row;
# after filtering, df_football_stats keeps the row labels of the full dataset.
df_labelencode = pd.DataFrame(encoded, columns=dummies_col, index=df_football_stats.index)
df_football_stats['Nation'] = df_labelencode['Nation']
df_football_stats['Squad'] = df_labelencode['Squad']
df_football_stats['League'] = df_labelencode['League']
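
To make the trade-off between the two encodings concrete, here is a minimal sketch on a toy column. It uses pandas only (`get_dummies` and category codes) rather than scikit-learn, and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy column (not from the dataset) with three distinct categories.
toy = pd.DataFrame({"League": ["Serie A", "La Liga", "Ligue 1", "Serie A"]})

# One-hot encoding: one new column per distinct category.
one_hot = pd.get_dummies(toy["League"])

# Integer (label/ordinal) encoding: a single column of category codes,
# assigned in sorted order of the category names.
codes = toy["League"].astype("category").cat.codes

print(one_hot.shape[1])  # number of one-hot columns
print(codes.tolist())    # one integer per row
```

With dozens of nations and squads, one-hot encoding would add a column for each of them, while integer codes keep each categorical feature in a single column. The order of the codes is arbitrary, which tree-based models tolerate better than linear models do.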

# Model
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# The target is the goal total; drop it along with the player name,
# matches played, and the goal-rate columns.
x = df_football_stats.drop(['MP', 'Player', 'G/90', 'Goals', 'G/Sh'], axis=1)
y = df_football_stats['Goals']

x_train, x_val, y_train, y_val = train_test_split(
    x,
    y,
    random_state=42,
    test_size=0.1,
)
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'rmse',
}

lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_val, y_val, reference=lgb_train)

model = lgb.train(
    params,
    train_set=lgb_train,
    valid_sets=[lgb_eval],
    num_boost_round=1000,
    # Callbacks replace the verbose_eval / early_stopping_rounds arguments,
    # which were removed from lgb.train in LightGBM 4.0.
    callbacks=[
        lgb.early_stopping(stopping_rounds=10),
        lgb.log_evaluation(period=20),
    ],
)

y_valid_pred = model.predict(x_val)
score = np.sqrt(mean_squared_error(y_val, y_valid_pred))
print(f' RMSE: {score}')

Result: RMSE 1.8077498197050759
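
For readers less familiar with the metric, the sketch below computes RMSE by hand on toy numbers. The function name `rmse` and the values are assumptions for illustration and are unrelated to the model's actual predictions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy goal totals (not from the dataset).
actual = [10, 5, 0, 3]
predicted = [8, 5, 2, 4]

score = rmse(actual, predicted)
print(score)
```

Because RMSE is expressed in the same unit as the target, a value around 1.8 can be read roughly as a typical prediction error of about two goals per player on the validation set.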

Feature importance

lgb.plot_importance(model)

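Discussion

According to LightGBM's feature importance, the features that matter most for goal-scoring forwards are pass completion close to the opponent's goal, number of shots, number of successful dribbles, and touches in the penalty area. This suggests that a high-scoring forward is a player who can do three things near the opponent's goal: dribble, pass, and shoot.
For the defenders and the goalkeeper, a player with more options is harder to defend against.
For this reason, I consider Mbappé, who ranks near the top in both successful dribbles and assists, to be a very difficult forward to defend against.
Looking at the league feature, its importance is low. In other words, it is unlikely that some leagues are simply easier to score in than others.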
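
Low feature importance is only indirect evidence about leagues, so one complementary check is to compare goal totals per league directly. The sketch below shows the idea on made-up values; the league labels and numbers are assumptions for illustration, not results from the dataset.

```python
import pandas as pd

# Toy data (not from the dataset): goal totals for players in three leagues.
toy = pd.DataFrame({
    "League": ["A", "A", "B", "B", "C", "C"],
    "Goals": [10, 6, 9, 7, 8, 8],
})

# Summarize goals per league; similar means and medians would be consistent
# with the league having little influence on scoring.
league_summary = toy.groupby("League")["Goals"].agg(["mean", "median", "count"])
print(league_summary)
```

On the real data the same `groupby` could be run on `df_football_stats` before the League column is encoded, so that the league names remain readable.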

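Summary

In this article, I used a simple data analysis to look at the characteristics of forwards who score a lot of goals.
The analysis suggests that, to rank near the top of a league, it is important to have many options in front of goal.
However, only one season of data from the Big Five European leagues was used, so the results might change with more data. It would also be interesting to use running data or video data.
This article was written purely to satisfy my own curiosity, but thank you for reading.

If you want to look at the code in detail, see the link below.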
https://github.com/ka1to0324/2021-2022_football_analysis/blob/main/2021-2022_football_analysis.ipynb
