Chelsea FC is an English professional football club based in Fulham, West London…
- Full name : Chelsea Football Club
- Founded : 10 March 1905
- Head coach : Mauricio Pochettino
- Winning record : English league (1954-55, 2004-05, 2005-06, 2009-10, 2014-15, 2016-17), UCL (2011-12, 2020-21), UEL (2012-13, 2018-19), etc…
However, the team has been performing poorly recently.
Let's find the player Chelsea FC needs most and strengthen the team.
Python | FIFA23 OFFICIAL DATASET
Data, Modules Loading and Config
Data Loading
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import warnings # ignore warning messages
warnings.filterwarnings('ignore')
# load data (The FIFA23 data doesn't reflect 'Main Position' well, so we are using FIFA22 data)
data = pd.read_csv("/content/drive/MyDrive/Dataset/FIFA dataset/FIFA22_official_data.csv")
Basic Exploration
data.head()
Delete unnecessary columns
print(data.columns)
data = data.drop(columns=['Photo', 'Flag', 'Club Logo', 'Real Face'])
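Re-running a notebook cell like `data = data.drop(columns=[...])` raises a `KeyError` the second time, because the columns are already gone. pandas' `DataFrame.drop` accepts `errors='ignore'` to skip missing labels. A minimal sketch on a toy frame (the columns and values here are made up, not the FIFA22 file):

```python
import pandas as pd

# Toy frame standing in for the FIFA22 data (illustrative columns only)
frame = pd.DataFrame({'Name': ['A'], 'Photo': ['url'], 'Flag': ['url'], 'Overall': [80]})

# errors='ignore' skips labels that are absent ('Club Logo', 'Real Face' here),
# so the same cell can be re-run without a KeyError
cols_to_drop = ['Photo', 'Flag', 'Club Logo', 'Real Face']
frame = frame.drop(columns=cols_to_drop, errors='ignore')
print(frame.columns.tolist())  # ['Name', 'Overall']
```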
Exploratory Data Analysis of Chelsea FC
EDA, Data Visualization
1. Check the positions and ages of Chelsea players
Chelsea = data[data['Club']=='Chelsea']
plt.figure(figsize=(10,6))
sns.countplot(x = Chelsea['Age'])
plt.figure(figsize=(10,6))
sns.countplot(x = Chelsea['Best Position'])
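Both count plots are bar charts of frequencies, so the same information can be read as a table with `value_counts`. A minimal sketch on a few made-up rows (hypothetical ages and positions, not Chelsea's actual squad):

```python
import pandas as pd

# Hypothetical squad rows (not real data)
squad = pd.DataFrame({
    'Age': [21, 24, 24, 29, 33, 24],
    'Best Position': ['CB', 'CM', 'ST', 'CB', 'GK', 'CB'],
})

# The counts that sns.countplot draws, as tables
age_counts = squad['Age'].value_counts().sort_index()
position_counts = squad['Best Position'].value_counts()
print(age_counts)
print(position_counts)
```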
1.1 Check Overall ratings by position using a box-and-whisker plot
plt.figure(figsize = (10,6))
sns.boxplot(data=Chelsea, x='Best Position', y='Overall')
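A box plot summarizes each group by its median and spread. The underlying numbers can be computed with `groupby(...).agg(...)`; a minimal sketch on made-up rows shaped like the `Best Position` and `Overall` columns (the ratings are hypothetical):

```python
import pandas as pd

# Hypothetical rows with the same column names as the FIFA22 data (not real ratings)
squad = pd.DataFrame({
    'Best Position': ['GK', 'CB', 'CB', 'CM', 'CM', 'ST'],
    'Overall': [85, 83, 79, 86, 80, 84],
})

# Count, median and range of Overall per position: the numbers behind the box plot
summary = squad.groupby('Best Position')['Overall'].agg(['count', 'median', 'min', 'max'])
print(summary)
```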
2. Compare with other strong teams
I chose Real Madrid CF and Arsenal FC, which I consider strong teams, for comparison.
# Collect Chelsea, Arsenal and Real Madrid CF players in df1
df1 = data[data['Club'].isin(['Chelsea', 'Arsenal', 'Real Madrid CF'])]
# Keep only players valued at €1M or more (values in millions end in 'M')
df1 = df1[df1['Value'].str.contains('M')]
# Likewise, delete unnecessary columns
df1.info()
df1 = df1.drop(columns=['Marking'])
# Convert Value to float (in millions): strip the leading currency symbol and the trailing 'M'
df1['Value'] = df1['Value'].str.slice(1,-1).astype(float)
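The filter above keeps only values ending in 'M', and `str.slice(1,-1)` strips the first and last characters. If players valued in thousands are ever needed, both suffixes have to be handled. A minimal sketch, assuming values are formatted like '€107.5M' and '€925K' (`parse_value_millions` is a hypothetical helper, not part of the original notebook):

```python
import pandas as pd

def parse_value_millions(value: str) -> float:
    """Convert a value string such as '€107.5M' or '€925K' to millions of euros."""
    # Assumed format: '€' prefix, a number, then an optional 'M' or 'K' suffix
    number = value.lstrip('€')
    if number.endswith('M'):
        return float(number[:-1])
    if number.endswith('K'):
        return float(number[:-1]) / 1000
    return float(number) / 1_000_000  # plain euros without a suffix

values = pd.Series(['€107.5M', '€925K', '€0'])
print(values.map(parse_value_millions).tolist())  # [107.5, 0.925, 0.0]
```

With this helper, `df1['Value'].map(parse_value_millions)` would replace the slice-based conversion and no longer require the 'M' filter.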
2.1 Identify weak positions
cs = df1[df1['Club']=='Chelsea'].sort_values(by='Overall', ascending=False)
rm = df1[df1['Club']=='Real Madrid CF'].sort_values(by='Overall', ascending=False)
ar = df1[df1['Club']=='Arsenal'].sort_values(by='Overall', ascending=False)
data['Best Position'].unique()
To compare the main players, select a starting lineup for each team based on Overall.
Using a 4-4-2 formation, we select 1 GK, 4 defenders (grouped as CB), 4 midfielders (grouped as MF) and 2 forwards (grouped as ST).
gk_list = ['GK']
cb_list = ['CB','RB','LB','RWB','LWB']
mf_list = ['CAM','CM','CDM','LM','RM']
st_list = ['ST','LW','RW','CF']  # any position not listed above is also treated as ST

def select_lineup(team):
    # team must be sorted by Overall (descending), so the best player per slot is picked first
    team = team.copy()
    slots = {'GK': 1, 'CB': 4, 'MF': 4, 'ST': 2}
    selected_ids = []
    for index in team.index:
        position = team.at[index, 'Best Position']
        if position in gk_list:
            group = 'GK'
        elif position in cb_list:
            group = 'CB'
        elif position in mf_list:
            group = 'MF'
        else:
            group = 'ST'
        if slots[group] != 0:
            # relabel with the group name so clubs can be compared line by line
            team.at[index, 'Best Position'] = group
            selected_ids.append(team.at[index, 'ID'])
            slots[group] -= 1
    return team[team['ID'].isin(selected_ids)].copy()

cs = select_lineup(cs)
rm = select_lineup(rm)
ar = select_lineup(ar)
# Put them back together for comparison
df = pd.concat([cs, ar, rm], axis=0)
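The lineup selection is a greedy pass: walk the squad from highest to lowest Overall and take a player only while his position group still has an open slot. A minimal pure-Python sketch of the same idea on a made-up squad (all names and ratings are hypothetical; `pick_lineup` and `position_group` are illustrative helpers, not from the original notebook):

```python
# Hypothetical squad: (name, best position, overall), not real data
squad = [
    ('Keeper A', 'GK', 85), ('Keeper B', 'GK', 80),
    ('Back A', 'CB', 84), ('Back B', 'RB', 83), ('Back C', 'LB', 82),
    ('Back D', 'CB', 81), ('Back E', 'LWB', 79),
    ('Mid A', 'CM', 86), ('Mid B', 'CDM', 84), ('Mid C', 'CAM', 83),
    ('Mid D', 'RM', 80), ('Mid E', 'LM', 78),
    ('Fwd A', 'ST', 87), ('Fwd B', 'LW', 85), ('Fwd C', 'RW', 82),
]

GROUPS = {'GK': ['GK'], 'CB': ['CB', 'RB', 'LB', 'RWB', 'LWB'], 'MF': ['CAM', 'CM', 'CDM', 'LM', 'RM']}
SLOTS_442 = {'GK': 1, 'CB': 4, 'MF': 4, 'ST': 2}

def position_group(position):
    """Map a detailed position to GK/CB/MF; everything else counts as ST."""
    for group, positions in GROUPS.items():
        if position in positions:
            return group
    return 'ST'

def pick_lineup(players, slots):
    """Greedily fill each group's slots with the highest-rated players."""
    remaining = dict(slots)
    lineup = []
    for name, position, overall in sorted(players, key=lambda p: p[2], reverse=True):
        group = position_group(position)
        if remaining[group] > 0:
            lineup.append((name, group, overall))
            remaining[group] -= 1
    return lineup

lineup = pick_lineup(squad, SLOTS_442)
print(len(lineup))  # 11
```

Here 'Fwd C' (82) misses out because both forward slots are already taken by higher-rated players, even though he outrates some selected defenders and midfielders. If a squad has too few players in a group, that slot simply stays empty.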
plt.figure(figsize=(10,6))
sns.boxplot(data=df, x='Best Position', y='Overall', hue='Club')
plt.figure(figsize=(10,6))
sns.boxplot(data=df, x='Best Position', y='Value', hue='Club')
Compared with the other teams, Chelsea FC is weaker overall, and its defense is particularly weak.
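The box plots are read by eye. One way to put a number on a conclusion like a weaker defense is a pivot table of mean Overall per position and club. A minimal sketch on made-up lineups (hypothetical ratings, not the FIFA22 values):

```python
import pandas as pd

# Hypothetical lineup rows for two clubs (made-up ratings)
lineups = pd.DataFrame({
    'Club': ['Chelsea'] * 3 + ['Real Madrid CF'] * 3,
    'Best Position': ['CB', 'CB', 'ST'] * 2,
    'Overall': [80, 82, 84, 86, 84, 85],
})

# Mean Overall per position (rows) and club (columns)
table = lineups.pivot_table(index='Best Position', columns='Club', values='Overall', aggfunc='mean')
# Positive gap means Chelsea trails the other club at that position
table['Gap'] = table['Real Madrid CF'] - table['Chelsea']
print(table)
```

In this toy example the largest gap is at CB, which is the kind of signal the box plots point to.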
Solution, Data exploration
1. Define a custom scoring formula
Create a score for each player by weighting what I think matters most in a player (e.g. Age, Overall, Potential).
Point = (Overall*2+Potential)/Age
cs['Point'] = (cs['Overall']*2+cs['Potential'])/cs['Age']
cs[cs['Best Position']=='CB'][['Name', 'Overall', 'Potential','Age','Point','Joined','Value']]
Thiago Silva is one of my favorite players, but his score is low because of his age.
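Because Age is the denominator of Point = (Overall*2+Potential)/Age, the score drops sharply for older players even when their ratings are identical. A worked example with two hypothetical players (`point` is an illustrative helper that simply restates the formula):

```python
def point(overall: int, potential: int, age: int) -> float:
    """Point = (Overall*2 + Potential) / Age, as defined above."""
    return (overall * 2 + potential) / age

# Hypothetical players with identical ratings but different ages
young = point(85, 90, 21)    # (170 + 90) / 21
veteran = point(85, 90, 36)  # (170 + 90) / 36
print(round(young, 2), round(veteran, 2))  # 12.38 7.22
```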
2. Find a replacement player
# Center-backs who are not at Chelsea FC and have an Overall of 83 or higher
market = data[(data['Best Position']=='CB')&(data['Club']!='Chelsea')&(data['Overall']>=83)].copy()
market['Point'] = (market['Overall']*2+market['Potential'])/market['Age']
want = market[['Name','Club','Age','Overall','Potential','Value','Point','Joined']]
want
fig, ax = plt.subplots(nrows=2,ncols=2)
fig.set_size_inches((12,8))
plt.subplots_adjust(wspace = 0.4, hspace = 0.2)
sns.barplot(data=want, x='Age', y='Name', ax=ax[0,0])
sns.barplot(data=want, x='Overall', y='Name', ax=ax[0,1])
sns.barplot(data=want, x='Potential', y='Name', ax=ax[1,0])
sns.barplot(data=want, x='Point', y='Name', ax=ax[1,1])
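The bar charts compare candidates visually; the pick itself is simply the row with the highest Point. A minimal sketch with `sort_values` on made-up candidates (names and ratings are hypothetical):

```python
import pandas as pd

# Hypothetical transfer targets (names and ratings are made up)
want = pd.DataFrame({
    'Name': ['Player X', 'Player Y', 'Player Z'],
    'Age': [21, 24, 30],
    'Overall': [85, 87, 88],
    'Potential': [90, 90, 88],
})
want['Point'] = (want['Overall'] * 2 + want['Potential']) / want['Age']

# Rank candidates by Point and take the top one
ranked = want.sort_values('Point', ascending=False)
best = ranked.iloc[0]['Name']
print(ranked[['Name', 'Point']])
print(best)  # Player X
```

Note that the highest-rated candidate here ('Player Z', Overall 88) ranks last, because the formula rewards youth heavily.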
Conclusion: Chelsea FC should offer a contract to M. de Ligt, who has the highest Point.