Chelsea FC is an English professional football club based in Fulham, West London…

  • Full name : Chelsea Football Club
  • Founded : 10 March 1905
  • Head coach : Mauricio Pochettino
  • Winning record : EPL (1954-55,2004-05,2005-06,2009-10,2014-15,2016-17), UCL (2011-12, 2020-21), UEFA (2012-13,2018-19), etc…

However, these teams hava been performing poorly recently.
Let’s find the best player Chelsea FC needs and strengthen the team.

source code
Python | FIFA23 OFFICIAL DATASET
run
28.3s

Data, Modules Loading and Config

Data Loading

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import warnings # ignoring warning massages
warnings.filterwarnings('ignore')
# load data (The FIFA23 data doesn't reflect 'Main Position' well, so we are using FIFA22 data)
data = pd.read_csv("/content/drive/MyDrive/Dataset/FIFA dataset/FIFA22_official_data.csv")

Basic Exploration

Basic Exploration

data.head()

output: image

Delete unnecessary columns

print(data.columns)
data = data.drop(columns=['Photo', 'Flag', 'Club Logo', 'Real Face'])

ouput: image

Exploritory Data Analysis of Chelsea FC

EDA, Data Visualization

1. Check the positions and ages of Chelsea players

Chelsea = data[data['Club']=='Chelsea']
plt.figure(figsize=(10,6))
sns.countplot(x = Chelsea['Age'])

output: image

plt.figure(figsize=(10,6))
sns.countplot(x = Chelsea['Best Position'])

output: image

1.1 Check overalls by position using a box-and-whisker diagram

plt.figure(figsize = (10,6))
sns.boxplot(data=Chelsea, x='Best Position', y='Overall')

output: image

2. Compare with other strong teams

I choose data from Real Madrid CF and Arsenal FC which i think are strong teams

# Insert data of Arsenal FC and Real Madrid CF in df1
df1 = data[(data['Club']=='Chelsea')|(data['Club']=='Arsenal')|(data['Club']=='Real Madrid CF')]

# Filtering players whose value over 1M
df1 = df1[df1['Value'].str.contains('M')]

# Likewise, delete unnecessary columns
df1.info()
df1 = df1.drop(columns=['Marking'])

# Conversion data in Value column to Float type
df1['Value'] = df1['Value'].str.slice(1,-1).astype(float)

output: image

2.1 Identify insufficeint positions

cs = df1[df1['Club']=='Chelsea'].sort_values(by='Overall',ascending=False)
rm = df1[df1['Club']=='Real Madrid CF'].sort_values(by='Overall',ascending=False)
mc = df1[df1['Club']=='Manchester City'].sort_values(by='Overall',ascending=False)
data['Best Position'].unique()

output: image

To compare the main players select starting lineup for each team (Based on overall)

Based on 4-4-2 tactics, 1 GK, 4 CB, 4 MF, 2 ST are selected

gk_list = ['GK']
cb_list = ['CB','RB','LB','RWB','LWB']
mf_list=['CAM','CM','CDM','LM','RM']
st_list=['ST','LW','RW','CF']
st_count = 2
mf_count = 4
cb_count = 4
gk_count = 1

cs_id = []
for index in cs.index:
  if cs['Best Position'][index] in gk_list:
    if gk_count != 0:
      cs_id.append(cs['ID'][index])
      gk_count -= 1
  elif cs['Best Position'][index] in cb_list:
    if cb_count != 0:
      cs['Best Position'][index]='CB'
      cs_id.append(cs['ID'][index])
      cb_count -= 1
  elif cs['Best Position'][index] in mf_list:
    if mf_count != 0:
      cs['Best Position'][index]='MF'
      cs_id.append(cs['ID'][index])
      mf_count -= 1
  else:
    if st_count != 0:
      cs['Best Position'][index]='ST'
      cs_id.append(cs['ID'][index])
      st_count -= 1
cs = cs[cs['ID'].isin(cs_id)]
st_count=2
mf_count=4
cb_count=4
gk_count=1

rm_id=[]
for index in rm.index:
  if rm['Best Position'][index] in gk_list:
    if gk_count != 0:
      rm_id.append(rm['ID'][index])
      gk_count -= 1
  elif rm['Best Position'][index] in cb_list:
    if cb_count != 0:
      rm['Best Position'][index]='CB'
      rm_id.append(rm['ID'][index])
      cb_count -= 1
  elif rm['Best Position'][index] in mf_list:
    if mf_count != 0:
      rm['Best Position'][index]='MF'
      rm_id.append(rm['ID'][index])
      mf_count -= 1
  else:
    if st_count != 0:
      rm['Best Position'][index]='ST'
      rm_id.append(rm['ID'][index])
      st_count -= 1
rm=rm[rm['ID'].isin(rm_id)]
st_count=2
mf_count=4
cb_count=4
gk_count=1

mc_id=[]
for index in mc.index:
  if mc['Best Position'][index] in gk_list:
    if gk_count != 0:
      mc_id.append(mc['ID'][index])
      gk_count -= 1
  elif mc['Best Position'][index] in cb_list:
    if cb_count != 0:
      mc['Best Position'][index]='CB'
      mc_id.append(mc['ID'][index])
      cb_count -= 1
  elif mc['Best Position'][index] in mf_list:
    if mf_count != 0:
      mc['Best Position'][index]='MF'
      mc_id.append(mc['ID'][index])
      mf_count -= 1
  else:
    if st_count != 0:
      mc['Best Position'][index]='ST'
      mc_id.append(mc['ID'][index])
      st_count -= 1
mc = mc[mc['ID'].isin(mc_id)]
# Put them back together for comparison
df = pd.concat([cs,mc,rm], axis=0)
plt.figure(figsize=(10,6))
sns.boxplot(data=df, x='Best Position', y='Overall', hue='Club')

output: image

plt.figure(figsize=(10,6))
sns.boxplot(data=df, x='Best Position', y='Value', hue='Club')

output: image

Comparing other teams, Chelsea Fc is inferior overall and thier defense is particularly weak

Solution, Data exploration

1. Making own formula

Create a formula for player’s score by weighting what i think is most important in a player (eg. age, overall, etc..)
Point = (Overall*2+Potential)/Age

cs['Point'] = (cs['Overall']*2+cs['Potential'])/cs['Age']
cs[cs['Best Position']=='CB'][['Name', 'Overall', 'Potential','Age','Point','Joined','Value']]

output: image

Thiago Silva is one of my most favorite players but his score is low since his age

2. Find a replacement player

# Players who are not in Chelsea FC and have an overall over 83
market=data[(data['Best Position']=='CB')&(data['Club']!='Chelsea')&(data['Overall']>=83)]
market['Point']=(market['Overall']*2+market['Potential'])/market['Age']
want = market[['Name','Club','Age','Overall','Potential','Value','Point','Joined']]
want

output: Uploading image.png…

fig, ax = plt.subplots(nrows=2,ncols=2)

fig.set_size_inches((12,8))
plt.subplots_adjust(wspace = 0.4, hspace = 0.2)

sns.barplot(data=want, x='Age', y='Name', ax=ax[0,0])
sns.barplot(data=want, x='Overall', y='Name', ax=ax[0,1])
sns.barplot(data=want, x='Potential', y='Name', ax=ax[1,0])
sns.barplot(data=want, x='Point', y='Name', ax=ax[1,1])

output: image

Conclusion: Chelsea FC offers a contract to M.de Light who has the highest point