Predictive Models
From problem statement to solution approach, leading to result and conclusion.

The above image of a basketball player would have made you to think what a basketball player had to do with predictive models in ML. But this is what we will be focussing on in this story, where I will be talking about the improvement in basketball players, and using ML for making predictive model. Although this is not one of the projects I had done, but this was something so great that I just can’t stop sharing with you all. So let’s get started.
As stated above, this story not only discuss the problem statement, but also discuss about the solution approach, which will ultimately lead to some result and conclusion. The problem statement is as below:
Based on data like the player’s age, his position, his performance in last season, how much a player will improve the next season.
As in my previous story, I had already discussed about getting a dataset from kaggle, and how we then convert it into dataframe, extract required information(data cleaning) and stuffs, this is what we are not going to talk about today. We will directly jump to the step after data cleaning i.e Exploratory Data Analysis. Many a times, we don’t have target variable in the dataset, instead we deduce it from the given parameters or through their combinations. This is what we do in here. We focus on players improvement, hence we calculate the difference of “win shares” between two consecutive years as the target variable.
- Relationship between improvement and age : Younger players are expected to improve much faster and effectively as compared to older players. This argument is supported by the following figure showing improvement of players of different ages using a box plot.

2. Relationship between improvement and playing time/duration : It is mostly observed in strenuous sports like hockey, football, basketball that the time duration of play also affects how much a player improves in his/her skills. Usually players with short playing time comes up with significant improvement and so improvement is directly proportional to short and effective participation.
3. Relationship between improvement and number of games played : If a player had played less number of games, then it may be because he was in a break, or may be injured and recovering. While those who had gone exhaustive involvement may face fatigue or may have several injuries, which will adversely affect their performances in the coming season. Below scatter plot will show the same as stated.

4. Relationship between improvement and last year’s improvement : Sometimes a player improves his game with time, although it’s not the case always. Sometimes, a player revolves around a fixed boundaries, and this is what is shown below in the plot.

Predictive Modelling
There are 2 types of predictive models, i.e regression and classification. Regression models are used to obtain continuous values, which may be rather more predictive in this case, as it helps us to estimate the extend of improvement, while the classification model provides information whether improvement is possible or not (categorical classification). So we can use linear models (linear regression, Ridge regression, and Lasso regression), support vector machines (SVM), random forest, and gradient boost models to the dataset, while root mean squared error (RMSE) as the tuning and evaluation metric. The predicted values had much narrow range than the actual values as shown below.

As a result, the prediction errors were larger as the actual values deviated further from zero as shown.

For the solution we treat players with large improvement/decline with higher weights in model training and evaluation because they were more rare. Using this approach, we found that predicted target values have similar range and distribution as the actual target values, which are shown as below.

Performance of different models
The summary of performance of various models are shown as below.

Conclusion and Result
To come up with the conclusion and solution, we had used a number of factors which make it rather complex, but this is what real time models be like. Making predictive models is not just the only work, more important is the evaluation of performance of different models and their efficiency check, which helps us to choose the most efficient model for a particular case.
Really excited for my 2nd story on Medium.
Feel free to comment below.