My Fantasy Football PPR Prediction

As a fan of professional football, I also partake in fantasy football, as do many others: about 75 million people in the US play. I wanted to see if I could get the upper hand on my friends. That is where I got the idea to use my data analytics skills to see if I can predict the best players to draft in fantasy football.

Rules

For those that don’t know, these are the rules that I play by in fantasy football.

You need to have a group of people to play with, typically between 8 and 12. For the sake of simplicity, let’s say 10 people. The first thing we need to do is draft players for our teams, so we have a computer randomize the order of the draft. We use a snake draft, in which the order reverses every round: in a 10-person league, the person with the 1st pick also has the 20th pick, the 2nd pick also has the 19th pick, and so on, which means the 10th pick also has the 11th pick. This is to even out the draft.
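The snake order described above can be sketched in a few lines of Python. This is only an illustration of the rule, not code from my project; the function name `snake_draft_order` is made up for this example.

```python
def snake_draft_order(num_teams, num_rounds):
    """Map each draft slot (1-based) to the overall pick numbers it holds."""
    picks = {slot: [] for slot in range(1, num_teams + 1)}
    overall = 1
    for rnd in range(num_rounds):
        slots = list(range(1, num_teams + 1))
        # Every second round runs in reverse order: that is the "snake".
        if rnd % 2 == 1:
            slots.reverse()
        for slot in slots:
            picks[slot].append(overall)
            overall += 1
    return picks


# A 10-person league, first two rounds only.
order = snake_draft_order(num_teams=10, num_rounds=2)
print(order[1])   # picks held by the 1st slot
print(order[10])  # picks held by the 10th slot
```

With 10 teams, slot 1 ends up with picks 1 and 20 and slot 10 with picks 10 and 11, matching the description above.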

Your fantasy team consists of 1 Quarterback (QB), 2 Running Backs (RB), 2 Wide Receivers (WR), 1 Tight End (TE), 2 RB/WR/TE (this is a flex position, so you can play an RB, WR, or a TE), 1 Kicker (K), and 1 Defense (DEF). You also have some bench slots to add depth to your team in case of injury or bye weeks.

Each week you have your starting team play against someone else in your league. Whoever has the most points at the end of the week receives a win, and the loser a loss.

This is how Points Per Reception (PPR) is scored (how your players get points):

Passing Yards: 1pt per 25 yards
Passing Touchdowns: 4pts
Passing Interceptions: -2pts
Rushing Yards: 1pt per 10 yards
Rushing Touchdowns: 6pts
Receptions: 1pt
Receiving Yards: 1pt per 10 yards
Receiving Touchdowns: 6pts
2-Point Conversions: 2pts
Fumbles Lost: -2pts

There are other ways to receive points for the Defense and Kicker, but this article only goes into the PPR prediction of QB, RB, WR, and TE.
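To make the scoring concrete, here is a minimal Python sketch of the rules listed above. The function and parameter names are my own for this example, and it assumes fractional points (25 passing yards is worth exactly 1 point, 30 yards is worth 1.2), which is a league setting and may differ in yours.

```python
def ppr_points(pass_yds=0, pass_td=0, interceptions=0,
               rush_yds=0, rush_td=0,
               receptions=0, rec_yds=0, rec_td=0,
               two_pt=0, fumbles_lost=0):
    """Score one offensive stat line with the PPR rules from this article.

    Assumption: yardage earns fractional points instead of being rounded down.
    """
    return (
        pass_yds / 25          # 1pt per 25 passing yards
        + 4 * pass_td          # 4pts per passing touchdown
        - 2 * interceptions    # -2pts per interception
        + rush_yds / 10        # 1pt per 10 rushing yards
        + 6 * rush_td          # 6pts per rushing touchdown
        + 1 * receptions       # 1pt per reception (the "PPR" part)
        + rec_yds / 10         # 1pt per 10 receiving yards
        + 6 * rec_td           # 6pts per receiving touchdown
        + 2 * two_pt           # 2pts per 2-point conversion
        - 2 * fumbles_lost     # -2pts per fumble lost
    )


# Made-up stat line: 80 catches, 1000 receiving yards, 6 receiving touchdowns.
print(ppr_points(receptions=80, rec_yds=1000, rec_td=6))
```

For that made-up receiver the total is 80 + 100 + 36 = 216 points.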

Each week you play a different person in your league. Then around week 12 (it depends on the league), you go to the playoffs, where the top 4 teams based on record play in a single-elimination bracket. The winner of the bracket wins the entire league and, if money was bet, also gets the largest portion of the cash.

In order to win your league it is critical you have a good draft. This means a bad draft could end your season before it even starts. Hence, my goal is to see if I can predict the best players to draft for next season.

Predicting PPR

In order to predict a player’s next year’s PPR, I needed some data to base my predictions on. I downloaded data on all offensive players (QB, RB, WR, TE) from Pro Football Reference.

I collected player data for the years 2016 to 2021. I could have used even more data from previous years, but I wanted to test the idea with a few years first, and six years already gave me more data than I anticipated.

Check out my GitHub repository for my Fantasy Football PPR Prediction.

Once I read all of the files into RStudio I merged each year’s data frame with the next year’s data. This allowed me to compare data from one year to the next of the same player. It also eliminated the data from the players who either didn’t play in the previous year or didn’t play in the following year.

That gave me 5 separate data frames, each holding player data from two consecutive years. I then combined them into one big data frame, which left me with 2456 observations. Some players appear more than once, but this is not a problem since each row compares a different pair of years for the same player.
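I did this step with data frames in RStudio, but the logic is just an inner join of each season with the next one. Here is a plain-Python sketch of that idea, not the code from my repository. The data layout, the field names (`ppr`, `next_ppr`), and the players are all made up for illustration, and it assumes a player's name is enough to match them across years.

```python
def pair_consecutive_seasons(seasons):
    """Pair every season with the following one.

    `seasons` maps year -> {player name -> stats dict}. A player only
    produces a row for a year if they also appear in year + 1, which is
    what drops anyone who did not play in both seasons.
    """
    rows = []
    for year in sorted(seasons):
        next_season = seasons.get(year + 1)
        if next_season is None:
            continue  # the last season has nothing to be paired with
        for player, stats in seasons[year].items():
            if player in next_season:
                rows.append({
                    "player": player,
                    "year": year,
                    **stats,
                    "next_ppr": next_season[player]["ppr"],
                })
    return rows


# Made-up toy data, only to show the shape of the result.
toy_seasons = {
    2016: {"Player A": {"ppr": 150.0}, "Player B": {"ppr": 90.0}},
    2017: {"Player A": {"ppr": 170.0}, "Player C": {"ppr": 60.0}},
    2018: {"Player A": {"ppr": 120.0}, "Player C": {"ppr": 80.0}},
}
for row in pair_consecutive_seasons(toy_seasons):
    print(row)
```

In the toy data, Player B disappears because they have no 2017 season, while Player A shows up twice, once for 2016 to 2017 and once for 2017 to 2018.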

Next, I split this large data frame into subsets by position. This is so I can look through the data without having to worry about mixing player positions.

I really wanted to focus not only on how well the players’ previous-year statistics predicted the following year’s PPR but also on how their stats directly influence the same year’s PPR.

Here is the breakdown of the positions:

Quarterback Plots

I find it really cool how the same-year statistics directly correlate to PPR. This obviously makes sense since PPR is calculated from these statistics. The same statistics plotted against the next year’s PPR, on the other hand, are all over the place.

Also notice the large group of data points right after 250 completions, 3000 passing yards, and 20 passing touchdowns. These are your franchise QBs: they start every game and are consistent. The data points below these markers are either backup QBs or starters who are not franchise QBs and won’t necessarily stick around as starters.

Running Back Plots

The biggest thing I noticed from the running backs’ plots is that players can still score quite a few points without scoring that many touchdowns. Obviously, the more touchdowns you score the more points you get, but there are plenty of players who have more fantasy points and fewer touchdowns than other players.

Wide Receiver Plots

Looking at the WR Receiving Yards vs PPR plot, one point stood out to me: the one with about 600 receiving yards and a little over 200 PPR. I was confused as to why this player wasn’t with all the other players. It turns out this is Tyreek Hill in 2016. He has a lot of fantasy points despite a lack of receiving yards because he had a lot of touchdowns (12 total). He also had 24 rushing attempts for a total of 267 rushing yards, which many WRs don’t get.

Tight End Plots

Not much stood out to me from the TE plots. We just see the same spread of results when comparing statistics to the following year’s PPR. This already gives us an indication that we probably won’t get a great correlation between last year’s statistics and the following year’s PPR.

Regression Models

In order to predict the next year’s PPR, I need to create some regression models. I made not only a prediction regression model but also a regression model of the same year’s PPR to compare. On the left is the regression model of player statistics predicting the same year’s PPR. For all positions, the R-squared is .99. What this means is that 99% of the variation in PPR can be explained by those player statistics. This makes perfect sense because PPR is calculated from these player statistics.
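For readers who have not met R-squared before, it is one minus the ratio of the model’s squared errors to the total squared variation around the mean. My models were fit in RStudio, so the following is only a small Python sketch of the formula itself, with made-up numbers rather than my data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after using the model's predictions.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always guessing the mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Made-up PPR values for three players and a model's guesses for them.
actual_ppr = [100, 200, 300]
predicted_ppr = [110, 190, 300]
print(round(r_squared(actual_ppr, predicted_ppr), 2))  # 0.99
```

Here the leftover squared error is 200 out of a total of 20000, so the model explains 99% of the variation.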

What also makes sense about the left-side regression models are the estimates of each variable. The estimates are the same as the points received in calculating the PPR that I explained earlier, each with great significance. Again, we were expecting all of this. But now let’s compare these regression models to the right-side models, where we see if there is a correlation to the following year’s PPR.

Quarterback Regression

The most significant statistic for the regression model is Passing Interceptions. The model predicts that for every passing interception a QB throws, the following year’s PPR will decrease by 5.4 points, holding all else constant. This makes sense because if a player throws a lot of interceptions in one year, the team might consider benching him for next year, trading him, or just dropping him altogether. This would cause a decrease in PPR for that player.

Another thing to notice is that the model predicts that for every receiving TD a QB scores, the following year’s PPR will increase by 28.8 points, holding all else constant. QB receiving TDs are very rare. So to me, this means a QB who happened to score a receiving TD also had a better PPR the following year, which inflates how much a receiving TD appears to be worth for the following year’s PPR.

The R-squared of this model is .52 which means that 52% of the variation in the following year’s PPR can be explained by the variables in the model.

Running Back Regression

For a running back, passing doesn’t happen very often. So when the regression model predicts an increase of 25 points for each passing TD and a decrease of 72 points for each passing interception, holding all else constant, we can assume the reason is the same as for the QB receiving TD prediction.

What is interesting about this model is that for every 10 yards of rushing, the model predicts that the player will receive .95 points in the following year’s PPR, holding all else constant. That is almost exactly how PPR for rushing yards is calculated: for every 10 yards of rushing the player receives 1 point. On top of that, it is also statistically significant.

The R-squared of this model is .46 which means that 46% of the variation in the following year’s PPR can be explained by the variables in the model.

Wide Receiver Regression

Similarly to the rushing yards for running backs, receiving yards for wide receivers also have an effect. For every 10 yards of receiving, the model predicts that the following year’s PPR will increase by .9 points. Once again this is very close to the actual calculation, where every 10 yards of receiving gains you 1 point. And just like for the running backs, this estimate is also statistically significant.

The R-squared of this model is .48 which means that 48% of the variation in the following year’s PPR can be explained by the variables in the model.

Tight End Regression

The model predicts that for every passing interception a TE throws, the following year’s PPR will increase by 105 points, holding all else constant. It is quite funny to see that this is the result the model pulled from the data, but again it is just because Tight Ends rarely throw the football, which means an interception is even rarer. So if a TE throws an interception and then balls out the next year, that is how we get that kind of result from the model.

The R-squared of this model is .48 which means that 48% of the variation in the following year’s PPR can be explained by the variables in the model.

The Prediction

Using the PPR-predicting regression model, I used the 2021 player data to predict these players’ 2022 PPR. One thing to mention is that some players who played in 2021 have retired or are no longer on a team in 2022, so they will have 0 PPR the following year. The model obviously doesn’t know this and gives a prediction based on the previous year.

Quarterback Prediction

The goal here is to pick a player based on value. Of course, you want to pick the best player, but picking up Josh Allen or Tom Brady isn’t always viable. This is because draft picks are valuable and you would have to use a higher draft pick on Josh Allen than you would on Kirk Cousins.

I use Kirk as an example because, based on the model, he is predicted to be the 8th best QB by PPR this season, but he was only the 11th best QB last year. His draft cost should be low while his predicted points are high.

Running Back Prediction

The running back that stands out most here is Derrick Henry. He is predicted to be the 6th best RB, yet last year he recorded fewer than 200 points. This looks like a case of a low-value player with great potential, but this is where the model falls flat. Those who recall last season know that Derrick Henry was a force to be reckoned with, an absolute menace to NFL defenses. But he got hurt last year, which is why his points didn’t stack up to everyone else’s on this list. He will still be a top RB draft pick and will go first overall in many leagues.

I am looking more towards Antonio Gibson and Dalvin Cook. Low value with a high predicted PPR.

Wide Receiver Prediction

Among the WRs, I am looking at Diontae Johnson and CeeDee Lamb, both great value picks. This all depends on what draft pick I get, since some of these players may have already been taken by the time my pick comes around.

One player my model is telling me to avoid is Deebo Samuel. It looks like he is way overvalued. After a stellar performance last year with 339 points, he is predicted to be only the 15th best WR next year. It’s not that he’s a bad pick, but the model doesn’t value him as the 3rd best WR, which is where he finished last year.

Tight End Prediction

For the TEs, Dallas Goedert looks like a good value pick. But I tend to look at TEs that are lower on the list, since the higher-value TEs use up draft picks that are too early for my liking. I like the Darren Waller pick for my team.

Prediction Caveats

Although using regression models can be beneficial to give predictions about the outcome of the future, it should be taken with a grain of salt. This is because there is a lot of uncertainty regarding the future. For example, here are a few things these models do not account for:

Players switching teams
- The player themselves could be switching teams, either to a good team increasing their potential PPR or a bad team which could limit their PPR
- A player’s teammate gets traded or retires, which could hurt their potential PPR if that teammate influences how they receive points
  - A QB throwing to a WR gets traded
  - A WR getting traded increases the target share for other WRs on the team they left
Players getting hurt
- Last year a player could have been hurt, so the model shows a low PPR for the following year even though that might not be the case if they are now healthy
- Maybe a teammate gets hurt, which could affect their potential PPR in multiple ways
- The model also doesn’t predict whether the player will get hurt the following year
New coaching or management
- New coaches or management can really influence the energy of working there. It is still their job and we all know how much our boss influences our attitude and energy at work.
Personal Conflicts
- This can be anything from a death in the family to having a baby. These things affect how the players play. Being able to predict this is near impossible.

Conclusion

Although the model, data, and plots are really cool, you can’t reliably predict future PPR since there is too much uncertainty. This is very similar to using previous data to predict future stock prices. You can get an estimate of what you think it might be, but unknown factors can easily change the outcome, rendering your model useless.

I will be using my model to help influence my draft decision-making this year in fantasy football. My plan is to look for the best value players and limit my losses. I will add an update to my team when my league is done drafting.