NFL Predictor
Anastasio E. Kesoglidis & Patrick L. Johnson
Introduction
Both of the authors of this program are avid sports fans. When the list of suggested projects included an NFL game predictor, we both quickly agreed that this was what we wanted to work on. Having seen the power of learning during the semester in the domain of face recognition, we thought that it would be both interesting and fun to try to develop a system that could learn to predict the outcome of NFL games. Thus our problem was: can we create an accurate NFL game predictor that uses learning? To create such a system, we gathered statistical data from past NFL seasons and used it as our data set. The system learned which statistics were more important than others by comparing teams that met in the playoffs of past seasons and noting which statistics the winning team was better at. The system was tested on the 1998 NFL season, and it also made predictions about the upcoming 1999 season.
Approach
Usually, intelligent systems in sports domains rely on
statistical data from previous seasons. It
was clear from the start that if we wanted to accurately pick a
winner between two teams, we
would have to look at two things - the players that are on the team and their respective statistics from previous seasons. We decided not to look too far back into the past, as a player's current ability most closely resembles their performance over the past two or three seasons. Thus, we decided to focus on the three most recent seasons' statistics - 1996, 1997, and 1998.
After going through a pool of different statistical categories related to each player's position, we picked out what we considered the most important aspects of each position. In the end, we came up with 27 statistical categories that spanned 8 different positions. For example, we decided that the following 7 statistics were the most important for quarterbacks - number of games started, number of passing attempts, percentage of passes completed, average number of yards per attempt, number of touchdowns thrown, number of interceptions thrown, and quarterback rating.
Because the statistical data was in different files according to position, we had to create a module that would go through each of these files and create a roster for each team. The module went through each statistical file and read in all of the players who played for a particular team, bringing all of the players on a particular team together for us into a TEAM structure. This gathering of players was done for each of the three seasons we used because some players change teams after a season. During the gathering of the players for a particular team and for a particular season, certain statistics for certain players were combined while others were eliminated to create team statistics for each position. For example, combining the total number of sacks that each player had on a team told us how good a team was defensively as a unit, not as individuals. This was not the case for every position: some positions are more individualistic than others. A really good quarterback, for example, means a great deal to a team, whereas, because the defensive line of a team is made up of several players, one good player on the line can be offset by one bad one.
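To make the shape of this module concrete, here is a minimal Java sketch (Java being the language of our applet) of how such a gathering step might look. The file names, the comma-separated record layout, and the Team class are illustrative stand-ins rather than our exact code; only the idea of folding each position file into a per-team structure comes from the design above.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical TEAM structure: accumulates per-category totals for one team.
    class Team {
        String name;
        Map<String, Double> stats = new HashMap<>();

        Team(String name) { this.name = name; }

        // Fold a player's value into the team total for a category
        // (e.g. summing sacks across the defensive line).
        void addStat(String category, double value) {
            stats.merge(category, value, Double::sum);
        }
    }

    public class RosterBuilder {
        // Read one position file (assumed format: team,category,value per line)
        // and combine each player's numbers into the right Team structure.
        static void readPositionFile(String path, Map<String, Team> teams) throws IOException {
            try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",");
                    teams.computeIfAbsent(f[0], Team::new)
                         .addStat(f[1], Double.parseDouble(f[2]));
                }
            }
        }

        public static void main(String[] args) throws IOException {
            Map<String, Team> teams = new HashMap<>();
            // One statistical file per position, repeated for each season used.
            for (String file : new String[] {"qb1998.csv", "rb1998.csv", "def1998.csv"}) {
                readPositionFile(file, teams);
            }
        }
    }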
For the next phase, we had to decide how we wanted to use this gathered data for purposes of comparing two teams. We came to the conclusion that we had to decide which of the 27 gathered statistics were more important than others. For example, is the number of field goals made as important as the number of touchdowns a running back has? To address this problem, we took the gathered statistics for each team in a particular year and ranked the teams in each statistical category. For example, the team that had the most quarterback touchdowns was given a ranking of 1 in that category, the next a 2, etc. Teams that had the same number for a statistic were given equal rankings, and teams for which no data was available for a statistic were given a ranking of 0 (to denote not a bad ranking, but no information).
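A sketch of this ranking step in the same vein; the method signature is illustrative, and the dense handling of ties (equal values share a rank, the next distinct value takes the next rank) is one plausible reading of the scheme described above.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Ranker {
        // Rank teams in one statistical category: best value gets rank 1,
        // ties share a rank, and teams with no data get rank 0.
        // values: team name -> raw stat (teams with no data are absent).
        static Map<String, Integer> rank(Map<String, Double> values, List<String> allTeams) {
            List<Map.Entry<String, Double>> sorted = new ArrayList<>(values.entrySet());
            sorted.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));

            Map<String, Integer> ranks = new HashMap<>();
            int rank = 0;
            Double prev = null;
            for (Map.Entry<String, Double> e : sorted) {
                if (prev == null || !e.getValue().equals(prev)) {
                    rank++;                  // new distinct value -> next rank
                    prev = e.getValue();
                }
                ranks.put(e.getKey(), rank); // equal values share the rank
            }
            for (String t : allTeams) {
                ranks.putIfAbsent(t, 0);     // 0 = no information, not a bad rank
            }
            return ranks;
        }
    }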
Thus we had a ranking for each team for each statistic for three seasons. Breaking the data down into rankings made it somewhat more comparable, since teams that had relatively close numbers ended up with close rankings and those that didn't had big differences in rank. From this data we decided that we were ready to learn.
The approach we chose for learning was to first take the compilation of statistical data and assign every category a weight of 1 to start. Our ultimate goal was to have, by the time we finished our learning, all the weights updated to values that captured what was really important about winning games and devalued meaningless stats. To that end, we looked at playoff games and compared the winning team to the losing team. In statistical areas where the winning team was stronger, we increased the magnitude of that weight; where the team won despite being weaker, we decreased the value. The hope was that over the course of many games, the effect of flukes would dissipate and the important factors would show themselves. The Super Bowl was weighted more heavily than the other playoff games because we felt that the team that won the Super Bowl really had something that the other team was missing that pushed them over the top.
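A sketch of this update rule; the step sizes and the Super Bowl multiplier shown here are placeholder values rather than the exact numbers we used.

    import java.util.Arrays;

    public class WeightLearner {
        static final int NUM_CATEGORIES = 27;
        double[] weights = new double[NUM_CATEGORIES];

        WeightLearner() {
            Arrays.fill(weights, 1.0); // every category starts equally important
        }

        // Update weights from one playoff result. winnerRanks/loserRanks hold the
        // two teams' rankings per category (1 = best, 0 = no data). The step size
        // and Super Bowl multiplier are assumed values for illustration.
        void learn(int[] winnerRanks, int[] loserRanks, boolean superBowl) {
            double step = superBowl ? 0.2 : 0.1; // Super Bowl counts more heavily
            for (int i = 0; i < NUM_CATEGORIES; i++) {
                if (winnerRanks[i] == 0 || loserRanks[i] == 0) continue; // no information
                if (winnerRanks[i] < loserRanks[i]) {
                    weights[i] += step;  // winner was stronger here: stat matters
                } else if (winnerRanks[i] > loserRanks[i]) {
                    weights[i] -= step;  // won despite being weaker: stat devalued
                }
            }
        }
    }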
After all the weights were compiled based on the data and adjusted based on how much they helped teams win, we came up with power ratings for the teams. This power rating was a single number that captured a team's strength. It was calculated by taking the numbers that the team achieved in the 27 different statistical categories, multiplying each by the computed weight for that category, and then summing. This computation was done after all the learning. Comparisons between teams were made based on their power ratings. If two teams had very close power ratings, the program chose the home team and predicted the game would be very close. Otherwise, if the ratings were within another certain threshold, it would output the winner and predict a regular game. For power rating differences above this threshold, it predicted a blowout for the winning team.
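A sketch of the power rating computation and the threshold-based prediction; the cutoff values here are placeholders, since the exact thresholds are not reproduced in this report.

    public class Predictor {
        // Power rating: weighted sum across the 27 categories. "stats" holds the
        // numbers a team achieved in each category; "weights" are the learned values.
        static double powerRating(double[] stats, double[] weights) {
            double rating = 0.0;
            for (int i = 0; i < stats.length; i++) {
                rating += stats[i] * weights[i];
            }
            return rating;
        }

        // Predict a game from two power ratings. The CLOSE and BLOWOUT cutoffs
        // are assumed values for illustration.
        static String predict(String home, double homeRating,
                              String road, double roadRating) {
            final double CLOSE = 5.0, BLOWOUT = 20.0;
            double diff = homeRating - roadRating;
            if (Math.abs(diff) < CLOSE) {
                return home + " wins a very close game (home-field tiebreak)";
            }
            String winner = diff > 0 ? home : road;
            if (Math.abs(diff) < BLOWOUT) {
                return winner + " wins a regular game";
            }
            return winner + " wins in a blowout";
        }
    }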
We created a Java applet for two reasons - simplicity in simulating a season and also for demoing our project to the class. The applet allows you to select a home team, a road team, and the year that the game is to be played. In order for our applet to run faster, we simply hardcoded the power ratings for each team into the applet so as not to waste time learning every time a game is entered. The applet can be accessed online at http://www.columbia.edu/~aek19/nfl.html.
Results of running the program on the 1998 NFL season
In order to test what our program had learned, we took data from 1996 and 1997 and used it as training data to teach the program which statistics were important in determining what makes one team better than another. The most important statistics turned out to be quarterback numbers. This is an understandable result that can be confirmed by casual football knowledge. Armed with the knowledge of this and the other relevant weights, we then got a copy of the 1998 schedule along with the results and ran the program for each game of the season. We recorded what our program predicted the outcome of each game to be and compared that with the actual result of last season. Out of the 240 games, it predicted the correct winner 148 times, or 62% of the time. Details of the program's performance during the 1998 season can be viewed under Appendix A. While we were encouraged by the fact that the program had at least learned something, we observed several problems that we felt might have contributed to the accuracy being lower than we might have hoped.
One problem we had was with the data set that we used to compile team statistics. We didn't realize, at the time we chose this data set or during the early stages while we were using it, that it counted all of the statistics a player had accrued over the whole season toward the team that they were currently on, even if they had only recently been acquired by that team and had in fact gotten those numbers largely while playing for another team. We tried to mitigate this somewhat by using the maximum value a team had for some of the statistics; other times, only a total really made sense, so we were forced to include some of the incorrect data in our program. However, this wasn't that common a case, so we feel that on the whole the effect was rather minor.
As stated in the approach discussion, our program attempted to ascertain what made the difference between winning teams and losing teams. We had a wealth of statistical information available to us, but our program had to try to make sense of it and weed out the largely insignificant values from the ones that really separated the winners from the also-rans. One problem that contributed to lowering our results was the fact that we couldn't learn from data from the current season as we were running the program. This had the effect of making our program naive about the current season. To evaluate the teams from the current season, all we had was previous performance to go on. Through the course of a season, there are often developments that would change how you would evaluate a team. A good example of this in the season we ran was the Minnesota Vikings' performance. Before the season the team had a lot of question marks. Their quarterback was someone who had been out of the league for more than a season. They had just signed a tremendously talented receiver who had been having trouble with the law. Their defense was unproven. Without any prior knowledge about the season, many experts, not just our program, were doubtful the team would do very well. However, a few weeks into the season, experts were starting to change their opinions. Everything went right for this team. The quarterback came out of retirement and played better than he ever had before. The star receiver stayed straight and became one of the best in the league. The defense solidified and quickly cleared the doubts people had. As it became apparent that this was one of the best teams in the league, prognosticators were able to update their opinion on the team and replace the uncertainty with strong confidence. Later in the season they were almost always the favorite and often outscored their opponents by impressive margins. The fact that our program couldn't take advantage of this dynamism probably cost it some games. However, we couldn't get midseason data about the players, so we couldn't make the same changes that football experts and casual observers could. We attribute this to the unpredictability of sports. The team in question, though talented, could very easily have ended up being mediocre, so it wasn't necessarily a bad decision to evaluate them the way the program did; it just didn't work out in this case.
Another problem we had was that we sometimes overestimated talented teams that piled up good numbers but, for some reason or another, didn't put together many wins. This question has mystified coaches and fans for some time now: some teams seem to have all the talent they could hope for but are not able to put it all together. Overall, we felt the results showed that the program had learned respectably well what makes teams win and lose.
Results of running the program on the 1999 NFL season
Our decision to use the data from 1996, 1997, and 1998 to simulate next year's season grew mostly from curiosity. Seeing that we had a performance of 62%, we decided to see what would happen if we added in a third data set. Although we won't be able to see how accurate the results of this simulation are until next year, we thought it was worth doing. The actual schedules for next season were used, with the home teams playing at home and the road teams playing on the road. The only slight bump that we hit prior to running the simulation was that a new team on which we had no past data, the Cleveland Browns, was part of the season. The simplest way of taking care of this was simply to not count games that involved them. The only effect of this, in terms of the simulation, was that some teams ended up with fewer games played than others.
The results of the simulation can be viewed in Appendix B. Judging simply from the standings on the first page, we were not too surprised with the results. All but one of the division winners (those that have a y next to them) were playoff teams last season. There were two surprises - 2 teams that did much better than expected (St. Louis and San Diego) and 4 teams that never won. The reason the two teams did well is that they had good rankings in the categories our learning process labeled as the most important. We strongly doubt they will perform next season as they did here, however. We were amazed that 4 teams did not win any games. We would have accepted 2 teams with no wins, but we feel 4 is too many. The reason they did as they did was their schedule: all of the teams they ended up playing against were much better than they were. Even the home field advantage in some games was not enough to get them a win.
In our algorithm, we had a bias towards the home team. For example, if two teams with relatively equal strengths were playing each other, we decided to pick the home team as the winner because we felt that the home field usually plays a big role in close games. Although we could have randomized it a bit, after seeing the good teams do well, it did not seem to make that much of a difference. (If a good team did well, they won games away from their stadium.) The only place where this seems to have caused a problem is in the simulation of the playoffs. Because teams with better records are given home field advantage in the playoffs, and because most of the teams in the playoffs are of close strength, the teams that played at home won every game in the playoffs. In our simulated Super Bowl XXXIV, the Broncos beat Atlanta simply because we put them as the home team. If we had reversed it, with Atlanta as the home team, Atlanta would have won. Thus, between the standings and the playoffs, we think the standings will come closer to accuracy next season than the playoffs.
Conclusions
Our program was an application of some of what we learned in the course to solve an interesting problem in a fun domain. The reasons the project was interesting were that we had prior interest in both artificial intelligence and sports, and that the nature of the problem allowed for immediate gratification in terms of testing and evaluation. We encountered some of the challenges that make artificial intelligence difficult. Specifically, the real world (or even a microcosm of it in a game) is an incredibly complex space that we are attempting to abstract away from and infer some knowledge about. Computers and AI learning techniques allow us to make some progress, but the basic problem of modelling the real world continues to be a challenge. We are satisfied with what we did.