knows that those intangibles are often manifested in athletes' past performance. He wants to mine a
data set of all current players in the league in order to help find those prospects that can bring the
most excitement, scoring and defense to the team in order to reach the league championship.
While salary considerations are always a concern, management has indicated to Juan that their
desire is to push for the championship in the upcoming season, and they are willing to do all they
can financially to bring in the best two to four athletes Juan can identify. With his employers'
objectives made clear to him, Juan is prepared to evaluate each of the 59 prospects' past statistical
performance in order to help him formulate what his recommendations will be.
DATA UNDERSTANDING
Juan knows the business of athletic statistical analysis. He has seen how performance in one area,
such as scoring, is often interconnected with other areas such as defense or fouls. The best
athletes generally have strong connections between two or more performance areas, while more
typical athletes may have a strength in one area but weaknesses in others. For example, good role
players are often good defenders, but can't contribute much scoring to the team. Using league data
and his knowledge of and experience with the players in the league, Juan prepares a training data
set comprised of 263 observations and 19 attributes. The 59 prospective athletes Juan's team
could acquire form the scoring data set, and he has the same attributes for each of these people.
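The relationship between the training and scoring data sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool-based workflow this chapter follows: the rows and the names "Player A", "Player B" and "Prospect X" are invented placeholders, only three of the 19 attributes appear, and the helper functions are hypothetical. It shows the player name being held aside as an identifier rather than a predictor, and a check that both data sets carry the same attributes.

```python
import csv
import io

# Hypothetical toy rows for illustration only; the real training set
# has 263 observations and 19 attributes, the scoring set 59 prospects.
TRAINING_CSV = """Player_Name,Position_ID,Shots
Player A,0,120
Player B,7,85
"""

SCORING_CSV = """Player_Name,Position_ID,Shots
Prospect X,3,140
"""

ID_COLUMN = "Player_Name"  # kept for reporting, never used as a predictor


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per player."""
    return list(csv.DictReader(io.StringIO(text)))


def split_id_and_features(rows, id_column=ID_COLUMN):
    """Return (ids, feature_rows): the id column is set aside and the
    remaining values are cast to integers."""
    ids = [row[id_column] for row in rows]
    features = [
        {name: int(value) for name, value in row.items() if name != id_column}
        for row in rows
    ]
    return ids, features


def same_attributes(rows_a, rows_b):
    """Check that both data sets expose exactly the same attribute names."""
    return set(rows_a[0].keys()) == set(rows_b[0].keys())


training = load_rows(TRAINING_CSV)
scoring = load_rows(SCORING_CSV)
train_ids, train_features = split_id_and_features(training)
score_ids, score_features = split_id_and_features(scoring)
```

Keeping the identifiers in a parallel list means any prediction made for a row in `score_features` can be matched back to a name in `score_ids` by position, without the name ever influencing the model.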
We will help Juan build a neural network, a data mining methodology that can predict
categories or classifications in much the same way that decision trees do. Neural networks, however, are
better at finding the strength of connections between attributes, and it is those very connections
that Juan is interested in. The attributes our neural network will evaluate are:
Player_Name : This is the player's name. In our data preparation phase, we will set its
role to 'id', since it is not predictive in any way, but is important to keep in our data set
so that Juan can quickly make his recommendations without having to match the data
back to the players' names later. (Note that the names in this chapter's data sets were
created using a random name generator. They are fictitious and any similarity to real
persons is unintended and purely coincidental.)
Position_ID : For the sport Juan's team plays, there are 12 possible positions. Each one
is represented as an integer from 0 to 11 in the data sets.
Shots : This is the total number of shots, or scoring opportunities, each player took in their
most recent season.
 