Neural Networks - Data Mining for the Masses


knows that those intangibles are often manifested in athletes' past performance. He wants to mine a

data set of all current players in the league in order to help find those prospects that can bring the

most excitement, scoring and defense to the team in order to reach the league championship.

While salary considerations are always a concern, management has indicated to Juan that their

desire is to push for the championship in the upcoming season, and they are willing to do all they

can financially to bring in the best two to four athletes Juan can identify. With his employers'

objectives made clear to him, Juan is prepared to evaluate each of the 59 prospects' past statistical

performance in order to help him formulate what his recommendations will be.

DATA UNDERSTANDING

Juan knows the business of athletic statistical analysis. He has seen how performance in one area,

such as scoring, is often interconnected with other areas such as defense or fouls. The best

athletes generally have strong connections between two or more performance areas, while more

typical athletes may have a strength in one area but weaknesses in others. For example, good role

players are often good defenders, but can't contribute much scoring to the team. Using league data

and his knowledge of and experience with the players in the league, Juan prepares a training data

set comprising 263 observations and 19 attributes. The 59 prospective athletes Juan's team

could acquire form the scoring data set, and he has the same attributes for each of these people.
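
Because the scoring data set must carry the same attributes as the training data set, it can be worth checking that before any model is built. The following is only a minimal sketch in Python, not the tool or procedure used in this chapter: the rows are hypothetical placeholders (the real sets hold 263 and 59 observations), only three of the 19 attribute names are used, and the helper names attribute_names and missing_from_scoring are invented for illustration.

```python
# Hypothetical miniature rows; the values are made up for illustration only.
training_rows = [
    {"Player_Name": "Player A", "Position_ID": 3, "Shots": 120},
    {"Player_Name": "Player B", "Position_ID": 7, "Shots": 45},
]
scoring_rows = [
    {"Player_Name": "Player C", "Position_ID": 0, "Shots": 88},
]


def attribute_names(rows):
    """Return the set of attribute names shared by every row.

    Raises ValueError if the data set is empty or if any row uses a
    different set of attributes than the first row.
    """
    if not rows:
        raise ValueError("data set is empty")
    names = set(rows[0])
    for i, row in enumerate(rows[1:], start=1):
        if set(row) != names:
            raise ValueError(f"row {i} has different attributes")
    return names


def missing_from_scoring(training, scoring):
    """Return training attributes that the scoring data set lacks."""
    return attribute_names(training) - attribute_names(scoring)


missing = missing_from_scoring(training_rows, scoring_rows)
print(len(training_rows), "training rows;", len(scoring_rows), "scoring rows")
print("attributes missing from scoring set:", sorted(missing))
```

An empty result means every attribute the model would be trained on is also available for the prospects being scored.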

We will help Juan build a neural network, which is a data mining methodology that can predict

categories or classifications in much the same way that decision trees do, but neural networks are

better at finding the strength of connections between attributes, and it is those very connections

that Juan is interested in. The attributes our neural network will evaluate are:

• Player_Name: This is the player's name. In our data preparation phase, we will set its

role to 'id', since it is not predictive in any way, but is important to keep in our data set

so that Juan can quickly make his recommendations without having to match the data

back to the players' names later. (Note that the names in this chapter's data sets were

created using a random name generator. They are fictitious and any similarity to real

persons is unintended and purely coincidental.)

• Position_ID: For the sport Juan's team plays, there are 12 possible positions. Each one

is represented as an integer from 0 to 11 in the data sets.

• Shots: This is the total number of shots, or scoring opportunities, each player took in their

most recent season.
