Pairwise attribute matching techniques often apply transitive closure as the final
step in entity resolution. Entity resolution results should therefore be evaluated after
transitive closure has been applied, because this step can propagate errors through the data.
If object a is the same as b, and b is the same as c, transitivity concludes that a is
the same as c. If b and c are not really the same passenger, then the single wrong
decision is compounded: a is also wrongly merged with c.
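The propagation effect can be illustrated with a minimal sketch that computes the transitive closure of pairwise match decisions using a union-find (disjoint-set) structure. The class and function names (`UnionFind`, `transitive_closure`) are hypothetical illustrations, not the implementation used in this study. The pair `("b", "c")` stands for one incorrect pairwise decision.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest used to compute the transitive closure of pairwise matches."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        # Unseen records start as their own singleton cluster.
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent to keep trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: Hashable, y: Hashable) -> None:
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x


def transitive_closure(matches: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Merge pairwise matches into entity clusters: any chain of matches becomes one entity."""
    uf = UnionFind()
    for x, y in matches:
        uf.union(x, y)
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for record in list(uf.parent):
        clusters.setdefault(uf.find(record), set()).add(record)
    return list(clusters.values())


# ("b", "c") is the wrong pairwise decision; a, b and c end up in a single cluster.
print(transitive_closure([("a", "b"), ("b", "c")]))
```

Without the erroneous pair, `transitive_closure([("a", "b")])` keeps c out of the cluster; with it, a and c are merged even though the pair (a, c) was never compared directly, which is why results are best checked after closure rather than only at the pairwise stage.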
While the f-measure combines precision and recall into a single value,
Bilenko and Mooney [14] advise against relying on it and recommend precision-recall
curves instead. A single-value measure gives no indication of where the cutoff
threshold that separates matches from non-matches lies. Precision values interpolated
at standard recall levels, on the other hand, show how a classifier performs across
different cutoff thresholds.
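A minimal sketch of the contrast, assuming each candidate record pair has a similarity score and a known true label, that a pair is declared a match when its score is at or above the threshold, and that every true match appears among the scored pairs (with at least one true match). The function names and toy scores are hypothetical. Interpolated precision at recall level r is taken as the highest precision reached at any cutoff whose recall is at least r, evaluated at the eleven standard levels 0.0, 0.1, ..., 1.0.

```python
from typing import List, Sequence, Tuple

# (threshold, precision, recall) at one cutoff.
Point = Tuple[float, float, float]

STANDARD_RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def pr_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Point]:
    """Precision and recall when pairs with score >= threshold are declared matches.

    Assumes every true match is among the scored pairs and there is at least one.
    """
    total_true = sum(bool(label) for label in labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = []
    true_pos = 0
    for k, (score, is_match) in enumerate(ranked, start=1):
        true_pos += int(bool(is_match))
        # Emit a point only where the score changes, so tied pairs share one cutoff.
        if k == len(ranked) or ranked[k][0] != score:
            points.append((score, true_pos / k, true_pos / total_true))
    return points


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 measure)."""
    return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0


def interpolated_precision(
    points: Sequence[Point], levels: Sequence[float] = STANDARD_RECALL_LEVELS
) -> List[float]:
    """Best precision at any cutoff whose recall is >= each standard recall level."""
    return [max((p for _, p, r in points if r >= level), default=0.0) for level in levels]


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [True, True, False, True, False]
points = pr_points(scores, labels)
for threshold, p, r in points:
    print(f"threshold={threshold:.1f}  P={p:.2f}  R={r:.2f}  F={f_measure(p, r):.2f}")
print(interpolated_precision(points))
```

On this toy data the F-measure ranges from 0.50 (threshold 0.9) to 0.86 (threshold 0.6), so a single reported F value says nothing about which cutoff produced it. The interpolated precision list (1.0 up to recall 0.6, then 0.75 up to recall 1.0) exposes that trade-off across all thresholds at once.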
5 Identifying Airline Customers Case Study
In our case study we describe the process of extracting a social network of passengers
travelling with an airline, inferred from a source of passenger booking data. The data
set used for this study consists of 9,468,460 one-way flight passenger records, from
which 2,968,282 unique passengers are extracted.
The primary source of data in the sale of an airline ticket is the passenger name
record (PNR). Each airline computer reservation system (CRS) has its own PNR format;
however, all PNRs have a similar structure and contain approximately the same
information. A PNR contains all the information required to make a booking and buy
a ticket, including the names of the travelling passengers, the flight itinerary,
passenger contact details and information on the entity that made the sale. Passenger
contact details can include mailing addresses, email addresses and phone numbers;
however, only a phone number is strictly compulsory. The amount of available data
usually depends on the source of the booking. In some cases, such as website bookings,
the front-end application can make certain fields compulsory even though the back-end
CRS does not. The booking and ticket information provides a wealth of data that can
be mined to produce better business intelligence and to support decision making.
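The content of a PNR described above can be pictured as a simple record type. The sketch below is a hypothetical, heavily simplified model, not any real CRS format: all class and field names (`PassengerNameRecord`, `ContactDetails`, `sold_by`, and so on) are illustrative assumptions. It encodes only what the text states: a phone number is compulsory, while mailing addresses and email addresses may be missing depending on the booking source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlightSegment:
    flight_number: str
    origin: str
    destination: str
    departure_date: str  # kept as a plain string for simplicity


@dataclass
class ContactDetails:
    phone: str                             # the only strictly compulsory contact field
    mailing_address: Optional[str] = None  # optional: availability depends on booking source
    emails: List[str] = field(default_factory=list)


@dataclass
class PassengerNameRecord:
    passenger_names: List[str]
    itinerary: List[FlightSegment]
    contact: ContactDetails
    sold_by: str  # the entity that made the sale, e.g. a travel agency or the airline website

    def available_contact_fields(self) -> List[str]:
        """List the contact fields actually filled in for this booking."""
        available = ["phone"]
        if self.contact.mailing_address:
            available.append("mailing_address")
        if self.contact.emails:
            available.append("emails")
        return available


# Hypothetical web booking where the front end made an email address compulsory.
web_booking = PassengerNameRecord(
    passenger_names=["John Smith", "Jane Smith"],
    itinerary=[FlightSegment("XX123", "AAA", "BBB", "2009-03-14")],
    contact=ContactDetails(phone="+00 000 0000", emails=["john@example.com"]),
    sold_by="airline website",
)
print(web_booking.available_contact_fields())  # ['phone', 'emails']
```

Modelling the optional fields explicitly makes it easy to measure how much contact data each booking source actually supplies before deciding which attributes to use for matching passengers.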
5.1 Identifying Customers
Whenever airlines need to analyse customer data, usually the only source of data
available is the frequent flyer system, which contains only passengers who voluntarily
register for the frequent flyer program. A member of the frequent flyer program may be
a valuable customer or simply a regular passenger. What frequent flyer membership does
provide is a way to track and measure the value of a customer. Valuable customers
typically join the frequent flyer program eventually because of the benefits it offers
them; however, not all of them do.
In this context, it is therefore important to distinguish between passengers and
customers. A customer is a passenger who provides value to the airline. Presently
 