Data Mining for Supply Chain Management in Complex Networks


A supply chain comprises the flows of products, information, and money. In traditional supply chain management, business processes are disconnected from stock control; as a result, inventory is a direct consequence of incomplete information. Contemporary supply chain management focuses on organizing, planning, and implementing these flows. First, at the organizational level, products are manufactured, transported, and stored according to customers’ needs. Second, the production, storage, and transport of components are planned and controlled through central supply management and replenished through centralized procurement. Third, implementing the supply chain involves the entire cycle, from order entry to order fulfillment and delivery. Data mining can create a better match between supply and demand, reducing or, in some cases, even eliminating stock.
Data mining has become an indispensable tool for understanding the needs, preferences, and behaviors of customers. It is also used in pricing, promotion, and product development. Data-mining techniques have traditionally been used in banking, insurance, and retail, largely because their implementation in these industries showed quick returns. Data mining is used for customer profiling, in which the characteristics of good customers are identified in order to predict which new customers will be good ones and to help marketing departments target new prospects. The effectiveness of sales promotions and product positioning can be analyzed using market-basket analysis to determine which products are purchased together or by an individual over time, which products to stock in a particular store, and where to place products in each store (Groth, 2000; Kopanakis & Theodoulidis, 2003; Weir, 1998). Data mining is also used in a variety of other industries, such as financial services, healthcare, and telecommunications.
There are many opportunities and applications for data mining beyond the obvious ones. One potential area is “supply chain management.” One reality of demand and supply in the manufacturing industry is that, no matter how well balanced a system is, an element of uncertainty creates a mismatch between demand and supply. The objective of this article is to identify the areas of the supply chain where most of the uncertainty exists and to determine suitable data-mining methods for predicting that uncertainty accurately. The article assumes that a data warehouse has already been implemented before data-mining techniques are applied.


Two issues plague supply chain management: variation in demand and supply, and variation in the speed and extent of communication within the supply chain. Variation in demand and supply stems from uncertainty that is also inherent in the processes themselves. Accurately predicting the uncertainties in demand, supply, and processes, and then formulating action plans around those predictions, is the essence of supply chain management (SCM).
Before we can address the problem of uncertainty in the supply chain and explain the use of data-mining techniques, we need to understand the basic process of SCM and where uncertainty exists. In its simplest form, a supply chain can be depicted as a flow of information from a customer’s customer to a supplier’s supplier, with material flowing in the reverse direction, as shown in Figure 1.
Conceptually, the whole supply chain can be broken down into supply, process, and demand. Traditional supply chain planners use demand and supply forecasting as a means of controlling uncertainty. However, these forecasting methods have three major drawbacks:
1. An incorrect forecasting model
2. An incorrect number of parameters
3. Incorrect coefficient values for these parameters

Figure 1. Information and material flows in the supply chain

Data mining can help address each of these three problems. In data mining, models are chosen from a finite set of predefined models, and a model can be rebuilt as many times as needed to extract previously unknown patterns and relationships in the data. When forecasting with data-mining techniques, the software can detect even minor effects of some parameters.
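The idea of choosing a forecasting model from a finite, predefined set can be illustrated with a small Python sketch. It is an assumption-laden toy rather than the article's method: the three candidate models, the holdout mean-absolute-error criterion, and the weekly demand figures are all hypothetical, and real data-mining tools search far larger spaces of models, parameters, and coefficients.

```python
# Sketch: pick a forecasting model from a small, predefined (hypothetical) set
# by comparing one-step-ahead errors on a holdout period.

def naive_forecast(history):
    """Next value equals the last observed value."""
    return history[-1]

def mean_forecast(history):
    """Next value equals the mean of all observed values."""
    return sum(history) / len(history)

def moving_average_3(history):
    """Next value equals the mean of the last three observations."""
    window = history[-3:]
    return sum(window) / len(window)

CANDIDATES = {
    "naive": naive_forecast,
    "mean": mean_forecast,
    "moving_average_3": moving_average_3,
}

def holdout_mae(model, series, n_train):
    """Mean absolute error of one-step-ahead forecasts over series[n_train:]."""
    errors = [abs(series[t] - model(series[:t])) for t in range(n_train, len(series))]
    return sum(errors) / len(errors)

def select_model(series, n_train):
    """Return the name of the lowest-error candidate and all holdout scores."""
    scores = {name: holdout_mae(fn, series, n_train) for name, fn in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    weekly_demand = [100, 104, 108, 112, 116, 120, 124, 128]  # hypothetical
    print(select_model(weekly_demand, n_train=5))
```

On this steadily rising series the naive model wins (holdout MAE of 4 units, against 8 for the three-week moving average and 14 for the overall mean), because averaging methods lag behind a trend. Letting the data pick the model is one guard against choosing an incorrect forecasting model.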


Data mining is the process of extracting ideas from data. It has also been defined as “a decision support process that tries to discover patterns and relationships that are beyond the realm of human experience and imagination in the large database and present them to a knowledgeable user for review and examination” and as “the process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial business decisions” (Groth, 2000).
Data mining not only uses a discovery-based approach, in which pattern matching and other algorithms are employed to determine the key relationships in the data, but also describes the steps that must be taken to ensure meaningful results.


Data mining is used to build six types of models aimed at solving business problems: classification, regression, time series, clustering, association analysis, and sequence discovery.


Classification

A classification model is a predictive model generated from historical data. It assigns instances to a group or class by predicting the value of a categorical variable. This variable is often binary, but it can also take multiple discrete values.
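A minimal Python sketch of classification with a one-level decision tree (a "decision stump") is shown below. It assumes a single hypothetical numeric attribute (a supplier's open-order backlog) and a hypothetical binary class (1 = late delivery, 0 = on time); real decision-tree tools split on many attributes over several levels.

```python
# Sketch: a decision stump that predicts a binary class from one numeric attribute.
# The attribute (open-order backlog) and the labels are hypothetical.

def fit_stump(values, labels):
    """Find the threshold that best separates binary labels (0/1).

    The stump predicts 1 when value > threshold, otherwise 0.
    Returns (threshold, training accuracy); ties keep the smallest threshold.
    """
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(values)):
        predictions = [1 if v > threshold else 0 for v in values]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy

def predict_stump(threshold, value):
    """Classify a new instance with a fitted threshold."""
    return 1 if value > threshold else 0

if __name__ == "__main__":
    backlog = [2, 3, 4, 5, 8, 9, 10, 12]  # hypothetical open orders per supplier
    late = [0, 0, 0, 0, 1, 1, 1, 1]       # hypothetical outcomes
    threshold, accuracy = fit_stump(backlog, late)
    print(threshold, accuracy)            # 5 1.0
    print(predict_stump(threshold, 11))   # 1
```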


Regression

Regression is used to predict values of continuous variables. These values are real numbers (that is, they can take decimal values) and are not restricted to a fixed range.
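As a minimal illustration of regression on a continuous target, the following Python sketch fits a least-squares line with a single predictor. The predictor (promotion spend) and the sales figures are hypothetical; practical regression models usually involve several predictors.

```python
# Sketch: simple linear regression (ordinary least squares, one predictor).
# Variable names and data are hypothetical.

def fit_linear(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted continuous value for input x."""
    return intercept + slope * x

if __name__ == "__main__":
    promo_spend = [1.0, 2.0, 3.0, 4.0]     # hypothetical, thousands of dollars
    units_sold = [12.0, 14.5, 17.0, 19.5]  # hypothetical
    b0, b1 = fit_linear(promo_spend, units_sold)
    print(b0, b1)                # 9.5 2.5
    print(predict(b0, b1, 5.0))  # 22.0
```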

Time-series forecasting

Like regression, this method uses a series of existing values and their attributes to forecast future values, except that the values depend on time. Various data-mining tools can exploit the distinctive features of time, such as trends and seasonal cycles.
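Simple exponential smoothing is one classic time-series technique; the Python sketch below is an illustrative choice, not one prescribed by the article. Each new observation updates a running level, with weight alpha on the newest value, and the final level serves as the forecast for the next period. The value of alpha and the demand series are hypothetical, and this basic form ignores trend and seasonality, which extensions such as Holt-Winters smoothing model explicitly.

```python
# Sketch: one-step-ahead forecast with simple exponential smoothing.
# alpha and the demand series are hypothetical.

def exponential_smoothing_forecast(series, alpha):
    """Return the forecast for the period after the last observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

if __name__ == "__main__":
    monthly_demand = [100, 120, 110]  # hypothetical
    print(exponential_smoothing_forecast(monthly_demand, alpha=0.5))  # 110.0
```

A larger alpha reacts faster to recent changes; with alpha = 1 the forecast is simply the last observed value.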


Clustering

Clustering is used to segment a database into clusters whose members share a number of interesting properties. The clusters are not predefined and have two basic uses: (1) summarizing the contents of the target database and (2) serving as input to other methods, such as supervised learning.
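k-means is one common clustering algorithm; the Python sketch below shows its two alternating steps: assign each point to the nearest centroid, then move each centroid to the mean of its members. The points stand for hypothetical customers described by two numeric attributes (for example, order volume and order frequency), and the starting centroids are supplied by hand for reproducibility; production implementations choose them automatically.

```python
# Sketch: k-means clustering (Lloyd's algorithm) with hand-picked starting centroids.
# Data points are hypothetical.
import math

def kmeans(points, initial_centroids, max_iterations=10):
    """Return (centroids, assignments); assignments[i] is the cluster of points[i]."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments

if __name__ == "__main__":
    customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical
    centroids, assignments = kmeans(customers, [(1, 1), (9, 9)])
    print(assignments)  # [0, 0, 0, 1, 1, 1]
```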


Association

Association is used to describe behavior captured in the database. This method relates the occurrences of various events by identifying patterns or groups of items that occur together.
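A common way to quantify such patterns is through the support and confidence of association rules, as in market-basket analysis. The Python sketch below restricts itself to rules between pairs of items, which is a simplification; algorithms such as Apriori extend the same measures to larger item sets. The baskets are hypothetical.

```python
# Sketch: support and confidence for pairwise association rules "a -> b".
# support = share of baskets containing both items;
# confidence = share of baskets containing a that also contain b.
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every item pair meeting min_support."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # count each item once per basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = (support, count / item_counts[a])  # rule a -> b
        rules[(b, a)] = (support, count / item_counts[b])  # rule b -> a
    return rules

if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
    print(pair_rules(baskets)[("butter", "bread")])  # (0.5, 1.0)
```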


Sequencing

Sequencing identifies items that are likely to occur in a particular order over time. This can help marketers time their promotions to match the sequential buying order exhibited by their customers.
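The basic counting behind sequence discovery can be sketched in Python as follows: for each ordered pair of items (a, b), measure the share of customers who bought a and later bought b. This is a simplified illustration under assumed inputs (one time-ordered purchase list per customer, hypothetical products), not a full sequential-pattern algorithm.

```python
# Sketch: share of customers whose purchase history contains item a followed later by item b.
from collections import Counter

def sequential_pairs(sequences):
    """sequences: one time-ordered list of purchased items per customer."""
    counts = Counter()
    for seq in sequences:
        seen_pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                if a != b:
                    seen_pairs.add((a, b))
        counts.update(seen_pairs)  # count each customer at most once per ordered pair
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items()}

if __name__ == "__main__":
    histories = [  # hypothetical customers
        ["crib", "car_seat", "stroller"],
        ["crib", "stroller"],
        ["car_seat", "crib"],
    ]
    shares = sequential_pairs(histories)
    print(round(shares[("crib", "stroller")], 3))  # 0.667
```

Here two of three customers bought a stroller after a crib, which would suggest timing a stroller promotion to follow crib purchases.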

Data-Mining Process

Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago, divides the data-mining process into four phases, each of which is described below:

Phase 1: Data Warehousing

Data warehousing is the foundation for successfully applying data-mining techniques and other analytical and predictive tools. Data warehousing involves the transfer, conversion, and integration of data from legacy systems to a central repository where data is stored and made available to clients.
The downside of data warehousing is the high cost of implementation and the time needed to complete it: data warehouses can cost in excess of $10 million to build and take from one to three years to complete (Peacock, 1998).
An alternative is a data mart, which is a functional or departmental data repository. A data mart can be constructed as individual components, usually costs between $10,000 and $1 million to build, and can be brought online in less than six months. Data marts, however, can be inconsistent with the data warehouse. Data may also be enriched with additional attributes, either extracted from other internal databases or purchased from third-party sources (Asbrand, 1997).
Data mining yields more meaningful results when it draws on large databases extracted into data warehouses. Because implementation is so costly, data-mining technology is more commonly used in large, consumer-oriented businesses such as banking and retail.

Phase 2: Data Mining Tools

Algorithms are applied to the data to produce predictive models. The choice of tools depends on correctly identifying the business problem and analyzing it to determine the right technique. Table 1 lists some common types of problems and the data-mining techniques used to address them (META Group, 1997).
The key to using these tools is understanding that they require a team effort among analysts, marketing experts, and information technology experts.

Phase 3: Predictive Modeling

During this phase, the predictive models are analyzed and combined to produce a single aggregate model. Techniques may be combined sequentially or in parallel. In the sequential approach, the user picks a technique to produce a model and then applies another technique to its results. In the parallel approach, the user applies several different techniques to the initial dataset.
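The two ways of combining models can be sketched in Python by treating each model as a function from an input to a prediction. This is a simplified illustration under stated assumptions: averaging is only one way to merge parallel predictions, and all models in the usage example (including the order-size segmentation rule and lead times) are hypothetical.

```python
# Sketch: combining predictive models in parallel (averaging) and sequentially (chaining).

def combine_parallel(models, weights=None):
    """Apply every model to the same input and return the weighted average prediction."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)

    def ensemble(x):
        return sum(w * m(x) for m, w in zip(models, weights)) / total

    return ensemble

def combine_sequential(first, second):
    """Feed the output of the first model into the second model."""
    def pipeline(x):
        return second(first(x))

    return pipeline

if __name__ == "__main__":
    # Parallel: two hypothetical demand models for the same input.
    averaged = combine_parallel([lambda x: 2 * x, lambda x: 2 * x + 4])
    print(averaged(10))  # 22.0

    # Sequential: a hypothetical segmentation rule, then a per-segment lead-time lookup.
    def segment(order_size):
        return "bulk" if order_size >= 100 else "retail"

    lead_time_days = {"bulk": 14, "retail": 3}
    pipeline = combine_sequential(segment, lead_time_days.get)
    print(pipeline(250))  # 14
```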

Phase 4: Predictive Scoring

Here, the predictive models are applied to score operational data (Grossman, 1998). For instance, a bank could analyze the attributes and habits of its checking-account customers for clues that might reveal an acceptable minimum balance for retaining profitable customers. The bank can use data mining to develop profiles of customer groups, including groups whose members consistently have trouble maintaining minimum balances. This helps the bank identify profitable customers and predict the minimum balance needed to retain them. As a result, the percentage of profitable customers can rise significantly (Fabris, 1998).
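Scoring amounts to applying an already-fitted model to each operational record. The Python sketch below assumes, purely for illustration, a logistic-regression model with hypothetical coefficients and field names (not taken from any real bank) that estimates the probability that a customer keeps the required minimum balance.

```python
# Sketch: predictive scoring of operational records with an already-fitted model.
# The logistic-regression coefficients and field names are hypothetical.
import math

COEFFICIENTS = {
    "intercept": -1.0,
    "avg_balance_thousands": 0.8,
    "overdrafts_last_year": -0.5,
}

def score(record):
    """Modeled probability that the customer keeps the required minimum balance."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["avg_balance_thousands"] * record["avg_balance_thousands"]
    z += COEFFICIENTS["overdrafts_last_year"] * record["overdrafts_last_year"]
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(records, threshold=0.5):
    """Add a score to each record and flag customers whose probability is below threshold."""
    scored = []
    for record in records:
        probability = score(record)
        scored.append({**record, "score": probability, "at_risk": probability < threshold})
    return scored

if __name__ == "__main__":
    customers = [  # hypothetical operational data
        {"id": 1, "avg_balance_thousands": 2.5, "overdrafts_last_year": 0},
        {"id": 2, "avg_balance_thousands": 0.5, "overdrafts_last_year": 3},
    ]
    for row in score_batch(customers):
        print(row["id"], round(row["score"], 3), row["at_risk"])
```

The model is fitted once on historical data; scoring then only evaluates it, so it can be run routinely on new operational records.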

Table 1. Problem types and techniques used in data mining

Example Problem | Problem Type | Technique
--- | --- | ---
What are the top three characteristics of customers who have switched to my … | Classification | Neural Networks; Decision Tree
What are the largest buckets within my customer base to which I should be marketing a new service? | Clustering | Neural Networks; Decision Tree
What is the likelihood a given individual who opens a bank account will also open an IRA within the next three months? | Association and Sequencing | Statistical Techniques; Rule Induction
What will the average exchange rate be over the next three … | Regression and Forecasting | Neural Networks; Statistical Techniques


Manufacturers, airlines, banks, insurance companies, credit card companies, and retailers have used data-mining technology successfully. Data mining works best as a supplement to existing tools and can be combined with decision support systems (DSS) to improve the system’s overall results. Zdanowicz (2004) discusses the detection of money laundering and terrorist financing through data mining. Data-mining tools can provide a more accurate picture of capacity, maintenance, and factory scheduling problems, and a DSS can take this information as input to provide the planner with an optimal factory scheduling solution.
The supply chain model for any industry includes suppliers, manufacturers, distributors, retailers, and customers. Next, we look at each segment of the supply chain to understand how data mining applies to it.


Retailers

At this point in the supply chain, retailers receive forecasts primarily from two sources: individual customers and small- and medium-size organizations. The actual consumption of these individual customers and small and medium-size organizations is then totaled. The difference between the forecast and the actual consumption is the variation in demand. The difference between what is requested and what is supplied accounts for the variation in supply.
Data mining can be used at this point in the following ways:
• Market segmentation based on service needs of distinct groups;
• Market basket analysis - retailers can understand the buying behavior of the customers; and
• Targeted promotions using a computerized approach and an extensive database.
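The demand and supply variations defined for retailers reduce to simple differences, which the Python sketch below computes per demand source and in total. Source names and quantities are hypothetical; the sketch only illustrates the arithmetic that data-mining models would then try to predict.

```python
# Sketch: demand variation = forecast - actual consumption;
# supply variation = quantity requested - quantity supplied.
# Source names and quantities are hypothetical.

def variations(sources):
    """sources: {name: {'forecast', 'actual', 'requested', 'supplied'}} -> (per_source, total)."""
    per_source = {
        name: {
            "demand_variation": r["forecast"] - r["actual"],
            "supply_variation": r["requested"] - r["supplied"],
        }
        for name, r in sources.items()
    }
    total = {
        key: sum(v[key] for v in per_source.values())
        for key in ("demand_variation", "supply_variation")
    }
    return per_source, total

if __name__ == "__main__":
    retailer_sources = {
        "individual_customers": {"forecast": 500, "actual": 460, "requested": 500, "supplied": 480},
        "small_medium_orgs": {"forecast": 300, "actual": 330, "requested": 300, "supplied": 290},
    }
    per_source, total = variations(retailer_sources)
    print(per_source)
    print(total)  # {'demand_variation': 10, 'supply_variation': 30}
```

Keeping the per-source figures matters: here an over-forecast of 40 units for individual customers and an under-forecast of 30 units for organizations largely cancel in the total.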


Distributors

At this point in the supply chain, distributors receive forecasts from retailers and large organizations. They pair this data with the actual consumption by the retailers and large organizations. The difference between the forecast and the actual consumption is the variation in demand. The difference between the promised and the actual supply accounts for the variation in supply.
Data mining can be used at this point in the following ways:
• Prediction of supply uncertainties at the supplier level and at the item level;
• Prediction of process uncertainties, such as loss and item obsolescence;
• Prediction of demand uncertainties – market segmentation for retailers and big customers based on factors such as volume of demand, periodicity, and variation;
• Stockout prediction at the warehouse; and
• Strategic implications - logistics becomes a tool to help accomplish corporate strategic objectives.
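For warehouse stockout prediction, one simple sketch combines demand statistics (which data mining would estimate per item and location from history) with a normal approximation of demand over the replenishment lead time. This swaps in a standard inventory calculation for illustration; it assumes independent, identically distributed daily demand, and all numbers are hypothetical.

```python
# Sketch: probability of a stockout before replenishment arrives, assuming daily demand
# is i.i.d. with known mean and standard deviation (normal approximation).
import math

def stockout_probability(on_hand, mean_daily_demand, sd_daily_demand, lead_time_days):
    """P(total demand over the lead time exceeds the stock on hand)."""
    mu = mean_daily_demand * lead_time_days
    sigma = sd_daily_demand * math.sqrt(lead_time_days)
    if sigma == 0:
        return 1.0 if mu > on_hand else 0.0
    z = (on_hand - mu) / sigma
    # Standard normal upper tail: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    # Hypothetical item: 20 units/day on average, sd 5, four-day lead time.
    for stock in (80, 100, 120):
        print(stock, round(stockout_probability(stock, 20, 5, 4), 4))
```

With these figures, 80 units on hand gives a 50% stockout risk, while 100 units lowers it to roughly 2.3%.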


Manufacturers

At this point, manufacturers receive demand from distributors or directly from retailers. Manufacturers pass the demand of retailers on to alliance partners and try to fulfill only the demand of distributors. The difference between the forecasted demand from distributors and the actual consumption by the distributors is the variation in demand. The difference between the promised and the actual supply from component suppliers accounts for the variation in supply for manufacturers.
Data mining can be used at this point in the following ways:
• Prediction of supply uncertainties for manufacturers at the supplier level and at the item level;
• Prediction of process uncertainties due to machine breakdown, poor performance, and maintenance schedules;
• Prediction of demand uncertainties based on distributor, item, location, etc.; and
• Prediction of future demand trends – discovering trends in the demand for the product.
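Profiling supply uncertainty at the supplier and item level can start from a simple aggregation of delivery history, as in the Python sketch below. The record layout (supplier, item, promised day, actual day), the two summary measures, and the data are hypothetical; such profiles would typically feed the predictive models described above.

```python
# Sketch: summarize delivery reliability per (supplier, item) pair.
# Record layout and data are hypothetical.
from collections import defaultdict

def supply_profile(deliveries):
    """deliveries: iterable of (supplier, item, promised_day, actual_day)."""
    delays = defaultdict(list)
    for supplier, item, promised, actual in deliveries:
        delays[(supplier, item)].append(actual - promised)
    profile = {}
    for key, ds in delays.items():
        profile[key] = {
            "deliveries": len(ds),
            "on_time_rate": sum(1 for d in ds if d <= 0) / len(ds),
            # Average lateness in days; early deliveries count as zero.
            "mean_lateness_days": sum(max(d, 0) for d in ds) / len(ds),
        }
    return profile

if __name__ == "__main__":
    history = [  # hypothetical deliveries
        ("S1", "bolt", 10, 10),
        ("S1", "bolt", 20, 23),
        ("S2", "bolt", 10, 9),
        ("S2", "bolt", 15, 15),
    ]
    for key, stats in supply_profile(history).items():
        print(key, stats)
```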

Mass Customization

Another area where data mining can be used is identifying products for “mass customization” at the delivery end. Companies implement mass customization by assembling unique customer orders from a large number of products while minimizing the inventory of product components. Data mining helps identify these permutations and their demand patterns.
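A first step toward identifying popular permutations is counting how often each configuration is ordered and what component demand those configurations imply. The Python sketch below treats a configuration as an unordered set of components and assumes each component appears at most once per order; component names and orders are hypothetical.

```python
# Sketch: most frequently ordered configurations and the component demand they imply.
# Assumes each component appears at most once per order; data are hypothetical.
from collections import Counter

def configuration_demand(orders, top_n=3):
    """orders: list of component lists, one per customer order.

    Returns (top_n most common configurations with counts, total demand per component).
    """
    configs = Counter(frozenset(order) for order in orders)
    component_demand = Counter()
    for config, count in configs.items():
        for component in config:
            component_demand[component] += count
    return configs.most_common(top_n), component_demand

if __name__ == "__main__":
    orders = [
        ["base_model", "blue_case", "16gb_ram"],
        ["16gb_ram", "blue_case", "base_model"],
        ["base_model", "red_case", "8gb_ram"],
    ]
    top, components = configuration_demand(orders)
    print(top[0])
    print(components["base_model"])  # 3
```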


A data-mining application can be an effective tool only when implemented alongside an existing decision support system and an enterprise resource planning (ERP) system. The difference between a decision support system (DSS) and a data-mining application is that DSSs, unlike data-mining applications, do not perform queries or analysis of data. Figure 2 illustrates how this combination of DSS and data mining shapes the implementation architecture.
An ERP system integrates the processes of an organization, but the data related to these processes are scattered throughout the system. This large amount of unorganized data cannot be used as is by data-mining applications. A data warehouse has to be built to organize the ERP data and make it easily accessible for online analysis.
The smooth flow of information from an ERP system through a data warehouse to a data-mining tool requires open metadata integration. Information describing the ERP data, the target data warehouse schema, and the data mappings and transformation rules needs to be stored in an open relational metadata repository that is easily accessible to the data-mining tools in the architecture. Such a repository makes it easier to change the underlying warehouse, passes those changes on to the analysis tools, and makes complete information available to end users, such as the source of the data flows and the frequency of updates (Coombs, 1999). Issues in implementing data mining include result interpretation, data selection and representation, and system scalability.
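The role of the metadata repository can be sketched in Python as a mapping table that records, for each warehouse column, its ERP source field and transformation rule. All field names, column names, and rules are hypothetical and do not correspond to any particular ERP product; the point is that loading code and end-user lineage reports read the same metadata, so a change to a mapping is made in one place.

```python
# Sketch: metadata-driven transformation of ERP rows into warehouse rows.
# Field names, column names, and rules are hypothetical.
from datetime import date

def parse_yyyymmdd(text):
    """Convert a 'YYYYMMDD' string into a date."""
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

# Metadata: (warehouse column, ERP source field, transformation rule).
MAPPINGS = [
    ("order_id", "ORD_NO", str.strip),
    ("order_date", "CRT_DT", parse_yyyymmdd),
    ("quantity", "QTY", float),
]

def transform_row(erp_row, mappings=MAPPINGS):
    """Build one warehouse row from one ERP row by applying each mapping rule."""
    return {column: rule(erp_row[source]) for column, source, rule in mappings}

def lineage(mappings=MAPPINGS):
    """Report which ERP field feeds each warehouse column, for end users."""
    return {column: source for column, source, _ in mappings}

if __name__ == "__main__":
    erp_row = {"ORD_NO": " 4711 ", "CRT_DT": "20240115", "QTY": "12"}  # hypothetical
    print(transform_row(erp_row))
    print(lineage())
```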


Data mining creates fertile ground for inventing new tools, analytical methods, and data management approaches that add value to an organization’s most valuable asset: its data. Organizations have experienced paybacks of 10 to 70 times their data warehouse investment after adding data-mining components (Chen, Sakaguchi, & Frolick, 2000). Data mining, however, is not meant for simple querying or reporting, nor for complex queries whose parameters are known. It is ideal for cases where only the problem is known and neither the parameters nor their values are known.
It is more important to start by formulating the business problems and opportunities than to start from the technology side. A company should ask the following questions:
1. What are the main business problems and opportunities in the supply chain?
2. What knowledge does a company need to solve these problems or explore opportunities?
3. Can a company use this knowledge to take appropriate action?
4. Does a company have the necessary (historical) data available on demand, supply and process behavior to make an analysis potentially successful?
5. Given a company’s knowledge need, what would be the appropriate technique to analyze the available data?

Figure 2. The data mining implementation architecture

After answering all these questions, a company may decide whether data mining is an appropriate technology.


Conclusion

A common characteristic shared by industry users of data-mining technology is that they are data intensive. The benefits of data mining have been demonstrated in a wide variety of business sectors; it is time to extend its application to other potential areas, such as supply chain management. This article makes an initial effort to explore the use of data mining in supply chain management, where the techniques of data mining offer huge potential for improvement. If small areas can be identified and an incremental approach is followed, the early adopters of this technology stand to reap major benefits. Given the capabilities of today’s data-mining software, the hidden patterns and relationships in supply chain data can become important differentiating factors in the competitive global business environment.


Key Terms

Data Mining: Process of extracting ideas from data.
Data Warehouse: System for managing data for decision support.
Induction Techniques (Decision Trees and Rule Induction): Assign the largest portion of the discovery process to the machine (Nelson, 1998).
Logistics: The art of moving goods within the supply chain.
Neural Networks: Build internal representations of the patterns (Nelson, 1998).
Statistics: Require a high level of user involvement to build models describing the behavior of the data (Nelson, 1998).
