The Development of a Health Data Quality Programme

Abstract

Data quality requirements are increasing as a wider range of data becomes available and the technology to mine data demonstrates the value of data that are “fit for use.” This topic describes a data quality programme for the New Zealand Ministry of Health that first isolates the criteria that define “fitness” and then develops a framework as the basis of a health sector-wide data quality strategy, aligned with the sector’s existing strategies and policies for the use of health information. The framework development builds on existing work by the Canadian Institute for Health Information and takes into account current data quality literature and recognised total data quality management (TDQM) principles. Strategy development builds upon existing policy and strategy within the New Zealand health sector, a review of customer requirements, current sector maturity and adaptability, and current literature to provide a practical strategy that offers clear guidelines for action. The topic ends with a summary of key issues that health care organisations can employ to develop their own successful data quality improvement programmes.

Introduction

The New Zealand health sector data quality improvement programme attempts to provide a structure for managing data to prevent data quality errors at the source of collection and to maintain the meaning of the data as they move throughout the health sector. This approach requires viewing data quality from a holistic perspective — going beyond a one-dimensional assessment of quality based only on accuracy — and assessing other dimensions (Ballou & Tayi, 1999) such as relevance, timeliness, comparability, usability, security, and privacy of data.

As data quality affects everyone in the health sector, the whole sector is responsible for maintaining and improving data quality. The role of the New Zealand Ministry of Health is one of leadership and support, whilst data collectors need to employ all possible processes to ensure only high quality data are collected, using agreed national and international standards where available. Data quality needs to be the responsibility of high-level managers in an organisation to ensure the entire organisation makes the required changes for improvement. “All too often data quality is seen as something that is the responsibility of informatics staff alone and is often seen with disinterest by clinicians and managers, despite being so critical to the quality of the decisions they make” (Data Remember, UK National Health Service, 2001; UK Audit Commission, 2002).

This topic describes the development of a data quality evaluation framework (DQEF) and underpinning strategy for the Ministry of Health in New Zealand and outlines the process to “institutionalise” total data quality management throughout the whole of the health sector.

THE IMPORTANCE AND ELEMENTS OF DATA QUALITY PROGRAMMES

Bill Gates (1999) states:

The most meaningful way to differentiate your company from your competition, the best way to put distance between you and the crowd, is to do an outstanding job with information. How you gather, manage and use information will determine whether you win or lose.

Organisations are becoming more and more dependent on information. Virtually everything the modern organisation does both creates and depends upon enormous quantities of data. A comprehensive data management programme is therefore essential to meet the needs of the organisation (Pautke & Redman, 2001). Many authors, for example, Levitin and Redman (1993), also draw attention to the importance of data quality in managing information as a resource of the organisation.

The first step in setting up a data quality (improvement) programme is therefore to decide the determinants that define quality. A framework is then required to apply these determinants and their associated metrics that can assess the level of data quality and establish processes such as collection, storage, access, and maintenance that lead to quality improvements where they are necessary. Finally, whilst a data quality framework models the data environment, it must be underpinned and supported by a strategy that is broader in scope. This strategy establishes the business purpose and context of data and aims to make the framework a routine tool and part of day-to-day operations.

These three elements (quality determinants, assessment framework, and implementation strategy) are the core components of a data quality programme, and we now look at these stages in more detail.

Data Quality Determinants

Strong, Lee, and Wang (1997) take a consumer (people or groups who have experience in using organisational data to make business decisions) focused view that quality data are “data that are fit for use,” and this view is widely adopted in the literature (Wang, Strong, & Guarascio, 1996). Redman (2001) comes to the following definition based on Juran and Godfrey (1999):

Data are of high quality if they are fit for their intended uses in operations, decision-making, and planning. Data are fit for use if they are free of defects and possess desired features.

Clearly, however, fitness for purpose depends upon the purpose and so the set of data quality determinants will vary according to the application. In addition, modern views of data quality have a wider frame of reference and many more features than the simple choice of attributes such as accuracy and currency. There are therefore multiple approaches to designing and applying data quality systems as well as competing terminologies to trap the unwary (Canadian Institute for Health Information, 2005; Eppler & Wittig, 2000). The approach we have adopted here is based on a hierarchical system (see next section) developed by the Canadian Institute for Health Information (CIHI) (2003a, 2005). At the uppermost level are the familiar attributes such as accuracy, relevance, and so forth. These attributes are referred to in the scheme as “dimensions.” Each dimension is defined in context by appropriate determinants known as “characteristics.” For example, characteristics of the accuracy dimension might include the tolerated level of error and the population to which the data accuracy must apply. Characteristics require answers to “what is/are” questions. Underpinning these characteristics are “criteria” that define processes and metrics used to assess the presence of potential data quality issues. Thus, the level of error characteristic might be assessed by asking if the error falls into a predefined range and if the level of bias is significant. Criteria typically demand “yes” or “no” answers. In this topic we are concerned mainly with data quality dimensions and criteria although we describe the hierarchical process for their selection.
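The dimension, characteristic, and criterion hierarchy described above can be sketched as a simple data structure. This is an illustrative sketch only, built around the accuracy example in the text; the class names and criterion wording are our own, not the Ministry’s or CIHI’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Criterion:
    """A yes/no question probing for a potential data quality issue."""
    question: str
    answer: Optional[bool] = None   # None until an evaluator answers

@dataclass
class Characteristic:
    """A 'what is/are' determinant defining a dimension in context."""
    name: str
    criteria: List[Criterion] = field(default_factory=list)

@dataclass
class Dimension:
    """A top-level quality attribute such as accuracy or relevance."""
    name: str
    characteristics: List[Characteristic] = field(default_factory=list)

# The accuracy example from the text, with hypothetical criterion wording.
accuracy = Dimension("accuracy", [
    Characteristic("level of error", [
        Criterion("Does the error rate fall within the predefined range?"),
        Criterion("Is the level of bias non-significant?"),
    ]),
    Characteristic("population coverage", [
        Criterion("Does the stated accuracy apply to the whole target population?"),
    ]),
])

def unanswered(dim: Dimension) -> int:
    """Count criteria still awaiting a yes/no answer anywhere under a dimension."""
    return sum(1 for ch in dim.characteristics
                 for cr in ch.criteria if cr.answer is None)
```

An evaluation is complete for a dimension once `unanswered` returns zero; the yes/no answers then feed whatever roll-up rule the framework prescribes.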

Data Quality Evaluation Frameworks

At its most basic, a data quality evaluation framework (DQEF) is defined by Wang et al. (1996) as “a vehicle that an organisation can use to define a model of its data environment, identify relevant data quality attributes, analyse data quality attributes in their current or future context, and provide guidance for data quality improvement.”

In our terminology, a DQEF enshrines and enacts the processes and metrics (criteria) that assess whether the level of a dimension (e.g., accuracy) is acceptable or not. However, Eppler and Wittig (2000) describe a data quality framework as:

A point in time assessment and measurement tool, integrated into organisational processes, providing a benchmark for the effectiveness of any future data quality improvement initiatives and a standardised template for information on data quality both for internal and external users.

The same authors also add that a framework should not only evaluate but also provide a scheme to analyse and solve data quality problems by proactive management.

Ideally, therefore, a DQEF goes beyond straightforward quality assessment and becomes an integral component of the processes that an organisation puts in place to deliver its business goals (Willshire & Meyen, 1997). The framework then uses data quality to target poor quality processes or inefficiencies that may reduce profitability or lead to poor service (Wang et al., 1996). The implication of this extended framework is that the organisation will need to engage in a certain amount of business process reengineering. That is why the framework needs to be part of an organisational improvement programme and strongly tied to a data quality strategy.

Data Quality Strategies

Thus far, little has been published on what constitutes a data quality strategy, let alone an evaluation of a structured and tested improvement programme. Recently, however, this type of strategy has become increasingly important as a core requirement for many businesses. It is likely that some large organisations do have such strategies, improvement programmes, or components of them, but these are not currently documented and available in the literature. Davis (2003), publishing on the FirstLogic Web site, wrote several articles on his vision of a data quality strategy. According to Davis, a data quality strategy should include:

• A statement of the goals (what is driving the project)

• A description of the primary organizational processes impacted by the goals

• A high-level list of the major data groups and types that support the operations

• A description of the data systems where the data groups are stored

• A statement of the type of data and how they are used

• Discussion of cleansing solutions matching them to the types of data

• Inventory of the existing data touch points

• A plan for how, where, and when the data can be accessed for cleansing

• A plan for how often the cleansing activity will occur and on what systems

• A detailed list of the individual data elements

Whilst Davis’ list is a useful starting point, it is based on an information provider’s perspective. Other components should be added to incorporate the needs of consumers and to define and document those needs. This is still not easy to do given that customers often do not know what their needs are (Redman, 2001). A first step would be the identification of the organisation’s customers, or important customer groups, where there are too many individual customers for initial improvement programmes.

With this survey of the elements of data quality improvement programmes we can now look briefly at international attempts to apply the principles to health care.

INTERNATIONAL HEALTH DATA QUALITY PROGRAMMES

Health care delivery and planning rely heavily on data and information from clinical, administrative, and management sources, and quality data lead to quality and cost-effective care, improving patient outcomes and customer satisfaction. Data for health care delivery range from the clinical records of individual patients, detailing their interactions with medical services, to the administrative data required to manage the complex business of health care. When abstracted and aggregated in warehouses, and informed by other management and policy information, these “unit level data” produce knowledge bases for health care planning and decision support. The totality of data from these various sources can then be used for further policy development (Al-Shorbaji, 2001). A prime example is the planning at government level to provide services that address the prevalence and distribution of diseases such as diabetes in a population. Clearly, any programme that improves the quality of data at all levels will improve the quality and cost-effectiveness of care and patient outcomes.

A review of international data quality improvement programmes in health care, including the National Health Service (NHS) (Department of Health, 2004) in the United Kingdom, the Canadian Institute for Health Information (CIHI) (2003b), HealthConnect Australia (Department of Health and Aging, 2003), and the United States Department of Health and Human Services (2002) identified similarities between the various programmes. All of the reviewed programmes note the multilevel, multidimensional complexity of data quality improvement initiatives. They seek to manage data proactively and ensure integrity by preventing data quality problems using a systematic total data quality management (TDQM) approach (Wang, 1998). There is also commonality of role expectations — the data suppliers are responsible for the quality of the data they provide to the central government, whilst central government is required to provide leadership and assistance to data suppliers by developing sector wide standards and best practice guidelines.

The NHS in particular outlines clearly, and in detail, the substantial work that is required by any health care provider to ensure good data quality. The NHS developed an accreditation scheme that was initially thought to be all that would be needed to ensure the supply of good quality data. The scheme is extensive and was found to be very successful but did not sufficiently identify the responsibilities of the data supplier; central government was still monitoring more than it was leading. This discovery led to the more extensive guidelines developed around principles of data quality supported within the NHS. Several NHS Trusts have published data quality strategies on their Web sites that align to the NHS core strategy requirements. These efforts have been recognised in a recent review of the programme by the UK Audit Commission (2004), which found significant improvements to levels of data quality. However, similar issues to those initially found are still apparent after five years of targeted improvements and the report recommended:

• The development of a more coordinated and strategic approach to data quality

• Development of an NHS-wide strategy for specifying, obtaining, and using both national and local information

• Making more and better use of patient-based information

• Involving Trust board members

• Training and developing staff

• Keeping systems up to date

In Canada, the CIHI also briefly discusses accreditation for enhancing collaboration with data suppliers. They have undertaken extensive work on data quality through collaborative work with experienced statisticians from Statistics Canada and base their theories on research by the Massachusetts Institute of Technology in the U.S.

We now look at the relevance of this international experience in health data quality programmes to New Zealand health.

TOWARDS A HEALTH DATA QUALITY PROGRAMME IN NEW ZEALAND

The New Zealand Health Information Service (NZHIS) is a specialised group within the Ministry of Health responsible for the collection and dissemination of health-related data. NZHIS has as its foundation the goal of making “fit-for-purpose” information readily available and accessible in a timely manner throughout the health sector to support the sector’s ongoing effort to improve the health status of New Zealanders. The vision of NZHIS is to be a leader in the provision of health information services in New Zealand, and to be recognised and respected as a leading organisation internationally. Effective and timely use of information is crucial to achieving this vision. NZHIS has responsibility for:

• The overall collection, processing, maintenance, and dissemination of health data, health statistics and health information

• The ongoing quality improvement of data entering the national data collections

• The continuing maintenance and development of the national health and disability information systems

• The provision of appropriate databases, systems, and information products

• The development and provision of health and disability information standards and quality-audit programmes for data

• Coordination of ongoing national health and disability information collections and proposals for their development

• Analysis of health information, performance monitoring, benchmarking, and advice on the use of information obtained from NZHIS

Thus, while the NZHIS is responsible for the lead on data quality, this does not mean that it is solely accountable for solving data quality problems. The role of the NZHIS is rather to define data quality criteria and establish a framework in which health care organisations can assess the quality of their own data. This framework must also ensure that data quality does not degrade when data are moved between organisations. These data move with the patients they refer to, creating reciprocal dependence between organisations, so that poor data management in one organisation can adversely and incrementally affect other organisations and the care a patient receives. A national “systems” framework is therefore needed to certify that data used for decision making meet the same quality criteria and standards both within and between organisations.

Assessment of the national health collections at NZHIS (2003) showed that the required framework was unlikely to lead to sustainable improvements unless it was placed in the context of a national strategy that would reengineer data processes and embed the quality-centric revisions in normal, everyday practice. The resources to improve quality were already available, but they were being applied to the wrong processes and without consistency or direction. In addition, short-term priorities needed to focus on areas where benefits could be realised easily with long-term projects concentrating on implementing change.

Thus, as suggested by the previous discussion, the approach to the New Zealand Health Data Quality Improvement Programme is seen to consist of three stages.

1. Determination of the criteria needed to judge health data quality in a New Zealand context

2. Development of a practical framework to apply these criteria to existing and new data collections

3. Development of a strategy to implement the framework and embed it in normal practice

The next sections describe the research methodology and the design and application of these three stages.

RESEARCH METHODOLOGY FOR THE PROGRAMME DEVELOPMENT

The research utilised several qualitative methodologies (action research, semi-structured interviews, focus groups, and a questionnaire) to develop and formally assess a health data quality framework that could be tied to an implementation strategy and promulgated as a data quality programme. The two focus groups were derived from a Ministry Data Quality Team (MDQT) formed specifically to look at ways of improving quality in a consistent way across the organisation. Membership of the MDQT was selected from across the ministry to bring together business units that appeared to have similar issues with data quality but at that time had no formal infrastructure to coordinate quality initiatives. Members were mostly “information users” such as information analysts and business intelligence staff, but some were also members of the already existing operational Clinical Analysis Team. All regularly used data for a wide range of different purposes.

A grounded theory (Strauss & Corbin, 1998) approach was used to analyse the data for content themes that could reveal new concepts. The research concentrated on eliciting the opinions of the participants on areas such as:

• The applicability of the criteria, characteristics, and dimensions for the assessed collection

• Proposal of other dimensions that may be applicable

• The language used in the framework

• The language and examples provided in the user manual

• The length of time required to complete the assessment using the framework

• The value to users of the information provided from using the framework

• The table of contents for the data quality documentation folder

The DQEF then went through a pilot evaluation process using two health data collections. Initial assessment was made on the mortality data collection, a national health collection considered to have good data quality in relation to other collections. The mortality collection has been established to provide data for public health research, policy formulation, development and monitoring, and cancer survival studies. A complete dataset of each year’s mortality data is sent to the World Health Organization to be used in international comparisons of mortality statistics.

The second data collection consisted of clinical data held in a local hospital setting. These data are used to determine best health outcomes for clinical care pathways and they are consequently stored at a more granular level than the national health data.

Further details of methodology are given with the discussion of the research findings.

RESEARCH FINDINGS

Selection of Health Data Quality Dimensions

The development of suitable dimensions for assessing health data quality in a New Zealand context was based, as indicated, on the Canadian Institute for Health Information (CIHI) data quality framework (Long, Richards, & Seko, 2002). This framework is itself based on Statistics Canada guidelines and methods, information quality literature, and the principle of continuous quality improvement (Deming, 1982). The CIHI is comparable in function to NZHIS, and the health care systems of the two countries are also similar in many respects.

The CIHI data quality framework operationalises data quality as a four-level conceptual model (Long & Seko, 2002). At the foundation of the model are 86 criteria; these are aggregated by the framework algorithm into a second level of 24 characteristics that in turn define 5 dimensions of data quality: accuracy, timeliness, comparability, usability, and relevance. Finally, the five dimensions can be reduced by the algorithm into one overall database evaluation. Figure 1 provides a summary of the four-level conceptual CIHI model.
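The four-level roll-up can be illustrated in code. The CIHI framework algorithm itself is not reproduced in this topic, so both the rating scale and the “worst rating wins” aggregation rule below are hedged assumptions, chosen only to show the shape of the computation from criteria up to one overall evaluation.

```python
# Hedged sketch of the four-level roll-up: criteria -> characteristics ->
# dimensions -> one overall database evaluation. The rating scale and the
# "worst rating wins" rule are illustrative assumptions, not the CIHI algorithm.
RATINGS = ["appropriate", "marginal", "problematic", "unknown"]  # best to worst

def roll_up(ratings):
    """Aggregate a collection of ratings by taking the worst one present."""
    return max(ratings, key=RATINGS.index)

# Criterion ratings grouped by characteristic, characteristics by dimension
# (all names and values here are hypothetical).
dimensions = {
    "accuracy": {
        "level of error": ["appropriate", "marginal"],
        "coverage": ["appropriate"],
    },
    "timeliness": {
        "currency": ["problematic"],
    },
}

dimension_ratings = {
    dim: roll_up([roll_up(criterion_ratings)           # characteristic level
                  for criterion_ratings in characteristics.values()])
    for dim, characteristics in dimensions.items()
}
overall = roll_up(dimension_ratings.values())          # database level
```

Under this rule a single problematic criterion propagates upward, so the overall evaluation above comes out as “problematic”; the real algorithm may weight or combine levels differently.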

The CIHI framework was first assessed for completeness, applicability, and ease of adaptation in New Zealand against current ministry information strategy documents. These include regional health care providers strategic plans and the WAVE report (working to add value through e-information) (WAVE Advisory Board, 2001), which is New Zealand’s national information management strategy for health. Compliance with New Zealand legislation was also considered.

The MDQT focus groups made minimal changes to the basic content of the CIHI framework retaining the hierarchical approach but removing some criteria that were thought inappropriate and adding others that were considered important in the New Zealand context, yielding a total of 69 from the original 86. The most significant change was the inclusion of an additional dimension, privacy and security, to satisfy New Zealand concerns. The CIHI states that privacy and security are implicit requirements that are embedded in all their data management processes. Whilst the same could also be said of the Ministry of Health, the pervading culture in New Zealand requires that privacy and security of information, in particular health information, are paramount. Therefore, the MDQT felt there was a requirement for explicit and transparent consideration of these quality dimensions. The underpinning characteristics for these new dimensions were developed by the senior advisors on health sector privacy and security to ensure alignment with the ongoing development of new privacy and security policies.

The six data quality dimensions chosen for the DQEF by analysis of the feedback from the focus groups are as follows, with accuracy considered the most important:

1. Accuracy is defined within the framework as how well data reflect the reality they are supposed to represent.

2. Relevancy reflects the degree to which a database meets the current and potential future needs of users.

3. Timeliness refers primarily to how current or up-to-date the data are at the time of use.

4. Comparability is defined as the extent to which databases are consistent over time and use standard conventions (such as data elements or reporting periods), making them similar to other databases.

5. Usability reflects the ease with which data may be understood and accessed. If data are difficult to use, they can be rendered worthless no matter how accurate, timely, comparable, or relevant they may be.

6. Security and privacy reflect the degree to which a database meets the current legislation, standards, policies, and processes.

Within the research participant group the understanding of the meaning of these dimensions varied. No one definition could be found for even the most commonly used data quality dimensions (Wand & Wang, 1996). For this reason it is important for the development of the DQEF to explicitly define each dimension. It was decided to utilise the definitions provided by the CIHI Framework where possible, as these aligned with the characteristics and criteria adopted throughout the framework. Importantly, the dimensions were found to be mutually exclusive and collectively exhaustive.

Definition of the Health Data Quality Evaluation Framework

The development of the health DQEF began with a current state analysis (New Zealand Health Information Service, 2003) through a preliminary survey of managers and users from across the ministry. This survey consisted of open questions requiring free-text answers to elicit information on a set of factors including historical and contextual information about the collection, the data collection processes, any changes made to data from within the ministry, what the data are used for, where they reside, and the nature and perceived effectiveness of existing data quality initiatives.

The gathering of this information proved difficult. The survey results showed there were currently no compiled and complete records of data quality for any of the national data collections administered or managed by the Ministry of Health and neither was there clear accountability for data quality in the health sector. The information is spread over a range of business units, people, and documents so that the Ministry cannot easily assess the scope or effectiveness of its data quality measures. Furthermore, the varying requirements for quality between centrally held and managed collections and those “at the coalface” led to considerable uncertainty as to what “quality” entailed. This situation and extensive discussions with data users involved in the development of the Ministry of Health Information Systems Strategic Plan served only to confirm the need for the DQEF and the availability of assessment tools that could provide information on the nature and levels of data quality and identify the sources of problems so as to allocate responsibilities.

For the New Zealand Ministry of Health, therefore, the DQEF takes the form of a tool that allows a consistent and accurate assessment of data quality in all national health data collections, enabling improved decision making and policy development in the health sector through better information. The DQEF standardises information on data quality for users, provides a common objective approach to assessing the data quality of all health information databases and registries, and enables the identification and measurement of major data quality issues.

The draft DQEF, consisting of the aligned set of quality criteria, characteristics, and dimensions, was sent to all group participants. A presentation to the MDQT was made prior to the focus groups to ensure all participants had a common understanding of the purpose of the DQEF and the desired outcome goals. The group participated in two focus groups of two hours each. A member of the Ministry’s Health Information Strategy and Policy Group (the researcher) led the focus groups and an administrator was present to make audio recordings and to later transcribe the recordings, noting also the interaction between group members on discussion points.

A template was developed to assist data managers and users to assess and document the effectiveness of the DQEF, its user manual, and the proposed data quality documentation folder for each data collection. The documentation folder houses all information pertaining to the quality of each data collection and makes it available in both paper and online format for access by all staff at the Ministry of Health. Following the focus group sessions, a second review of the DQEF, assessed using criteria developed by Eppler and Wittig (2000), ensured that the DQEF remained robust after localised changes.

The information provided by the DQEF evaluation highlighted data quality problems that were already known to the data custodians who had already initiated work to make improvements. However, several other deficiencies were highlighted, indicating where improvements in the current data quality work could be made, or in the case of timeliness, highlighting known issues that perhaps should be made a priority for improvement.

The timeliness of suicide data in the mortality collection highlights the debate over trade-offs between dimensions. For example, data users may be aware that a dataset is likely to be incomplete, yet releasing it without delay makes it usable at an earlier date. Eppler and Wittig (2000) note that trade-offs are not commonly addressed in frameworks. Research participants did not raise this issue and the DQEF does not explicitly address trade-offs between dimensions. However, an analysis of possible trade-offs and their implications could be included in the user manual. Different trade-offs would apply to different collections, and again may differ for specific uses of the data (Ballou & Pazer, 2003).

An issues register was kept to supply feedback to the researcher on the usability of the DQEF and user manual. Overall, the DQEF was found to provide useful data quality information by collection users and custodians and to provide sufficient information to make at least preliminary prioritised lists of essential data quality improvement projects. A detailed analysis of the feedback is provided in the sections that follow.

Training

Further training was required to ensure that evaluators use the DQEF consistently and that it is a practical and easy tool to use. During training sessions it was found that many users of the DQEF initially made assumptions about the meaning of criteria and needed to refer to the metrics and definitions contained in the user manual. Consistency of meaning of criteria is important since many different users complete the evaluation both within an evaluation cycle and also from one evaluation to the next. For this reason a section was included under each criterion that asked the evaluator to describe in free text how they came to their decision, including references to any data quality information used to make their decision. This requirement also aids any subsequent evaluation of a collection.
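The free-text justification requirement can be captured directly in the evaluation record. A minimal sketch follows, with hypothetical field names; only the requirement itself (that each answer carries the evaluator’s reasoning and references) comes from the text.

```python
# Sketch of a criterion record carrying the evaluator's free-text rationale.
# Field names are hypothetical; the requirement that each answer record how
# the decision was reached, with references, is the one described above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CriterionAnswer:
    criterion: str                 # the criterion question as worded in the DQEF
    answer: Optional[bool]         # yes/no, or None if not yet assessed
    rationale: str = ""            # free text: how the evaluator decided
    evidence: List[str] = field(default_factory=list)  # documents consulted

    def is_auditable(self) -> bool:
        """An answered criterion supports later evaluations only if the
        evaluator recorded how the decision was reached."""
        return self.answer is None or bool(self.rationale.strip())

# Hypothetical example of a well-documented answer.
answered = CriterionAnswer(
    "Does the error rate fall within the predefined range?", True,
    rationale="Audit sample error rate below the agreed threshold",
    evidence=["(hypothetical) collection audit report"],
)
```

Keeping the rationale alongside the yes/no answer is what makes repeat evaluations cheaper: the next evaluator inherits the reasoning, not just the verdict.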

Time Taken to Complete Evaluations

As noted in the findings of the Current State Analysis of Data Quality in the Ministry of Health, data quality information is held in many different locations and finding this information takes considerable time. The time taken to complete an evaluation of a data collection by already busy staff was estimated to be a minimum of four hours when all relevant documentation was available. In practice, the information was held in disparate locations by different staff and the initial evaluations took far longer. Repeated evaluations of the same collections would be completed much more efficiently, as much of the information could remain the same or merely need updating.

Intended Unit of Analysis

The researcher and participants also discussed the granularity of evaluation or intended unit of analysis. Participants asked, “Was the DQEF able to evaluate a collection as a whole and also a column within a collection?” Price and Shanks (2005) found similar issues when implementing a data quality framework and noted the limitations of a framework on data that are not uniquely identifiable, such as those found in non-key columns. It was therefore decided to measure each collection as a whole. However, there are specific data elements that can be assessed individually and are extensively used for analysis and decision making. An example would be a registry, such as the National Health Index (NHI, a unique patient identifier), where each field of demographic information is the reference for other collections and provides information to prevent duplicate allocation. In this case, the DQEF can be used as a two-dimensional tool, assessing the registry as a whole and then each element.
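This two-dimensional use of the DQEF (whole collection first, then individual elements) can be sketched as below. The NHI field names and the `evaluate` stub are hypothetical illustrations, not the actual registry schema or assessment procedure.

```python
# Sketch of the two-dimensional use of the DQEF: one evaluation of the
# registry as a whole plus one per data element. Field names and the
# evaluate() stub are hypothetical.
def evaluate(unit: str) -> dict:
    """Stand-in for a full DQEF evaluation of one unit of analysis."""
    return {"unit": unit, "criteria": {}}  # criteria answers filled in later

nhi_fields = ["name", "date_of_birth", "address", "ethnicity"]

assessment = {
    "registry": evaluate("NHI"),                                # whole registry
    "elements": {f: evaluate(f"NHI.{f}") for f in nhi_fields},  # each element
}
```

The same framework thus produces both a registry-level verdict and a per-field breakdown, which suits collections like the NHI where individual demographic fields are referenced by other collections.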

User Manual

Extensive changes to the CIHI user manual were required to make it useful to the New Zealand health care environment. The findings show that the target audience for the accompanying DQEF user manual needed careful definition to ensure that the language was appropriate and the documents easy to use. In particular, the language used in the original CIHI version was found to be too simplistic for the intended audience. DQEF users are likely to be systems administrators, data quality advisors, and members of the Business Intelligence Team, but the language assumed little underlying knowledge of data and systems. The user manual could also be shortened by moving some of the underlying theory used to develop the Data Quality Improvement Programme to other documents intended for potential users taking part in education programmes.

Extent of Data Quality Information

Initial feedback from the data quality team at NZHIS showed that the DQEF did not provide sufficient information to make decisions on the quality of data. There appeared to be insufficient detail to manage data quality effectively. However, once the team became more familiar with the DQEF and began to understand the context of its use, the feedback became more positive. Users realised that the purpose of the DQEF was to provide guidance on the regular processes and measures needed to answer criteria questions. By presenting a more complete view of the data quality of a collection, these “answers” suggested new measures, not previously thought of by the data quality team. In effect, the DQEF is an invaluable guide or checklist that facilitates the consistent application of appropriate and wide-ranging measures across all collections. It has the ability to raise the standard of work expected in data quality by bringing about awareness of areas of deficiency in the current work programme.

Interestingly, findings from the mortality data collection evaluation show that the Business Intelligence Team (as data consumers) required the most detailed information on how the assessment was made for each criterion, whereas managers required only summary information. This variation reflects the different decisions made by the different groups and hence the distinctive uses and required granularity of data quality information. Clearly, reports on the outcomes of assessments should be tailored to provide this information specific to audience requirements. In support of this observation, Chengalur-Smith, Ballou, and Pazer (1999) noted the importance of information format in their research on the impact of data quality information on decision making. They found that users required complex data quality information to make simple decisions but that they did not make use of this information for complex decisions. We consider this reversal may be due to “information overload,” where too much information can be counterproductive to decision making in more complex decision environments.

Although decisions made in the health care environment are often “complex,” what a complex decision is to one user may not be to another. This implies that users need to have input into decisions on the granularity of the supplied data quality information. Chengalur-Smith et al. (1999) note that it may be most effective to provide users with data quality information that focuses on one or two key criteria that exhibit the greatest impact on the data.

A later study by the same group (Fisher, Chengalur-Smith, & Ballou, 2003) examined the impact of experience on the decision maker, finding that the use of data quality information increases as experience levels progress from novice to professional, and suggested that data quality information should be incorporated into data warehouses used on an ad-hoc basis.

Metrics

The DQEF uses subjective metrics for data criteria that have been developed by the collections’ data custodians. Whilst this is a valid form of measurement, the robustness of the DQEF requires additional objective metrics and these are derived from a structured system based on statistical process control (Carey & Lloyd, 2001). The metrics include measures that assess customer requirements for levels of data quality, trends in historical data within the national health collections, current key performance indicators for contracted data suppliers, and legislative requirements for the provision of data by health care providers and the Ministry of Health to international bodies such as the World Health Organisation. Further work is still required to develop applicable metrics.

In summary, the pilot of the DQEF in the NZHIS environment elicited these issues:

• Training was required before using the DQEF.

• Users felt considerable time was needed to complete the evaluations.

• The extent and detail of information (particularly about how evaluation decisions are made) provided by the DQEF assessment process must meet the needs of data users, such as the Business Intelligence Team.

• Granularity/units of analysis need to fit the type of collection; registries such as the NHI require more granular, element level analysis.

• Language in the user manual and DQEF is an important consideration.

• Further work is required on metrics development.

Overall, collection users and managers found the DQEF to offer sufficient information on data quality to make preliminary, prioritised lists of essential data quality improvement projects. Further training has been necessary to ensure assessors use the DQEF consistently and that it is a practical and easy tool to use. The CIHI also found considerable training and change management were necessary to implement the CIHI Framework, due in part to already heavy workloads (Long & Seko, 2002).

Pilot of the Framework in a Hospital Environment

The purpose of this pilot was to assess the application of the DQEF to the clinical integrated system (CIS) model that is currently used in clinical practice at Auckland District Health Board (A+). The assessment would indicate the applicability of the DQEF and its chosen data quality dimensions in the wider health sector. The CIS Model is an interdisciplinary computerised model of patient care that replaces paper notes and requires all staff members to document care via the computer. The software programme was developed by the A+ Network Centre for Best Patient Outcomes. This model has been in clinical practice since 2000 and has been used for 6,000 patients (Fogarty, 2004). The main objectives were to review the DQEF criteria against the CIS model database to:

• Determine the appropriateness of the criteria for a clinical database

• Assess the documented data quality of the CIS model against the DQEF criteria

• Assess the clarity of DQEF documentation

Positive feedback was given on the usefulness of the information provided by the assessment process and the applicability of the DQEF. The majority of the criteria could be applied to an external clinical database as shown in Table 1. This table highlights that 52 out of the possible 69 criteria used in the DQEF conform to the data quality requirements of the clinical database held at the local hospital level. The language used in the DQEF was further improved to ensure consistent understanding using the detailed feedback on each criterion provided by this assessment.

The DQEF evaluation process also proved valuable to the hospital submitting the clinical data set. It was suggested by the hospital data analyst who completed the evaluation that some formal, sectorwide criteria based on the DQEF, together with a certification process such as accreditation, would help to ensure that clinical databases are valid and reliable.

Evaluation of the Framework Against Eppler and Wittig Criteria

The resulting DQEF, with changes as recommended by focus group participants and the hospital environment, was assessed using Eppler and Wittig’s (2000) criteria. The feedback can be found in Table 2. The recommendations centre on the development of business processes that support the effective use of the DQEF in the Ministry of Health environment and the content of the manual that instructs users on the implementation methodology. This supports Eppler and Wittig’s (2000) theory that a framework requires tools, such as guidelines and manuals, to be effectively implemented. CIHI also found that a detailed step by step guiding process is required to implement a framework (Long & Seko, 2002).

The summary information gained from evaluations of all national health data collections was collated to form a prioritised list of data quality improvement initiatives across the ministry. Ongoing assessment using the DQEF will provide information on the success of such initiatives. The DQEF remains an iterative tool, whereby those that use the tool will improve its usefulness with growing knowledge of data quality theory, the level of data quality of the national health collections, and the priorities for improvement. In particular, the data quality improvement strategy described in the remainder of the topic will support the ongoing refinements to the DQEF.

Development of the Health Data Quality Strategy

As mentioned earlier, little has been published on what constitutes a data quality strategy, let alone an evaluation of a structured and tested improvement programme. It is particularly important to note that a data quality improvement strategy is not an “information technology strategy,” nor an “information systems strategy.” Although such strategies may provide insight and tools to underpin a data quality improvement strategy, data quality improvements cannot be attained merely through information technology; the problem is one of processes and people. As noted in Ward and Peppard (2002), “clearly, technology on its own, no matter how leading edge is not enough.”

Technology can support the achievement of an organisation’s business goals at an operational level by increasing efficiency and at a strategic level by reengineering the business processes to be more effective. At both levels, however, the technology is merely an enabler and it is information and its management that makes the difference. Since the quality of the information rests on the quality of the raw data used to derive it, data quality is a critical issue. Technology can be used to improve data quality in some processes and also to exploit the superior quality for better business outcome in others.

Unfortunately, whilst the technical aspects of data quality assurance are usually well-documented, documentation of the business processes that support good data quality is often lacking. Information processes must be carefully documented to make the meaning of data transparent to all users. For example, data sources should always be identified to provide the data user with context around the data collection process. Imposing standards and maintaining consistency across business units with data definitions, business rules, and even systems architecture can lead to greater data integration and utility and hence perceived quality. This is where a “whole of health sector” strategy begins to have a significant effect on the quality of data and where centralised leadership is required to ensure that the views of all those who manage and use data are taken into account. Thus, good documentation carries the two-fold importance of establishing standards and managing change.

Change management is the main driver of the health data quality strategy (DQS) that, together with the data quality dimensions and framework, comprise the overall Data Quality Improvement Programme at the New Zealand Ministry of Health.

The DQS uses the dimensions derived in the DQEF to set data quality standards for designing, developing, and maintaining the national health data collections and data management throughout New Zealand. A practical strategy has to consider the complexity of the health sector since the data, their structure and use, and the products, are potentially much more varied than are found in financial or manufacturing organisations.

The development of the strategy was informed throughout by several stakeholders. Ongoing communication with health sector groups, including Ministry of Health staff, was essential to ensure sector buy-in and to maintain input and interest in the strategy development and implementation. Full consultation with a wide range of data suppliers and users was also necessary. Finally, surveys and discussions with organisations outside of health, both within New Zealand and overseas, on the management of data quality and implementation of strategy helped to inform the ongoing and iterative development of the strategy.

The standards established by the DQS aim to achieve improved health outcomes through:

• Better decision making to target areas of improvement in national health data collections with a transparent prioritisation tool for data quality improvement projects

• Improved relationships with data suppliers, developing a whole-of-sector responsibility for data quality

• Improved awareness of a data quality culture throughout the sector

• Improved understanding of the processes involved in developing an information product

• The development of best practice guidelines leading to accreditation of data suppliers

• Education and support to data suppliers

• Minimum data quality requirements for existing and new national collections

• Minimum requirements for regular operational data quality initiatives across all national health collections and an approach to rooting out persistent quality errors

In particular, the last bullet point demands a systematic approach to data quality management such as total data quality management (TDQM) mentioned earlier in the topic.

Some approaches to data quality management target specific errors within a collection or an entire collection but often do not devise solutions to prevent systemic problems. In contrast, TDQM focuses on two key aspects of data management: the data flow processes that constitute the organisation’s business and the recognition that information is a product, rather than a by-product, of these processes (Wang, 1998). Regarding the processes themselves, TDQM seeks to ensure that none of them changes the initial meaning of the data, which would otherwise lead to systematic errors and repeated data quality problems. Systematic process errors can be prevented by several means, some of which will depend upon the nature of the business unit and its data. However, we can identify generic prevention mechanisms across business units to include:

• The systematic and ongoing education of data suppliers

• Education within the Ministry of Health

• Regular reviews of recurrent data quality problems from suppliers and information feedback to suppliers on issues with support provided for improvement

• Internally developed data quality applications to reduce time spent on assessment of data quality problems (limited in use for complex health data)

• A continuous cycle of define, measure, analyse, and improve using the framework to inform the assessment process
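The continuous cycle named in the last bullet can be sketched as a simple loop whose four stages are supplied by the organisation's own processes. This is a minimal illustration, not the ministry's implementation; all names and the stub processes are placeholders.

```python
def tdqm_cycle(collection, define, measure, analyse, improve, passes=1):
    """Run repeated define -> measure -> analyse -> improve passes over a
    collection, returning the gaps found on each pass. The four callables
    stand in for the organisation's own processes (all names illustrative)."""
    history = []
    for _ in range(passes):
        criteria = define(collection)           # DQEF dimensions in scope
        scores = measure(collection, criteria)  # e.g. SPC-based metrics
        gaps = analyse(scores)                  # prioritised list of issues
        improve(collection, gaps)               # targeted initiatives
        history.append(gaps)
    return history

# Minimal stubs showing one pass over a hypothetical collection:
gaps_log = tdqm_cycle(
    "mortality",
    define=lambda c: ["accuracy", "timeliness"],
    measure=lambda c, crit: {k: 0.9 for k in crit},
    analyse=lambda scores: [k for k, v in scores.items() if v < 0.95],
    improve=lambda c, gaps: None,
)
print(gaps_log)  # [['accuracy', 'timeliness']]
```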

Concerning the role of information as a product of process, TDQM draws attention to the need to manage information so that organisations:

• Know their customers/consumers of the information and their information needs

• Understand the relationship between technology and organisational culture

• Manage the entire life cycle of their information products

• Make managers accountable for managing their information processes and resulting products

TDQM at the Ministry of Health

The foundation of the New Zealand data quality strategy is the institutionalisation of TDQM principles, by which data quality management processes are embedded in normal business practice and accepted at all levels in the sector. This embedding is achieved through the “define, measure, analyse, improve” cycle, as described in the next four sections.

Define

The definition of what data quality means to health sector professionals is clearly a fundamental requirement and this was done through the development of the data quality dimensions and the data quality framework. The dimensions define what health care data collectors, custodians, and consumers consider important to measure.

A cross-organisational data quality group is used to facilitate the discussion of data quality issues from an organisation-wide perspective. The group includes representatives of business units that collect, manage, or use data, as well as clinicians and clinical coders. The strategy then provides for whole of sector input into the data quality dimensions considered important, where data quality improvement should be targeted, and how and where accountability should lie. Continual assessment of the needs of data customers is required and will be managed through a yearly postal survey of customers and discussions at forums and training sessions.

Measure

Active and regular measurement of data quality avoids a passive reliance on untested assumptions as to the perceived quality of the data. Reporting on data quality levels becomes transparent and justifiable. The regular measurement programme involves:

• Regular use of the data quality dimensions and evaluation framework on national collections

• Developing appropriate statistical process control (SPC) measures

• Developing data production maps for all major information products outlining flow of data around the organisation and the health sector and possible areas of data quality issues

• Checking the availability of data quality assessment software or the potential for in-house development

Measuring the quality of data supplied by organisations to the national collections involves performance measurement based on key performance indicators (KPIs) developed through SPC (Carey & Lloyd, 2001). SPC uses historical information to define the ranges of acceptable data so that outliers or variations identify areas for improvement that can be negotiated with data suppliers.
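As a sketch of how SPC-derived KPIs might flag supplier data for follow-up, the fragment below computes three-sigma control limits from historical values and marks new observations falling outside them. The figures and the error-rate measure are invented for illustration, not actual ministry KPIs.

```python
from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Derive SPC control limits from historical values (here, a
    hypothetical monthly error rate for one data supplier)."""
    centre = mean(history)
    spread = stdev(history)
    return centre - sigmas * spread, centre + sigmas * spread

def outliers(history, current):
    """Flag current observations outside the historical control range,
    as candidate data quality issues to negotiate with the supplier."""
    low, high = control_limits(history)
    return [(i, v) for i, v in enumerate(current) if not (low <= v <= high)]

# Twelve months of a supplier's error rate (%), then two new months:
history = [2.1, 1.9, 2.3, 2.0, 2.2, 1.8, 2.1, 2.0, 2.2, 1.9, 2.1, 2.0]
print(outliers(history, [2.1, 4.5]))  # [(1, 4.5)] -- only the spike is flagged
```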

Data production maps for all major information products are another active management tool for improving data quality. These maps model the flow of data around the organisation and highlight potential quality issues by detecting points where data quality may be compromised. When properly developed and supported by senior management, they can be effectively applied to solve complex, cross-functional area problems (Wang, Lee, Pipino, & Strong, 1998). Where data cross from one organisation to another, significant data quality issues can arise, and this is evident within the New Zealand health care system. Maps can also be used to summarise issues for managers and provide them with a tool to understand the implications of data quality.
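A data production map can be prototyped as a small directed graph in which each processing step carries its owning organisation, so that cross-organisation hand-offs (the points where quality is most often compromised) fall out automatically. All step and organisation names below are hypothetical.

```python
# Each node is a processing step tagged with its owning organisation;
# edges that cross organisational boundaries are flagged as points
# where data quality may be compromised. Names are illustrative only.
steps = {
    "gp_registration":  "primary care provider",
    "hospital_intake":  "district health board",
    "coding":           "district health board",
    "national_load":    "Ministry of Health",
    "mortality_report": "Ministry of Health",
}
flows = [
    ("gp_registration", "hospital_intake"),
    ("hospital_intake", "coding"),
    ("coding", "national_load"),
    ("national_load", "mortality_report"),
]

def boundary_crossings(steps, flows):
    """Edges where data move between organisations: prime candidates
    for explicit standards, KPIs, and quality checks."""
    return [(a, b) for a, b in flows if steps[a] != steps[b]]

print(boundary_crossings(steps, flows))
# [('gp_registration', 'hospital_intake'), ('coding', 'national_load')]
```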

Analyse

The complexity of health care requires an extensive range of decisions, both administrative and health-related, and a single data element can substantiate many decisions and carry impacts across the sector. But what level of data quality do we need to make good decisions? The importance of data quality analysis is that it allows us to determine the appropriate quality levels and identify areas for improvement where current processes do not support good data quality work. Analysis can draw upon:

• Clinical analysis and data modelling: what cannot be done due to data quality

• Surveys of customer needs

• Consideration of current and future uses of data

• Realistic expectations vs. expectations of stakeholders

• Corporate knowledge: what the organisation already knows is important and/or relatively easy to fix

• International standards: does the organisation compare well with appropriate standards and are our expectations realistic?

• Resources available to make improvements

Improve

Actively finding data quality issues before they cause problems is made easier by the regular assessment of collections using the data quality measurements outlined previously. Prevention, rather than cure, is now a large part of the ministry’s work through the proactive approach outlined here:

• Regular minimum data quality initiatives for all collections with specific initiatives for single collections only

• Preventing poor data quality in new collections through a “data quality plan” developed in the early phases of projects with full involvement of the data quality team. Data quality is embedded in the system and processes of the new collection prior to going live

• Continuing “what works” as the data quality team have learnt a considerable amount about the operational requirements of maintaining data quality levels

• The endorsement of standards for the collection, management, storage, and use of specific data elements by the Health Information Standards Organisation; this way all stakeholders know and agree to all steps in the data flow process

• Stewardship and data access policies that provide clear ownership and access guidelines for the data

• The proposed development of a National Data Dictionary to address the standardisation of the major data elements found in the National Health Data Collections. This standardisation is likely to provide for considerable improvement to data quality through a nationally consistent understanding of data definitions

Through these initiatives it is expected that expensive, one-off projects can be avoided and money can be allocated for regular operational requirements that will enable an ongoing prevention programme. Projects may only improve processes and data in one collection, whereas regular prevention mechanisms help to ensure all data are of high quality.

TDQM in the Wider Health Sector

Some of the wider benefits of institutionalising TDQM through the strategy include:

• Getting everyone talking about data quality through the agreement of a strategic direction

• Raising the profile of data quality within health care organisations through organisation-wide data quality groups

• Getting the sector moving in the same direction through the development of organisation-wide data quality strategies that align with the national strategic direction

• Drawing on current practice/knowledge through best practice guidelines developed by the sector and widely disseminated

• Clear expectations of data suppliers through accreditation/KPIs/contracts

• Actively reviewing strategic direction regularly

A key issue raised by the data quality proposals described in this topic is how to identify and disseminate best practice and embed it in normal day-to-day operation. The approach favoured in New Zealand is accreditation rather than audit. Accreditation suggests devolvement, ownership, and a supporting role from the ministry. For example, the ministry requires District Health Boards to produce annual plans but the boards themselves will be responsible for addressing data quality issues within a national framework and getting them approved at the highest level of regional health care funding and provision.

The ministry, through the Quality Health New Zealand and/or the Royal Colleges’ health care provider accreditation processes, will provide sector organisations with clear guidelines on how to achieve accreditation as good data suppliers. The accreditation process will be developed in consultation with the sector following the development and testing of best practice guidelines. Those organisations that have been able to implement extensive data quality programmes, as outlined previously, will be accredited as good data suppliers. This should lead to a reduction in the need for peer review and audit.

The proposed sector work programme requires that health care organisations can achieve accreditation if they:

• Take part in a sector peer review/audit process

• Meet KPIs

• Implement an in-house data quality education programme

• Develop and implement a local data quality improvement strategy

• Organise regular meetings with a cross-organisational data quality group

• Align with best practice guidelines (when developed)

CONCLUSION

This topic has described the development of a health data quality programme for the Ministry of Health in New Zealand. The ministry’s purpose is to realise the full value and potential of the data that it collects, stores, and manages. Building “trust” in the data throughout the health sector will ensure that data are used frequently and to their greatest possible benefit. With the right framework and strategy, data that are highly utilised for a range of reasons will incrementally improve in quality. Extensive data mining, combining currently disparate collections, will also provide far more granular information and knowledge to improve these collections. A data quality strategy will provide coherent direction towards total data quality management through a continuous cycle of work. The improved data quality will then ensure that the health sector is better able to make informed and accurate decisions on health care policy and strategy.

These aims are relevant to national and regional health authorities and providers in many countries and the key principles, described here and developed from pioneering work by the Canadian Institute for Health information, can be deployed successfully in many health care and related scenarios.

In summary, for the development and application of the data quality evaluation framework, it is important to:

• Define the underpinning data quality criteria carefully involving all stakeholders to ensure common understanding and direction

• Consider the critical quality dimensions that reflect how the organisation uses data and how data flow throughout the business processes

• Document business processes identifying data sources and their reliability

• Appreciate that the framework requires practical supporting tools (e.g., documentation) to make it effective

• Customise the language of the data quality user manual with regard to the level and experience of the intended users

• Be aware of the importance of both education and training at all necessary stages and levels; training is essential to effect the culture change that must accompany the realisation of the importance of data quality

• Be aware that application of the framework is an iterative, ongoing process; the required outcomes cannot be achieved in a single pass

For the data quality improvement strategy, it is important to:

• Derive and impose standards that facilitate data and information transfer whilst preserving quality

• Reengineer the business processes to deliver the quality data needed for efficient service planning and the effective practice of integrated patient care

• Identify and disseminate best practice to reduce the development time needed to improve data quality

• Ensure data quality requirements are not unnecessarily rigorous, keeping user ownership and workloads at reasonable levels

• Define user accountabilities for data quality and the mechanisms to enforce them

• Seek to embed the search for data quality in normal working practices and recognise its achievement in appropriate ways such as with accreditation

Work on extending and improving the health data quality programme continues, particularly with regard to the formulation of objective metrics for the underlying data quality criteria.
