Semantic Web Standards and Ontologies in the Medical Sciences and Healthcare

 

Abstract

This article discusses Semantic Web standards and ontologies in two areas: (1) the medical sciences and (2) the healthcare industry. Semantic Web standards are important in the medical sciences because much of the available medical research needs a way to be shared across disparate computer systems. Ontologies can support context-based searching of medical research information so that it can be integrated and used as a foundation for future research. The healthcare industry is examined specifically in its use of electronic health records (EHRs), which need Semantic Web standards to be communicated across different EHR systems. The increased use of EHRs across healthcare organizations will also require ontologies to support context-sensitive searching of information and to create context-based rules for appointments, procedures, and tests that improve the quality of healthcare. The article combines the literature in these areas to provide a general view of how Semantic Web standards and ontologies are used and to give examples of applications in healthcare and the medical sciences.

Introduction

“One of the most challenging problems in the healthcare domain is providing interoperability among healthcare systems” (Bicer, Laleci, Dogac, & Kabak, 2005). This interoperability matters because it enables universal forms of knowledge representation, the integration of heterogeneous information, answers to complex queries, and data integration and knowledge sharing in healthcare (Nardon & Moura, 2004). With the recent emergence of EHRs and the need to distribute medical information across organizations, the Semantic Web can advance the sharing of such information across disparate systems. It does so by using ontologies to create a uniform language and standards to allow interoperable transmission. The purpose of this article is to provide an overview of how Semantic Web standards and ontologies are used in the medical sciences and healthcare fields. We define the healthcare field as including hospitals, physicians, and others who provide or collaborate in patient care. The medical sciences field provides much of the research that supports patient care; its need lies in being able to share and find the research performed by colleagues in order to build on current work. Interoperability between these different healthcare structures is difficult, and a common “data medium” is needed to exchange such heterogeneous data (Lee, Patel, Chun, & Geller, 2004).

Decision making in the medical field is often a shared and distributed process (Artemis, 2005). Information sharing in the medical sciences has been hindered by three main problems: (1) the lack of common exchange formats, (2) the lack of syntactic interoperability, and (3) the lack of semantic interoperability (Decker et al., 2000). Semantic Web applications can be applied to these problems. Berners-Lee, Hendler, and Lassila (2001), pioneers of the Semantic Web, suggest that “the semantic web will bring structure to the meaningful content of web pages.” In their Scientific American article, they present a scenario in which a single query to the Web retrieves treatment, prescription, and provider information. For example, a query regarding a diagnosis of melanoma may return suggested treatments and tests, as well as providers who accept the person’s insurance plan. This is the type of contextually based result that the Semantic Web can provide. Ontologies can be used to regulate language, and standards can provide a foundation for representing and transferring information. This article focuses on the lack of semantic and syntactic interoperability. Semantic interoperability is discussed in terms of ontologies, and syntactic interoperability in terms of interoperability standards.

Background

The Semantic Web is an emerging area of research and technology. In 1989, Berners-Lee proposed the concept of the World Wide Web to the Conseil Européen pour la Recherche Nucléaire (CERN) (Berners-Lee, 1989). He has also been a pioneer of the Semantic Web. He has noted the healthcare field’s interest in integrating its existing data silos to enable better care (Updegrove, 2005). He founded the World Wide Web Consortium (W3C), whose Web site (http://www.w3.org) offers a vast array of Semantic Web information in a variety of subject areas, including the medical sciences and healthcare. Miller (2004) states that the Semantic Web should provide a common data representation to “facilitate integrating multiple sources to draw new conclusions” and to “increase the utility of information by connecting it to its definitions and context.” Kishore, Sharman, and Ramesh (2004) wrote two articles that provide detailed information about ontologies and information systems.

The concept of the Semantic Web is to extend the current World Wide Web so that context and meaning are given to information (Gruetter & Eikemeier, 2004). Instead of being produced only for human consumption, information will also be produced in a form that machines can process (Berners-Lee et al., 2001). Semantic Web development has two main aspects: (1) ontologies for consistent terminology and (2) standards for interoperability.

Ontologies

Ontologies have been defined in many ways across philosophy, sociology, and computer science. In the Semantic Web context, an ontology is the vocabulary, terminology, and relationships of a topic area (Gomez-Perez, Fernandez-Lopez, & Corcho, 2004). An ontology uses relationships between concepts to give meaning and context to information found in Web resources (databases, etc.) for a specific domain of interest (Singh, Iyer, & Salam, 2005). According to Pisanelli, Gangemi, Battaglia, and Catenacci (2004), ontologies should have:

1. Logical consistency: be expressed in a “logical language with an explicit formal semantics.”

2. Semantic coverage: cover “all entities from its domain.”

3. Modeling precision: represent “only the intended models for its domain of interest.”

4. Strong modularity: structure the domain’s “conceptual space. . .by organizing the domain theories.”

5. Scalability: use a language that is expressive of intended meanings.

The domain of an ontology should include a taxonomy of classes, objects, and their relations, as well as inference rules that add reasoning power (Berners-Lee et al., 2001). This shared understanding of concepts and their relationships provides a means to integrate knowledge between disparate healthcare and medical science systems. Much of the Semantic Web research in the medical sciences has focused either on more efficient and effective information searching or on the interoperability of the EHR. Health information is inherently tacit and intuitive. Its terminology often implies information based on physical examinations and on the patient’s expressions. Although clinicians use standardized terminology, the difficulty lies in expressing this tacit knowledge to others, especially across a network of computers. The two great needs in the medical sciences and healthcare that the Semantic Web can fulfill are to standardize language and to provide a consistent foundation for transferring EHR information (Decker et al., 2000).
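The pairing of a class taxonomy with inference rules can be illustrated with a small sketch. The class names below are hypothetical and not taken from any ontology in Table 2. The single rule encoded is the standard subclass rule: an instance of a class is also an instance of every superclass of that class.

```python
from collections import deque

# Hypothetical mini-taxonomy for illustration: each class maps to its direct superclasses.
SUBCLASS_OF: dict[str, set[str]] = {
    "Melanoma": {"MalignantSkinNeoplasm"},
    "MalignantSkinNeoplasm": {"SkinNeoplasm", "MalignantNeoplasm"},
    "SkinNeoplasm": {"Neoplasm"},
    "MalignantNeoplasm": {"Neoplasm"},
    "Neoplasm": {"Disease"},
}


def superclasses(cls: str) -> set[str]:
    """Return every class that `cls` is a subclass of, directly or transitively."""
    seen: set[str] = set()
    queue = deque(SUBCLASS_OF.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the `seen` check also guards against cycles
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def inferred_types(asserted: str) -> set[str]:
    """Inference rule: an instance of C is also an instance of every superclass of C."""
    return {asserted} | superclasses(asserted)


if __name__ == "__main__":
    print(sorted(inferred_types("Melanoma")))
```

A record that is only tagged "Melanoma" can therefore be found by a query for "Neoplasm" or "Disease". Production ontologies delegate this kind of reasoning to an OWL reasoner rather than to hand-written code.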

Standards

While ontologies represent the conceptual basis for the information to be transmitted, standards allow the data to be transmitted consistently between disparate systems. The data in different clinical information system silos are in multiple formats, and relevant medical and healthcare knowledge must be accessible in a timely manner. Interoperability standards can meet this need by enabling information integration, “providing transparency for healthcare-related processes involving all entities within and between hospitals, as well as stakeholders such as pharmacies, insurance providers, healthcare providers, and clinical laboratories” (Singh et al., 2005, p. 30). The main standard for interoperability in the Semantic Web is the Resource Description Framework (RDF), a W3C recommendation. RDF is an object-oriented standard that provides reusable components for data interchange over the Web (Decker, Mitra, et al., 2000). Each RDF statement is a subject-predicate-object triple. Every resource described in RDF is identified by a Uniform Resource Identifier (URI), the same identifier scheme used for Web pages, e-mail addresses, and other Web elements. This reduces ambiguity about which resource a statement refers to. RDF also supports knowledge representation through concepts such as classes, data types, and values. To express ontologies, RDF can be extended with languages such as the DARPA Agent Markup Language + Ontology Inference Layer (DAML+OIL). DAML+OIL is the basis for the Web Ontology Language (OWL) standard, which has recently gained popularity (Nardon & Moura, 2004).
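The triple model and the role of URIs can be shown with a short sketch that writes a few statements about a patient in N-Triples, a line-based RDF syntax. The namespace `http://example.org/clinic/` and the predicates `hasDiagnosis` and `name` are hypothetical. Only the `rdf:type` URI is the real RDF vocabulary term.

```python
from typing import NamedTuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type URI
EX = "http://example.org/clinic/"  # hypothetical namespace for this example


class URI(str):
    """A string that should be serialized as a URI (<...>) rather than a literal."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # the object is either a URI or a plain string literal


def term(value: str) -> str:
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(value, URI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: list[Triple]) -> str:
    """One statement per line: subject, predicate, object, terminated by ' .'"""
    return "\n".join(f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ." for t in triples)


patient = URI(EX + "patient/123")
graph = [
    Triple(patient, URI(RDF_TYPE), URI(EX + "Patient")),
    Triple(patient, URI(EX + "hasDiagnosis"), URI(EX + "diagnosis/melanoma")),
    Triple(patient, URI(EX + "name"), "Jane Doe"),
]

if __name__ == "__main__":
    print(to_ntriples(graph))
```

Because the subject is a URI, another system can make statements about the same patient, and the two sets of triples merge on that identifier. Real deployments would use an RDF library and a governed URI scheme rather than hand serialization.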

Semantic Web standards and ontologies applied in the medical sciences and healthcare

“The semantic web initiative has resulted in a common framework that allows knowledge to be shared and reused across applications” (Health Level 7, 2004) and across organizations. An infrastructure of common transmission standards and terminology will enable an interconnected network of systems that can deliver patient information. There have been repeated calls to use information technology to reduce medical errors and to make medical information more accessible, and Semantic Web technology has a critical role to play in both. Besides delivering patient information, the Semantic Web can also assist medical sciences research by making research more accessible and easier to share. In information searching, the Semantic Web can give information context and meaning, so that queries return results more closely related to what the searcher intended.

Table 1 displays only a few of the main standards currently used for interoperability in the Semantic Web. The affiliated organizations are listed, showing that many grassroots efforts are involved in generating standards. Three main organizations are involved in international standards for EHRs: the International Organization for Standardization (ISO), the European Committee for Standardization (CEN), and the U.S.-based Health Level 7 (HL7) (HL7, 2004). Developing standards on an international basis is also important because countries report national health status statistics to the world community (Cassidy, 2005).

Table 2 lists ontologies in the medical domain. To clarify the idea, consider the ICD-9 coding of diseases (ICD-10 is the newer version), which is logically related to an ontology. When a patient visits a physician, the physician records a standard ICD-9 code for the patient’s diagnosis and a CPT code for the procedure performed. These standardized codes are found in manuals for medical coders, and they allow insurance companies and other medical affiliates to understand information from many different sources.

For example, if a patient is seen for a mole, the mole can have many particular qualities. Is it to be removed for cosmetic purposes, or is it potentially cancerous? The location of the mole is also important, because the treatment may depend on it. Differences in context may determine whether the insurance company will pay for the treatment of the mole. A cancerous melanoma on the nose would have the diagnosis code 172.3, and a benign neoplasm would be coded as 238.2. If a tissue sample were taken so that the lab could test the mole for cancerous cells, the diagnosis would be 239.9, which is unspecified until the lab results return for a firm diagnosis. The CPT code for the treatment would be determined by a number of factors. These include the location of the mole, the amount of tissue excised, the type of excision, and whether a modifier must be added to the code because the service is billed together with an office visit.

While CPT and ICD-9 provide a vocabulary for procedure and diagnosis codes, they fulfill only part of an ontology’s purpose. An ontology gives context to the patient’s medical history and allows the diagnosis and procedure to be linked automatically, possibly with appropriate medications, lab tests, and x-rays. The next section discusses ways that the Semantic Web has been applied in the medical sciences field.
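The difference between a bare code and an ontology that links it to context can be sketched as follows. The two ICD-9 codes and their readings are taken from the mole example in the text above. The `Encounter` structure, the `DIAGNOSIS_CONCEPTS` table and the linked follow-up items are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field

# Hypothetical ontology fragment. The two ICD-9 codes follow the mole example in the
# text; the linked follow-up items are illustrative, not a clinical or billing rule.
DIAGNOSIS_CONCEPTS: dict[str, dict] = {
    "172.3": {
        "label": "cancerous melanoma, e.g. on the nose",
        "status": "confirmed",
        "links": ["excision (CPT code depends on site, tissue excised, technique)", "follow-up visit"],
    },
    "239.9": {
        "label": "unspecified until lab results return",
        "status": "pending",
        "links": ["tissue sample", "lab test for cancerous cells"],
    },
}


@dataclass
class Encounter:
    patient_id: str
    icd9: str
    site: str  # location of the mole, which the text notes can change the treatment
    linked: list[str] = field(default_factory=list)
    needs_recoding: bool = False


def annotate(enc: Encounter) -> Encounter:
    """Attach the items the ontology links to this diagnosis code."""
    concept = DIAGNOSIS_CONCEPTS.get(enc.icd9)
    if concept is None:
        raise KeyError(f"no ontology entry for ICD-9 code {enc.icd9}")
    enc.linked = list(concept["links"])
    # A pending diagnosis must be re-coded once the lab result is known.
    enc.needs_recoding = concept["status"] == "pending"
    return enc
```

The code alone says only "239.9". The linked concept also tells a downstream system that a lab test is expected and that the diagnosis will change.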

Table 1. Sample standards for interoperability


Name | Purpose | Associated Organization
--- | --- | ---
XML | Extensible Markup Language; creation of tags | W3C
RDF | Standardized technology for metadata; for interpreting meaning | W3C
Clinical Document Architecture (CDA) | Leading standard for clinical and administrative data exchange among organizations | HL7
Guidelines Interchange Format (GLIF) | Specification for structured representation of guidelines | InterMed Collaboratory
CORBAmed | Provides interoperability among healthcare devices | Object Management Group
HL7 | Messaging between disparate systems | HL7

Table 2. Sample ontologies

Name | Purpose | Associated Organization
--- | --- | ---
OIL | Ontology Inference Layer; representation and inference language | European Community (IBROW and On-To-Knowledge)
Web Ontology Language (OWL) | Aims to be the Semantic Web standard for ontology representation | World Wide Web Consortium (W3C)
DAML | DARPA Agent Markup Language; extension of RDF that allows ontologies to be expressed | DAML Researcher Group
Arden Syntax | Standard for medical knowledge representation | HL7
RiboWeb Ontology | Facilitates models of ribosomal components and comparison of research results | Helix Group at Stanford Medical Informatics
Gene Ontology | Reveals information regarding the role of an organism’s gene products | GO Consortium
LinkBase | Represents medical terminology by algorithms in a formal domain ontology | L&C
GALEN | Uses the GRAIL language to represent clinical terminology | OpenGALEN
Archetype Definition Language (ADL) | Formal language for expressing business rules | openEHR
SNOMED* | Reference terminology | SNOMED Int’l
LOINC (Logical Observation Identifiers Names and Codes) | Database of universal names and codes for lab and clinical observations | Regenstrief Institute, Inc.
UMLS (Unified Medical Language System) | Facilitates retrieval and integration of information from multiple sources; can be used as a basic ontology for any medical … | US National Library of Medicine
ICD-10* | Classification of diagnosis codes; newer version of ICD-9 | National Center for Health Statistics
CPT Codes* | Classification of procedure codes | American Medical Association

 

Semantic Web applications in medical science

Table 3 lists a few sample projects being conducted in the medical science and healthcare fields. Previous research in this area has dealt with two main topics: (1) efficient and effective searching of medical science information and (2) the interoperability of EHRs. Our purpose is to review this research in order to understand the current status of the Semantic Web in healthcare and the medical sciences and to identify directions for future research.

Electronic Health Records

EHRs are comprehensive patient medical records that show continuity of care. They contain a patient’s complete medical history, with information on each visit to a variety of healthcare providers, as well as medical tests and results, prescriptions, and other care histories. (In contrast, electronic medical records [EMRs] typically reside with a single physician.) Figure 1 shows the main stakeholders in the healthcare industry and, thus, the need to enable these partners to communicate. Physicians, hospitals, Independent Practice Organizations (IPOs), and pharmacies interact to exchange patient information for medical purposes.

The government requires healthcare organizations to report medical data for statistical analysis so that the overall health of the nation can be assessed. Medical information is aggregated, with patient identifiers omitted, and reported to the government for public health purposes: to catch contagious outbreaks early, to determine current health issues, and to decide how they can be addressed. For example, cancer registries report specific aggregated cancer information, and healthcare organizations report instances of certain infectious diseases, such as avian influenza (bird flu), for the welfare of the public. Sharing this information improves patient safety, efficiency, self-health management (through access to medical information), and the effective delivery of healthcare (HL7, 2005). Figure 2 shows how two entities may interact to share information (adapted from HL7).
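The reporting step described above involves two operations: omitting patient identifiers and aggregating the remaining records. The following is a minimal sketch of both. The field names, the set of identifier fields and the sample records are all hypothetical. Dropping a few direct identifiers is a simplification. Real de-identification rules, such as those under HIPAA, cover many more data elements and may also suppress small counts.

```python
from collections import Counter

# Hypothetical direct-identifier fields; real de-identification rules are broader.
IDENTIFIERS = {"name", "patient_id", "address", "phone"}


def deidentify(record: dict) -> dict:
    """Drop direct identifiers, keeping only fields needed for reporting."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}


def aggregate(records: list[dict], reportable: set[str]) -> Counter:
    """Count reportable diagnoses per county after de-identification."""
    counts: Counter = Counter()
    for rec in map(deidentify, records):
        if rec["diagnosis"] in reportable:
            counts[(rec["diagnosis"], rec["county"])] += 1
    return counts


# Made-up sample records.
RECORDS = [
    {"name": "A. Smith", "patient_id": "1001", "county": "Wake", "diagnosis": "avian influenza"},
    {"name": "B. Jones", "patient_id": "1002", "county": "Wake", "diagnosis": "avian influenza"},
    {"name": "C. Brown", "patient_id": "1003", "county": "Durham", "diagnosis": "sprained ankle"},
]
```

Only the counts leave the organization. Non-reportable diagnoses are dropped entirely.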

Table 3. Sample medical Semantic Web projects

Name | Purpose | Associated Organization
--- | --- | ---
Good European Health Record Project | To produce a comprehensive multimedia data architecture for EHRs | CHIME
Brazilian National Health Card | Aimed at creating infrastructure for capture of encounter information at … |
Artemis | Semantic Web Service-based P2P Infrastructure for the Interoperability … | Six participating …
Active Semantic Electronic Patient Record | Development of populated ontologies in the healthcare (especially cardiology practice) domain; an annotation tool for annotating patient records; and decision support algorithms that support rule- and ontology-based checking/validation and evaluation | LSDIS (Large Scale Distributed Information Systems) and AHC (Athens Heart Center)
MedISeek | Allows users to describe, store, and retrieve medical images; metadata model |

Figure 1. The coordination of the healthcare industry is very diverse in its information needs


Figure 2. The sharing of information between healthcare entities can enable more efficient and effective quality of care

 


Indeed, a Commission on Systemic Interoperability was established through the Medicare Modernization Act of 2003. It recommends product certification, interoperable standards, and a standard vocabulary as ways of ensuring that healthcare data are readily accessible (Vijayan, 2005). At a meeting of the North Carolina Healthcare Information and Communications Alliance, one recurring theme was interoperable EHRs. Brailer (2005), the first National Health Information Technology Coordinator in the U.S., spoke about standards harmonization for EHRs. His discussion of developing standards for interoperability emphasized the need to “stitch together different efforts” put forth by organizations such as HL7, IEEE, ISO, and SNOMED. He also recognized that “standards are about economic power” and that standards need to be analyzed to determine which are available for the commercial market.

To that end, the Office of the National Coordinator for Health Information Technology has suggested a compliance certification for EHRs based on criteria such as security, interoperability, and clinical standards. This would essentially be a seal of approval: a healthcare organization purchasing a certified product would be “guaranteed” that it meets specific interoperability requirements. Brailer stated, “if it’s not certified, it’s not an EHR.” Given this, it has been suggested that a second generation of EHRs is being developed to communicate through structured datasets, middleware, and messaging between systems (Bernstein, Bruun-Rasmussen, Vingtoft, Andersen, & Nohr, 2005). Perhaps a third generation will provide full-scale Semantic Web capabilities in which interoperability is seamless.

Currently, patient information is kept in silos across the aforementioned organizations. The Semantic Web will enable access to these silos through interoperability standards and a consistent language. HL7 is the organization that develops the HL7 standards for healthcare. According to a white paper it published (HL7, 2004), EHR standards can bring improvements in five areas: (1) interoperability, (2) safety/security, (3) quality/reliability, (4) efficiency/effectiveness, and (5) communication. To improve these areas, the standards proposed by HL7 include not only standardized service interface models for interoperability but also standardized concept models and terminologies. HL7 is currently used to send messages that populate other, disparate systems with data. For example, a patient’s admission data are also sent to the billing system. The problem with current messaging systems such as HL7 is that they duplicate information across systems. Patient demographic information, for example, can be copied from one system to another, and maintaining such data can create still more messaging between systems (usually within an organization).
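HL7 version 2 messages are plain text. Segments such as `MSH` (message header) and `PID` (patient identification) are separated by carriage returns, and fields within a segment by `|`. The sketch below shows a receiving system, for example billing, pulling demographics out of an admission (`ADT^A01`) message and keeping its own copy. That copy is the duplication problem described above. All message values are made up. The parser deliberately ignores escape sequences, field repetitions and custom delimiters declared in `MSH-2`, so it is not a substitute for a real HL7 library.

```python
# A minimal ADT^A01 (admit) message; all values are made up.
SAMPLE_ADT = (
    "MSH|^~\\&|ADT1|HOSPITAL|BILLING|HOSPITAL|200501011200||ADT^A01|MSG00001|P|2.3\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19700101|F\r"
    "PV1|1|I\r"
)


def parse_segments(message: str) -> dict[str, list[list[str]]]:
    """Split a message into segments (carriage return) and fields (|), keyed by segment ID."""
    segments: dict[str, list[list[str]]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if raw:
            fields = raw.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


def get_field(segments: dict[str, list[list[str]]], seg_id: str, n: int, occurrence: int = 0) -> str:
    """Return field seg_id-n using HL7's 1-based numbering.

    In MSH the field separator itself counts as MSH-1, so MSH-n sits at list
    index n-1; in every other segment field n sits at index n."""
    fields = segments[seg_id][occurrence]
    index = n - 1 if seg_id == "MSH" else n
    return fields[index] if index < len(fields) else ""


def demographics(message: str) -> dict[str, str]:
    """Extract the demographic fields a receiving system would typically store."""
    seg = parse_segments(message)
    name = get_field(seg, "PID", 5).split("^")  # family^given
    return {
        "message_type": get_field(seg, "MSH", 9),
        "mrn": get_field(seg, "PID", 3).split("^")[0],  # first component of the ID
        "family": name[0],
        "given": name[1] if len(name) > 1 else "",
        "dob": get_field(seg, "PID", 7),
        "sex": get_field(seg, "PID", 8),
    }
```

Every system that receives this message ends up with its own copy of `DOE^JANE`. Any later correction must then be propagated with further messages.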

In Denmark, EHR use and interoperability have also been examined (Bernstein et al., 2005). The goal of the Danish Health IT Strategy project is to analyze the variety of grassroots models for EHR information modeling and informatics. The National Board of Health is currently analyzing the SNOMED ontology for use in its EHR. SNOMED is an ontology that encapsulates classification systems such as ICD-9. As a reference terminology, it conveys medical concepts in much more detail. This level of detail allows the data to be used for quality assurance and resource utilization. It also allows the EHR to relay more information than ICD-9 diagnosis coding: there are around 13,000 ICD-9 diagnosis codes, whereas SNOMED contains 365,000 codes (Cassidy, 2005).

Like the Danish project, the Artemis project focuses on developing Semantic Web technology, such as ontologies, as a foundation for the interoperability of medical records. Rather than standardizing the actual documents in the EHR, its goal is to standardize access to the records through wrappers, the Web Services Description Language (WSDL), and the Simple Object Access Protocol (SOAP) (Artemis, 2005). Bicer et al. (2005) discuss an Artemis project in which OWL ontologies are used to map messages from one entity to another.
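The idea of mapping messages through a shared ontology can be reduced to a small sketch. Each system maps its local field names to shared concepts, and a message is translated by going through those concepts. This is not the Artemis implementation, which uses OWL ontologies and dedicated mapping tools. The field names and the `Class.property` concept strings below are hypothetical stand-ins for ontology terms.

```python
# Hypothetical local field names for two systems, each mapped to a shared
# ontology concept (written here as simple "Class.property" strings).
SYSTEM_A = {"pat_surname": "Patient.familyName", "pat_given": "Patient.givenName", "dob": "Patient.birthDate"}
SYSTEM_B = {"LastName": "Patient.familyName", "FirstName": "Patient.givenName", "BirthDate": "Patient.birthDate"}


def translate(
    message: dict[str, str], source: dict[str, str], target: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Rewrite a message from the source schema into the target schema via shared concepts.

    Fields with no shared concept are returned separately instead of being guessed."""
    local_for_concept = {concept: local for local, concept in target.items()}
    translated: dict[str, str] = {}
    unmapped: list[str] = []
    for local_name, value in message.items():
        concept = source.get(local_name)
        if concept is not None and concept in local_for_concept:
            translated[local_for_concept[concept]] = value
        else:
            unmapped.append(local_name)
    return translated, unmapped
```

With this approach each system needs only one mapping, to the shared concepts, rather than a separate converter for every partner it exchanges data with.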

Partners Healthcare uses RDF to make medical histories from EHRs accessible to computer models that select patients for clinical trials (Salamone, 2005). It used the Semantic Web Rule Language (SWRL) to write decision support rules for this purpose. The advantage of the Semantic Web approach is that the code is concise and flexible and works well with large databases. As Eric Neumann of the pharmaceutical company Sanofi-Aventis suggests, “with the semantic web, you publish meaning, not just data” (Salamone, 2005).
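The following sketch shows what a SWRL-style trial-selection rule expresses. The rule is written in SWRL’s human-readable form in the docstring, and its effect is evaluated over a small set of triples. The predicates (`hasDiagnosis`, `hasAge`, `candidateFor`), the trial name and the age threshold are hypothetical and are not taken from the Partners Healthcare system. `swrlb:greaterThanOrEqual` is a standard SWRL built-in. In practice a rule engine attached to an OWL reasoner would evaluate such a rule rather than Python code.

```python
# Facts as (subject, predicate, object) triples; all names are hypothetical.
FACTS = {
    ("p1", "type", "Patient"), ("p1", "hasDiagnosis", "d1"), ("d1", "type", "Melanoma"), ("p1", "hasAge", 54),
    ("p2", "type", "Patient"), ("p2", "hasDiagnosis", "d2"), ("d2", "type", "Melanoma"), ("p2", "hasAge", 16),
    ("p3", "type", "Patient"), ("p3", "hasDiagnosis", "d3"), ("d3", "type", "Psoriasis"), ("p3", "hasAge", 40),
}


def objects(facts: set, subject: str, predicate: str) -> list:
    """All objects o such that (subject, predicate, o) is a fact."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def melanoma_trial_rule(facts: set, trial: str = "MelanomaTrial", min_age: int = 18) -> set:
    """Patient(?p) ^ hasDiagnosis(?p, ?d) ^ Melanoma(?d) ^ hasAge(?p, ?a)
    ^ swrlb:greaterThanOrEqual(?a, 18) -> candidateFor(?p, MelanomaTrial)"""
    inferred = set()
    patients = [s for s, p, o in facts if p == "type" and o == "Patient"]
    for pt in patients:
        has_melanoma = any((d, "type", "Melanoma") in facts for d in objects(facts, pt, "hasDiagnosis"))
        old_enough = any(age >= min_age for age in objects(facts, pt, "hasAge"))
        if has_melanoma and old_enough:
            inferred.add((pt, "candidateFor", trial))
    return inferred
```

The rule adds new triples instead of returning a report. The selection therefore becomes part of the data and can be queried like any other fact.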

Information searching and sharing

“Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches” (Berners-Lee et al., 2001). Pisanelli et al. (2004) discuss the difficulties and complexities of searching for medical information in their research on medical polysemy. Polysemy (a word having more than one meaning) can be critical to finding correct medical information, so ontologies can add value to information searching. For example, the meaning of the term inflammation can vary depending on the context of its use. As Pisanelli et al. state, inflammation can be characterized by its size, shape, evolution, severity, and source. A search for the term inflammation may return many results, but time is required to sort through the “hits” for relevance. Pisanelli et al. use the ON-9 ontology to map contexts for the term inflammation. As Nardon and Moura (2004) emphasize, the relationships among medical terms are also essential to representing the information in a logical format. Allowing specific context to be interpreted through ontologies will enable more efficient and effective searching. Usually, this involves creating metadata to identify the relevant data elements and their relationships (Buttler et al., 2002).
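The gap between keyword search and context-aware search for a polysemous term can be shown in a few lines. Each hypothetical document below is annotated with values along three of the dimensions Pisanelli et al. describe for inflammation (evolution, severity, source). A context search filters on those annotations instead of only matching the word. The documents and annotation values are invented. In a real system the annotations would come from an ontology such as ON-9 rather than being hand-coded.

```python
# Hypothetical documents annotated along some of the dimensions that
# Pisanelli et al. describe for "inflammation" (evolution, severity, source).
DOCS = [
    {"id": "doc1", "text": "acute inflammation of the appendix",
     "evolution": "acute", "severity": "severe", "source": "infectious"},
    {"id": "doc2", "text": "chronic joint inflammation in arthritis",
     "evolution": "chronic", "severity": "moderate", "source": "autoimmune"},
    {"id": "doc3", "text": "acute inflammation after a joint injury",
     "evolution": "acute", "severity": "mild", "source": "traumatic"},
]


def keyword_search(term: str, docs: list[dict]) -> list[str]:
    """Plain string matching: every sense of the term counts as a hit."""
    return [d["id"] for d in docs if term in d["text"]]


def context_search(term: str, docs: list[dict], **context: str) -> list[str]:
    """Keep only hits whose annotations match every requested context value."""
    return [
        d["id"]
        for d in docs
        if term in d["text"] and all(d.get(k) == v for k, v in context.items())
    ]
```

For example, `context_search("inflammation", DOCS, evolution="acute", source="traumatic")` narrows three keyword hits down to the one document about injury-related inflammation.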

Medical vocabularies used to represent data include the Unified Medical Language System (UMLS) from the U.S. National Library of Medicine and the Arden Syntax. UMLS is perhaps the most frequently used ontology in the healthcare and medical sciences fields. Its purpose is to help integrate information from multiple biomedical information sources and to enable efficient and effective retrieval. It defines relationships between vocabularies and includes a categorization of concepts as well as the relationships among them. The National Health Card System in Brazil, for example, contains an extensive knowledge base of 8 million patients against which complex queries can be run (Nardon & Moura, 2004). Through ontologies and UMLS, business rules can be mapped to medical transactions to infer information and achieve semantic interoperability. For example, suppose a patient can undergo a certain procedure only once within a 30-day period. A transaction in which the patient schedules an appointment for that procedure can then be mapped to business rules to infer that the same person cannot schedule the same procedure again within that period. UMLS would determine the ontology for the appointment and procedures and ensure that the patient is indeed the same, and RDF defines the business rules for sharing the information (Nardon & Moura, 2004).
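Once the ontology layer has resolved that two records refer to the same patient and the same procedure, the 30-day business rule itself is a simple check. The sketch below assumes those identities are already resolved, with patient IDs and procedure codes normalized. The history format and the code `PROC-X` are hypothetical. Two bookings exactly 30 days apart are treated as allowed, which is an assumption about how the rule’s boundary is read.

```python
from datetime import date, timedelta

MIN_INTERVAL = timedelta(days=30)


def can_schedule(
    history: list[tuple[str, str, date]],
    patient_id: str,
    procedure: str,
    requested: date,
    interval: timedelta = MIN_INTERVAL,
) -> bool:
    """Return False if the same patient has the same procedure within `interval` of `requested`.

    `history` holds (patient_id, procedure_code, date) entries whose identifiers are
    assumed to have been normalized already (the ontology/UMLS step in the text)."""
    for pid, proc, when in history:
        if pid == patient_id and proc == procedure and abs(requested - when) < interval:
            return False
    return True


# Hypothetical booking history.
HISTORY = [("p1", "PROC-X", date(2005, 3, 1))]
```

A request for patient `p1` on 20 March 2005 is rejected because it falls 19 days after the earlier booking. A request on 5 April 2005, 35 days later, is accepted.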

When researchers query multiple medical data sources, they face many medical science repositories in which data may not be in a machine-processable format and may be stored in nonstandard ways. Most of the interfaces for searching and retrieving medical sciences research require human interaction. Extracting data from such large sources can be very complex, and the data are often reused by researchers, such as those in genomics (Buttler et al., 2002). Large databases containing bioinformatics research can be unified through ontologies such as RiboWeb, the Generic Human Disease Ontology, the Gene Ontology (GO), TAMBIS, and LinkBase. These allow a standard vocabulary to exist over disparate ribosomal, disease, gene product, nucleic acid, and protein resources. For example, the Generic Human Disease Ontology is currently being developed with information from the Mayo Clinic. It allows a physician to search by symptom to determine a disease or an appropriate type of treatment, and it allows researchers to search for possible causes of a disorder (Hadzic & Chang, 2005).
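Searching by symptom amounts to walking the ontology’s disease-to-symptom relations in reverse. The sketch below ranks diseases by how many of the entered symptoms they share. The three-disease table is a hypothetical toy, not content from the Generic Human Disease Ontology. It ignores symptom weighting and everything else a clinical tool would need.

```python
# Hypothetical toy fragment: each disease maps to a set of associated symptoms.
DISEASE_SYMPTOMS: dict[str, set[str]] = {
    "influenza": {"fever", "cough", "myalgia"},
    "common cold": {"cough", "sneezing", "sore throat"},
    "migraine": {"headache", "nausea", "photophobia"},
}


def diseases_for(symptoms: set[str]) -> list[str]:
    """Diseases sharing at least one symptom, most matches first, ties broken alphabetically."""
    scores = {disease: len(known & symptoms) for disease, known in DISEASE_SYMPTOMS.items()}
    return sorted((d for d, n in scores.items() if n), key=lambda d: (-scores[d], d))
```

Entering fever and cough ranks influenza first (two matches) and the common cold second (one match). Researchers looking for causes would traverse the same relations in the opposite direction, from a disorder to its recorded factors.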

MedISeek is an interesting example of using semantic vocabularies to search for medical visual information, such as x-rays and other images (Carro et al., 2003). The Biomedical Informatics Research Network (BIRN), a project of the National Institutes of Health, examines human neurological disorders and their association with animal models. A significant aspect of its work involves brain imaging. Its goal is to make this information available to others through the Semantic Web via graphical search tools, standard identifiers through ontologies, and cross-referencing of images (Halle & Kikinis, 2004). The Semantic Web will enable BIRN, MedISeek, and other healthcare and medical science projects to filter out less appropriate data by searching within a context. RDF is being used with MedISeek and BIRN to allow interoperability between metadata patterns.

Conclusion and future trends

Sharing of EHR information allows for improved quality of care for patients. Sharing medical science knowledge allows scientists to gather information and avoid redundant experiments. Searching for medical science information on the Semantic Web will be made more efficient and effective by the use of common ontologies and standards for transmissions. “Trusted databases exist, but their schemas are often poorly or not documented for outsiders, and explicit agreement about their contents is therefore rare.” The opportunity to share such large amounts of information through the Semantic Web suggests that knowledge management can exist on a comprehensive level with ontology as a unifying resource (Hadzic & Chang, 2005).

While there has been some research on searching medical sciences information on the Semantic Web, there have been few studies on how to better enable healthcare consumers to search for medical information on the Web. Consumers’ lay terminology often increases the number of results returned when they search for medical information on the Web. Polysemy then creates a multitude of results through which the consumer must further search. The goal should be to use Semantic Web technology to minimize the semantic distance between a search term and its polysemous translations (Lorence & Spinks, 2004).

The future of the Semantic Web will involve important developments in e-healthcare through the use of intelligent agents. Singh et al. (2005) suggest that emerging Semantic Web-based technologies offer a means to allow a seamless and transparent flow of semantically enriched information. These technologies include ontologies, knowledge representation, and intelligent agents. Intelligent agents can enrich information by interpreting it on behalf of the user to perform an automated function. The introduction described someone who queries for melanoma information and receives information about treatments, tests, and local providers who accept that person’s insurance. That example shows how intelligent agents can be used to search the Semantic Web. Agents can also be used to verify the source of information: when information is shared across the Web and pulled automatically by agents, its source needs to be verified. This is especially true in healthcare under the regulations of the Health Insurance Portability and Accountability Act of 1996 (HIPAA). If a foundation of ontologies and interoperable standards exists, intelligent agents will be able to search the Web for information within the desired context.

The legal issues associated with dispersing healthcare information need to be identified. Under HIPAA (1996), healthcare organizations are required to keep patients’ personally identifiable information secure and private. This means that encryption, access control, audit trails, and data integrity must be ensured in the transmission process (Jagannathan, 2001). Who has rights to the data, and who “owns the data,” particularly in EHRs? Similarly, sharing medical science and healthcare data raises an issue of trust, and this area is ripe for further research. How can authentication be provided so that others know the source of the data is trusted? How can it be ensured that the data are edited only by trusted entities? The area of e-commerce can also serve as a foundation for future research on trust.
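The integrity and source-authentication questions above can be illustrated with one common building block, a keyed message authentication code (HMAC-SHA256) computed over a canonical serialization of a record. This is a sketch of one possible mechanism, not a HIPAA-compliant design. An HMAC only proves origin between parties that share the secret key. Verification by third parties would need public-key digital signatures, and confidentiality would additionally require encryption. The record fields and key are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize deterministically so the same record always yields the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Check the tag in constant time; any edit to the record invalidates it."""
    return hmac.compare_digest(sign_record(record, key), tag)


SHARED_KEY = b"example-shared-secret"  # hypothetical; real keys need proper key management
```

A lab that tags `{"patient": "123", "result": "negative"}` before sending it lets the receiving clinic detect whether the result was altered in transit, provided both hold `SHARED_KEY`.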

Semantic Web technology can serve as a foundation for sharing and searching information in the healthcare and medical sciences fields. Because of the intuitive nature of patient care, the Semantic Web will enable context and meaning to be applied to medical information and relationships between data to be conveyed. Standards for transmitting data between disparate systems are now being generated. As they are, improving the quality of healthcare through better research and through information sharing between healthcare providers will be a critical step in the evolution of patient care. This will enable the third generation of EHRs to be seamlessly interoperable, for more efficient and effective patient care. These innovations can lead to improved work satisfaction, patient satisfaction, and patient care (Eysenbach, 2003).
