Integrative Document and Content Management Solutions

INTRODUCTION

Developments in office automation, which provided multiple end-user authoring applications at the computer desktop, heralded a rapid growth in the production of digital documents and introduced the requirement to manage the capture and organization of digital documents, including images. The process of capturing digital documents in managed repositories included metadata to support access and retrieval subsequent to document production (D’Alleyrand, 1989; Ricks, Swafford & Gow, 1992).
The imperatives of documentary support for workflow in enterprises, along with widespread adoption of Web-oriented software on intranets and the Internet World Wide Web (WWW), have given rise to systems that manage the creation, access, routing, and storage of documents in a more seamless manner for Web presentation. These content management systems are progressively employing document management features such as metadata creation, version control, and renditions (Megill & Schantz, 1999; Wiggins, 2000), along with features for management of content production such as authoring and authorization for internal distribution and publishing (Addey et al., 2002; Boiko, 2002; Hackos, 2002; Nakano, 2002).
If business applications are designed taking into account document and Web content management as integral constructs of enterprise information architecture, then the context of these solutions may be an integrative document and content management (IDCM) model (Asprey & Middleton, 2003). As the name implies, the IDCM model aspires to combine the features of a document management system with the functionality of Web content management. An integrative business and technology framework manages designated documents and their content throughout the continuum of their existence and supports record-keeping requirements.
The IDCM model supports system capabilities for managing digital and physical documents, e-mail, engineering and technical drawings, document images, multimedia, and Web content. These systems may be deployed individually to address a specific requirement. However, due to the volume and varied formats of important documents held in digital format, these systems are often deployed collectively based on a strategic IDCM approach for better managing information assets. An organizational approach to IDCM supports enterprise knowledge strategies by providing the capability to capture, search, and retrieve documented information.


SCOPE

IDCM depends upon effective integration of organizational systems that together are used for managing both digital and physical document types. The scope of this management is across all stages of document lifecycles. It includes provision for distribution of the document content over intranets and the Internet.
Features of enabling IDCM technologies are described in the following section. The technologies may be differentiated into those with core capabilities and supporting technologies.
Core capabilities are: document management; e-mail management; drawing management; document imaging; Web content management; enterprise report management; and workflow. Supporting technologies include: Web services; database management systems; digital signatures; portals; universal interfaces; and network management.
Significant issues that need to be addressed with respect to IDCM solutions include the provision of seamless functionality that may be employed across different capabilities so that currency, integrity, and authority are managed effectively. These in turn must be complemented by user interfaces that provide stylistic consistency and that are augmented by metadata that enhances retrieval capabilities through the supporting technologies.
The following section itemizes the types of features that are required.

SYSTEM FEATURES—CORE TECHNOLOGIES

Document Management

An encompassing approach to document management places documents within a framework that supports integrity, security, authority, and audit, and manages them so that effective descriptions of them support access, presentation, and disposal (Bielawski & Boyle, 1997; Wilkinson et al., 1998). In this context document management applications implement management controls over digital and physical documents. The general capabilities of a document management application are:
• Document production and capture—interface with common office productivity software.
• Classification—support business classification schemes (e.g., folder structures, document properties).
• Metadata—capture of properties that describe document.
• Check-in/checkout—maintain document integrity during editing.
• Version control—increment versions of document to support integrity.
• Complex relationships—manage links and embedded content within digital documents.
• Security—implement user/group access permission rights over documents.
• Document lifecycles—manage the transition of document states through pre-defined lifecycles.
• Integrated workflow to automate review and approvals; controlled distribution of documents.
• Search and information retrieval—search metadata or text within documents, or both.
• Viewing—view documents in native application or using integrated viewer.
These should be associated with recordkeeping features such as disposal scheduling and archiving.
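The check-in/checkout and version control capabilities listed above can be illustrated with a minimal sketch. The class and method names (ManagedDocument, Repository, capture, check_out, check_in) are assumptions made for illustration only and do not correspond to any particular document management product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManagedDocument:
    """One document held in a hypothetical managed repository."""
    title: str
    metadata: dict = field(default_factory=dict)
    versions: list = field(default_factory=list)  # content of every saved version
    checked_out_by: Optional[str] = None          # user currently holding the edit lock

    @property
    def current_version(self) -> int:
        return len(self.versions)


class Repository:
    """Illustrative in-memory repository; not any vendor's API."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def capture(self, doc_id: str, title: str, content: str, **metadata) -> ManagedDocument:
        # Registering a document stores version 1 together with its metadata.
        doc = ManagedDocument(title=title, metadata=dict(metadata), versions=[content])
        self._docs[doc_id] = doc
        return doc

    def check_out(self, doc_id: str, user: str) -> str:
        # Only one user may hold a document for editing at a time.
        doc = self._docs[doc_id]
        if doc.checked_out_by is not None:
            raise PermissionError(f"{doc_id} is checked out by {doc.checked_out_by}")
        doc.checked_out_by = user
        return doc.versions[-1]

    def check_in(self, doc_id: str, user: str, new_content: str) -> int:
        # Check-in appends a new version rather than overwriting the old one.
        doc = self._docs[doc_id]
        if doc.checked_out_by != user:
            raise PermissionError(f"{user} does not hold the checkout for {doc_id}")
        doc.versions.append(new_content)
        doc.checked_out_by = None
        return doc.current_version
```

In this sketch a second check_out on the same document raises an error until the first user checks the document back in, which is one simple way of protecting integrity during editing; earlier versions remain available in the versions list.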

E-MAIL MANAGEMENT

The growth in e-mail has brought a high demand for solutions that allow enterprises to manage e-mails that have value to the business. The IDCM model offers two types of capabilities:
• Direct capture—These applications are often referred to as e-mail archiving applications.
• End-user capture—These capabilities are typically offered as a module within document management systems.
Direct capture or archiving facilities intercept incoming and outgoing e-mail. They operate by taking a copy of incoming and outgoing messages that are managed by the e-mail messaging system, and use customized business rules to remove e-mail that has no business context. Unwanted e-mail such as spam, or that received from news lists or information bulletins, can be eliminated.
These systems may feature auto-categorization based on metadata such as that contained in e-mail message headers, and possibly also within attachments. Categorization can also occur using the content and context of e-mail by applying techniques such as learning by example from previously processed e-mail. These types of solutions might be valuable for capturing statistical information differentiated by the types of requests made by customers. For example, statistics can help call centers to monitor turnaround times for responding to e-mail requests, or to undertake trend analysis.
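A minimal sketch of rule-based direct capture and header-based categorization follows, using Python's standard email package. The rules and the keyword table are hypothetical examples of "customized business rules"; a real archiving product would apply far richer rules, possibly including the learning-by-example techniques mentioned above.

```python
from email import message_from_string

# Hypothetical category keywords matched against the Subject header.
CATEGORY_KEYWORDS = {
    "complaint": ["complaint", "refund"],
    "order": ["order", "invoice"],
}


def is_business_email(msg) -> bool:
    """Apply simple exclusion rules (assumed for illustration)."""
    # Mailing-list traffic usually carries a List-Id header.
    if msg["List-Id"] is not None:
        return False
    # Bulk senders commonly set a Precedence header.
    if (msg["Precedence"] or "").lower() in {"bulk", "list", "junk"}:
        return False
    return True


def categorize(msg) -> str:
    """Pick the first category whose keyword appears in the subject."""
    subject = (msg["Subject"] or "").lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in subject for keyword in keywords):
            return category
    return "uncategorized"


def capture(raw_messages):
    """Return (category, subject) pairs for messages that pass the rules."""
    captured = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if is_business_email(msg):
            captured.append((categorize(msg), msg["Subject"]))
    return captured
```

Counting the captured pairs by category would give the kind of request statistics described above.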
Search options include the capabilities to search messages and text attachments. Depending on the capabilities of the system, searches might be invoked from an e-mail client, desktop client application, or Web browser.
Some systems are able to apply rules defined in disposal authorities so that e-mails are purged from the system within a legal framework. In some cases, different retention schedules can be applied to specific categories of e-mail.
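As a hedged illustration of category-specific retention, the sketch below computes which captured e-mails are due for disposal. The categories, retention periods, and record layout are assumptions; actual periods come from the disposal authorities that apply to the organization.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per e-mail category.
RETENTION_DAYS = {"order": 7 * 365, "complaint": 5 * 365}
DEFAULT_RETENTION_DAYS = 365


def disposal_date(category: str, captured_on: date) -> date:
    """Date on which a record of this category may be purged."""
    days = RETENTION_DAYS.get(category, DEFAULT_RETENTION_DAYS)
    return captured_on + timedelta(days=days)


def due_for_disposal(records, today: date):
    """Return the ids of records whose retention period has elapsed."""
    return [
        record["id"]
        for record in records
        if disposal_date(record["category"], record["captured_on"]) <= today
    ]
```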
Some enterprises adopt end-user capture facilities to save sent and received e-mails that evidence business transactions into an e-mail management repository, such as a document management system. It is then up to the user to identify the e-mails that need to be saved according to organizational guidelines.
The document management system then needs to integrate effectively with the existing enterprise e-mail client software. This capability enables end-users to save e-mails and/or attachments to the managed repository, automatically derive metadata from the header of e-mail messages, add custom metadata, and store the e-mail and any attachments (where appropriate) as a digital record.
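Deriving metadata from message headers can be sketched with Python's standard email package. The function name derive_metadata and the metadata field names are assumptions for illustration; how a given document management system maps headers to its own properties will differ.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def derive_metadata(raw_message: str, custom=None) -> dict:
    """Build a metadata record from e-mail headers plus user-supplied fields."""
    msg = message_from_string(raw_message)
    metadata = {
        # getaddresses returns (display name, address) pairs; keep the address.
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        # File names of any attachments found while walking the MIME parts.
        "attachments": [
            part.get_filename() for part in msg.walk() if part.get_filename()
        ],
    }
    # Custom metadata entered by the end-user is added to the derived fields.
    metadata.update(custom or {})
    return metadata
```

A call such as derive_metadata(raw, {"project": "P-42"}) would combine header-derived fields with a user-entered property; the "project" field is purely hypothetical.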

DRAWING MANAGEMENT

Many systems for registering or managing drawings have been developed independently of more generic approaches to document management. They may include information systems that enable users to register or index physical drawings in a database, along with generation of transmittals for issue of new documents, and management of the distribution of revisions to drawings and technical documents.
A drawing management system may be differentiated from a registry system in that the software implements automated management controls over the digital drawing objects maintained within a vault-like repository. This capability evolved to support the capture and management of drawings created by Computer Aided Design (CAD) packages. Functionality should include base capabilities such as:
• Integration with CAD tools for capturing electronic drawings.
• Automated features for drawing revision control and revision numbering.
• Management of electronic and hardcopy drawings, technical specifications, and manuals.
• Management of parent-child relationships between multiple drawings.
• Registration and tracking of physical copies of controlled drawings.
• Management of incoming and outgoing transmittals.
• Electronic document review and authorization using integrated workflow.
• Provision of viewing, red line, markup, and annotation functions.
• Maintenance of history logs and audit trails.
Extended capabilities may include automation of drawing numbering, synchronization of digital and physical drawing objects, synchronization of title block and metadata registration and updates, and management of drawing status during engineering change lifecycle transitions.
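Automated revision numbering and parent-child relationships between drawings can be sketched as follows. The lettered revision scheme (A, B, C, ...) and the names Drawing and DrawingRegister are assumptions chosen for illustration; organizations use many different numbering conventions.

```python
from dataclasses import dataclass, field


@dataclass
class Drawing:
    number: str
    title: str
    revisions: list = field(default_factory=list)  # (revision, description) pairs
    children: list = field(default_factory=list)   # numbers of child drawings

    @property
    def current_revision(self):
        return self.revisions[-1][0] if self.revisions else None


class DrawingRegister:
    """Illustrative register of drawings held in a repository."""

    def __init__(self):
        self._drawings = {}

    def register(self, number, title, parent=None):
        drawing = Drawing(number, title)
        self._drawings[number] = drawing
        if parent is not None:
            # Record the parent-child relationship on the parent drawing.
            self._drawings[parent].children.append(number)
        return drawing

    def revise(self, number, description):
        drawing = self._drawings[number]
        if len(drawing.revisions) >= 26:
            raise ValueError("this sketch only supports revisions A to Z")
        # The next revision letter is assigned automatically.
        revision = chr(ord("A") + len(drawing.revisions))
        drawing.revisions.append((revision, description))
        return revision

    def descendants(self, number):
        """List all drawings below the given drawing, depth first."""
        result = []
        for child in self._drawings[number].children:
            result.append(child)
            result.extend(self.descendants(child))
        return result
```

The revisions list doubles as a simple history log, since each entry keeps the revision letter alongside its description.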
Drawing management capabilities may be provided by a dedicated drawing management system; as unified functionality within a document management system; as an inbuilt module of an Enterprise Resource Planning (ERP) system, maintenance management system, or similar; or as an integral component of a document management application.

DOCUMENT IMAGING

Document imaging systems have evolved from the principles of film-based imaging and now fall into two groups: (a) film-based imaging (micrographics) and (b) digital imaging systems.
In film-based imaging, micrographics technology is used to capture images of physical documents on microfilm, so that the images may subsequently be viewed using a reader, and printed if required. In digital imaging, images of physical documents are captured in a digital file format, with subsequent viewing or printing from the image format.
Digital imaging systems may be differentiated as desktop (ad hoc scanning), workgroup (shared tools in network), or production (high volume, diverse type). IDCM normally implies a workgroup or production environment. Capabilities offered include image manipulation functions such as:
• Capture of hardcopy documents into digital format.
• Capability for scanning and conversion of different sizes, sides (duplex scanning), physical orientation, and physical structure of documents.
• Managing multi-page images as a single entity (e.g., multi-page TIF file).
• Saving of images to specified file formats; for example, a document might be saved in PDF or JPG for publishing on a Web server, or as a multi-page TIF for viewing/transmission.
• Support for a range of resolution, contrast, threshold, and size settings to meet the diverse requirements of document capture.
• Capture of color and/or grayscale images to suit forms processing and other applications (e.g., colored contour maps).
• Despeckling/deskewing and border removal.
• Multi-level registration capabilities, including batch-, folder-, envelope-, and document-level indexing.
Imaging systems are often integrated with recognition systems to facilitate capture and retrieval. These include technologies for automatically capturing data encoded in barcodes, integration with optical character recognition (OCR) and intelligent character recognition (ICR) technologies to enable text information to be extracted from scanned images, and integration with optical mark recognition (OMR).

WEB CONTENT MANAGEMENT

IDCM has the capability to provide a managed environment for the processes associated with publishing Web content. It has been said that a content management system is a concept rather than a product (Browning & Lowndes, 2001). This adds weight to analysis of it within the context of an IDCM model, where documents and their content may be considered more broadly than in terms of Web presentation. Document creation, management, and utilization can thus be undertaken with reference to business requirements and workflow of business processes.
Typically, functionality is characterized in terms of content creation, presentation, and management (Arnold, 2003; Robertson, 2003). Increasingly this functionality is seen to be employed within a unified content strategy for an enterprise (Rockley, Kostur & Manning, 2003).
Content creation functionality includes separation of presentation and content, utilization of elements of documents such as illustrations in different contexts, and continuation of associations between pages after restructuring. Metadata support should also be available, and markup should be transparent to the content creator.
Presentation elements include multiple formats for distribution of internal material such as manuals and business forms over intranets, and for external material such as marketing information and application forms. Other features expected are template availability through style sheets, integration of multiple formats as compound documents, provision of alternative renditions, and personalization of display according to user profiles.
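The separation of content from presentation, with templates producing alternative renditions, can be illustrated with a small sketch based on Python's standard string.Template class. The template text, the rendition names, and the content fields are assumptions for illustration only.

```python
from html import escape
from string import Template

# Hypothetical templates: one per rendition of the same content.
TEMPLATES = {
    "html": Template("<article><h1>$title</h1><p>$body</p></article>"),
    "text": Template("$title\n\n$body\n"),
}


def render(content: dict, rendition: str) -> str:
    """Render presentation-free content with the chosen template."""
    values = content
    if rendition == "html":
        # Escape characters that are special in HTML before substitution.
        values = {key: escape(value) for key, value in content.items()}
    return TEMPLATES[rendition].substitute(values)
```

Because the content dictionary holds no markup, the same item can be published to an intranet page and to a plain-text distribution without the author touching either template.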
Management features include version control and integrity maintenance among multiple users, and associated security procedures and audit trails. Managed interfaces to other subsystems should provide for dynamic provision of content to pages so that current data can be presented in validated form within compound documents. There should also be utilization of workflow for accommodating distributed users, content review, and approval processes.
These capabilities are shared at least in part with other IDCM systems’ functionality. The IDCM environment has the capability to manage Web content within a continuum that includes initial document creation processes, potentially in a distributed environment, through to managed archiving of content.

ENTERPRISE REPORT MANAGEMENT (COLD)

As digital media have been developed, businesses with high volumes of management information reporting have made increasing use of Enterprise Report Management (ERM). These capabilities enable organizations to capture reports from business application databases and store them in a managed repository, to reduce printing, improve information accessibility, and maintain records.
Technologies that support ERM produce output reports in a range of formats. Examples include text-based digital formats (e.g., XML) that are stored and searched via a document management application, and image formats such as TIF or JPG that can be captured and accessed via a document management or imaging application. Reports may be captured on optical disk, using a capability known as Computer Output to Laser Disk (COLD), which stores digital reports and enables data to be represented with graphical overlays to facilitate interactive communications.
General ERM capabilities are defined as follows:
• Capture of digital report objects to managed repository (document management, imaging, or COLD application).
• Utilization of indexing capabilities for capturing metadata relevant to the report.
• Support for inquiry and retrieval of metadata or report contents (where applicable).
• Management of database growth to support performance.
• Support for repository that can include different data objects.
• Support for document integrity.
• Control of processes through workflow.
• Provision of extraction and use of parts of reports.
• Support for high-volume printing.
• Management of security—user authentication, group and user levels.
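Capture, indexing, retrieval, and partial extraction of reports can be sketched with Python's standard sqlite3 module standing in for a managed repository. The table layout and the function names are assumptions for illustration; a production ERM or COLD application would use its own storage and indexing model.

```python
import sqlite3


def open_repository():
    """Create an in-memory stand-in for a managed report repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports ("
        "id INTEGER PRIMARY KEY, name TEXT, period TEXT, body TEXT)"
    )
    return conn


def capture_report(conn, name, period, body):
    """Store a report together with its index metadata; return its id."""
    cur = conn.execute(
        "INSERT INTO reports (name, period, body) VALUES (?, ?, ?)",
        (name, period, body),
    )
    conn.commit()
    return cur.lastrowid


def find_reports(conn, period=None, text=None):
    """Search by metadata (period), by report contents (text), or both."""
    sql = "SELECT id, name FROM reports WHERE 1=1"
    params = []
    if period is not None:
        sql += " AND period = ?"
        params.append(period)
    if text is not None:
        sql += " AND body LIKE ?"
        params.append(f"%{text}%")
    return conn.execute(sql + " ORDER BY id", params).fetchall()


def extract_lines(conn, report_id, first, last):
    """Return lines first..last (1-based, inclusive) of a stored report."""
    row = conn.execute(
        "SELECT body FROM reports WHERE id = ?", (report_id,)
    ).fetchone()
    if row is None:
        raise KeyError(report_id)
    return row[0].splitlines()[first - 1:last]
```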

WORKFLOW

Workflow management systems are designed to automate and implement controls over a diverse range of business processes, from the initiation of a process through to execution of all tasks, and process closure. The need for transparent interfaces between the workflow management system and IDCM is vital to maintain the integrity of documents or Web content files during their transition through a workflow process.
There are a number of technology options for enterprises that are seeking a workflow management capability. The most suitable workflow engine will depend on the nature and complexity of the requirement and the functionality supported by the workflow technology options. Options for workflow within the context of IDCM are:
• Messaging/collaboration systems/workflow: IDCM should support ad hoc and cooperative review and production of documents and reports, and it may be desirable to support integration with electronic forms for recordkeeping purposes.
• Embedded workflow: This capability is offered in systems such as document and Web content applications, or in application suites such as ERP systems. The host application provides inbuilt workflow for facilitating document-centric or process-centric modules.
• Autonomous workflow: These types are functional without any additional application software, with the exception of database and message queuing. When used in the context of IDCM, the functionality may support automation of document or Web content review and approval processes.
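A document review and approval process of the kind described above can be modeled as a small state machine. The states, actions, and class name below are assumptions made for illustration and do not describe any specific workflow engine.

```python
# Hypothetical review and approval lifecycle: (state, action) -> next state.
TRANSITIONS = {
    ("draft", "submit"): "in_review",
    ("in_review", "approve"): "approved",
    ("in_review", "reject"): "draft",
    ("approved", "publish"): "published",
}


class WorkflowItem:
    """Tracks one document or Web content file through the workflow."""

    def __init__(self, doc_id: str) -> None:
        self.doc_id = doc_id
        self.state = "draft"
        self.history = []  # audit trail of (from_state, action, to_state, user)

    def apply(self, action: str, user: str) -> str:
        key = (self.state, action)
        if key not in TRANSITIONS:
            # Rejecting undefined transitions keeps the item in a known state.
            raise ValueError(f"cannot {action} while {self.state}")
        new_state = TRANSITIONS[key]
        self.history.append((self.state, action, new_state, user))
        self.state = new_state
        return new_state
```

Keeping the allowed transitions in a single table makes the controls explicit, and the history list gives a simple audit trail of who moved the item between states.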

IDCM ARCHITECTURE

The IDCM system model should feature scalable, flexible, and extensible applications that integrate with the enterprise information architecture, enabling the system to grow with the organization and facilitate knowledge sharing. Architectural requirements include:
• Scalability, flexibility, extensibility.
• Intuitive interface, preferably Web based, to facilitate usability, software upgrades, and support.
• Integrate with heterogeneous operating environments (where required).
• Implement three-tier (or “n-tier”) client server architecture to facilitate Web client functionality and usability.
• Support distributed computing environment (databases, document/Web content repositories, replication services).
• Support mobile workers—access from remote sites, limited bandwidth.
• Integration with enterprise backup/recovery and business continuity regimes.

REASONS FOR UTILIZING IDCM SOLUTIONS

The business justification for implementing IDCM solutions ranges widely with respect to policy, compliance, and economics. For example, policy initiatives may include support for customer service initiatives, knowledge management (Laugero & Globe, 2002), or risk reduction in relation to brand damage. Compliance with legislative requirements may include administrative requirements that support privacy and freedom of information legislation. Economic justification may include support for timely delivery of product to market, reduction in operational costs, continuous process improvement initiatives, and profit maximization strategies.

CRITICAL ISSUES OF DOCUMENT AND CONTENT TECHNOLOGIES

There are critical issues that must be managed, and it is imperative that organizations undertake risk analysis in order to identify risks and develop strategies to mitigate them. Table 1 summarizes some of the critical issues.

Table 1. A summary of critical issues of IDCM technologies

Business Issues
• Executive management lacks resolve: Lack of executive management engagement, both at the start and during the project, may impair outcomes.
• Inadequate planning: Poor definition of scope and inadequate product and project lifecycle management may impair outcomes.
• Inadequate specifications: Lack of analysis and determination of requirements leads to project complications and inhibits extensibility of applications across the enterprise.
• Mandated use may cause rejection: Document management is often mandated, without appropriate consultation and analysis.
• Lack of process integration: Mandated use is often accompanied by failure to integrate capabilities with existing processes, often meaning duplication of effort.

Technology Issues
• Inadequate infrastructure: Client, server, and network architecture needs to be adequate to optimize performance.
• Incorrect technology application: Inadequate business definition and failure to examine solution options result in implementation of an inappropriate technology solution.
• Integrity of metadata: Metadata can be abused in non-validated fields, which may create significant retrievability issues.
• Security: IDCM solutions contain vital documents and content files, and security should reflect their importance.
• System incompatibilities: Lack of proven integration capabilities may impact delivery of end-to-end business solutions enabled by document/content management.

CONCLUSION

The IDCM model supports a business value proposition that aligns enabling systems and technology with an enterprise’s strategic, tactical, and operational planning imperatives. The IDCM system architecture provides a range of enabling applications and technologies that support end-to-end business process improvement initiatives and provide a key foundation for knowledge management strategies.

KEY TERMS

Content Management: Implementation of a managed repository for digital assets such as documents, fragments of documents, images, and multimedia that are published to intranet and Internet WWW sites.
Document Capture: Registration of an object into a document, image, or content repository.
Document Imaging: Scanning and conversion of hardcopy documents to either analogue (film) or digital image format.
Document Management: Implements repository management controls over digital documents via integration with standard desktop authoring tools (word processing, spreadsheets, and other tools) and document library functionality. Registers and tracks physical documents.
Drawing Management System: Implements repository management controls over digital drawings by integration with CAD authoring tools and using document library functionality. Registers and tracks physical drawings.
E-Mail Management: Implements management controls over e-mail and attachments. These controls may be implemented by direct capture (e-mail archiving software) or invoked by the end-user in a document management application.
Recognition Technologies: Technologies such as barcode recognition, optical character recognition (OCR), intelligent character recognition (ICR), and optical mark recognition (OMR) that facilitate document registration and retrieval.
Workflow Software: Tools that deal with the automation of business processes in a managed environment.
