Speech Recognition - Important Concepts in Signal Processing, Image Processing and Data Compression


The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair.

One of the most notable domains for the commercial application of speech recognition in the United States has been health care, in particular the work of the medical transcriptionist (MT). According to industry experts, speech recognition (SR) was initially sold as a way to eliminate transcription entirely rather than to make the transcription process more efficient, and hence it was not accepted. SR at that time was also often technically deficient. Additionally, using it effectively required changes to the way physicians worked and documented clinical encounters, which many were reluctant to make. The biggest limitation on using speech recognition to automate transcription, however, is considered to be the software. Narrative dictation is highly interpretive and often requires judgment that a human can provide but an automated system cannot yet. Another limitation has been the extensive amount of time the user and/or system provider must spend training the software.

In automatic speech recognition (ASR), a distinction is often made between "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these types of application presents its own goals and challenges.

Applications

Health care

In the health care domain, even as speech recognition technologies improve, medical transcriptionists (MTs) have not yet become obsolete. Their services may be redistributed rather than replaced.

Speech recognition can be implemented at the front end or the back end of the medical documentation process.

Front-End SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictating provider is responsible for editing and signing off on the document. The document never passes through an MT/editor.

Back-End SR, or Deferred SR, is where the provider dictates into a digital dictation system, the voice recording is passed through a speech-recognition machine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is currently in wide use in the industry.

Many Electronic Medical Records (EMR) applications can be more effective, and may be easier to use, when deployed in conjunction with a speech-recognition engine. Searches, queries, and form filling may all be faster to perform by voice than by keyboard.
