Digital Signal Processing Reference
The first speech recognizer appeared in 1952 and consisted of a device for the recognition
of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964
New York World's Fair.
One of the most notable domains for the commercial application of speech recognition in
the United States has been health care, and in particular the work of the medical
transcriptionist (MT). According to industry experts, speech recognition (SR) was
initially sold as a way to eliminate transcription completely rather than to make the
transcription process more efficient, and for that reason it was not accepted. SR at that
time was also often technically deficient. In addition, using it effectively required
physicians to change the way they worked and documented clinical encounters, which many,
if not all, were reluctant to do. The biggest limitation to automating transcription with
speech recognition, however, is seen as the software itself. Narrative dictation is highly
interpretive and often requires judgment that a human can provide but an automated system
cannot yet. Another limitation has been the extensive amount of time that the user, the
system provider, or both must spend training the software.
A distinction in automatic speech recognition (ASR) is often made between "artificial
syntax systems", which are usually domain-specific, and "natural language processing",
which is usually language-specific.
Each of these types of application presents its own particular goals and challenges.
Applications
Health care
In the health care domain, even in the wake of improving speech recognition
technologies, medical transcriptionists (MTs) have not yet become obsolete. The services
provided may be redistributed rather than replaced.
Speech recognition can be implemented at the front end or the back end of the medical
documentation process.
In front-end SR, the provider dictates into a speech-recognition engine, the recognized
words are displayed right after they are spoken, and the dictator is responsible for
editing and signing off on the document. The document never goes through an MT/editor.
In back-end SR, also called deferred SR, the provider dictates into a digital dictation
system and the voice is routed through a speech-recognition machine. The recognized draft
document is then routed, along with the original voice file, to the MT/editor, who edits
the draft and finalizes the report. Deferred SR is currently widely used in the industry.
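The difference between the two workflows comes down to who edits the recognized draft and
whether the original voice file travels with it. The following Python fragment is only a
minimal sketch of that routing, not a description of any real product. The names Mode,
Document, recognize, and route are hypothetical, and recognize is a placeholder that stands
in for an actual speech-recognition engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    FRONT_END = "front-end"
    BACK_END = "back-end"  # also called deferred SR


@dataclass
class Document:
    draft_text: str
    editor: str                       # who edits and finalizes the draft
    voice_file: Optional[str] = None  # original audio, kept only for the MT/editor
    steps: List[str] = field(default_factory=list)


def recognize(audio: str) -> str:
    """Hypothetical placeholder for a speech-recognition engine."""
    return f"<draft transcript of {audio}>"


def route(audio: str, mode: Mode) -> Document:
    """Route a dictation through the front-end or back-end workflow."""
    draft = recognize(audio)
    if mode is Mode.FRONT_END:
        # The provider sees the words right away, edits, and signs off.
        return Document(
            draft_text=draft,
            editor="provider",
            steps=["dictate", "recognize", "provider edits and signs off"],
        )
    # Back-end (deferred): the draft and the voice file both go to the MT/editor.
    return Document(
        draft_text=draft,
        editor="MT/editor",
        voice_file=audio,
        steps=["dictate", "recognize", "MT/editor edits draft", "finalize report"],
    )
```

Calling route("visit_001.wav", Mode.BACK_END) in this sketch returns a document whose
editor is the MT/editor and whose voice_file is still attached, while the front-end call
leaves voice_file empty because the draft never reaches an MT/editor.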
Many Electronic Medical Records (EMR) applications can be more effective and may be
performed more easily when deployed in conjunction with a speech-recognition engine.
Searches, queries, and form filling may all be faster to perform by voice than by using a
keyboard.