While many tools for searching and storing data are effective and accurate, no comparable accuracy or ease yet exists for searching audio for specific information. There are currently three ways to search audio: phonetic search, manual transcription, and automatic (machine) transcription.
Phonetic search technology breaks recorded speech into phonemes (the basic units of sound) and matches those sound patterns against a library of known patterns. For example, the acronym “B2B” would be represented by the phonemes “_B _IY _T _UW _B _IY” (a Wikipedia example from Nexidia, a company involved in speech recognition systems). Given the wide variation in speaking styles, pronunciation, accents and dialects, the accuracy of this method is inconsistent, and it produces many false hits. And while it may identify sections and phrases of interest, it doesn’t transcribe the audio into text; those sections must still be listened to.
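To make the matching idea concrete, here is a minimal sketch in Python, assuming the audio has already been converted into a stream of ARPAbet-style phoneme symbols (that acoustic step is not shown and is assumed). The names LEXICON, edit_distance and phonetic_search are hypothetical illustrations, not Nexidia's method or any product's API. A query is looked up as a phoneme sequence and compared against every window of the stream, with an error tolerance so that slightly misheard phonemes can still match.

```python
# Hypothetical mini pronunciation dictionary (ARPAbet-style symbols).
LEXICON = {
    "b2b": ["B", "IY", "T", "UW", "B", "IY"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme lists."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]


def phonetic_search(stream, query, max_errors=1):
    """Return start offsets where a query-length window of the phoneme
    stream is within max_errors edits of the query (a simplified sketch:
    windows have a fixed length equal to the query)."""
    n = len(query)
    hits = []
    for start in range(len(stream) - n + 1):
        if edit_distance(stream[start:start + n], query) <= max_errors:
            hits.append(start)
    return hits


if __name__ == "__main__":
    query = LEXICON["b2b"]
    # Hypothetical phonemes for "we do B2B sales".
    stream = "W IY D UW B IY T UW B IY S EY L Z".split()
    print(phonetic_search(stream, query, max_errors=0))     # [4]
    misheard = ["P", "IY", "T", "UW", "B", "IY"]            # one phoneme off
    print(phonetic_search(misheard, query, max_errors=0))   # []
    print(phonetic_search(misheard, query, max_errors=1))   # [0]
```

The tolerance is the trade-off the paragraph describes: with max_errors=0, an accented "P" instead of "B" is missed; raising it catches that variant but also admits more near-matches, i.e. false hits. Even an exact match is not proof of the term, since a phrase such as "be to be" can be pronounced with the same phoneme sequence. Either way, the result is only an offset into the audio, which someone still has to listen to.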
Manually transcribing audio so that the resulting text can be searched automatically is time-consuming. Because it depends on a listener typing the words as they are heard, this labor-intensive task can also be very expensive. There may also be security concerns, since the audio goes outside the company (or perhaps the country) to be transcribed.
Machine transcription is the only automated means of converting audio to text, but it suffers from accuracy problems. It compares “heard” audio against known libraries, so it again struggles with differing pronunciations, terms missing from existing libraries, and poor recording clarity. High-quality recordings can yield recognition rates of around 85% (a respectable-looking figure until compared with the nearly 100% accuracy of pure text search), but with voice mail, accuracy can drop as low as 40%.
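The post does not say how those 85% and 40% figures were measured. One common way to score a transcript is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the machine output into a correct reference transcript, divided by the number of reference words, with accuracy often reported as 1 − WER. The Python sketch below assumes that metric; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # dp[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical voice-mail snippet and a machine transcript of it.
    reference = "please call me back about the contract tomorrow"
    hypothesis = "please fall me back about a contract tomorrow"
    wer = word_error_rate(reference, hypothesis)
    print(wer)                                   # 0.25
    print(f"word accuracy: {1 - wer:.0%}")       # word accuracy: 75%
```

Two misrecognized words out of eight already bring accuracy down to 75%, and if the missed word is the one a reviewer is searching for ("contract", say), a text search of the transcript finds nothing. Note that because insertions count as errors, WER can exceed 1, so "accuracy" under this definition can even go negative on very poor recordings.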
The new Federal Rules of Civil Procedure (FRCP) require companies to have a means of identifying key communications and data sources, and that data must then be saved. For the sake of efficiency, both to optimize the amount of storage required and to reduce the volume of data that must be identified and produced for litigation, it is also important to be able to accurately identify data that is unnecessary.
Even as data retention requirements increase and storage costs fall, deciding which audio should be kept and which should be deleted can be costly. As this information is digitized, it must still be stored and indexed (or searched after the fact). The technology is immature and still evolving. There may be an opening for an innovative company to prosper here, especially one able to produce a breakthrough in voice-to-text technology. In the meantime, companies face a difficult decision about what stays and what goes.
via Audio Files Present Challenges For Computer Forensics and E-Discovery.