“Personal Toolkit”: Capturing complex conversations

This article appears in the February 2005 issue [Volume 14, Issue 2]

In the past, I have talked about the value of carrying digital recorders for capturing spontaneous ideas and conversations, as well as dictating text or documenting interviews and conferences (see "Personal Toolkit" March 2002).

I've mentioned that recorders from Olympus and a few other vendors offer a compact Digital Speech Standard (DSS) format that captures almost 45 hours of sound on a 128-MB memory card at fairly decent levels of quality.
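As a rough back-of-the-envelope check, the figures above imply a very low average bit rate. The sketch below only does the arithmetic; the card size and recording time come from the text, while treating 1 MB as 1,048,576 bytes and ignoring file-system overhead are assumptions.

```python
# Rough arithmetic only. CARD_MB and HOURS come from the article;
# 1 MB = 1,048,576 bytes is an assumption, and overhead is ignored.
CARD_MB = 128
HOURS = 45


def average_bitrate_kbps(card_mb: float, hours: float) -> float:
    """Average bit rate (kbit/s) needed to fill card_mb in the given hours."""
    total_bits = card_mb * 1024 * 1024 * 8
    seconds = hours * 3600
    return total_bits / seconds / 1000


print(f"{average_bitrate_kbps(CARD_MB, HOURS):.1f} kbit/s")
```

Under those assumptions the result is in the neighborhood of 6 to 7 kbit/s, which helps explain why DSS files stay so small.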

However, among the latest models from Olympus are three units that add stereo recording to their options: the DM-10, DM-20 and DS-2200. In addition to the DSS format, all three will record both mono and stereo in the ubiquitous Windows Media Audio (WMA) format. I learned a long time ago that stereo recording could make transcription much easier because the voices could be separated into two channels. Tests with the DS-2200 clearly demonstrated stereo's advantages.

Specifications for all three are mostly similar, except that the DMs have fixed internal memory (64 and 128 MB), while the DS takes removable xD-Picture cards up to 512 MB. Both the DM-20 and DS-2200 list for $289 and include a remote control with a built-in microphone, USB cable and cradle, etc. The difference is that the DMs have excellent music playback capability, while the DS comes with a stereo plug-in microphone and more sophisticated transcription software. Between the two approaches, it's an assortment of great features, but it's a shame they wouldn't offer the music capability on the DS unit.

Capturing a cacophony of voices

I tested the DS-2200 side-by-side with my DM-1, conducting several interviews using the higher-quality WMA setting and the t-shaped stereo condenser microphone snapped onto the top of the unit. (The device's flat bottom makes it possible to stand the recorder on a table between two speakers.) While the DSS format is good, the richness of the WMA recording makes it easier to listen to and—more importantly—easier to transcribe. Even a dialogue in a hotel ballroom was clear and distinct, despite the fact that the staff was loudly collapsing tables and stacking chairs all around us.

However, what made the review really appealing was the chance to try the Digital Conference Microphone Kit ($449), which features a tiny pair of high-end microphones from Austrian audio experts AKG Acoustics in a padded aluminum case meant to carry all the recording paraphernalia, including microphone stands.

I used the microphones for a series of interviews for a project I'm working on. With an air conditioner blowing overhead in a Long Beach, Calif., office, the exceptional clarity of the WMA recordings, compared to the DSS recordings, helped greatly when it came time to transcribe the interviews.

I also took the kit to Harvard in December for a meeting in the library at the Harvard Faculty Club, with about three dozen corporate and academic gurus wrapped around a u-shaped table configuration. In the past, there was reluctance to record the meetings because the audio setup might prove too cumbersome or obtrusive. But with the Olympus setup, every participant was represented on the recording, even if speaking from the back of the room near the coffee urn.

Once you have the recordings

Multi-speaker sessions can be transcribed from digital recorders using transcription software. Olympus upgraded its professional transcription products to include the AS-4000, which offers foot pedal control, auto-backspace, variable playback speed and index control, plus routing of files, and now works with both DSS and WMA formats. Other transcription applications will also handle those file formats, such as Express Scribe from NCH Swift Sound, which is available as freeware.

Your own dictated voice also can be routed through IBM ViaVoice (directly) or ScanSoft's Dragon NaturallySpeaking (after converting to a WAV file) for machine transcription. I have done this extensively, using IBM ViaVoice and previous Olympus models. I got fair results with the DS-2200 using both applications, but accuracy suffered because the recorders no longer let you go into training mode.

ScanSoft, by the way, released Version 8 of NaturallySpeaking, claiming about 25% more accuracy over V. 7, promising to convert speech into text at up to 160 wpm. Dictation tests using a Plantronics noise-canceling microphone confirmed the application is significantly improved. In particular, NaturallySpeaking's unique auto-punctuation capability has grown into a truly workable feature, inserting commas and periods automatically for more, er, natural speaking.

Let someone else's fingers do the walking

Now there is another option for turning recorded speech into usable text. I submitted the same 14-minute interview section to two transcription services that accept—among other methods—digital files over the Web and promise fast turnaround.

iDictate returned a Word file about 12 hours later, charging only $42.64. But the accuracy was only fair and would require careful correction—comparable to what I used to get from neighborhood secretarial services. However, the rate of two cents per transcribed word (less for single-speaker recordings) is a bargain.

eScriptionist sent my transcript back in about 20 hours. The document was neatly formatted and the accuracy was so good that I only caught 2 errors in 2,300 words. The bill for this transcript was actually less, $31.50, even though I splurged for the RUSH rate of $2.25 per minute of recorded audio rather than their standard $1.50 per minute. Rates get a little higher if more people are recorded at the same time, if audio quality is poor or if the subject matter is too technical.
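The two bills are easier to compare once the pricing models are put side by side. The sketch below reruns the arithmetic from the figures quoted above (a 14-minute recording, $2.25 and $1.50 per minute, two cents per word, and a $42.64 bill); the function names are purely illustrative.

```python
# Illustrative arithmetic using only the figures quoted in the article.
MINUTES = 14


def per_minute_bill(minutes: float, rate_per_minute: float) -> float:
    """Bill for a service that charges per minute of recorded audio."""
    return round(minutes * rate_per_minute, 2)


def words_implied(bill: float, rate_per_word: float) -> int:
    """Number of transcribed words implied by a per-word bill."""
    return round(bill / rate_per_word)


rush = per_minute_bill(MINUTES, 2.25)      # eScriptionist RUSH rate
standard = per_minute_bill(MINUTES, 1.50)  # eScriptionist standard rate
idictate_words = words_implied(42.64, 0.02)
idictate_per_minute = round(42.64 / MINUTES, 2)

print(f"RUSH: ${rush:.2f}, standard: ${standard:.2f}")
print(f"iDictate: about {idictate_words} words, ${idictate_per_minute:.2f} per minute")
```

For a fast-talking interview, a per-word rate can end up costing more per minute of audio than a per-minute rate, which is what happened here.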

To be fair, the two services transcribed the same interview, but did not receive the same audio files. iDictate could accept DSS, but not WMA. I used both my DM-1 and the DS-2200 to record the interview, and the DS-2200 had the high-end microphones on their stands. The sound quality was dramatically better with the DS-2200, and the stereo channels made it easier to separate interviewer from interviewee, especially when we were speaking at the same time.
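For readers who want to exploit that channel separation directly, here is a minimal sketch that splits a stereo recording into two mono files using Python's standard wave module. It assumes the recording has already been converted from WMA to an uncompressed PCM WAV file with some other tool; the function name and file paths are hypothetical.

```python
import wave


def split_stereo(src_path: str, left_path: str, right_path: str) -> None:
    """Write the left and right channels of a stereo PCM WAV to mono files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo (two-channel) WAV file")
        width = src.getsampwidth()   # bytes per sample, per channel
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # PCM WAV interleaves samples: left, right, left, right, ...
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Calling something like split_stereo("interview.wav", "interviewer.wav", "interviewee.wav") would produce one file per microphone. Each speaker will still bleed into the other channel, so this is an aid to listening rather than a clean separation.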

Steve Barth writes, teaches and consults on personal knowledge management and knowledge worker productivity. In the interest of disclosure, some of his consulting engagements cover the issue of desktop search.

