Rooting for the Underdog: How TalkIQ’s Transcription Engine Beat IBM’s Watson

Just how does a startup take on the big guy? Read on to find out.
Etienne Manderscheid

One of the things customers often ask us here at TalkIQ is how accurate our transcripts are. We now say, “more accurate than anyone else’s.”

But we get it. You don’t automatically trust a startup to get you the transcription quality you want, especially since Siri confuses words enough to admit “sometimes my accuracy socks!”

So last week, instead of going on intuition alone, we put the accuracy of our speech recognition engine to the test against a real industry leader: IBM's Watson.

The results? TalkIQ came out on top, producing 19% fewer errors than Watson on randomly selected conversations from our clients!*

We know. It sounds unbelievable. But it’s not.

So how is it that an up-and-coming company like TalkIQ can beat an industry behemoth?

TalkIQ relies on state-of-the-art speech recognition for our transcribing solution. While we can’t share everything that goes into our secret sauce, one key standout is client-specific language modeling.

Language modeling is the process of capturing the unique words and word sequences your company uses and building a statistical model of how likely any given word sequence is. For example, our model knows that "sales automation" is far more likely in the conversations of a B2B SaaS sales tool than "sails auto motion," and it uses that knowledge to disambiguate speech. By adapting our language models to every client, we ensure the best possible transcripts for your organization.
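Conceptually, it looks something like this. The sketch below is a minimal bigram model trained on made-up domain text, not our production system, but it shows why client-specific training resolves the "sales automation" ambiguity:

```python
# Minimal bigram language model (illustrative only). We count word pairs
# in domain text, then score candidate transcriptions by how likely their
# word sequences are under those counts.
from collections import Counter
import math

def train_bigram_model(corpus):
    """Count unigrams and bigrams in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(words, unigrams, bigrams, alpha=0.1):
    """Log-probability of a word sequence under the bigram model,
    with add-alpha smoothing so unseen pairs aren't impossible."""
    vocab = len(unigrams) or 1
    score = 0.0
    for prev, word in zip(words, words[1:]):
        score += math.log((bigrams[(prev, word)] + alpha) /
                          (unigrams[prev] + alpha * vocab))
    return score

# Hypothetical domain text from a B2B SaaS sales client.
corpus = [
    "our sales automation platform saves reps hours every week",
    "sales automation is the core of our pitch",
]
unigrams, bigrams = train_bigram_model(corpus)

# The acoustically similar hypotheses from this post:
for hyp in ["sales automation", "sails auto motion"]:
    print(hyp, log_prob(hyp.split(), unigrams, bigrams))
# "sales automation" scores far higher because that bigram actually
# appears in the client's own conversations.
```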

How can this help with mumbled or accented speech? Speech recognition is a handshake between acoustic modeling, which infers phonemes (such as /b/ or /p/) from sound, and language modeling, which assigns probabilities to word sequences. A speaker with a strong Boston accent might pronounce "automation" more like "art-oh-motion," so the role of the language model is to supplement the raw sound with a solid notion of which words should follow each other. This is especially valuable for telephone calls, where audio is typically sampled at 8 kHz, so frequencies above roughly 4 kHz are lost and acoustic information is limited.
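To make that handshake concrete, here is a toy decoding step with invented scores (real decoders score thousands of hypotheses over a lattice). The acoustic model is nearly tied between two hypotheses, and the client-specific language model breaks the tie:

```python
# Toy illustration of combining acoustic and language-model evidence:
# pick the hypothesis that maximizes a weighted sum of their log scores.
# All numbers are made up for illustration.

LM_WEIGHT = 0.8  # how much to trust the language model (tunable)

# (hypothesis, acoustic log-prob): the accented audio is acoustically
# ambiguous, so the acoustic scores are nearly tied.
hypotheses = [
    ("sales automation", -12.1),
    ("sails auto motion", -11.9),
]

# Client-specific language-model log-probs (e.g., from the bigram model
# sketched above): "sales automation" is far more likely in this domain.
lm_scores = {"sales automation": -2.3, "sails auto motion": -9.8}

def combined(hyp, acoustic_score):
    return acoustic_score + LM_WEIGHT * lm_scores[hyp]

best = max(hypotheses, key=lambda h: combined(*h))
print(best[0])  # -> "sales automation"
```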

At its core, this is nothing new. The difference now is that we apply it specifically to your company, your industry, and your call guides.

Every sales organization speaks differently. You have unique words that define your product and vision. We make it our mission to nail the words that make you special, because we know those are the words that pack meaning.

As a sales manager, you know accuracy matters. Accurate transcripts give you the visibility you need into what your reps are saying, which makes you more effective at your job. On top of that, Conversation Science™ analytics help you identify patterns that distinguish effective calls from those that fail to close the deal. After all, it’s hard to train reps when you don’t know where their pitch is going wrong.

Our goal is to provide sales managers data to make smart decisions, while helping reps do what they do best: sell. Leave the transcribing and data analysis to us. After all, we're pretty darn good at it.

*Test details: for the number geeks out there, the test calls (which we did not train on) comprised 20,000 words. The 95% confidence interval for the error reduction relative to IBM Watson was 10.6% to 26.8% (paired t-test, df = 29, p < 0.0001). We ran IBM Watson speech recognition using their featured in-browser module.
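For the truly curious, a paired comparison along these lines can be sketched as follows. The per-call error rates here are simulated stand-ins, not our actual data; the point is just the shape of the test:

```python
# Paired comparison sketch (simulated data): compute per-call word error
# rates for both engines, then test whether the paired differences
# differ from zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
watson_wer = rng.uniform(0.15, 0.35, size=30)          # 30 test calls
talkiq_wer = watson_wer * rng.uniform(0.70, 0.92, 30)  # ~19% fewer errors

t_stat, p_value = stats.ttest_rel(watson_wer, talkiq_wer)  # df = 29
reduction = 1 - talkiq_wer.mean() / watson_wer.mean()
print(f"mean error reduction: {reduction:.1%}, p = {p_value:.2g}")
```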