The Pronuncinator

As I understand text-to-speech engines (the vocal technology behind Apple’s Siri, Microsoft’s Cortana, and Google’s Pam [that’s how I’ve come to know her; Pam is Map backwards]), they contain a finite set of rules for how a voice should pronounce things. General rules like “what does the ‘ly’ suffix sound like in American English?” and “what does an ‘eau’ cluster sound like in British English? And French?”

I’m massively oversimplifying, of course, but it should be clear that a text-to-speech engine can’t contain individual pronunciations for every single word in every language. That’d take too much space, and it’d fail outright on misspellings, brand-new words, and unusual inflections of existing words (like redisassembled or unfartingly). A TTS engine needs heuristics, not specific instructions.

But there are some words that rarely follow the rules. Names, for instance. My phone can’t pronounce the surnames of many of my best friends. It can’t pronounce the freeway exit I take into San Francisco (Duboce Avenue, which is pronounced dewBOHss, but Maps says DEWbuss). And in Hawaii, where I’m currently vacationing with my parents, neither Siri nor Pam can pronounce a fucking thing.

But we have this expectation now that our devices are always online. Mapping apps certainly expect it. So why can’t special pronunciations be sent over the air to the TTS engine? The Internet has practically unlimited storage, and for applications like mapping, where the data is structured rather than freeform text, the phonetic spellings would be easy to integrate.
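
To make that concrete, here’s a rough sketch in Python of what the integration might look like. Everything in it is my own invention for illustration: the PRONUNCIATION_OVERRIDES table, the to_ssml function, and the idea that the table would arrive with the map data. It wraps any word that has an override in an SSML `<sub alias="...">` element (SSML is the W3C markup for speech synthesis) and leaves every other word for the engine’s usual rules. Whether a given engine would actually read a respelling like “dewBOHss” the way a human does is an assumption I haven’t tested.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical override table. In the scheme described above it would be
# downloaded alongside the map data instead of being hard-coded here.
PRONUNCIATION_OVERRIDES = {
    "Duboce": "dewBOHss",
}


def to_ssml(words, overrides=PRONUNCIATION_OVERRIDES):
    """Build an SSML string, substituting respellings where one is known."""
    parts = []
    for word in words:
        respelling = overrides.get(word)
        if respelling is None:
            # No override: let the engine's general rules handle the word.
            parts.append(escape(word))
        else:
            # Override found: ask the engine to speak the respelling instead.
            parts.append(
                f"<sub alias={quoteattr(respelling)}>{escape(word)}</sub>"
            )
    return "<speak>" + " ".join(parts) + "</speak>"


print(to_ssml(["Exit", "onto", "Duboce"]))
```

The nice part is that the engine keeps its heuristics for everything else. The overrides only kick in for the handful of names the rules get wrong.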

I’m sure people across the world would gladly contribute to a crowd-sourced corpus of correctly pronounced place names, too. They have civic pride. Nobody likes hearing a robot mangle streets in their hometown, and the Wikipedia For Saying Stuff Right™ would probably only need a couple of contributors in each city to make millions of people less frustrated and confused by their driving directions.

So somebody get on that.