Microsoft recently announced that Speech Server 2007 will provide support for speech applications written in VoiceXML. To penetrate the enterprise market for speech applications, Microsoft really had no choice. SALT-based applications remain as rare as hen’s teeth in the enterprise. OK, maybe not that rare, but the number certainly pales in comparison to the number of VoiceXML-based applications. The press release says “More than 40,000 telephony ports of capacity have been licensed, and Speech Server customers are successfully answering more than 10 million calls per month on the platform”. I know of individual companies that by themselves handle more than that many calls per month with VoiceXML applications.
Also, it’s become pretty clear that VoiceXML is winning the mindshare of the standards committees. Of course, VoiceXML had a big advantage by preceding SALT by several years. Even in the multimodal space, SALT is very unlikely to become the anointed standard. Some of SALT will likely live on in VoiceXML 3.0 and beyond. That’s a very good thing for all of us, though, as I believe VoiceXML 3.0 and XHTML+V are going to be much better standards due to some of the good ideas that originated from the work on SALT.
I’m curious if part of the reason for Microsoft picking up some of the technology assets and a few people from failed start-up Unveil was to gain some additional VoiceXML experience in advance of this plan. After all, the headline of the press release I linked to above was “Microsoft Unveils Road Map for Speech Server 2007”. Then again, maybe not.
Very few serious applications are written in VoiceXML by humans. It is mostly generated dynamically from some higher-level object layer that isolates REAL speech applications from VoiceXML, which then becomes a middleware interface (albeit a redundant one) between a speech platform (ASR, TTS, telephony) and some proprietary higher-level API.
Those who write JavaScript mixed with VoiceXML create ugly little pieces of code that are hard to maintain (they look like 30-year-old Unix nroff/troff stuff). Any time humans use a machine-oriented data-exchange tool like XML as a human-oriented programming environment, it is a dead-end effort. VoiceXML is an overcomplicated, confused, and misdirected mess that is becoming even more complicated; it will either collapse under its own weight or people will simply stop using it once simpler tools are finally created. SALT is incomplete and does not solve the real issues. If you are running something like VoiceObjects, you really don’t need VoiceXML. If the only problem is having a standard speech platform so that all apps are portable, that’s a different issue, and it doesn’t require a poor XML-based programming language such as VoiceXML. All you need to talk to your speech platform is a minimal SOAP-over-HTTP protocol with an industry-standard XML envelope.

Now, what you really need on the app side is to separate the speech UI from the business logic. A speech application is really a 70% speech-UI design task (if you do it right) and a 30% or less business-logic software development task, but for some reason the whole industry gets it in reverse, and thus it is all misdirected.
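To make the maintenance complaint concrete, here is a minimal sketch of the kind of hand-written VoiceXML-plus-ECMAScript dialog in question. The account-number field, the eight-digit length check, and the lookup.jsp target are all hypothetical; only the element names and the field$.utterance shadow variable come from the VoiceXML 2.0 spec. Notice how validation logic ends up interleaved with dialog markup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <var name="attempts" expr="0"/>
  <form id="getAccount">
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
      <filled>
        <!-- business logic leaking into the dialog markup -->
        <script>
          var acct = account$.utterance.replace(/\s/g, '');
          attempts = attempts + 1;
        </script>
        <if cond="acct.length != 8">
          <clear namelist="account"/>
          <reprompt/>
        <else/>
          <submit next="lookup.jsp" namelist="account"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Multiply this pattern across dozens of forms and the nroff/troff comparison starts to look generous, which is exactly why most shops generate this markup from a higher-level layer instead of writing it by hand.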