When one envisions the future of technology, it most likely involves delegating menial tasks to robotic underlings on whom we bestow personality and human traits, allowing them to seemingly converse with us in witty repartee. Voice control is, without doubt, the future. Each competing smartphone platform today has its own form of assistive voice control, including Windows Phone, which uses Microsoft’s Tellme speech technology. The speech recognition in Windows Phone 7 is somewhat limited: it lets you speak specific commands such as “Call Alex on Speakerphone” or “Find coffee in Seattle,” dictate text messages, and listen and respond to incoming texts without typing. However, Tellme is not conversational in its current implementation. It works well, but it lacks some features many users are clamoring for, such as using voice to control music playback, which is particularly useful while driving. Thankfully, Windows Phone has amazingly dedicated third-party developers who have brought apps to users to fill gaps in functionality. Diego Carlomagno is one such developer. He created a fantastic app, aptly titled Hey DJ!, that augments the Windows Phone voice control experience by letting users act as a hands-free disc jockey of sorts.
Hey DJ! is an application that lets you use your voice to control music playback on your device. The app uses a panorama layout of three panes, with playback controls at the bottom of the page for skipping tracks manually. The first pane features a giant “tap to speak” button that activates voice recognition, with a short beep prompting you to speak your command.
The commands you can speak are surprisingly varied and cover most of what you’d need to ‘DJ’ your tunes. A “What can you say” link provides a handy guide to the commands you can use to play tracks, including:
- Play Artist Amy Winehouse
- Play Genre Pop
- Play List Bedroom Music
- Play Album Some Nights
- Play Song Call Me Maybe
The second pane is a “now playing” screen that shows the current and upcoming tracks; you can tap any track in this list to play it. The third pane shows tiles of previously spoken commands for quick re-selection. You can also turn on shuffle or repeat mode. Two other useful commands are “surprise me,” which lets Hey DJ! queue up a list of music to play, and “play all,” which plays your entire music library in alphabetical order, or shuffled if you enable shuffle mode.
Windows Phone differentiates itself from other smartphones with its visually iconic Modern UI (popularly known by its codename, Metro), featuring live tiles. Hey DJ! capitalizes on the OS’s unique style and abilities with a main live tile that updates with the current song’s album art, as well as an optional secondary live tile titled “Play Now” that launches voice recognition in one tap.
Hey DJ! launches almost instantly, and it’s smooth and easy to use. I have been using the app for several months now and have seen no crashes or performance hang-ups. In fact, Hey DJ! is an excellent example of how third-party apps can and should work in tandem with Windows Phone, adhering to the design principles and ease of use the OS is known for.
When driving, or walking with a headset that has an inline microphone, Hey DJ! can prove invaluable. The app settings include an option to start listening for commands as soon as the app launches. Combined with launching the app through the Windows Phone Tellme voice commands, this provides a completely hands-free experience: the app opens already listening for a command. I have a pretty thick accent (I grew up in Trinidad), yet Hey DJ! recognizes my commands at least 80% of the time. Your mileage may improve if you don’t mispronounce words the way I inadvertently do.
Hey DJ! costs $0.99 in the Marketplace, and there is a fully functional, ad-supported trial version. Developer Diego Carlomagno has been releasing a steady stream of updates that have improved stability and added features to the already highly rated app. He took time out of his busy schedule for a short interview, and we can’t thank him enough. Download the full version here or the trial here.
Interview with Diego Carlomagno, developer of Hey DJ!
Tell us a bit about yourself, your background or anything interesting you’d like to share with the community.
My name is Diego Carlomagno and I’m a software engineer lead at Microsoft on the Exchange Mobility team. I have a degree in electronics engineering (specializing in telecommunications and signal processing) from the Universidad Nacional de La Plata in Buenos Aires, Argentina. After getting my degree I worked for a few companies in Buenos Aires, then moved to the US to join a startup in New York. Later I lived in Madrid for a while before joining Microsoft in Redmond, WA. In my spare time I’m also a musician: guitar is my primary instrument, but I also love making music with keyboards and computers.
How did you start working for Microsoft and how has your experience been there?
I joined the Microsoft Exchange Unified Messaging team in 2005. I found an interesting job posting online that was a perfect match for my previous background in telephony and internet applications. Exchange is an amazing team full of talented people, and you never get bored because you have the opportunity to work in many areas, from client/UX and mobile devices to complex, scalable server-side components.
Microsoft has relaxed its strict rules by allowing employees to work as Windows Phone developers in their spare time. The employees get to keep the resulting intellectual property and most of the revenue. Was this a good incentive in your decision to become a developer for the platform?
Yes, definitely. It’s very easy to become a Windows Phone developer if you are an MSFT employee, thanks to a moonlighting policy that specifically covers WP development. There’s a simple process to follow: you just need to complete a few forms and that’s it!
I know of at least one other Microsoft employee developing for the platform: Jeff Wilcox, creator of 4th & Mayor. Have a lot of employees responded to this incentive?
Yes. I don’t know Jeff personally, but I follow him on Twitter. His 4th & Mayor app is really cool. I know a lot of Microsoft employees are making apps; I believe the number is probably in the thousands.
What has your experience been as a Windows Phone developer in terms of the development tools, App Hub, MSDN documentation, approval process, etc.? Where do you see room for improvement?
If you are a .NET or Silverlight developer, it’s very easy to start developing for Windows Phone. The dev tools are really good, and since it’s .NET you can find a lot of resources online. “Hey DJ!” was my first app, and it took me about 4 weeks to build v1, including client and server code. I wish WP7 supported native code, because Hey DJ! requires some real-time audio processing on the client (speech encoding, silence detection, etc.). I also wish WP7 exposed a speech recognition API on the phone itself; that would allow Hey DJ! to work offline without requiring an internet connection (this tops the list of user requests). The current version of App Hub is OK, but it previously had some performance problems, and I hit a few issues the first time I tried to submit my app. Another feature I would love to see is the ability to reply to Marketplace comments.
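Diego lists silence detection among the client-side audio work but does not say how Hey DJ! implements it. One common, simple technique is energy-based detection: split the audio into short frames, compute each frame’s RMS energy, and treat a sustained run of quiet frames after speech as the end of the command. The sketch below illustrates that idea in Python for readability (the real client was a WP7 .NET app); the function names, the assumed 16 kHz sample rate, the 10 ms frame, the 0.02 threshold and the roughly 300 ms hang time are all hypothetical choices, not details from the app.

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples scaled to [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(
    samples: Sequence[float],
    frame_size: int = 160,    # assumption: 10 ms frames at 16 kHz
    threshold: float = 0.02,  # assumption: RMS level separating speech from silence
    hang_frames: int = 30,    # assumption: ~300 ms of quiet ends the command
) -> Optional[int]:
    """Return the sample index where trailing silence begins, or None.

    Speech counts as finished only after at least one loud frame has been
    seen and is then followed by `hang_frames` consecutive quiet frames.
    """
    heard_speech = False
    silence_start: Optional[int] = None
    quiet_frames = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) >= threshold:
            # Loud frame: the user is (still) talking, so reset the silence run.
            heard_speech = True
            silence_start = None
            quiet_frames = 0
        elif heard_speech:
            # Quiet frame after speech: extend the current silence run.
            if silence_start is None:
                silence_start = start
            quiet_frames += 1
            if quiet_frames >= hang_frames:
                return silence_start
    # Either no speech yet, or the silence has not lasted long enough.
    return None


# Usage: 0.1 s of silence, 0.2 s of a 440 Hz tone standing in for speech, 0.5 s of silence.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(3200)]
samples = [0.0] * 1600 + tone + [0.0] * 8000
print(find_end_of_speech(samples))  # 4800
```

The hang time is the key design knob in a detector like this: too short and a natural pause between “Play Artist” and the artist name cuts the user off; too long and the app feels sluggish.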
You developed Hey DJ! under the company name “Punoz Apps,” any significance behind that name?
Yes, “puno” is an old nickname; my wife calls me that, but I’ll tell you where it comes from some other time. Basically, when I tried to register puno.com I found it was taken, so I got punoz.net and punoz.com instead. There’s no website running on that domain yet (I’m working on it as we speak); for now it just hosts Hey DJ’s backend servers.
Hey DJ! works really well. It started off as “Speak to Play” before evolving into a fully featured voice command app for music. Tell us a bit about how the app idea came about and how it has evolved since its release.
I always wondered why Windows Phone 7 didn’t have this built in. I used to own an old Windows Mobile device that let you play songs using voice commands (and that even worked without a server connection!). But the idea for this app came to me one day while I was driving and wanted to play a specific song on my phone. Of course, it was too dangerous to mess with the music app while driving, so I decided to build my own app (initially for personal use). Once I showed it to a few coworkers, they loved it and encouraged me to make it public. The initial version was called “Speak to Play,” but after a few updates I decided to change the name to “Hey DJ!” The new name also better represents the direction I’d like the app to evolve in the future.
In terms of the technical aspects of the app, how exactly does it work? What technology and APIs do you use, and how is it so accurate? Does it “learn” from users over time? (I have a thick accent, and Hey DJ! understands me much better than the built-in Tellme!)
Hey DJ! has a client (the app you download to your phone) and a server component, and I hit technical challenges on both sides. On the client, you need to process audio almost in real time to get low latency and good performance (this is where support for native code would’ve made things easier). The server side is where the heavy work of speech recognition happens. The speech recognition code is based on a Microsoft speech engine, but I had to get really creative to tune the engines and grammars for the Hey DJ scenario, mainly to get good support for multiple cultures. For example, if you want to play U2 in Spanish, some folks will say “You Too” but others will say “U dos.” Both should just work!
This particular speech recognition engine does not automatically learn from users over time, but I do use user feedback and some anonymous data to improve the speech grammars. It works relatively well because the list of phrases you can say is limited (command-and-control grammars). Hey DJ’s speech algorithms are not as complex as those of other services like Ask Ziggy or Tellme.
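To make the command-and-control idea concrete, here is a minimal, hypothetical sketch of a closed grammar in which several spoken forms (such as “You Too” and “U dos”) map to the same library item. It works on already-transcribed text, which is a simplification: a grammar-based engine like the one Diego describes applies the grammar during recognition itself, and nothing here reflects Hey DJ!’s actual grammar format. The names `CommandGrammar`, `FIXED_COMMANDS` and `normalize` are invented for illustration; the command categories come from the article’s list.

```python
from typing import Dict, Iterable, Optional, Tuple

# Command categories taken from the article's "What can you say" list.
CATEGORIES = ("artist", "genre", "list", "album", "song")

# Hypothetical fixed commands that take no library item.
FIXED_COMMANDS = {"surprise me": ("surprise", None), "play all": ("all", None)}

Action = Tuple[str, Optional[str]]


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class CommandGrammar:
    """A closed set of accepted phrases, each mapped to (category, library item)."""

    def __init__(self) -> None:
        self.phrases: Dict[str, Action] = {
            normalize(p): action for p, action in FIXED_COMMANDS.items()
        }

    def add(self, category: str, name: str, alternates: Iterable[str] = ()) -> None:
        """Register an item under its own name plus any alternate spoken forms."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        for spoken in (name, *alternates):
            self.phrases[normalize(f"play {category} {spoken}")] = (category, name)

    def match(self, utterance: str) -> Optional[Action]:
        """Return the action for an in-grammar utterance, or None if out of grammar."""
        return self.phrases.get(normalize(utterance))


grammar = CommandGrammar()
grammar.add("artist", "U2", alternates=["you too", "u dos"])
grammar.add("song", "Call Me Maybe")

print(grammar.match("Play artist You Too"))      # ('artist', 'U2')
print(grammar.match("play artist u dos"))        # ('artist', 'U2')
print(grammar.match("Play Song Call Me Maybe"))  # ('song', 'Call Me Maybe')
print(grammar.match("play artist coldplay"))     # None
```

Because only registered phrases are accepted, anything outside the grammar is rejected rather than guessed at, which is one reason a small command set can be recognized more reliably than open-ended dictation.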
Hey DJ! has great reviews in the Marketplace. What has the customer feedback been like? I remember lamenting having to tap the button first to start listening, and an option for this was quickly added in the next update (if you pin ‘Play Now’). Has the feedback shaped your development work?
Yes, definitely! I take user feedback very seriously. I read all reviews and e-mail feedback to improve the app. Almost every update I push to the Marketplace includes fixes and features requested by users: replacing tap & hold with a single tap, pinning to Start, a live tile with the song as its background, a setting to always display the song info at the top, etc.
As you may have noticed, I’ve been quite irritated with the announcement that no current Windows Phone, including the newly launched and much hyped Nokia Lumia 900, will get Windows Phone 8. Did you expect this? How does this affect you as a developer in terms of app support?
I kind of expected this, and I think folks are worrying too much about it. Since WP7 apps are supposed to just work on WP8, as a developer I will continue to build for 7.x until the WP8 user base is substantially bigger than the current one. Sure, if you are on WP7 you might miss some cool new WP8 game that takes advantage of multi-core processing, but that’s about it. Plus, people switch phones every couple of years (at least in the US), so eventually they will upgrade to WP8.
Windows Phone 8 has many new features developers can take advantage of that are not available on WP7.x devices. One feature I find particularly interesting is the opening up of Microsoft’s speech platform. All developers can tap into this speech recognition engine and make their apps converse with the user. Do you see this as an opportunity to take Hey DJ! to the next level?
Yes, that feature looks very interesting to me. I haven’t had the chance to look into it yet, but I will as soon as the new SDK is released. I imagine it would make it easier to launch Hey DJ with a specific command right from the Start screen, and I’m also hoping I can add some sort of offline support, which is one of the top user requests I get.
Have you ever wondered “What if Microsoft builds music voice commands into Windows Phone?” Any new app ideas in the pipeline?
I really wish they would do this, because it would make the user experience better. If that happens there would be no need for my app, but I’m sure I can come up with some other creative extensions to add more value. I have some ideas for future updates, but I prefer to keep them secret for now; you know, 80% of an app is the idea itself.
Is there anything you wish to share with the Windows Phone community?
I think Windows Phone users are very passionate about the platform and have high standards for what a good user experience should be. I see this all the time on Twitter, in emails and in app reviews: users just hate bad or slow apps, or apps that don’t align well with the Metro experience.
I love being part of the WP community and would like to see it grow a lot. I’m also really excited about the W8/WP8 convergence and all the possibilities this will offer to both users and developers.