Wednesday, June 3, 2009

Aw C'mon, be a Minion!

Come my minions! Gather 'round me and do my bidding!!!

Nuts! That never works.

I'm starting a minion list. It's not a list of minions, it's a list of projects for my minions.

I'm going Internet because minions are sparse on the ground around these parts. I believe it's because my readership is made up of selfish swine who won't surrender their sense of personal identity for the betterment of me. But I press on! I'm brave that way.

My minion list is a list of projects, usually computer projects, that strike me as good ideas but that I don't have time to do. It would be nice if a minion would do them (hint, hint).

I've talked about the minion list in the abstract many times, but now I've decided to codify it. That way, when someone else commercializes one of my ideas, say "Feels on Wheels" (I'll explain later), I can sue them for plenty cash.

Wanna Taste?

My first entry for the minion list is "Voice Messaging". Voice Messaging is like Instant Messaging, augmented with sound.

The concept is simple. A person wants to send you a message. Instead of typing, they have the option to record a message. The message is sent to you, along with any typing they wish to do. On your side your VM program converts the sound file to text and displays it like a normal IM. If the speech conversion garbles it too badly, you just highlight the part you can't read and it plays that part of the sound file.
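The "highlight the part you can't read and it plays that part" trick only works if the transcriber keeps a start and end time for every word. Here's a minimal sketch of that bookkeeping. The `Word` record and `selection_to_span` function are names I made up for illustration, and it assumes the speech engine hands back per-word timestamps and that the text is displayed with single spaces between words.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str     # what the recognizer thinks was said
    start: float  # seconds into the recording
    end: float

def selection_to_span(words, sel_start, sel_end):
    """Map a highlighted character range [sel_start, sel_end) of the
    displayed text to the (start, end) seconds of audio to replay."""
    pos = 0
    hit = []
    for w in words:
        w_start, w_end = pos, pos + len(w.text)
        # Keep any word that overlaps the highlighted characters.
        if w_start < sel_end and w_end > sel_start:
            hit.append(w)
        pos = w_end + 1  # assumes one space between displayed words
    if not hit:
        return None
    return hit[0].start, hit[-1].end

# Displayed text: "meet me at gnome" (the recognizer garbled the last word)
words = [
    Word("meet", 0.0, 0.4),
    Word("me", 0.4, 0.6),
    Word("at", 0.6, 0.8),
    Word("gnome", 0.8, 1.5),
]
span = selection_to_span(words, 11, 16)  # user highlights "gnome"
```

The player would then seek to `span[0]` and stop at `span[1]`.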

This would rid the world of emoticons. If you want to know how someone is feeling, play the sound file.

I Object!

* Speech recognition sucks.

You're right as far as you take it. However, in this case we're not producing a finished document. It doesn't have to be perfect, just readable. The recognition only has to be gud enuf 2 undrsAnd. If you're unsure what was said, play the sound file.

* What's to stop someone from saying one thing, but typing another?

The conversion is done on the receiver's end. The sender can send text along with the sound, but the final arbiter of the text is the receiver.

* What's to stop someone from screaming obscenities or pulling other "funny" jokes?

Before playing a sound file, the program would normalize the sound levels. It could even warn you if the original has loud points in it.
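A rough sketch of both halves of that answer, assuming the audio has already been decoded to a list of signed 16-bit samples. The function names, the window size, the target peak, and the "4 times the typical level" threshold are all my own placeholder choices, not anything a real client is known to use. Note that plain peak normalization only caps the loudest sample; it doesn't flatten a scream relative to the rest, which is why the warning is a separate check.

```python
from statistics import median

def normalize(samples, target_peak=16000):
    """Scale samples so the loudest one lands at target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    scale = target_peak / peak
    return [int(round(s * scale)) for s in samples]

def loud_windows(samples, window=4, ratio=4.0):
    """Return indexes of windows whose peak is far above the typical peak."""
    peaks = [
        max(abs(s) for s in samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
    if not peaks:
        return []
    typical = median(peaks)
    if typical == 0:
        return []
    return [i for i, p in enumerate(peaks) if p > ratio * typical]

# Two windows of normal talking, then somebody yells.
samples = [100, -120, 90, -110, 110, -100, 95, -105, 30000, -29000, 100, -90]
warn = loud_windows(samples)   # windows that would trigger a warning
safe = normalize(samples)      # what actually gets played
```

A real window would be tens of milliseconds of samples rather than four; the tiny numbers here just keep the example readable.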

* Couldn't someone just use the phone?

Yeah, but VM is closer to IM than a phone call. The sender can send a VM, but you can ignore it until you're ready, just like IM.

* What about people who don't stop yakking?

Put a size limit on what you'll receive. Make it settable per person. Even if someone does get carried away, it's still better than voice mail.

Almost any sound player lets you replay selected parts of a file, so you can deal with the witlings who leave 5-minute messages and wait till the end to mumble their contact information.
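The per-person size limit could be as simple as a lookup table with a default. A minimal sketch; the contact names and byte limits are invented for illustration.

```python
DEFAULT_LIMIT_BYTES = 100_000  # assumed default for anyone not listed

# Hypothetical per-sender overrides, in bytes.
LIMITS = {
    "mom": 500_000,         # she gets to ramble
    "chatty_carl": 30_000,  # he does not
}

def accept_message(sender, size_bytes, limits=LIMITS, default=DEFAULT_LIMIT_BYTES):
    """Return True if a voice message of this size should be received."""
    return size_bytes <= limits.get(sender, default)
```

The check uses the announced size, so the client can refuse the transfer before downloading anything.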

* What file format should we use?

This is getting a bit low level, but an open format is an obvious requirement. It turns out that there is an open format called Speex that gets the job done.

A 14-second, 16-bit, mono, 48000 Hz PCM .wav file is 1.4 meg. Converting it to an .mp3 shrinks it to 123K. Speex mauls it down to 66K. All with no real loss of sound quality for what we're doing.
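For anyone who wants to check the arithmetic, here's the back-of-the-envelope version. It assumes the "K" figures above mean thousands of bytes and ignores the small .wav header.

```python
# The clip described above: 14 s, 16-bit, mono, 48000 Hz PCM.
SECONDS = 14
SAMPLE_RATE_HZ = 48000
BYTES_PER_SAMPLE = 2  # 16 bits
CHANNELS = 1          # mono

raw_bytes = SECONDS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

# Assumption: "123K" and "66K" are thousands of bytes.
mp3_bytes = 123_000
speex_bytes = 66_000

def kbps(size_bytes, seconds):
    """Average bitrate in kilobits per second."""
    return size_bytes * 8 / seconds / 1000

print(raw_bytes)                                # raw PCM size in bytes
print(round(raw_bytes / mp3_bytes, 1))          # compression ratio for mp3
print(round(raw_bytes / speex_bytes, 1))        # compression ratio for Speex
print(round(kbps(speex_bytes, SECONDS), 1))     # Speex average bitrate
```

The raw figure comes out to about 1.34 million bytes, in the neighborhood of the 1.4 meg quoted, so the mp3 is roughly an 11-to-1 squeeze and Speex roughly 20-to-1.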

* What about security?

As long as an open audio format is used, and you trust your audio player, then you're all set. If you use an audio player that does odious stuff like pop up web browsers (oy!) then you're asking for trouble. Playback should be based solely on the contents of the file, not its extension.
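"Based on the contents" in practice means looking at the first few bytes of the file instead of trusting its name. A minimal sketch, assuming the Speex audio arrives in the usual Ogg container. The function name is mine, and scanning the first 64 bytes for the Speex header string is a shortcut rather than a real Ogg parser.

```python
def sniff_audio(header: bytes) -> str:
    """Guess the audio type from the leading bytes of a file."""
    # WAV: "RIFF" <4-byte size> "WAVE"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Ogg pages start with "OggS"; a Speex stream's first packet
    # begins with the 8-byte string "Speex   ".
    if header[:4] == b"OggS":
        return "ogg-speex" if b"Speex   " in header[:64] else "ogg"
    # MP3: an ID3v2 tag, or an MPEG frame sync (eleven set bits).
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"

def is_playable(header: bytes, allowed=("ogg-speex",)) -> bool:
    """Only hand the file to the player if its contents look right."""
    return sniff_audio(header) in allowed
```

A file named hello.spx that actually starts with something else gets "unknown" and never reaches the player.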

* What technology should this be built on?

I'd bet that most IM protocols allow file transfer. It would simply be a case of the client programs handling audio files differently.

I Submit Like the Schweinhund I Am!

You have convinced me, oh minion master. I supplicate at your feet. What is the first step?

For a start, someone could write a standalone application that reads Speex files and transcribes the contents. Some college out there has to have information on phonetic translation. Let's see what the state of the art is.

After that, add playback of selected sections. Once we have that settled then the rest would be cake, and how many masters let their minions have cake?
