Tuesday, June 2, 2009

Sound Bite Me!

One of the true pleasures of programming is finding some niggling little problem and solving it with a simple bit of code. If you can solve it in a couple of hours, even better.

When I blog I tend to speak what I'm typing. If it doesn't sound right when I say it, it probably won't sound right when you read it.

One of the problems I have when I blog is that I talk faster than I type. I talk faster than I think. I start typing, and then I get a flash of inspiration. I run the idea through my head, and then, half a virtual page later, I realize that I haven't written anything down.

Then I have to try to recreate the idea from memory, but the flash is gone. By the time I've rebuilt it, or accepted that I've forgotten it, my original thought is out playing in the yard and won't come back.

It frustrates the hell out of me.

I started playing around with ideas to make blogging easier.

At first I tried to check out the state of computer speech recognition. I figured I'd just blab on in my blog, and then I'd go through and clean it up by hand.

Computer speech recognition is still slow and expensive, and it still sucks. Trying to run an editing session without using a keyboard is slower than typing with two fingers. I also wanted to blog via Linux, so firing up a Windows product ain't getting the job done.

Next I tried to come up with some sort of integration of speech and text.

I envisioned loading a sound file into a sound editor, where I could chop it up and move it around. While editing the sound, I'd join text to it. When I moved a blob of sound, the text that went with it would move too. Eventually I'd piece a blog out of all my rattlings on.

I still think that this is an interesting solution, but man, it would be some work! I also think I'd end up with a crappy sound editor linked to a crappy text editor. No dice.

I also had a minor epiphany. I'm not going to keep these sound files around forever. I'm just brain dumping to a file for a few minutes until I've finished typing my original thought. After that I can replay the recording and transcribe anything I think is useful.

I already know how to record from a mic on my Linux box. Adding that to my new insight, I wrote a shell script that turns on the sound recorder, dumps the contents of the microphone into a file and, when I stop recording, plays it back (that way I can tell if I forgot to turn on the mic or something).

It worked like a charm! Every time I needed to make a note, I just fired up the script, decided on a name for the sound file and away I went. It was a little awkward, but a big step forward.

I called the script "sound_bite".

After that I needed to come up with a way to play back my sound bites, so I started adding flags to play the last sound bite or the first sound bite or list the sound bites and let me pick. Then, another epiphany! They're just frigging .wav files! Maybe I could just double click on them in my file manager. Oooo. Me one smart monkey!

Actually, once I got the file manager into the game, it cleaned up a lot of code. I didn't have to tell the recorder where to put the sound files, I would always put them into the same directory and give them a time stamp for a name. If I wanted to organize them better, I'd use the file manager to rename them or move them elsewhere.
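
Because the names are time stamps, sorting them by name also sorts them by time, so "play the last sound bite" needs no bookkeeping at all. Here is a minimal sketch of that idea. It assumes the files still carry their yymmdd_hhmmss.wav names (anything renamed in the file manager sorts wherever its new name puts it), and latest_bite is a made-up helper name, not part of sound_bite:

```shell
#!/bin/sh
# Hypothetical helper, not part of sound_bite: print the newest recording
# in a directory, relying on yymmdd_hhmmss.wav names sorting by time.
latest_bite() {
  dir="$1"
  latest=""
  # The shell expands globs in sorted order, so the last match is the newest.
  for f in "$dir"/*.wav; do
    # If nothing matches, the pattern stays literal; -e filters that out.
    [ -e "$f" ] && latest="$f"
  done
  # Print the result, or return non-zero when there are no recordings.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Something like aplay "$(latest_bite "$HOME/sound_bites")" would then play the most recent note, assuming the recordings live in that directory.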

The only thing left was making it easier to use. Typing in the command every time is a minor pain. I needed a quicker way to access it. That was easy too.

I hooked up sound_bite to a shortcut which puts all the .wav files into the directory "sound_bites". I made another shortcut to bring up the file manager in the "sound_bites" directory. I'm sure I could come up with a few dozen little tweaks, but I know enough to stop typing when I'm done.
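
How a shortcut gets defined depends on the desktop, so the following is only a rough sketch of what the shortcut might run, not the actual setup described here. It assumes sound_bite is on the PATH and that the recordings belong in $HOME/sound_bites. The start_bite name and its optional second argument (an alternative command to run, handy for a dry run) are inventions for this example:

```shell
#!/bin/sh
# Hypothetical wrapper for a keyboard shortcut. Assumes sound_bite is on
# the PATH and that recordings belong in $HOME/sound_bites.
start_bite() {
  dir="${1:-$HOME/sound_bites}"   # where the .wav files go
  recorder="${2:-sound_bite}"     # command to run, overridable for a dry run
  # sound_bite does a cd into the directory, so make sure it exists first.
  mkdir -p "$dir" || return 1
  "$recorder" "$dir"
}
```

Ending the wrapper file with a bare start_bite line makes it record into the default directory. Running start_bite /tmp/test_bites echo only creates the directory and prints its name, which is a cheap way to check the plumbing without touching the mic.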

I'm now an official audio driven blogging fool!

Here is the entire source code for sound_bite. You may have to play with it a bit because the blogger code likes to play with it.

Enjoy!


#!/bin/sh -

# Sound_bite: Written by Dale Wiles 6/2/09.

# Exit if an error occurs.
set -o errexit

if [ $# -eq 0 ]; then
  cat <<EOM
Usage: $0 directory

Move to DIRECTORY and start recording a wave file from the microphone.
The name of the wave file is yymmdd_hhmmss.wav.
EOM
else 
  sound_dir="$1"
  cd "$sound_dir" || exit 1

  # Make the output name based on the time, down to the second.
  # That way I can't overwrite existing files.
  # Alright, in theory during daylight saving time it could
  # overwrite.  I've added code for that almost impossible situation.
  while :; do
    out=$(date +%y%m%d_%H%M%S).wav
    if [ ! -e "$out" ]; then
      break
    fi
    echo "Waiting...."
    sleep 1
  done

  # Record from the default ALSA device, then play the result back
  # so I can tell whether the mic was actually on.
  echo "Recording $sound_dir/$out"
  sox -t alsa default -v 7 "$out"
  echo "Playing $sound_dir/$out"
  aplay "$out"
fi
