Research Tools: Searching mountains of multimedia with Soundbite

Capturing audio and/or video is standard practice for many of our research- and testing-oriented engagements. It is valuable both for analysis and, in condensed form, for communicating findings to clients and team members. There is real power in showing participant reactions and spontaneous commentary, in their own words, with associated inflection and emotion.
But damn, it can be time consuming! And woe be to the researcher who wants to go back and find out what participants had to say about some topic that wasn't captured in the notes, or to hunt for a quote an observer vaguely remembers hearing from either Sam, Steve, or Joe on Day 3. Even reviewing sessions at 3x speed, about as fast as I can go, takes a lot of time.
So you can imagine my excitement when I saw this tweet a few months back: A tool for searching audio? Sounds incredibly useful for me! Here is how the company (Boris) describes the tool in their marketing materials:
"Boris Soundbite quickly and accurately finds any word or phrase spoken in recorded media. Video editors, producers, and journalists can instantly play all occurrences of a spoken phrase in their media, then insert the perfect take into their FCP or Premiere Pro project, organize clips around keywords, and even find replacement words for problematic audio. Based on Nexidia’s patented dialogue search technology - which has received accolades from Creative COW, DV Magazine, Post, and others - Boris Soundbite greatly reduces logging and transcription costs and lets you spend your time being creative instead of manually searching hours of video."
So how does it stack up to those claims? Do we even need to take notes anymore?
The tool has a free 14-day trial, so I evaluated it last month during a project that generated a good amount of audio and video. My experience after that trial: for the types of conditions that I, as a user researcher, encounter, it cannot live up to all its promises. But it is definitely worth knowing about (and trying out for yourself!), and it is far better than the other tools I've used that attempt automatic video/audio transcription.
A note about recording quality and context: Media used in my trial included recordings of web meetings (WebEx, GoToMeeting) as well as files recorded from a portable voice recorder, my computer's mic, and a video camera. None of my participants wore a lapel mic, which would certainly have resulted in better audio quality.
First the positives:
  • Indexing is fast! I was quite impressed with how quickly you could begin searching 8 hours of media.
  • The tool is lightweight and usable, so it is quick and easy to ingest large amounts of heterogeneous media (video and audio, as long as it is in a QuickTime-compatible format). Searching is easy. Reviewing results is also satisfyingly easy and quick: left and right arrows step through the hits, up and down arrows step through the files.
  • I was pleasantly surprised at how well it handled longer phrases along with single words.
  • When it does work, it feels magical.
The limitations:
  • I found a prevalence of false negatives, and thus I could not trust it as a first-line research companion. In information retrieval, every result is described along two axes: positive or negative, meaning the tool did or did not return it as a hit, and true or false, meaning that decision was right or wrong. So a true positive is a correct result found and returned, and a false negative is when I know the participants talked a lot about their pain trying to schedule something, but no results are returned for the term 'schedule'. You know it exists, but the tool doesn't pick it up. That makes it difficult to trust results where you only get 2 or 3 hits, because you wonder what else is out there and what you are missing. False positives (hits where the term was not actually spoken) happen a lot, but that is fine, since they are easy to skip over.
  • I found Soundbite didn't perform well with my participants who had accents. In those cases, the results I got were predominantly my own voice asking questions, which is marginally useful as it gets me near topic areas, but not much better than noting rough times as I go along.
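If you want to put rough numbers on the false negative problem during your own trial, here is a minimal sketch in Python. It assumes you have hand-noted the timestamps where a term was really spoken and can jot down the timestamps of the hits the tool returns. The function name, the 2-second tolerance, and the sample timestamps are all made up for illustration; none of this comes from Soundbite itself.

```python
def score_hits(hits, known_mentions, tolerance=2.0):
    """Compare tool hits against hand-noted mentions (all times in seconds).

    A hit counts as a true positive if it lands within `tolerance` seconds
    of a known mention that has not already been matched.
    """
    matched = set()  # indices of known mentions already paired with a hit
    true_pos = 0
    false_pos = 0
    for hit in hits:
        match = next(
            (i for i, mention in enumerate(known_mentions)
             if i not in matched and abs(hit - mention) <= tolerance),
            None,
        )
        if match is None:
            false_pos += 1  # tool returned a hit, nothing was said there
        else:
            matched.add(match)
            true_pos += 1   # tool returned a hit and it is real
    # Known mentions the tool never returned are the false negatives.
    false_neg = len(known_mentions) - len(matched)
    precision = true_pos / len(hits) if hits else 0.0
    recall = true_pos / len(known_mentions) if known_mentions else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical example: hits for 'schedule' vs. moments I noted by hand.
tool_hits = [12.0, 95.5, 300.0]
my_notes = [11.0, 96.0, 410.0, 520.0]
result = score_hits(tool_hits, my_notes)
print(result)
```

Recall is the number that matters for the trust question: a low recall means many real mentions never show up as hits, however clean the returned results look. Precision only tells you how many hits you will have to skip past.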
So, ultimately, it is worth trying, and the clipping features could be a huge time saver (I don't use Final Cut Pro, so I didn't test that functionality). But it works best with good audio and a narrower range of English pronunciation than I encounter in and around Silicon Valley. And since I can't trust that it will catch all occurrences, I still need to take notes.