Giving Voice to Metadata


I think anyone using PhraseFind, ScriptSync, and SoundBite appreciates what dialogue can bring to finding what you’re looking for as part of the editorial process. At times it is akin to finding a needle in a haystack. So it was interesting to see the Apple patent on voice-tagging photos and using Siri to retrieve them as part of the claims expanding the capabilities of voice-based interaction with Apple devices. It will be seen as new and innovative if and when it appears in a future version of iOS.

It reminded me an awful lot of a pending patent and prototype I had designed and built at Avid over three years ago that used multiple descriptive tracks on any given piece of media. Currently, metadata tagging is either clip-based, frame-based, or span-based, and it can be a drawn-out process. The idea behind this solution was to add voice annotations and descriptions to the video. In its simplest form, a single track would describe what is going on in the scene. Because the tagging is time-based, happening during record/playback, all search results line up with that portion of the clip, and “context-based range searches” can further refine the results. Things get even more “descriptive” when creating multiple metadata tracks, where each track can be of a certain category, for example:

  1. Characters, placement, movement, position, etc.
  2. Camera, angle, framing, movement, zooming, etc.
  3. Location, objects, day, night, interior, exterior, colors, etc.

Any search can now use all tracks, or just a subset of tracks, to filter results as needed. Combining voice-tagged metadata with pre-existing metadata such as camera name, shoot date, scene, and take can make for a very powerful media management system that would be not only new and innovative, but extremely useful to productions dealing with not hundreds but thousands of hours of source material. Some customers I discussed this with had needs for forty or more descriptive tracks on any given source. One could even consider recording a tagged “descriptive” track directly to camera during production, for use anywhere downstream in the production cycle.
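A sketch of that combined search, again with hypothetical names and data: clip-level metadata (scene, take) narrows the candidate clips first, then the voice-tagged tracks are searched, optionally restricted to a subset of categories. Clips here are plain dictionaries for brevity.

```python
# Hypothetical library: clip-level metadata alongside voice-annotation tracks,
# where each track entry is (start_seconds, end_seconds, transcribed_text).
library = [
    {"name": "A001_C003", "scene": "12", "take": "3",
     "tracks": {"camera": [(4.0, 12.0, "slow zoom in")],
                "location": [(0.0, 30.0, "night exterior alley")]}},
    {"name": "A001_C004", "scene": "12", "take": "4",
     "tracks": {"camera": [(0.0, 8.0, "static wide shot")]}},
]

def find(library, query, scene=None, categories=None):
    """Search voice-tag text across clips, filtered by clip-level metadata
    and an optional subset of descriptive tracks."""
    results = []
    for clip in library:
        if scene and clip["scene"] != scene:
            continue  # pre-existing clip metadata filters first
        for cat, tags in clip["tracks"].items():
            if categories and cat not in categories:
                continue  # restrict to the requested descriptive tracks
            for start, end, text in tags:
                if query.lower() in text.lower():
                    results.append((clip["name"], cat, start, end))
    return results
```

For example, `find(library, "zoom", scene="12", categories={"camera"})` narrows thousands of hours of source material down to one clip and one time span.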

Voice, the new metadata.
