The amount of metadata available for the video clips used on our sites, such as TV4Play, is often very limited. Often a video clip has just a title and belongs to a category; it has no description and no keywords. When the number of videos is large it isn't feasible to annotate all of them manually with more metadata, even though good metadata is critical if you want to find a certain video, or videos on a certain topic.
When it comes to text there is a large number of maturing techniques for extracting useful information. Latent semantic indexing (PDF) and inverse document frequency (PDF) have become standard tools in systems that operate on text. Extracting information from image and video data is, however, a more complicated feat. Research on image analysis and face recognition has led to software and services offering some basic support for browsing and cataloguing objects and people in media, but we are still far from functions that are reliable enough not to confuse unprepared users.
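To make the inverse document frequency idea concrete, here is a minimal sketch of tf-idf weighting over a toy corpus. The corpus and all weights are made up for illustration; the point is just that a term occurring in few documents (like "bastu") gets a higher weight than one occurring in many (like "bygga"):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy corpus, not real show transcripts.
docs = [
    "bygga bastu i källaren".split(),
    "bygga altan i trädgården".split(),
    "kungen besöker parlamentet".split(),
]
w = tf_idf(docs)
# "bastu" occurs in only one document, so it is weighted higher
# than "bygga", which occurs in two.
```

Terms with high tf-idf weight are exactly the ones that make a transcript findable: frequent in that episode, rare across the catalogue.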
For some of our programs we actually have metadata that isn't used in our search system so far. We have every spoken word recorded, not through speech recognition but typed in by humans: the captioning/subtitling that we have for some of our programs, e.g. our home improvement show Äntligen hemma, the comedy panel game show Parlamentet, and Emmerdale (Sw. Hem till gården). By adding the texts from the shows to our Solr index it would be possible to find episodes on building your own sauna, episodes that discuss the mishaps of the Swedish king, or the episode where one of the characters is drunk.
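As a rough sketch of what that indexing step could look like: subtitles are often stored as SRT files, which interleave cue numbers and timestamps with the spoken lines. The snippet below strips those out and builds a document dict ready for Solr. The field names and the id are hypothetical, and the SRT sample is invented:

```python
def srt_to_text(srt):
    """Strip cue numbers and timestamps from an SRT subtitle file,
    returning just the spoken words as one string."""
    lines = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        lines.append(line)
    return " ".join(lines)

# A made-up two-cue SRT fragment.
srt = """\
1
00:00:01,000 --> 00:00:04,000
Idag bygger vi en bastu

2
00:00:05,000 --> 00:00:08,000
i källaren
"""

doc = {
    "id": "antligen-hemma-s01e01",  # hypothetical document id
    "title": "Äntligen hemma",
    "transcript": srt_to_text(srt),
}
# In a real setup this dict would be posted to Solr's update
# handler (for instance via the pysolr client's Solr.add).
```

With the transcript in its own field, queries like "bastu" can match episodes whose titles never mention saunas at all.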
Another use for this data is to visualize it. Word clouds for some of our shows look like this when the text is entered into Wordle (taking care to avoid the more horrendous font choices):
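A word cloud is essentially a plot of word frequencies with stop words removed, so the preprocessing can be sketched in a few lines. The transcript and the tiny Swedish stop word list below are placeholders; a real stop word list would be much longer:

```python
from collections import Counter

# A tiny Swedish stop word list, for illustration only.
STOPWORDS = {"och", "i", "en", "vi", "det", "att", "är", "på"}

def word_frequencies(text, top=5):
    """Return the most common non-stop-words in a transcript."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return Counter(words).most_common(top)

# Made-up transcript fragment.
transcript = "idag bygger vi en bastu och imorgon bygger vi en altan"
print(word_frequencies(transcript))
```

The resulting (word, count) pairs are what a tool like Wordle scales into font sizes.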