Search engines learn how to watch and listen to video

MISSED the winning goal in that crucial football match at the 2010 FIFA World Cup? Just get on the net and you'll find hours of user-generated video content of every moment in every match.

But with 650,000 - and counting - World Cup 2010 videos uploaded to YouTube alone, finding the right replay is a challenge. Existing video search tools struggle to deal with such a volume of content, but the search giants are on the case. Microsoft is sharpening the ability of its search engine, Bing, to find video content. Google, meanwhile, is set to launch an internet TV service later this year, using its video search technology to deliver the right footage.

The core strength of these engines has been in text search, but video search seems likely to move away from this approach. That's because sorting video content using metadata - the keyword tags manually attached to videos - is like searching via an interpreter. Tags encapsulate one person's judgement of a video's content, and a tag-only search system will produce a lot of irrelevant results, says Suranga Chandratillake, chief executive of online video and audio search engine Blinkx. "For video search to be really effective, you need better ways to understand what is going on in the actual footage."

As well as metadata, Blinkx uses speech recognition algorithms to interrogate a video directly. The transcripts it generates provide more data for the firm's text-based search engine. Blinkx's algorithms attempt to parse a chunk of speech into phonemes - the small sound segments that make up individual words. The speech recognition tools then attempt to reconstruct a sentence out of the phonemes. It is by no means a foolproof approach, however. "Two distinct sentences may contain indistinguishable phonemes," Chandratillake says. "So 'recognise speech' could be transcribed as 'wreck a nice beach'."
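
Blinkx's decoder is proprietary, but the ambiguity Chandratillake describes is easy to sketch. In the toy Python example below, the mini lexicon and simplified phoneme spellings are made up purely for illustration; the point is that one phoneme string supports several word readings, including both of his examples.

```python
# Toy illustration (not Blinkx's real pipeline) of why phoneme strings are
# ambiguous: the same run of phonemes can be carved into different word
# sequences. The "phoneme" spellings below are simplified stand-ins.
LEXICON = {
    "rek": ["wreck"],
    "u": ["a"],
    "nais": ["nice"],
    "rekunais": ["recognise"],
    "pich": ["speech", "beach"],
}

def segmentations(phonemes: str) -> list[list[str]]:
    """All ways of reading a phoneme string as a sequence of lexicon words."""
    if not phonemes:
        return [[]]
    readings = []
    for end in range(1, len(phonemes) + 1):
        chunk = phonemes[:end]
        for word in LEXICON.get(chunk, []):
            for rest in segmentations(phonemes[end:]):
                readings.append([word] + rest)
    return readings

for reading in segmentations("rekunaispich"):
    print(" ".join(reading))
# Output includes both "recognise speech" and "wreck a nice beach".
```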

Blinkx has been working on improving its speech recognition capabilities by building in feedback mechanisms. For instance, the user-added tags provide context to help decide which of two transcripts is most likely to be correct.
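
A minimal sketch of that kind of feedback, assuming the hypothetical case where the recogniser hands back several candidate transcripts and the video carries user-added tags:

```python
# Hypothetical sketch of tag-based disambiguation: when speech recognition
# produces several candidate transcripts, prefer the one that best matches
# the words users have already tagged the video with.
def pick_transcript(candidates: list[str], tags: set[str]) -> str:
    """Return the candidate transcript sharing the most words with the tags."""
    def score(transcript: str) -> int:
        return len(set(transcript.lower().split()) & tags)
    return max(candidates, key=score)

tags = {"speech", "recognition", "lecture"}
print(pick_transcript(["recognise speech", "wreck a nice beach"], tags))
# -> "recognise speech"
```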

The drawback with this type of phonetic transcription analysis is that it is only suited to video with good quality sound, says David Gibbon at AT&T Labs Research in Middletown, New Jersey. "It encounters real problems with user-generated video, where the audio track may not be great," he says - and such videos make up a sizeable chunk of online content.

Still, it might be possible to use the images themselves as part of the search. Next year, the US Defense Advanced Research Projects Agency (Darpa) will complete its $20 million Video and Image Retrieval and Analysis Tool (Virat) project, which uses computer vision algorithms to analyse surveillance footage for significant events.

More modest academic projects hint at the approaches Darpa might adopt. It's relatively easy to capture a series of stills that summarise a video, says Martin Halvey, a computer scientist at the University of Glasgow, UK. Image analysis tools can then search those stills for a target image by identifying objects, faces, textures, letters and numerals. This is difficult on a large scale, however, because the processing power needed to compare one image with another becomes a problem when looking at huge numbers of files, Halvey says.
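
As a rough illustration of the stills-based approach (not Halvey's actual system), the sketch below reduces each still to a colour histogram and ranks stills against a query image; the pairwise comparisons it implies are exactly what becomes costly across millions of files.

```python
import numpy as np

# Rough sketch of keyframe matching: summarise a video as a few stills,
# reduce each still to a colour histogram, and rank stills against a query
# image by histogram similarity. Frames are assumed to be H x W x 3 uint8
# arrays; real systems also run object, face and text detectors.

def colour_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalised joint RGB histogram used as a crude image signature."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3), bins=(bins, bins, bins), range=[(0, 256)] * 3
    )
    return hist.ravel() / hist.sum()

def rank_stills(query: np.ndarray, stills: list[np.ndarray]) -> list[int]:
    """Indices of stills, best match first, by histogram intersection."""
    q = colour_histogram(query)
    scores = [np.minimum(q, colour_histogram(s)).sum() for s in stills]
    return sorted(range(len(stills)), key=lambda i: scores[i], reverse=True)
```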

A different approach - semantic querying - could be the answer. It involves teaching a search engine to recognise semantic concepts, such as "grass", "football" and "stadium", using so-called supervised learning techniques, says Marcel Worring, a multimedia analysis researcher at the University of Amsterdam in the Netherlands. During a teaching phase, the system is fed with examples of the concept. Software algorithms define the concept by its colour, texture or shape to create models of each one.

"So with a new video, the model is applied and automatically a measure is given of how likely it is that the concept is present in that video," says Worring.

The strength of the semantic querying approach is that it can work at multiple levels, so it can narrow the search more effectively. Worring and his colleague, Jun Wu, created a relatively simple two-layered algorithm that first distinguishes videos based on genre - news broadcast or sports footage, for instance. The system then goes on to refine the search results according to the style of the content - distinguishing, for example, a video packed with close-up action from one containing graphics.
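
In outline, and assuming scikit-learn-style classifier objects rather than Wu and Worring's actual models, the two layers might be chained like this:

```python
# Hypothetical outline of a two-layer scheme: a first classifier assigns
# genre, then a per-genre classifier labels the style of the content.
# Classifier objects are assumed to expose a predict() method on a list
# of feature vectors, as in scikit-learn.
def classify(video_features, genre_model, style_models):
    genre = genre_model.predict([video_features])[0]           # e.g. "sport" or "news"
    style = style_models[genre].predict([video_features])[0]   # e.g. "close-up action"
    return genre, style
```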

Wu and Worring tested their system on over 200 clips ranging from 2 to 31 minutes long, and genres including sport and pornography. It was able to classify the six genres it was trained to recognise, and identify seven semantic concepts with about 83 per cent accuracy. The researchers will present their work at the International Conference on Image and Video Retrieval in Xi'an, China, next week.

To search much larger video libraries, a good strategy might be to use keywords first to whittle down the number of results, then apply semantic querying to improve the quality and relevance of the videos finally presented to the searcher, says Worring.
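
That strategy amounts to a cheap filter followed by an expensive re-rank. A hypothetical sketch, where concept_score stands in for a trained concept model and each video is a dict with "tags" and a "transcript":

```python
# Sketch of keyword filtering followed by semantic re-ranking: keyword
# matching whittles down the candidates, then concept models score only
# that shortlist. concept_score(concept, video) is an assumed stand-in
# for a trained model returning a likelihood.
def search(query_words, concepts, videos, concept_score):
    shortlist = [
        v for v in videos
        if any(w in v["tags"] or w in v["transcript"] for w in query_words)
    ]
    return sorted(
        shortlist,
        key=lambda v: sum(concept_score(c, v) for c in concepts),
        reverse=True,
    )
```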

Gibbon sounds a word of caution, though. The level of semantic detail that video search algorithms can actually recognise is still fairly limited, and a training session is required for each new concept. "I think there's a long way to go before we can say we're able to understand all that complexity," he says.

If Gibbon is right, finding the ultimate video of that crucial goal might still be a problem even when the football party rolls into Brazil for the 2014 World Cup.

Issue 2767 of New Scientist magazine