TG Daily

Google turns audio into text with GAudi

Software
By Christian Zibreg   
Wednesday, September 17, 2008 11:41
Mountain View (CA) - Google has quietly released its GAudi service, a new take on online information retrieval. It uses a remarkably accurate speech recognition engine to extract information from audio and video content and turn it into indexed text that can easily be searched. GAudi is currently limited to politicians' speeches on YouTube, but Google said it will gradually increase its scope.

Google Audio Indexing (GAudi) is currently available as an experimental Google Labs project with very limited content. For now it covers only political material; Google said that "political videos and election materials are a special case of broadcast news content, a domain that has received a lot of academic and industry attention and is known to perform well." According to the company, the service may help people better follow the views, actions and platforms of the two presidential candidates.

"The US election is just a first step," Google said. "We see it as an experiment platform where we can learn what features make the best user experience for people looking for spoken content on the Web." So how well does it work? In short: remarkably well, although it is apparent that GAudi is limited by the weaknesses of today's speech recognition algorithms.

The user interface is just what you would expect from Google: clean, crisp and focused on search. It lets users quickly find a video with simple search queries or even entire sentences. Behind the curtain is speech recognition technology developed in-house by Google's dedicated speech research group. Machine-assisted speech recognition has been around for some time, but GAudi clearly showcases how much it has improved. GAudi scans the audio of YouTube videos, recognizes spoken words and converts them into text that supplements the existing indexing information, which greatly enhances its search capabilities.

Searches return a list of YouTube videos that contain your query. When you select a video from the list, GAudi shows it in an embedded YouTube player whose timeline carries up to ten yellow marks indicating where the query is mentioned in the video. You can navigate to a yellow marker to read the transcript, or click on a mark to jump directly to that portion of the video. An additional search box lets you start a new search within the current video.
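Google has not published how GAudi builds its index, so the following is only a rough illustration under stated assumptions. It assumes a speech recognizer has already produced, for each video, a list of (start time in seconds, word) pairs in spoken order; the names `build_index`, `search`, `transcripts` and the sample data are all hypothetical. The sketch puts the recognized words into an inverted index so that a phrase query returns the matching videos, each with at most ten timestamps that could be drawn as yellow marks on the timeline.

```python
from collections import defaultdict

MAX_MARKS = 10  # the article says GAudi shows up to ten yellow marks per video


def build_index(transcripts):
    """Map each recognized word to the videos and word positions where it occurs.

    transcripts: {video_id: [(start_seconds, word), ...]} from a hypothetical
    speech recognizer, listed in spoken order.
    """
    index = defaultdict(lambda: defaultdict(list))
    for video_id, words in transcripts.items():
        for pos, (_, word) in enumerate(words):
            index[word.lower()][video_id].append(pos)
    return index


def search(index, transcripts, query):
    """Return {video_id: [mark_seconds, ...]} for videos containing the query phrase."""
    terms = query.lower().split()
    if not terms:
        return {}
    results = {}
    # Look up the first query word, then check that the following words match.
    for video_id, positions in index.get(terms[0], {}).items():
        words = transcripts[video_id]
        marks = []
        for pos in positions:
            candidate = [w.lower() for _, w in words[pos:pos + len(terms)]]
            if candidate == terms:
                marks.append(words[pos][0])  # timestamp of the first query word
            if len(marks) == MAX_MARKS:
                break  # cap the number of timeline marks
        if marks:
            results[video_id] = marks
    return results


# Hypothetical recognizer output for two short clips.
transcripts = {
    "speech_a": [(0.0, "health"), (0.6, "care"), (1.2, "for"), (1.5, "every"),
                 (2.0, "family"), (30.0, "health"), (30.5, "care")],
    "speech_b": [(5.0, "energy"), (5.4, "independence"), (9.0, "health")],
}
index = build_index(transcripts)
print(search(index, transcripts, "health care"))  # {'speech_a': [0.0, 30.0]}
```

A real system would also have to cope with recognition errors like those described below, for example by storing alternative word hypotheses, which this sketch ignores.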

Overall, GAudi's accuracy is impressive, but its engine is not bulletproof. For example, in one video GAudi transcribed the country name "Czechoslovakia" as "tech also but there." Common words are also prone to mistakes: in the same video, GAudi replaced "free" with "forty." So expect the current weaknesses of speech recognition to show up in GAudi as well. The quality of the audio, background noise, and the speaker's precision, pacing and emphasis also affect the outcome.

We have no doubt that GAudi will become an important part of the Google search engine. The technology has the potential to significantly enhance the usefulness of the search engine and to enable new ways of searching rich content.
