Search engines: Diving into the "Deep Web"

Posted by Samantha Rose Hunt

Chicago (IL) - "The Now Web" is the Internet we all know. It contains many billions of trillions of data packets, all constantly moving around. Every time an update comes from MySpace or Facebook, even something as simple as the number of friends online and available for chat, that information zips across The Now Web. The data exists only briefly and then disappears forever, having been received, displayed and discarded. Only a minimal amount of specific information is associated with these events, and there is no URL for general access -- meaning the information cannot be captured and searched on the web. But what if there were a way to make it searchable?





The Internet contains a wealth of data such as financial information, flight schedules, medical research, and other material stored in various databases -- nearly all of which is invisible to web surfers unless they happen to visit those pages directly.



Events and information such as these do not make it into a Google index or catalog. These bits of information lack the URLs on which Google's algorithms, including its PageRank concept, currently depend. As such, they'll never turn up as answers to any search queries.



Google has come a long way in its ability to catalogue the Internet. Last summer, the list of web pages the site knows about reached its one trillionth address. But even a number that large hardly makes a dent in the entire Internet as it exists today.



Search engines utilize programs called crawlers (or spiders) that gather information by following hyperlink trails which connect the web. The searchable web is just the beginning.
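
To illustrate how a crawler follows hyperlink trails, here is a minimal sketch in Python. It is not how any real search engine is built: the tiny in-memory "web" (PAGES), the page addresses, and the function names are all assumptions made up for this example, and no network access is involved. It shows that a page nothing links to is never reached.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical miniature web: address -> HTML. Invented for illustration.
PAGES = {
    "/home": '<a href="/bands">Bands</a> <a href="/venues">Venues</a>',
    "/bands": '<a href="/home">Home</a> <a href="/radiohead">Radiohead</a>',
    "/venues": '<a href="/home">Home</a>',
    "/radiohead": "No outgoing links here.",
    # Nothing links to this page, so a link-following crawler never sees it.
    "/db/flight-schedules": "Result page produced only by a form submission.",
}


def crawl(start, pages):
    """Breadth-first crawl from start; returns addresses in visiting order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to pages that exist and were not yet queued.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Calling crawl("/home", PAGES) visits the four linked pages and never reaches "/db/flight-schedules", which is the situation database-backed content is in.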



In order for data to be extracted from the Deep Web, search engines will have to analyze the terms users enter when they search, and then figure out how to turn those terms into database queries. For example, if you are searching for "Radiohead", a search engine would have to know which databases to query in an attempt to answer your search. In this instance the search engine would have to know where to find information about bands, music, concert venues, etc. Additionally, it would have to know what types of queries these sites and databases would accept.
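
The routing problem described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the source names, the topic labels, the query field names, and the TERM_TOPICS lookup are assumptions invented for the example, not a description of any search engine's actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A database reachable through a form (hypothetical description)."""
    name: str
    topics: frozenset   # subjects the database covers
    query_field: str    # the kind of query its form accepts


# Assumed lookup from a search term to the topics it belongs to.
TERM_TOPICS = {
    "radiohead": {"bands", "music"},
    "las vegas": {"hotels", "travel"},
}

# Assumed catalog of databases the search engine knows about.
SOURCES = [
    Source("concert-listings", frozenset({"bands", "concert venues"}), "artist"),
    Source("album-catalog", frozenset({"music"}), "title_or_artist"),
    Source("hotel-rates", frozenset({"hotels"}), "city"),
]


def route(term):
    """Return (source name, query) pairs for every source relevant to term."""
    topics = TERM_TOPICS.get(term.lower(), set())
    return [
        (source.name, {source.query_field: term})
        for source in SOURCES
        if source.topics & topics  # at least one shared topic
    ]
```

With this made-up catalog, route("Radiohead") selects the concert and album databases and phrases the query in the field each one accepts, while a term with no known topic is routed nowhere.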



This space that goes uncrawled (and unspidered) has been dubbed the "Deep Web". The web is effectively endless: millions of databases are connected to it, and it is always changing. That leaves the opportunities for searching out data equally endless.



At this point it is impossible for any search engine to search every data combination available.



Major search engines face challenges which prevent them from diving into the Deep Web. This is why finding the answers to questions such as "What is the cheapest hotel rate in Las Vegas in March?" is difficult. The answers are on the web; however, search engine algorithms are not designed to find them.



But that could change very soon, as technology is being developed to extend the search engine's reach into currently unavailable areas. The day could come when typing a question into a search engine yields the simple, concise answer the user is looking for. Ultimately it could change the face of the Internet altogether.



Google is now working on a Deep Web search strategy which involves sending out a program designed to analyze the contents of all databases it encounters. For instance, if the search engine stumbles upon a page with a form related to music, it would begin guessing likely terms such as "concerts", "bands", "orchestras", thus developing a predictive model of what the database is likely to contain.
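
The guessing approach can be sketched roughly as follows in Python. This is only an illustration of the idea, not Google's implementation: the records, the submit_form stand-in for a real web form, and the list of candidate terms are all assumptions invented for the example.

```python
# Hypothetical contents of a database hidden behind a search form.
RECORDS = [
    "Radiohead concerts at Madison Square Garden",
    "Chicago Symphony orchestras season schedule",
    "Local bands playing concerts this weekend",
]


def submit_form(term):
    """Stand-in for submitting a term to the form; returns matching records."""
    return [record for record in RECORDS if term.lower() in record.lower()]


def probe(submit, candidate_terms):
    """Try each guessed term and record how many results it produced.

    The resulting counts are a crude model of what the database contains:
    terms with hits are likely on-topic, terms without hits are not.
    """
    hits = {}
    for term in candidate_terms:
        results = submit(term)
        if results:
            hits[term] = len(results)
    return hits


model = probe(submit_form, ["concerts", "bands", "orchestras", "hotels"])
print(model)
```

Here the music-related guesses return results and "hotels" does not, so the probe concludes the form fronts a music database and not a travel one.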



Google is actually playing catch-up in Deep Web research. DeepDyve, founded by Human Genome Project scientists, promises to search the 99% of content that other search engines aren't capable of picking up. Other search engines typically return pages based on their popularity and how easy they are to find. If content is hidden behind firewalls, or is not linked from enough sites to earn a sufficient PageRank rating, then a page might not turn up in a common Google search. This is where DeepDyve comes in.



As other major search engines start trying to incorporate Deep Web content in their search results, businesses will be able to utilize the data and search engines in a different manner. Data integration would become much more prominent.



The Deep Web, and search engines' use of it, has real potential to change the face of the Internet.