To go where Google has not gone before: The promise of a new search engine

Posted by Samantha Rose Hunt

Chicago (IL) – The developers of a research-focused search engine claim that the technology is capable of reaching depths of the web that are inaccessible to Google.

Founded by Human Genome Project scientists, DeepDyve promises to search the 99% of content that other search engines cannot reach. Conventional search engines typically rank pages by their popularity and by how easy they are to find. If content is hidden behind firewalls, or a page is not linked from enough sites to earn a sufficient PageRank, it might not turn up in an ordinary Google search. This is where DeepDyve comes in.

DeepDyve uses techniques from genomics that were designed to identify DNA strands through pattern and symbol matching. The developers of the search engine claim that this approach will help people find information that they know must be out there but cannot quite put their fingers on.

The company’s technology relies on an algorithm called “KeyPhrases”. KeyPhrases is described as capable of indexing passages up to 20 words in length, not just individual keywords. Because the technology was designed to identify long and complex DNA sequences, it does not need to rely on semantics.
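
How KeyPhrases works internally has not been published, so the following is only a minimal sketch of the general idea of indexing multi-word passages rather than single keywords. The function names (phrases, build_index) and the sample documents are hypothetical, and the only figure taken from the article is the 20-word limit.

```python
from collections import defaultdict

MAX_PHRASE_WORDS = 20  # upper bound on passage length cited in the article


def phrases(text, max_words=MAX_PHRASE_WORDS):
    """Yield every run of 1..max_words consecutive words in text."""
    words = text.lower().split()
    for start in range(len(words)):
        last = min(start + max_words, len(words))
        for end in range(start + 1, last + 1):
            yield " ".join(words[start:end])


def build_index(docs):
    """Map each phrase to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in phrases(text):
            index[phrase].add(doc_id)
    return index


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "gene expression in yeast cells",
    "b": "protein folding in yeast cells",
}
index = build_index(docs)
print(sorted(index["in yeast cells"]))   # ['a', 'b']
print(sorted(index["gene expression"]))  # ['a']
```

Storing every phrase this way is wasteful and would not scale to hundreds of millions of pages, but it shows why a phrase-level index can match a passage without any understanding of what the words mean.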

The most interesting feature of DeepDyve is that it can base a search on a large amount of text, or even an entire article of up to 25,000 characters, whereas Google limits queries to 32 words.

DeepDyve can scan entire bodies of text looking for matching segments. It then ranks the matches to find the article that is most relevant to whatever you are looking for.
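
As a rough illustration of searching with a long passage instead of a few keywords, the sketch below splits a query and each document into overlapping word segments and orders the documents by how many segments they share with the query. This is a generic overlap-counting technique, not DeepDyve's actual ranking method; the names shingles and rank, the three-word segment size, and the sample texts are all assumptions. Only the 25,000-character limit comes from the article.

```python
MAX_QUERY_CHARS = 25_000  # query size limit cited in the article


def shingles(text, size=3):
    """Return the set of overlapping `size`-word segments in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def rank(query, docs, size=3):
    """Order document ids by how many segments they share with the query."""
    wanted = shingles(query[:MAX_QUERY_CHARS], size)
    scores = {
        doc_id: len(wanted & shingles(text, size))
        for doc_id, text in docs.items()
    }
    # Highest overlap first; ties keep their original order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical sample documents and query, for illustration only.
docs = {
    "patent": "a method for sequencing dna strands using pattern matching",
    "news": "a new search engine launches this week",
    "paper": "sequencing dna strands using pattern matching in yeast",
}
query = "we describe sequencing dna strands using pattern matching in yeast cells"
print(rank(query, docs))  # ['paper', 'patent', 'news']
```

Here "paper" shares six three-word segments with the query, "patent" shares four, and "news" shares none, which gives the ordering shown.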

In 2003, UC Berkeley conducted a study of the “deep web”. In that study, Hal Varian found that the deep web contains roughly 91,000 TB of information, while only 167 TB is actually visible on the surface web.

DeepDyve claims that it currently indexes almost 500 million pages and has entered into partnerships with numerous publications for free access to their content. While the service currently focuses on topics such as health, life sciences, and patents, the company said it intends to expand into IT, clean technology, energy and physical sciences.