New search engine indexes and ranks table content

Posted by Wolfgang Gruener

A new search engine announced by Penn State researchers focuses on content published in tables, including table titles, references and footnotes.

According to the developers, "TableSeer" is the first scholar-targeted search engine that lets users extract information packed into tables and charts in HTML or PDF documents, a capability that is available today only in a somewhat limited form. The new search engine is described as able to gather data from tables across documents, which means that users do not have to browse documents manually in order to find tables.

The researchers said that TableSeer was able to correctly identify and retrieve 93.5% of tables created in text-based test documents.

TableSeer can be tested online. The source code will be made available near the completion of the project, the developers said.