Google rarely includes scanned documents in its search results because it cannot determine what they contain. That is about to change: Google says it will use optical character recognition (OCR) software to let Web users search scanned PDF files, the document format developed by Adobe.
Evin Levey, a Google product manager, said Google will use the technology to convert scanned documents into text, which can then be searched, indexed, and returned as results for Google search queries.
Google's OCR application is also expected to help Google Book Search, the ambitious and controversial project Google announced at the 2004 Frankfurt Book Fair. Since then, Google has been scanning 3,000 books a day from the world's major libraries.
The project initially raised copyright concerns. However, Google has since reached a settlement with the Authors Guild and the Association of American Publishers. Under the agreement, Google will be able to expand online access to millions of copyrighted books and other written works in the United States. The agreement resolves a number of legal challenges to Google's searching and displaying the content of copyrighted books. Google does not need copyright owners' approval, and it will share digital versions of the books with the libraries.
As the amount of content on the Web continues to multiply, search engine technology that relies on text alone is clearly inadequate. The current generation of search engines can only find multimedia files that have been annotated with text. Annotating content is laborious and time-consuming, and content producers often neglect it.
David Wadhwani, a vice president at Adobe, explained that the company is initially working with Google and Yahoo to significantly improve search of multimedia content. Adobe intends to expand such applications so that all publishers, developers, and users benefit.
Originally Posted: China Business Daily
Author: Angulo Fu