A New Approach to Focused Crawling: Combination of Text Summarizing with Neural Networks and Vector Space Model
Abstract
Focused crawlers are programs that browse the Web and
download pages on a specific topic. They are used to answer
user queries or to build digital libraries on a topic specified
by the user. In this article we show how summarizing web
pages improves the performance of a crawler that uses the
vector space model to rank web pages. A neural network is
trained to learn the characteristics of sentences that should be
included in the summary of a web page. The trained network is
then used as a filter to summarize web pages. Finally, the
crawler uses the vector space model to rank the summaries
instead of the full web pages.
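
The pipeline described above (filter sentences into a summary, then rank the summaries against the topic with the vector space model) can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names are hypothetical, raw term frequencies stand in for whatever term weighting the crawler applies, and the trained neural network is replaced by a placeholder predicate `keep_sentence` that simply keeps sentences with at least four words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vector(text):
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_sentence(sentence):
    """Placeholder for the trained neural network filter (assumption):
    keep any sentence that has at least four terms."""
    return len(tokenize(sentence)) >= 4


def summarize(page_text, sentence_filter=keep_sentence):
    """Build a summary from the sentences accepted by the filter."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    return " ".join(s for s in sentences if sentence_filter(s))


def rank_pages(topic, pages, sentence_filter=keep_sentence):
    """Rank pages by the similarity of their summaries to the topic.

    `pages` maps a URL to the page text. Returns (score, url) pairs,
    highest score first.
    """
    topic_vec = term_vector(topic)
    scored = [
        (cosine(topic_vec, term_vector(summarize(text, sentence_filter))), url)
        for url, text in pages.items()
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_pages("neural networks sentence", pages)` on a dictionary of fetched pages returns the pages ordered by how closely their summaries match the topic. Short navigational fragments such as "Click here." are dropped by the placeholder filter before scoring, which is the role the neural network plays in the approach described above.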
Keywords
Focused Crawlers; Search Engine; Neural Network; Vector Space Model; Text Summarizing