Title
|
Extracting Content for News Web Pages based on DOM
|
Author
|
Hua Geng, Qiang Gao, Jingui Pan
|
Citation
|
Vol. 7 No. 2 pp. 124-129
|
Abstract
|
Nowadays, RSS is becoming a hot topic for Web applications, and many well-known Web sites provide RSS feeds for their users. However, creating RSS files manually is tedious, and so far most sites haven't provided such a service. In this paper, we describe the design, implementation, and evaluation of HTML2RSS, a system that extracts content from HTML Web pages based on their DOM structure and automatically generates RSS files from the extracted content. We introduce two algorithms to extract information from semi-structured Web data. The goal of HTML2RSS is to provide users with RSS files as a substitute for the HTML pages.
|
Keywords
|
Web information extracting, DOM, XML, time pattern, RSS
|
URL
|
http://paper.ijcsns.org/07_book/200702/200702A17.pdf
|