        | Title | Extracting Content for News Web Pages based on DOM |  
        | Author | Hua Geng, Qiang Gao, Jingui Pan |  
        | Citation | Vol. 7, No. 2, pp. 124-129 |  
        | Abstract | Nowadays, RSS is becoming a hot topic for Web applications. Many well-known Web sites provide RSS for users. However, creating RSS files manually is tedious, and so far most sites haven't provided such a service. In this paper, we describe the design, implementation, and evaluation of HTML2RSS, a system that extracts content from HTML Web pages based on their DOM structure and automatically generates RSS files from the extracted content. We introduce two algorithms to extract information from semi-structured Web data. The goal of HTML2RSS is to provide users with RSS files as a substitute for the HTML pages. |  
        | Keywords | Web information extracting, DOM, XML, time pattern, RSS |  
        | URL | http://paper.ijcsns.org/07_book/200702/200702A17.pdf |
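
The abstract describes a pipeline that walks the DOM of an HTML page, extracts the main content, and emits an RSS file. The record does not describe the paper's two extraction algorithms, so the following is only a minimal sketch of the general idea under a swapped-in assumption: the main content is taken to be the block-level container holding the most direct text. The names `BlockTextCollector`, `html_to_rss`, and the `BLOCK_TAGS` set are hypothetical and are not taken from HTML2RSS.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: these tags act as candidate content containers.
BLOCK_TAGS = {"div", "td", "article", "section"}
# Text inside these tags is never treated as content.
SKIP_TAGS = {"script", "style"}


class BlockTextCollector(HTMLParser):
    """Collect the page title and the text found directly inside each block."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one list of text pieces per open block
        self.blocks = []       # text of every closed block
        self.title = ""
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS and self.stack:
            self.blocks.append(" ".join(self.stack.pop()))

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        elif self.stack:
            # Text is credited to the innermost open block only.
            self.stack[-1].append(text)


def html_to_rss(html, link):
    """Build a one-item RSS 2.0 document from the longest text block."""
    parser = BlockTextCollector()
    parser.feed(html)
    parser.close()
    body = max(parser.blocks, key=len, default="")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = parser.title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Generated from " + link

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = parser.title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = body
    return ET.tostring(rss, encoding="unicode")
```

Calling `html_to_rss(page_source, page_url)` returns the feed as a string. The longest-block heuristic is deliberately crude: it assumes well-nested markup and will pick the wrong container on pages where navigation or comment text outweighs the article body.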