Digital Library of Online PDF Sources: An ETL Approach


Gohar Zaman, Hairulnizam Mahdin, Khalid Hussain, Atta-ur-Rahman, Nehad Ibrahim, and Noor Zuraidin Mohd Safar


Vol. 20  No. 11  pp. 172-181


It is evident from day-to-day web usage that a huge number of PDF sources are uploaded daily. For example, several scientific societies and publishers, such as IEEE, ACM, Elsevier, and Springer, publish volumes of articles and periodicals. Most of these resources are unstructured or semi-structured, which makes it difficult to search and retrieve information. In this paper, an effective model for digital library creation is proposed, originally motivated by an automated ontological information extraction framework (OFIE). The framework takes a published paper in PDF form and extracts its structural information, such as title, authors, abstract, funding information, table of contents, and references, with the help of a fuzzy rule-based system (FRBS) and a word sense disambiguation (WSD) approach. The extracted information is then converted to RDF triples. The proposed scheme takes this extracted information and converts it into a digital library stored in an MS-SQL database through an Extract, Transform and Load (ETL) process. This digital library can serve an institute or an individual scholar who wants to organize downloaded PDF files for better search and retrieval. Moreover, through a front end based on SQL queries, the information can be searched, retrieved, and exported in the form of reports.
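The ETL step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triples are assumed to arrive as plain (subject, predicate, object) tuples, the predicate names (`dc:title`, `dc:creator`, `ex:keyword`), the table layout, and the function names are all hypothetical, and Python's built-in `sqlite3` stands in for the MS-SQL database used in the paper.

```python
import sqlite3

# Extract: hypothetical RDF-style triples as they might come out of the
# extraction framework. Predicate names are assumptions for illustration.
TRIPLES = [
    ("paper:1", "dc:title", "Digital Library of Online PDF Sources: An ETL Approach"),
    ("paper:1", "dc:creator", "Gohar Zaman"),
    ("paper:1", "dc:creator", "Hairulnizam Mahdin"),
    ("paper:1", "ex:keyword", "ETL"),
]


def transform(triples):
    """Group triples by subject into one record per paper."""
    papers = {}
    for subject, predicate, obj in triples:
        record = papers.setdefault(
            subject, {"title": None, "authors": [], "keywords": []}
        )
        if predicate == "dc:title":
            record["title"] = obj
        elif predicate == "dc:creator":
            record["authors"].append(obj)
        elif predicate == "ex:keyword":
            record["keywords"].append(obj)
    return papers


def load(conn, papers):
    """Load the records into relational tables (SQLite as a stand-in for MS-SQL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS paper (id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS author (paper_id TEXT, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS keyword (paper_id TEXT, term TEXT)")
    for paper_id, record in papers.items():
        conn.execute("INSERT INTO paper VALUES (?, ?)", (paper_id, record["title"]))
        conn.executemany(
            "INSERT INTO author VALUES (?, ?)",
            [(paper_id, name) for name in record["authors"]],
        )
        conn.executemany(
            "INSERT INTO keyword VALUES (?, ?)",
            [(paper_id, term) for term in record["keywords"]],
        )
    conn.commit()


def search_by_author(conn, name):
    """Return titles of papers with an author whose name contains `name`."""
    rows = conn.execute(
        "SELECT p.title FROM paper p JOIN author a ON a.paper_id = p.id "
        "WHERE a.name LIKE ? ORDER BY p.title",
        (f"%{name}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(TRIPLES))
    print(search_by_author(conn, "Zaman"))
```

Keeping authors and keywords in separate tables, rather than as delimited strings in the paper row, is one way to let a front-end query search by any single author or term with a plain join.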


Ontology, Digital Library, ETL, SQL, RDF, OFIE, FRBS