A Methodology to Identify Topic of Video via N-Gram Approach


Ramsha Pervaiz, Khalid Aloufi, Syed Shabbar Raza Zaidi and Kaleem Razzaq Malik


Vol. 20  No. 1  pp. 79-94


Keywords capture the main idea of a document and play an important role in retrieving information from its content. Keyword extraction is the process of identifying the words in a document that let users easily understand what the document is about. It is of vital importance in natural language processing, where it is used for information retrieval, visualization, text summarization, classification, clustering, web searching, and topic detection. Keyword extraction techniques fall into two main classes. The first is supervised learning, in which a model is trained on a dataset. The second is unsupervised learning, which requires no training and instead relies on a statistical approach. In this research, a topic is generated for video lectures from the content of the videos. The approach targets videos that are labeled with a course code instead of a topic. Such labels make it hard to understand the main idea and content of a video lecture, so users have to listen to every video without knowing its content, which wastes time. Using unsupervised learning, the frequencies of words and of word combinations are counted with N-grams. The keywords extracted from these N-grams are compared with a data set of computer terms, and the topic of the video is generated.
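
As a rough illustration of the N-gram step described above, the following minimal Python sketch counts unigrams, bigrams, and trigrams in a transcript and keeps those that also occur in a list of computer terms. It is not the authors' implementation: the term list `COMPUTER_TERMS`, the function names, the tie-breaking rule, and the sample transcript are all hypothetical, and only the standard library is used in place of NLTK.

```python
from collections import Counter
import re

# Hypothetical stand-in for the paper's data set of computer terms.
COMPUTER_TERMS = {"binary search tree", "linked list", "operating system", "sorting"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens, joined by single spaces."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def guess_topic(transcript, max_n=3):
    """Return the most frequent n-gram found in COMPUTER_TERMS, or None."""
    tokens = tokenize(transcript)
    matches = Counter()
    for n in range(1, max_n + 1):
        for gram, count in ngram_counts(tokens, n).items():
            if gram in COMPUTER_TERMS:
                matches[gram] = count
    if not matches:
        return None
    # Assumed rule: prefer higher frequency, then the longer phrase on ties.
    return max(matches, key=lambda g: (matches[g], len(g.split())))


# Hypothetical transcript of a lecture labeled only with a course code.
transcript = (
    "Today we insert a key into a binary search tree. "
    "A binary search tree keeps smaller keys on the left. "
    "Sorting is related, but the binary search tree is our focus."
)
print(guess_topic(transcript))
```

In this sketch the topic is simply the matching N-gram with the highest count; a real system would likely also remove stop words and work from speech-to-text output of the lecture.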


Natural Language Processing (NLP), NLTK, N-grams, Keyword Extraction, Video Lecture