Title
|
Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text
|
Author
|
Jaffar Atwan
|
Citation
|
Vol. 22 No. 7 pp. 65-74
|
Abstract
|
In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a corpus text, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach in reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.
|
Keywords
|
Arabic, Normalization, Preprocessing, StopWords, Zipf's law. 2012 ACM Computing Classification System: Computing methodologies, Artificial intelligence, Natural language processing
|
URL
|
http://paper.ijcsns.org/07_book/202207/20220709.pdf
|
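The abstract describes combining a lookup stop-list with a frequency-based stop-list and measuring how much the corpus shrinks after normalization. The following is a minimal Python sketch of that idea, not the author's implementation. The normalization rules (stripping diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh and teh marbuta to heh), the `top_k` cut-off, the function names, and the toy sentence are all assumptions made for illustration; the abstract does not specify them. The compression rate is assumed here to be the share of tokens removed.

```python
import re
from collections import Counter

# Assumed normalization rules (not taken from the paper).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")   # tashkeel marks and tatweel
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza


def normalize(token: str) -> str:
    """Map a token to a canonical spelling before stop-word matching."""
    token = _DIACRITICS.sub("", token)
    token = _ALEF_VARIANTS.sub("\u0627", token)   # any alef variant -> bare alef
    token = token.replace("\u0649", "\u064A")     # alef maqsura -> yeh
    return token.replace("\u0629", "\u0647")      # teh marbuta -> heh


def frequency_stoplist(tokens, top_k):
    """Statistical list: the top_k most frequent tokens (a Zipf-style cut-off)."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def reduction(tokens, stoplist):
    """Return (number of tokens removed, fraction of tokens removed)."""
    kept = [t for t in tokens if t not in stoplist]
    removed = len(tokens) - len(kept)
    rate = removed / len(tokens) if tokens else 0.0
    return removed, rate


# Toy corpus and a one-word lookup list, both hypothetical.
raw_text = "فِي البيت في المدرسة إلى البيت في"
tokens = [normalize(t) for t in raw_text.split()]

lookup_list = {normalize("إلى")}                  # linguistic (lookup) list
frequency_list = frequency_stoplist(tokens, top_k=1)
combined_list = lookup_list | frequency_list       # hybrid list: union of both

for name, stoplist in [("lookup", lookup_list),
                       ("frequency", frequency_list),
                       ("combined", combined_list)]:
    removed, rate = reduction(tokens, stoplist)
    print(f"{name}: removed {removed} of {len(tokens)} tokens ({rate:.0%})")
```

Normalizing both the corpus and the lookup list before matching is what lets spelling variants of the same word hit a single stop-list entry; taking the union of the two lists is one straightforward way to read "Combined Stop-list", though the paper may combine them differently.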