Please use this identifier to cite or link to this item:
https://rda.sliit.lk/handle/123456789/2011
Title: | Dynamic stopword removal for Sinhala Language |
Authors: | Jayaweera, A. A. V. A.; Senanayake, Y. N.; Haddela, P. S. |
Keywords: | Sinhala Language Dynamic Stopword Removal |
Issue Date: | 8-Oct-2019 |
Publisher: | IEEE |
Citation: | A. A. V. A. Jayaweera, Y. N. Senanayake and P. S. Haddela, "Dynamic Stopword Removal for Sinhala Language," 2019 National Information Technology Conference (NITC), 2019, pp. 1-6, doi: 10.1109/NITC48475.2019.9114476. |
Series/Report no.: | 2019 National Information Technology Conference (NITC);Pages 1-6 |
Abstract: | In information retrieval, text summarization, and text analytics, redundant (noise) words that carry little information and have low or no semantic meaning must be filtered out. Such words are known as stopwords. More than 40 languages have identified language-specific stopwords. Researchers use various techniques to build language-specific stopword lists, but most of them define a "magical" cut-off point for the list without any proof. This research aims to show that the cut-off point depends on the source data and the machine learning algorithm, using Newton's iteration method for root finding. To that end, the research creates a stopword list for the Sinhala language using a term frequency-based method, processing more than 90,000 Sinhala documents. This paper presents the results obtained and the new datasets prepared for text preprocessing. |
URI: | http://rda.sliit.lk/handle/123456789/2011 |
ISSN: | 2279-3895 |
Appears in Collections: | Department of Information Technology-Scopes Research Papers - IEEE Research Papers - SLIIT Staff Publications Research Publications -Dept of Information Technology |
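The abstract names two building blocks: a term frequency-based stopword list and Newton's iteration for root finding. The sketch below illustrates each in isolation and is not the authors' implementation. The function names (`rank_terms_by_frequency`, `stopword_candidates`, `newton_root`), the whitespace tokenization, and the caller-supplied `cutoff` are all assumptions made for illustration; how the paper connects the root finder to the cut-off point is not stated in this record, so the two pieces are deliberately left unconnected.

```python
from collections import Counter
from typing import Callable, Iterable


def rank_terms_by_frequency(documents: Iterable[str]) -> list[tuple[str, int]]:
    """Return (term, count) pairs sorted by descending corpus frequency."""
    counts: Counter = Counter()
    for doc in documents:
        # Assumption: plain whitespace tokenization; a real Sinhala
        # pipeline would likely need a language-aware tokenizer.
        counts.update(doc.split())
    # Sort by count (descending), then by term for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def stopword_candidates(documents: Iterable[str], cutoff: int) -> list[str]:
    """Take the `cutoff` most frequent terms as stopword candidates.

    `cutoff` is a hypothetical parameter: the abstract argues that no
    single fixed value is right for every dataset and algorithm.
    """
    ranked = rank_terms_by_frequency(documents)
    return [term for term, _ in ranked[:cutoff]]


def newton_root(
    f: Callable[[float], float],
    df: Callable[[float], float],
    x0: float,
    tol: float = 1e-10,
    max_iter: int = 50,
) -> float:
    """Generic Newton iteration: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)  # raises ZeroDivisionError if f'(x) == 0
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")
```

For example, `stopword_candidates(["a b a", "a c b"], cutoff=2)` ranks the placeholder tokens by frequency and keeps the top two, while `newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.0)` converges to the square root of 2. Keeping the cut-off as an explicit argument reflects the abstract's point that it should be chosen per dataset and per learning algorithm rather than hard-coded.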
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Dynamic_Stopword_Removal_for_Sinhala_Language.pdf | Until 2050-12-31 | 341.13 kB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.