Please use this identifier to cite or link to this item: https://rda.sliit.lk/handle/123456789/2011
Title: Dynamic stopword removal for Sinhala Language
Authors: Jayaweera, A. A. V. A.
Senanayake, Y. N.
Haddela, P. S.
Keywords: Sinhala Language
Dynamic
Stopword Removal
Issue Date: 8-Oct-2019
Publisher: IEEE
Citation: A. A. V. A. Jayaweera, Y. N. Senanayake and P. S. Haddela, "Dynamic Stopword Removal for Sinhala Language," 2019 National Information Technology Conference (NITC), 2019, pp. 1-6, doi: 10.1109/NITC48475.2019.9114476.
Series/Report no.: 2019 National Information Technology Conference (NITC);Pages 1-6
Abstract: In information retrieval, text summarization, and text analytics, redundant (noise) words that carry little information and have little or no semantic meaning must be filtered out. Such words are known as stopwords. Language-specific stopword lists have been identified for more than 40 languages. Researchers use various techniques to build these lists, but most of them fix a "magical" cut-off point for the list without offering any proof for it. This research aims to show that the cut-off point depends on the source data and the machine learning algorithm, using Newton's iteration method for root finding. To this end, it builds a stopword list for the Sinhala language with a term frequency-based method, processing more than 90,000 Sinhala documents. This paper presents the results obtained and the new datasets prepared for text preprocessing.
URI: http://rda.sliit.lk/handle/123456789/2011
ISSN: 2279-3895
Appears in Collections: Department of Information Technology-Scopes
Research Papers - IEEE
Research Papers - SLIIT Staff Publications
Research Publications -Dept of Information Technology
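
The term frequency-based method named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, and the English toy corpus are all assumptions made for illustration, and the Newton iteration step used in the paper is not covered. The sketch only shows that the stopword list is a ranked list truncated at a cut-off, so the choice of cut-off changes which words are removed.

```python
from collections import Counter


def rank_terms_by_frequency(documents):
    """Return (term, count) pairs sorted by descending corpus frequency."""
    counts = Counter()
    for doc in documents:
        # Assumption: whitespace tokenization. Real Sinhala text would need
        # proper tokenization and punctuation handling.
        counts.update(doc.split())
    # Sort by count (descending), then by term, for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def candidate_stopwords(documents, cutoff):
    """Take the `cutoff` most frequent terms as stopword candidates."""
    ranked = rank_terms_by_frequency(documents)
    return [term for term, _ in ranked[:cutoff]]


def remove_stopwords(document, stopwords):
    """Drop every token of `document` that appears in `stopwords`."""
    stop = set(stopwords)
    return [tok for tok in document.split() if tok not in stop]


# Hypothetical toy corpus standing in for the Sinhala documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]

small_list = candidate_stopwords(docs, cutoff=1)
large_list = candidate_stopwords(docs, cutoff=3)
filtered = remove_stopwords(docs[0], small_list)
```

With this toy corpus, a cut-off of 1 keeps only the most frequent term, while a cut-off of 3 already pulls a content word ("cat") into the list, which is the kind of data-dependent behaviour the abstract argues against fixing by hand.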

Files in This Item:
File: Dynamic_Stopword_Removal_for_Sinhala_Language.pdf
Description: Until 2050-12-31
Size: 341.13 kB
Format: Adobe PDF

