Please use this identifier to cite or link to this item:
https://rda.sliit.lk/handle/123456789/2010
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Senanayake, S. Y | - |
dc.contributor.author | Kariyawasam, K. T. P. M | - |
dc.contributor.author | Haddela, P. S | - |
dc.date.accessioned | 2022-04-22T05:27:28Z | - |
dc.date.available | 2022-04-22T05:27:28Z | - |
dc.date.issued | 2019-10-08 | - |
dc.identifier.citation | S. Y. Senanayake, K. T. P. M. Kariyawasam and P. S. Haddela, "Enhanced Tokenizer for Sinhala Language," 2019 National Information Technology Conference (NITC), 2019, pp. 84-89, doi: 10.1109/NITC48475.2019.9114420. | en_US |
dc.identifier.issn | 2279-3895 | - |
dc.identifier.uri | http://rda.sliit.lk/handle/123456789/2010 | - |
dc.description.abstract | Tokenization plays a prominent role in natural language processing (NLP) applications: it splits content into its smallest meaningful units. However, few tokenization approaches exist for the Sinhala language. The standard analyzer in the Apache software library and the Natural Language Toolkit (NLTK) are the main existing approaches for tokenizing Sinhala content. Because these are language independent, they have limitations when applied to Sinhala. Our proposed Sinhala tokenizer focuses mainly on punctuation-based tokenization. It tokenizes content precisely by identifying the use case of each punctuation mark. In our research, we show that our punctuation-based tokenization approach outperforms the word tokenization of the existing approaches. | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.relation.ispartofseries | 2019 National Information Technology Conference (NITC);Pages 84-89 | - |
dc.subject | Sinhala Language | en_US |
dc.subject | Enhanced | en_US |
dc.subject | Tokenizer | en_US |
dc.title | Enhanced Tokenizer for Sinhala Language | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.1109/NITC48475.2019.9114420 | en_US |
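
The abstract describes tokenization that depends on the use case of each punctuation mark, but this record does not list the paper's actual rules. The following is only a minimal sketch of the general idea, under assumed rules: a `.` or `,` between digits stays inside a number, and any other punctuation mark becomes its own token. The names `TOKEN_PATTERN` and `tokenize`, and the choice of character ranges, are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical rule set, not taken from the paper.
# Alternatives are tried left to right at each position.
TOKEN_PATTERN = re.compile(
    r"\d+(?:[.,]\d+)*"           # number; '.' or ',' between digits stays inside the token
    r"|[\u0D80-\u0DF3\u200D]+"   # run of Sinhala block characters below U+0DF4, plus zero-width joiner
    r"|[A-Za-z]+"                # run of Latin letters
    r"|\S"                       # any other non-space character is a token on its own
)


def tokenize(text):
    """Split text into tokens, deciding how to treat punctuation from its context."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Price is 3.14."))   # ['Price', 'is', '3.14', '.']
print(tokenize("මම ගෙදර යමි."))
```

In this sketch the same `.` character is kept inside `3.14` but split off at the end of the sentence, which is the kind of context-dependent decision the abstract refers to. The Sinhala range is spelled out explicitly because Python's `\w` does not match combining vowel signs, so a plain `\w+` pattern would break Sinhala words apart.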
Appears in Collections: | Department of Information Technology-Scopes; Research Papers - IEEE; Research Papers - SLIIT Staff Publications; Research Publications - Dept of Information Technology |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Enhanced_Tokenizer_for_Sinhala_Language.pdf | Until 2050-12-31 | 452.92 kB | Adobe PDF |