Please use this identifier to cite or link to this item:
https://rda.sliit.lk/handle/123456789/2015
Title: | Document Clustering with Evolved Single Word Search Queries |
Authors: | Hirsch, L.; Haddela, P. S.; Di Nuovo, A. |
Keywords: | Document Clustering; Evolved; Single Word; Search Queries |
Issue Date: | 28-Jun-2021 |
Publisher: | IEEE |
Citation: | L. Hirsch, A. Di Nuovo and P. Haddela, "Document Clustering with Evolved Single Word Search Queries," 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 280-287, doi: 10.1109/CEC45853.2021.9504770. |
Series/Report no.: | 2021 IEEE Congress on Evolutionary Computation (CEC);Pages 280-287 |
Abstract: | We present a novel hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of single word search queries in Apache Lucene format. Each cluster is formed as the set of documents matching one search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which normally improves performance. Some documents in a collection are not returned by any of the search queries in a set, so once the search query evolution is complete, a second stage applies a k-nearest neighbors (KNN) algorithm to assign every unassigned document to its nearest cluster. We describe the method and compare its effectiveness with other well-known existing systems on 8 different text datasets. We note that the search query format has the qualitative benefits of being interpretable and providing an explanation of cluster construction. |
URI: | http://rda.sliit.lk/handle/123456789/2015 |
ISBN: | 978-1-7281-8393-0 |
Appears in Collections: | Research Papers - IEEE; Research Papers - SLIIT Staff Publications; Research Publications - Dept of Information Technology |
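
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact fitness function, the distance measure, and the value of k are not given in this record, so the scoring rule (documents matched by exactly one query minus documents matched by several), the Jaccard similarity, and `k=3` are all assumptions. Plain Python set membership stands in for an Apache Lucene single-term query, and the genetic algorithm that would search over candidate word sets is omitted. The names `match`, `fitness`, and `assign_unassigned` are hypothetical.

```python
from collections import Counter

def match(word, docs):
    """Indices of documents containing the word (stand-in for a Lucene term query)."""
    return {i for i, tokens in enumerate(docs) if word in tokens}

def fitness(queries, docs):
    """Assumed score: documents hit by exactly one query minus documents hit by several."""
    hits = Counter(i for q in queries for i in match(q, docs))
    unique = sum(1 for count in hits.values() if count == 1)
    overlap = sum(1 for count in hits.values() if count > 1)
    return unique - overlap

def jaccard(a, b):
    """Jaccard similarity between two token sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_unassigned(queries, docs, k=3):
    """Label matched documents by query, then label the rest by a k-nearest vote."""
    clusters = {}
    for label, q in enumerate(queries):
        for i in match(q, docs):
            clusters.setdefault(i, label)  # on overlap, the first matching query wins
    labelled = list(clusters.items())      # snapshot: only query-matched documents vote
    if not labelled:
        return clusters
    for i, tokens in enumerate(docs):
        if i in clusters:
            continue
        nearest = sorted(labelled,
                         key=lambda item: jaccard(tokens, docs[item[0]]),
                         reverse=True)[:k]
        votes = Counter(label for _, label in nearest)
        clusters[i] = votes.most_common(1)[0][0]
    return clusters

# Toy collection: each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"goal", "league", "team"},
    {"stock", "market", "shares"},
    {"stock", "bank", "market"},
    {"team", "league", "coach"},   # matched by neither query below
    {"market", "bank", "rates"},   # matched by neither query below
]
queries = ["goal", "stock"]

print(fitness(queries, docs))
print(assign_unassigned(queries, docs))
```

In this toy collection the query set `["goal", "stock"]` scores higher than `["goal", "team"]`, because the latter returns two documents twice. A genetic algorithm would use such a score to rank candidate query sets, and the readable query words are what make the resulting clusters interpretable.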
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Document_Clustering_with_Evolved_Single_Word_Search_Queries.pdf (embargoed until 2050-12-31) | | 2.22 MB | Adobe PDF |