Rule-based topic mining to assist user-centered visual exploration of document collections.
We propose a three-step iterative and interactive visual text mining process to assist users in exploring document collections. In the proposed approach, (i) topics are automatically extracted from a document collection, (ii) users explore a similarity-based document map and its related topics while refining a topic list, and (iii) both the map quality and the topic list definition can be improved based on user interaction. A selective and sequential covering association rule induction strategy is employed to extract the topics. In this strategy, association rules are sequentially induced from groupings selected, manually or automatically, in the similarity-based document maps. The resulting topics are displayed in a Topic Tree control window that assists users in exploring the collection by (i) identifying documents related to specific topics in the map, (ii) removing uninteresting documents from the map based on their topics, (iii) comparing related topics and documents, (iv) extracting new topics from user-selected map regions or from the entire map, (v) building derived maps, and (vi) eventually exporting sets of labeled documents. Derived maps inherit the previous topic definitions while benefiting from the removal of undesired documents and, optionally, from the use of terms descriptive of relevant topics to compute document similarity. We present two case studies, one on an online news corpus and one on a collection of scientific papers, to illustrate our process and its suitability for exploring document collections.
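The sequential covering idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`best_rule`, `extract_topics`), the parameters (`max_terms`, `min_conf`, `max_rules`), the representation of documents as plain term sets, and the scoring by coverage and then confidence are all assumptions made for illustration. Given a selected group of documents, the sketch repeatedly picks the term set whose rule "terms imply group membership" covers the most still-uncovered group documents, records it as a topic, and removes the covered documents before inducing the next rule.

```python
from itertools import combinations


def best_rule(docs, group, uncovered, max_terms=2, min_conf=0.5):
    """Return (terms, covered_ids) for the term set that covers the most
    still-uncovered documents of the group, or None if nothing qualifies.
    Hypothetical helper; scoring is an illustrative assumption."""
    vocab = sorted(set().union(*(docs[i] for i in uncovered)))
    best = None
    for size in range(1, max_terms + 1):
        for terms in combinations(vocab, size):
            tset = set(terms)
            # All documents in the collection that contain every term.
            matches = {i for i, d in enumerate(docs) if tset <= d}
            covered = matches & uncovered
            if not covered:
                continue
            # Confidence of the rule "terms -> document is in the group".
            conf = len(matches & group) / len(matches)
            if conf < min_conf:
                continue
            # Prefer more coverage, then higher confidence, then fewer terms.
            key = (len(covered), conf, -size)
            if best is None or key > best[0]:
                best = (key, terms, covered)
    return None if best is None else (best[1], best[2])


def extract_topics(docs, group, max_rules=5, **kwargs):
    """Sequential covering: induce one rule at a time and remove the
    group documents it covers before searching for the next rule."""
    group = set(group)
    uncovered = set(group)
    topics = []
    while uncovered and len(topics) < max_rules:
        rule = best_rule(docs, group, uncovered, **kwargs)
        if rule is None:
            break
        terms, covered = rule
        topics.append((terms, sorted(covered)))
        uncovered -= covered
    return topics


# Toy collection: each document is a set of terms (illustrative data only).
docs = [
    {"election", "vote", "senate"},
    {"election", "vote", "poll"},
    {"match", "goal", "league"},
    {"match", "goal", "coach"},
    {"election", "market"},
]
# Documents 0, 1 and 2 stand in for a user-selected map region.
topics = extract_topics(docs, group=[0, 1, 2])
print(topics)
```

In this toy run the first rule is induced from the term `vote`, which covers documents 0 and 1 with full confidence. The second rule is then induced only from the remaining document 2, which is the sense in which the covering is sequential.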