BEST sports: a Portuguese collection of documents for semantics-concerned text mining research
Labeled text collections are a common need in the text mining research community, where they are used both to train and to evaluate text mining models. In this technical report, we present the BEST sports collection, a set of documents written in Portuguese that was collected, prepared, and made available as a benchmarking collection for text mining research. Considering real application scenarios, we created four datasets from the collection, each corresponding to a problem of a different level of semantic complexity. Using several datasets drawn from the same collection allows text mining methods to be evaluated at different levels of semantic complexity.