SiSPI: a short-passage clustering system.
We describe SiSPI, an unsupervised, incremental clustering tool that groups short passages from one or more documents written in Brazilian Portuguese. To identify similar passages, SiSPI uses a statistical model named TF-ISF (Term Frequency - Inverse Sentence Frequency). Grouping similar passages into the same cluster enables a subsequent alignment/fusion component to transform each cluster into a single sentence by fusing the information its passages share. We present a pilot experiment that evaluates the system's performance in the news domain. The results suggest that SiSPI has potential.
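
To make the TF-ISF idea concrete, the following is a minimal Python sketch, not SiSPI's actual implementation. It assumes the common formulation in which a term's weight in a sentence is its frequency in that sentence multiplied by log(N / sf), where N is the number of sentences and sf is the number of sentences containing the term. The abstract does not specify the similarity measure, the threshold, or the cluster-assignment rule, so cosine similarity, the 0.3 threshold, and the single-link assignment are assumptions made for illustration. The function names (`tf_isf_vectors`, `cosine`, `incremental_cluster`) are hypothetical, and tokenization is plain whitespace splitting with no stemming or stopword removal.

```python
import math
from collections import Counter


def tf_isf_vectors(sentences):
    """Return one {term: weight} dict per sentence, weight = tf * log(N / sf)."""
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # sf[t] = number of sentences in which term t occurs at least once.
    sf = Counter()
    for tokens in tokenized:
        sf.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / sf[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def incremental_cluster(sentences, threshold=0.3):
    """Single-pass clustering: each sentence joins the most similar existing
    cluster if that similarity reaches the threshold, else starts a new one.
    Returns clusters as lists of sentence indices."""
    vectors = tf_isf_vectors(sentences)
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            # Assumption: single-link, i.e. similarity to the closest member.
            sim = max(cosine(vec, vectors[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


if __name__ == "__main__":
    passages = [
        "o presidente visitou a cidade",
        "o presidente visitou a capital",
        "chuva forte atinge litoral",
    ]
    print(incremental_cluster(passages, threshold=0.3))
```

In this toy run the first two passages share four of their five terms and land in the same cluster, while the third shares none and starts its own. Note that this sketch computes sentence frequencies over the whole input up front for simplicity; a truly incremental system would have to update them as new passages arrive.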