SiSPI: a short-passage clustering system.
Date
2008-01
Authors
Nunes, Maria das Graças Volpe
Abstract
We describe SiSPI, an unsupervised, incremental clustering tool that arranges short passages from one or more documents written in Brazilian Portuguese into clusters. To identify similar passages, SiSPI uses a statistical model called TF-ISF (Term Frequency - Inverse Sentence Frequency). By grouping similar passages into the same cluster, SiSPI enables a subsequent alignment/fusion component to transform each cluster into a single sentence by fusing the information the passages have in common. We present a pilot experiment that evaluates the system's performance in the news domain. The results suggest that SiSPI has potential.
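The abstract does not give the TF-ISF formula or the clustering procedure, so the following is only a minimal sketch under stated assumptions, not SiSPI's actual implementation. It uses a common TF-ISF definition (term frequency in a passage multiplied by log(N / sf), where N is the number of passages and sf is the number of passages containing the term), cosine similarity, and a single-pass clustering loop that compares each passage with the running centroid of each cluster. The function names, the tokenizer, and the `threshold` value are all hypothetical, and ISF is computed over the whole collection up front for simplicity, whereas a truly incremental system might update it as passages arrive.

```python
import math
import re
from collections import Counter


def tokenize(passage):
    # Assumption: plain lowercase word tokens, no stemming or stopword list.
    # In Python 3, \w also matches accented letters such as "ô" or "ç".
    return re.findall(r"\w+", passage.lower())


def tf_isf_vectors(passages):
    """Return one sparse TF-ISF vector (a dict term -> weight) per passage."""
    token_lists = [tokenize(p) for p in passages]
    n = len(token_lists)
    # Sentence frequency: number of passages in which each term occurs.
    sf = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / sf[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_incrementally(passages, threshold=0.3):
    """Single-pass clustering; returns lists of passage indices.

    Each passage joins the most similar existing cluster if the cosine
    similarity to that cluster's centroid reaches `threshold` (a
    hypothetical value); otherwise it starts a new cluster.
    """
    clusters = []
    for i, vec in enumerate(tf_isf_vectors(passages)):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # The centroid is kept as a running sum; cosine similarity is
            # scale-invariant, so the sum behaves like the mean here.
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
        else:
            clusters.append({"members": [i], "centroid": dict(vec)})
    return [cluster["members"] for cluster in clusters]
```

Calling `cluster_incrementally(list_of_passages)` returns groups of passage indices in order of arrival. Because a term that occurs in every passage gets weight log(1) = 0, very common words contribute nothing to the similarity, which is the main reason for weighting by inverse sentence frequency rather than using raw counts.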
Keywords
Computational linguistics, Clusters