An account of the challenge of tagging a reference corpus of Brazilian Portuguese.

Abstract

This article identifies and addresses the major issues faced in the manual morphosyntactic annotation of MAC-Morpho, a large Brazilian Portuguese corpus of newspaper articles in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria used to establish the tagset, give an account of how the annotation process was designed and conducted, report the results of the inter-annotator agreement evaluation for MAC-Morpho, and analyze some interesting cases among the linguistic problems we faced in this work.

Description
Keywords
Artificial intelligence
Citation