An account of the challenge of tagging a reference corpus of Brazilian Portuguese.


This article identifies and addresses the major issues faced in the manual morphosyntactic annotation of MAC-Morpho, a large Brazilian Portuguese corpus of newspaper articles in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria used to establish the tagset, give an account of how the annotation process was designed and conducted, report the results of the inter-annotator agreement evaluation for MAC-Morpho, and analyze some interesting cases among the linguistic problems we faced in this work.

