An account of the challenge of tagging a reference corpus of Brazilian Portuguese.

Aluisio, Sandra Maria; Pelizzoni, Jorge Marques; Marchi, Ana Raquel; Oliveira, Lucélia Helena de; Manenti, Regiana; Marquiafável, Vanessa; Teles, Jorge

Files

Relatório Técnico_188_2003.pdf (643.58 KB)

Date

2003-02

Authors

Aluisio, Sandra Maria

Pelizzoni, Jorge Marques

Marchi, Ana Raquel

Oliveira, Lucélia Helena de

Manenti, Regiana

Marquiafável, Vanessa

Teles, Jorge

Abstract

This article identifies and addresses the major issues faced in the manual morphosyntactic annotation of MAC-Morpho, a large Brazilian Portuguese corpus of newspaper articles in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria used to establish the tagset, give an account of how the annotation process was designed and conducted, including the results of the inter-annotator agreement evaluation for MAC-Morpho, and analyze some interesting cases among the linguistic problems we faced in this work.

Keywords

Artificial intelligence

URI

http://repositorio.icmc.usp.br//handle/RIICMC/6798

Collections

ICMC Publications