Gaius

Design and evaluation of a new indexing model for legal documentation

Web-based text classification in the absence of manually labeled training documents

Resource type
Authors/contributors
Title
Web-based text classification in the absence of manually labeled training documents
Abstract
Most text classification techniques assume that manually labeled documents (corpora) can be easily obtained while learning text classifiers. However, labeled training documents are sometimes unavailable or inadequate even if they are available. The goal of this article is to present a self-learned approach to extract high-quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user-defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters-21578 news data set. The experiments show that very promising results can be achieved only by using automatically extracted documents from the Web.
Publication
Journal of the American Society for Information Science and Technology
Volume
58
Issue
1
Pages
88-96
Date
January 1, 2007
Journal abbreviation
J. Am. Soc. Inf. Sci.
Language
en
ISSN
1532-2890
Short title
Web-based text classification in the absence of manually labeled training documents
Accessed
2016-10-01 13:13
Library catalog
Wiley Online Library
Citation
Hung, C.-M. and Chien, L.-F. (2007). Web-based text classification in the absence of manually labeled training documents. Journal of the American Society for Information Science and Technology, 58(1), 88-96. https://doi.org/10.1002/asi.20442
Methodology