Automated document classification process extracts information with a systematic analysis of the content of documents. This is an active research field of growing importance due to the large amount of electronic documents produced in the world wide web and available thanks to diffused technologies including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases and research engines. Current tools classify or ”tag” either text or images separately.In this paper we show how, by linking image and text-based contents together, a technology improves fundamental document management tasks like retrieving information from a database or automated documents. We present an investigation of a model of conceptual spaces for investigation using joint information sources from the text and the images forming complex documents. We present a formal model and the computable algorithms and the dataset from which we took a subset to make experiments and relative tests and results.

A Multimodal Approach to exploit similarity in documents

TOMAZZOLI, Claudio
2014-01-01

Abstract

Automated document classification process extracts information with a systematic analysis of the content of documents. This is an active research field of growing importance due to the large amount of electronic documents produced in the world wide web and available thanks to diffused technologies including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases and research engines. Current tools classify or ”tag” either text or images separately.In this paper we show how, by linking image and text-based contents together, a technology improves fundamental document management tasks like retrieving information from a database or automated documents. We present an investigation of a model of conceptual spaces for investigation using joint information sources from the text and the images forming complex documents. We present a formal model and the computable algorithms and the dataset from which we took a subset to make experiments and relative tests and results.
2014
978-331907454-2
document classification
taxonomy
ontology
clustering
statistical natural language processing
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12607/60583
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact