CPL - Chalmers Publication Library

Multi-Document Summarization and Semantic Relatedness

Olof Mogren (Department of Computer Science and Engineering, Computing Science, Algorithms (Chalmers))
Göteborg : Chalmers University of Technology, 2015.
[Licentiate thesis]

Automatic summarization is the process of presenting the contents of written documents in a short, comprehensive fashion. Many approaches have been proposed for this problem: some extract content from the input documents (extractive methods), while others generate the language of the summary from some representation of the document contents (abstractive methods). This thesis is concerned with extractive summarization in the multi-document setting, and we define the problem as choosing the most informative sentences from the input documents while minimizing redundancy in the summary. This definition calls for a way of measuring the similarity between sentences that captures as much of their meaning as possible. We present novel ways of measuring the similarity between sentences, based on neural word embeddings and sentiment analysis. We also show that combining multiple sentence similarity scores by multiplicative aggregation helps in creating better extractive summaries. In addition, we discuss the use of information extraction for improving the quality of automatic summarization by providing ways of assessing the salience of information elements, as well as helping with the fluency of the output and providing the temporal dimension. Furthermore, we present graph-based algorithms for clustering words by co-occurrence and for summarizing short online user reviews by computing bicliques. The biclique algorithm offers a fast, simple approach to summarization in many e-commerce settings.
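
The abstract mentions combining multiple sentence similarity scores by multiplicative aggregation but does not spell out the details. The following is only a minimal sketch of one way to read that idea, assuming each similarity measure yields a square sentence-by-sentence matrix with scores in [0, 1]. The function name `aggregate_similarities` and the example scores are hypothetical and are not taken from the thesis.

```python
from math import prod


def aggregate_similarities(matrices):
    """Combine several sentence-similarity matrices by element-wise product.

    Assumption (not from the thesis): every matrix is square, has the same
    size, and holds scores in [0, 1].
    """
    if not matrices:
        raise ValueError("need at least one similarity matrix")
    n = len(matrices[0])
    for m in matrices:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all matrices must be square and the same size")
    # Entry (i, j) is the product of entry (i, j) across all measures.
    return [[prod(m[i][j] for m in matrices) for j in range(n)]
            for i in range(n)]


# Hypothetical scores for three sentences from two different measures.
embedding_sim = [[1.0, 0.8, 0.2],
                 [0.8, 1.0, 0.5],
                 [0.2, 0.5, 1.0]]
sentiment_sim = [[1.0, 0.5, 0.9],
                 [0.5, 1.0, 0.4],
                 [0.9, 0.4, 1.0]]

combined = aggregate_similarities([embedding_sim, sentiment_sim])
```

With scores in [0, 1], a product is high only when every measure rates the sentence pair as similar, so a single low score is enough to pull the combined score down.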

Keywords: automatic summarization, semantic relatedness, semantic similarity, multi-document summarization




This record was created 2015-10-23. Last modified 2015-11-06.
CPL Pubid: 224749

 



Departments (Chalmers)

Department of Computer Science and Engineering, Computing Science, Algorithms (Chalmers)

Subject areas

Information and communication technology
Computer and information science
Computer science

Examination

Date: 2015-11-20
Time: 10:00
Room: ML2, Hörsalsvägen 7B, Chalmers University of Technology
Opponent: Tapani Raiko, Assistant Professor, Department of Computer Science, Aalto University

Part of series

Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University 1652