CPL - Chalmers Publication Library
| Utbildning | Forskning | Styrkeområden | Om Chalmers | In English In English Ej inloggad.

Homogeneous string segmentation using trees and weighted independent sets

Peter Damaschke (Institutionen för data- och informationsteknik, Datavetenskap, Algoritmer (Chalmers) ; Institutionen för data- och informationsteknik, Datavetenskap, Bioinformatik (Chalmers))
Algorithmica (0178-4617). Vol. 57 (2010), 4, p. 621-640.
[Artikel, refereegranskad vetenskaplig]

We divide a string into k segments, each with only one sort of symbols, so as to minimize the total number of exceptions. Motivations come from machine learning and data mining. For binary strings we develop a linear-time algorithm for any k. Key to efficiency is a special-purpose data structure, called W-tree, which reflects relations between repetition lengths of symbols. For non-binary strings we give a nontrivial dynamic programming algorithm. Our problem is equivalent to finding weighted independent sets with certain size constraints, either in paths (binary case) or special interval graphs (general case). We also show that this problem is FPT in bounded-degree graphs.

Nyckelord: segmentation, dynamic programming, tree computations, weighted independent set, interval graphs, parameterized complexity



Denna post skapades 2010-04-29. Senast ändrad 2015-02-11.
CPL Pubid: 120832

 

Läs direkt!


Länk till annan sajt (kan kräva inloggning)