CPL - Chalmers Publication Library

Entity disambiguation in anonymized graphs using graph kernels

Linus Hermansson (Department of Computer Science and Engineering (Chalmers)) ; Tommi Kerola (Department of Computer Science and Engineering (Chalmers)) ; Fredrik Johansson (Department of Computer Science and Engineering, Computing Science (Chalmers)) ; Vinay Jethava (Department of Computer Science and Engineering, Computing Science (Chalmers)) ; Devdatt Dubhashi (Department of Computer Science and Engineering, Computing Science (Chalmers))
22nd ACM International Conference on Information and Knowledge Management, CIKM 2013; San Francisco, CA; United States; 27 October 2013 through 1 November 2013 p. 1037-1046. (2013)
[Conference contribution, peer reviewed]

This paper presents a novel method for entity disambiguation in anonymized graphs that uses local neighborhood structure. Most existing approaches leverage node information, which may be unavailable in several contexts due to privacy concerns, or information about the sources of the data. We consider this problem in the supervised setting, where we are given only a base graph and a set of nodes labelled as ambiguous or unambiguous. We characterize the similarity between two nodes by comparing their local neighborhood structures with graph kernels, and solve the resulting classification task using support vector machines (SVMs). We give empirical evidence on two real-world datasets, comparing our approach with a state-of-the-art method and highlighting its advantages: while using less information, our method is significantly better in speed, accuracy, or both. We also present extensions of two existing graph kernels, namely the direct product kernel and the shortest-path kernel, that significantly improve accuracy. For the direct product kernel, our extension also provides significant computational benefits. Moreover, we design and implement the algorithms of our method to run in a distributed fashion using the GraphLab framework, ensuring high scalability.
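The record does not describe the implementation, so the following is only a minimal sketch of the general idea, assuming an undirected, unweighted graph stored as adjacency lists. Each node is described by its k-hop neighborhood (the `radius` parameter is an assumption), neighborhoods are compared with the standard unlabeled shortest-path kernel (the dot product of shortest-path-length histograms), and the result is a normalized kernel matrix that could be passed to an SVM accepting precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. This is not the authors' extended kernel or their GraphLab implementation; all function names are hypothetical.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical helpers illustrating a plain shortest-path kernel on node
# neighborhoods; NOT the extended kernel from the paper.

def ego_neighborhood(adj, root, radius):
    """Return the set of nodes within `radius` hops of `root` (BFS)."""
    seen = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if seen[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj.get(u, ()):
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return set(seen)

def sp_length_histogram(adj, nodes):
    """Histogram of shortest-path lengths over all unordered node pairs of
    the subgraph induced by `nodes` (BFS from every node, unweighted edges)."""
    hist = Counter()
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in nodes and v not in dist:  # stay inside the subgraph
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t, d in dist.items():
            if t != s:
                hist[d] += 1
    # In an undirected graph each unordered pair is counted from both ends.
    return Counter({d: c // 2 for d, c in hist.items()})

def shortest_path_kernel(h1, h2):
    """Count pairs of node pairs whose shortest-path lengths match,
    i.e. the dot product of the two length histograms."""
    return sum(c * h2[d] for d, c in h1.items())

def kernel_matrix(adj, targets, radius=2):
    """Cosine-normalized kernel matrix over the neighborhoods of `targets`."""
    hists = [sp_length_histogram(adj, ego_neighborhood(adj, n, radius))
             for n in targets]
    raw = [[shortest_path_kernel(a, b) for b in hists] for a in hists]
    n = len(targets)
    return [[raw[i][j] / sqrt(raw[i][i] * raw[j][j])
             if raw[i][i] and raw[j][j] else 0.0
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4 (undirected adjacency lists).
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    K = kernel_matrix(adj, [0, 1, 2, 3, 4], radius=1)
    print(K[1][3])  # nodes 1 and 3 have identical 1-hop structure
```

Nodes 1 and 3 sit at mirror-image positions on the path, so their neighborhoods are indistinguishable without node attributes and their normalized kernel value is 1.0. In the supervised setting the abstract describes, such a matrix (computed over labelled nodes) would serve as the Gram matrix for SVM training.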

Keywords: Entity disambiguation, Entity resolution, Graph kernels, Support vector machines



This record was created 2014-01-07. Last modified 2016-07-13.
CPL Pubid: 191418

 
