Learning Similarity Functions for Topic Detection in Online Reputation Monitoring

Data, code and results used for the SIGIR'14 paper by Damiano Spina, Julio Gonzalo and Enrique Amigó

View the Project on GitHub damiano/learning-similarity-functions-ORM

Poster at SMIR 2014

Abstract

Reputation management experts must constantly monitor Twitter, among other sources, and decide at any given time what is being said about the entity of interest (a company, organization, personality...). Solving this reputation monitoring problem automatically as a topic detection task is both essential (manual processing of data is either costly or prohibitive) and challenging (topics of interest for reputation monitoring are usually fine-grained and suffer from data sparsity).

We focus on a solution to the problem that (i) learns a pairwise tweet similarity function from previously annotated data, using all kinds of content-based and Twitter-based features; and (ii) applies a clustering algorithm to the previously learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only; and (ii) learning a similarity function is a flexible and efficient way of introducing supervision in the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts / issues (which usually spike in time) and organizational topics (which are usually stable across time).
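The two-step approach described in the abstract (learn a pairwise similarity function from annotated tweets, then cluster with it) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and not the configuration used in the paper: the three pair features (term overlap, shared hashtag, same author), the logistic regression learner, the single-link threshold clustering, and the toy tweets are all hypothetical stand-ins.

```python
import math
from itertools import combinations


def pair_features(a, b):
    """Return [term Jaccard overlap, shares a hashtag, same author] for two tweets."""
    terms_a, terms_b = set(a["text"].lower().split()), set(b["text"].lower().split())
    union = terms_a | terms_b
    jaccard = len(terms_a & terms_b) / len(union) if union else 0.0
    shared_tag = 1.0 if set(a["hashtags"]) & set(b["hashtags"]) else 0.0
    same_author = 1.0 if a["author"] == b["author"] else 0.0
    return [jaccard, shared_tag, same_author]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_similarity(tweets, topics, epochs=500, lr=0.5):
    """Fit logistic regression on every annotated pair (label 1 = same topic)."""
    pairs = [
        (pair_features(tweets[i], tweets[j]), 1.0 if topics[i] == topics[j] else 0.0)
        for i, j in combinations(range(len(tweets)), 2)
    ]
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in pairs:  # plain stochastic gradient descent on the log loss
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error

    def similarity(a, b):
        """Estimated probability that tweets a and b belong to the same topic."""
        x = pair_features(a, b)
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    return similarity


def cluster(tweets, similarity, threshold=0.5):
    """Single-link clustering: merge any two tweets whose similarity >= threshold."""
    parent = list(range(len(tweets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(tweets)), 2):
        if similarity(tweets[i], tweets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tweets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical annotated tweets about one entity, with their topic labels.
TRAIN = [
    {"text": "battery recall announced for model x", "hashtags": ["recall"], "author": "u1"},
    {"text": "model x battery recall expands", "hashtags": ["recall"], "author": "u2"},
    {"text": "new store opening in madrid", "hashtags": ["stores"], "author": "u3"},
    {"text": "madrid store opening this weekend", "hashtags": ["stores"], "author": "u3"},
]
TOPICS = ["recall", "recall", "stores", "stores"]

# Hypothetical unlabeled tweets to be grouped into topics.
NEW = [
    {"text": "battery recall hits model x owners", "hashtags": ["recall"], "author": "u5"},
    {"text": "another battery recall for model x", "hashtags": ["recall"], "author": "u6"},
    {"text": "store opening in barcelona", "hashtags": ["stores"], "author": "u7"},
]

similarity = train_similarity(TRAIN, TOPICS)
clusters = cluster(NEW, similarity)
print(clusters)  # lists of indices into NEW, one list per detected topic
```

In this sketch the supervision enters only through the learned `similarity` function, so the clustering step itself needs no topic labels. Swapping in a different learner or clustering algorithm would leave the overall structure unchanged.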

The code is available on GitHub (damiano/learning-similarity-functions-ORM).

Citation

Please cite the article below if you use this resource in your research:

D. Spina, J. Gonzalo, E. Amigó
Learning Similarity Functions for Topic Detection in Online Reputation Monitoring
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2014.

BibTeX

@InProceedings{spina2014learning,
    title={Learning Similarity Functions for Topic Detection in Online Reputation Monitoring},
    author={Spina, Damiano and Gonzalo, Julio and Amig{\'o}, Enrique},
    booktitle={SIGIR '14: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    year={2014},
    organization={ACM}
}