MB-ToT: An Effective Model for Topic Mining in Microblogs

PP: 299-308

Author(s)

Shaopeng Liu,
Jian Yin,
Jia Ouyang,
Piyuan Lin

Abstract

Topic mining on microblogging sites such as Twitter, with their sheer scale of instant messages and social network information,
is a challenging problem. Although many text mining techniques and generative probabilistic models have been developed
for static plain-text corpora, they tend to perform poorly on microblogs because they ignore that microblogs are
temporally sequential and tied to social network information. In this paper, we propose a novel topic model,
MicroBlog-Topics over Time (MB-ToT), which aims at comprehensive topic analysis in microblogs. First, we assume each topic is a mixture
distribution influenced by both word co-occurrences and the timestamps of microblogs. This allows MB-ToT to capture the changes of each
topic over time. Second, we use users’ intrinsic interests, social contact relations, and #hashtags to improve the topic mining
result. Finally, we present a Gibbs sampling implementation for the inference of MB-ToT. We evaluate MB-ToT and compare it with
state-of-the-art methods on a real dataset. In our experiments, MB-ToT outperforms the state-of-the-art methods by a large margin
in terms of both perplexity and KL-divergence. We also show that the quality of the latent topics generated by MB-ToT is promising.
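
The two evaluation metrics named in the abstract have standard textbook definitions, and the following is a minimal sketch of those definitions only. It is not the authors' evaluation code: the function names `perplexity` and `kl_divergence` and the toy inputs are hypothetical, and the abstract does not state which distributions the paper compares with KL-divergence.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean log-likelihood per token.

    token_log_probs: natural-log probabilities that a model assigns to
    each held-out token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption of this sketch) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy usage: a model that gives every held-out token probability 0.25.
uniform_log_probs = [math.log(0.25)] * 8
model_perplexity = perplexity(uniform_log_probs)

# Toy usage: divergence between two word distributions of two topics.
topic_a = [0.5, 0.5]
topic_b = [0.9, 0.1]
divergence = kl_divergence(topic_a, topic_b)
```

Under these definitions, a model that assigns probability 0.25 to every token has perplexity 4, and the KL-divergence of a distribution from itself is 0. Lower perplexity indicates a better fit to held-out text.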