Discovery of Potential Topics from Blog Articles by Machine Learning
Abstract
This paper presents a method for discovering potential topics in the blogosphere. We define a potential topic as a currently unpopular phrase that has the potential to become a hot topic. To discover potential topics, the method builds a classifier that detects the potential of a topic from its frequency transitions in blog articles. First, the method extracts candidate potential topics from categorized blog articles, because categorization enables us to identify specialists. Then, to select potential topics from these candidates, a classifier for detecting potential topics is built from topic frequency transition data. For training this classifier, we propose two learning methods: supervised learning and semi-supervised learning. Although supervised learning gives more precise results, it requires a large amount of labeled data, which is costly and difficult to create. Semi-supervised learning, on the other hand, can build a classifier from a small amount of labeled data together with a large amount of unlabeled data. Experimental results on real blog data show the effectiveness of the proposed method.
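The abstract does not specify the frequency-transition features or which semi-supervised algorithm is used, so the following is only a hypothetical sketch of the general idea. It assumes each candidate topic is summarized by three made-up features (mean frequency, latest-period frequency, average change between periods), uses a nearest-centroid classifier, and uses self-training (repeatedly pseudo-labeling the unlabeled candidates the current model is most confident about) as the semi-supervised method. All function names are illustrative and are not taken from the paper.

```python
from statistics import mean

def transition_features(counts):
    # Hypothetical features from a topic's frequency series (e.g. articles per period):
    # mean frequency, frequency in the latest period, and average period-to-period change.
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    return [mean(counts), counts[-1], mean(diffs) if diffs else 0]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def fit(labeled):
    # Nearest-centroid model: one centroid per label (True = potential topic).
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}

def predict(model, x):
    # Returns (label, margin); margin is how much closer x is to the nearest centroid
    # than to the runner-up, used here as a confidence score.
    dists = sorted((sq_dist(x, c), y) for y, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(labeled, unlabeled, rounds=5, per_round=1):
    # Self-training: each round, pseudo-label the most confident unlabeled examples,
    # add them to the labeled set, and refit.
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)
        pool.sort(key=lambda x: predict(model, x)[1], reverse=True)
        for x in pool[:per_round]:
            labeled.append((x, predict(model, x)[0]))
        pool = pool[per_round:]
    return fit(labeled)

# Toy usage with made-up frequency series: one rising and one flat labeled topic.
labeled = [(transition_features([1, 2, 4, 8]), True),
           (transition_features([5, 5, 5, 5]), False)]
unlabeled = [transition_features(c) for c in ([0, 1, 3, 6], [4, 4, 5, 4], [2, 3, 5, 9])]
model = self_train(labeled, unlabeled, rounds=3)
print(predict(model, transition_features([1, 3, 6, 10]))[0])
```

Self-training is used here only because it is one of the simplest semi-supervised schemes to show; the paper's actual semi-supervised method may differ.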
Keywords
Web Mining; Machine Learning; Blog