Topic Modelling in Social Networks with Formal Concept Analysis
Abstract
In the age of social networks, the amount of written material published every day exceeds our processing capacity. Topic models can help to organise and understand extensive collections of unstructured text documents. In machine learning and natural language processing, a topic model is a statistical model for discovering the abstract "topics" in a collection of documents, uncovering hidden semantic structures and clusters of similar words. Topic modelling formalises this idea mathematically in a framework that allows discovering both the topics and each document's balance of topics. One of the first topic models was latent semantic indexing (LSI). Later, probabilistic latent semantic analysis (PLSA) was presented, serving as a basis for many others; notably, its extension latent Dirichlet allocation (LDA) is one of the most common topic models currently in use. To approach topic modelling in social networks, we use Formal Concept Analysis (FCA), a mathematical tool firmly based on lattice theory and logic. Our approach uses the knowledge contained in the concept lattice to extract the topics. Thus, this approach to topic modelling is not statistical: for example, we do not need to assume a prior distribution of terms. Instead, the actual structure of the data is used to infer the semantic relationships between attributes. The procedure is as follows. First, a formal context is built from the document-term matrix of the set of documents. Then, FCA tools are used to construct the concept lattice, each concept of which contains knowledge about the topics in the documents. Once this lattice is built, the concepts are clustered, and the concept clusters are then used to induce topic models on the original documents. We conduct an experiment on a dataset of tweets about a set of hashtags to show how FCA can be used in Social Network Analysis. In addition, we compare our approach with classical techniques.
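The first two steps of the procedure (building a formal context from a document-term matrix and deriving its formal concepts) can be illustrated with a minimal Python sketch. It assumes that the context is obtained by binarizing the matrix, so that a document has a term as an attribute when the term occurs at least `threshold` times, and it enumerates concepts naively by closing every attribute subset. This is exponential and suitable only for toy data; a real lattice would be built with a dedicated FCA algorithm. The toy tweets, the vocabulary, and all function names are hypothetical, and the concept clustering and topic induction steps are not shown because they are not specified here.

```python
from itertools import combinations


def formal_context(doc_term, vocab, threshold=1):
    """Hypothetical helper: binarize a document-term count matrix into a formal
    context where document d has attribute t iff count(d, t) >= threshold.
    The threshold rule is an assumption, not taken from the paper."""
    return {
        d: frozenset(vocab[j] for j, count in enumerate(row) if count >= threshold)
        for d, row in enumerate(doc_term)
    }


def extent(context, attrs):
    """Derivation B': documents that have every attribute in attrs."""
    return frozenset(d for d, doc_attrs in context.items() if attrs <= doc_attrs)


def intent(context, docs, all_attrs):
    """Derivation A': attributes shared by every document in docs
    (all attributes when docs is empty)."""
    shared = frozenset(all_attrs)
    for d in docs:
        shared &= context[d]
    return shared


def concepts(context, all_attrs):
    """Naively enumerate formal concepts (B', B'') by closing every attribute
    subset B. Exponential in the number of attributes: toy data only."""
    found = set()
    attrs = sorted(all_attrs)
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            ext = extent(context, frozenset(combo))
            found.add((ext, intent(context, ext, all_attrs)))
    return found


# Hypothetical toy data: four tweets over a four-term vocabulary.
vocab = ["vote", "election", "football", "goal"]
doc_term = [
    [1, 2, 0, 0],  # tweet 0
    [0, 1, 0, 0],  # tweet 1
    [0, 0, 1, 1],  # tweet 2
    [1, 1, 0, 0],  # tweet 3
]

ctx = formal_context(doc_term, vocab)
cs = concepts(ctx, vocab)
# Print concepts from the most general (largest extent) to the most specific.
for ext, itt in sorted(cs, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

On this toy context the sketch finds five concepts, including the one with extent {0, 3} and intent {vote, election}, which groups the two tweets sharing both terms. Clusters of such concepts are what the procedure would then use to induce topics.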