Improving the Topic Modeling Tool


TopicModelingTool is a graphical user interface (GUI) for MALLET, the well-known topic modeling toolkit. Unfortunately, the original version offers no way to specify a custom regular expression (regex) for tokenization, so text in languages with accents and diacritics is not split into words correctly, which makes the tool useless for most European languages. As part of our research project, Simon Hengchen, with the help of Marijn Koolen, added this feature to the easy-to-use GUI. The software, written in Java and freely available at the TIC download page, now allows researchers who are not familiar with command-line tools such as the original MALLET to easily perform Latent Dirichlet Allocation (LDA) topic modeling!
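To illustrate why the tokenization regex matters, here is a minimal Java sketch. It is not the tool's actual code: the class name `TokenRegexDemo`, the `tokenize` helper, and both patterns are assumptions chosen for illustration. The ASCII-only pattern stands in for any tokenizer that ignores accented letters, while `[\p{L}\p{M}]+` is one common Unicode-aware choice that accepts letters and combining marks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of TopicModelingTool or MALLET.
public class TokenRegexDemo {

    // Returns every substring of `text` that matches `tokenRegex`, in order.
    public static List<String> tokenize(String text, String tokenRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(tokenRegex).matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Les élèves étudient", written with Unicode escapes so the source
        // file compiles the same way under any file encoding.
        String text = "Les \u00e9l\u00e8ves \u00e9tudient";

        // ASCII-only pattern (illustrative): accented letters split the words.
        System.out.println(tokenize(text, "[A-Za-z]+"));

        // Unicode-aware pattern (illustrative): letters plus combining marks.
        System.out.println(tokenize(text, "[\\p{L}\\p{M}]+"));
    }
}
```

With the ASCII-only pattern, "élèves" breaks into the fragments "l" and "ves", which would pollute the resulting topics. With the Unicode-aware pattern, each word stays a single token.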