Blackbox Interpretation using LDA

Interpreting blackbox text classifiers with LDA-based topic models

This project implements a method for interpreting blackbox text classifiers with an LDA model. The method was originally introduced in Oved, N., Feder, A. and Reichart, R. (2020) and later presented at EMNLP 2020’s blackbox workshop. While it still uses a “traditional” LDA model, the implementation offers tools for interpreting the classifier’s predictions.

The method identifies the separating topics of an LDA model trained over a dataset labeled by the explained model. These separating topics are correlated with the explained model’s classification confidence and may give insight into its prediction process.
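
To make the idea of a separating topic concrete, here is a minimal sketch and not the project's actual code. It assumes a binary classifier, and it assumes that a topic "separates" when its mean weight differs between the two predicted classes; the criterion used in the paper may differ. The names `rank_separating_topics`, `doc_topic`, `predicted_labels` and `confidences` are hypothetical, and the toy document-topic rows stand in for the output of a trained LDA model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_separating_topics(doc_topic, predicted_labels, confidences):
    """Rank topics by the gap in mean topic weight between predicted classes.

    doc_topic: per-document topic distributions (assumed to come from an LDA model)
    predicted_labels: 0/1 labels assigned by the explained classifier
    confidences: the classifier's probability for class 1, one per document
    """
    n_topics = len(doc_topic[0])
    rows = []
    for k in range(n_topics):
        weights = [row[k] for row in doc_topic]
        pos = [w for w, y in zip(weights, predicted_labels) if y == 1]
        neg = [w for w, y in zip(weights, predicted_labels) if y == 0]
        gap = sum(pos) / len(pos) - sum(neg) / len(neg)
        rows.append({"topic": k, "gap": gap, "corr": pearson(weights, confidences)})
    # Largest absolute gap first: these are the candidate separating topics.
    return sorted(rows, key=lambda r: abs(r["gap"]), reverse=True)


# Toy values for illustration only (4 documents, 3 topics).
doc_topic = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
]
predicted_labels = [1, 1, 0, 0]
confidences = [0.95, 0.85, 0.10, 0.20]

ranking = rank_separating_topics(doc_topic, predicted_labels, confidences)
for r in ranking:
    print(f"topic {r['topic']}: gap={r['gap']:+.2f}, corr={r['corr']:+.2f}")
```

In this toy data, topics 0 and 2 separate the two predicted classes and also track the classifier's confidence (in opposite directions), while topic 1 barely does either. Inspecting the top words of such topics is one way they might give insight into the prediction process.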

Collaborators

Ellie Rosenman, Amir Feder, Nadav Oved, Roi Reichart

Links

Code

Docs

Demo