Interpreting Blackbox Text Classifiers with LDA-Based Topic Models
This is an implementation of a method for interpreting blackbox text classifiers using an LDA model. The method was originally developed in Oved, N., Feder, A. and Reichart, R. (2020) and later presented at EMNLP 2020’s blackbox workshop. While it still uses a “traditional” LDA model, the implementation offers tools for interpreting the classifier’s predictions.
The method is based on identifying the separating topics of an LDA model trained on a dataset labeled by the explained model. These separating topics are correlated with the explained model’s classification confidence and may offer insight into its prediction process.
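
To make the idea concrete, here is a minimal sketch in plain Python. It assumes the LDA model has already produced per-document topic proportions and that the explained classifier is binary. The function names (`separating_topics`, `pearson`), the toy data, and the separation score itself (the gap in mean topic proportion between the two predicted classes) are illustrative assumptions, not necessarily the criterion used in the original paper or in this implementation.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def separating_topics(doc_topics, predicted_labels, top_k=1):
    """Rank topics by the gap in mean proportion between predicted classes.

    Assumption: labels are 0/1 and both classes appear at least once.
    Returns a list of (topic_index, gap) pairs, largest absolute gap first.
    """
    n_topics = len(doc_topics[0])
    gaps = []
    for t in range(n_topics):
        pos = [d[t] for d, y in zip(doc_topics, predicted_labels) if y == 1]
        neg = [d[t] for d, y in zip(doc_topics, predicted_labels) if y == 0]
        gaps.append((t, mean(pos) - mean(neg)))
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)[:top_k]


# Toy inputs (hypothetical): rows are documents, columns are topic proportions.
doc_topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
predicted_labels = [1, 1, 0, 0]         # labels assigned by the explained model
confidence_pos = [0.9, 0.8, 0.2, 0.3]   # its confidence in class 1

top = separating_topics(doc_topics, predicted_labels, top_k=2)
for topic, gap in top:
    weights = [d[topic] for d in doc_topics]
    r = pearson(weights, confidence_pos)
    print(f"topic {topic}: mean gap {gap:+.2f}, correlation with confidence {r:+.2f}")
```

In this toy setup, a topic whose weight rises and falls with the classifier's confidence is a candidate explanation for what the classifier attends to. A real analysis would take the topic proportions from a trained LDA model rather than hand-written values.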
Collaborators
Ellie Rosenman, Amir Feder, Nadav Oved, Roi Reichart