Probing Language Models on Their Knowledge Source

Authors: Z Tighidet, A Mogini, J Mei, B Piwowarski, P Gallinari
Venue: BlackboxNLP @ EMNLP 2024
Year: 2024
Abstract:
This paper investigates how language models handle different sources of knowledge, focusing on the interplay between parametric knowledge (memorized during training) and contextual knowledge (supplied in the prompt). We develop probing techniques to determine when models rely on one source over the other, with implications for mitigating hallucinations in language models. Our approach combines mechanistic interpretability with controlled experiments to analyze knowledge source attribution in transformer architectures.

Key Contributions

Methodology

We employ mechanistic interpretability techniques combined with controlled probing experiments to analyze how language models process and integrate different knowledge sources. Our approach includes a systematic evaluation of model responses under knowledge-conflict scenarios, in which parametric knowledge acquired during training contradicts information provided in the context.
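
As a concrete illustration, a knowledge-conflict instance can be built by pairing a fact the model is expected to have memorized with a counterfactual statement injected into the prompt. The sketch below is a minimal, hypothetical construction; the subject, relation, and answers are placeholder examples, not items from the paper's dataset.

```python
# Minimal sketch of a knowledge-conflict prompt pair.
# All facts below are illustrative placeholders, not the paper's data.
from dataclasses import dataclass


@dataclass
class ConflictExample:
    subject: str
    relation: str
    parametric_answer: str      # answer the model is assumed to know from training
    counterfactual_answer: str  # conflicting answer injected into the context

    def closed_book_prompt(self) -> str:
        # No context: the model can only draw on parametric knowledge.
        return f"{self.subject} {self.relation}"

    def conflict_prompt(self) -> str:
        # The context asserts a counterfactual fact that contradicts training data.
        context = f"{self.subject} {self.relation} {self.counterfactual_answer}."
        return f"{context}\n{self.subject} {self.relation}"


example = ConflictExample(
    subject="The Eiffel Tower",
    relation="is located in",
    parametric_answer="Paris",
    counterfactual_answer="Rome",
)

print(example.closed_book_prompt())  # probes parametric knowledge only
print(example.conflict_prompt())     # parametric vs. contextual knowledge
```

Comparing the model's completions for the two prompts reveals whether it repeats the counterfactual from the context or falls back on its memorized answer.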

The methodology involves designing probes that distinguish between knowledge sources, using attention analysis and activation patterns to examine the internal mechanisms of knowledge processing. We evaluate the approach across multiple transformer architectures and knowledge domains to assess generalizability.
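
One common way to operationalize such a probe is to train a lightweight classifier on hidden-state activations to predict which knowledge source the model will rely on. The following is a minimal sketch under assumed choices (GPT-2 as the model, a mid-depth layer, a handful of hand-labelled toy prompts); the paper's actual probing setup may differ.

```python
# Hedged sketch: a linear probe over hidden-state activations that predicts
# whether the model answers from contextual (1) or parametric (0) knowledge.
# Model choice, layer index, and labels are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # assumption: any causal LM exposing hidden states works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()


def last_token_activation(prompt: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final prompt token at a chosen layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq, dim]
    return outputs.hidden_states[layer][0, -1]


# Toy labelled prompts: 1 = model expected to use the context, 0 = parametric memory.
labelled_prompts = [
    ("The Eiffel Tower is located in Rome.\nThe Eiffel Tower is located in", 1),
    ("The Eiffel Tower is located in", 0),
    ("Mount Fuji is located in Brazil.\nMount Fuji is located in", 1),
    ("Mount Fuji is located in", 0),
]

X = torch.stack([last_token_activation(p) for p, _ in labelled_prompts]).numpy()
y = [label for _, label in labelled_prompts]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe training accuracy:", probe.score(X, y))
```

In practice the labels would come from behavioral annotation of the model's actual answers (whether it echoed the counterfactual context or its memorized fact), and the probe would be evaluated on held-out prompts rather than training accuracy.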

Key Findings

Impact

This work contributes to mechanistic interpretability research by providing insight into how language models resolve knowledge conflicts. The findings apply directly to reducing hallucinations and improving model reliability, particularly in scenarios where models must arbitrate between competing sources of information.

The probing techniques developed in this work can help researchers and practitioners better understand model behavior and build language models that handle conflicting information more robustly.