Context Copying Modulation: The Role of Entropy Neurons in Managing Parametric and Contextual Knowledge Conflicts

Abstract

The behavior of Large Language Models (LLMs) when facing contextual information that conflicts with their internal parametric knowledge is inconsistent, with no generally accepted explanation for the expected outcome distribution. Recent work has identified in autoregressive transformer models a class of neurons -- called entropy neurons -- that produce a significant effect on the model while having an overall moderate impact on the ranking of the predicted tokens. In this paper, we investigate the preliminary claim that these neurons are involved in inhibiting context copying behavior in transformers by looking at their role in resolving conflicts between contextual and parametric information. We show that entropy neurons are responsible for suppressing context copying across a range of LLMs, and that ablating them leads to a substantial change in the generation process. These results enhance our understanding of the internal dynamics of LLMs when handling conflicting information.

Entropy Neurons Characteristics

LogitVar

This measure quantifies a neuron's direct effect on output logits variance. For a neuron $i$, it is defined as:

$$ \mathrm{LogitVar}(w_{\mathrm{out}}^{(i)}) = \textbf{Var}\left\{ \frac{ {w}^{(t)}_\mathrm{U} \cdot w_{\mathrm{out}}^{(i)} }{ ||{w}^{(t)}_{\mathrm{U}}|| \times ||w_{\mathrm{out}}^{(i)}|| } ; t \in V \right\} $$

where $V$ is the set of tokens in the vocabulary and $w_U^{(t)}$ is the $t$-th row of $W_U$.

Effective Null Space Projection (ρ)

This measure quantifies how much of a neuron's output aligns with directions that minimally impact the model's final output, forming the effective null space of the unembedding matrix $W_U$, denoted as $V_0$. For a neuron $i$, it is defined as:

$$ \rho_i = \frac{||\mathbf{V}_\mathrm{0}^\mathrm{T} w_{\mathrm{out}}^{\mathrm{(i)}}||}{||w_{\mathrm{out}}^{\mathrm{(i)}}||}. $$

Visualization of entropy neurons characteristics showing LogitVar and $\rho$ measures $\text{(Phi-1.5 model)$

Ablation Results: Entropy Neurons Inhibit Context Copying

Our ablation experiments demonstrate that entropy neurons play a crucial role in inhibiting context copying behavior. When entropy neurons are ablated, the model shows significant changes in how it handles conflicts between parametric and contextual knowledge.

Phi-1.5 ablation scores showing $\text{(a)}$ Global Transition Score distribution, $\text{(b)}$ Conversion Ratio for different knowledge sources, and $\text{(c)}$ Transition Scores between knowledge sources

Key Findings

Global Transition Score: Entropy neurons show a Q-value of 99.0%, indicating their critical role in knowledge source transitions
Conversion Ratios: Ablating entropy neurons significantly increases conversion from ND $\text{(Not Defined)}$ to CK $\text{(Contextual Knowledge)}$
Context Copying Inhibition: The highlighted transition from ND to CK $\text{(2.5% vs 0.3% for random)}$ shows entropy neurons prevent inappropriate context copying

Implications

These results provide strong evidence that entropy neurons are essential components in the model's ability to handle knowledge conflicts appropriately. Their ablation leads to increased context copying behavior, which can result in hallucinations when the contextual information conflicts with the model's parametric knowledge. This finding has important implications for understanding and improving the reliability of large language models in knowledge-intensive tasks.