Mass-Editing Memory in a Transformer

An Analysis of "Mass-Editing Memory in a Transformer"

MEMIT introduces a scalable approach to editing factual knowledge in LLMs. Through causal analysis, it identifies a range of early MLP layers in transformers that have a causal effect on fact recall, shedding light on the internal workings of LLMs. Leveraging the associative-memory property of MLPs, it directly updates the weights of the MLPs in this causal range, minimizing undesirable changes to the network.

Biography

Kevin Meng

Citations: 353; h-index: 5; MS in CS, MIT; co-founder of Ember Labs.

David Bau

Citations: 16,000+; h-index: 38; MS in CS, Cornell; PhD in CS, MIT; previously at Google and Microsoft; currently a professor at Northeastern University.

Alex Andonian

Citations: 2,800; h-index: 15; MS in CS, MIT; currently pursuing a PhD in Electrical Engineering & Computer Science at MIT.

Yonatan Belinkov

Citations: 8,100+; h-index: 43; PhD in Electrical Engineering & Computer Science, MIT; faculty at the Taub Faculty of Computer Science, Technion.

Arnab Sen Sharma

Citations: 130; h-index: 5; PhD student in CS at Northeastern University.

Literature Review

Large Language Models as Knowledge Bases

Due to the large volumes of data that LLMs are trained on, they store factual information within their parameters, which can be recalled when the associated context is presented. As a result, LLMs are often viewed as knowledge bases [1] capable of retrieving relevant knowledge on request. However, unlike traditional knowledge bases such as relational databases, the knowledge is not stored explicitly and cannot be directly located and updated. Developing tools to interface with the information present in such opaque systems is therefore a critical and exciting field of research.
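To make this concrete, the snippet below queries a pretrained model as a crude knowledge base via the Hugging Face transformers pipeline. The model choice (gpt2, for size) and the prompt are our own illustrative picks, not from the papers discussed here.

```python
# Querying a pretrained LM as a crude knowledge base: the fact is stored
# implicitly in the weights, not in an explicit, editable table. gpt2 is
# used only because it is small; the papers below use larger models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The Space Needle is located in the city of"
out = generator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])  # a well-trained model continues with "Seattle"
```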

Causal Analysis

Causal analysis [2] refers to methodologies that aim to identify relationships in which one variable directly or indirectly affects another. It goes beyond mere correlation, seeking to establish a cause-and-effect link. In particular, causal mediation analysis identifies an intermediary variable (mediator) that transmits the effect of one variable on another along a path [3]. Such approaches are traditionally common in fields like the social sciences and clinical studies. Recently, as neural networks have become increasingly complex and opaque, causal analysis has become a powerful toolkit for dissecting and understanding them.
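As a toy illustration of mediation (our own, not taken from [2] or [3]), a linear structural model makes the decomposition of a total effect into direct and mediated components explicit:

```python
# Toy linear structural model illustrating causal mediation:
#   m = a * x            (mediator)
#   y = b * m + c * x    (outcome, with a direct path from x)
# The total effect of x on y decomposes into direct and mediated parts.
a, b, c = 2.0, 0.5, 1.0

def mediator(x):
    return a * x

def outcome(x, m):
    return b * m + c * x

total = outcome(1, mediator(1)) - outcome(0, mediator(0))     # intervene on x
direct = outcome(1, mediator(0)) - outcome(0, mediator(0))    # hold the mediator fixed
indirect = outcome(1, mediator(1)) - outcome(1, mediator(0))  # vary the mediator only

print(total, direct, indirect)  # 2.0 1.0 1.0 -> total = direct + indirect
```

In the causal tracing of ROME described below, a hidden state plays the role of the mediator, and its indirect effect is measured by restoring it inside an otherwise corrupted run.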

Model Editor Networks using Gradient Decomposition (MEND)

MEND [4] learns to transform raw fine-tuning gradients into more targeted parameter updates. It uses an edit training set to produce a collection of model editor networks that apply edits at test time. Specifically, it exploits a low-rank decomposition of the (very large) raw gradient, which allows the model editor networks to update parameters efficiently and in a localized way.
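A minimal PyTorch sketch of the observation MEND builds on: the per-example weight gradient of a linear layer is a rank-one outer product of the output-side gradient and the layer input, so an editor network can act on these two factors instead of the full matrix. This illustrates the underlying fact only; it is not the MEND implementation.

```python
# For a linear layer with output u = W x and loss L, dL/dW = (dL/du) x^T,
# i.e. a rank-one outer product. MEND's editors operate on the two
# factors (dL/du and x) rather than the full d_out x d_in gradient.
import torch

d_in, d_out = 4, 3
W = torch.randn(d_out, d_in, requires_grad=True)
x = torch.randn(d_in)

loss = (W @ x).sum()          # toy loss; dloss/du is a vector of ones
loss.backward()

delta = torch.ones(d_out)     # dloss/du for this toy loss
assert torch.allclose(W.grad, torch.outer(delta, x))
print(torch.linalg.matrix_rank(W.grad))  # tensor(1)
```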

Rank-one Model Editing (ROME)

The ROME [5] paper analyzes the storage and recall of factual associations in transformer language models. First, the paper develops a causal intervention that identifies the neuron activations that are decisive in the model's factual predictions. This reveals that middle-layer feed-forward modules play an important role, because they control factual predictions when processing subject tokens. To test this, the paper modifies feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). Each fact is represented as a knowledge tuple t = (s, r, o) containing the subject s, object o, and relation r connecting the two. In Figure 1a of the paper, the subject is “The Space Needle”, the relation is “is in downtown”, and the object is “Seattle”.

Figure 1a shows a clean run, in which the factual prompt x is passed into G (an autoregressive transformer language model), which outputs “Seattle”. In Figure 1b, the baseline corrupted run, noise is added to the embeddings of the subject tokens (“The* Space* Need* le*”), resulting in an output object that is incorrect and most probably not “Seattle”. In a third run, the corrupted-with-restoration run, G computes on the noisy embeddings exactly as in the corrupted run, except at one chosen token and layer, where the clean hidden state is restored (the purple arrow from Figure 1a to Figure 1b). The ability of a few clean states to recover the fact “Seattle”, even though many states are corrupted, reveals each state's contribution to a correct factual prediction. Specifically, Figure 1g shows that attention has a strong effect at later layers at the last token “downtown”, just before “Seattle” is predicted. It seems obvious that the token before the prediction would be significant; what is interesting is Figure 1f, which shows that in order to predict “Seattle”, the MLP states at the last subject token in the earlier layers are important.
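The sketch below mimics this tracing procedure on GPT-2 using forward hooks. It is our own simplification: the restored layer and token, the noise scale, and the assumed tokenization are toy choices, and the paper additionally averages over many noise samples and sweeps all (token, layer) pairs.

```python
# ROME-style causal tracing, simplified: noise the subject-token
# embeddings (corrupted run), then restore one layer's clean hidden state
# at one token and measure how much probability of the object recovers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The Space Needle is in downtown", return_tensors="pt").input_ids
target = tok(" Seattle").input_ids[0]
subj_pos = [0, 1, 2, 3]  # "The", " Space", " Need", "le" (assumed tokenization)

def p_target(logits):
    return torch.softmax(logits[0, -1], dim=-1)[target].item()

with torch.no_grad():  # clean run: cache every layer's hidden states
    clean = model(ids, output_hidden_states=True)

noise = 0.5 * torch.randn(len(subj_pos), model.config.n_embd)  # toy scale

def corrupt(module, inputs, output):  # add noise to subject embeddings
    output = output.clone()
    output[0, subj_pos] += noise
    return output

layer, token = 6, subj_pos[-1]  # restore a mid layer at the last subject token

def restore(module, inputs, output):
    hs = output[0].clone()
    hs[0, token] = clean.hidden_states[layer + 1][0, token]
    return (hs,) + output[1:]

h1 = model.transformer.wte.register_forward_hook(corrupt)
with torch.no_grad():
    p_corrupted = p_target(model(ids).logits)
h2 = model.transformer.h[layer].register_forward_hook(restore)
with torch.no_grad():
    p_restored = p_target(model(ids).logits)
h1.remove(); h2.remove()

print(p_target(clean.logits), p_corrupted, p_restored)
# indirect effect of this (token, layer) state ~ p_restored - p_corrupted
```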

Based on this analysis, and on measurements of the average indirect effect of the MLPs (seen below in the diagram for MEMIT), the paper next asks whether it is possible to change a factual association. This is illustrated in the figure below, which associates the subject “The Space Needle” with a new object o* (“Paris”). To create this new factual association, ROME modifies the weights of the already-trained MLP at a middle layer, at the last subject token “le”. More specifically, ROME inserts a new (k*, v*) association, where the key k* is determined by the subject and the value v* is optimized to select the new object o* (in this case “Paris”). As can be seen in the diagram below, in order to write the new value vector v* into the layer, a rank-one update Ŵ = W + Λ(C⁻¹k*)ᵀ is computed, with Λ chosen so that Ŵk* = v*.
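A small numerical check of this update form, using toy stand-ins for the weights W, the key statistic C, and the pair (k*, v*):

```python
# Verify that choosing Lambda appropriately makes
#   W_hat = W + Lambda (C^{-1} k*)^T   satisfy   W_hat k* = v*.
# W, C, k*, v* are toy stand-ins for the MLP projection, the key
# covariance statistic, and the optimized key/value pair.
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v = 6, 4
W = rng.normal(size=(d_v, d_k))
C = np.cov(rng.normal(size=(d_k, 100)))       # stand-in for E[k k^T]
k_star = rng.normal(size=d_k)
v_star = rng.normal(size=d_v)

u = np.linalg.solve(C, k_star)                # C^{-1} k*
Lam = (v_star - W @ k_star) / (u @ k_star)    # residual scaled to the key
W_hat = W + np.outer(Lam, u)                  # rank-one update

assert np.allclose(W_hat @ k_star, v_star)    # the new association is stored
```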

MEMIT

MEMIT builds on the observations from the causal analysis in ROME. Through causal mediation analysis, the authors observed that the middle MLP layers play a central role in factual recall. In the figure below, we see that severing the MLP layers in the middle of the network results in a significant drop in the causal effect of the hidden state on the output, indicating the significance of these MLP layers in recalling memory.


Further, MEMIT generalizes the update method to work with a batch of edits instead of a single edit. The generalized method, which relies on the classical view of linear layers as associative memories, still uses a closed-form solution to update the weights of the desired MLP modules in the network. By explicitly presenting the subject and the desired fact to be edited, the method can directly update the factual information represented by the MLP layers with minimal changes to the network, limiting undesirable effects on the model's performance. The figure below illustrates the perspective of MLPs as associative memory.
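The sketch below illustrates this associative-memory view with toy matrices. It follows the closed-form batched update reported in the paper, Δ = RK₁ᵀ(C₀ + K₁K₁ᵀ)⁻¹ with residual R = V₁ − WK₁, where C₀ estimates the (scaled) covariance of previously stored keys; all dimensions and statistics here are our own stand-ins.

```python
# Toy illustration of a linear layer as an associative memory and of a
# batched closed-form insertion in the form of the MEMIT update:
#   Delta = R K1^T (C0 + K1 K1^T)^{-1},  R = V1 - W K1,
# where C0 stands in for the (scaled) covariance of stored keys.
import numpy as np

rng = np.random.default_rng(1)
d_k, d_v, n_old, n_new = 8, 5, 32, 4

K0 = rng.normal(size=(d_k, n_old))     # keys already stored in the memory
V0 = rng.normal(size=(d_v, n_old))     # values associated with those keys
W = V0 @ np.linalg.pinv(K0)            # least-squares fit: W K0 ~ V0

K1 = rng.normal(size=(d_k, n_new))     # new keys (edited subjects)
V1 = rng.normal(size=(d_v, n_new))     # new target values (edited facts)

C0 = K0 @ K0.T                         # toy stand-in for lambda * E[k k^T]
R = V1 - W @ K1                        # residual error on the new pairs
Delta = R @ K1.T @ np.linalg.inv(C0 + K1 @ K1.T)
W_hat = W + Delta

# The closed form trades off old and new associations: error on the new
# pairs typically drops, while drift on the old keys is damped by C0.
print("new-pair error before:", np.abs(W @ K1 - V1).max())
print("new-pair error after: ", np.abs(W_hat @ K1 - V1).max())
print("drift on old keys:    ", np.abs((W_hat - W) @ K0).max())
```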


Moreover, unlike ROME, which performs all updates on a single MLP layer, MEMIT spreads them across a range of causal MLP layers in the transformer model. This is especially important for large-scale edits, because repeated updates to a single layer can turn that MLP into a bottleneck that degrades the performance of the overall model. The figure below illustrates the update process, and a sketch of the spreading schedule follows it.
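A toy sketch of this spreading schedule: scalars stand in for hidden-state vectors, and each step of the loop corresponds to the weight update MEMIT performs at one layer.

```python
# At each critical layer, only the remaining residual divided by the
# number of layers left is written, so no single layer absorbs the whole
# edit; over the range, the full edit is applied.
layers = [3, 4, 5, 6, 7, 8]       # critical MLP layers (GPT-J range in the paper)
h = 0.0                           # current hidden-state contribution
z = 1.0                           # target hidden state encoding the new fact

for i, layer in enumerate(layers):
    remaining = len(layers) - i
    step = (z - h) / remaining    # this layer's share of the residual
    h += step                     # in MEMIT, a weight update at `layer`
    print(f"layer {layer}: wrote {step:.3f}, cumulative {h:.3f}")

assert abs(h - z) < 1e-9          # the edit is fully applied across the range
```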

As a result of these spread-out updates, MEMIT scales significantly better (near-linearly) with the number of edits than other knowledge-editing methods. The scaling curves under various performance metrics are shown below.

MEMIT consistently outperforms other methods across performance metrics on the zsRE dataset, and on several metrics on the COUNTERFACT dataset.

Demo

We run the demo code interface provided by the authors of MEMIT to experiment with the editing approach on the GPT-J 6B model, running the tests on an A100 GPU. The approach performs quite well. In the example shown in the figure, we perform the edit provided with the demo:
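For reference, an edit request in the style of the authors' demo notebook looks roughly like the snippet below. The field and function names follow our reading of the repository (github.com/kmeng01/memit) and may differ from the released code; the edit itself is an illustrative stand-in for the one shown in the figure.

```python
# Shape of an edit request in the style of the MEMIT demo notebook;
# names here are assumptions and should be checked against the repo.
request = [{
    "prompt": "{} is located in the city of",  # "{}" is filled with the subject
    "subject": "The Space Needle",
    "target_new": {"str": "Paris"},            # the new object to write
}]
# The demo applies MEMIT to GPT-J and then re-queries the edited model,
# roughly: model_new, orig_weights = demo_model_editing(
#     model, tok, request, generation_prompts, alg_name="MEMIT")
```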

The resulting edit is quite accurate and generalizes across various forms of prompts that query for the same information. For example, the figures below show prompts that differ in structure from the original edit prompt. We see that the generated text is coherent and adheres to the knowledge edit we made to the model. However, as seen in the third figure, this sometimes leads the LLM to produce fictitious information.

However, the edits are not symmetric: when the subject and object are flipped in the original query, the generation does not reflect the performed edit (see figure below). A separate edit has to be requested to make this change.

We also noticed that the edits are less successful when the requested edit is conceptually close to the original fact. For example, for the edit “Jaguar is known for a luxury watch brand”, the generation is sensitive to the prompt used and only works with the exact prompt used while performing the edit, as shown below.

Social Impact

Positive Impact

Tackle Misinformation

Misinformation on the internet is one of the biggest problems of our information era. With public data from the internet being the primary source of training data for LLMs, they are no exception to the issue. Knowledge-editing methods provide a way to tackle this issue by allowing false or outdated facts to be corrected directly in the model, without expensive re-training.

Customization for Specific Needs

Knowledge editing allows for the customization of models for specific applications or user groups. For instance, an LLM can be tailored to provide more accurate and relevant information for educational purposes or for users in a particular geographical region. Moreover, it can also be used to keep LLMs relevant by dynamically updating content with the latest information.

Negative Impact

Malicious Inserts

One can insert false information into a model even when accurate information was present in the data it was built on. Previously, the expense of re-training made it difficult to alter the information stored and presented by an LLM; these approaches make it significantly easier for anyone who can gain access to a model to make inexpensive edits that can have a harmful effect on society.

Industry Application

Crowdsourcing specific factual edits instead of re-training models on the latest data. In the example below, we see that the ChatGPT service is tied to the information available in its training data, which often does not reflect current events.

Correct misinformation in public models. Since LLMs are generally trained on large amounts of user-generated data from the internet, it is possible that they learn certain misinformation or misrepresent facts. This can be harmful especially for models that are available for public use. Knowledge-editing can be very useful in these circumstances.

Content localization. Companies operating in multiple regions can use knowledge-editing to localize content, ensuring that it is culturally relevant and accurate for each market.

Academic Research

Knowledge-editing in Generative Models

Large foundation models that generate images from descriptive text share the same limitations as LLMs when it comes to representing and generating factual information. The observations behind methods such as ROME and MEMIT could also be transferred to these models, which often use a transformer as a text encoder. Such methods already exist: “Unified Concept Editing”, for example, generalizes the concepts that govern MEMIT to generative models in order to effectively edit bias, copyright, and offensive content.

Adaptation for Low-resource Languages

Due to the limited information present in the data sources of low-resource languages, LLMs often perform poorly on them, especially on knowledge-related tasks. Since methods like ROME and MEMIT provide a way to isolate knowledge in LLMs, their principles can be used to adapt high-resource language models to improve performance on low-resource languages.

Peer-Review

Surya Krishnamurthy

Summary

MEMIT introduces a scalable approach to editing factual knowledge in LLMs. Through causal analysis, it identifies a range of early MLP layers in transformers that have a causal effect on fact recall. By spreading updates to the weights across these MLP layers, it is able to effectively edit memory in LLMs.

Strengths

  • Through clever experimental design, the work identifies aspects of the network that enable fact recall, and develops a concise method to edit facts in an informed way, instead of treating it as a broad optimization problem.
  • The work addresses issues with excessive knowledge edits and proposes a method that performs well as the number of edits increases.

Weaknesses

  • Methods are implemented only for GPT-J and GPT-NeoX models. Results on transformer models outside the GPT family, especially recent open-source LLMs, are not included.
  • The knowledge representation studied in the approach is limited to directional relationships.
  • While the weight updates are performed in batch, the keys and residuals of the layers must be computed iteratively for each edit.

Questions

  • The paper builds on the Black et al. (2021) reformulation of the transformer model, which is specific to the implementations used in this paper. Does the proposed method only work with models of this form?
  • The paper uses the same symbol r for the relation tokens and the residuals. This can get confusing to the reader especially in sections like Algorithm 1. An alternate symbol could be used for the residuals for clarity.
  • Is the notation used for the key matrix K in Algorithm 1 correct? The definition provided in 4.3.ii should probably be used.

Limitations

Currently the method limits itself to facts of the form (subject, relation, object), which limits the scope of the nature of facts that can be edited. Experiments are also limited to GPT-J and GPT-NeoX models; it is not clear if the observations generalize to different architectures.

Scores

Soundness 3
Presentation 3
Contribution 4
Overall 7
Confidence 4

Karan Shah

Summary

The paper "Mass-Editing Memory in a Transformer (MEMIT)" proposes a novel method for updating large language models with new memories, scaling up to thousands of edits. MEMIT directly manipulates layer parameters of transformers, offering substantial improvements over previous methods in terms of specificity, generalization, and fluency.

Strengths

  • Scalability: MEMIT successfully scales to much larger sets of edits compared to existing approaches, handling thousands of memory updates effectively.
  • Performance on Various Metrics: Demonstrates excellent performance across multiple evaluation metrics, including efficacy, paraphrase, specificity, and fluency, outperforming baseline methods significantly.

Weaknesses

  • Complexity of Implementation: The methodology and algorithms involved are complex, which might pose challenges in practical applications or adaptations.
  • Limited Scope of Knowledge Representation: The method focuses predominantly on directional (subject-relation-object) relations, potentially limiting its applicability for more complex knowledge representations.

Questions

  • How does MEMIT handle conflicting or contradictory memory edits?
  • Is there potential for further optimization in the execution time of MEMIT, particularly in large-scale applications?
  • Could MEMIT be adapted for other types of language models beyond transformers?

Limitations

A notable limitation is its focus on directional relations, which may not cover comprehensive knowledge types like spatial/temporal reasoning or procedural knowledge.

Scores

Soundness 4
Presentation 4
Contribution 4
Overall 8
Confidence 4

References

[1] Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. Language models as knowledge bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.

[2] Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X.

[3] Vig, J., Gehrmann, S., Belinkov, Y., Qian, S., Nevo, D., Singer, Y., and Shieber, S. M. Investigating gender bias in language models using causal mediation analysis. In NeurIPS, 2020.

[4] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. Fast model editing at scale, 2021.

[5] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, 2022.

[6] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.

Team Members

Surya Krishnamurthy

Karan Shah