
Conversation

@cloudhan (Contributor) commented Jan 10, 2026

What does this PR do?

This PR enables the model forward pass to record optional outputs only at specified layers. This is particularly useful for large models with long contexts when exploring the aesthetics of attention maps to design sparse attention. Without it, even a moderately sized model (say 7B) can easily OOM at a context length of only ~1k.

outputs = model.forward(input_ids, output_hidden_states=10, output_attentions=[10])

With this call, only outputs.attentions[10] is kept; the entries for the other layers are set to None to save memory.
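
For illustration, here is a minimal usage sketch built on the call above. It assumes the semantics described in this PR (an int or a list of layer indices keeps only those entries and leaves the rest as None); the model name, prompt, and final assert are illustrative, not part of this PR.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for the sketch; the PR targets much larger models with long contexts.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer("some long prompt ...", return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(
        input_ids,
        output_hidden_states=10,   # keep only layer 10 hidden states (per the PR example)
        output_attentions=[10],    # keep only layer 10 attention maps
    )

# Only the requested layer is materialized; other entries are None.
attn_10 = outputs.attentions[10]
assert all(a is None for i, a in enumerate(outputs.attentions) if i != 10)

If the semantics are as described, peak memory for these optional outputs then scales with the number of requested layers rather than the full depth of the model.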

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@cloudhan force-pushed the record-specified-layers-only branch from 678fc00 to 9f9956c on January 10, 2026 15:17
@github-actions

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43213&sha=9f9956
