
Conversation

@JJJYmmm (Contributor) commented Jan 9, 2026

What does this PR do?

Fix the loading of Qwen3VLMoe experts.

test script:

from transformers import AutoProcessor, AutoModelForImageTextToText

# Loading should complete without size-mismatch warnings for the expert weights.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", torch_dtype="auto", device_map="auto")

before:

model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model:torch.Size([128, 2048, 768])  
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model:torch.Size([128, 1536, 2048])

The reason is that the official Qwen3VLMoe checkpoint stores the expert weights as [num_experts, out_features, in_features], while the current modeling code expects the last two dimensions transposed. This PR transposes them back during conversion so the shapes match.
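
As a rough sketch (not the actual conversion code in this PR), the fix amounts to swapping the last two dimensions of each expert weight during conversion, illustrated here with the down_proj shapes reported above:

import torch

# hypothetical example tensor with the checkpoint's reported down_proj shape
ckpt_down_proj = torch.empty(128, 768, 2048)
# swap the last two dimensions to match what the model expects
fixed = ckpt_down_proj.transpose(1, 2).contiguous()
print(fixed.shape)  # torch.Size([128, 2048, 768])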

The model now loads successfully after the fix. 🫡

github-actions bot (Contributor) commented Jan 9, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43201&sha=a9b5dc
