
Conversation

@JJJYmmm (Contributor) commented Jan 9, 2026

What does this PR do?

Fix the loading of Qwen3VLMoe experts.

test script:

from transformers import AutoProcessor, AutoModelForImageTextToText

# Loading should complete without size-mismatch warnings for the expert weights.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", torch_dtype="auto", device_map="auto")

before:

model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model:torch.Size([128, 2048, 768])  
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model:torch.Size([128, 1536, 2048])

The reason is that the official Qwen3VLMoe checkpoint stores the expert weights as [num_experts, out_features, in_features], while the current modeling code expects the last two dimensions transposed. This PR transposes them back during conversion so the shapes match.
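
As a rough sketch (not the actual conversion code in this PR), the fix amounts to swapping the last two dimensions of each expert weight during conversion, illustrated here with the down_proj shapes reported above:

import torch

# hypothetical example tensor with the checkpoint's reported down_proj shape
ckpt_down_proj = torch.empty(128, 768, 2048)
# swap the last two dimensions to match what the model expects
fixed = ckpt_down_proj.transpose(1, 2).contiguous()
print(fixed.shape)  # torch.Size([128, 2048, 768])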

The model now loads successfully after the fix. 🫡

github-actions bot (Contributor) commented Jan 9, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43201&sha=a9b5dc
