Optimize Vec push by preventing address escapes #150950

gerben-stavenga · 2026-01-11T02:40:08Z

This change makes RawVecInner non-generic over the allocator, allowing it to be Copy. The allocator is moved to RawVec itself. Key optimizations:

RawVecInner is now Copy (no allocator field)
grow_one uses ptr::read/ptr::write to copy allocator to a temporary, preventing &self from escaping through &dyn Allocator parameter
Drop::drop similarly copies to temporaries before deallocating
deallocate takes self by value instead of &mut self
All these functions are #[inline(always)]

This allows LLVM to keep Vec fields (cap, ptr, len) in registers during push loops instead of storing/loading from memory every iteration.

Benchmark results (push with pre-allocated capacity):

100 elements: 1.74x faster
1000 elements: 1.87x faster
10000 elements: 2.41x faster

Secondary benefit: grow_one_impl and other growth functions use &dyn Allocator, so they are compiled once in libstd rather than monomorphized per allocator type.

Preserves const compatibility with the const_heap feature by using generics for the const allocation path while using &dyn Allocator for runtime paths.
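
A minimal sketch of the layout described above, assuming a hypothetical `SimpleAlloc` trait in place of the unstable `Allocator` trait and a fixed `u64` element type. `MiniVec`, `Inner`, and `SystemAlloc` are illustrative names, not the PR's code, and the sketch leaves out the PR's ptr::read/ptr::write handling of the allocator.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::NonNull;

/// Hypothetical stand-in for the unstable `Allocator` trait (u64 elements only).
trait SimpleAlloc {
    /// Grows a block from `old_cap` to `new_cap` elements and returns the new pointer.
    unsafe fn grow(&self, ptr: *mut u64, old_cap: usize, new_cap: usize) -> *mut u64;
    /// Frees a block of `cap` elements; does nothing when `cap == 0`.
    unsafe fn free(&self, ptr: *mut u64, cap: usize);
}

struct SystemAlloc;

impl SimpleAlloc for SystemAlloc {
    unsafe fn grow(&self, ptr: *mut u64, old_cap: usize, new_cap: usize) -> *mut u64 {
        let new_layout = Layout::array::<u64>(new_cap).expect("capacity overflow");
        let raw = if old_cap == 0 {
            // SAFETY: the caller passes `new_cap > 0`, so the layout is non-zero-sized.
            unsafe { alloc(new_layout) }
        } else {
            let old_layout = Layout::array::<u64>(old_cap).expect("capacity overflow");
            // SAFETY: the caller guarantees `ptr` was allocated with `old_layout`.
            unsafe { realloc(ptr.cast(), old_layout, new_layout.size()) }
        };
        if raw.is_null() {
            handle_alloc_error(new_layout);
        }
        raw.cast()
    }

    unsafe fn free(&self, ptr: *mut u64, cap: usize) {
        if cap != 0 {
            let layout = Layout::array::<u64>(cap).expect("capacity overflow");
            // SAFETY: the caller guarantees `ptr` was allocated with this layout.
            unsafe { dealloc(ptr.cast(), layout) }
        }
    }
}

/// Copy-able (ptr, cap) pair with no allocator field.
#[derive(Clone, Copy)]
struct Inner {
    ptr: *mut u64,
    cap: usize,
}

impl Inner {
    /// Cold path: takes `self` by value and returns the updated pair.
    #[inline(never)]
    fn grow_one(self, a: &dyn SimpleAlloc) -> Inner {
        let new_cap = if self.cap == 0 { 4 } else { self.cap * 2 };
        // SAFETY: `self.ptr` and `self.cap` describe a block owned by `a` (or cap == 0).
        let ptr = unsafe { a.grow(self.ptr, self.cap, new_cap) };
        Inner { ptr, cap: new_cap }
    }
}

struct MiniVec<A: SimpleAlloc> {
    inner: Inner,
    len: usize,
    alloc: A,
}

impl<A: SimpleAlloc> MiniVec<A> {
    fn new(alloc: A) -> Self {
        let inner = Inner { ptr: NonNull::<u64>::dangling().as_ptr(), cap: 0 };
        MiniVec { inner, len: 0, alloc }
    }

    fn capacity(&self) -> usize {
        self.inner.cap
    }

    fn push(&mut self, value: u64) {
        if self.len == self.inner.cap {
            // Copy the pair out, grow by value, store the result back.
            let inner = self.inner;
            self.inner = inner.grow_one(&self.alloc);
        }
        // SAFETY: `len < cap` holds here, so the slot is inside the allocation.
        unsafe { self.inner.ptr.add(self.len).write(value) };
        self.len += 1;
    }

    fn as_slice(&self) -> &[u64] {
        // SAFETY: the first `len` elements are initialized; `ptr` is non-null and aligned.
        unsafe { std::slice::from_raw_parts(self.inner.ptr, self.len) }
    }
}

impl<A: SimpleAlloc> Drop for MiniVec<A> {
    fn drop(&mut self) {
        // SAFETY: the block (if any) was allocated by `self.alloc` with capacity `cap`.
        unsafe { self.alloc.free(self.inner.ptr, self.inner.cap) }
    }
}
```

Whether a given compiler version keeps `len`, `cap`, and `ptr` in registers for a loop over `push` on such a type is something to check in the generated assembly; the sketch only shows the by-value calling shape.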

saethlin · 2026-01-11T03:41:10Z

All these functions are #[inline(always)]

Please try to only use that attribute where it is demonstrated to be better than #[inline].

Secondary benefit: grow_one_impl and other growth functions use &dyn Allocator, so they are compiled once in libstd rather than monomorphized per allocator type.

Isn't this a penalty for small custom allocators that can be inlined?

gerben-stavenga · 2026-01-11T04:51:59Z

All these functions are #[inline(always)]

Please try to only use that attribute where it is demonstrated to be better than #[inline].

These are on xxx(&mut self) functions that forward to functions that take self by value and return self. If not always inlined the &mut self escapes a reference, producing code that is drastically worse. The inner loop in the benchmarks with this PR is:

bf330: mov %r13,(%rdx,%r13,8) ; vec[len] = len
bf334: inc %r13 ; len++
bf337: cmp %r13,%r15 ; compare with target
bf33a: je bf370 ; done if equal
bf33c: cmp %rax,%r13 ; compare with capacity
bf33f: jne bf330 ; loop back

before:

bf600: mov -0x38(%rbp),%rax ; LOAD ptr from stack
bf604: mov %r15,(%rax,%r15,8) ; vec[len] = len
bf608: inc %r15 ; len++
bf60b: mov %r15,-0x30(%rbp) ; STORE len to stack
bf60f: cmp %r15,%r14 ; compare with target
bf612: je bf630 ; done if equal
bf614: cmp -0x40(%rbp),%r15 ; LOAD capacity from stack
bf618: jne bf600 ; loop back

Secondary benefit: grow_one_impl and other growth functions use &dyn Allocator, so they are compiled once in libstd rather than monomorphized per allocator type.

Isn't this a penalty for small custom allocators that can be inlined?

I suspect there is a small penalty due to indirection (although the compiler seems to generate call reg in the direct case too). But there are also positive side effects due to code dedup. These are fallback paths, so from that perspective a tiny regression isn't the worst. The point of this PR is that the existence of a fallback path should not influence the compiler's ability to optimize the fast path and keep it clean and tight.

The &dyn Allocator parameter can be changed to a generic &A (with A: Allocator) at the cost of monomorphizing the grow function.

rustbot · 2026-01-11T05:06:19Z

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

tgross35 · 2026-01-11T05:26:18Z

@bors try @rust-timer queue

rust-timer · 2026-01-11T05:26:20Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-bors · 2026-01-11T05:26:22Z

⌛ Trying commit ac22726 with merge b00b54d…

To cancel the try build, run the command @bors try cancel.

Workflow: https://github.com/rust-lang/rust/actions/runs/20890110771

Optimize Vec push by preventing address escapes

tgross35 · 2026-01-11T05:29:07Z

Looks like the build isn't working yet
@bors try cancel

rust-bors · 2026-01-11T05:29:11Z

Try build cancelled. Cancelled workflows:

https://github.com/rust-lang/rust/actions/runs/20890110771

tgross35 · 2026-01-11T05:34:48Z

library/alloctests/benches/vec.rs

+    b.iter(|| {
+        let mut v = Vec::new();
+        for i in 0..n {
+            v.push(i);
+        }
+        black_box(v.as_slice());
+        v
+    });


You should move the constructor outside of the loop so we're not benchmarking that cost (more relevant for with_capacity). It would also be a good idea to black_box(v).push(i), in which case you don't need to do the as_slice bit.

black_box(v).push(i) would prevent the compiler optimizations we are trying to enable, by explicitly escaping the reference to v.

Moving the constructor out of the lambda also has a similar effect, because criterion's iter black_boxes the returned v.
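
The two benchmark shapes under discussion can be sketched as follows. This assumes the reviewer's `black_box(v).push(i)` means passing `&mut v` through `std::hint::black_box` on every iteration; `push_local` and `push_black_boxed` are hypothetical names.

```rust
use std::hint::black_box;

/// Shape used in the PR's benchmark: the Vec stays a plain local and is only
/// observed once, after the loop.
fn push_local(n: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        v.push(i);
    }
    black_box(v.as_slice());
    v
}

/// Shape from the review suggestion: a reference to the Vec goes through
/// `black_box` on every iteration, so the optimizer has to treat the Vec as
/// potentially observed each time.
fn push_black_boxed(n: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        black_box(&mut v).push(i);
    }
    v
}
```

Both functions build the same vector; they differ only in how much the optimizer is allowed to assume about `v` inside the loop.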

tgross35 · 2026-01-11T05:35:01Z

library/alloc/src/vec/mod.rs

 #[cfg(not(no_global_oom_handling))]
 #[rustc_const_unstable(feature = "const_heap", issue = "79597")]
-#[rustfmt::skip] // FIXME(fee1-dead): temporary measure before rustfmt is bumped
+#[rustfmt::skip]


Accidental change?

tgross35 · 2026-01-11T05:36:08Z

library/alloc/src/raw_vec/mod.rs

    /// # Safety
    ///
-    /// This function deallocates the owned allocation, but does not update `ptr` or `cap` to
-    /// prevent double-free or use-after-free. Essentially, do not do anything with the caller
-    /// after this function returns.
-    /// Ideally this function would take `self` by move, but it cannot because it exists to be
-    /// called from a `Drop` impl.
-    unsafe fn deallocate(&mut self, elem_layout: Layout) {
+    /// This function deallocates the owned allocation.
+    #[inline]
+    unsafe fn deallocate(self, elem_layout: Layout, alloc: &dyn Allocator) {


Deleted safety comments?

The safety comments were there mainly because it took &mut self. The comments even mention that it should take self but couldn't, because earlier the type wasn't Copy. Now it takes self, so the safety comments no longer make sense.

tgross35 · 2026-01-11T05:38:10Z

library/alloc/src/raw_vec/mod.rs

+                unsafe {
+                    // Make it more obvious that a subsequent Vec::reserve(capacity) will not allocate.
+                    hint::assert_unchecked(!inner.needs_to_grow(0, capacity, T::LAYOUT));
+                }


We're trying to get better about having SAFETY comments in std, please make sure to cover any new unsafe blocks.
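
As an illustration of the requested convention, here is a sketch of a SAFETY comment on a `hint::assert_unchecked` call. It uses only the public `Vec` API rather than `RawVecInner`, and `with_capacity_hinted` is a hypothetical helper, not code from the PR.

```rust
use std::hint;

/// Allocates a Vec and tells the optimizer that its capacity covers `capacity`.
fn with_capacity_hinted(capacity: usize) -> Vec<u32> {
    let v: Vec<u32> = Vec::with_capacity(capacity);
    // SAFETY: `Vec::with_capacity` documents that the returned vector can hold
    // at least `capacity` elements without reallocating, so the condition is
    // always true here.
    unsafe { hint::assert_unchecked(v.capacity() >= capacity) };
    v
}
```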

tgross35 · 2026-01-11T05:46:12Z

library/alloc/src/raw_vec/mod.rs

    pub(crate) fn grow_one(&mut self) {
-        // SAFETY: All calls on self.inner pass T::LAYOUT as the elem_layout
-        unsafe { self.inner.grow_one(T::LAYOUT) }
+        // Copy allocator to a temporary to prevent &self from escaping
+        // through the &dyn Allocator parameter, allowing LLVM to keep
+        // the Vec fields in registers.
+        let alloc = unsafe { ptr::read(&self.alloc) };
+        self.inner = self.inner.grow_one(&alloc, T::LAYOUT);
+        unsafe { ptr::write(&mut self.alloc, alloc) };
    }


Maybe I'm missing something but I don't understand this at all. How does self "escape" through &dyn Allocator? How does saving+restoring make a difference? What happens when alloc has interior mutability and you clobber it? What happens when grow_one panics on OOM and there are now two instances of A, which may impl drop, pointing to the same memory?

Is this perhaps just a trick to get around the borrow checker? If so, then perhaps something like this would work:

let inner = self.inner; inner.grow_one(self.alloc, T::LAYOUT);
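
The double-instance question can be illustrated with a small sketch. Everything here is hypothetical (`CountingAlloc`, `Holder`, `grow_naive`, `grow_guarded`), and an early return stands in for unwinding out of the grow call. The stand-in allocator's Drop only bumps a counter, so running it twice is observable but harmless in this sketch; for an allocator that owns resources it would not be.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;

/// Stand-in allocator whose Drop only counts how often it runs.
struct CountingAlloc<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for CountingAlloc<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

struct Holder<'a> {
    alloc: CountingAlloc<'a>,
}

/// Bitwise-copies the allocator into a plain local. If the function leaves
/// early, the local is dropped and the original is dropped again later.
fn grow_naive(h: &mut Holder<'_>, fail: bool) {
    // SAFETY (sketch only): `h.alloc` is valid for reads; the copy is written
    // back below on the success path.
    let tmp = unsafe { ptr::read(&h.alloc) };
    if fail {
        return; // `tmp` runs Drop here
    }
    // SAFETY: overwrites the original without dropping it, restoring one owner.
    unsafe { ptr::write(&mut h.alloc, tmp) };
}

/// Same copy, but wrapped in ManuallyDrop so the temporary never runs Drop.
fn grow_guarded(h: &mut Holder<'_>, fail: bool) {
    // SAFETY (sketch only): as above.
    let tmp = ManuallyDrop::new(unsafe { ptr::read(&h.alloc) });
    if fail {
        return; // nothing is dropped here
    }
    // SAFETY: as above.
    unsafe { ptr::write(&mut h.alloc, ManuallyDrop::into_inner(tmp)) };
}
```

The interior-mutability question raised above is separate: writing the temporary back overwrites whatever state the original accumulated in the meantime, which this sketch does not model.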

rustbot · 2026-01-11T05:46:33Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

Noratrieb · 2026-01-11T10:33:03Z

FWIW the compile time benchmarks will not say anything about the runtime perf of this change, since such large refactorings of Vec are pretty much guaranteed to have some compile time impact on crates that use vec (so every crate).
So the compile time impact of this change and the runtime impact on the vecs in the compiler will be hard to untangle.

hkBst · 2026-01-11T12:39:30Z

library/alloc/src/raw_vec/mod.rs

+        {
            handle_error(err);
        }
+        self


I don't see any changes that could warrant this function changing to no longer being unsafe, it even still has the same safety comments...

hkBst · 2026-01-11T12:44:40Z

library/alloctests/benches/vec.rs

+// ============================================================================
+// PUSH BENCHMARKS - The focus of your optimization work
+// ============================================================================


Please remove these AI comments.

hkBst · 2026-01-11T12:44:59Z

library/alloctests/benches/vec.rs

+    do_bench_push_preallocated(b, 10000);
+}
+
+// ============================================================================


saethlin · 2026-01-11T14:33:07Z

If not always inlined the &mut self escapes a reference, producing code that is drastically worse.

When you say "always inlined" are you referring to the attribute or the optimization?

tgross35 · 2026-01-11T14:38:18Z

Could you define the “escaping” term that you keep using? In Rust I’m only aware of that referring to lifetimes, which isn’t relevant for codegen.

gerben-stavenga · 2026-01-11T14:56:54Z

Could you define the “escaping” term that you keep using? In Rust I’m only aware of that referring to lifetimes, which isn’t relevant for codegen.

In compiler analysis the most important step is mem2reg, i.e. promoting stack variables to registers. The crucial requirement for this step is that no reference to the stack variable escapes. Local variables whose stack address does not escape the compiler's analysis (for example, by being passed to some other function) can easily be turned into SSA values. This allows subsequent codegen to keep those variables in registers, because there is no need to sync the registers to the stack. Currently the reference to self (len, cap, ptr) is passed to grow, and thus all subsequent codegen is polluted by unnecessary register <-> stack syncing (see the asm I posted). This PR removes the reference to stack variables from the outlined function call and instead passes and returns cap and ptr by value (i.e. in registers).
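
The two calling shapes described here can be sketched with a toy (len, cap) pair. `refill_by_ref`, `refill_by_value`, and the two loop functions are hypothetical names. Whether the optimizer actually emits different code for them depends on the compiler version and on how much it can see of the callee, so this shows only the shapes, not a guaranteed codegen difference.

```rust
/// Outlined slow path that receives the address of the caller's state.
#[inline(never)]
fn refill_by_ref(state: &mut (usize, usize)) {
    state.1 = state.1 * 2 + 1;
}

/// Outlined slow path that receives and returns the capacity by value.
#[inline(never)]
fn refill_by_value(cap: usize) -> usize {
    cap * 2 + 1
}

/// The caller's (len, cap) pair has its address passed to another function.
fn fill_by_ref(n: usize) -> (usize, usize) {
    let mut state = (0usize, 0usize); // (len, cap)
    for _ in 0..n {
        if state.0 == state.1 {
            refill_by_ref(&mut state);
        }
        state.0 += 1;
    }
    state
}

/// The caller's len and cap never have their address taken.
fn fill_by_value(n: usize) -> (usize, usize) {
    let mut len = 0usize;
    let mut cap = 0usize;
    for _ in 0..n {
        if len == cap {
            cap = refill_by_value(cap);
        }
        len += 1;
    }
    (len, cap)
}
```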

gerben-stavenga · 2026-01-11T17:54:04Z

If not always inlined the &mut self escapes a reference, producing code that is drastically worse.

When you say "always inlined" are you referring to the attribute or the optimization?

I refer to the optimization. It's crucial that functions taking &mut self are always inlined, because then compiler optimization will see that the reference can be eliminated.

gerben-stavenga · 2026-01-11T17:57:40Z

FWIW the compile time benchmarks will not say anything about the runtime perf of this change, since such large refactorings of Vec are pretty much guaranteed to have some compile time impact on crates that use vec (so every crate). So the compile time impact of this change and the runtime impact on the vecs in the compiler will be hard to untangle.

I'm not sure if I understand you. The main point of this PR is runtime performance of code. I hope the benchmarks I'm running are the benchmarks for measuring the runtime perf of the Vec implementation.

There might be a compile-time benefit, because the fallback grow function is compiled only once as part of the standard library and not, as in the current state, in each crate that uses Vec.

saethlin · 2026-01-11T18:18:22Z

I refer to the optimization. It's crucial that functions taking &mut self are always inlined, because then compiler optimization will see that the reference can be eliminated.

I do not think this is sufficient justification for inline(always). We have so many functions which would cause similar or worse optimization degradation if they weren't inlined in optimized builds. This just isn't worth the cost of the degradation to debug build times that is caused by inline(always). If #[inline] suffices, always is all cost no benefit.

tgross35 · 2026-01-11T20:00:48Z

^ to reiterate that, the rule of thumb now is that any use of #[inline(always)] needs to be backed up by benchmarks and codegen showing it makes a meaningful difference over #[inline]. #[inline(always)] hurts unoptimized builds and size-optimized binaries so we need to be very cautious with its use.

In general here, it would be helpful if you could put a mini version of the before and after code on godbolt so we can get the bigger picture of what’s actually happening at the different levels.

gerben-stavenga · 2026-01-11T20:47:43Z

^ to reiterate that, the rule of thumb now is that any use of #[inline(always)] needs to be backed up by benchmarks and codegen showing it makes a meaningful difference over #[inline]. #[inline(always)] hurts unoptimized builds and size-optimized binaries so we need to be very cautious with its use.

In general here, it would be helpful if you could put a mini version of the before and after code on godbolt so we can get the bigger picture of what’s actually happening at the different levels.

https://godbolt.org/z/nrnP4T83e

shows a rather minimal version; you can see the difference in codegen between the test functions vec_push and rf_push.

This change makes RawVecInner non-generic over the allocator, allowing it to be Copy. The allocator is moved to RawVec itself. Key optimizations:

- RawVecInner is now Copy (no allocator field)
- grow_one uses ptr::read/ptr::write to copy allocator to a temporary, preventing &self from escaping through &dyn Allocator parameter
- Drop::drop similarly copies to temporaries before deallocating
- deallocate takes self by value instead of &mut self
- All these functions are #[inline(always)]

This allows LLVM to keep Vec fields (cap, ptr, len) in registers during push loops instead of storing/loading from memory every iteration.

Benchmark results (push with pre-allocated capacity):

- 100 elements: 1.74x faster
- 1000 elements: 1.87x faster
- 10000 elements: 2.41x faster

Secondary benefit: grow_one_impl and other growth functions use &dyn Allocator, so they are compiled once in libstd rather than monomorphized per allocator type.

Preserves const compatibility with the const_heap feature by using generics for the const allocation path while using &dyn Allocator for runtime paths.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

rust-log-analyzer · 2026-01-12T01:21:09Z

The job aarch64-gnu-llvm-20-1 failed! Possible cause of the failure (guessed by this bot):

   Compiling alloc v0.0.0 (/checkout/library/alloc)
[RUSTC-TIMING] rustc_std_workspace_core test:false 0.047
[RUSTC-TIMING] core test:false 32.544
   Compiling memchr v2.7.6
error[E0277]: the trait bound `CaptureLocally<'_, A>: core::alloc::Allocator` is not satisfied
   --> library/alloc/src/raw_vec/mod.rs:207:42
    |
207 |         self.inner = self.inner.grow_one(&local_alloc, T::LAYOUT);
    |                                          ^^^^^^^^^^^^ unsatisfied trait bound
    |
help: the trait `core::alloc::Allocator` is not implemented for `CaptureLocally<'_, A>`
   --> library/alloc/src/raw_vec/mod.rs:171:1
    |
171 | struct CaptureLocally<'a, T> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = help: the following other types implement trait `core::alloc::Allocator`:
              &A
              &mut A
              Arc<T, A>
              Box<T, A>
              Rc<T, A>
              alloc::Global
    = note: required for the cast from `&CaptureLocally<'_, A>` to `&dyn core::alloc::Allocator`

[RUSTC-TIMING] libc test:false 1.965
   Compiling unwind v0.0.0 (/checkout/library/unwind)
[RUSTC-TIMING] unwind test:false 0.065
   Compiling adler2 v2.0.1
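
The E0277 error above says that a reference to the wrapper cannot be cast to a trait object because the wrapper does not implement the trait. A sketch of the usual fix, forwarding the trait through the wrapper, follows. It uses a hypothetical one-method `MiniAllocator` trait instead of `core::alloc::Allocator`, and the `inner` field is an assumption, since the log shows only the header of `CaptureLocally`.

```rust
/// Hypothetical stand-in for `core::alloc::Allocator`, reduced to one method.
trait MiniAllocator {
    fn allocate_zeroed_bytes(&self, len: usize) -> Vec<u8>;
}

struct Global;

impl MiniAllocator for Global {
    fn allocate_zeroed_bytes(&self, len: usize) -> Vec<u8> {
        vec![0; len]
    }
}

/// Wrapper named after the one in the log; the `inner` field is assumed.
struct CaptureLocally<'a, T> {
    inner: &'a T,
}

/// Without this impl, `&CaptureLocally<'_, T>` cannot be cast to
/// `&dyn MiniAllocator`, which mirrors the error in the log.
impl<T: MiniAllocator> MiniAllocator for CaptureLocally<'_, T> {
    fn allocate_zeroed_bytes(&self, len: usize) -> Vec<u8> {
        self.inner.allocate_zeroed_bytes(len)
    }
}

/// Takes the allocator as a trait object, like the PR's runtime growth paths.
fn grow_with(alloc: &dyn MiniAllocator, len: usize) -> usize {
    alloc.allocate_zeroed_bytes(len).len()
}
```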

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 11, 2026

gerben-stavenga force-pushed the vec-push-optimization branch from 132f837 to b32c1a0 Compare January 11, 2026 03:01

gerben-stavenga force-pushed the vec-push-optimization branch from b32c1a0 to 67c1768 Compare January 11, 2026 03:35

gerben-stavenga force-pushed the vec-push-optimization branch 2 times, most recently from 4050066 to 52ccbc8 Compare January 11, 2026 03:47

gerben-stavenga force-pushed the vec-push-optimization branch from 52ccbc8 to 0313271 Compare January 11, 2026 04:00

gerben-stavenga force-pushed the vec-push-optimization branch from 0313271 to 8d85a31 Compare January 11, 2026 04:07

gerben-stavenga force-pushed the vec-push-optimization branch from 8d85a31 to f02ca6c Compare January 11, 2026 04:16

gerben-stavenga force-pushed the vec-push-optimization branch from f02ca6c to 8597392 Compare January 11, 2026 04:22

gerben-stavenga force-pushed the vec-push-optimization branch from 8597392 to ac22726 Compare January 11, 2026 04:56

gerben-stavenga marked this pull request as ready for review January 11, 2026 05:06

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 11, 2026

rustbot assigned tgross35 Jan 11, 2026

rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 11, 2026

rust-bors bot added a commit that referenced this pull request Jan 11, 2026

Auto merge of #150950 - gerben-stavenga:vec-push-optimization, r=<try>

b00b54d

Optimize Vec push by preventing address escapes

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 11, 2026

tgross35 reviewed Jan 11, 2026

tgross35 requested changes Jan 11, 2026

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 11, 2026

hkBst suggested changes Jan 11, 2026

gerben-stavenga force-pushed the vec-push-optimization branch from ac22726 to 6b13ac1 Compare January 11, 2026 19:56

gerben-stavenga force-pushed the vec-push-optimization branch from 6b13ac1 to b136c1a Compare January 12, 2026 01:07
