<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>PyPy</title><link>https://www.pypy.org/</link><description>A Faster Python</description><atom:link href="https://www.pypy.org/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 &lt;a href="mailto:pypy-dev@pypy.org"&gt;The PyPy Team&lt;/a&gt; </copyright><lastBuildDate>Wed, 29 Apr 2026 06:51:58 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>PyPy v7.3.22 release</title><link>https://www.pypy.org/posts/2026/04/pypy-v7322-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-22-release-of-python-2-7-3-11"&gt;
&lt;h2&gt;PyPy v7.3.22: release of python 2.7, 3.11&lt;/h2&gt;
&lt;p&gt;The PyPy team is proud to release version 7.3.22 of PyPy after the previous
release on March 13, 2026. This is a bug-fix release that fixes several issues
in the JIT, including a long-standing bug that only surfaced once
some instance optimizations exposed it. We also cleaned
up many of the remaining stdlib test suite failures, which improves CPython
compatibility around line numbers in dis.dis, signatures and objclass
attributes for builtins, and other quality of life features.&lt;/p&gt;
&lt;p&gt;There is now an RPython &lt;code class="docutils literal"&gt;_pickle&lt;/code&gt; module that mirrors
the CPython one, greatly speeding up pickling operations. Where before PyPy was
5.7x slower than CPython on the pickle benchmark from the pyperformance
benchmark suite, now it is only 1.6x slower &lt;a class="brackets" href="https://www.pypy.org/posts/2026/04/pypy-v7322-release.html#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. We also added pypy
pickler extensions to dump and load lists using list strategies, and enabled
them in the &lt;code class="docutils literal"&gt;ForkingPickler&lt;/code&gt; used by multiprocessing, speeding up cases where
such objects are passed between PyPy multiprocessing instances.&lt;/p&gt;
&lt;p&gt;We also added an RPython json encoder, speeding up json_bench from being 2.6x
slower than CPython to being 0.7x (meaning faster).&lt;/p&gt;
&lt;p&gt;The release includes two different interpreters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for
backported security updates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of
Python 3.11, including the stdlib for CPython 3.11.15.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interpreters are based on much the same codebase, thus the double
release. This is a micro release; all APIs are compatible with the other 7.3
releases.&lt;/p&gt;
&lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
&lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear
about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request
to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: bug fixes,
&lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with
making RPython's JIT even better.&lt;/p&gt;
&lt;p&gt;If you are a Python library maintainer and use C-extensions, please consider
making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant
on PyPy. In any case, &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; supports building wheels for PyPy.&lt;/p&gt;
&lt;p class="rubric"&gt;Footnotes&lt;/p&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="footnote-1" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://www.pypy.org/posts/2026/04/pypy-v7322-release.html#footnote-reference-1"&gt;0&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Once &lt;a class="reference external" href="https://github.com/python/pyperformance/pull/461"&gt;a PR to pyperformance&lt;/a&gt; to use the _pickle module on PyPy is accepted&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;section id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython.
It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance
comparison) due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython
can do for them.&lt;/p&gt;
&lt;p&gt;We provide binary builds for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems
(Linux 32/64 bits, macOS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macOS (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM
32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries.
Please reach out to us if you wish to sponsor binary releases for those
platforms. Downstream packagers provide binary builds for Debian, Fedora,
conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-else-is-new"&gt;
&lt;h3&gt;What else is new?&lt;/h3&gt;
&lt;p&gt;For more information about the 7.3.22 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.22.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please update, and continue to help us make PyPy better.&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2026/04/pypy-v7322-release.html</guid><pubDate>Tue, 28 Apr 2026 10:00:00 GMT</pubDate></item><item><title>Using Claude to fix PyPy3.11 test failures securely</title><link>https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html</link><dc:creator>mattip</dc:creator><description>&lt;p&gt;I got access to Claude Max for 6 months, as a promotional move Anthropic made
to Open Source Software contributors. My main OSS impact is as a maintainer for
NumPy, but I decided to see what claude-code could do for PyPy's failing 3.11
tests. Most of these failures are edge cases: error messages that differ from
CPython, or debugging tools that fail in certain cases. I was worried about
letting an AI agent loose on my development machine. I noticed &lt;a class="reference external" href="https://patrickmccanna.net/a-better-way-to-limit-claude-code-and-other-coding-agents-access-to-secrets/"&gt;a post&lt;/a&gt; by
Patrick McCanna (thanks Patrick!) that pointed to using bubblewrap to
sandbox the agent. So I set it all up and (hopefully securely) pointed
claude-code at some tests.&lt;/p&gt;
&lt;!-- TEASER_END: Read more to find out how it went --&gt;
&lt;section id="setting-up"&gt;
&lt;h2&gt;Setting up&lt;/h2&gt;
&lt;p&gt;There were a few steps to make sure I didn't open myself up to obvious gotchas.
There are stories about agents wiping out databases, or deleting mailboxes.&lt;/p&gt;
&lt;section id="bubblewrap"&gt;
&lt;h3&gt;Bubblewrap&lt;/h3&gt;
&lt;p&gt;First I needed to see what bubblewrap does. I followed the instructions in the
blog post to set things up with some minor variations:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_f47febb4d6864475889601b51c684d32-1" name="rest_code_f47febb4d6864475889601b51c684d32-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_f47febb4d6864475889601b51c684d32-1"&gt;&lt;/a&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;bubblewrap
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I couldn't run &lt;code class="docutils literal"&gt;bwrap&lt;/code&gt;. After digging around a bit, I found I needed to add
an exception for AppArmor on Ubuntu 24.04:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-1" name="rest_code_5f45675cf72f4df29df88c4d18a23518-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-1"&gt;&lt;/a&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;bash&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'cat &amp;gt; /etc/apparmor.d/bwrap &amp;lt;&amp;lt; EOF&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-2" name="rest_code_5f45675cf72f4df29df88c4d18a23518-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-2"&gt;&lt;/a&gt;&lt;span class="s1"&gt;abi &amp;lt;abi/4.0&amp;gt;,&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-3" name="rest_code_5f45675cf72f4df29df88c4d18a23518-3" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-3"&gt;&lt;/a&gt;&lt;span class="s1"&gt;include &amp;lt;tunables/global&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-4" name="rest_code_5f45675cf72f4df29df88c4d18a23518-4" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-5" name="rest_code_5f45675cf72f4df29df88c4d18a23518-5" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-5"&gt;&lt;/a&gt;&lt;span class="s1"&gt;profile bwrap /usr/bin/bwrap flags=(unconfined) {&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-6" name="rest_code_5f45675cf72f4df29df88c4d18a23518-6" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-6"&gt;&lt;/a&gt;&lt;span class="s1"&gt;  userns,&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-7" name="rest_code_5f45675cf72f4df29df88c4d18a23518-7" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-7"&gt;&lt;/a&gt;&lt;span class="s1"&gt;}&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-8" name="rest_code_5f45675cf72f4df29df88c4d18a23518-8" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-8"&gt;&lt;/a&gt;&lt;span class="s1"&gt;EOF'&lt;/span&gt;
&lt;a id="rest_code_5f45675cf72f4df29df88c4d18a23518-9" name="rest_code_5f45675cf72f4df29df88c4d18a23518-9" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5f45675cf72f4df29df88c4d18a23518-9"&gt;&lt;/a&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apparmor_parser&lt;span class="w"&gt; &lt;/span&gt;-r&lt;span class="w"&gt; &lt;/span&gt;/etc/apparmor.d/bwrap
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then &lt;code class="docutils literal"&gt;bwrap&lt;/code&gt; would run. It is all locked down by default, so I opened up some
exceptions. The arguments are pretty self-explanatory. Ubuntu spreads the
executables around the operating system, so I needed access to various
directories. I wanted a &lt;code class="docutils literal"&gt;/tmp&lt;/code&gt; for running pytest. I also wanted the prompt
to reflect the use of bubblewrap, so changed the &lt;code class="docutils literal"&gt;hostname&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-1" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-1"&gt;&lt;/a&gt;cat&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&amp;lt; 'EOL' &amp;gt;&amp;gt; ./run_bwrap.sh&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-2" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-2"&gt;&lt;/a&gt;&lt;span class="s"&gt;  function call_bwrap() {&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-3" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-3" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-3"&gt;&lt;/a&gt;&lt;span class="s"&gt;    bwrap \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-4" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-4" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-4"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /usr /usr \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-5" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-5" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-5"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /etc /etc \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-6" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-6" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-6"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /run /run \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-7" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-7" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-7"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/lib /lib \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-8" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-8" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-8"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/lib64 /lib64 \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-9" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-9" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-9"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/bin /bin \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-10" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-10" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-10"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --proc /proc \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-11" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-11" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-11"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --dev /dev \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-12" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-12" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-12"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --bind $(pwd) $(pwd) \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-13" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-13" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-13"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --chdir $(pwd) \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-14" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-14" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-14"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --unshare-user --unshare-pid --unshare-ipc --unshare-uts --unshare-cgroup \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-15" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-15" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-15"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --die-with-parent \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-16" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-16" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-16"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --hostname bwrap \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-17" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-17" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-17"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --tmpfs /tmp \&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-18" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-18" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-18"&gt;&lt;/a&gt;&lt;span class="s"&gt;      /bin/bash "$@"&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-19" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-19" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-19"&gt;&lt;/a&gt;&lt;span class="s"&gt;  }&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-20" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-20" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-20"&gt;&lt;/a&gt;&lt;span class="s"&gt;EOL&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-21" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-21" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-21"&gt;&lt;/a&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-22" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-22" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-22"&gt;&lt;/a&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;./run_bwrap.sh
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-23" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-23" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-23"&gt;&lt;/a&gt;call_bwrap
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-24" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-24" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-24"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# now I am in a sandboxed bash shell&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-25" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-25" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-25"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# play around, try seeing other directories, getting sudo, or writing outside&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-26" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-26" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-26"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# the sandbox&lt;/span&gt;
&lt;a id="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-27" name="rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-27" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_5524b8cb3bdf46e9907a8f0b852aaec9-27"&gt;&lt;/a&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I did not pass &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;--unshare-net&lt;/span&gt;&lt;/code&gt; since, after all, I want to use claude and
that needs network access. I did add rw access to &lt;code class="docutils literal"&gt;$(pwd)&lt;/code&gt; since I want it to
edit code in the current directory; that is the whole point.&lt;/p&gt;
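&lt;p&gt;A minimal sketch of how those boundaries could be checked from inside the
sandboxed shell. The &lt;code class="docutils literal"&gt;can_write&lt;/code&gt; helper is a hypothetical name, not part of
bubblewrap. It relies only on the shell's &lt;code class="docutils literal"&gt;test &lt;span class="pre"&gt;-w&lt;/span&gt;&lt;/code&gt;, which should report
read-only bind mounts as not writable:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;# Hypothetical helper, not part of bubblewrap: report whether this process
# may write to a path. test -w is false for missing paths and should also
# be false for read-only bind mounts.
can_write() {
  if [ -w "$1" ]; then
    echo "writable: $1"
  else
    echo "not writable: $1"
  fi
}

# Run inside the sandboxed shell started by call_bwrap.
can_write /usr       # expected inside the sandbox: not writable (--ro-bind)
can_write "$(pwd)"   # expected inside the sandbox: writable (--bind)
can_write /tmp       # expected inside the sandbox: writable (--tmpfs)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the same three calls outside the sandbox gives a baseline to compare
against.&lt;/p&gt;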
&lt;/section&gt;
&lt;section id="basic-claude"&gt;
&lt;h3&gt;Basic claude&lt;/h3&gt;
&lt;p&gt;After trying out bubblewrap and convincing myself it does actually work, I
installed claude code:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_c22e562bdb0048bf8a23da9d718e3cf7-1" name="rest_code_c22e562bdb0048bf8a23da9d718e3cf7-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_c22e562bdb0048bf8a23da9d718e3cf7-1"&gt;&lt;/a&gt;curl&lt;span class="w"&gt; &lt;/span&gt;-fsSL&lt;span class="w"&gt; &lt;/span&gt;https://claude.ai/install.sh&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bash
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Really Anthropic, this is the best way to install claude? No dpkg?&lt;/p&gt;
&lt;p&gt;I ran claude once (unsafely) to get logged in. It opened a webpage, and saved
the login to the &lt;code class="docutils literal"&gt;oauthAccount&lt;/code&gt; field in &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.claude.json&lt;/span&gt;&lt;/code&gt;. I then changed my
bash script to the following so that claude runs inside the bubblewrap sandbox:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-1" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-1"&gt;&lt;/a&gt;cat&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&amp;lt; 'EOL' &amp;gt;&amp;gt; ./run_claude.sh&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-2" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-2"&gt;&lt;/a&gt;&lt;span class="s"&gt;  claude-safe() {&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-3" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-3" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-3"&gt;&lt;/a&gt;&lt;span class="s"&gt;    bwrap \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-4" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-4" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-4"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /usr /usr \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-5" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-5" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-5"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /etc /etc \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-6" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-6" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-6"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind /run /run \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-7" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-7" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-7"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --ro-bind "$HOME/.local/share/claude" "$HOME/.local/share/claude" \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-8" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-8" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-8"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/lib /lib \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-9" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-9" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-9"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/lib64 /lib64 \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-10" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-10" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-10"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink usr/bin /bin \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-11" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-11" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-11"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --symlink "$HOME/.local/share/claude/versions/2.1.81" "$HOME/.local/bin/claude" \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-12" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-12" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-12"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --proc /proc \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-13" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-13" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-13"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --dev /dev \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-14" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-14" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-14"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --bind $(pwd) $(pwd) \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-15" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-15" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-15"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --bind "$HOME/.claude" "$HOME/.claude" \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-16" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-16" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-16"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --bind "$HOME/.claude.json" "$HOME/.claude.json" \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-17" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-17" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-17"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --chdir $(pwd) \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-18" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-18" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-18"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --unshare-user --unshare-pid --unshare-ipc --unshare-uts --unshare-cgroup \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-19" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-19" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-19"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --die-with-parent \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-20" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-20" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-20"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --hostname bwrap \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-21" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-21" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-21"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --tmpfs /tmp \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-22" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-22" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-22"&gt;&lt;/a&gt;&lt;span class="s"&gt;      --setenv PATH "$HOME/.local/bin:$PATH" \&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-23" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-23" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-23"&gt;&lt;/a&gt;&lt;span class="s"&gt;      claude "$@"&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-24" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-24" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-24"&gt;&lt;/a&gt;&lt;span class="s"&gt;  }&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-25" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-25" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-25"&gt;&lt;/a&gt;&lt;span class="s"&gt;EOL&lt;/span&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-26" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-26" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-26"&gt;&lt;/a&gt;
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-27" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-27" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-27"&gt;&lt;/a&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;./run_claude.sh
&lt;a id="rest_code_0bd501ad66b8401c94a09abb0965f1bb-28" name="rest_code_0bd501ad66b8401c94a09abb0965f1bb-28" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_0bd501ad66b8401c94a09abb0965f1bb-28"&gt;&lt;/a&gt;claude-safe
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now I can use claude. Note it needs some more directories in order to run. This
script hard-codes the version, so YMMV in the future. I want it to be able to look
at github, and also at my local checkout of cpython so it can examine differences.
I created a read-only token by clicking on my avatar in the upper right corner
of a github web page, then going to Settings → Developer settings → Personal
access tokens → Fine-grained tokens → Generate new token. Since pypy is in the
pypy org, I used "Repository owner: pypy", "Repository access: pypy (only)" and
"Permissions: Contents". Then I made doubly sure the token permissions were
read-only. And checked again. Then I copied the token to the bash script. I
also added a &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ro-bind&lt;/span&gt;&lt;/code&gt; to the cpython checkout, so I could tell claude code
where to look for CPython implementations of missing PyPy functionality.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_47539cea8f774219b23b06f75d181e00-1" name="rest_code_47539cea8f774219b23b06f75d181e00-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_47539cea8f774219b23b06f75d181e00-1"&gt;&lt;/a&gt;--ro-bind&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/oss/cpython"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/oss/cpython"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;a id="rest_code_47539cea8f774219b23b06f75d181e00-2" name="rest_code_47539cea8f774219b23b06f75d181e00-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_47539cea8f774219b23b06f75d181e00-2"&gt;&lt;/a&gt;--setenv&lt;span class="w"&gt; &lt;/span&gt;GH_TOKEN&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hah, sharing my token would not have been smart"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="claude-sandbox"&gt;
&lt;h3&gt;Claude /sandbox&lt;/h3&gt;
&lt;p&gt;Claude comes with its own sandbox, configured by using the &lt;code class="docutils literal"&gt;/sandbox&lt;/code&gt; command.
I chose the defaults, which prevent malicious code in the repo from accessing
the file system and the network. I was missing some packages to get this to
work. Claude would hang until I installed them, and I needed to kill it with
&lt;code class="docutils literal"&gt;kill&lt;/code&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_b1f99dca990e460a90c201cd7ffdd313-1" name="rest_code_b1f99dca990e460a90c201cd7ffdd313-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_b1f99dca990e460a90c201cd7ffdd313-1"&gt;&lt;/a&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;socat
&lt;a id="rest_code_b1f99dca990e460a90c201cd7ffdd313-2" name="rest_code_b1f99dca990e460a90c201cd7ffdd313-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_b1f99dca990e460a90c201cd7ffdd313-2"&gt;&lt;/a&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;npm&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-g&lt;span class="w"&gt; &lt;/span&gt;@anthropic-ai/sandbox-runtime
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="final-touches"&gt;
&lt;h3&gt;Final touches&lt;/h3&gt;
&lt;p&gt;One last thing that I discovered later: I needed to give claude access to some
grepping and git tools. While git should be locked down externally so it
cannot push to the repo, I do want claude to look at other issues and pull
requests in read-only mode. So I added a local &lt;code class="docutils literal"&gt;.claude/settings.json&lt;/code&gt; file
inside the repo (see below for which directory to do this):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code json"&gt;&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-1" name="rest_code_92d0148fbe964c40bce913ea1b64c188-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-1"&gt;&lt;/a&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-2" name="rest_code_92d0148fbe964c40bce913ea1b64c188-2" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-2"&gt;&lt;/a&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-3" name="rest_code_92d0148fbe964c40bce913ea1b64c188-3" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-3"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-4" name="rest_code_92d0148fbe964c40bce913ea1b64c188-4" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-4"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(sed*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-5" name="rest_code_92d0148fbe964c40bce913ea1b64c188-5" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-5"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(grep*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-6" name="rest_code_92d0148fbe964c40bce913ea1b64c188-6" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-6"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(cat*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-7" name="rest_code_92d0148fbe964c40bce913ea1b64c188-7" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-7"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(find*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-8" name="rest_code_92d0148fbe964c40bce913ea1b64c188-8" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-8"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rg*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-9" name="rest_code_92d0148fbe964c40bce913ea1b64c188-9" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-9"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(python*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-10" name="rest_code_92d0148fbe964c40bce913ea1b64c188-10" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-10"&gt;&lt;/a&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pytest*)"&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-11" name="rest_code_92d0148fbe964c40bce913ea1b64c188-11" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-11"&gt;&lt;/a&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-12" name="rest_code_92d0148fbe964c40bce913ea1b64c188-12" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-12"&gt;&lt;/a&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_92d0148fbe964c40bce913ea1b64c188-13" name="rest_code_92d0148fbe964c40bce913ea1b64c188-13" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_92d0148fbe964c40bce913ea1b64c188-13"&gt;&lt;/a&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I made git ignore it, even when doing a &lt;code class="docutils literal"&gt;git clean&lt;/code&gt;, in a local (not part
of the repo) configuration:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code bash"&gt;&lt;a id="rest_code_664806b3e81f4a8f8e845388ffd01fd6-1" name="rest_code_664806b3e81f4a8f8e845388ffd01fd6-1" href="https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html#rest_code_664806b3e81f4a8f8e845388ffd01fd6-1"&gt;&lt;/a&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;.claude&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;~/.config/git/ignore
&lt;/pre&gt;&lt;/div&gt;
&lt;/section&gt;
&lt;section id="what-about-git-push"&gt;
&lt;h3&gt;What about &lt;code class="docutils literal"&gt;git push&lt;/code&gt;?&lt;/h3&gt;
&lt;p&gt;I don't want claude messing around with the upstream repo; it should only have read access. But
I did not actively prevent &lt;code class="docutils literal"&gt;git push&lt;/code&gt;. So instead of using my actual pypy
repo, I cloned it to a separate directory and did not add a remote pointing to
github.com.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="fixing-tests-easy"&gt;
&lt;h2&gt;Fixing tests - easy&lt;/h2&gt;
&lt;p&gt;Now that everything is set up (I hope I remembered everything), I could start
asking questions. The technique I chose was to feed claude the whole test
failure from the buildbot. So starting from the &lt;a class="reference external" href="https://buildbot.pypy.org/summary?branch=py3.11"&gt;buildbot py3.11 summary&lt;/a&gt;,
I clicked on one of the &lt;code class="docutils literal"&gt;F&lt;/code&gt; links and copy-pasted all of that into the claude prompt.
It didn't take long for claude to come up with solutions for the long-standing
&lt;a class="reference external" href="https://github.com/pypy/pypy/commit/9e8e121b545dbea3f26ca436ae8a797617904306#diff-ab042b3dd16bf22b7e3d8595f182ad39d3823d76b414da7debe96081a884d16bR64-R330"&gt;ctype error missing exception&lt;/a&gt;, which turned out to be due to a missing error
trap when already handling an error.&lt;/p&gt;
&lt;p&gt;Also a &lt;a class="reference external" href="https://github.com/pypy/pypy/commit/9e8e121b545dbea3f26ca436ae8a797617904306#diff-ab042b3dd16bf22b7e3d8595f182ad39d3823d76b414da7debe96081a884d16bR64-R53"&gt;CTYPES_MAX_ARGCOUNT check&lt;/a&gt; was
missing. At first, claude wanted to change the ctypes code from CPython's stdlib,
and so I had to make it clear that claude was not to touch the files in
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;lib-python&lt;/span&gt;&lt;/code&gt;. They are copied verbatim from CPython and should not be
modified without really good reasons.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/pypy/pypy/commit/39ca7a1def272742e8aafd2a649ed4f8fed7038d"&gt;fix to raise&lt;/a&gt; &lt;code class="docutils literal"&gt;TypeError&lt;/code&gt; rather
than &lt;code class="docutils literal"&gt;AttributeError&lt;/code&gt; when deleting a ctypes object's &lt;code class="docutils literal"&gt;value&lt;/code&gt; was maybe a little
trickier: claude needed to create its own &lt;code class="docutils literal"&gt;property&lt;/code&gt; class and use it in
assignments.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/pypy/pypy/commit/e0e401699c20a92d8db657879183c68ea44246b4"&gt;fix for a failing test&lt;/a&gt; for a correct &lt;code class="docutils literal"&gt;repr&lt;/code&gt; of a ctypes array was a
little more involved.  Claude needed to figure out that &lt;code class="docutils literal"&gt;newmemoryview&lt;/code&gt; was
raising an exception, dive into the RPython implementation and fix the problem,
and then also fix a pure-python &lt;code class="docutils literal"&gt;__buffer__&lt;/code&gt; shape edge case error.&lt;/p&gt;
&lt;p&gt;There were more, but you get the idea. With a little bit of coaching, and by showing
claude where the CPython implementation was, more tests are now passing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="fixing-tests-harder"&gt;
&lt;h2&gt;Fixing tests - harder&lt;/h2&gt;
&lt;p&gt;PyPy has an HPy backend. There were some test failures that were
easy to fix (a handle not being closed, an annotation warning). But the big one
was a problem with the context tracking before and after ffi function calls. In
debug mode there is a check that the ffi call is done using the correct HPy
context. It turns out to be tricky to hang on to a reference to a context in
RPython since the context RPython object is pre-built. The solution, which took
quite a few tokens and translation cycles to work out, was to assign the
context on the C level, and have a getter to fish it out in RPython.&lt;/p&gt;
&lt;section id="conclusion"&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;I started this journey not more than 24 hours ago, after some successful
sessions using claude to refactor some web sites off hosting platforms and make
them static pages. I was impressed enough to try coding with it from the
terminal. It helps that I was given a generous budget to use Anthropic's tool.&lt;/p&gt;
&lt;p&gt;Claude seems capable of understanding the layers of PyPy: from the pure python
stdlib to RPython and into the small amount of C code. I even asked it to
examine a &lt;a class="reference external" href="https://github.com/pypy/pypy/issues/5398"&gt;segfault&lt;/a&gt; in the recently released PyPy 7.3.21, and it seems to have
found the general area where there was a latent bug in the JIT.&lt;/p&gt;
&lt;p&gt;Like any tool, agentic programming must be used carefully to make sure it
cannot do damage. I hope I closed the most obvious foot-guns. If you have other
ideas of things I should do to protect myself while using an agent like this, I
would love to hear about them.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>AI</category><guid>https://www.pypy.org/posts/2026/03/using-claude-to-fix-pypy311-test-failures-securely.html</guid><pubDate>Mon, 23 Mar 2026 10:27:55 GMT</pubDate></item><item><title>PyPy v7.3.21 release</title><link>https://www.pypy.org/posts/2026/03/pypy-v7321-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-21-release-of-python-2-7-3-11"&gt;
&lt;h2&gt;PyPy v7.3.21: release of python 2.7, 3.11&lt;/h2&gt;
&lt;aside class="admonition warning"&gt;
&lt;p class="admonition-title"&gt;Warning&lt;/p&gt;
&lt;p&gt;This release has some known crashes. We recommend you use a different version.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The PyPy team is proud to release version 7.3.21 of PyPy after the previous
release on July 4, 2025. This is a bug-fix release that also updates to Python
3.11.15.&lt;/p&gt;
&lt;p&gt;The release includes two different interpreters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for
backported security updates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of
Python 3.11, including the stdlib for CPython 3.11.15.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interpreters are based on much the same codebase, thus the double
release. This is a micro release; all APIs are compatible with the other 7.3
releases.&lt;/p&gt;
&lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
&lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear
about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request
to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: bug fixes,
&lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with
making RPython's JIT even better.&lt;/p&gt;
&lt;p&gt;If you are a python library maintainer and use C-extensions, please consider
making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant
on PyPy. In any case, &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; supports building wheels for PyPy.&lt;/p&gt;
&lt;section id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython.
It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance
comparison) due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython
can do for them.&lt;/p&gt;
&lt;p&gt;We provide binary builds for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems
(Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macOS (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM
32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries.
Please reach out to us if you wish to sponsor binary releases for those
platforms. Downstream packagers provide binary builds for debian, Fedora,
conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-else-is-new"&gt;
&lt;h3&gt;What else is new?&lt;/h3&gt;
&lt;p&gt;For more information about the 7.3.21 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.21.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2026/03/pypy-v7321-release.html</guid><pubDate>Fri, 13 Mar 2026 10:00:00 GMT</pubDate></item><item><title>Load and store forwarding in the Toy Optimizer</title><link>https://www.pypy.org/posts/2025/12/toy-load-store.html</link><dc:creator>Max Bernstein</dc:creator><description>&lt;p&gt;This is a &lt;a href="https://bernsteinbear.com/blog/toy-load-store/" rel="canonical"&gt;cross-post&lt;/a&gt; from Max Bernstein from his blog where he writes
about programming languages, compilers, optimizations, virtual machines.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A long, long time ago (two years!) &lt;a href="https://cfbolz.de/"&gt;CF Bolz-Tereick&lt;/a&gt; and I made a &lt;a href="https://www.youtube.com/watch?v=w-UHg0yOPSE"&gt;video&lt;/a&gt;
and an accompanying &lt;a href="https://gist.github.com/tekknolagi/4e3fa26d350f6d3b39ede40d372b97fe"&gt;GitHub Gist&lt;/a&gt;
about load/store forwarding (also called load elimination) in the Toy Optimizer. I
said I would write a blog post about it, but never found the time—it got lost
amid a sea of large life changes.&lt;/p&gt;
&lt;p&gt;It's a neat idea: do an abstract interpretation over the trace, modeling the
heap at compile-time, eliminating redundant loads and stores. That means it's
possible to optimize traces like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;into traces like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;v5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(where &lt;code&gt;load(v0, 5)&lt;/code&gt; is equivalent to &lt;code&gt;*(v0+5)&lt;/code&gt; in C syntax and &lt;code&gt;store(v0, 6,
123)&lt;/code&gt; is equivalent to &lt;code&gt;*(v0+6)=123&lt;/code&gt; in C syntax)&lt;/p&gt;
&lt;p&gt;This indicates that we were able to eliminate two redundant loads by keeping
around information about previous loads and stores. Let's get to work making
this possible.&lt;/p&gt;
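&lt;p&gt;As a sanity check on the claim above, here is a minimal sketch that runs both traces against a concrete memory, modeled as a Python dict keyed by &lt;code&gt;(base, offset)&lt;/code&gt;. The helpers &lt;code&gt;load&lt;/code&gt;, &lt;code&gt;store&lt;/code&gt;, &lt;code&gt;run_original&lt;/code&gt;, and &lt;code&gt;run_optimized&lt;/code&gt; are made up for illustration and are not part of the Toy Optimizer; the base address and the initial memory contents are arbitrary assumptions.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
# Illustrative only: a concrete memory modeled as a dict keyed by (base, offset).
def load(memory, base, offset):
    return memory[(base, offset)]


def store(memory, base, offset, value):
    memory[(base, offset)] = value


def run_original(memory, v0):
    v1 = load(memory, v0, 5)
    store(memory, v0, 6, 123)
    v3 = load(memory, v0, 6)  # reads back the value that was just stored
    v4 = load(memory, v0, 5)  # re-reads a slot that no store has touched
    return (v1, v3, v4)       # the arguments passed to do_something


def run_optimized(memory, v0):
    v1 = load(memory, v0, 5)
    store(memory, v0, 6, 123)
    return (v1, 123, v1)      # both later loads replaced by known values


# Hypothetical starting state: base address 1000 with two populated slots.
initial = {(1000, 5): 42, (1000, 6): 7}
original_args = run_original(dict(initial), 1000)
optimized_args = run_optimized(dict(initial), 1000)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both functions hand the same three arguments to &lt;code&gt;do_something&lt;/code&gt;. In this sketch that works because the store writes offset 6 of the same base object, so it cannot change what sits at offset 5.&lt;/p&gt;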
&lt;h3 id="the-usual-infrastructure"&gt;The usual infrastructure&lt;/h3&gt;
&lt;p&gt;We'll start off with the usual infrastructure from the &lt;a href="https://pypy.org/categories/toy-optimizer.html"&gt;Toy
Optimizer series&lt;/a&gt;: a very stringly-typed representation of a
&lt;a href="https://gist.github.com/tekknolagi/4e3fa26d350f6d3b39ede40d372b97fe#file-port-py-L4-L112"&gt;trace-based SSA IR&lt;/a&gt; and a union-find rewrite mechanism.&lt;/p&gt;
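&lt;p&gt;For readers who have not seen the earlier posts, the union-find rewrite idea can be sketched roughly as follows. This is a simplified stand-in, not the code from the Gist: the class name &lt;code&gt;Value&lt;/code&gt; and its fields are assumptions chosen for the example.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```python
# Hypothetical stand-in for the series' IR values; not the Gist's actual code.
class Value:
    def __init__(self, name):
        self.name = name
        self.forwarded = None  # next value in the forwarding chain, if any

    def find(self):
        # Follow forwarding pointers until we reach the representative value.
        value = self
        while value.forwarded is not None:
            value = value.forwarded
        return value

    def make_equal_to(self, other):
        # Rewrite: uses of self's representative now resolve to other's.
        # Skipped when both already share a representative, to avoid a cycle.
        root = self.find()
        target = other.find()
        if root is not target:
            root.forwarded = target


a, b, c = Value("a"), Value("b"), Value("c")
b.make_equal_to(a)  # b is now an alias of a
c.make_equal_to(b)  # c resolves through b to a
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After these two calls, &lt;code&gt;c.find()&lt;/code&gt; and &lt;code&gt;b.find()&lt;/code&gt; both return &lt;code&gt;a&lt;/code&gt;, which is how an optimization pass can replace one operation with another without rewriting every use by hand.&lt;/p&gt;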
&lt;p&gt;This means we can start writing some new optimization pass and our first test:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# TODO: copy an optimized version of bb into opt_bb&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_two_loads&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;var0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = load(var0, 0)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = escape(var1)&lt;/span&gt;
&lt;span class="s2"&gt;var3 = escape(var1)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This test is asserting that we can remove duplicate loads. Why load twice if we
can cache the result? Let's make that happen.&lt;/p&gt;
&lt;h3 id="caching-loads"&gt;Caching loads&lt;/h3&gt;
&lt;p&gt;To do this, we'll model the heap at compile-time. When I say "model", I
mean that we will have an imprecise but correct abstract representation of the
heap: we don't (and can't) have knowledge of every value, but we can know for
sure that some addresses have certain values.&lt;/p&gt;
&lt;p&gt;For example, if we have observed a load from object &lt;em&gt;O&lt;/em&gt; at offset &lt;em&gt;8&lt;/em&gt;, as in
&lt;code&gt;v0 = load(O, 8)&lt;/code&gt;, we know that the SSA value &lt;code&gt;v0&lt;/code&gt; is at &lt;code&gt;heap[(O, 8)]&lt;/code&gt;. That sounds
tautological, but it's not. Future loads can make use of this information.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Operation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Constant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Stores things we know about the heap at... compile-time.&lt;/span&gt;
    &lt;span class="c1"&gt;# Key: an object and an offset pair acting as a heap address&lt;/span&gt;
    &lt;span class="c1"&gt;# Value: a previous SSA value we know exists at that address&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;load_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;previous&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_equal_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This pass records information about loads and uses the result of a previous
cached load operation if available. We treat the pair of (SSA value, offset) as
an address into our abstract heap.&lt;/p&gt;
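&lt;p&gt;If you want to poke at the idea without the rest of the toy optimizer, here
is a minimal standalone sketch. It assumes a made-up IR of plain tuples
&lt;code&gt;(result, name, args)&lt;/code&gt; instead of the &lt;code&gt;Block&lt;/code&gt; and &lt;code&gt;Operation&lt;/code&gt; classes, and a
hypothetical &lt;code&gt;replaced&lt;/code&gt; dictionary standing in for &lt;code&gt;make_equal_to&lt;/code&gt;. The
function name &lt;code&gt;cache_loads&lt;/code&gt; is also invented for this illustration.&lt;/p&gt;

```python
# Toy stand-in IR (an assumption for this sketch): each operation is a tuple
# (result_name, op_name, args). Strings in args name earlier results; ints are
# constants.

def cache_loads(ops):
    heap = {}      # (object name, offset) maps to the value known to live there
    replaced = {}  # removed load result maps to the earlier value it equals
    out = []
    for result, name, args in ops:
        # Point arguments at the surviving value if their load was removed.
        args = tuple(replaced.get(arg, arg) for arg in args)
        if name == "load":
            key = (args[0], args[1])
            if key in heap:
                # Same abstract address seen before: reuse the earlier value.
                replaced[result] = heap[key]
                continue
            heap[key] = result
        out.append((result, name, args))
    return out


ops = [
    ("v0", "getarg", (0,)),
    ("v1", "load", ("v0", 0)),
    ("v2", "load", ("v0", 0)),
    ("v3", "escape", ("v1",)),
    ("v4", "escape", ("v2",)),
]
optimized = cache_loads(ops)
for op in optimized:
    print(op)
```

&lt;p&gt;Unlike &lt;code&gt;bb_to_str&lt;/code&gt;, this sketch does not renumber the remaining operations,
so the two escapes keep their original names and both end up reading &lt;code&gt;v1&lt;/code&gt;.&lt;/p&gt;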
&lt;p&gt;That's great! If you run our simple test, it should now pass. But what happens
if we store into that address before the second load? Oops...&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_store_to_same_object_offset_invalidates_load&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;var0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = load(var0, 0)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = store(var0, 0, 5)&lt;/span&gt;
&lt;span class="s2"&gt;var3 = load(var0, 0)&lt;/span&gt;
&lt;span class="s2"&gt;var4 = escape(var1)&lt;/span&gt;
&lt;span class="s2"&gt;var5 = escape(var3)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This test fails because we are incorrectly keeping around &lt;code&gt;var1&lt;/code&gt; in our
abstract heap. We need to get rid of it and not replace &lt;code&gt;var3&lt;/code&gt; with &lt;code&gt;var1&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id="invalidating-cached-loads"&gt;Invalidating cached loads&lt;/h3&gt;
&lt;p&gt;So it turns out we have to also model stores in order to cache loads correctly.
One valid, albeit aggressive, way to do that is to throw away all the
information we know at each store operation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That makes our test pass—yay!—but at great cost. It means any store
operation mucks up redundant loads. In our world where we frequently read from
and write to objects, this is what we call a huge bummer.&lt;/p&gt;
&lt;p&gt;For example, a store to offset 4 on some object should never interfere with a
load from a different offset on the same object&lt;sup id="fnref:size"&gt;&lt;a class="footnote-ref" href="https://www.pypy.org/posts/2025/12/toy-load-store.html#fn:size"&gt;1&lt;/a&gt;&lt;/sup&gt;. We should be able to
keep our load from offset 0 cached here:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_store_to_same_object_different_offset_does_not_invalidate_load&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;var0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = load(var0, 0)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = store(var0, 4, 5)&lt;/span&gt;
&lt;span class="s2"&gt;var3 = escape(var1)&lt;/span&gt;
&lt;span class="s2"&gt;var4 = escape(var1)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We could try instead checking if our specific (object, offset) pair is in the
heap and only removing cached information about that offset and that object.
That would definitely help!&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;load_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;load_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It makes our test pass, too, which is great news.&lt;/p&gt;
&lt;p&gt;Unfortunately, this runs into problems due to aliasing: it's entirely possible
that our compile-time heap could contain a pair &lt;code&gt;(v0, 0)&lt;/code&gt; and a pair &lt;code&gt;(v1, 0)&lt;/code&gt; where &lt;code&gt;v0&lt;/code&gt;
and &lt;code&gt;v1&lt;/code&gt; are the same object (but not known to the optimizer). Then we might
run into a situation where we incorrectly cache loads because the optimizer
doesn't know our abstract addresses &lt;code&gt;(v0, 0)&lt;/code&gt; and &lt;code&gt;(v1, 0)&lt;/code&gt; are actually the
same pointer at run-time.&lt;/p&gt;
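&lt;p&gt;To make the failure concrete, here is a small simulation of both views at
once. It is only a sketch: the run-time object is modeled as a Python
dictionary from offset to value, and the names &lt;code&gt;env&lt;/code&gt;, &lt;code&gt;runtime_values&lt;/code&gt;, and
&lt;code&gt;x&lt;/code&gt; are invented for the illustration.&lt;/p&gt;

```python
# Run-time view: one real object, reachable through two SSA names.
obj = {0: 1}                  # the field at offset 0 starts out as 1
env = {"v0": obj, "v1": obj}  # v0 and v1 alias, which the optimizer cannot see

# Compile-time view after `x = load(v0, 0)`.
compile_time_heap = {("v0", 0): "x"}
runtime_values = {"x": env["v0"][0]}

# store(v1, 0, 5): exact-match invalidation only looks for the key ("v1", 0),
# so the entry for ("v0", 0) survives.
compile_time_heap.pop(("v1", 0), None)
env["v1"][0] = 5

# y = load(v0, 0): the optimizer still finds ("v0", 0) and reuses x.
cached_name = compile_time_heap.get(("v0", 0))
value_from_cache = runtime_values[cached_name]
value_in_memory = env["v0"][0]

print("optimizer would use:", value_from_cache)
print("memory actually holds:", value_in_memory)
```

&lt;p&gt;The cached value and the value in memory disagree, so replacing the second
load with &lt;code&gt;x&lt;/code&gt; would change what the program does.&lt;/p&gt;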
&lt;p&gt;This means that we are breaking abstract interpretation rules: our abstract
interpreter has to correctly model &lt;em&gt;all&lt;/em&gt; possible outcomes at run-time. So
we should instead pick some tactic in between clearing all
information (correct but over-eager) and clearing only exact matches of
object+offset (incorrect).&lt;/p&gt;
&lt;p&gt;The concept that will help us here is the &lt;em&gt;alias class&lt;/em&gt;: a
way to efficiently partition objects in your abstract heap into completely
disjoint sets. Writes to any object in one class never affect objects in
another class.&lt;/p&gt;
&lt;p&gt;Our very scrappy alias classes will be just based on the offset: each offset is
a different alias class. If we write to any object at offset K, we have to
invalidate all of our compile-time offset K knowledge—even if it's for
another object. This is a nice middle ground, and it's possible because our
(made up) object system guarantees that distinct objects do not overlap, and
also that we are not writing out-of-bounds.&lt;sup id="fnref:tbaa"&gt;&lt;a class="footnote-ref" href="https://www.pypy.org/posts/2025/12/toy-load-store.html#fn:tbaa"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;So let's remove all of the entries from &lt;code&gt;compile_time_heap&lt;/code&gt; where the offset
matches the offset in the current &lt;code&gt;store&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Great! Now our test passes.&lt;/p&gt;
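&lt;p&gt;To see the alias-class rule in isolation, here is a tiny sketch that pulls
the invalidation step out into a helper. The name &lt;code&gt;invalidate_offset&lt;/code&gt; and the
use of plain strings as SSA values are assumptions made for the example.&lt;/p&gt;

```python
def invalidate_offset(heap, offset):
    # Keep only the entries whose offset differs from the one being written.
    # The object part of the key is deliberately ignored: any two objects
    # might alias, so every entry at this offset is suspect.
    return {key: value for key, value in heap.items() if key[1] != offset}


heap = {
    ("v0", 0): "a",
    ("v1", 0): "b",
    ("v0", 8): "c",
}

# A store to offset 0 through any object forgets everything known at offset 0.
after = invalidate_offset(heap, 0)
print(after)
```

&lt;p&gt;Only the offset 8 entry survives. Both offset 0 entries are dropped, even
though only one of them can have been the direct target of the store.&lt;/p&gt;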
&lt;p&gt;This concludes the load optimization section of the post. We have modeled
enough of loads and stores that we can eliminate redundant loads. Very cool.
But we can go further.&lt;/p&gt;
&lt;h3 id="caching-stores"&gt;Caching stores&lt;/h3&gt;
&lt;p&gt;Stores don't just invalidate information. They also give us new information!
Any time we see an operation of the form &lt;code&gt;v1 = store(v0, 8, 5)&lt;/code&gt; we also learn
that &lt;code&gt;load(v0, 8) == 5&lt;/code&gt;! Until it gets invalidated, anyway.&lt;/p&gt;
&lt;p&gt;For example, in this test, we can eliminate the load from &lt;code&gt;var0&lt;/code&gt; at offset 0:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_load_after_store_removed&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;var0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = store(var0, 0, 5)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = load(var0, 1)&lt;/span&gt;
&lt;span class="s2"&gt;var3 = escape(5)&lt;/span&gt;
&lt;span class="s2"&gt;var4 = escape(var2)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Making that work is thankfully not very hard; we need only add that new
information to the compile-time heap after removing all the
potentially-aliased info:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="c1"&gt;# ... as before ...&lt;/span&gt;
            &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;new_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;  &lt;span class="c1"&gt;# NEW!&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This makes the test pass. It makes another test fail, but only
because—oops—we now know more. You can delete the old test because the new
test supersedes it.&lt;/p&gt;
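&lt;p&gt;To see the whole thing run end to end, here is a minimal self-contained sketch. It does not use the &lt;code&gt;Block&lt;/code&gt; and &lt;code&gt;Operation&lt;/code&gt; classes from this post; instead it assumes a stand-in IR of plain tuples, and the name &lt;code&gt;forward_stores&lt;/code&gt; is made up for illustration. The invalidation rule (drop every entry at the same offset) is also an assumption, standing in for the elided "as before" line:&lt;/p&gt;

```python
# Stand-in IR (an assumption for this sketch, not the post's classes).
# Each instruction is a tuple; strings are SSA names, ints are constants:
#   ("store", obj, offset, value)
#   ("load", result, obj, offset)
#   (anything_else, *args)


def forward_stores(ops):
    heap = {}      # (obj, offset) -- value known to live at that address
    replaced = {}  # name of a removed load -- value it was forwarded to
    out = []
    for name, *args in ops:
        if name == "store":
            obj, offset, value = args
            obj = replaced.get(obj, obj)
            value = replaced.get(value, value)
            # Assumed invalidation rule: any entry at the same offset might
            # alias this store, so forget it.
            heap = {key: v for key, v in heap.items() if key[1] != offset}
            # Then remember what this store just wrote.
            heap[(obj, offset)] = value
            # The store itself is always kept.
            out.append(("store", obj, offset, value))
        elif name == "load":
            result, obj, offset = args
            obj = replaced.get(obj, obj)
            key = (obj, offset)
            if key in heap:
                # Forward the known value and drop the load.
                replaced[result] = heap[key]
                continue
            heap[key] = result
            out.append(("load", result, obj, offset))
        else:
            # Any other operation: just rewrite operands of removed loads.
            out.append((name, *[replaced.get(a, a) for a in args]))
    return out


ops = [
    ("store", "a", 0, 5),
    ("store", "b", 8, 7),   # different offset: cannot alias the first store
    ("load", "v", "a", 0),  # forwarded to the constant 5
    ("escape", "v"),
    ("store", "b", 0, 9),   # same offset as ("a", 0): might alias it
    ("load", "w", "a", 0),  # no longer known, so this load must stay
    ("escape", "w"),
]
result = forward_stores(ops)
assert result == [
    ("store", "a", 0, 5),
    ("store", "b", 8, 7),
    ("escape", 5),
    ("store", "b", 0, 9),
    ("load", "w", "a", 0),
    ("escape", "w"),
]
```

&lt;p&gt;The store to &lt;code&gt;("b", 0)&lt;/code&gt; wipes what we knew about &lt;code&gt;("a", 0)&lt;/code&gt; because the two objects might be the same at run time, so the second load has to stay.&lt;/p&gt;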
&lt;p&gt;Now, note that we are not removing the store. This is because we have nothing
in our optimizer that keeps track of what might have observed the side-effects
of the store. What if the object got &lt;code&gt;escape&lt;/code&gt;d? Or someone did a load later on?
We would only be able to remove the store (&lt;code&gt;continue&lt;/code&gt;) if we could guarantee it
was not observable.&lt;/p&gt;
&lt;p&gt;In our current framework, this only happens in one case: someone is doing a
store of the exact same value that already exists in our compile-time heap.
That is, either the same constant, or the same SSA value. If we see this, then
we can completely skip the second store instruction.&lt;/p&gt;
&lt;p&gt;Here's a test case for that, where we have gained information from the load
instruction that we can then use to get rid of the store instruction:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_load_then_store&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = load(var0, 0)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = escape(var1)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's make it pass. To do that, first we'll make an equality function that
works for both constants and operations. Constants are equal if their values
are equal, and operations are equal only if they are the very same operation
object (compared by identity, i.e. address/pointer).&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;eq_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Constant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Constant&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is a partial equality: if two operations are not equal under &lt;code&gt;eq_value&lt;/code&gt;,
it doesn't mean that they are different, only that we don't know that they are
the same.&lt;/p&gt;
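&lt;p&gt;A small sketch makes the "partial" part concrete. The &lt;code&gt;Value&lt;/code&gt;, &lt;code&gt;Constant&lt;/code&gt;, and &lt;code&gt;Operation&lt;/code&gt; classes below are bare-bones stand-ins assumed for this example, not the real ones from the Toy Optimizer:&lt;/p&gt;

```python
# Bare-bones stand-ins for the IR classes (assumptions for this sketch).
class Value:
    pass


class Constant(Value):
    def __init__(self, value):
        self.value = value


class Operation(Value):
    def __init__(self, name, args):
        self.name = name
        self.args = args


def eq_value(left, right):
    # Constants compare by value; everything else compares by identity.
    if isinstance(left, Constant) and isinstance(right, Constant):
        return left.value == right.value
    return left is right


obj = Operation("getarg", [Constant(0)])
load_a = Operation("load", [obj, Constant(0)])
load_b = Operation("load", [obj, Constant(0)])

# Two distinct Constant objects holding the same number are equal.
assert eq_value(Constant(5), Constant(5))
# An operation is always equal to itself.
assert eq_value(load_a, load_a)
# Two loads that look the same are not known to be equal, even though they
# might well produce the same value at run time.
assert not eq_value(load_a, load_b)
# None (nothing recorded in the compile-time heap) never matches a value.
assert not eq_value(None, Constant(5))
```

&lt;p&gt;Answering "don't know" with &lt;code&gt;False&lt;/code&gt; is the safe direction here: the worst case is that we keep a store we could have removed.&lt;/p&gt;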
&lt;p&gt;After that, we need only check whether the current value in the compile-time
heap is the same as the value being stored. If it is, wonderful: there is no need
to store. We &lt;code&gt;continue&lt;/code&gt; and don't append the operation to &lt;code&gt;opt_bb&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;store_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;current_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;new_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;eq_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# NEW!&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="c1"&gt;# ... as before ...&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;load_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;get_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;load_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_equal_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;compile_time_heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;load_info&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;
        &lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;opt_bb&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This makes our load-then-store test pass, and it makes other tests pass too,
like eliminating a store after another store!&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_store_after_store&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = store(var0, 0, 5)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately, this only works if the values (constants or SSA values) are
known to be the same. If we store &lt;em&gt;different&lt;/em&gt; values, we can't optimize. In the
live stream, we left this as an exercise for the viewer:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nd"&gt;@pytest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xfail&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_exercise_for_the_reader&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;var2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opt_bb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimize_load_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;bb_to_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt_bb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;var0 = getarg(0)&lt;/span&gt;
&lt;span class="s2"&gt;var1 = store(var0, 0, 7)&lt;/span&gt;
&lt;span class="s2"&gt;var2 = escape(7)"""&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We would only be able to optimize this away if we had some notion of a store
being &lt;em&gt;dead&lt;/em&gt;. In this case, that is a store in which the value is never read
before being overwritten.&lt;/p&gt;
&lt;h3 id="removing-dead-stores"&gt;Removing dead stores&lt;/h3&gt;
&lt;p&gt;TODO, I suppose. I have not gotten this far yet. If I get around to it, I will
come back and update the post.&lt;/p&gt;
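&lt;p&gt;In the meantime, here is one rough way a block-local version could look. This is a sketch under assumptions and was not part of the stream: it uses a stand-in IR of plain tuples rather than the post's classes, the name &lt;code&gt;remove_dead_stores&lt;/code&gt; is made up, and it reuses the aliasing model from above (equal offsets might alias, different offsets never do). It also treats every operation other than a load or store as something that might read any memory:&lt;/p&gt;

```python
# Stand-in IR (an assumption for this sketch, not the post's classes):
#   ("store", obj, offset, value)
#   ("load", result, obj, offset)
#   (anything_else, *args)


def remove_dead_stores(ops):
    # Addresses that are certain to be overwritten later in the block before
    # anything could read them. We walk the block backwards to build this up.
    overwritten = set()
    kept = []
    for op in reversed(ops):
        name = op[0]
        if name == "store":
            key = (op[1], op[2])
            if key in overwritten:
                # A later store clobbers this one and nothing reads it in
                # between: the store is dead, so drop it.
                continue
            overwritten.add(key)
        elif name == "load":
            offset = op[3]
            # The load might read any object at this offset, so earlier
            # stores at this offset are observable again.
            overwritten = {k for k in overwritten if k[1] != offset}
        else:
            # Unknown operation: conservatively assume it can read anything.
            overwritten.clear()
        kept.append(op)
    kept.reverse()
    return kept


ops = [
    ("store", "a", 0, 5),   # dead: overwritten below before any read
    ("store", "a", 0, 7),
    ("load", "v", "a", 0),
    ("escape", "v"),
]
assert remove_dead_stores(ops) == [
    ("store", "a", 0, 7),
    ("load", "v", "a", 0),
    ("escape", "v"),
]

# A possibly-aliasing load between the two stores keeps the first one alive.
ops = [
    ("store", "a", 0, 5),
    ("load", "v", "b", 0),
    ("store", "a", 0, 7),
]
assert remove_dead_stores(ops) == ops
```

&lt;p&gt;On its own this does not forward the load, so it would not produce the &lt;code&gt;escape(7)&lt;/code&gt; line from the xfail test above; it would need to be combined with the forwarding pass for that.&lt;/p&gt;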
&lt;h3 id="in-the-real-world"&gt;In the real world&lt;/h3&gt;
&lt;p&gt;This small optimization pass may seem silly or fiddly—when would we ever see
something like this in a real IR?—but it's pretty useful. Here's the Ruby
code that got me thinking about it again some years later for ZJIT:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;C&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="vi"&gt;@a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="vi"&gt;@b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="vi"&gt;@c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;CRuby has a shape system and ZJIT makes use of it, so we end up optimizing this
code (if it's monomorphic) into a series of shape checks and stores. The HIR
might end up looking something like the mess below, where I've marked the
shape guards (which can be thought of as loads) and the stores with asterisks:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="cp"&gt;# ...&lt;/span&gt;
&lt;span class="n"&gt;bb2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v6&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;BasicObject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Fixnum&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v31&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v32&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardShape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x400000&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="mh"&gt;@0x10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v10&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;WriteBarrier&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v10&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v35&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mh"&gt;0x40008e&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x40008e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;_shape_id&lt;/span&gt;&lt;span class="mh"&gt;@0x4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v35&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v16&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Fixnum&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v37&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v38&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardShape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v37&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x40008e&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v38&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="mh"&gt;@0x18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v16&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;WriteBarrier&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v38&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v16&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v41&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mh"&gt;0x40008f&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x40008f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v38&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;_shape_id&lt;/span&gt;&lt;span class="mh"&gt;@0x4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v41&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v22&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Fixnum&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v43&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v44&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;HeapBasicObject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GuardShape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v43&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x40008f&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="mh"&gt;@0x20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v22&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;WriteBarrier&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v22&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;v47&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mh"&gt;0x400090&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x400090&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;StoreField&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;_shape_id&lt;/span&gt;&lt;span class="mh"&gt;@0x4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v47&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;CheckInterrupts&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;v22&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If we had store-load forwarding in ZJIT, we could get rid of the intermediate
shape guards; they would know the shape from the previous &lt;code&gt;StoreField&lt;/code&gt;
instruction. If we had dead store elimination, we could get rid of the
intermediate shape writes; they are never read. (And the repeated type guards
to check if it's a heap object still are just silly and need to get removed
eventually.)&lt;/p&gt;
&lt;p&gt;This is on the roadmap and will make object initialization even faster than it
is right now.&lt;/p&gt;
&lt;h3 id="wrapping-up"&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;Thanks for reading the text version of the video that CF and I made a while
back. Now you know how to do load/store elimination on traces.&lt;/p&gt;
&lt;p&gt;I think this does not need too much extra work to get it going on full CFGs; a
block is pretty much the same as a trace, so you can do a block-local version
without much fuss. If you want to go global, you need dominator information and
gen-kill sets.&lt;/p&gt;
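&lt;p&gt;As a rough illustration of the block-local version, here is a minimal sketch of store-to-load forwarding over one straight-line list of operations. The tuple-based IR and the name &lt;code&gt;forward_loads&lt;/code&gt; are made up for this sketch and are not the toy optimizer's actual classes. It relies on the same assumption as the footnote on sizes: all accesses are the same size, and different offsets never overlap.&lt;/p&gt;

```python
# Toy store-to-load forwarding over one straight-line block (or trace).
# The tuple-based IR is hypothetical and only exists for this sketch:
#   ("store", obj, offset, value)
#   ("load", result, obj, offset)

def forward_loads(ops):
    """Return (new_ops, replacements) with redundant loads removed."""
    known = {}         # maps (obj, offset) to the value known to be stored there
    replacements = {}  # maps a removed load result to the value replacing it
    new_ops = []
    for op in ops:
        if op[0] == "store":
            _, obj, offset, value = op
            obj = replacements.get(obj, obj)
            value = replacements.get(value, value)
            # Another object may alias this one, so forget everything we
            # knew about this offset before recording the new write.
            known = {key: val for key, val in known.items() if key[1] != offset}
            known[(obj, offset)] = value
            new_ops.append(("store", obj, offset, value))
        elif op[0] == "load":
            _, result, obj, offset = op
            obj = replacements.get(obj, obj)
            if (obj, offset) in known:
                # The value is already known: drop the load entirely.
                replacements[result] = known[(obj, offset)]
            else:
                known[(obj, offset)] = result
                new_ops.append(("load", result, obj, offset))
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return new_ops, replacements


example = [
    ("store", "a", 0, "x"),
    ("load", "v1", "a", 0),   # forwarded from the store above
    ("store", "b", 0, "y"),   # may alias a, so a@0 is forgotten
    ("load", "v2", "a", 0),   # has to stay
    ("load", "v3", "a", 0),   # same as v2
    ("store", "c", 8, "v3"),  # stores v2 after replacement
]
new_ops, replacements = forward_loads(example)
for op in new_ops:
    print(op)
```

&lt;p&gt;Going global would mean merging the &lt;code&gt;known&lt;/code&gt; dictionaries at control-flow joins, which is where the dominator information and gen-kill sets come in.&lt;/p&gt;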
&lt;p&gt;Maybe I will touch on this in a future post...&lt;/p&gt;
&lt;h3 id="thank-you"&gt;Thank you&lt;/h3&gt;
&lt;p&gt;Thank you to CF, who walked me through this live on a stream two years ago!
This blog post wouldn't be possible without you.&lt;/p&gt;
&lt;div class="footnote"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:size"&gt;
&lt;p&gt;In this toy optimizer example, we are assuming that all reads and writes
are the same size and different offsets don't overlap at all. This is often
the case for managed runtimes, where object fields are pointer-sized and
all reads/writes are pointer aligned. &lt;a class="footnote-backref" href="https://www.pypy.org/posts/2025/12/toy-load-store.html#fnref:size" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:tbaa"&gt;
&lt;p&gt;We could do better. If we had type information, we could also use that
to make alias classes. Writes to a List will never overlap with writes to a
Map, for example. This requires your compiler to have strict aliasing—if
you can freely cast between types, as in C, then this tactic goes out the
window.&lt;/p&gt;
&lt;p&gt;This is called &lt;a href="https://www.pypy.org/assets/img/tbaa.pdf"&gt;Type-based alias analysis&lt;/a&gt; (PDF). &lt;a class="footnote-backref" href="https://www.pypy.org/posts/2025/12/toy-load-store.html#fnref:tbaa" title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description><category>toy-optimizer</category><guid>https://www.pypy.org/posts/2025/12/toy-load-store.html</guid><pubDate>Wed, 24 Dec 2025 23:00:00 GMT</pubDate></item><item><title>PyPy v7.3.20 release</title><link>https://www.pypy.org/posts/2025/07/pypy-v7320-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-20-release-of-python-2-7-3-11"&gt;
&lt;h2&gt;PyPy v7.3.20: release of python 2.7, 3.11&lt;/h2&gt;
&lt;p&gt;The PyPy team is proud to release version 7.3.20 of PyPy after the previous
release on Feb 26, 2025. The release fixes some subtle bugs in ctypes and
&lt;code class="docutils literal"&gt;OrderedDict&lt;/code&gt; and makes PyPy3.11 compatible with an upcoming release of
Cython.&lt;/p&gt;
&lt;p&gt;The release includes two different interpreters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for
backported security updates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of
Python 3.11, including the stdlib for CPython 3.11.13.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interpreters are based on much the same codebase, thus the double
release. This is a micro release; all APIs are compatible with the other 7.3
releases.&lt;/p&gt;
&lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
&lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear
about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request
to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: bug fixes,
&lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with
making RPython's JIT even better.&lt;/p&gt;
&lt;p&gt;If you are a python library maintainer and use C-extensions, please consider
making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant
on PyPy. In any case, &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; supports building wheels for PyPy.&lt;/p&gt;
&lt;section id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython.
It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance
comparison) due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython
can do for them.&lt;/p&gt;
&lt;p&gt;We provide binary builds for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems
(Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macOS (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM
32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries.
Please reach out to us if you wish to sponsor binary releases for those
platforms. Downstream packagers provide binary builds for debian, Fedora,
conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-else-is-new"&gt;
&lt;h3&gt;What else is new?&lt;/h3&gt;
&lt;p&gt;For more information about the 7.3.20 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.20.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/07/pypy-v7320-release.html</guid><pubDate>Fri, 04 Jul 2025 12:00:00 GMT</pubDate></item><item><title>How fast can the RPython GC allocate?</title><link>https://www.pypy.org/posts/2025/06/rpython-gc-allocation-speed.html</link><dc:creator>CF Bolz-Tereick</dc:creator><description>&lt;p&gt;While working on a paper about &lt;a href="https://pypy.org/posts/2025/02/pypy-gc-sampling.html"&gt;allocation profiling in
VMProf&lt;/a&gt; I got curious
about how quickly the RPython GC can allocate an object. I wrote a small
RPython benchmark program to get an idea of the order of magnitude.&lt;/p&gt;
&lt;p&gt;The basic idea is to just allocate an instance in a tight loop:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;A&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# preliminary idea, see below&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The RPython type inference will find out that instances of &lt;code&gt;A&lt;/code&gt; have a single
&lt;code&gt;i&lt;/code&gt; field, which is an integer. In addition to that field, every RPython object
needs one word of GC meta-information. Therefore one instance of &lt;code&gt;A&lt;/code&gt; needs 16
bytes on a 64-bit architecture.&lt;/p&gt;
&lt;p&gt;However, measuring like this is not good enough, because the RPython static
optimizer would remove the allocation since the object isn't used. But we can
confuse the escape analysis sufficiently by always keeping two instances alive
at the same time:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;A&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# print the instances at the end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(I confirmed that the allocation isn't being removed by looking at the C code
that the RPython compiler generates from this.)&lt;/p&gt;
&lt;p&gt;This is doing a little bit more work than needed, because of the &lt;code&gt;a.i = i&lt;/code&gt;
instance attribute write. We can also (optionally) leave the field
uninitialized.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initialize_field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;initialize_field&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# make sure always two objects are alive&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;object_size_in_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# GC header, one integer field&lt;/span&gt;
    &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;object_size_in_words&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'GB'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'GB/s'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
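&lt;p&gt;The memory calculation at the end of &lt;code&gt;run&lt;/code&gt; can be checked in isolation. This is a small sketch in plain Python; the helper name &lt;code&gt;allocated_gib&lt;/code&gt; is made up here. Note that the division is by 1024 three times, so the value the benchmark labels "GB" is, strictly speaking, GiB.&lt;/p&gt;

```python
WORD_SIZE_BYTES = 8        # one machine word on a 64-bit architecture
OBJECT_SIZE_IN_WORDS = 2   # GC header plus one integer field


def allocated_gib(loops):
    """Total memory allocated by `loops` iterations, in GiB (2**30 bytes)."""
    total_bytes = loops * WORD_SIZE_BYTES * OBJECT_SIZE_IN_WORDS
    return total_bytes / 1024.0 / 1024.0 / 1024.0


for loops in (10 ** 9, 10 ** 10):
    print("%d loops: %.6f GiB" % (loops, allocated_gib(loops)))
```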

&lt;p&gt;Then we need to add some RPython scaffolding:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;loops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;with_init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;with_init&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"with initialization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"without initialization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;with_init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;target&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To build a binary:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="go"&gt;pypy rpython/bin/rpython targetallocatealot.py&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This turns the RPython code into C code and uses a C compiler to turn that
into a binary, which contains both our code above and the RPython garbage
collector.&lt;/p&gt;
&lt;p&gt;Then we can run it (all results again from my AMD Ryzen 7 PRO 7840U, running
Ubuntu Linux 24.04.2):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="go"&gt;without initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0x7c71ad84cf60&amp;gt; &amp;lt;A object at 0x7c71ad84cf70&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;0.433825 s&lt;/span&gt;
&lt;span class="go"&gt;14.901161 GB&lt;/span&gt;
&lt;span class="go"&gt;34.348322 GB/s&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;with initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0x71b41c82cf60&amp;gt; &amp;lt;A object at 0x71b41c82cf70&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;0.501856 s&lt;/span&gt;
&lt;span class="go"&gt;14.901161 GB&lt;/span&gt;
&lt;span class="go"&gt;29.692100 GB/s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's compare it with the Boehm GC:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pypy&lt;span class="w"&gt; &lt;/span&gt;rpython/bin/rpython&lt;span class="w"&gt; &lt;/span&gt;--gc&lt;span class="o"&gt;=&lt;/span&gt;boehm&lt;span class="w"&gt; &lt;/span&gt;--output&lt;span class="o"&gt;=&lt;/span&gt;targetallocatealot-c-boehm&lt;span class="w"&gt; &lt;/span&gt;targetallocatealot.py&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="go"&gt;...&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;./targetallocatealot-c-boehm&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="go"&gt;without initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0xffff8bd058a6e3af&amp;gt; &amp;lt;A object at 0xffff8bd058a6e3bf&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;9.722585 s&lt;/span&gt;
&lt;span class="go"&gt;14.901161 GB&lt;/span&gt;
&lt;span class="go"&gt;1.532634 GB/s&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;./targetallocatealot-c-boehm&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;with initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0xffff88e1132983af&amp;gt; &amp;lt;A object at 0xffff88e1132983bf&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;9.684149 s&lt;/span&gt;
&lt;span class="go"&gt;14.901161 GB&lt;/span&gt;
&lt;span class="go"&gt;1.538717 GB/s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is not a fair comparison: the Boehm GC uses conservative stack
scanning, so it cannot move objects, and not being able to move objects
requires much more complicated allocation.&lt;/p&gt;
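&lt;p&gt;To make the contrast concrete, here is a toy model of the bump-pointer allocation that a moving, generational collector can use for its nursery. It is only a sketch: the class &lt;code&gt;ToyNursery&lt;/code&gt; and its methods are hypothetical and are not the RPython GC's code, and the minor collection is reduced to "assume the survivors were copied out and start again from the beginning".&lt;/p&gt;

```python
class ToyNursery(object):
    """Toy model of bump-pointer allocation; not the RPython GC's real code."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.free = 0                 # offset of the next free byte
        self.minor_collections = 0

    def allocate(self, nbytes):
        # Slow path: not enough room left, so collect the nursery first.
        if self.free + nbytes > self.size_bytes:
            self.minor_collect()
        # Fast path: hand out the current offset and bump the pointer.
        address = self.free
        self.free += nbytes
        return address

    def minor_collect(self):
        # Assumption of this sketch: survivors have been copied elsewhere,
        # so the whole nursery can be reused from offset 0.
        self.minor_collections += 1
        self.free = 0


nursery = ToyNursery(64)
addresses = [nursery.allocate(16) for _ in range(10)]
print(addresses)
print(nursery.minor_collections)
```

&lt;p&gt;The fast path is one comparison and one addition, which is only possible because the nursery can be emptied wholesale by moving the survivors out.&lt;/p&gt;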
&lt;h3 id="lets-look-at-perf-stats"&gt;Let's look at &lt;code&gt;perf stats&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;We can use &lt;code&gt;perf&lt;/code&gt; to get some statistics about the executions:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;perf&lt;span class="w"&gt; &lt;/span&gt;stat&lt;span class="w"&gt; &lt;/span&gt;-e&lt;span class="w"&gt; &lt;/span&gt;cache-references,cache-misses,cycles,instructions,branches,faults,migrations&lt;span class="w"&gt; &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;10000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="go"&gt;without initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0x7aa260e35980&amp;gt; &amp;lt;A object at 0x7aa260e35990&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;4.301442 s&lt;/span&gt;
&lt;span class="go"&gt;149.011612 GB&lt;/span&gt;
&lt;span class="go"&gt;34.642245 GB/s&lt;/span&gt;

&lt;span class="go"&gt; Performance counter stats for './targetallocatealot-c 10000000000 0':&lt;/span&gt;

&lt;span class="go"&gt;     7,244,117,828      cache-references                                                      &lt;/span&gt;
&lt;span class="go"&gt;        23,446,661      cache-misses                     #    0.32% of all cache refs         &lt;/span&gt;
&lt;span class="go"&gt;    21,074,240,395      cycles                                                                &lt;/span&gt;
&lt;span class="go"&gt;   110,116,790,943      instructions                     #    5.23  insn per cycle            &lt;/span&gt;
&lt;span class="go"&gt;    20,024,347,488      branches                                                              &lt;/span&gt;
&lt;span class="go"&gt;             1,287      faults                                                                &lt;/span&gt;
&lt;span class="go"&gt;                24      migrations                                                            &lt;/span&gt;

&lt;span class="go"&gt;       4.303071693 seconds time elapsed&lt;/span&gt;

&lt;span class="go"&gt;       4.297557000 seconds user&lt;/span&gt;
&lt;span class="go"&gt;       0.003998000 seconds sys&lt;/span&gt;

&lt;span class="gp"&gt;$ &lt;/span&gt;perf&lt;span class="w"&gt; &lt;/span&gt;stat&lt;span class="w"&gt; &lt;/span&gt;-e&lt;span class="w"&gt; &lt;/span&gt;cache-references,cache-misses,cycles,instructions,branches,faults,migrations&lt;span class="w"&gt; &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;10000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;with initialization&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;A object at 0x77ceb0235980&amp;gt; &amp;lt;A object at 0x77ceb0235990&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;5.016772 s&lt;/span&gt;
&lt;span class="go"&gt;149.011612 GB&lt;/span&gt;
&lt;span class="go"&gt;29.702688 GB/s&lt;/span&gt;

&lt;span class="go"&gt; Performance counter stats for './targetallocatealot-c 10000000000 1':&lt;/span&gt;

&lt;span class="go"&gt;     7,571,461,470      cache-references                                                      &lt;/span&gt;
&lt;span class="go"&gt;       241,915,266      cache-misses                     #    3.20% of all cache refs         &lt;/span&gt;
&lt;span class="go"&gt;    24,503,497,532      cycles                                                                &lt;/span&gt;
&lt;span class="go"&gt;   130,126,387,460      instructions                     #    5.31  insn per cycle            &lt;/span&gt;
&lt;span class="go"&gt;    20,026,280,693      branches                                                              &lt;/span&gt;
&lt;span class="go"&gt;             1,285      faults                                                                &lt;/span&gt;
&lt;span class="go"&gt;                21      migrations                                                            &lt;/span&gt;

&lt;span class="go"&gt;       5.019444749 seconds time elapsed&lt;/span&gt;

&lt;span class="go"&gt;       5.012924000 seconds user&lt;/span&gt;
&lt;span class="go"&gt;       0.005999000 seconds sys&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is pretty cool, we can run this loop with &amp;gt;5 instructions per cycle. Every
allocation takes &lt;code&gt;110116790943 / 10000000000 ≈ 11&lt;/code&gt; instructions and
&lt;code&gt;21074240395 / 10000000000 ≈ 2.1&lt;/code&gt; cycles, including the loop around it.&lt;/p&gt;
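&lt;p&gt;The same arithmetic can be done for both runs at once. The following sketch just copies the cycle and instruction counts from the two &lt;code&gt;perf stat&lt;/code&gt; outputs above; the helper name &lt;code&gt;per_allocation&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
LOOPS = 10 ** 10

# (cycles, instructions) copied from the two perf stat runs above.
COUNTERS = {
    "without initialization": (21074240395, 110116790943),
    "with initialization": (24503497532, 130126387460),
}


def per_allocation(cycles, instructions, loops=LOOPS):
    """Break the raw counters down into per-iteration numbers."""
    return {
        "instructions": instructions / float(loops),
        "cycles": cycles / float(loops),
        "ipc": instructions / float(cycles),
    }


for name in sorted(COUNTERS):
    cycles, instructions = COUNTERS[name]
    stats = per_allocation(cycles, instructions)
    print("%s: %.2f instructions, %.2f cycles, %.2f insn per cycle" % (
        name, stats["instructions"], stats["cycles"], stats["ipc"]))
```

&lt;p&gt;By these numbers, initializing the field costs about two extra instructions and roughly a third of a cycle per iteration.&lt;/p&gt;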
&lt;h3 id="how-often-does-the-gc-run"&gt;How often does the GC run?&lt;/h3&gt;
&lt;p&gt;The RPython GC queries the L2 cache size to determine the size of the nursery.
We can find out what it is by turning on PYPYLOG, selecting the proper logging
categories, and printing to &lt;code&gt;stdout&lt;/code&gt; via &lt;code&gt;:-&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;&lt;span class="nv"&gt;PYPYLOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gc-set-nursery-size,gc-hardware:-&lt;span class="w"&gt; &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;[f3e6970465723] {gc-set-nursery-size&lt;/span&gt;
&lt;span class="go"&gt;nursery size: 270336&lt;/span&gt;
&lt;span class="go"&gt;[f3e69704758f3] gc-set-nursery-size}&lt;/span&gt;
&lt;span class="go"&gt;[f3e697047b9a1] {gc-hardware&lt;/span&gt;
&lt;span class="go"&gt;L2cache = 1048576&lt;/span&gt;
&lt;span class="go"&gt;[f3e69705ced19] gc-hardware}&lt;/span&gt;
&lt;span class="go"&gt;[f3e69705d11b5] {gc-hardware&lt;/span&gt;
&lt;span class="go"&gt;memtotal = 32274210816.000000&lt;/span&gt;
&lt;span class="go"&gt;[f3e69705f4948] gc-hardware}&lt;/span&gt;
&lt;span class="go"&gt;[f3e6970615f78] {gc-set-nursery-size&lt;/span&gt;
&lt;span class="go"&gt;nursery size: 4194304&lt;/span&gt;
&lt;span class="go"&gt;[f3e697061ecc0] gc-set-nursery-size}&lt;/span&gt;
&lt;span class="go"&gt;with initialization&lt;/span&gt;
&lt;span class="go"&gt;NULL &amp;lt;A object at 0x7fa7b1434020&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;0.000008 s&lt;/span&gt;
&lt;span class="go"&gt;0.000000 GB&lt;/span&gt;
&lt;span class="go"&gt;0.001894 GB/s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;So the nursery is 4 MiB. This means that when we allocate 149 GiB the GC needs to perform &lt;code&gt;10000000000 * 16 / 4194304 ≈ 38146&lt;/code&gt; minor collections. Let's confirm that:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;&lt;span class="nv"&gt;PYPYLOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gc-minor:out&lt;span class="w"&gt; &lt;/span&gt;./targetallocatealot-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;10000000000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="go"&gt;with initialization&lt;/span&gt;
&lt;span class="go"&gt;w&amp;lt;A object at 0x7991e3835980&amp;gt; &amp;lt;A object at 0x7991e3835990&amp;gt;&lt;/span&gt;
&lt;span class="go"&gt;5.315511 s&lt;/span&gt;
&lt;span class="go"&gt;149.011612 GB&lt;/span&gt;
&lt;span class="go"&gt;28.033356 GB/s&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;head&lt;span class="w"&gt; &lt;/span&gt;out
&lt;span class="go"&gt;[f3ee482f4cd97] {gc-minor&lt;/span&gt;
&lt;span class="go"&gt;[f3ee482f53874] {gc-minor-walkroots&lt;/span&gt;
&lt;span class="go"&gt;[f3ee482f54117] gc-minor-walkroots}&lt;/span&gt;
&lt;span class="go"&gt;minor collect, total memory used: 0&lt;/span&gt;
&lt;span class="go"&gt;number of pinned objects: 0&lt;/span&gt;
&lt;span class="go"&gt;total size of surviving objects: 0&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000029&lt;/span&gt;
&lt;span class="go"&gt;[f3ee482f67b7e] gc-minor}&lt;/span&gt;
&lt;span class="go"&gt;[f3ee4838097c5] {gc-minor&lt;/span&gt;
&lt;span class="go"&gt;[f3ee48380c945] {gc-minor-walkroots&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{gc-minor-walkroots"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;out&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;wc&lt;span class="w"&gt; &lt;/span&gt;-l
&lt;span class="go"&gt;38147&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
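&lt;p&gt;The same back-of-the-envelope arithmetic can be written as a small Python sketch. It only assumes the values visible in the logs above (a 4194304-byte nursery, 16-byte objects and the loop count from the command line); the constant names are made up for illustration:&lt;/p&gt;

```python
import math

NURSERY_SIZE = 4194304      # bytes, from the gc-set-nursery-size log entry
OBJECT_SIZE = 16            # bytes: one GC header word plus one field
LOOPS = 10_000_000_000      # loop count passed on the command line

total_bytes = LOOPS * OBJECT_SIZE
fills = total_bytes / NURSERY_SIZE   # how often the nursery fills up

print("allocated: %.6f GiB" % (total_bytes / 1024.0 ** 3))
print("nursery fills: %.2f" % fills)
print("rounded up: %d" % math.ceil(fills))
```

&lt;p&gt;The estimate lands very close to the number of &lt;code&gt;gc-minor-walkroots&lt;/code&gt; entries counted in the log.&lt;/p&gt;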

&lt;p&gt;Each minor collection is very quick, because a minor collection is
O(surviving objects), and in this program only one object survives each time
(the other instance is in the process of being allocated).
Also, the GC root shadow stack has only one entry, so walking it is super
quick as well. The time the minor collections take is logged to the out file:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"time taken"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;out&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tail
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000003&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="go"&gt;time taken: 0.000002&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"time taken"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;out&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.*"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;numsum
&lt;span class="go"&gt;0.0988160000000011&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(This number is super approximate due to float formatting rounding.)&lt;/p&gt;
&lt;p&gt;That means that &lt;code&gt;0.0988160000000011 / 5.315511 ≈ 2%&lt;/code&gt; of the time is spent in the GC.&lt;/p&gt;
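&lt;p&gt;If &lt;code&gt;numsum&lt;/code&gt; isn't at hand, the same sum can be computed with a few lines of Python. This is only a sketch: the helper name &lt;code&gt;gc_time_fraction&lt;/code&gt; is made up, and the sample lines are a hand-written excerpt in the format of the log shown above, not the full log:&lt;/p&gt;

```python
def gc_time_fraction(log_lines, wall_seconds):
    """Sum the 'time taken: X' entries and relate them to the wall-clock time."""
    prefix = "time taken: "
    total = 0.0
    for line in log_lines:
        line = line.strip()
        if line.startswith(prefix):
            total += float(line[len(prefix):])
    return total, total / wall_seconds


# A tiny hand-made sample in the same format as the PYPYLOG output.
sample = [
    "[f3ee482f4cd97] {gc-minor",
    "time taken: 0.000029",
    "[f3ee482f67b7e] gc-minor}",
    "time taken: 0.000002",
    "time taken: 0.000003",
]
total, fraction = gc_time_fraction(sample, 5.315511)
print("%.6f s in minor collections" % total)

# Using the sum that numsum reported for the full log:
print("%.1f%%" % (100 * 0.0988160000000011 / 5.315511))
```

&lt;p&gt;For the real log you would pass the opened &lt;code&gt;out&lt;/code&gt; file instead of the sample list.&lt;/p&gt;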
&lt;h3 id="what-does-the-generated-machine-code-look-like"&gt;What does the generated machine code look like?&lt;/h3&gt;
&lt;p&gt;The allocation fast path of the RPython GC is a simple bump pointer. In Python
pseudo-code it would look roughly like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt;
&lt;span class="c1"&gt;# Move nursery_free pointer forward by totalsize&lt;/span&gt;
&lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;totalsize&lt;/span&gt;
&lt;span class="c1"&gt;# Check if this allocation would exceed the nursery&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# If it does =&amp;gt; collect the nursery and al&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalsize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hdr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;GC&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
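&lt;p&gt;To make the pseudo-code a bit more tangible, here is a runnable toy model of it. It is not the RPython GC: &lt;code&gt;ToyNursery&lt;/code&gt; is a hypothetical class, addresses are plain integers, and the slow path only counts collections and resets the pointer instead of copying survivors:&lt;/p&gt;

```python
class ToyNursery:
    """Toy model of a bump-pointer nursery; addresses are plain integers."""

    def __init__(self, size):
        self.nursery_free = 0       # next free address
        self.nursery_top = size     # end of the nursery
        self.minor_collections = 0

    def collect_and_reserve(self, totalsize):
        # Slow path: a real GC would move survivors out of the nursery here.
        # This toy version just counts the collection and starts over.
        self.minor_collections += 1
        self.nursery_free = totalsize
        return 0

    def malloc(self, totalsize):
        result = self.nursery_free
        # Move nursery_free forward by totalsize
        self.nursery_free = result + totalsize
        # Check if this allocation would exceed the nursery
        if self.nursery_free > self.nursery_top:
            result = self.collect_and_reserve(totalsize)
        return result


nursery = ToyNursery(size=4194304)
for _ in range(1_000_000):
    nursery.malloc(16)
print(nursery.minor_collections)
```

&lt;p&gt;The fast path is one addition, one comparison and one store, which is why the loop in the compiled binary can be so short.&lt;/p&gt;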

&lt;p&gt;So we can disassemble the compiled binary &lt;code&gt;targetallocatealot-c&lt;/code&gt; and try to
find the equivalent logic in machine code. I'm super bad at reading machine
code, but I tried to annotate what I think is the core loop (the version
without initializing the &lt;code&gt;i&lt;/code&gt; field) below:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb68&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rdi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb6b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rdx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbx&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# initialize object header of object allocated in previous iteration&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb6e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;movq&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;$0x4c8&lt;/span&gt;&lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# loop termination check&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;r12&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb78&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;je&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;ccb8&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# load nursery_free&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb7e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mh"&gt;0x33c13&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rip&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rdx&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# increment loop counter&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;$0x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbp&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# add 16 (size of object) to nursery_free&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb89&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;lea&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mh"&gt;0x10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rdx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rax&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# compare nursery_top with new nursery_free&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb8d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mh"&gt;0x33c24&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# store new nursery_free&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb94&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mh"&gt;0x33bfd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# if new nursery_free exceeds nursery_top, fall through to slow path, if not, start at top&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb9b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;jae&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;cb68&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# slow path from here on:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# save live object from last iteration to GC shadow stack&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cb9d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rbx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mh"&gt;-0x8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rcx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cba1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;r13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;rdi&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cba4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;$0x10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;esi&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cp"&gt;# do minor collection&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nl"&gt;cba9&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;20800&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;pypy_g_IncrementalMiniMarkGC_collect_and_reserve&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id="running-the-benchmark-as-regular-python-code"&gt;Running the benchmark as regular Python code&lt;/h3&gt;
&lt;p&gt;So far we ran this code as &lt;em&gt;RPython&lt;/em&gt;, i.e. type inference is performed and the
program is translated to a C binary. We can also run it on top of PyPy, as a
regular Python3 program. However, an instance of a user-defined class in regular
Python when run on PyPy is actually a much larger object, due to &lt;a href="https://pypy.org/posts/2010/11/efficiently-implementing-python-objects-3838329944323946932.html"&gt;dynamic
typing&lt;/a&gt;.
It's at least 7 words, which is 56 bytes.&lt;/p&gt;
&lt;p&gt;However, we can simply use &lt;code&gt;int&lt;/code&gt; objects instead. Integers are allocated on the
heap and consist of two words, one for the GC and one with the
machine-word-sized integer value, if the integer fits into a signed 64-bit
representation (otherwise a different, less compact representation is used,
which can represent arbitrarily large integers).&lt;/p&gt;
&lt;p&gt;Therefore, we can simply use this kind of code:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# make sure always two objects are alive&lt;/span&gt;
    &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;object_size_in_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# GC header, one integer field&lt;/span&gt;
    &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1024.0&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'GB'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'GB/s'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;loops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this case we can't really leave the value uninitialized though.&lt;/p&gt;
&lt;p&gt;We can run this both with and without the JIT:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pypy3&lt;span class="w"&gt; &lt;/span&gt;allocatealot.py&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;
&lt;span class="go"&gt;999999998 999999999&lt;/span&gt;
&lt;span class="go"&gt;14.901161193847656 GB&lt;/span&gt;
&lt;span class="go"&gt;17.857494904899553 GB/s&lt;/span&gt;
&lt;span class="gp"&gt;$ &lt;/span&gt;pypy3&lt;span class="w"&gt; &lt;/span&gt;--jit&lt;span class="w"&gt; &lt;/span&gt;off&lt;span class="w"&gt; &lt;/span&gt;allocatealot.py&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;
&lt;span class="go"&gt;999999998 999999999&lt;/span&gt;
&lt;span class="go"&gt;14.901161193847656 GB&lt;/span&gt;
&lt;span class="go"&gt;0.8275382375297171 GB/s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is obviously much less efficient than the C code, since the PyPy JIT
generates much less efficient machine code than GCC. Still, "only" twice as slow
is kind of cool anyway.&lt;/p&gt;
&lt;p&gt;(Running it with CPython doesn't really make sense for these measurements, since
CPython ints are bigger – &lt;code&gt;sys.getsizeof(5)&lt;/code&gt; reports 28 bytes.)&lt;/p&gt;
&lt;h3 id="the-machine-code-that-the-jit-generates"&gt;The machine code that the JIT generates&lt;/h3&gt;
&lt;p&gt;Unfortunately it's a bit of a journey to show the machine code that PyPy's JIT generates for this. First we need to run with all jit logging categories:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;&lt;span class="nv"&gt;PYPYLOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;jit:out&lt;span class="w"&gt; &lt;/span&gt;pypy3&lt;span class="w"&gt; &lt;/span&gt;allocatealot.py&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1000000000&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then we can read the log file to find the trace IR for the loop under the logging category &lt;code&gt;jit-log-opt&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;532&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span 
class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TargetToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;137358545605472&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# are we at the end of the loop&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;552&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i45&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;int_lt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i35&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;555&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guard_true&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guard0x7ced4756a160&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i34&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;561&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i47&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#26 STORE_FAST'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-10~#28 LOAD_FAST'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-10~#30 STORE_FAST'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#32 LOAD_FAST'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#34 STORE_FAST'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#36 JUMP_ABSOLUTE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# update iterator object&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;565&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;setfield_gc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FieldS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pypy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__builtin__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_IntRangeIterator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inst_current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;569&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guard_not_invalidated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guard0x7ced4756a1b0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i34&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# check for signals&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;569&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i49&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;getfield_raw_i&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;137358624889824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FieldS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pypysig_long_struct_inner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;582&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i51&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;int_lt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i49&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;586&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;guard_false&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i51&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Guard0x7ced4754db78&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i34&lt;/span&gt;&lt;span 
class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;debug_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# allocate the integer (allocation sunk to the end of the trace)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;592&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p52&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;new_with_vtable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SizeDescr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;630&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;setfield_gc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p52&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FieldS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pypy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objspace&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intobject&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_IntObject&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inst_intval&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pure&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;634&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p52&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;p31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;descr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TargetToken&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;137358545605472&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To find the machine code address of the trace, we need to search for this line:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nx"&gt;Loop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;cfbolz&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;projects&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;gitpypy&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;allocatealot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;py&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;FOR_ITER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;\
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;has&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7ced473ffa0b&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7ced473ffbb0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bootstrap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0x7ced473ff980&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
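&lt;p&gt;If the log is large, the search can be scripted. The following is only a minimal sketch: the names &lt;code&gt;LOOP_LINE&lt;/code&gt; and &lt;code&gt;find_loop_addresses&lt;/code&gt; are made up for this example, and it assumes that the line appears in the log on a single line, without the backslash continuation used above for display.&lt;/p&gt;

```python
import re

# Hypothetical pattern, modelled only on the log line shown above.
LOOP_LINE = re.compile(
    r"Loop (\d+) \((.*)\) has address (0x[0-9a-f]+) to (0x[0-9a-f]+)"
    r" \(bootstrap (0x[0-9a-f]+)\)"
)


def find_loop_addresses(log_text):
    """Return (loop number, description, start, end) for every match."""
    results = []
    for match in LOOP_LINE.finditer(log_text):
        number, descr, start, end, _bootstrap = match.groups()
        results.append((int(number), descr, int(start, 16), int(end, 16)))
    return results


sample = (
    "Loop 1 (run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER)"
    " has address 0x7ced473ffa0b to 0x7ced473ffbb0 (bootstrap 0x7ced473ff980)"
)
loops = find_loop_addresses(sample)
```

&lt;p&gt;The start and end addresses returned for the loop are the range to look for in the disassembly.&lt;/p&gt;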

&lt;p&gt;Then we can use a script in the PyPy repo to disassemble the generated machine code:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pypy&lt;span class="w"&gt; &lt;/span&gt;rpython/jit/backend/tool/viewcode.py&lt;span class="w"&gt; &lt;/span&gt;out
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will dump all the machine code to stdout, and open a &lt;a href="https://pypy.org/posts/2021/04/ways-pypy-graphviz.html"&gt;pygame-based
graphviz cfg&lt;/a&gt;. In there
we can search for the address and see this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Graphviz based visualization of the machine code the JIT generates" src="https://www.pypy.org/images/2025-allocatealot-machine-code.png"&gt;&lt;/p&gt;
&lt;p&gt;Here's an annotated version with what I think this code does:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="x"&gt;# increment the profile counter&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb40:   48 ff 04 25 20 9e 33    incq   0x38339e20&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb47:   38 &lt;/span&gt;

&lt;span class="x"&gt;# check whether the loop is done&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb48:   4c 39 fe                cmp    %r15,%rsi&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb4b:   0f 8d 76 01 00 00       jge    0x7ced473ffcc7&lt;/span&gt;

&lt;span class="x"&gt;# increment iteration variable&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb51:   4c 8d 66 01             lea    0x1(%rsi),%r12&lt;/span&gt;

&lt;span class="x"&gt;# update iterator object&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb55:   4d 89 61 08             mov    %r12,0x8(%r9)&lt;/span&gt;

&lt;span class="x"&gt;# check for ctrl-c/thread switch&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb59:   49 bb e0 1b 0b 4c ed    movabs $0x7ced4c0b1be0,%r11&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb60:   7c 00 00 &lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb63:   49 8b 0b                mov    (%r11),%rcx&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb66:   48 83 f9 00             cmp    $0x0,%rcx&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb6a:   0f 8c 8f 01 00 00       jl     0x7ced473ffcff&lt;/span&gt;

&lt;span class="x"&gt;# load nursery_free pointer&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb70:   49 8b 8b d8 30 f6 fe    mov    -0x109cf28(%r11),%rcx&lt;/span&gt;

&lt;span class="x"&gt;# add size (16)&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb77:   48 8d 51 10             lea    0x10(%rcx),%rdx&lt;/span&gt;

&lt;span class="x"&gt;# compare against nursery top&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb7b:   49 3b 93 f8 30 f6 fe    cmp    -0x109cf08(%r11),%rdx&lt;/span&gt;

&lt;span class="x"&gt;# jump to slow path if nursery is full&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb82:   0f 87 41 00 00 00       ja     0x7ced473ffbc9&lt;/span&gt;

&lt;span class="x"&gt;# store new value of nursery free&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb88:   49 89 93 d8 30 f6 fe    mov    %rdx,-0x109cf28(%r11)&lt;/span&gt;

&lt;span class="x"&gt;# initialize GC header&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb8f:   48 c7 01 30 11 00 00    movq   $0x1130,(%rcx)&lt;/span&gt;

&lt;span class="x"&gt;# initialize integer field&lt;/span&gt;
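&lt;span class="x"&gt;7ced473ffb96:   48 89 41 08             mov    %rax,0x8(%rcx)&lt;/span&gt;

&lt;span class="x"&gt;# shuffle values into place for the next iteration and jump back to the loop start&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffb9a:   48 89 f0                mov    %rsi,%rax&lt;/span&gt;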
&lt;span class="x"&gt;7ced473ffb9d:   48 89 8d 60 01 00 00    mov    %rcx,0x160(%rbp)&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffba4:   4c 89 e6                mov    %r12,%rsi&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffba7:   e9 94 ff ff ff          jmp    0x7ced473ffb40&lt;/span&gt;
&lt;span class="x"&gt;7ced473ffbac:   0f 1f 40 00             nopl   0x0(%rax)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
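&lt;p&gt;The allocation part of this machine code is a bump-pointer fast path. As a rough illustration only, here is the same logic as a toy Python model. The &lt;code&gt;Nursery&lt;/code&gt; class, its field names and the &lt;code&gt;allocate_slowpath&lt;/code&gt; method are invented for this sketch and are not the RPython GC's actual API; the slow path simply resets the nursery instead of doing a real minor collection.&lt;/p&gt;

```python
class Nursery(object):
    """Toy model of a bump-pointer nursery (all names are hypothetical)."""

    def __init__(self, start, size):
        self.start = start
        self.free = start          # next unused address ("nursery free")
        self.top = start + size    # end of the nursery ("nursery top")
        self.minor_collections = 0

    def allocate(self, size):
        result = self.free         # load nursery free pointer
        new_free = result + size   # add size (16 in the trace above)
        if new_free > self.top:    # compare against nursery top
            return self.allocate_slowpath(size)
        self.free = new_free       # store new value of nursery free
        return result              # the caller initializes header and fields

    def allocate_slowpath(self, size):
        # Stand-in for a minor collection: pretend the nursery was emptied
        # and hand out the first chunk again.
        self.minor_collections += 1
        self.free = self.start + size
        return self.start


nursery = Nursery(start=0x1000, size=64)
addresses = [nursery.allocate(16) for _ in range(5)]
```

&lt;p&gt;In this model, four 16-byte objects fit into the 64-byte nursery, and the fifth allocation takes the slow path.&lt;/p&gt;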

&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The careful design of the RPython GC's allocation fast path gives pretty good
allocation rates. This technique isn't really new; it's a pretty typical way to
design a GC. Apart from that, my main conclusion would be that computers are
fast or something? Indeed, when we ran the same code on my colleague's
two-year-old AMD, we got noticeably worse results, so a lot of the speed seems
to be due to the hard work of CPU architects.&lt;/p&gt;</description><category>benchmarking</category><category>gc</category><category>rpython</category><guid>https://www.pypy.org/posts/2025/06/rpython-gc-allocation-speed.html</guid><pubDate>Sun, 15 Jun 2025 13:48:30 GMT</pubDate></item><item><title>Doing the Prospero-Challenge in RPython</title><link>https://www.pypy.org/posts/2025/04/prospero-in-rpython.html</link><dc:creator>CF Bolz-Tereick</dc:creator><description>&lt;p&gt;Recently I had a lot of fun playing with the &lt;a href="https://www.mattkeeter.com/projects/prospero/"&gt;Prospero
Challenge&lt;/a&gt; by &lt;a href="https://www.mattkeeter.com/"&gt;Matt
Keeter&lt;/a&gt;. The challenge is to render a 1024x1024 image of
a quote from The Tempest by Shakespeare. The input is a mathematical formula
with 7866 operations, which is evaluated once per pixel.&lt;/p&gt;
&lt;p&gt;What made the challenge particularly enticing for me personally was the fact
that the formula is basically a trace in
&lt;a href="https://en.wikipedia.org/wiki/Static_single-assignment_form"&gt;SSA-form&lt;/a&gt; – a
linear sequence of operations, where every variable is assigned exactly once.
The challenge is to evaluate the formula as fast as possible. I tried a number
of ideas for speeding up execution and will talk about them in this somewhat
meandering post. Most of it follows Matt's implementation
&lt;a href="https://github.com/mkeeter/fidget"&gt;Fidget&lt;/a&gt; very closely. There are two points
of difference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I tried to add more peephole optimizations, but they didn't end up helping
  much.&lt;/li&gt;
&lt;li&gt;I implemented a "demanded information" optimization that removes a lot of
  operations by only keeping the sign of the result. This optimization ended up
  being useful.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of the prototyping in this post was done in RPython (a statically typable
subset of Python 2 that can be compiled to C), but I later rewrote the program
in C to get better performance. All the code &lt;a href="https://github.com/cfbolz/pyfidget/"&gt;can be found on
GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="input-program"&gt;Input program&lt;/h3&gt;
&lt;p&gt;The input program is a sequence of operations, like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;_0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.95&lt;/span&gt;
&lt;span class="n"&gt;_1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="n"&gt;_2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;8.13008&lt;/span&gt;
&lt;span class="n"&gt;_3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_2&lt;/span&gt;
&lt;span class="n"&gt;_4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_3&lt;/span&gt;
&lt;span class="n"&gt;_5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;3.675&lt;/span&gt;
&lt;span class="n"&gt;_6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_3&lt;/span&gt;
&lt;span class="n"&gt;_7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_6&lt;/span&gt;
&lt;span class="n"&gt;_8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;_7&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first column is the name of the result variable, the second column is the
operation, and the rest are the arguments to the operation. &lt;code&gt;var-x&lt;/code&gt; is a
special operation that returns the x-coordinate of the pixel being rendered,
and &lt;code&gt;var-y&lt;/code&gt; does the same for the y-coordinate. The sign of the result gives the
color of the pixel; the absolute value is not important.&lt;/p&gt;
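&lt;p&gt;To make the format concrete, here is a minimal sketch of reading it into an index-based form. The function &lt;code&gt;parse_program&lt;/code&gt; and the tuple layout are made up for this example and are not the data structures used in pyfidget; skipping lines that start with &lt;code&gt;#&lt;/code&gt; is an assumption about comments in the input.&lt;/p&gt;

```python
def parse_program(text):
    """Turn lines like '_3 mul _1 _2' into (opname, args) tuples.

    Variable names are replaced by the index of the operation that
    defines them; constants keep their float value.
    """
    index_of = {}   # variable name -> index of the defining operation
    ops = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines carry no operation
        tokens = line.split()
        name, opname, rawargs = tokens[0], tokens[1], tokens[2:]
        if opname == "const":
            args = (float(rawargs[0]),)
        else:
            # SSA form: every argument was defined on an earlier line
            args = tuple(index_of[arg] for arg in rawargs)
        index_of[name] = len(ops)
        ops.append((opname, args))
    return ops


example = """\
_0 const 2.95
_1 var-x
_2 const 8.13008
_3 mul _1 _2
_4 add _0 _3
"""
ops = parse_program(example)
```

&lt;p&gt;After parsing, the &lt;code&gt;mul&lt;/code&gt; operation refers to its arguments by position, so evaluating it needs only list indexing.&lt;/p&gt;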
&lt;h3 id="a-baseline-interpreter"&gt;A baseline interpreter&lt;/h3&gt;
&lt;p&gt;To run the program, I first parse it and replace the register names with
indexes, to avoid any dictionary lookups at runtime.
Then I implemented a simple interpreter for the SSA-form
input program. The interpreter is a simple register machine, where every
operation is executed in order. The result of each operation is stored into a
list of results, and then the next operation is executed. This is the slow baseline
implementation of the interpreter, but it's very useful for comparing against the optimized
versions.&lt;/p&gt;
&lt;p&gt;This is roughly what the code looks like:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;DirectFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run_floats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setxyz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;setxyz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="n"&gt;num_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_operations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;floatvalues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;num_ops&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_ops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_and_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;const&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;farg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;farg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_z&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;farg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;farg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;farg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;farg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;farg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;farg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num_ops&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running the naive interpreter on the prospero image file is super slow, since
it performs 7866 * 1024 * 1024 (about 8.2 billion) float operations, plus the interpretation overhead.&lt;/p&gt;
&lt;h3 id="using-quadtrees-to-render-the-picture"&gt;Using Quadtrees to render the picture&lt;/h3&gt;
&lt;p&gt;The approach that Matt describes in his really excellent
&lt;a href="https://www.youtube.com/watch?v=UxGxsGnbyJ4"&gt;talk&lt;/a&gt; is to use
&lt;a href="https://en.wikipedia.org/wiki/Quadtree"&gt;quadtrees&lt;/a&gt;: recursively subdivide the
image into quadrants, and evaluate the formula in each quadrant. For every
quadrant you can simplify the formula by doing a range analysis. After a few
recursion steps, the formula becomes significantly smaller, often only a few
hundred or a few dozen operations.&lt;/p&gt;
&lt;p&gt;At the bottom of the recursion there are two cases. Either you reach a square
where the range analysis reveals that the sign is the same for all pixels, in
which case you can fill in all the pixels of the quadrant at once. Or you
evaluate the (now much simpler) formula in the quadrant by executing it for
every pixel.&lt;/p&gt;
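&lt;p&gt;To make the shape of the recursion concrete, here is a minimal sketch in plain
Python. The names &lt;code&gt;render_quadrant&lt;/code&gt;, &lt;code&gt;interval_of&lt;/code&gt;,
&lt;code&gt;evaluate&lt;/code&gt; and &lt;code&gt;make_circle&lt;/code&gt; are hypothetical and not taken
from the actual implementation. The sketch uses a circle as a stand-in formula and
leaves out the per-quadrant simplification of the formula; it only shows the
subdivide, fill, or evaluate structure.&lt;/p&gt;

```python
def square_interval(lo, hi):
    # Bounds of v*v for v in [lo, hi].
    if lo >= 0.0:
        return lo * lo, hi * hi
    if 0.0 >= hi:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def make_circle(cx, cy, r):
    # Stand-in formula (an assumption for this example): negative inside a circle.
    def evaluate(x, y):
        return (x - cx) ** 2 + (y - cy) ** 2 - r * r

    def interval_of(x0, y0, size):
        # Range analysis over the pixel square with corner (x0, y0).
        xlo, xhi = square_interval(x0 - cx, x0 + size - 1 - cx)
        ylo, yhi = square_interval(y0 - cy, y0 + size - 1 - cy)
        return xlo + ylo - r * r, xhi + yhi - r * r

    return interval_of, evaluate


def render_quadrant(image, x0, y0, size, interval_of, evaluate, min_size=8):
    """Fill image[y][x] for the square with corner (x0, y0) and side `size`.

    Assumes `size` is a power of two. A pixel is set to True when the
    formula is negative there.
    """
    lo, hi = interval_of(x0, y0, size)
    if lo > 0.0 or 0.0 > hi:
        # The sign is the same for every pixel: fill without evaluating.
        inside = 0.0 > hi
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                image[y][x] = inside
        return
    if min_size >= size:
        # Bottom of the recursion: evaluate the formula for every pixel.
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                image[y][x] = 0.0 > evaluate(x, y)
        return
    # Otherwise subdivide into four quadrants and recurse.
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            render_quadrant(image, x0 + dx, y0 + dy, half,
                            interval_of, evaluate, min_size)


SIZE = 64
image = [[False] * SIZE for _ in range(SIZE)]
interval_of, evaluate = make_circle(32.0, 32.0, 20.0)
render_quadrant(image, 0, 0, SIZE, interval_of, evaluate)
```

&lt;p&gt;In the real renderer the formula handed down to each quadrant would also be the
simplified one, which is where most of the savings come from.&lt;/p&gt;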
&lt;p&gt;This is an interesting use case of JIT compiler/optimization techniques,
requiring the optimizer itself to execute really quickly since it is an essential
part of the performance of the algorithm. The optimizer runs literally hundreds
of times to render a single image. If the algorithm is used for 3D models,
the speed of the optimizer becomes even more crucial.&lt;/p&gt;
&lt;h3 id="writing-a-simple-optimizer"&gt;Writing a simple optimizer&lt;/h3&gt;
&lt;p&gt;Implementing the quadtree recursion is straightforward. Since the program has
no control flow, the optimizer is very simple to write. I've written a couple of
blog posts on how to easily write optimizers for linear sequences of
operations, and I'm using the approach described in these &lt;a href="https://pypy.org/categories/toy-optimizer.html"&gt;Toy
Optimizer&lt;/a&gt; posts. The interval
analysis is basically an &lt;a href="https://pypy.org/posts/2024/08/toy-knownbits.html"&gt;abstract
interpretation&lt;/a&gt; of the
operations. The optimizer does a sequential forward pass over the input
program. For every operation, the output interval is computed. The optimizer
also performs optimizations based on the computed intervals, which helps in
reducing the number of operations executed (I'll talk about this further down).&lt;/p&gt;
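&lt;p&gt;As a minimal illustration of such a forward pass, the following sketch computes
an output interval for every operation of a tiny linear program. The tuple-based
program encoding and the &lt;code&gt;interval_pass&lt;/code&gt; helper are assumptions made
for this example only; they are not the data structures used by the real code, and
only a handful of operations are covered.&lt;/p&gt;

```python
def interval_pass(program, x_interval, y_interval):
    """Return a list of (lo, hi) intervals, one per operation.

    `program` is a list of tuples (name, arg0, arg1). The args of an
    operation are indexes of earlier operations, except for "const",
    where arg0 is the constant itself. This encoding is hypothetical.
    """
    intervals = []
    for name, arg0, arg1 in program:
        if name == "var_x":
            res = x_interval
        elif name == "var_y":
            res = y_interval
        elif name == "const":
            res = (arg0, arg0)
        else:
            lo0, hi0 = intervals[arg0]
            if name == "neg":
                res = (-hi0, -lo0)
            elif name == "square":
                if lo0 >= 0.0:
                    res = (lo0 * lo0, hi0 * hi0)
                elif 0.0 >= hi0:
                    res = (hi0 * hi0, lo0 * lo0)
                else:
                    res = (0.0, max(lo0 * lo0, hi0 * hi0))
            else:
                lo1, hi1 = intervals[arg1]
                if name == "add":
                    res = (lo0 + lo1, hi0 + hi1)
                elif name == "sub":
                    res = (lo0 - hi1, hi0 - lo1)
                elif name == "min":
                    res = (min(lo0, lo1), min(hi0, hi1))
                elif name == "max":
                    res = (max(lo0, lo1), max(hi0, hi1))
                else:
                    raise ValueError("unsupported operation: " + name)
        intervals.append(res)
    return intervals


# x*x + y*y - 1 (a unit circle) as a linear sequence of operations
program = [
    ("var_x", 0, 0),    # 0
    ("var_y", 0, 0),    # 1
    ("square", 0, 0),   # 2: x*x
    ("square", 1, 0),   # 3: y*y
    ("add", 2, 3),      # 4: x*x + y*y
    ("const", 1.0, 0),  # 5
    ("sub", 4, 5),      # 6: x*x + y*y - 1
]
intervals = interval_pass(program, (2.0, 3.0), (-1.0, 1.0))
lo, hi = intervals[-1]
```

&lt;p&gt;For this quadrant the final interval is entirely positive, so every pixel in it
lies outside the shape and no per-pixel evaluation is needed.&lt;/p&gt;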
&lt;p&gt;Here's a sketch of the Python code that does the optimization:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Optimizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="n"&gt;num_operations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_operations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ProgramBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_operations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IntervalFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# old index -&amp;gt; new index&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opreplacements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;num_operations&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opreplacements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;newop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_op&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;newconst&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;const&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;const&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;const&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="c1"&gt;#self.seen_consts[value] = const&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;const&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setxyz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;numops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_operations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;newop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_optimize_op&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opreplacements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newop&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opreplacements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;numops&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_optimize_op&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="n"&gt;intervalframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;
        &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_and_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;minimum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minx&lt;/span&gt;
            &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxx&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;minimum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;miny&lt;/span&gt;
            &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxy&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_z&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;minimum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minz&lt;/span&gt;
            &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxz&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;const&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;const&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newconst&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;const&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;arg0minimum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;arg0maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;arg1minimum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;arg1maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
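        &lt;span class="n"&gt;newop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;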
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;newop&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;opt_neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# peephole rules go here, see below&lt;/span&gt;
        &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@symmetric&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;opt_min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1maximum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# peephole rules go here, see below&lt;/span&gt;
        &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The resulting optimized traces are then simply interpreted at the bottom of the
quadtree recursion. Matt talks about also generating machine code from them,
but when I tried to use PyPy's JIT for that it was way too slow at
producing machine code.&lt;/p&gt;
&lt;h3 id="testing-soundness-of-the-interval-abstract-domain"&gt;Testing soundness of the interval abstract domain&lt;/h3&gt;
&lt;p&gt;To make sure that my interval computation in the optimizer is correct, I
implemented a property-based test using Hypothesis. It checks the abstract
transfer functions of the interval domain for soundness. It does so by
generating random concrete input values for an operation and random intervals that
surround the random concrete values, then performs the concrete operation to
get the concrete output, and finally checks that the abstract transfer function applied
to the input intervals gives an interval that contains the concrete output.&lt;/p&gt;
&lt;p&gt;For example, the random test for the &lt;code&gt;square&lt;/code&gt; operation would look like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;hypothesis&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;given&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assume&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pyfidget.vm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IntervalFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DirectFrame&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;math&lt;/span&gt;

&lt;span class="n"&gt;regular_floats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;strategies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;floats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;allow_nan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allow_infinity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;make_range_and_contained_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;

&lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DirectFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;intervalframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IntervalFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;range_and_contained_float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;strategies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;make_range_and_contained_float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regular_floats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regular_floats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regular_floats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmax&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rmin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rmax&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rmin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;rmax&lt;/span&gt;


&lt;span class="nd"&gt;@given&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;range_and_contained_float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;
    &lt;span class="n"&gt;rmin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This test generates a random float &lt;code&gt;b&lt;/code&gt;, and two other floats &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;c&lt;/code&gt; such
that the interval &lt;code&gt;[a, c]&lt;/code&gt; contains &lt;code&gt;b&lt;/code&gt;. The test then checks that the result
of the &lt;code&gt;square&lt;/code&gt; operation on &lt;code&gt;b&lt;/code&gt; is contained in the interval &lt;code&gt;[rmin, rmax]&lt;/code&gt;
returned by the abstract transfer function for the &lt;code&gt;square&lt;/code&gt; operation.&lt;/p&gt;
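&lt;p&gt;The same idea can be tried without pyfidget or Hypothesis. The following is a minimal sketch, not the code from the repository: &lt;code&gt;interval_neg&lt;/code&gt;, &lt;code&gt;interval_min&lt;/code&gt; and &lt;code&gt;check_soundness&lt;/code&gt; are hypothetical stand-ins written for this illustration, and the standard library &lt;code&gt;random&lt;/code&gt; module replaces Hypothesis as the source of inputs.&lt;/p&gt;

```python
import random

def interval_neg(lo, hi):
    # Hypothetical transfer function: negation flips the interval around zero.
    return -hi, -lo

def interval_min(a_lo, a_hi, b_lo, b_hi):
    # Hypothetical transfer function: min is monotone in both arguments,
    # so each bound of the result is the min of the corresponding bounds.
    return min(a_lo, b_lo), min(a_hi, b_hi)

def contains(value, lo, hi):
    # value lies in [lo, hi] exactly when clamping it to the interval is a no-op.
    return max(lo, min(value, hi)) == value

def random_interval_and_member(rng):
    # Three sorted floats give an interval [lo, hi] and a value inside it.
    lo, value, hi = sorted(rng.uniform(-1000.0, 1000.0) for _ in range(3))
    return lo, value, hi

def check_soundness(trials=1000, seed=0):
    # Soundness: the concrete result must lie inside the abstract result.
    rng = random.Random(seed)
    for _ in range(trials):
        a_lo, a, a_hi = random_interval_and_member(rng)
        b_lo, b, b_hi = random_interval_and_member(rng)
        if not contains(-a, *interval_neg(a_lo, a_hi)):
            return False
        if not contains(min(a, b), *interval_min(a_lo, a_hi, b_lo, b_hi)):
            return False
    return True

assert check_soundness()
```

&lt;p&gt;Unlike Hypothesis, this sketch does not shrink failing inputs or probe extreme floats, so it is only a rough approximation of the real test.&lt;/p&gt;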
&lt;h3 id="peephole-rewrites"&gt;Peephole rewrites&lt;/h3&gt;
&lt;p&gt;The only optimization that Matt does in his implementation is a peephole
optimization rule that removes &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; operations where the intervals
of the arguments don't overlap. In that case, the optimizer can statically know
which of the arguments will be the result of the operation. I implemented this
peephole optimization in my implementation as well, but I also added a few more
peephole optimizations that I thought would be useful.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Optimizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;opt_neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# new: add peephole rule --x =&amp;gt; x&lt;/span&gt;
        &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_and_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0arg0&lt;/span&gt;
        &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_neg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@symmetric&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;opt_min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1maximum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Matt's peephole rule&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="c1"&gt;# we can use the intervals to decide which argument will be returned&lt;/span&gt;
        &lt;span class="c1"&gt;# new one by me: min(x, x) =&amp;gt; x &lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;
        &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_and_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intervalframe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1maximum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opt_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;However, it turns out that all my attempts at adding other peephole
optimization rules were not very useful. Most rules never fired, and the ones
that did only had a small effect on the performance of the program. The only
peephole optimization that I found to be useful was the one that Matt describes
in his talk. Matt's &lt;code&gt;min&lt;/code&gt;/&lt;code&gt;max&lt;/code&gt; optimizations accounted for 96% of all rewrites that my
peephole optimizer applied for the &lt;code&gt;prospero.vm&lt;/code&gt; input. The remaining 4% of
rewrites were (the percentages are of that 4%):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;--x =&amp;gt; x                          4.65%
(-x) ** 2 =&amp;gt; x ** 2               0.99%
min(x, x) =&amp;gt; x                   20.86%
min(x, min(x, y)) =&amp;gt; min(x, y)   52.87%
max(x, x) =&amp;gt; x                   16.40%
max(x, max(x, y)) =&amp;gt; max(x, y)    4.23%
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the end it turned out that having these extra optimization rules made the
total runtime of the system go up. Checking for the rewrites isn't free, and
since they apply so rarely they don't pay for their own cost in terms of
improved performance.&lt;/p&gt;
&lt;p&gt;There are some further rules that I tried that never fired at all:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;a * 0 =&amp;gt; 0
a * 1 =&amp;gt; a
a * a =&amp;gt; a ** 2
a * -1 =&amp;gt; -a
a + 0 =&amp;gt; a
a - 0 =&amp;gt; a
x - x =&amp;gt; 0
abs(known positive number x) =&amp;gt; x
abs(known negative number x) =&amp;gt; -x
abs(-x) =&amp;gt; abs(x)
(-x) ** 2 =&amp;gt; x ** 2
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This investigation is clearly way too focused on a single program and should be
re-done with a larger set of example inputs, if this were an actually serious
implementation.&lt;/p&gt;
&lt;h3 id="demanded-information-optimization"&gt;Demanded Information Optimization&lt;/h3&gt;
&lt;p&gt;LLVM has a static analysis pass called 'demanded bits'. It is a backwards analysis that
allows you to determine which bits of a value are actually used in the final
result. This information can then be used in peephole optimizations. For
example, if you have an expression that computes a value, but only the last
byte of that value is used in the final result, you can optimize the expression
to only compute the last byte.&lt;/p&gt;
&lt;p&gt;Here's an example. Let's say we first byte-swap a 64-bit int, and then mask out everything except the last byte:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;byteswap_then_mask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;byteswap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this case, the "demanded bits" of the &lt;code&gt;byteswap(a)&lt;/code&gt; expression are
&lt;code&gt;0b0...011111111&lt;/code&gt;, which inversely means that we don't care about the upper 56
bits. Therefore the whole expression can be optimized to &lt;code&gt;a &amp;gt;&amp;gt; 56&lt;/code&gt;.&lt;/p&gt;
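&lt;p&gt;The equivalence is easy to check on a few values. The following is a small sketch in Python rather than C. The helper names &lt;code&gt;byteswap64&lt;/code&gt; and &lt;code&gt;optimized&lt;/code&gt; are made up for this illustration, and the mask and the shift are written as a remainder and an integer division, which give the same result for unsigned 64-bit values.&lt;/p&gt;

```python
def byteswap64(a):
    # Reverse the byte order of an unsigned 64-bit integer.
    return int.from_bytes(a.to_bytes(8, "little"), "big")

def byteswap_then_mask(a):
    # Keep only the lowest byte of the swapped value
    # (the remainder modulo 256 is the same as masking with 0xff).
    return byteswap64(a) % 256

def optimized(a):
    # Take the highest byte of the input directly
    # (dividing by 2**56 is the same as shifting right by 56 bits).
    return a // 2**56

samples = [0, 1, 0xFF, 0x0123456789ABCDEF, 0xAB00000000000000, 2**64 - 1]
assert all(byteswap_then_mask(a) == optimized(a) for a in samples)
```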
&lt;p&gt;For the Prospero challenge, we can observe that for the resulting pixel values, the value of
the result is not used at all, only its sign. Essentially, every program ends
implicitly with a &lt;code&gt;sign&lt;/code&gt; operation that returns &lt;code&gt;0.0&lt;/code&gt; for negative values and
&lt;code&gt;1.0&lt;/code&gt; for positive values. For clarity, I will show this &lt;code&gt;sign&lt;/code&gt; operation in
the rest of the section, even if it's not actually in the real code.&lt;/p&gt;
&lt;p&gt;This makes it possible to simplify certain min/max
operations further. Here is an example of a program, together with the
intervals of the variables:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# [0.1, 1]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# [-1, 1]&lt;/span&gt;
&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# [-1, 1]&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This program can be optimized to:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This program has the same result as the original one: &lt;code&gt;x&lt;/code&gt; is always positive (&lt;code&gt;x &amp;gt;= 0.1&lt;/code&gt;),
so for the result of &lt;code&gt;min(x, y)&lt;/code&gt; to be negative, &lt;code&gt;y&lt;/code&gt; needs to be negative.&lt;/p&gt;
&lt;p&gt;Another, more complex, example is this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;# [1, 100]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;# [-10, 10]&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;# [-100, 100]&lt;/span&gt;
&lt;span class="n"&gt;m1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# [-10, 10]&lt;/span&gt;
&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m1&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;# [-10, 100]&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Which can be optimized to this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;
&lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is because the sign of &lt;code&gt;min(x, y)&lt;/code&gt; is the same as the sign of &lt;code&gt;y&lt;/code&gt; if &lt;code&gt;x &amp;gt;
0&lt;/code&gt;, and the sign of &lt;code&gt;max(z, min(x, y))&lt;/code&gt; is thus the same as the sign of &lt;code&gt;max(z,
y)&lt;/code&gt;.&lt;/p&gt;
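&lt;p&gt;Both claims can be checked by brute force over a handful of values. This is only a sketch: the &lt;code&gt;sign&lt;/code&gt; helper below is a hypothetical model of the implicit final operation, and treating zero as non-negative is an assumption of this illustration.&lt;/p&gt;

```python
import itertools

def sign(value):
    # 0.0 for negative values, 1.0 otherwise. A value is non-negative
    # exactly when it equals its own absolute value (zero counts as
    # non-negative here, which is an assumption of this sketch).
    return 1.0 if value == abs(value) else 0.0

# Candidate values for x, all strictly positive.
positives = [0.1, 1.0, 50.0, 100.0]
# Candidate values for y and z, with mixed signs.
anything = [-100.0, -10.0, -0.5, 0.0, 0.5, 10.0, 100.0]

def first_example_holds():
    # sign(min(x, y)) == sign(y) whenever x is positive
    return all(sign(min(x, y)) == sign(y)
               for x, y in itertools.product(positives, anything))

def second_example_holds():
    # sign(max(z, min(x, y))) == sign(max(z, y)) whenever x is positive
    return all(sign(max(z, min(x, y))) == sign(max(z, y))
               for x, y, z in itertools.product(positives, anything, anything))

assert first_example_holds() and second_example_holds()
```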
&lt;p&gt;To implement this optimization, I do a backwards pass over the program after
the peephole optimization forward pass. For every &lt;code&gt;min&lt;/code&gt; call I encounter where
one of the arguments is known to be positive (the lower bound of its interval is above zero), I can optimize the &lt;code&gt;min&lt;/code&gt; call away and
replace it with the other argument. For &lt;code&gt;max&lt;/code&gt; calls I simplify their arguments
recursively.&lt;/p&gt;
&lt;p&gt;The code looks roughly like this:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;work_backwards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxvalues&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_and_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;narg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;narg0&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;narg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;narg1&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;minvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;narg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;narg0&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;narg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;narg1&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;resultops&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;demand_sign_simplify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In my experiment, this optimization lets me remove 25% of all operations in
prospero, at the various levels of my octree. I'll briefly look at performance
results further down.&lt;/p&gt;
&lt;h3 id="further-ideas-about-the-demanded-sign-simplification"&gt;Further ideas about the demanded sign simplification&lt;/h3&gt;
&lt;p&gt;There is another idea for how to short-circuit the evaluation of expressions that I
tried briefly but didn't pursue to the end. Let's go back to the first example
of the previous subsection, but with different intervals:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# [-1, 1]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# [-1, 1]&lt;/span&gt;
&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c1"&gt;# [-1, 1]&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now we can't use the "demanded sign" trick in the optimizer, because neither
&lt;code&gt;x&lt;/code&gt; nor &lt;code&gt;y&lt;/code&gt; is known to be positive. However, during &lt;em&gt;execution&lt;/em&gt; of the program, if
&lt;code&gt;x&lt;/code&gt; turns out to be negative we can end the execution of this trace
immediately: the minimum can't be larger than &lt;code&gt;x&lt;/code&gt;, so we know that the result must be negative.&lt;/p&gt;
&lt;p&gt;So I experimented with adding &lt;code&gt;return_early_if_neg&lt;/code&gt; flags to all operations
with this property. The interpreter then checks whether the flag is set on an
operation and if the result is negative, it stops the execution of the program
early:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;return_early_if_neg&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;return_early_if_neg&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;span class="k"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This looked pretty promising, but it's also a trade-off because the cost of
checking the flag and the value isn't zero. Here's a sketch of the change in the interpreter:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;DirectFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;
        &lt;span class="n"&gt;num_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_operations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;floatvalues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;num_ops&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_ops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="o"&gt;...&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;var_x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;
            &lt;span class="o"&gt;...&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_flags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;should_return_if_neg&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
            &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;floatvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num_ops&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I implemented this in the RPython
version, but didn't end up porting it to C, because it interferes with SIMD.&lt;/p&gt;
&lt;h3 id="dead-code-elimination"&gt;Dead code elimination&lt;/h3&gt;
&lt;p&gt;Matt performs dead code elimination in his implementation by doing a single
backwards pass over the program. This is a simple and effective
optimization, and I added it to my implementation as well. The pass starts
by marking the result operation as
used. Then it goes backwards over the program. If the current operation is
used, its arguments are marked as used as well. Afterwards, all the operations
that are not marked as used are removed from the program. The PyPy JIT actually
performs dead code elimination on traces in exactly the same way (and I don't
think we ever explained how this works on the blog), so I thought it was worth
mentioning.&lt;/p&gt;
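&lt;p&gt;The backwards pass can be sketched roughly like this. This is a minimal
illustration and not the code from pyfidget: the representation of a program as
a list of &lt;code&gt;(opname, args)&lt;/code&gt; tuples (where the arguments are indexes of earlier
operations and the last operation is the result) and the name
&lt;code&gt;eliminate_dead_code&lt;/code&gt; are assumptions made for the example.&lt;/p&gt;

```python
def eliminate_dead_code(program):
    # program: list of (opname, args); args are indexes of earlier operations,
    # the last operation is the result (hypothetical representation)
    if not program:
        return []
    used = [False] * len(program)
    used[-1] = True  # the result is always needed
    # single backwards pass: arguments of used operations are used too
    for index in range(len(program) - 1, -1, -1):
        if used[index]:
            for arg in program[index][1]:
                used[arg] = True
    # drop the unused operations and renumber the arguments of the rest
    new_index = {}
    result = []
    for index, (opname, args) in enumerate(program):
        if not used[index]:
            continue
        new_index[index] = len(result)
        result.append((opname, tuple(new_index[arg] for arg in args)))
    return result


program = [
    ("var-x", ()),      # 0
    ("var-y", ()),      # 1
    ("mul", (0, 0)),    # 2, only used by operation 4
    ("add", (0, 1)),    # 3
    ("neg", (2,)),      # 4, never used
    ("square", (3,)),   # 5, the result
]
optimized = eliminate_dead_code(program)
```

&lt;p&gt;Because arguments always refer to earlier operations, one backwards pass is
enough: by the time an operation is visited, all of its users have already been
seen. In the example, removing operation 4 also makes operation 2 dead.&lt;/p&gt;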
&lt;p&gt;Matt also performs register allocation as part of the backwards pass, but I
didn't implement it because I wasn't too interested in that aspect.&lt;/p&gt;
&lt;h3 id="random-testing-of-the-optimizer"&gt;Random testing of the optimizer&lt;/h3&gt;
&lt;p&gt;To make sure I didn't break anything in the optimizer, I implemented a
test that generates random input programs and checks that the output of the
optimizer is equivalent to the input program. The test generates random
operations, random intervals for the operations and a random input value within
that interval. It then runs the optimizer on the input program and checks that
the output program has the same result as the input program. This is again
implemented with &lt;code&gt;hypothesis&lt;/code&gt;. Hypothesis' test case minimization feature is
super useful for finding optimizer bugs. It's just not fun to analyze a problem
on a many-thousand-operation input file, but Hypothesis often generated reduced
test cases that were only a few operations long.&lt;/p&gt;
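&lt;p&gt;To give an idea of the shape of such a test, here is a self-contained sketch.
It is not the test from pyfidget: it uses the standard library's &lt;code&gt;random&lt;/code&gt; module
as a stand-in for Hypothesis, and both the tiny operation set and the toy interval
optimizer (&lt;code&gt;optimize&lt;/code&gt;, which only resolves &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; when the
intervals of the arguments don't overlap) are assumptions made up for the example.&lt;/p&gt;

```python
import random


def evaluate(program, x, result=-1):
    # straightforward interpreter, returns the value of operation `result`
    values = []
    for opname, args in program:
        if opname == "var-x":
            res = x
        elif opname == "const":
            res = args[0]
        elif opname == "neg":
            res = -values[args[0]]
        elif opname == "add":
            res = values[args[0]] + values[args[1]]
        elif opname == "min":
            res = min(values[args[0]], values[args[1]])
        else:
            assert opname == "max"
            res = max(values[args[0]], values[args[1]])
        values.append(res)
    return values[result]


def optimize(program, x_lo, x_hi):
    # toy optimizer: removes min/max operations decided by interval analysis
    new_ops, bounds, forward = [], [], []

    def emit(opname, args, lo, hi):
        new_ops.append((opname, args))
        bounds.append((lo, hi))
        return len(new_ops) - 1

    for opname, args in program:
        if opname == "var-x":
            forward.append(emit(opname, args, x_lo, x_hi))
            continue
        if opname == "const":
            forward.append(emit(opname, args, args[0], args[0]))
            continue
        args = tuple(forward[arg] for arg in args)
        a_lo, a_hi = bounds[args[0]]
        if opname == "neg":
            forward.append(emit(opname, args, -a_hi, -a_lo))
            continue
        b_lo, b_hi = bounds[args[1]]
        if opname == "add":
            new = emit(opname, args, a_lo + b_lo, a_hi + b_hi)
        elif opname == "min" and b_lo >= a_hi:
            new = args[0]  # the first argument is never the larger one
        elif opname == "min" and a_lo >= b_hi:
            new = args[1]
        elif opname == "max" and a_lo >= b_hi:
            new = args[0]  # the first argument is never the smaller one
        elif opname == "max" and b_lo >= a_hi:
            new = args[1]
        elif opname == "min":
            new = emit(opname, args, min(a_lo, b_lo), min(a_hi, b_hi))
        else:
            assert opname == "max"
            new = emit(opname, args, max(a_lo, b_lo), max(a_hi, b_hi))
        forward.append(new)
    return new_ops, forward[-1]


def random_program(rng, num_ops):
    program = [("var-x", ())]
    for _ in range(num_ops):
        opname = rng.choice(["const", "neg", "add", "min", "max"])
        if opname == "const":
            args = (rng.uniform(-10.0, 10.0),)
        elif opname == "neg":
            args = (rng.randrange(len(program)),)
        else:
            args = (rng.randrange(len(program)), rng.randrange(len(program)))
        program.append((opname, args))
    return program


def check_optimizer(seed=0, runs=200):
    rng = random.Random(seed)
    for _ in range(runs):
        program = random_program(rng, rng.randrange(1, 20))
        x_lo = rng.uniform(-5.0, 5.0)
        x_hi = x_lo + rng.uniform(0.0, 5.0)
        # clamp, so that rounding can't push x out of the interval
        x = min(max(rng.uniform(x_lo, x_hi), x_lo), x_hi)
        optimized, result = optimize(program, x_lo, x_hi)
        assert evaluate(optimized, x, result) == evaluate(program, x)
    return runs
```

&lt;p&gt;Calling &lt;code&gt;check_optimizer()&lt;/code&gt; runs 200 random programs through the toy optimizer.
Unlike Hypothesis, this sketch does not shrink a failing program, which is
the feature that makes Hypothesis so useful here.&lt;/p&gt;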
&lt;h3 id="visualizing-programs"&gt;Visualizing programs&lt;/h3&gt;
&lt;p&gt;It's actually surprisingly annoying to visualize &lt;code&gt;prospero.vm&lt;/code&gt; well, because
it's quite a bit too large to just feed it into Graphviz. I made the problem
slightly easier by grouping several operations together, where only the first
operation in a group is used as the argument for more than one operation
further in the program. This made it slightly more manageable for Graphviz. But
it still wasn't a big enough improvement to be able to visualize all of
&lt;code&gt;prospero.vm&lt;/code&gt; in its unoptimized form at the top of the octree.&lt;/p&gt;
&lt;p&gt;Here's a visualization of the optimized &lt;code&gt;prospero.vm&lt;/code&gt; at one of the octree
levels:&lt;/p&gt;
&lt;p&gt;&lt;img alt="graph visualization of a part of the input program" src="https://www.pypy.org/images/2025-image-prospero-dataflow.png"&gt;&lt;/p&gt;
&lt;p&gt;The result is on top, and every node points to its arguments. The &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;
operations form a kind of "spine" of the expression tree, because they are
unions and intersections in the constructive solid geometry sense.&lt;/p&gt;
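&lt;p&gt;To see why &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; play that role, here is a small sketch with two
circles, assuming the convention that a negative value means "inside the shape".
The helper names (&lt;code&gt;circle&lt;/code&gt;, &lt;code&gt;union&lt;/code&gt;, &lt;code&gt;intersection&lt;/code&gt;, &lt;code&gt;inside&lt;/code&gt;) are made up
for the example and don't come from &lt;code&gt;prospero.vm&lt;/code&gt; or pyfidget.&lt;/p&gt;

```python
import math


def circle(cx, cy, radius):
    # signed distance to a circle: negative inside, positive outside
    def distance(x, y):
        return math.hypot(x - cx, y - cy) - radius
    return distance


def union(f, g):
    # a point is inside the union if it is inside at least one shape,
    # i.e. if the smaller of the two values is negative
    return lambda x, y: min(f(x, y), g(x, y))


def intersection(f, g):
    # a point is inside the intersection only if both values are negative,
    # i.e. if the larger of the two values is negative
    return lambda x, y: max(f(x, y), g(x, y))


def inside(shape, x, y):
    return 0.0 > shape(x, y)


left = circle(0.0, 0.0, 1.0)
right = circle(1.0, 0.0, 1.0)
either = union(left, right)
both = intersection(left, right)
```

&lt;p&gt;The point &lt;code&gt;(-0.5, 0.0)&lt;/code&gt; lies only in the left circle, so it is inside
&lt;code&gt;either&lt;/code&gt; but not inside &lt;code&gt;both&lt;/code&gt;, while &lt;code&gt;(0.5, 0.0)&lt;/code&gt; is inside both of them.&lt;/p&gt;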
&lt;p&gt;I also wrote a function to visualize the octree recursion itself. The output
looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="graph visualization of the octree recursion, zoomed out" src="https://www.pypy.org/images/2025-image-octree-zoomed-out.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="graph visualization of the octree recursion, zoomed in" src="https://www.pypy.org/images/2025-image-octree-zoomed-in.png"&gt;&lt;/p&gt;
&lt;p&gt;Green nodes are where the interval analysis determined that the output must be
entirely outside the shape. Yellow nodes are where the octree recursion
bottomed out.&lt;/p&gt;
&lt;h3 id="c-implementation"&gt;C implementation&lt;/h3&gt;
&lt;p&gt;To achieve even faster performance, I decided to rewrite the implementation in
C. While RPython is great for prototyping, it can be challenging to control
low-level aspects of the code. The rewrite in C allowed me to experiment with
several techniques I had been curious about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html"&gt;&lt;code&gt;musttail&lt;/code&gt; optimization&lt;/a&gt; for the interpreter.&lt;/li&gt;
&lt;li&gt;SIMD (Single Instruction, Multiple Data): Using Clang's
  &lt;a href="https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors"&gt;&lt;code&gt;ext_vector_type&lt;/code&gt;&lt;/a&gt;, I process eight pixels at once using AVX (or some other
  SIMD magic that I don't properly understand).&lt;/li&gt;
&lt;li&gt;Efficient struct packing: I packed the operations struct into just 8
  bytes by limiting the maximum number of operations to 65,536, with the idea
  of making the optimizer faster.&lt;/li&gt;
&lt;/ul&gt;
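&lt;p&gt;The arithmetic behind the struct packing can be illustrated with Python's
&lt;code&gt;struct&lt;/code&gt; module. The field layout below is purely hypothetical (it is not
the layout used in the C code), but it shows why a limit of 65,536 operations makes
8 bytes sufficient: every reference to another operation fits into 16 bits.&lt;/p&gt;

```python
import struct

# Hypothetical layout, 8 bytes in total:
#   opcode (1 byte), flags (1 byte), destination, arg0, arg1 (2 bytes each)
# "=" asks for standard sizes without any padding
OP_LAYOUT = struct.Struct("=BBHHH")
MAX_OPERATIONS = 2 ** 16  # number of operations addressable with 16-bit indexes


def pack_operation(opcode, flags, destination, arg0, arg1):
    # raises struct.error if an index does not fit into 16 bits
    return OP_LAYOUT.pack(opcode, flags, destination, arg0, arg1)


def unpack_operation(data):
    return OP_LAYOUT.unpack(data)


packed = pack_operation(3, 0, MAX_OPERATIONS - 1, 12, 40000)
```

&lt;p&gt;With this layout a program of the maximum size takes 512 KiB, and an index of
65,536 or more is rejected when packing.&lt;/p&gt;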
&lt;p&gt;I didn't rigorously study the performance impact of each of these techniques
individually, so it's possible that some of them might not have contributed
significantly. However, the rewrite was a fun exercise for me to explore these
techniques. The code can be found
&lt;a href="https://github.com/cfbolz/pyfidget/blob/main/pyfidget/experiments.c"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="testing-the-c-implementation"&gt;Testing the C implementation&lt;/h3&gt;
&lt;p&gt;At various points I had bugs in the C implementation, leading to a fun glitchy
version of prospero:&lt;/p&gt;
&lt;p&gt;&lt;img alt="glitchy prospero" src="https://www.pypy.org/images/2025-glitchy-prospero.png"&gt;&lt;/p&gt;
&lt;p&gt;To find these bugs, I used the same random testing approach as in the
RPython version. I generated random input programs as strings in Python and
checked that the output of the C implementation was equivalent to the output of
the RPython implementation (simply by calling out to the shell and reading the
generated image, then comparing pixels). This helped ensure that the C
implementation was correct and didn't introduce any bugs. It was surprisingly
tricky to get this
right, for reasons that I didn't expect. A lot of them are related to the fact
that in C I used &lt;code&gt;float&lt;/code&gt; and Python uses &lt;code&gt;double&lt;/code&gt; for its (Python) &lt;code&gt;float&lt;/code&gt;
type. This made the random tester find weird floating point corner cases where
rounding behaviour between the widths was different.&lt;/p&gt;
&lt;p&gt;I solved those by using &lt;code&gt;double&lt;/code&gt; in C when running the random tests by means of
an &lt;code&gt;#ifdef&lt;/code&gt;.&lt;/p&gt;
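&lt;p&gt;A small sketch of the kind of corner case involved, using only Python's
&lt;code&gt;struct&lt;/code&gt; module to round a value to single precision. The helper
&lt;code&gt;to_float32&lt;/code&gt; and the chosen value are just an illustration, not one of the
cases the random tester actually found.&lt;/p&gt;

```python
import struct


def to_float32(value):
    # round a Python float (a C double) to the nearest C float and back
    return struct.unpack("=f", struct.pack("=f", value))[0]


value = 1.0 + 2.0 ** -30            # exactly representable as a double
as_double = value - 1.0             # 2 ** -30, slightly positive
as_float = to_float32(value) - 1.0  # 0.0, the small offset was rounded away
sign_differs = (as_double > 0.0) != (as_float > 0.0)
```

&lt;p&gt;Since only the sign of the final result decides whether a pixel is part of
the shape, a difference like this is enough to make individual pixels disagree
between the two implementations.&lt;/p&gt;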
&lt;p&gt;It's super fun to watch the random program generator produce random images. Here are a few:&lt;/p&gt;
&lt;iframe width="560" height="560" src="https://www.youtube.com/embed/VqU5n3zzOjc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen&gt;&lt;/iframe&gt;

&lt;h3 id="performance"&gt;Performance&lt;/h3&gt;
&lt;p&gt;Some very rough performance results on my laptop (an AMD Ryzen 7 PRO 7840U with
32 GiB RAM running Ubuntu 24.04), comparing the RPython version, the C version
(with and without demanded info), and Fidget (in &lt;code&gt;vm&lt;/code&gt; mode, its JIT made things
worse for me), both for 1024x1024 and 4096x4096 images:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;1024x1024&lt;/th&gt;
&lt;th&gt;4096x4096&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RPython&lt;/td&gt;
&lt;td&gt;26.8ms&lt;/td&gt;
&lt;td&gt;75.0ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C (no demanded info)&lt;/td&gt;
&lt;td&gt;24.5ms&lt;/td&gt;
&lt;td&gt;45.0ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C (demanded info)&lt;/td&gt;
&lt;td&gt;18.0ms&lt;/td&gt;
&lt;td&gt;37.0ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fidget&lt;/td&gt;
&lt;td&gt;10.8ms&lt;/td&gt;
&lt;td&gt;57.8ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The demanded info seems to help quite a bit, which was nice to see.&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;That's it! I had lots of fun with the challenge and have a whole bunch of other
ideas I want to try out, thanks Matt for this interesting puzzle.&lt;/p&gt;</description><category>toy-optimizer</category><guid>https://www.pypy.org/posts/2025/04/prospero-in-rpython.html</guid><pubDate>Wed, 09 Apr 2025 15:07:09 GMT</pubDate></item><item><title>PyPy v7.3.19 release</title><link>https://www.pypy.org/posts/2025/02/pypy-v7319-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-19-release-of-python-2-7-3-10-and-3-11-beta"&gt;
&lt;h2&gt;PyPy v7.3.19: release of python 2.7, 3.10 and 3.11 beta&lt;/h2&gt;
&lt;p&gt;The PyPy team is proud to release version 7.3.19 of PyPy. This is primarily a
bug-fix release fixing JIT-related problems and follows quickly on the heels of
the previous release on Feb 6, 2025.&lt;/p&gt;
&lt;p&gt;This release includes a python 3.11 interpreter. There were bugs in the first
beta that could prevent its wider use, so we are continuing to call this
release "beta". In the next release we will drop 3.10 and remove the "beta"
label.&lt;/p&gt;
&lt;p&gt;The release includes three different interpreters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for
backported security updates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.10, which is an interpreter supporting the syntax and the features of
Python 3.10, including the stdlib for CPython 3.10.16.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of
Python 3.11, including the stdlib for CPython 3.11.11.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interpreters are based on much the same codebase, thus the triple
release. This is a micro release, all APIs are compatible with the other 7.3
releases. It follows the 7.3.18 release on Feb 6, 2025.&lt;/p&gt;
&lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
&lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear
about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request
to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: bug fixes,
&lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with
making RPython's JIT even better.&lt;/p&gt;
&lt;p&gt;If you are a python library maintainer and use C-extensions, please consider
making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant
on PyPy. In any case, both &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/matthew-brett/multibuild"&gt;multibuild system&lt;/a&gt; support
building wheels for PyPy.&lt;/p&gt;
&lt;section id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython.
It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance
comparison) due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython
can do for them.&lt;/p&gt;
&lt;p&gt;We provide binary builds for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems
(Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macOS (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM
32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries.
Please reach out to us if you wish to sponsor binary releases for those
platforms. Downstream packagers provide binary builds for debian, Fedora,
conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-else-is-new"&gt;
&lt;h3&gt;What else is new?&lt;/h3&gt;
&lt;p&gt;For more information about the 7.3.19 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.19.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please update, and continue to help us make PyPy better.&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/02/pypy-v7319-release.html</guid><pubDate>Wed, 26 Feb 2025 12:00:00 GMT</pubDate></item><item><title>Low Overhead Allocation Sampling with VMProf in PyPy's GC</title><link>https://www.pypy.org/posts/2025/02/pypy-gc-sampling.html</link><dc:creator>Christoph Jung</dc:creator><description>&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt;
&lt;p&gt;There are many time-based statistical profilers around (like VMProf or py-spy
just to name a few). They allow the user to pick a trade-off between profiling
precision and runtime overhead.&lt;/p&gt;
&lt;p&gt;On the other hand there are memory profilers
such as &lt;a href="https://github.com/bloomberg/memray"&gt;memray&lt;/a&gt;. They can be handy for
finding leaks or for discovering functions that allocate a lot of memory.
Memory profilers typically save every single allocation a program does. This
results in precise profiling, but larger overhead.&lt;/p&gt;
&lt;p&gt;In this post we describe our experimental approach to low overhead statistical
memory profiling. Instead of saving every single allocation a program does, it
only saves every nth allocated byte. We have tightly integrated VMProf and the
PyPy Garbage Collector to achieve this. The main technical insight is that the
check whether an allocation should be sampled can be made free. This is done by
folding it into the bump pointer allocator check that PyPy’s GC uses to
find out if it should start a minor collection. In this way the fast path with
and without memory sampling are exactly the same.&lt;/p&gt;
&lt;h3 id="background"&gt;Background&lt;/h3&gt;
&lt;p&gt;To get an insight into how the profiler and GC interact, let's take a brief look at
both of them first.&lt;/p&gt;
&lt;h4 id="vmprof"&gt;VMProf&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/vmprof/vmprof-python"&gt;VMProf&lt;/a&gt; is a statistical time-based profiler for PyPy. VMProf samples the stack of currently running Python functions a certain user-configured number of times per second. By adjusting
this number, the overhead of profiling can be modified to pick the correct trade-off between overhead and precision of the profile. In the resulting profile, functions with huge runtime stand out the most, functions with shorter runtime less so. If you want to get a little more introduction to VMProf and how to use it with PyPy, you may look
at &lt;a href="https://pypy.org/posts/2024/05/vmprof-firefox-converter.html"&gt;this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="pypys-gc"&gt;PyPy’s GC&lt;/h4&gt;
&lt;p&gt;PyPy uses a generational incremental copying collector. That means there are two spaces for allocated objects, the nursery and the old-space. Freshly allocated objects will be allocated into the nursery. When the nursery is full at some point, it will be collected and all objects that survive will be tenured i.e. moved into the old-space. The old-space is much larger than the nursery and is collected less frequently and &lt;a href="https://www.pypy.org/posts/2024/03/fixing-bug-incremental-gc.html"&gt;incrementally&lt;/a&gt; (not completely
collected in one go, but step-by-step). The old space collection is not relevant for the rest of the post though. We will now take a look at nursery allocations and how the nursery is collected.&lt;/p&gt;
&lt;h4 id="bump-pointer-allocation-in-the-nursery"&gt;Bump Pointer Allocation in the Nursery&lt;/h4&gt;
&lt;p&gt;The nursery (a small contiguous memory area) uses two pointers to keep track of where its free space starts and where the nursery ends. They are called &lt;code&gt;nursery_free&lt;/code&gt; and &lt;code&gt;nursery_top&lt;/code&gt;. When memory is allocated, the GC checks if there is enough space left in the nursery. If there is, the current value of &lt;code&gt;nursery_free&lt;/code&gt; is returned as the start address of the newly allocated memory, and &lt;code&gt;nursery_free&lt;/code&gt; is moved forward by the amount of allocated memory.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_allocation.svg"&gt;&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalsize&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="c1"&gt;# Save position, where the object will be allocated to as result&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt;
  &lt;span class="c1"&gt;# Move nursery_free pointer forward by totalsize&lt;/span&gt;
  &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;totalsize&lt;/span&gt;
  &lt;span class="c1"&gt;# Check if this allocation would exceed the nursery&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# If it does =&amp;gt; collect the nursery and allocate afterwards&lt;/span&gt;
      &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalsize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;# result is a pointer into the nursery, obj will be allocated there&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_of_allocation&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
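    &lt;span class="c1"&gt;# do a minor collection, then reserve the requested space at the start&lt;/span&gt;
    &lt;span class="c1"&gt;# of the (now empty) nursery&lt;/span&gt;
    &lt;span class="n"&gt;minor_collection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt;
    &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size_of_allocation&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;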
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Understanding this is crucial for our allocation sampling approach, so let us go through this step-by-step.&lt;/p&gt;
&lt;p&gt;We already saw an example of what an allocation into a non-full nursery looks like. But what happens if the nursery is (too) full?&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_full.svg"&gt;&lt;/p&gt;
&lt;p&gt;As soon as an object doesn't fit into the nursery anymore, the nursery will be collected. A nursery collection moves all surviving objects into the old-space, so that the nursery is free afterwards and the requested allocation can be made.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_collected.svg"&gt;&lt;/p&gt;
&lt;p&gt;(Note that this is still a bit of a simplification.)&lt;/p&gt;
&lt;h3 id="sampling-approach"&gt;Sampling Approach&lt;/h3&gt;
&lt;p&gt;The last section described how the nursery allocation works normally. Now we'll talk about how we integrate the new allocation sampling approach into it.&lt;/p&gt;
&lt;p&gt;To decide whether the GC should trigger a sample, the sampling logic is integrated into the bump pointer allocation logic. Usually, when there is not enough space in the nursery left to fulfill an allocation request, the nursery will be collected and the allocation will be done afterwards. We reuse that mechanism for sampling, by introducing a new pointer called &lt;code&gt;sample_point&lt;/code&gt; that is calculated by &lt;code&gt;sample_point = nursery_free + sample_n_bytes&lt;/code&gt; where &lt;code&gt;sample_n_bytes&lt;/code&gt; is the number of bytes allocated before a sample is made (i.e. our sampling rate).&lt;/p&gt;
&lt;p&gt;Imagine we have a nursery of 2MB and want to sample every 512KB allocated. Our nursery would then look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling.svg"&gt;&lt;/p&gt;
&lt;p&gt;We use the sample point as &lt;code&gt;nursery_top&lt;/code&gt;, so that allocating a chunk of 512KB would exceed the nursery top and start a nursery collection. But of course we don't want to do a minor collection just then, so before starting a collection, we need to check if the nursery is actually full or if that is just an exceeded sample point. The latter will then trigger a VMprof stack sample. Afterwards we don't actually do a minor collection, but change &lt;code&gt;nursery_top&lt;/code&gt; and immediately return to the caller.&lt;/p&gt;
&lt;p&gt;The last picture is a conceptual simplification. Only one sampling point exists at any given time. After we create the sampling point, it is used as the nursery top. Once it is exceeded, we just add &lt;code&gt;sample_n_bytes&lt;/code&gt; to that sampling point, i.e. move it forward.&lt;/p&gt;
&lt;p&gt;Here's what the updated &lt;code&gt;collect_and_reserve&lt;/code&gt; function looks like:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_of_allocation&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Check if we exceeded a sample point or if we need to do a minor collection&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# One allocation could exceed multiple sample points&lt;/span&gt;
        &lt;span class="c1"&gt;# Sample, move sample_point forward&lt;/span&gt;
        &lt;span class="n"&gt;vmprof&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;sample_n_bytes&lt;/span&gt;

        &lt;span class="c1"&gt;# Set sample point as new nursery_top if it fits into the nursery&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;real_nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt;
        &lt;span class="c1"&gt;# Or use the real nursery top if it does not fit&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;real_nursery_top&lt;/span&gt;

        &lt;span class="c1"&gt;# Is there enough memory left inside the nursery&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size_of_allocation&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Yes =&amp;gt; remember where the object starts, move nursery_free forward&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt;
            &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;size_of_allocation&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="c1"&gt;# We did not exceed a sampling point and must do a minor collection, or&lt;/span&gt;
    &lt;span class="c1"&gt;# we exceeded a sample point but we needed to do a minor collection anyway&lt;/span&gt;
    &lt;span class="n"&gt;minor_collection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
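&lt;p&gt;To make the control flow easier to follow, here is a minimal, runnable toy model of the logic above. It is a sketch and not PyPy's actual GC code: addresses are plain integers, the class &lt;code&gt;ToyNursery&lt;/code&gt; and its counters are hypothetical names, a minor collection is modelled as simply emptying the nursery, and we assume &lt;code&gt;sample_n_bytes&lt;/code&gt; is not larger than the nursery.&lt;/p&gt;

```python
class ToyNursery:
    """Toy model of bump-pointer allocation with a sample point.

    Hypothetical illustration: addresses are integers and nothing is
    really allocated. Assumes sample_n_bytes is at most nursery_size.
    """

    def __init__(self, nursery_size, sample_n_bytes):
        self.nursery_size = nursery_size
        self.sample_n_bytes = sample_n_bytes
        self.samples = 0            # stands in for vmprof stack samples
        self.minor_collections = 0
        self._empty_nursery()

    def _empty_nursery(self):
        # State right after a (simulated) minor collection.
        self.nursery_free = 0
        self.real_nursery_top = self.nursery_size
        self.sample_point = self.nursery_free + self.sample_n_bytes
        self.nursery_top = min(self.sample_point, self.real_nursery_top)

    def allocate(self, totalsize):
        # Fast path: unchanged by sampling, just a bump and one comparison.
        result = self.nursery_free
        if result + totalsize > self.nursery_top:
            return self.collect_and_reserve(totalsize)
        self.nursery_free = result + totalsize
        return result

    def collect_and_reserve(self, totalsize):
        if self.nursery_top == self.sample_point:
            # We ran into a sample point, not (necessarily) the real top.
            self.samples += 1
            self.sample_point += self.sample_n_bytes
            self.nursery_top = min(self.sample_point, self.real_nursery_top)
            if self.nursery_top >= self.nursery_free + totalsize:
                result = self.nursery_free
                self.nursery_free += totalsize
                return result
        # The nursery is really full: collect, then allocate at its start.
        self.minor_collections += 1
        self._empty_nursery()
        result = self.nursery_free
        self.nursery_free += totalsize
        return result


KB = 1024
nursery = ToyNursery(nursery_size=2048 * KB, sample_n_bytes=512 * KB)
for _ in range(33):             # 33 allocations of 64 KB each
    nursery.allocate(64 * KB)
print(nursery.samples, nursery.minor_collections)
```

&lt;p&gt;In this model, 33 allocations of 64 KB each run into the sample point four times (at 512 KB, 1 MB, 1.5 MB and 2 MB), and only the last allocation causes a minor collection. The fast path in &lt;code&gt;allocate&lt;/code&gt; never looks at the sample point.&lt;/p&gt;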

&lt;h3 id="why-is-the-overhead-low"&gt;Why is the Overhead ‘low’&lt;/h3&gt;
&lt;p&gt;The most important property of our approach is that the bump-pointer fast path is not changed at all. If sampling is turned off, the slow path in &lt;code&gt;collect_and_reserve&lt;/code&gt; has three extra instructions for the &lt;code&gt;if&lt;/code&gt; at the beginning, but those add only a very small amount of overhead compared to doing a minor collection.&lt;/p&gt;
&lt;p&gt;When sampling is on, the extra logic in &lt;code&gt;collect_and_reserve&lt;/code&gt; gets executed. Every time an allocation exceeds the &lt;code&gt;sample_point&lt;/code&gt;, &lt;code&gt;collect_and_reserve&lt;/code&gt; will sample the Python functions currently executing. The resulting overhead is directly controlled by &lt;code&gt;sample_n_bytes&lt;/code&gt;. After sampling, the &lt;code&gt;sample_point&lt;/code&gt; and &lt;code&gt;nursery_top&lt;/code&gt; must be set accordingly. This will be done once after sampling in &lt;code&gt;collect_and_reserve&lt;/code&gt;. At some point a nursery collection will free the nursery and set the new &lt;code&gt;sample_point&lt;/code&gt; afterwards.&lt;/p&gt;
&lt;p&gt;That means that the overhead mostly depends on the sampling rate and the rate at which the user program allocates memory, as the combination of those two factors determines the amount of samples.&lt;/p&gt;
&lt;p&gt;Since the sampling rate can be adjusted from as low as 64 bytes to a theoretical maximum of ~4 GB (at the moment), the tradeoff between the number of samples (i.e. profiling precision) and overhead can be freely adjusted.&lt;/p&gt;
&lt;p&gt;We also suspect a link between user program stack depth and overhead (a deeper stack takes longer to walk, leading to higher overhead), especially when walking the C call stack too.&lt;/p&gt;
&lt;h3 id="sampling-rates-bigger-than-the-nursery-size"&gt;Sampling rates bigger than the nursery size&lt;/h3&gt;
&lt;p&gt;The nursery usually has a size of a few megabytes, but profiling long-running or larger applications with tons of allocations could result in a very high number of samples per second (and thus overhead). To combat that, it is possible to use sampling rates higher than the nursery size.&lt;/p&gt;
&lt;p&gt;The sampling point is not limited by the nursery size, but if it is 'outside' the nursery (e.g. because &lt;code&gt;sample_n_bytes&lt;/code&gt; is set to twice the nursery size) it won't be used as &lt;code&gt;nursery_top&lt;/code&gt; until it 'fits' into the nursery.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling_larger_than_nursery.svg"&gt;&lt;/p&gt;
&lt;p&gt;After every nursery collection, we'd usually set the &lt;code&gt;sample_point&lt;/code&gt; to &lt;code&gt;nursery_free + sample_n_bytes&lt;/code&gt;, but if it is larger than the nursery, then the amount of collected memory during the last nursery collection is subtracted from &lt;code&gt;sample_point&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling_larger_than_nursery_post_minor.svg"&gt;&lt;/p&gt;
&lt;p&gt;At some point the &lt;code&gt;sample_point&lt;/code&gt; will be smaller than the nursery size. It will then be used as &lt;code&gt;nursery_top&lt;/code&gt; again to trigger a sample when exceeded.&lt;/p&gt;
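&lt;p&gt;The bookkeeping after a minor collection can be sketched roughly as follows. This is a simplified illustration of the description above, not the actual PyPy code: offsets are measured from the start of the nursery, the function names and parameters are hypothetical, and we assume that everything allocated in the nursery before the collection counts as collected.&lt;/p&gt;

```python
def sample_point_after_collection(sample_point, nursery_size,
                                  collected_bytes, sample_n_bytes):
    """Return the new sample point (offset from the nursery start).

    Hypothetical helper: models only the rule described in the text.
    """
    if sample_point > nursery_size:
        # The pending sample lies 'outside' the nursery: keep it pending,
        # but credit the bytes that were allocated before this collection.
        return sample_point - collected_bytes
    # Usual case: nursery_free is 0 again, so start a fresh interval.
    return sample_n_bytes


def fits_into_nursery(sample_point, nursery_size):
    # Only a sample point inside the nursery may be used as nursery_top.
    return nursery_size >= sample_point


MB = 1024 * 1024
point = 4 * MB                      # sample_n_bytes is twice the nursery size
print(fits_into_nursery(point, 2 * MB))
point = sample_point_after_collection(point, 2 * MB, 2 * MB, 4 * MB)
print(point // MB, fits_into_nursery(point, 2 * MB))
```

&lt;p&gt;With a 2 MB nursery and a 4 MB sampling rate, the sample point starts outside the nursery. After the first minor collection it has moved to 2 MB, so it fits and can act as &lt;code&gt;nursery_top&lt;/code&gt; again.&lt;/p&gt;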
&lt;h3 id="differences-to-time-based-sampling"&gt;Differences to Time-Based Sampling&lt;/h3&gt;
&lt;p&gt;As mentioned in the introduction, time-based sampling ‘hits’ functions with high runtime, and allocation sampling ‘hits’ functions allocating a lot of memory. But are those always different functions? The answer is: sometimes. There can be functions that allocate lots of memory but do not have a (relatively) high runtime.&lt;/p&gt;
&lt;p&gt;Another difference to time-based sampling is that the profiling overhead does not solely depend on the sampling rate (if we exclude a potential stack-depth - overhead correlation for now) but also on the amount of memory the user code allocates.&lt;/p&gt;
&lt;p&gt;Let us look at an example:&lt;/p&gt;
&lt;p&gt;If we sample every 1024 bytes, and some program A allocates 3 MB and runs for 5 seconds while program B allocates 6 MB and also runs for 5 seconds, there will be ~3000 samples when profiling A, but ~6000 samples when profiling B. That means we cannot give a ‘standard’ sampling rate as time-based profilers usually do (e.g. vmprof uses ~1000 samples/s for time sampling), as the number of resulting samples, and thus the overhead, depends on both the sampling rate and the amount of memory allocated by the program.&lt;/p&gt;
&lt;p&gt;For testing and benchmarking, we usually started with a sampling rate of 128 KB and then halved or doubled that (multiple times) depending on sample counts, our need for precision (and size of the profile).&lt;/p&gt;
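&lt;p&gt;The arithmetic behind this example can be written down as a small sketch. The helper name &lt;code&gt;expected_samples&lt;/code&gt; is made up for illustration, and we assume 1 MB = 1024 * 1024 bytes:&lt;/p&gt;

```python
def expected_samples(allocated_bytes, sample_n_bytes):
    # One sample is triggered each time another sample_n_bytes were allocated.
    return allocated_bytes // sample_n_bytes


MB = 1024 * 1024
runtime_seconds = 5
for name, allocated in (("A", 3 * MB), ("B", 6 * MB)):
    samples = expected_samples(allocated, 1024)
    # name, total samples, samples per second
    print(name, samples, samples / runtime_seconds)
```

&lt;p&gt;Both programs run for the same time, but B produces twice as many samples per second as A, simply because it allocates twice as much.&lt;/p&gt;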
&lt;h3 id="evaluation"&gt;Evaluation&lt;/h3&gt;
&lt;h4 id="overhead"&gt;Overhead&lt;/h4&gt;
&lt;p&gt;Now let us take a look at the allocation sampling overhead, by profiling some benchmarks. &lt;/p&gt;
&lt;p&gt;The x-axis shows the sampling rate, while the y-axis shows the overhead, which is computed as &lt;code&gt;runtime_with_sampling / runtime_without_sampling&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;All benchmarks were executed five times on a PyPy with JIT and native profiling enabled, so that every dot in the plot is one run of a benchmark.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/as_overhead.png"&gt;&lt;/p&gt;
&lt;p&gt;As you probably expected, the overhead drops with higher allocation sampling rates,
ranging from as high as ~390% for 32kb allocation sampling to as low as &amp;lt; 10% for 32mb.&lt;/p&gt;
&lt;p&gt;Let me give one concrete example: One run of the microbenchmark at 32kb sampling took 15.596 seconds and triggered 822050 samples.
That makes a ridiculous amount of &lt;code&gt;822050 / 15.596 = ~52709&lt;/code&gt; samples per second. &lt;/p&gt;
&lt;p&gt;There is probably no need for that amount of samples per second, so that for 'real' application profiling a much higher sampling rate would be sufficient.&lt;/p&gt;
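&lt;p&gt;As a rough sketch of how one might pick a larger sampling rate from such a measurement: the helper names below are hypothetical, "32kb" is assumed to mean 32 * 1024 bytes, and the estimate assumes that the number of samples is inversely proportional to the sampling rate (i.e. the program allocates at a steady pace).&lt;/p&gt;

```python
def samples_per_second(sample_count, runtime_seconds):
    return sample_count / runtime_seconds


def interval_for_target_rate(current_interval, current_rate, target_rate):
    # Assumes the number of samples is inversely proportional to the
    # sampling interval, i.e. the program allocates at a steady rate.
    return current_interval * current_rate / target_rate


rate = samples_per_second(822050, 15.596)
print(round(rate))
# Assumption: "32kb" means 32 * 1024 bytes.
new_interval = interval_for_target_rate(32 * 1024, rate, 1000)
print(new_interval / (1024 * 1024))     # in MB
```

&lt;p&gt;Under these assumptions, roughly 1.6 MB between samples would bring this run down to about 1000 samples per second, which is only an estimate and would need to be checked by measuring again.&lt;/p&gt;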
&lt;p&gt;Let us compare that to time sampling.&lt;/p&gt;
&lt;p&gt;This time we ran those benchmarks with 100, 1000 and 2000 samples per second.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/ts_overhead.png"&gt;&lt;/p&gt;
&lt;p&gt;The overhead varies with the sampling rate. Both with allocation and time sampling, you can reach any amount of overhead and any level of profiling precision you want. The best approach probably is to just try out a sampling rate and choose what gives you the right tradeoff between precision and overhead (and disk usage).&lt;/p&gt;
&lt;p&gt;The benchmarks used are:&lt;/p&gt;
&lt;p&gt;microbenchmark &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Cskorpion/microbenchmark"&gt;https://github.com/Cskorpion/microbenchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pypy microbench.py 65536&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;gcbench &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/pypy/pypy/blob/main/rpython/translator/goal/gcbench.py"&gt;https://github.com/pypy/pypy/blob/main/rpython/translator/goal/gcbench.py&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;print statements removed&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pypy gcbench.py 1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;pypy translate step&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;first step of the pypy translation (annotation step)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pypy path/to/rpython --opt=0 --cc=gcc --dont-write-c-files --gc=incminimark --annotate path/to/pypy/goal/targetpypystandalone.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;interpreter pystone&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pystone benchmark on top of an interpreted pypy on top of a translated pypy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pypy path/to/pypy/bin/pyinteractive.py -c "import test.pystone; test.pystone.main(1)"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All benchmarks executed on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubuntu 24.04&lt;/li&gt;
&lt;li&gt;AMD Ryzen 7 5700U&lt;/li&gt;
&lt;li&gt;24gb DDR4 3200MHz (dual channel)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SSD benchmarking at read: 1965 MB/s, write: 227 MB/s&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sequential 1MB 1 Thread 8 Queues&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Self built PyPy with allocation sampling features&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Cskorpion/pypy/tree/gc_allocation_sampling_u_2.7"&gt;https://github.com/Cskorpion/pypy/tree/gc_allocation_sampling_u_2.7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Modified VMProf with allocation sampling support&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Cskorpion/vmprof-python/tree/pypy_gc_allocation_sampling"&gt;https://github.com/Cskorpion/vmprof-python/tree/pypy_gc_allocation_sampling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="example"&gt;Example&lt;/h4&gt;
&lt;p&gt;We have also modified &lt;a href="https://github.com/Cskorpion/vmprof-firefox-converter/tree/allocation_sampling"&gt;vmprof-firefox-converter&lt;/a&gt; to show the allocation samples in the Firefox Profiler UI. With the techniques from this post, the output looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/allocation_sampling_call_tree.png"&gt;&lt;/p&gt;
&lt;p&gt;While this view is interesting, it would be even better if we could also see what types of objects are being allocated in these functions. We will talk about how to do this in a future blog post.&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;In this blog post we introduced allocation sampling for PyPy by going through the technical aspects and the corresponding overhead. In a future blog post, we are going to dive into the actual usage of allocation sampling with VMProf, and show an example case study. That will be accompanied by some new improvements and additional features, like extracting the type of an object that triggered a sample.&lt;/p&gt;
&lt;p&gt;So far all this work is still experimental and happening on PyPy branches but
we hope to get the technique stable enough to merge it to main and ship it with
PyPy eventually.&lt;/p&gt;
&lt;p&gt;-- Christoph Jung and CF Bolz-Tereick&lt;/p&gt;</description><category>gc</category><category>profiling</category><category>vmprof</category><guid>https://www.pypy.org/posts/2025/02/pypy-gc-sampling.html</guid><pubDate>Tue, 25 Feb 2025 10:16:00 GMT</pubDate></item><item><title>PyPy v7.3.18 release</title><link>https://www.pypy.org/posts/2025/02/pypy-v7318-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-18-release-of-python-2-7-3-10-and-3-11-beta"&gt;
&lt;h2&gt;PyPy v7.3.18: release of python 2.7, 3.10 and 3.11 beta&lt;/h2&gt;
&lt;p&gt;The PyPy team is proud to release version 7.3.18 of PyPy.&lt;/p&gt;
&lt;p&gt;This release includes a python 3.11 interpreter. We are labelling it "beta"
because it is the first one. In the next release we will drop 3.10 and remove
the "beta" label. There is a particularly large set of bugfixes in this
release thanks to @devdanzin using fusil (originally written by Victor
Stinner) on the 3.10 builds. Other significant changes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We have updated libffi shipped in our portable builds. We also now statically
link to libffi where possible which reduces the number of
shared object dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We have added code to be able to show the native function names when
profiling with VMProf. So far only Linux supports this feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We have added a &lt;a class="reference external" href="https://peps.python.org/pep-0768/"&gt;PEP 768&lt;/a&gt;-inspired remote debugging facility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The HPy backend has been updated to the latest HPy HEAD.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The release includes three different interpreters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of
Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for
backported security updates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.10, which is an interpreter supporting the syntax and the features of
Python 3.10, including the stdlib for CPython 3.10.16.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of
Python 3.11, including the stdlib for CPython 3.11.11.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The interpreters are based on much the same codebase, thus the triple
release. This is a micro release, all APIs are compatible with the other 7.3
releases. It follows the 7.3.17 release of August 28, 2024.&lt;/p&gt;
&lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We would like to thank our donors for the continued support of the PyPy
project. If PyPy is not quite good enough for your needs, we are available for
&lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear
about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request
to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would also like to thank our contributors and encourage new people to join
the project. PyPy has many layers and we need help with all of them: bug fixes,
&lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with
making RPython's JIT even better.&lt;/p&gt;
&lt;p&gt;If you are a python library maintainer and use C-extensions, please consider
making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant
on PyPy. In any case, both &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/matthew-brett/multibuild"&gt;multibuild system&lt;/a&gt; support
building wheels for PyPy.&lt;/p&gt;
&lt;section id="vmprof-native-symbol-names"&gt;
&lt;h3&gt;VMProf Native Symbol Names&lt;/h3&gt;
&lt;p&gt;Until now, PyPy did not produce function names for C functions when running
VMProf with native profiling enabled. The output looked like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;pypy -m vmprof ~/projects/gitpypy/lib-python/2.7/test/pystone.py
Pystone(1.1) time for 50000 passes = 0.0109887
This machine benchmarks at 4.55011e+06 pystones/second
 vmprof output:
 %:      name:                location:
 100.0%  entry_point          &amp;lt;builtin&amp;gt;/app_main.py:874
 100.0%  run_command_line     &amp;lt;builtin&amp;gt;/app_main.py:601
 100.0%  run_toplevel         &amp;lt;builtin&amp;gt;/app_main.py:93
 100.0%  _run_module_as_main  /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:150
 100.0%  _run_code            /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:62
 100.0%  &amp;lt;module&amp;gt;             /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/site-packages/vmprof/__main__.py:1
 100.0%  main                 /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/site-packages/vmprof/__main__.py:30
 100.0%  run_path             /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:238
 100.0%  _run_module_code     /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:75
 100.0%  &amp;lt;module&amp;gt;             /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:3
 100.0%  main                 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:60
 100.0%  pystones             /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:67
 100.0%  Proc0                /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:79
 76.9%   &amp;lt;unknown code&amp;gt;
 69.2%   &amp;lt;unknown code&amp;gt;
 53.8%   &amp;lt;unknown code&amp;gt;
 53.8%   &amp;lt;unknown code&amp;gt;
 46.2%   &amp;lt;unknown code&amp;gt;
 46.2%   &amp;lt;unknown code&amp;gt;
 38.5%   &amp;lt;unknown code&amp;gt;
 38.5%   Proc8                /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:212
 30.8%   &amp;lt;unknown code&amp;gt;
 ...&lt;/pre&gt;
&lt;p&gt;We can now symbolize these C functions and show their names and the
shared library they come from, at least on Linux:&lt;/p&gt;
&lt;pre class="literal-block"&gt;Pystone(1.1) time for 50000 passes = 0.218967
This machine benchmarks at 228345 pystones/second
 vmprof output:
 %:      name:                                           location:
 100.0%  entry_point                                     &amp;lt;builtin&amp;gt;/app_main.py:889
 100.0%  run_command_line                                &amp;lt;builtin&amp;gt;/app_main.py:616
 100.0%  run_toplevel                                    &amp;lt;builtin&amp;gt;/app_main.py:95
 100.0%  _run_module_as_main                             /home/user/projects/gitpypy/lib-python/2.7/runpy.py:150
 100.0%  _run_code                                       /home/user/projects/gitpypy/lib-python/2.7/runpy.py:62
 100.0%  &amp;lt;module&amp;gt;                                        /home/user/projects/gitpypy/site-packages/vmprof/__main__.py:1
 100.0%  main                                            /home/user/projects/gitpypy/site-packages/vmprof/__main__.py:30
 100.0%  run_module                                      /home/user/projects/gitpypy/lib-python/2.7/runpy.py:179
 100.0%  _run_module_code                                /home/user/projects/gitpypy/lib-python/2.7/runpy.py:75
 100.0%  &amp;lt;module&amp;gt;                                        /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:3
 100.0%  main                                            /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:60
 100.0%  pystones                                        /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:67
 100.0%  Proc0                                           /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:79
 95.5%   n:pypy_g_execute_frame:0:pypy-c
 91.4%   n:pypy_g_PyFrame_dispatch:0:pypy-c
 63.8%   n:pypy_g_PyFrame_dispatch_bytecode:0:pypy-c
 49.8%   Proc1                                           /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:137
 17.6%   copy                                            /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:53
 13.6%   n:pypy_g_PyFrame_CALL_FUNCTION:0:pypy-c
 10.4%   Proc8                                           /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:212
 8.6%    n:pypy_g_STORE_ATTR_slowpath:0:pypy-c&lt;/pre&gt;
&lt;p&gt;This becomes even more useful when using the &lt;a class="reference external" href="https://github.com/Cskorpion/vmprof-firefox-converter/"&gt;VMProf Firefox converter&lt;/a&gt;, which
uses the Firefox Profiler Web UI to visualize profiling output:&lt;/p&gt;
&lt;img alt="/images/2025-vmprof-firefox.png" src="https://www.pypy.org/images/2025-vmprof-firefox.png"&gt;
&lt;/section&gt;
&lt;section id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython.
It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance
comparison) due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython
can do for them.&lt;/p&gt;
&lt;p&gt;We provide binary builds for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems
(Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macOS (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM
32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries.
Please reach out to us if you wish to sponsor binary releases for those
platforms. Downstream packagers provide binary builds for debian, Fedora,
conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-else-is-new"&gt;
&lt;h3&gt;What else is new?&lt;/h3&gt;
&lt;p&gt;For more information about the 7.3.18 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.18.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/02/pypy-v7318-release.html</guid><pubDate>Thu, 06 Feb 2025 12:00:00 GMT</pubDate></item></channel></rss>