Research Software & Analytics Group, University of Exeter
June 4th, 2025
1. Mass modeling: parametric, non-linear, multi-phase models
2. Source reconstruction: linear, reconstructing the unlensed light distribution via different kinds of meshes: rectangular, Delaunay, Voronoi
3. Log-likelihood function: takes the output of (1) and computes its likelihood (with (2) as part of the calculation).

The key goal is to automate the whole process and apply it to large datasets.
PyAutoLens (via PyAutoFit) supports nested sampling (Dynesty), MCMC (emcee), and particle swarm optimization (PySwarms).
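As a rough illustration of how a log-likelihood function plugs into one of these samplers, here is a minimal sketch driving emcee directly (not PyAutoFit's interface; the toy Gaussian likelihood and all sizes are made up):

import numpy as np
import emcee

def log_likelihood(params: np.ndarray) -> float:
    # Toy stand-in for the lens-model log-likelihood in (3)
    return -0.5 * float(np.sum(params**2))

ndim, nwalkers = 5, 32
p0 = np.random.default_rng(0).normal(size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_likelihood)
sampler.run_mcmc(p0, 1_000)
samples = sampler.get_chain(flat=True)  # shape (nwalkers * nsteps, ndim)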
`original` for the original functions, `numba` for those ported in (2), `jax` for those ported in (3).

Numba | JAX |
---|---|
C-like mini language | Smaller language (\(\text{JAX} \underset{\sim}{\subset} \text{Numba}\)): restrictions on control flow, mutation (illustrated below), and dynamic shapes |
Implements a subset of Python+NumPy, with a parallelization model similar to a mini-“OpenMP” | Implements a subset of Python+NumPy+SciPy exposed via duck-typing. |
NumPy implementations are drop-in replacements, but only a subset is implemented. Calling NumPy within a jitted function is completely hijacked. Documentation is minimal. | `jax.numpy` and `jax.scipy` have APIs similar to NumPy and SciPy, but come with their own documentation, which makes deviations in behavior easier to document. |
Functions “recompile” whenever the input type changes. | Functions “recompile” whenever the input type or shape changes. |
No automatic compilation & offloading to an accelerator. No autograd/autodiff. | Going through the FFI is more costly: memory transfers to and from the device, and loss of autograd/autodiff. |
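As a minimal illustration of the mutation restriction in the table above (not code from PyAutoLens): Numba happily mutates arrays in place, while JAX requires the functional `.at[...]` update syntax:

import numba
import numpy as np
import jax
import jax.numpy as jnp

@numba.jit(nopython=True)
def fill_numba(x):
    for i in range(x.shape[0]):
        x[i] = i  # in-place mutation is allowed
    return x

@jax.jit
def fill_jax(x):
    # JAX arrays are immutable; .at[...] returns an updated copy
    return x.at[:].set(jnp.arange(x.shape[0], dtype=x.dtype))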
Tracing compiler & recompilation per shape change \(\Rightarrow\) `static_argnums`
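A small sketch (not from PyAutoLens) of marking an argument as static so the tracer treats it as a compile-time constant: a new value of `window` triggers a recompile, but a new `x` with the same shape and dtype does not.

from functools import partial
import jax
import jax.numpy as jnp

@partial(jax.jit, static_argnums=1)
def moving_average(x, window):
    # `window` determines the output shape, so it must be static
    return jnp.convolve(x, jnp.ones(window) / window, mode="valid")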
Compiler Driven Design
Easy to port to GPU without setting one up.
JAX vs numba-cuda: The XLA compiler handles device-specific optimization automatically.
JAX nudges you to write correct code, and performance comes as a bonus.
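The same jitted function runs unchanged on whichever backend the installed jaxlib provides; a small sketch (the array size is made up):

import jax
import jax.numpy as jnp

print(jax.default_backend())  # "cpu", "gpu", or "tpu"

@jax.jit
def sum_of_squares(x):
    return jnp.sum(jnp.square(x))

x = jnp.ones(1_000_000)       # lives on the default device
print(sum_of_squares(x))      # compiled by XLA for that device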
\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{1}{n_k^2} \cos(2\pi[(g_{i1} - g_{j1})u_{k0} + (g_{i0} - g_{j0})u_{k1}])\]
import numba
import numpy as np


@numba.jit(nopython=True, nogil=True, parallel=True)
def w_tilde_curvature_interferometer_from(
noise_map_real: np.ndarray,
uv_wavelengths: np.ndarray,
grid_radians_slim: np.ndarray,
) -> np.ndarray:
w_tilde = np.zeros((grid_radians_slim.shape[0], grid_radians_slim.shape[0]))
for i in range(w_tilde.shape[0]):
for j in range(i, w_tilde.shape[1]):
y_offset = grid_radians_slim[i, 1] - grid_radians_slim[j, 1]
x_offset = grid_radians_slim[i, 0] - grid_radians_slim[j, 0]
for vis_1d_index in range(uv_wavelengths.shape[0]):
w_tilde[i, j] += noise_map_real[vis_1d_index] ** -2.0 * np.cos(
2.0
* np.pi
* (y_offset * uv_wavelengths[vis_1d_index, 0] + x_offset * uv_wavelengths[vis_1d_index, 1])
)
for i in range(w_tilde.shape[0]):
for j in range(i, w_tilde.shape[1]):
w_tilde[j, i] = w_tilde[i, j]
return w_tilde
\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{1}{n_k^2} \cos(2\pi[(g_{i1} - g_{j1})u_{k0} + (g_{i0} - g_{j0})u_{k1}])\]
import jax
import jax.numpy as jnp
import numpy as np


@jax.jit
def w_tilde_curvature_interferometer_from(
noise_map_real: np.ndarray[tuple[int], np.float64],
uv_wavelengths: np.ndarray[tuple[int, int], np.float64],
grid_radians_slim: np.ndarray[tuple[int, int], np.float64],
) -> np.ndarray[tuple[int, int], np.float64]:
# (M, M, 1, 2)
g_ij = grid_radians_slim.reshape(-1, 1, 1, 2) - grid_radians_slim.reshape(1, -1, 1, 2)
# (1, 1, K, 2)
u_k = uv_wavelengths.reshape(1, 1, -1, 2)
return (
jnp.cos(
(2.0 * jnp.pi) *
# (M, M, K)
(
g_ij[:, :, :, 0] * u_k[:, :, :, 1] +
g_ij[:, :, :, 1] * u_k[:, :, :, 0]
)
) /
# (1, 1, K)
jnp.square(noise_map_real).reshape(1, 1, -1)
).sum(2) # sum over k
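A consistency check between the two versions (assuming they live in hypothetical modules `impl_numba` and `impl_jax`; the sizes here are made up and far smaller than real interferometer data):

import numpy as np
import impl_numba
import impl_jax

rng = np.random.default_rng(0)
K, M = 1_000, 64
noise_map_real = rng.uniform(0.5, 1.5, size=K)
uv_wavelengths = rng.normal(size=(K, 2))
grid_radians_slim = rng.normal(scale=1e-5, size=(M, 2))

w_numba = impl_numba.w_tilde_curvature_interferometer_from(
    noise_map_real, uv_wavelengths, grid_radians_slim
)
w_jax = impl_jax.w_tilde_curvature_interferometer_from(
    noise_map_real, uv_wavelengths, grid_radians_slim
)
np.testing.assert_allclose(w_numba, np.asarray(w_jax), rtol=1e-8, atol=1e-12)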
\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{1}{n_k^2} \cos(2\pi[(g_{i1} - g_{j1})u_{k0} + (g_{i0} - g_{j0})u_{k1}])\]
TWO_PI = 2.0 * np.pi  # module-level constant used below


@jax.jit
def w_tilde_curvature_interferometer_from(
noise_map_real: np.ndarray[tuple[int], np.float64],
uv_wavelengths: np.ndarray[tuple[int, int], np.float64],
grid_radians_slim: np.ndarray[tuple[int, int], np.float64],
) -> np.ndarray[tuple[int, int], np.float64]:
# A_mk, m<M, k<K
# assume M > K to put TWO_PI multiplication there
A = grid_radians_slim @ (TWO_PI * uv_wavelengths)[:, ::-1].T
noise_map_real_inv = jnp.reciprocal(noise_map_real)
C = jnp.cos(A) * noise_map_real_inv
S = jnp.sin(A) * noise_map_real_inv
return C @ C.T + S @ S.T
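The compact form works because of the angle-difference identity: with \(A_{mk} = 2\pi(g_{m0}u_{k1} + g_{m1}u_{k0})\) (the matrix `A` above), \(C_{mk} = \cos(A_{mk})/n_k\) and \(S_{mk} = \sin(A_{mk})/n_k\),

\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{\cos(A_{ik} - A_{jk})}{n_k^2} = \sum_{k=1}^K \left(\frac{\cos A_{ik}}{n_k}\frac{\cos A_{jk}}{n_k} + \frac{\sin A_{ik}}{n_k}\frac{\sin A_{jk}}{n_k}\right) = (CC^T + SS^T)_{ij}\]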
\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{1}{n_k^2} \cos(2\pi[(g_{i1} - g_{j1})u_{k0} + (g_{i0} - g_{j0})u_{k1}])\]
An \((M, M, K, 2)\) 64-bit array would be \(\sim 700\) PiB, while an \((M, M)\) 64-bit array would be only \(\sim 40\) GiB.
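The ratio follows directly from element counts: broadcasting materializes every \((i, j, k)\) term before the reduction over \(k\), so

\[\frac{\text{size}(M, M, K, 2)}{\text{size}(M, M)} = \frac{M^2 \cdot K \cdot 2 \cdot 8\,\text{B}}{M^2 \cdot 8\,\text{B}} = 2K,\]

roughly a factor of \(2 \times 10^7\) for the quoted sizes, which is why the reduction over \(k\) must stay inside the kernel rather than being materialized.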
\[\tilde{W}_{ij} = \sum_{k=1}^K \frac{1}{n_k^2} \cos(2\pi[(g_{i1} - g_{j1})u_{k0} + (g_{i0} - g_{j0})u_{k1}])\]
@numba.jit("f8[:, ::1](f8[::1], f8[:, ::1], f8[:, ::1])", nopython=True, nogil=True, parallel=True)
def w_tilde_curvature_interferometer_from(
noise_map_real: np.ndarray[tuple[int], np.float64],
uv_wavelengths: np.ndarray[tuple[int, int], np.float64],
grid_radians_slim: np.ndarray[tuple[int, int], np.float64],
) -> np.ndarray[tuple[int, int], np.float64]:
M = grid_radians_slim.shape[0]
K = uv_wavelengths.shape[0]
g_2pi = TWO_PI * grid_radians_slim
δg_2pi = g_2pi.reshape(-1, 1, 2) - g_2pi.reshape(1, -1, 2)
w = np.zeros((M, M))
for k in numba.prange(K):
w += np.cos(δg_2pi[:, :, 1] * uv_wavelengths[k, 0] + δg_2pi[:, :, 0] * uv_wavelengths[k, 1]) * np.reciprocal(
np.square(noise_map_real[k])
)
return w
@jax.jit
def w_tilde_curvature_interferometer_from(
noise_map_real: np.ndarray[tuple[int], np.float64],
uv_wavelengths: np.ndarray[tuple[int, int], np.float64],
grid_radians_slim: np.ndarray[tuple[int, int], np.float64],
) -> np.ndarray[tuple[int, int], np.float64]:
M = grid_radians_slim.shape[0]
g_2pi = TWO_PI * grid_radians_slim
δg_2pi = g_2pi.reshape(M, 1, 2) - g_2pi.reshape(1, M, 2)
δg_2pi_y = δg_2pi[:, :, 0]
δg_2pi_x = δg_2pi[:, :, 1]
def f_k(
noise_map_real: float,
uv_wavelengths: np.ndarray[tuple[int], np.float64],
) -> np.ndarray[tuple[int, int], np.float64]:
return jnp.cos(δg_2pi_x * uv_wavelengths[0] + δg_2pi_y * uv_wavelengths[1]) * jnp.reciprocal(
jnp.square(noise_map_real)
)
def f_scan(
sum_: np.ndarray[tuple[int, int], np.float64],
args: tuple[float, np.ndarray[tuple[int], np.float64]],
) -> tuple[np.ndarray[tuple[int, int], np.float64], None]:
noise_map_real, uv_wavelengths = args
return sum_ + f_k(noise_map_real, uv_wavelengths), None
res, _ = jax.lax.scan(
f_scan,
jnp.zeros((M, M)),
(
noise_map_real,
uv_wavelengths,
),
)
return res
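One caveat before the timings: JAX dispatches asynchronously, so a fair measurement has to block on the result. A sketch of the kind of harness assumed here (this is not the actual benchmark code):

import time

def time_jax(fn, *args, repeat: int = 10) -> float:
    fn(*args).block_until_ready()          # warm-up: trigger compilation
    timings = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args).block_until_ready()      # wait for the async dispatch to finish
        timings.append(time.perf_counter() - t0)
    return min(timings)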
Implementation | s | σ |
---|---|---|
jax_compact_expanded | 2.5535 (1.0) | 0.0032 |
jax_compact | 2.5543 (1.00) | 0.0050 |
numba_compact | 2.8768 (1.13) | 0.0012 |
numba_compact_expanded | 2.8967 (1.13) | 0.0004 |
original_preload | 11.0392 (4.32) | 0.0010 |
original_preload_expanded | 11.0686 (4.33) | 0.0005 |
jax | 3,368.6229 (>1000.0) | 1.6803 |
original | 3,561.0805 (>1000.0) | 0.2255 |
numba | 3,702.7006 (>1000.0) | 0.7385 |
Implementation | ms | σ |
---|---|---|
numba_compact | 2.5029 (1.0) | 0.0520 |
numba_compact_expanded | 3.7808 (1.51) | 0.0415 |
jax_compact_expanded | 58.6799 (23.44) | 9.1624 |
jax_compact | 61.5560 (24.59) | 6.8555 |
jax | 143.2451 (57.23) | 0.0749 |
original_preload | 840.0727 (335.63) | 0.1761 |
original_preload_expanded | 842.6588 (336.67) | 0.4933 |
numba | 1,794.1949 (716.83) | 14.9648 |
original | 69,304.2738 (>1000.0) | 13.5543 |
\[F = T^T \tilde{W} T\]
Implementation | ms | σ |
---|---|---|
numba_sparse | 8.0733 (1.0) | 0.0501 |
jax | 19.7302 (2.44) | 1.6986 |
jax_sparse | 25.0091 (3.10) | 0.1484 |
jax_BCOO | 48.5340 (6.01) | 0.1571 |
numba_compact_sparse | 49.8400 (6.17) | 0.0794 |
original_preload_direct | 99.2061 (12.29) | 0.3163 |
numba | 125.3019 (15.52) | 0.1143 |
original | 132.4863 (16.41) | 0.1376 |
numba_compact_sparse_direct | 139.9244 (17.33) | 0.1562 |
jax_compact_sparse_BCOO | 379.7214 (47.03) | 1.4144 |
jax_compact_sparse | 380.8865 (47.18) | 2.4322 |
Implementation | μs | σ |
---|---|---|
jax | 260.5957 (1.0) | 29.3714 |
jax_BCOO | 3,078.2068 (11.81) | 35.9463 |
jax_compact_sparse_BCOO | 3,207.3388 (12.31) | 107.0798 |
numba_sparse | 5,548.5175 (21.29) | 64.3711 |
jax_compact_sparse | 7,190.9015 (27.59) | 35.7355 |
numba | 18,187.5003 (69.79) | 5,603.6081 |
original | 18,279.9851 (70.15) | 6,052.1386 |
jax_sparse | 19,786.7200 (75.93) | 42.9344 |
numba_compact_sparse | 32,605.2243 (125.12) | 248.8764 |
numba_compact_sparse_direct | 1,362,329.9249 (>1000.0) | 1,366.9112 |
original_preload_direct | 25,218,633.7856 (>1000.0) | 8,722.4870 |
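The dense and BCOO variants benchmarked above boil down to a few lines. A sketch with assumed shapes (T is the (M, S) mapping matrix; this is not the PyAutoLens implementation):

import jax
import jax.numpy as jnp
from jax.experimental import sparse

@jax.jit
def curvature_dense(w_tilde, T):
    # F = T^T W~ T with everything dense
    return T.T @ w_tilde @ T

def curvature_bcoo(w_tilde, T):
    # Exploit the sparsity of the mapping matrix via Batched-COO;
    # convert outside of jit so the number of stored elements is concrete.
    T_sp = sparse.BCOO.fromdense(T)
    WT = (T_sp.T @ w_tilde).T    # (M, S) dense, using W~ = W~^T
    return T_sp.T @ WT           # (S, S)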
Using JAX for multithreading on the CPU is a rabbit hole. All links below are from GitHub issues. The lack of documentation reflects the lack of interest in multicore CPU parallelism from the primarily machine-learning community behind JAX (from Google) and XLA (from OpenXLA).
From *JAX running in CPU only mode only uses a single core*:

> This is largely working as intended at the moment. JAX doesn’t parallelize operations across CPU cores unless you use explicit parallelism constructs like pmap. Some JAX operations (e.g., BLAS or LAPACK operations) have their own internal parallelism.
In an HPC setting, you may want to use multiple hierarchies of parallelism on the CPU: SIMD + multi-threading (e.g. OpenMP) + multi-processing (e.g. MPI). In this case, you'd want to limit the number of CPU cores used for multi-threading, often set via `..._NUM_THREADS`. How to achieve this in JAX is very obscure: `XLA_FLAGS='--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1'` was the recommendation. It is now recommended to use `NPROC=1` instead to disable the multithreading used in Eigen. Notice the lack of a `JAX_NUM_THREADS` \(\Rightarrow\) `NPROC` might have side-effects.

export MKL_NUM_THREADS=${NUM_THREADS}
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=${NUM_THREADS}"
export MKL_DYNAMIC=FALSE
export OMP_NUM_THREADS=${NUM_THREADS}
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
export OMP_DYNAMIC=FALSE
export NUMEXPR_NUM_THREADS=${NUM_THREADS}
export OPENBLAS_NUM_THREADS=${NUM_THREADS}
export NUMBA_NUM_THREADS=${NUM_THREADS}
export NPROC=${NUM_THREADS}
export JAX_NUM_CPU_DEVICES=1
export TF_NUM_INTEROP_THREADS=1
export TF_NUM_INTRAOP_THREADS=${NUM_THREADS}
You may set `JAX_NUM_CPU_DEVICES=${NUM_THREADS}` instead, together with sharding, to shard your arrays across different CPU cores; i.e. OpenMP-like parallelism cannot be achieved. Also, `jax.device_put` requires that your array length is divisible by `JAX_NUM_CPU_DEVICES`.
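A sketch of that sharding route (the device count and array size are made up): expose several CPU "devices" in one process and place a sharded array across them. Sharded array operations then run in parallel, but OpenMP-style loop parallelism is still unavailable.

import os
os.environ["JAX_NUM_CPU_DEVICES"] = "8"   # must be set before importing jax

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices("cpu")), axis_names=("i",))
x = jnp.arange(1024.0)                    # length must be divisible by 8
x_sharded = jax.device_put(x, NamedSharding(mesh, P("i")))
print(x_sharded.sharding)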
- original
- numba
- jax