Bayesian Inference for Cosmology with JAX

Wassim Kabalan

Alexandre Boucaud, François Lanusse

2025-05-01

Outline for This Presentation

  • Understand Cosmological Inference: Learn how we go from observations to cosmological parameters.

  • From χ² to Bayesian Inference: See how Bayesian modeling generalizes classical approaches.

  • Learn Forward Modeling and Hierarchical Models: Understand generative models and field-level inference.

  • Explore Modern Tools (JAX, NumPyro, BlackJAX): Use practical libraries for scalable inference.

  • Prepare for Hands-On Notebooks: Apply Bayesian techniques in real examples using JAX.

Background: Inference in Cosmology: The Big Picture

Inference in Cosmology: The Frequentist Pipeline


  • Start from cosmological parameters (Ω): matter density, dark energy, etc.
  • Predict observables: CMB, galaxies, lensing
  • Extract summary statistics: \(P(k)\), \(C_\ell\), 2PCF
  • Compute likelihood: \(L(\Omega \mid \text{data})\)
  • Estimate \(\hat{\Omega}\) via maximization (\(\chi^2\) fitting)

Frequentist Toolbox

  • Optimizers/Gradient descent
  • 2-point correlation function (2PCF)
  • Power spectrum fitting: \(P(k)\), \(C_\ell\)
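
As a toy illustration of this pipeline (a minimal sketch; the linear "observable" model, noise level, and step size are made up for illustration, not any real survey likelihood), the frequentist recipe boils down to minimizing \(\chi^2(\Omega)\) with a gradient-based optimizer:

import jax
import jax.numpy as jnp

# Toy "observable": a model prediction m(Ω) on a grid of scales k
def predict(omega):
    k = jnp.linspace(0.1, 1.0, 10)
    return omega * k**2

# Synthetic data generated at Ω = 0.3 with small Gaussian noise
data = predict(0.3) + 0.01 * jax.random.normal(jax.random.PRNGKey(0), (10,))

# χ²(Ω) with an identity covariance for simplicity
def chi2(omega):
    residual = data - predict(omega)
    return jnp.sum(residual**2)

# Plain gradient descent towards the best-fit estimate
omega = 0.5
for _ in range(100):
    omega = omega - 0.1 * jax.grad(chi2)(omega)

After a few dozen steps, omega settles near the true value 0.3.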

Inference in Cosmology: The Bayesian Pipeline


  • Start from summary statistics: \(P(k)\), \(C_\ell\), 2PCF
  • Sample from a prior \(P(\Omega)\)
  • Compute likelihood: \(L(\text{Obs} \mid \Omega)\)
  • Sample from the posterior \(P(\Omega \mid \text{Obs})\)

Bayesian Toolbox

  • Priors encode beliefs: \(P(\Omega)\)
  • Hierarchical Bayesian Modeling (HBM)
  • Probabilistic programming (e.g., NumPyro)
  • Gradient-based samplers: HMC, NUTS

Inference in Cosmology: The Bayesian Pipeline


  • Prior: Theory-driven assumptions \(P(\Omega)\)
  • Latent variables: Hidden/unobserved \(z \sim P(z \mid \Omega)\)
  • Likelihood: Generates observables \(P(\text{Obs} \mid \Omega, z)\)
  • Posterior: infer \(P(\Omega \mid \text{Obs})\)
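
Schematically, this pipeline is nothing more than a generative program (a toy sketch; the one-dimensional prior, latent, and noise choices here are arbitrary, not a real cosmological model):

import jax
import jax.numpy as jnp

def generate(key):
    k1, k2, k3 = jax.random.split(key, 3)
    # Prior: theory-driven assumptions, Ω ~ P(Ω)
    omega = 0.1 + 0.4 * jax.random.uniform(k1)
    # Latent variables: hidden structure, z ~ P(z | Ω)
    z = jnp.sqrt(omega) * jax.random.normal(k2, (10,))
    # Likelihood: noisy observables, Obs ~ P(Obs | Ω, z)
    obs = z**2 + 0.05 * jax.random.normal(k3, (10,))
    return omega, z, obs

omega, z, obs = generate(jax.random.PRNGKey(0))

Inference then runs this program "in reverse": given obs, recover \(P(\Omega \mid \text{Obs})\).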

Inference in Cosmology: The Bayesian Pipeline


Bayes’ Rule with all components:

Full decomposition of the posterior. The denominator marginalizes over all possible parameters.

\[ \underbrace{P(\Omega \mid \text{Obs})}_{\text{Posterior}} = \frac{ \underbrace{P(\text{Obs} \mid \Omega)}_{\text{Likelihood}} \cdot \underbrace{P(\Omega)}_{\text{Prior}} }{ \underbrace{ \int P(\text{Obs} \mid \Omega) P(\Omega) \, d\Omega }_{\text{Evidence}} } \]

\[ \underbrace{P(\Omega \mid \text{Obs})}_{\text{Posterior}} = \frac{ \underbrace{\int P(\text{Obs} \mid \Omega, z)\, P(z \mid \Omega)\, dz}_{\text{Likelihood (marginalized over latent $z$)}} \cdot \underbrace{P(\Omega)}_{\text{Prior}} }{ \underbrace{ \int \left[ \int P(\text{Obs} \mid \Omega, z)\, P(z \mid \Omega)\, dz \right] P(\Omega)\, d\Omega }_{\text{Evidence}} } \]

In practice, we drop the evidence term when sampling — it’s a constant.

\[ P(\Omega \mid \text{Obs}) \propto \underbrace{\int P(\text{Obs} \mid \Omega, z)\, P(z \mid \Omega) \, dz}_{\text{Marginal Likelihood}} \cdot \underbrace{P(\Omega)}_{\text{Prior}} \]

\[ \log P(\Omega \mid \text{Obs}) = \log P(\text{Obs} \mid \Omega) + \log P(\Omega) + \text{const} \]
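
In code, the unnormalized log-posterior is just the sum of these two terms, and it is exactly the quantity every sampler below needs (a schematic sketch; the Gaussian prior and likelihood are placeholders, not a cosmological model):

import jax.numpy as jnp
from jax.scipy import stats

def log_prior(omega):
    return stats.norm.logpdf(omega, 0.3, 0.1)            # toy prior: Ω ~ N(0.3, 0.1²)

def log_likelihood(obs, omega):
    return jnp.sum(stats.norm.logpdf(obs, omega, 1.0))   # toy likelihood: Obs_i ~ N(Ω, 1)

def log_posterior(omega, obs):
    # The evidence is constant in Ω, so it is simply dropped here
    return log_likelihood(obs, omega) + log_prior(omega)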

Bayes’ Rule in Practice

  • The posterior combines theory (prior) and data (likelihood) to infer cosmological parameters.

  • Latent variables \(z\) encode hidden structure (e.g., initial fields, nuisance parameters).

  • The evidence is often ignored during sampling (it’s constant).

  • Model comparison via the Bayes Factor:

    \[ \text{Bayes Factor} = \frac{P(\text{Obs} \mid \mathcal{M}_1)}{P(\text{Obs} \mid \mathcal{M}_2)} \]
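
    As a hypothetical numeric example: if \(P(\text{Obs} \mid \mathcal{M}_1) = 10^{-3}\) and \(P(\text{Obs} \mid \mathcal{M}_2) = 10^{-4}\), the Bayes factor is 10, which counts as strong evidence for \(\mathcal{M}_1\) on the usual Jeffreys scale.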

Two Roads to Inference: Frequentist and Bayesian

Conceptual Differences

| Concept | Frequentist | Bayesian |
|---|---|---|
| Parameters | Fixed but unknown | Random variables with a prior |
| Goal | Point estimate (e.g., MLE) | Full distribution (posterior over parameters) |
| Uncertainty | From data variability | From parameter uncertainty (posterior) |
| Prior knowledge | Not used | Explicitly included via prior \(P(\Omega)\) |
| Interval meaning | Confidence interval: "95% of experiments contain the truth" | Credible interval: "95% chance the truth is in this range" |
| Likelihood role | Central in \(\chi^2\) minimization and fits | Combined with prior to form the posterior |
| Inference output | Best-fit estimate + error bars | Posterior distribution |
| Tooling | Optimization (e.g., \(\chi^2\), maximum likelihood) | Sampling (e.g., MCMC, HMC, NUTS) |

Although these approaches are often contrasted, they’re not mutually exclusive. Modern workflows — like causal inference in Statistical Rethinking — draw on both perspectives. Bayesian methods offer a formal way to combine theory and data, especially powerful when simulations are involved.

Statistical Rethinking


🛠️ The Mechanics of Inference

Sampling the Posterior: The Core Loop

The Sampling Loop:

  • Start from a sample \((\Omega^t, z^t)\)
  • Propose new sample \((\Omega', z')\)
  • Compute acceptance probability
  • Accept or reject proposal
  • Repeat and store accepted samples ⟶ posterior

Goal: Explore the full shape of the posterior
(even in high-dim, non-Gaussian spaces)

Key Takeaways

  • Most samplers follow this accept/reject loop
  • Differ by how they propose samples:
    – Random walk (e.g., MH)
    – Gradient-guided (e.g., HMC, NUTS)
  • Some skip rejection (e.g., Langevin, VI)

Sampling Algorithms at a Glance

Metropolis-Hastings (MCMC)

  • Propose: Random walk \(\Omega' \sim \mathcal{N}(\Omega^t, \sigma^2)\)

  • Accept:

    \[ \alpha = \min\left(1, \frac{P(\text{Obs} \mid \Omega') P(\Omega')}{P(\text{Obs} \mid \Omega^t) P(\Omega^t)}\right) \]
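
The full accept/reject loop from the previous slide fits in a few lines (a minimal sketch for a one-dimensional toy posterior, sampling Ω only, with latent variables omitted):

import jax
import jax.numpy as jnp

def log_post(omega):
    return -0.5 * (omega - 0.3)**2 / 0.1**2   # toy Gaussian log-posterior

def mh_step(omega, key, sigma=0.05):
    key_prop, key_acc = jax.random.split(key)
    proposal = omega + sigma * jax.random.normal(key_prop)   # random-walk proposal
    log_alpha = log_post(proposal) - log_post(omega)         # log acceptance ratio
    accept = jnp.log(jax.random.uniform(key_acc)) < log_alpha
    new = jnp.where(accept, proposal, omega)
    return new, new   # carry the chain state and record the sample

keys = jax.random.split(jax.random.PRNGKey(0), 5000)
_, samples = jax.lax.scan(mh_step, jnp.array(0.0), keys)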

Hamiltonian Monte Carlo (HMC)

  • Propose: Simulate a physical trajectory using gradients \(\nabla_\Omega \log P(\text{Obs} \mid \Omega)\)
  • Accept: Based on Hamiltonian energy conservation: \(\alpha = \min(1, e^{\mathcal{H}(\Omega^t, p^t) - \mathcal{H}(\Omega', p')})\)

NUTS (No-U-Turn Sampler)

Same as HMC, but auto-tunes:

  • Step size
  • Trajectory length (stops before looping back)
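
Under the hood, the "physics" proposal is a leapfrog integration of Hamilton's equations for \(\mathcal{H}(\Omega, p) = -\log P(\Omega \mid \text{Obs}) + \tfrac{1}{2}\|p\|^2\) (a bare-bones sketch with a unit mass matrix; the final MH accept step is omitted):

import jax

def leapfrog(omega, p, log_prob, step_size=0.1, num_steps=10):
    grad = jax.grad(log_prob)                 # ∇ log P, i.e. minus the potential gradient
    p = p + 0.5 * step_size * grad(omega)     # initial half step for the momentum
    for _ in range(num_steps - 1):
        omega = omega + step_size * p         # full step for the position
        p = p + step_size * grad(omega)       # full step for the momentum
    omega = omega + step_size * p
    p = p + 0.5 * step_size * grad(omega)     # final half step for the momentum
    return omega, p

NUTS wraps exactly this integrator, growing the trajectory until it starts to double back on itself.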



Gradient-Based Sampling in Action

[Animations: HMC vs. random-walk MCMC sampling a Gaussian posterior and a banana-shaped posterior]

  • In high dimensions, random walk proposals (MCMC) often land in low-probability regions ⟶ low acceptance.
  • To maintain acceptance, step size must shrink like \(1/\sqrt{d}\) ⟶ very slow exploration.
  • HMC uses gradients to follow high-probability paths ⟶ better samples, fewer steps.

Sampling Without Gradients

Sampling With Gradients

Differentiable Inference with JAX

When it comes to gradients, always think of JAX.


An easy, Pythonic API

import jax
import jax.numpy as jnp
from jax import random

def sample_prior(key):
    return random.normal(key, shape=(3,))  # Ω ~ N(0, 1)

def log_prob(omega):
    return -0.5 * jnp.sum(omega**2)  # log p(Ω) = -½‖Ω‖² + const

log_prob_jit = jax.jit(log_prob)

Easily accessible gradients using grad

omega = sample_prior(random.PRNGKey(1))   # a sampled Ω
gradient = jax.grad(log_prob_jit)(omega)  # ∇_Ω log p(Ω)

Supports vectorization using vmap

def generate_sample(seed):
    key = jax.random.PRNGKey(seed)
    return sample_prior(key)

seeds = jnp.arange(0, 1000)
omegas = jax.vmap(generate_sample)(seeds)

Practical Bayesian Modeling & Inference with JAX

A Recipe for Bayesian Inference

1. Probabilistic Programming Language (PPL): NumPyro

import numpyro
import numpyro.distributions as dist

def model():
    omega_m = numpyro.sample("Ωₘ", dist.Uniform(0.1, 0.5))
    sigma8 = numpyro.sample("σ₈", dist.Normal(0.8, 0.1))

2. Computing Likelihoods: jax-cosmo

import jax_cosmo as jc
def likelihood(cosmo_params):
    mu, cov = jc.angular_cl.gaussian_cl_covariance_and_mean(
        cosmo_params, ell, probes
    )
    return jc.likelihood.gaussian_log_likelihood(data, mu, cov)

3. Sampling the Posterior: NumPyro & BlackJAX

from numpyro.infer import MCMC, NUTS

kernel = NUTS(model)
mcmc = MCMC(kernel, num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0))
samples = mcmc.get_samples()

4. Visualizing the Posterior: ArviZ

import arviz as az
samples = mcmc.get_samples()
az.plot_pair(samples, marginals=True)

Credit: Zeghal et al. (arXiv:2409.17975)

A Minimal Bayesian Linear Model

Define a simple linear model:

true_w = 2.0
true_b = -1.0
num_points = 100

rng_key = jax.random.PRNGKey(0)
x_data = jnp.linspace(-3, 3, num_points)
noise = jax.random.normal(rng_key, shape=(num_points,)) * 0.3
y_data = true_w * x_data + true_b + noise

def linear_regression(x, y=None):
    w = numpyro.sample("w", dist.Normal(1., 2.))
    b = numpyro.sample("b", dist.Normal(0., 2.))
    sigma = numpyro.sample("sigma", dist.Exponential(1.0))

    mean = w * x + b
    numpyro.sample("obs", dist.Normal(mean, sigma), obs=y)

Run the model using NUTS:

kernel = numpyro.infer.NUTS(linear_regression)
mcmc = numpyro.infer.MCMC(kernel, num_warmup=500, num_samples=1000)
mcmc.run(rng_key, x=x_data, y=y_data)

Posterior corner plot using arviz + corner

import corner
import matplotlib.pyplot as plt

idata = az.from_numpyro(mcmc)
posterior_array = az.extract(idata, var_names=["w", "b", "sigma"]).to_array().values.T

fig = corner.corner(
    posterior_array,
    labels=["w", "b", "σ"],
    truths=[true_w, true_b, None],
    show_titles=True
)
plt.show()

Numpyro: Tips & Tricks for Bayesian Modeling

numpyro.handlers.seed: Fix randomness for reproducibility

from numpyro.handlers import seed
seeded_model = seed(model, rng_key)

numpyro.handlers.trace: Inspect internal execution and sample sites

from numpyro.handlers import trace
tr = trace(seeded_model).get_trace()  # tracing requires a seeded (or conditioned) model
print(tr["omega"]["value"])

numpyro.handlers.condition: Clamp a variable to observed or fixed value

from numpyro.handlers import condition
conditioned_model = condition(model, data={"omega": 0.3})

numpyro.handlers.substitute: Replace variables with fixed values (e.g., MAP estimates)

from numpyro.handlers import substitute
subbed_model = substitute(model, data={"omega": 0.3})

numpyro.handlers.reparam: Reparameterize a site to improve geometry

from numpyro.infer.reparam import LocScaleReparam
from numpyro.handlers import reparam

reparam_model = reparam(model, config={"z": LocScaleReparam()})
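
To see several handlers working together on one concrete toy model (a hypothetical model with a single site named "omega", matching the snippets above):

import jax
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import seed, trace, condition

def model():
    omega = numpyro.sample("omega", dist.Normal(0.3, 0.1))
    numpyro.sample("obs", dist.Normal(omega, 0.05))

# Fix the randomness, then inspect the execution trace
tr = trace(seed(model, jax.random.PRNGKey(0))).get_trace()
print(tr["omega"]["value"])

# Clamp "omega" before tracing: its value is now fixed at 0.3
fixed = condition(model, data={"omega": 0.3})
tr2 = trace(seed(fixed, jax.random.PRNGKey(1))).get_trace()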

Using BlackJAX and NumPyro

BlackJAX is NOT a PPL, so you need to combine it with a PPL like NumPyro or PyMC.

Initialize model and extract the log-probability function

from numpyro.infer.util import initialize_model

rng_key, init_key = jax.random.split(rng_key)
init_params, potential_fn, *_ = initialize_model(
    init_key, linear_regression, model_args=(x_data,), model_kwargs={"y": y_data},
    dynamic_args=True,
)

logdensity_fn = lambda position: -potential_fn(x_data, y=y_data)(position)
initial_position = init_params.z

Run warm-up to adapt step size and mass matrix using BlackJAX’s window adaptation

import blackjax

num_warmup = 2000
adapt = blackjax.window_adaptation(blackjax.nuts, logdensity_fn, target_acceptance_rate=0.8)
rng_key, warmup_key = jax.random.split(rng_key)
(last_state, parameters), _ = adapt.run(warmup_key, initial_position, num_warmup)
kernel = blackjax.nuts(logdensity_fn, **parameters).step

Run BlackJAX NUTS sampling using lax.scan

def run_blackjax_sampling(rng_key, state, kernel, num_samples=1000):
    def one_step(state, key):
        state, info = kernel(key, state)
        return state, state  # carry the new state and also record it as output

    keys = jax.random.split(rng_key, num_samples)
    _, samples = jax.lax.scan(one_step, state, keys)
    return samples

samples = run_blackjax_sampling(rng_key, last_state, kernel)

Convert BlackJAX output to ArviZ InferenceData

# ArviZ expects arrays shaped (chain, draw, ...); add a leading chain axis
idata = az.from_dict(posterior={k: v[None] for k, v in samples.position.items()})

Sampler Comparison Table

| Sampler | Library | Uses Gradient | Auto-Tuning | Rejection | Best For | Notes |
|---|---|---|---|---|---|---|
| MCMC (SA) | NumPyro | ❌ | ✅ | ✅ | Simple low-dim models | No gradients; slow mixing |
| HMC | NumPyro / BlackJAX | ✅ | ❌ | ✅ | High-dim continuous posteriors | Needs tuned step size & trajectory |
| NUTS | NumPyro / BlackJAX | ✅ | ✅ | ✅ | General-purpose inference | Adaptive HMC |
| MALA | BlackJAX | ✅ | ❌ | ✅ | Local proposals w/ gradients | Stochastic gradient steps |
| MCLMC | BlackJAX | ✅ | ✅ (via L) | ❌ | Large latent spaces | Unadjusted Langevin dynamics |
| Adj. MCLMC | BlackJAX | ✅ | Manual (L) | ✅ | Bias-controlled Langevin sampler | Includes MH step |

For more information, see Simons et al. (2025), §2.2.3, arXiv:2504.20130.

Examples: Bayesian Inference for Cosmology

Power Spectrum Inference with jax-cosmo

Step 1: Simulate Cosmological Data

Define a fiducial cosmology to generate synthetic observations

fiducial_cosmo = jc.Planck15()
ell = jnp.logspace(1, 3)  # Multipole range for power spectrum

Set up two redshift bins for galaxy populations

nz1 = jc.redshift.smail_nz(1., 2., 1.)
nz2 = jc.redshift.smail_nz(1., 2., 0.5)
nzs = [nz1, nz2]

Define observational probes: weak lensing and number counts

probes = [
    jc.probes.WeakLensing(nzs, sigma_e=0.26),
    jc.probes.NumberCounts(nzs, jc.bias.constant_linear_bias(1.))
]

Generate synthetic data using the fiducial cosmology

mu, cov = jc.angular_cl.gaussian_cl_covariance_and_mean(fiducial_cosmo, ell, probes)
rng_key = jax.random.PRNGKey(0)
noise = jax.random.multivariate_normal(rng_key, jnp.zeros_like(mu), cov)
data = mu + noise  # Fake observations

Step 2: Define the NumPyro Model

# Define a NumPyro probabilistic model to infer cosmological parameters
def model(data):
    Omega_c = numpyro.sample("Omega_c", dist.Uniform(0.1, 0.5))
    sigma8 = numpyro.sample("sigma8", dist.Uniform(0.6, 1.0))
    
    # Forward model: compute theoretical prediction given parameters
    cosmo = jc.Planck15(Omega_c=Omega_c, sigma8=sigma8)
    mu, cov = jc.angular_cl.gaussian_cl_covariance_and_mean(cosmo, ell, probes)
    
    # Likelihood: multivariate Gaussian over angular power spectra
    numpyro.sample("obs", dist.MultivariateNormal(mu, cov), obs=data)

Full Field Inference with Forward Models

Bayesian Inference using power spectrum data:

Bayesian Inference using full field data:

  • Recap: Bayesian inference maps theory + data → posterior
  • Cosmological Forward models (a toy sketch follows this list)
    • Start from cosmological + latent parameters
    • Sample initial conditions
    • Evolve using N-body simulations
    • Predict convergence maps in tomographic bins
  • Simulation-Based Inference
    • Compare predictions to real survey maps
    • Build a likelihood from the forward model
    • Infer cosmological parameters from full field data
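
A toy NumPyro version of such a hierarchical forward model could look like this (a deliberately simplified sketch: a 1-D latent "initial field" z, and a fixed smoothing operator standing in for the N-body simulation):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

N = 64  # size of the toy one-dimensional "field"

def field_level_model(obs=None):
    # Cosmological parameter Ω and latent initial conditions z ~ P(z | Ω)
    omega = numpyro.sample("omega", dist.Uniform(0.1, 0.5))
    z = numpyro.sample("z", dist.Normal(jnp.zeros(N), jnp.sqrt(omega)))
    # Toy "simulator": smoothing as a stand-in for gravitational evolution
    field = jnp.convolve(z, jnp.ones(5) / 5.0, mode="same")
    # Field-level likelihood: compare the predicted map pixel by pixel
    numpyro.sample("obs", dist.Normal(field, 0.1), obs=obs)

Because every step is differentiable, NUTS can sample Ω and all N latent pixels jointly.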

Full Field vs. Summary Statistics

  • Preserves non-Gaussian structure lost in summaries
  • Enables tighter constraints in nonlinear regimes
  • Especially useful in high-dimensional inference problems
  • See: Zeghal et al. (2024), Leclercq et al. (2021)
  • 🔜 A talk on this topic is coming this Thursday

Conclusion

Conclusion: Why Bayesian Inference?




Key Takeaways

  • Bayesian modeling enables flexible, end-to-end inference pipelines — from analytical likelihoods to full forward simulations.

  • The JAX ecosystem (NumPyro, BlackJAX, jax-cosmo…) lets you focus on modeling, not low-level math.

  • Gradients + differentiable simulators make inference scalable — even in complex, high-dimensional models.

  • These tools are now mature, fast, and usable — and already applied to realistic cosmological settings.

Future Work

  • Distributed, differentiable N-body simulations enable full-field inference at survey scale.

  • We look forward to applying these models to real survey data in upcoming projects.



Thank you for your attention!

Hands-On Notebooks





Hands-On Notebooks:

  • Beginner Bayesian inference with NumPyro & BlackJAX: here
  • Intermediate Bayesian inference with NumPyro & BlackJAX: here
  • Some of the animations were made using this notebook