The Julia CMB emulator: here comes Capse!

One of the recent hot topics in Cosmology is the development of emulators: surrogate models meant to replace a computationally expensive function with a much cheaper approximation. Focusing on CMB emulators, two of the most recent ones are ClassNet[1] and Cosmopower[2]. Following different strategies and approaches, these two frameworks are able to compute the CMB angular power spectrum in a fraction of the time required by an Einstein-Boltzmann solver; in particular, Cosmopower gives a speedup of a factor of 1,000x, computing the CMB power spectrum in a few milliseconds.

So, the interested reader might ask us a simple question: why are you working on yet another CMB emulator? Can't you just use what is already out there?

I think that, although the codes developed by the community have incredible performance and versatility, we can do better, reducing the amount of resources required to train the emulators and reaching a higher computational performance. Furthermore, I want to leverage one of the emulators' main characteristics: their differentiability. This opens up the possibility of using gradient-based algorithms for Bayesian inference and minimization.

In order to reach that goal, together with Federico Bianchini[3] and Jaime Ruiz-Zapatero[4], I have developed Capse.jl, a CMB Angular Power Spectrum emulator written using the Julia language. The paper has been formally accepted for publication in the Open Journal of Astrophysics and a revised version of the paper just hit the arXiv.

So, what are the main characteristics of Capse.jl and why should you consider giving it a try?

  • ⚡ Capse.jl is blazingly fast. It builds on SimpleChains.jl[5], a Julia neural network library designed with a focused scope: developing fast neural networks that run on the CPU[6].

  • ⚗️ In order to minimize the amount of resources used, Capse.jl performs a physics-based preprocessing of the output features[7].

  • 🔋 Capse.jl is cheap to train: as a consequence of the previous two points, it can be trained in around one hour using a standard CPU.

  • 🎯 Capse.jl can be coupled with gradient-based algorithms in order to perform Bayesian analyses and/or minimizations.

Furthermore, Capse.jl is written, ça va sans dire, in pure Julia. This means that it can be easily deployed on different kinds of machines (installing JAX and TensorFlow on Mac devices can be troublesome), even on Android smartphones. Yes, I have used Capse.jl on my Android smartphone and it was incredibly performant there as well.

The notebooks needed to reproduce all the results of this paper can be found in this repository.

Let us now give a quick summary of the main points of the Capse.jl emulator.

Installation & Usage

Installing Capse.jl is quite easy. After moving to the package environment, you can enter the Pkg REPL by pressing ] from the Julia REPL. To get back to the Julia REPL, press Ctrl+C or backspace (when the REPL cursor is at the beginning of the input).

Upon entering the Pkg REPL, you should see the following prompt:

(@v1.10) pkg>

Now you can install Capse.jl

(@v1.10) pkg> add Capse

Now load the following packages

using NPZ
using Capse

Now you have to define the NN that we are passing to Capse.jl. There are two possibilities:

  1. You can define a SimpleChains.jl emulator

using SimpleChains
mlpd = SimpleChain(
  static(6),
  TurboDense(tanh, 64),
  TurboDense(tanh, 64),
  TurboDense(tanh, 64),
  TurboDense(tanh, 64),
  TurboDense(tanh, 64),
  TurboDense(identity, 40)
)

  2. You can use a Dict and the built-in method instantiate_NN

NN_dict = Dict("n_input_features" => 6,
               "n_output_features" => 4999,
               "n_hidden_layers" => 5,
               "layers" => Dict("layer_1" => Dict("activation_function" => "tanh", "n_neurons" => 64),
                                "layer_2" => Dict("activation_function" => "tanh", "n_neurons" => 64),
                                "layer_3" => Dict("activation_function" => "tanh", "n_neurons" => 64),
                                "layer_4" => Dict("activation_function" => "tanh", "n_neurons" => 64),
                                "layer_5" => Dict("activation_function" => "tanh", "n_neurons" => 64)))

mlpd = Capse.instantiate_NN(NN_dict)

The former is more direct, but the latter has the advantage that, since dictionaries can be saved as JSON files, it is more portable (this is what we are currently doing in the official repo for the Capse.jl release paper).

Now we have to load the emulator, which requires three elements:

  • the ℓ-grid the emulator has been trained on

  • the trained weights

  • the minimum and maximum ranges of the input and output features, used to perform the min-max normalization

ℓ = npzread("l.npy")

weights_TT = npzread("weights_TT_lcdm.npy")
trained_emu_TT = Capse.SimpleChainsEmulator(Architecture = mlpd, Weights = weights_TT)
CℓTT_emu = Capse.CℓEmulator(TrainedEmulator = trained_emu_TT, ℓgrid = ℓ,
                             InMinMax = npzread("inMinMax_lcdm.npy"),
                             OutMinMax = npzread("outMinMaxCℓTT_lcdm.npy"))

With a newly released version of Capse.jl, we have added a new loading method. In order for it to work, the required files (the same mentioned above) need to be placed in a folder. Then, you just have to type

CℓTT_emu = Capse.load_emulator("/path/to/emulator/")

We can now use the loaded emulator to compute some Cℓ's!

Warning ⚠: Up to now, Capse.jl expects the input parameters in a hardcoded order: ln(10¹⁰ As), ns, H0, ωb, ωc and τ. In a future release we are going to add a more flexible way to use it!
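For illustration, an input vector respecting that order might look like this (the numbers below are toy ΛCDM-like values I picked for the sketch, not values from the paper):

```julia
# Hardcoded parameter order: ln(10¹⁰ As), ns, H0, ωb, ωc, τ
# (toy values, for illustration only)
input_test = [3.05, 0.96, 67.0, 0.0224, 0.12, 0.055]
```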
Capse.get_Cℓ(input_test, CℓTT_emu)

That's it! The mean execution time of Capse.jl is 45 μs! This is almost 3 orders of magnitude faster than Cosmopower!

Precision tests: residual curves & chains

Capse.jl is the fastest CMB emulator available. But is it precise? This is something we need to check because, as Mr. Pirelli said...

Pirelli

Let us start by plotting the distribution of the ratio between the emulation error and the expected variance from the forthcoming CMB-S4 observatory, for a validation dataset with 20,000 cosmologies covering the entire training space.

Capse error distribution

This means that the emulation error is 10 times smaller than the expected measurement variance on almost all scales[8] for 99% of the validation points!
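To make the error-to-variance metric concrete, here is a minimal sketch of how such a ratio can be computed. The paper uses S4-like error bars; as a stand-in, I use the idealized cosmic-variance limit σ(Cℓ) = sqrt(2/(2ℓ+1)) Cℓ, together with a toy spectrum and a toy emulation error (none of these are Capse.jl internals):

```julia
ℓ = 2:2500
Cℓ_true = [1e3 / l^1.5 for l in ℓ]                                      # toy "true" spectrum
Cℓ_emu  = [c * (1 + 1e-4 * sin(l / 50)) for (l, c) in zip(ℓ, Cℓ_true)]  # toy emulator output
σℓ = [sqrt(2 / (2l + 1)) * c for (l, c) in zip(ℓ, Cℓ_true)]             # cosmic-variance error bar
ratio = abs.(Cℓ_emu .- Cℓ_true) ./ σℓ                                    # error-to-variance ratio
println(maximum(ratio))   # well below 0.1 in this toy setup
```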

These are not the only tests we performed in order to assess the accuracy of our emulators. We analyzed the Planck and ACT DR4 datasets using CAMB and Cobaya; then we used Turing.jl to write a likelihood based on Capse.jl. Although this differs from what is usually done in Cosmology, the following code snippet should highlight how to write such a likelihood

using Turing

@model function CMB_planck(data, covariance)
    # priors on the model parameters
    ln10As ~ Uniform(0.25, 0.35)
    ns     ~ Uniform(0.88, 1.06)
    h0     ~ Uniform(0.60, 0.80)
    ωb     ~ Uniform(0.1985, 0.25)
    ωc     ~ Uniform(0.08, 0.20)
    τ      ~ Normal(0.0506, 0.0086)
    yₚ     ~ Normal(1.0, 0.0025)

    θ = [10*ln10As, ns, 100*h0, ωb/10, ωc, τ]
    # compute the theoretical prediction
    pred = theory_planck(θ) ./ (yₚ^2)
    # compute the likelihood
    data ~ MvNormal(pred, covariance)

    return nothing
end
Here comes one of Julia's strengths: the high interoperability of its ecosystem. Given our Turing.jl likelihood, we can use different analysis algorithms:

  • NUTS, the No-U-Turn Sampler. This is the state-of-the-art gradient-based sampler

  • MicroCanonical Hamiltonian Monte Carlo (MCHMC), a recently developed Hamiltonian Monte Carlo sampler

  • Pathfinder, a variational inference algorithm

The Julia implementation of these algorithms is interfaced with Turing.jl, so we can easily test these algorithms with our likelihoods! What is the result of this comparison?

Contour Planck

The chains are basically the same, with differences of about 0.1σ for the MCMC methods, and 0.2σ for Pathfinder. But what about the computational performance?

  • CAMB analyzed Planck with 80 CPUhours, with an Effective Sample Size per second (ESS/s) of 0.004

  • Capse.jl + NUTS employed 1 CPUhour, with an ESS/s of 1.6

  • Capse.jl + MCHMC required less than 1 CPUhour, with an ESS/s of 3.7

  • Pathfinder computes the posterior in ~15 seconds

Furthermore, with the updated version of our paper, we have added a Julia version of the SPT-3G likelihood[9], which we checked against the original likelihood (as described in Balkenhol et al. 2023). The main difference with respect to the aforementioned Planck likelihood is that this likelihood has 33 nuisance parameters, which we keep free while running our chains. Again, what is the result of this comparison?

Contour SPT

Also in this case the results are excellent, with differences in the marginalized parameters smaller than 0.1σ. Regarding the computational performance, these are our findings:

  • CAMB analysis required 1,600 CPUhours, with an Effective Sample Size per second (ESS/s) of 0.0003

  • Capse.jl + NUTS employed 14 CPUhours, with an ESS/s of 0.00

  • Capse.jl + MCHMC required 12 CPUhours, with an ESS/s of 0.335

Although these results are impressive, there is still room for improvement: here comes the Chebyshev-based emulator!

Chebyshev emulator

The main idea is quite simple: rather than emulating all the Cℓ's on a grid with several thousand elements, let us decompose the power spectrum in the Chebyshev polynomial basis:

C_{\ell}(\theta) \approx \sum_{n=0}^{N_{\max}} a_n(\theta) T_n(\ell)

The coefficients of the expansion carry all the cosmological dependence. We found out that, in order to accurately emulate the Planck data, we just need 48 coefficients: a reduction of the number of output features of the neural network by a factor of 50!
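To make the idea concrete, here is a minimal sketch (not Capse.jl's actual code) of compressing a smooth toy spectrum onto the first 48 Chebyshev polynomials, after mapping the ℓ-grid to [-1, 1] and fitting the coefficients by least squares:

```julia
using LinearAlgebra

chebT(n, x) = cos(n * acos(x))      # Chebyshev polynomial Tₙ evaluated at x ∈ [-1, 1]

ℓgrid = 2:2500
ℓmin, ℓmax = first(ℓgrid), last(ℓgrid)
x = [2 * (l - ℓmin) / (ℓmax - ℓmin) - 1 for l in ℓgrid]   # map ℓ to [-1, 1]

# A smooth toy "spectrum" standing in for Cℓ(θ)
Cℓ = [1e3 * exp(-(xi + 0.3)^2 / 0.1) + 50 * xi^2 + 100 for xi in x]

Nmax = 48
T = [chebT(n, xi) for xi in x, n in 0:Nmax-1]   # basis matrix (nℓ × Nmax)
a = T \ Cℓ                                      # least-squares coefficients aₙ
Cℓ_rec = T * a                                  # reconstructed spectrum

println(maximum(abs.(Cℓ_rec .- Cℓ) ./ Cℓ))      # tiny reconstruction error
```

For a smooth spectrum like this one, the Chebyshev coefficients decay extremely fast, which is why so few of them are enough.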

But this is not the best thing we can do. If the operations we need to perform in order to compute the log-likelihood (such as the Cℓ's binning, etc.) are linear in the theoretical spectra C(θ), then we can represent them as a linear operator 𝕃, and if this operator does not depend on the cosmological parameters, we can do the following:

\mathbb{L} \sum_n a_n(\theta) T_n = \sum_n a_n(\theta) \mathbb{L} T_n \equiv \sum_n a_n(\theta) \hat{T}_n

We can compute the action of the linear operator on the Chebyshev polynomials Tₙ just once and store it! After this, we just need a matrix-vector product to compute the binned theory vector!

This enables a dramatic speedup: the efficiency of our likelihoods is improved by a factor of 8, at no cost, since the binned theory vector computed with this method coincides, up to floating-point precision, with the old one!
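The trick can be sketched as follows, with illustrative sizes and a simple block-averaging binning operator standing in for the real 𝕃 (again, none of these names are Capse.jl internals):

```julia
using LinearAlgebra

nℓ, Nmax, nbins, width = 2000, 48, 40, 50

T = randn(nℓ, Nmax)        # stand-in for the Chebyshev basis sampled on the ℓ-grid
a = randn(Nmax)            # stand-in for the cosmology-dependent coefficients aₙ(θ)

# 𝕃: a simple binning operator averaging blocks of `width` multipoles
L = zeros(nbins, nℓ)
for i in 1:nbins
    L[i, (i-1)*width+1 : i*width] .= 1 / width
end

T̂ = L * T                  # precompute 𝕃Tₙ once: it is cosmology-independent

binned_fast = T̂ * a        # per-likelihood-call cost: one small matrix-vector product
binned_slow = L * (T * a)   # the naive route, for comparison

println(maximum(abs.(binned_fast .- binned_slow)))   # agreement to floating-point precision
```

The small matrix T̂ has size nbins × Nmax (40 × 48 here), so each likelihood call no longer touches the full ℓ-grid at all.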

Ok, and now?

Although we are quite proud of the results obtained, we are already working on improvements to our framework. First and foremost, we want to perform a rescaling based on the τ value. This is more subtle than it may seem, as most spectra are proportional to e^(−2τ), but the low-ℓ part of the EE spectrum scales as τ². We are working to implement a rescaling that can improve the performance of our networks.

Furthermore, we are working on improving the Chebyshev approach. Although we successfully applied this approach to the Planck data, we want to push it to the same scales as Cosmopower, which reaches ℓmax = 10,000. Preliminary results show that we can actually reach this level, while even increasing the precision of our emulators!

Please feel free to use the comments tab here, or to drop me, or any of my colleagues, an email with questions or comments.

[1] CosmicNet II: Emulating extended cosmologies with efficient and accurate neural networks (2022)
[2] COSMOPOWER: emulating cosmological power spectra for accelerated Bayesian inference from next-generation surveys
[3] The man with the most beautiful pair of moustaches on the East Coast. Here you can find his website.
[4] I love discussing non-physics-related topics with Jaime. The problem is that he has a way deeper knowledge of philosophy than me, and every single time I have to admit (at least to myself) that he is right. Here you can find his website.
[5] Here you can find a link to SimpleChains.jl repository.
[6] We have opened a branch where we are adding support for the Lux.jl library, in order to have models running on the GPU as well.
[7] Although we have already reached a nice performance, we want to improve the preprocessing in a future work.
[8] The only exception is the EE 2-pt correlation function for ℓ < 10. We are working to improve the precision of Capse.jl on these scales as well.
[9] While we were finalizing our revised paper, a new work appeared on the arXiv (Balkenhol et al. 2024), which implemented candl, a JAX-based likelihood which is fully auto-differentiable and hence can use the same tools we have been using in this paper. It's a nice paper, you should check it out!