Reproducible Parallel Markov-Chain Monte Carlo

With Applications to Portfolio Optimization

Siddharth Nand

Motivation: Inference in Portfolio Optimization

You manage a portfolio of \(N\) assets.

Let \(\mathbf{r}_t = (r_{1,t}, \dots, r_{N,t})\) denote the vector of asset returns at time \(t\).

Using \(T\) periods of historical data, we observe: \[ \mathbf{Y} = \{\mathbf{r}_t\}_{t=1}^T \in \mathbb{R}^{T \times N} \]

From these observations, we infer the inputs used for portfolio construction: \[ \boldsymbol{\mu} \;=\; \mathbb{E}[\mathbf{r}_t], \qquad \boldsymbol{\Sigma} \;=\; \mathrm{Cov}(\mathbf{r}_t). \]

Core question: How uncertain are these inferred inputs?

Classical Mean–Variance Optimization

Standard portfolio optimization solves:

\[ \max_{\mathbf{w}} \quad \mathbf{w}^\top \widehat{\boldsymbol{\mu}} - \frac{\gamma}{2}\, \mathbf{w}^\top \widehat{\boldsymbol{\Sigma}}\,\mathbf{w} \]

(Equation from [1])

  • Uses point estimates \(\widehat{\boldsymbol{\mu}}, \widehat{\boldsymbol{\Sigma}}\) [2]
  • Treats parameters as known constants

Issue: Parameter uncertainty is ignored
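For the unconstrained objective above, the maximizer has a closed form, \(\mathbf{w}^\star = \frac{1}{\gamma}\widehat{\boldsymbol{\Sigma}}^{-1}\widehat{\boldsymbol{\mu}}\). A minimal sketch with hypothetical point estimates (the numbers below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical point estimates for N = 3 assets (illustrative values).
mu_hat = np.array([0.08, 0.05, 0.03])          # expected returns
Sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.16]])     # covariance matrix
gamma = 5.0                                    # risk-aversion parameter

# Unconstrained maximizer of w' mu - (gamma/2) w' Sigma w:
#   w* = (1/gamma) * Sigma^{-1} mu
w_star = np.linalg.solve(Sigma_hat, mu_hat) / gamma
print(w_star)
```

Note that \(\mathbf{w}^\star\) is a deterministic function of the point estimates: any estimation error in \(\widehat{\boldsymbol{\mu}}, \widehat{\boldsymbol{\Sigma}}\) flows straight into the weights, which is exactly the issue raised above.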

Bayesian View

Treat Parameters as Random Variables

  • Model parameters are uncertain and inferred from data
  • The goal is to characterize a joint distribution over parameters

\[ \mathbf{r}_t \sim p(\mathbf{r}_t \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad (\boldsymbol{\mu}, \boldsymbol{\Sigma}) \sim p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \]

\[ \boldsymbol{\Sigma} = \mathbf{D}\mathbf{C}\mathbf{D}, \qquad \mathbf{D} = \mathrm{diag}(\sigma_1, \dots, \sigma_N) \]

  • \(\mathbf{D}\): scales (volatilities)
  • \(\mathbf{C}\): correlation matrix

Inference state: \(x = (\boldsymbol{\mu}, \mathbf{D}, \mathbf{C})\)
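The decomposition \(\boldsymbol{\Sigma} = \mathbf{D}\mathbf{C}\mathbf{D}\) can be checked numerically; a small sketch with hypothetical volatilities and correlations:

```python
import numpy as np

# Hypothetical inputs for N = 2 assets (illustrative values).
sigma = np.array([0.2, 0.3])                 # per-asset volatilities
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])                   # correlation matrix

D = np.diag(sigma)                           # scale matrix D = diag(sigma_1, ..., sigma_N)
Sigma = D @ C @ D                            # covariance: Sigma = D C D

# Entry (i, j) is sigma_i * sigma_j * C_ij.
print(Sigma)  # [[0.04, 0.03], [0.03, 0.09]]
```

Parameterizing scales and correlations separately is convenient for inference, since priors can be placed on \(\sigma_i\) and \(\mathbf{C}\) independently.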

From Modeling to Computation

We seek the posterior distribution:

\[ \mathbb{P}(\boldsymbol{\mu}, \mathbf{D}, \mathbf{C} \mid \mathbf{Y}) \]

By Bayes’ theorem:

\[ \mathbb{P}(\boldsymbol{\mu}, \mathbf{D}, \mathbf{C} \mid \mathbf{Y}) = \frac{\mathbb{P}(\mathbf{Y} \mid \boldsymbol{\mu}, \mathbf{D}, \mathbf{C}) \mathbb{P}(\boldsymbol{\mu}, \mathbf{D}, \mathbf{C})}{\mathbb{P}(\mathbf{Y})} \]

The marginal likelihood requires a high-dimensional integral:

\[ \mathbb{P}(\mathbf{Y}) = \int \mathbb{P}(\mathbf{Y} \mid \boldsymbol{\mu}, \mathbf{D}, \mathbf{C}) \mathbb{P}(\boldsymbol{\mu}, \mathbf{D}, \mathbf{C}) \ d\boldsymbol{\mu} \ d\mathbf{D} \ d\mathbf{C} \]

No closed-form solution

Dimension grows quickly with \(N\)

(See [3]; [4])

Markov-Chain Monte Carlo (MCMC)

MCMC is a class of algorithms that generate samples from a posterior distribution.

Define the unnormalized target density:

\[ \gamma(x) \equiv \mathbb{P}(\mathbf{Y} \mid x) \mathbb{P}(x), \quad x = (\boldsymbol{\mu}, \mathbf{D}, \mathbf{C}) \]

The normalizing constant is:

\[ Z = \mathbb{P}(\mathbf{Y}) = \int \gamma(x) \ dx \]

Although \(Z\) is unknown, we can compute relative posterior densities:

\[ \frac{\mathbb{P}(x' \mid \mathbf{Y})}{\mathbb{P}(x \mid \mathbf{Y})} = \frac{\gamma(x')}{\gamma(x)} \]

This ratio quantifies how much more (or less) posterior density the model assigns to \(x'\) than to \(x\).

(See [3])
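In practice this ratio is computed in log space for numerical stability. A minimal sketch on a toy one-dimensional model (a Gaussian likelihood with a standard normal prior on \(\mu\); all specifics here are illustrative, not the portfolio model itself):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(loc=1.0, scale=1.0, size=50)   # toy 1-D "returns" data

def log_gamma(mu):
    """Unnormalized log-posterior log gamma(mu): Gaussian likelihood
    (unit variance) plus a standard normal prior. Z is never needed."""
    log_lik = -0.5 * np.sum((Y - mu) ** 2)
    log_prior = -0.5 * mu ** 2
    return log_lik + log_prior

# Relative posterior density of mu' = 1.0 versus mu = 0.0:
ratio = np.exp(log_gamma(1.0) - log_gamma(0.0))
print(ratio)
```

Only differences of log-densities enter the computation, so the intractable normalizer \(Z\) cancels exactly.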

MCMC algorithms generate a Markov chain with states: \[ x^{(1)}, x^{(2)}, \dots, x^{(K)} \]

which (once the chain has converged) are treated as samples from the target distribution \[ \mathbb{P}(x \mid \mathbf{Y}) \propto \gamma(x). \]

All MCMC algorithms:

  • differ in their randomized mechanisms for proposing new states
  • define a Markov chain whose stationary distribution matches the target posterior

Examples: Metropolis–Hastings, Gibbs, Hamiltonian Monte Carlo

(See [5]; [6])
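As a concrete instance of the bullets above, here is a minimal random-walk Metropolis–Hastings sampler [5, 6]: the proposal mechanism is a Gaussian step, and the accept/reject rule makes the target the stationary distribution. The standard-normal target is a toy stand-in for \(\mathbb{P}(x \mid \mathbf{Y})\):

```python
import numpy as np

def metropolis_hastings(log_gamma, x0, n_steps, step=1.0, seed=0):
    """Minimal random-walk Metropolis-Hastings sampler targeting the
    density proportional to exp(log_gamma(x))."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_prop = x + step * rng.normal()          # symmetric proposal
        # Accept with probability min(1, gamma(x') / gamma(x)).
        if np.log(rng.uniform()) < log_gamma(x_prop) - log_gamma(x):
            x = x_prop
        chain.append(x)
    return np.array(chain)

# Toy target: standard normal, log-density known only up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_steps=5000)
print(chain.mean(), chain.std())    # roughly 0 and 1, respectively
```

Note the sampler only ever evaluates differences of \(\log\gamma\), matching the ratio argument on the previous slide.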

Why MCMC is Expensive in Practice

  • High-dimensional parameter space. Even a small portfolio with \(N=5\) assets has:

    • 5 expected returns
    • 5 volatilities
    • 10 correlations → 20 parameters in total
  • Complex posterior geometry

    • Multiple modes (e.g., volatility regimes)

Result: Single chains make local moves, can get stuck, and yield unreliable uncertainty estimates. (See [7]; [3])
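The parameter count above follows from the state \(x = (\boldsymbol{\mu}, \mathbf{D}, \mathbf{C})\): \(N\) means, \(N\) volatilities, and \(N(N-1)/2\) free correlations:

```python
def n_parameters(N):
    """Dimension of x = (mu, D, C): N means, N volatilities,
    and N*(N-1)/2 free correlations."""
    return N + N + N * (N - 1) // 2

print(n_parameters(5))    # 20, matching the N = 5 example above
print(n_parameters(50))   # 1325: the dimension grows quadratically in N
```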

Parallel MCMC: the Industry Standard

In practice, we run multiple chains in parallel:

\[ \{\, x_1^{(1:K)}, \dots, x_M^{(1:K)} \,\} \]

(See [7]; [8])

Parallelism provides:

  • Faster computation
  • Better exploration via multiple initializations
  • Diagnostics (e.g., convergence checks)

However, chains remain independent.

Parallel chains help, but independence limits exploration of complex posteriors.
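A minimal sketch of this workflow, assuming a toy standard-normal target: \(M = 4\) independent random-walk Metropolis chains distinguished only by their seeds, followed by a crude between/within-variance convergence check in the spirit of the \(\widehat{R}\) diagnostic of [7] (this is a simplified version, not the full split-\(\widehat{R}\)):

```python
import numpy as np

def run_chain(seed, n_steps=2000, step=1.0):
    """One random-walk Metropolis chain targeting a standard normal."""
    rng = np.random.default_rng(seed)
    x, chain = 0.0, []
    for _ in range(n_steps):
        x_prop = x + step * rng.normal()
        # Accept with probability min(1, exp((x^2 - x_prop^2) / 2)).
        if np.log(rng.uniform()) < 0.5 * (x**2 - x_prop**2):
            x = x_prop
        chain.append(x)
    return np.array(chain)

# M = 4 independent chains, distinguished only by their seeds.
chains = np.stack([run_chain(seed) for seed in range(4)])
n = chains.shape[1]

# Crude between/within-chain diagnostic in the spirit of R-hat:
W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
R_hat = np.sqrt(((n - 1) / n * W + B / n) / W)
print(R_hat)    # values near 1.0 indicate agreement across chains
```

Because the chains never communicate, each one must rediscover every mode on its own, which motivates the interacting schemes on the next slide.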

Interacting MCMC

Interacting MCMC allows chains to share information during sampling.

Examples include:

  • Information exchange about explored regions
  • Proposals based on other chains’ states
  • Heterogeneous stepping strategies

(See [9]; [10]; [8])

Benefits:

  • Reduced mode trapping
  • Improved exploration
  • More stable parameter estimates

The Hidden Problem

Parallel Execution with Interaction

Interacting MCMC requires coordination during execution.

However:

  • Synchronization depends on execution order
  • Scheduling varies across hardware and thread counts
  • Identical runs may follow different computational paths

Why this Matters for Finance

Portfolio and risk applications require:

  • Reproducibility for auditability
  • Deterministic reruns for stress testing
  • Exact replication across environments

Two runs with the same data and seed should not produce different inferences.

Research Problem

Question: How can we run interacting, parallel MCMC algorithms such that:

  1. Results are deterministic
  2. Runs are exactly reproducible
  3. Statistical guarantees remain unchanged

Key Challenge

  • Chains exchange information during inference
  • Parallel execution introduces non-deterministic scheduling
  • Standard RNGs are stateful

Stateful RNGs depend on:

  • Number of previous draws
  • Execution order

In parallel MCMC, execution order varies, leading to different random draws and different results.

Stateful Random Number Generation

A stateful RNG produces a deterministic sequence given a seed.

  • States: \(s_0, s_1, s_2, \dots\)
  • State update: \(s_{t+1} = h(s_t)\)
  • Output: \(u_t = g(s_t)\)

Example sequence:

\[ \text{seed} \to s_0 \to (u_0, s_1) \to (u_1, s_2) \to \dots \]

(See [11]; [12])
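The practical consequence for parallel MCMC can be seen directly: when two chains share one stateful RNG, each call advances \(s_t\), so the numbers a chain receives depend on how the scheduler interleaves the chains. A small sketch:

```python
import numpy as np

# Two "chains" sharing one stateful RNG: every draw advances the state,
# so the numbers a chain receives depend on the interleaving order.
def interleaved_draws(order, seed=42):
    rng = np.random.default_rng(seed)          # state s_0 from the seed
    draws = {"A": [], "B": []}
    for chain in order:
        draws[chain].append(rng.uniform())     # u_t = g(s_t); state advances
    return draws

run1 = interleaved_draws(["A", "B", "A", "B"])
run2 = interleaved_draws(["B", "A", "B", "A"])

# Same seed, same number of draws per chain, different schedule:
print(run1["A"] == run2["A"])   # False: chain A saw different numbers
```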

Solution

Stateless Random Number Generation

Generate randomness as a pure function of algorithmic context:

\[ u = f_{\text{RNG}}(\text{seed}, \text{chain ID}, \text{iteration}, \text{variable index}) \]

Random numbers depend only on their role in the algorithm, not on execution history.

Result: Bitwise-identical output across machines, thread counts, and execution orders.

Implementation

Goal: Generate random numbers as a deterministic, one-to-one function of structured inputs.

Each draw is keyed by:

\[ (\text{seed}, \text{chain ID}, \text{iteration}, \text{variable index}) \]

Each random draw is uniquely determined by its algorithmic role. (See [13])

Example

Chain ID   Iteration   Variable Index   Key
1          10          5                (seed, 1, 10, 5)
2          10          5                (seed, 2, 10, 5)
1          11          6                (seed, 1, 11, 6)
2          11          6                (seed, 2, 11, 6)
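The keys in the table above can be enumerated programmatically; a quick check (using a hypothetical seed value) confirms that every (chain, iteration, variable) combination yields a distinct key, independent of the order in which draws happen:

```python
# Enumerate the (seed, chain, iteration, variable) keys from the table
# above. `seed = 2024` is a hypothetical global seed for illustration.
seed = 2024
keys = [(seed, chain, it, var)
        for chain in (1, 2)
        for (it, var) in [(10, 5), (11, 6)]]

print(keys)
assert len(set(keys)) == len(keys)   # every draw has a unique key
```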

Mixing Functions

Integer Mixing Permutations

  • Deterministic mappings over fixed-width integers
  • Strong avalanche behavior
  • Bijective over the supported key space

\[ u = f_{\text{RNG}}(h) = \text{mix}(h) \]

Example: SplitMix64 (see [14])
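A sketch of such a mixing permutation: the SplitMix64 finalizer (constants from the public-domain reference implementation [14]), written here in Python with explicit 64-bit masking. Each of its three steps (xor-shift, odd multiplication mod \(2^{64}\)) is a bijection, so the composition is too:

```python
MASK64 = (1 << 64) - 1

def splitmix64_mix(z):
    """SplitMix64 finalizer: a bijective mixing permutation on 64-bit
    integers with strong avalanche behavior."""
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9 & MASK64
    z = (z ^ (z >> 27)) * 0x94D049BB133111EB & MASK64
    return (z ^ (z >> 31)) & MASK64

def to_uniform(z):
    """Map a mixed 64-bit integer key to a uniform draw in [0, 1)."""
    return splitmix64_mix(z) / 2**64

# Avalanche: nearby keys map to very different outputs.
print(to_uniform(1), to_uniform(2))
```

Bijectivity matters here: distinct keys are guaranteed to produce distinct mixed integers, so no two draws in the algorithm can silently collide.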

Full Implementation

  1. Key: for chain \(m\), iteration \(t\), index \(j\) \[ k_{m,t,j} = (\text{seed}, m, t, j) \]

  2. Encode to 64-bit integer: \(z_{m,t,j} = \text{encode}(k_{m,t,j})\)

  3. Stateless RNG (integer mixing): \[ u_{m,t,j} = f_{\text{RNG}}(z_{m,t,j}) = \text{mix}_{64}(z_{m,t,j}), \qquad u_t^{(m)} = (u_{m,t,1}, \dots, u_{m,t,N_t^{(m)}}) \]

  4. MCMC update: \[ x_{t+1}^{(m)} = \Phi_m\!\big(x_t^{(m)},\, X_{-m}^{(t)},\, u_t^{(m)}\big) \]

    where:

    • \(\Phi_m\): update rule for chain \(m\) (algorithm-specific, deterministic)
    • \(X_{-m}^{(t)}\): states of all other chains at iteration \(t\) (interacting MCMC)
    • \(u_t^{(m)}\): vector of random numbers used at iteration \(t\)

(See [13] for details.)
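Steps 1–4 can be sketched end to end. This is a minimal illustration, not the method of [13]: the bit-packing encoding is one assumed injective layout, the proposal uses a Box–Muller transform of keyed uniforms, and the interaction term \(X_{-m}^{(t)}\) is omitted for brevity. The point it demonstrates is order-independence: updating the chains in any order gives bitwise-identical results.

```python
import math

MASK64 = (1 << 64) - 1

def mix64(z):
    # SplitMix64-style finalizer (bijective, strong avalanche).
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9 & MASK64
    z = (z ^ (z >> 27)) * 0x94D049BB133111EB & MASK64
    return (z ^ (z >> 31)) & MASK64

def u(seed, m, t, j):
    """Stateless uniform draw keyed by (seed, chain, iteration, index).
    The fields are packed into disjoint bit ranges -- an assumed layout;
    any injective encoding works."""
    z = (seed << 40) ^ (m << 30) ^ (t << 10) ^ j
    return mix64(z & MASK64) / 2**64

def mh_step(x, seed, m, t, log_gamma, step=1.0):
    """Deterministic MH update Phi_m: all randomness comes from keyed draws."""
    # Gaussian proposal via Box-Muller on two keyed uniforms.
    u1, u2 = u(seed, m, t, 0), u(seed, m, t, 1)
    z = math.sqrt(-2 * math.log(1 - u1)) * math.cos(2 * math.pi * u2)
    x_prop = x + step * z
    accept = math.log(u(seed, m, t, 2) + 1e-300) < log_gamma(x_prop) - log_gamma(x)
    return x_prop if accept else x

log_gamma = lambda x: -0.5 * x * x   # toy standard-normal target

def run(order, seed=7, K=100):
    """Run two chains for K iterations, updating them in `order`."""
    xs = {m: 0.0 for m in (1, 2)}
    for t in range(K):
        for m in order:
            xs[m] = mh_step(xs[m], seed, m, t, log_gamma)
    return xs

print(run(order=(1, 2)) == run(order=(2, 1)))   # True: order-independent
```

Because every draw is a pure function of its key, the schedule in which chains execute has no effect on the numbers they consume.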

Limitations and Future Work

1. Explicit randomness required (no hidden RNG)

  • All randomness must be: explicitly generated, indexed, and under your control.
  • Doesn’t work for:
    • Implicit randomness (e.g., library calls that use internal RNGs)
    • External processes (e.g., calls to black-box simulators)

2. Deterministic communication only (no randomness in control flow)

  • Applies only when inter-chain communication is deterministic
  • Examples of non-deterministic communication:
    • Random subsampling of chains
      • e.g. randomly selecting which chains exchange states at each iteration
    • Stochastic aggregation
      • e.g. combining chain statistics using random weights or Monte Carlo estimates

References

[1]
H. Markowitz, “Portfolio selection,” The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.
[2]
J. Jobson and B. Korkie, “Estimation for Markowitz efficient portfolios,” Journal of the American Statistical Association, vol. 75, no. 371, pp. 544–554, 1980.
[3]
R. M. Neal, “Probabilistic inference using Markov chain Monte Carlo methods,” Technical Report, Dept. of Computer Science, University of Toronto, 1993.
[4]
C. P. Robert and G. Casella, Monte Carlo statistical methods, 2nd ed. Springer, 2004.
[5]
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953.
[6]
W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
[7]
A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian data analysis, 3rd ed. Boca Raton, FL, USA: CRC Press, 2013.
[8]
S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Handbook of Markov chain Monte Carlo. CRC Press, 2011.
[9]
C. J. Geyer, “Markov chain Monte Carlo maximum likelihood,” Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, pp. 156–163, 1991.
[10]
J. S. Liu, Monte Carlo strategies in scientific computing. Springer, 2001.
[11]
D. E. Knuth, The art of computer programming, vol. 2: Seminumerical algorithms. Addison-Wesley, 1997.
[12]
M. Matsumoto and T. Nishimura, “Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator,” ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 8, no. 1, pp. 3–30, 1998.
[13]
S. Syed, S. Nand, G. Deligiannidis, and A. Doucet, “Non-reversible parallel tempering: An embarrassingly parallel MCMC scheme,” Journal of the Royal Statistical Society Series B, vol. 84, no. 2, pp. 321–350, 2024, Available: https://sidnand.github.io/assets/pdfs/publications/non_reversible_pt.pdf
[14]
G. L. Steele, “SplitMix64: Fast pseudo-random number generator.” 2014.