Stan Algorithms: Where to Start?

Originally published here: https://bayesianops.medium.com/stan-algorithms-where-to-start-e6103c778d12

We’re starting the work that’s involved with Reimplementing the Inference Algorithms in Stan. Before we can get to any reimplementation, we need to understand what’s going on. The first steps are to identify the entry point into the code, instantiate the existing tests, and take inventory of what’s there.

I have Michael Feathers’ book “Working Effectively with Legacy Code” open while I’m working on this. I feel like we’re at “Chapter 16. I Don’t Understand the Code Well Enough to Change It.” The goal of this post is to get us collectively past that point.

For some additional background, please see a series of posts written by Juho Timonen about Understanding the Stan Codebase: Part 1: Finding an entry point and Part 2: Samplers. The technical focus of this post is somewhere between the two.

If you’d like to run code rather than only inspect the code, jump down to the section called “Running the Tests” to get set up.

Note: the link to Michael Feathers’ “Working Effectively with Legacy Code” is an Amazon Affiliate link. I might make a few dollars off it if enough people click through. I’ve personally bought and handed out multiple copies of the book over the years.

The Starting Point: A Unit Test

We’ll start with the unit test for the default MCMC algorithm used in Stan. The default MCMC algorithm in Stan is:

template <typename Model> int hmc_nuts_diag_e_adapt(...)

located at src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp. This function is responsible for running MCMC for a model that’s provided, with the arguments that control the MCMC algorithm including number of iterations, and output the iterations. The output of the function is handled through callbacks, so it’s all through side-effects. The configuration of “which model?” is not part of the responsibility of this function; this function is handed a valid model.

The unit test is located at src/test/unit/services/sample/hmc_nuts_diag_e_adapt_test.cpp and this can be run from a terminal from the Stan working directory:

> ./runTests.py src/test/unit/services/sample/hmc_nuts_unit_e_adapt_test.cpp
Unit test output for hmc_nuts_unit_e_adapt_test.

Breaking Down the Unit Test

Even with a familiarity with C++, if you’re not familiar with Google Test, the unit test code may look pretty foreign. I’ll try my best to demystify how everything connects within the test listed above.

Note: there is no int main() function in this test file. There is a separate C++ header file that is provided by Google Test that contains the int main() function that is linked in when compiling this test file. The Stan make process handles the compiling and linking and generates the executable test/unit/services/sample/hmc_nuts_unit_e_test.

1. Include Statements

The include statements are L1–6 in the test file. Line 4 includes the header file generated by the Stan compiler: #include <test/test-models/good/optimization/rosenbrock.hpp>.

Stan’s build process (using make) generates the C++ header file from the Stan program,rosenbrock.stan, automatically. For understanding this test, the model choice is not important. We just need to know that this header file is where the model is declared and defined. (What exactly is generated will be covered in future posts.)

2. Google Test: Test Fixture

Lines 8–17 define a test fixture. This provides variables and an initial state prior to each of the 4 tests.

class ServicesSampleHmcNutsDiagEAdapt : public testing::Test {
public:
ServicesSampleHmcNutsDiagEAdapt() : model(context, 0, &model_log) {}

std::stringstream model_log;
stan::test::unit::instrumented_logger logger;
stan::test::unit::instrumented_writer init, parameter, diagnostic;
stan::io::empty_var_context context;
stan_model model;
};

Most of it is straightforward and uses default constructors to create objects. stan::test::unit::instrumented_logger and instrumented_writer can be found in src/test/unit/services/instrumented_callbacks.hpp and provide a way to inspect the outputs programatically.

The variable stan_model model is the fully instantiated Stan program and its construction happens with the construction of the test fixture itself: model(context, 0, &model_log). The constructor is defined in the generated C++ and the arguments are 1) container for data, 2) random seed, 3) output messages from initialization. The Stan program defined in rosenbrock.stan contains no data; the empty_var_context is an object that derives from the right class and contains no data.

We can dig into the C++ interfaces more, but with the current information, we can get into the test itself. We just have to know that the model variable has the instantiated Stan program that we want to run MCMC on.

3. The First Test: check that it instantiates properly

Lines 19–55 contain the first test called call_count. In Google Test, we write assertions to verify behaviors. This test checks that output callbacks are called and the correct number of times.

Algorithm Parameters
Lines 20–37 define parameters that need to be passed to the function to control the MCMC algorithm:

unsigned int random_seed = 0;
unsigned int chain = 1;
double init_radius = 0;
int num_warmup = 200;
int num_samples = 400;
int num_thin = 5;
bool save_warmup = true;
int refresh = 0;
double stepsize = 0.1;
double stepsize_jitter = 0;
int max_depth = 8;
double delta = .1;
double gamma = .1;
double kappa = .1;
double t0 = .1;
unsigned int init_buffer = 50;
unsigned int term_buffer = 50;
unsigned int window = 100;

First Assertion
Lines 38–39 is our first assertion.

stan::test::unit::instrumented_interrupt interrupt;
EXPECT_EQ(interrupt.call_count(), 0);

Technically, we don’t need to include this test, but we include it just to make sure we start at 0. Every time the functor interrupt() is called, there is an internal counter that increments. The state can be retrieved using the call_count() function.

Call the Algorithm!!!
Line 41 calls the MCMC function, hmc_nuts_diag_e_adapt(...)!

int return_code = stan::services::sample::hmc_nuts_diag_e_adapt(model, context, random_seed, chain, init_radius, num_warmup, num_samples, num_thin, save_warmup, refresh, stepsize, stepsize_jitter, max_depth, delta, gamma, kappa, t0, init_buffer, term_buffer, window, interrupt, logger, init, parameter, diagnostic);

This line is where all the algorithmic work happens. We have provided the function with a model, arguments to control the algorithm, and a handful of callback objects to get the output back. This is the entry point!

Check Results
The rest of the test, lines 47–54, check the results of running the function.

EXPECT_EQ(0, return_code);

int num_output_lines = (num_warmup + num_samples) / num_thin;
EXPECT_EQ(num_warmup + num_samples, interrupt.call_count());
EXPECT_EQ(1, parameter.call_count("vector_string"));
EXPECT_EQ(num_output_lines, parameter.call_count("vector_double"));
EXPECT_EQ(1, diagnostic.call_count("vector_string"));
EXPECT_EQ(num_output_lines, diagnostic.call_count("vector_double"));

I won’t get into the details of these assertions right now; we can dig into them later. For right now, it’s important that we’ve gotten far enough to:

  1. set up the model
  2. run the algorithm code
  3. check the output

That’s plenty for now.

4. Tests 2–4

Tests 2 (parameter_checks), 3 (output_sizes) and 4 (output_regression) are structured identically to the first test!

We Found the Entry Point!

If you got this far, we found out how to instantiate the algorithm code. There’s a lot more we need to dig into going forward, but this is gets us to the starting point.

Running the Tests

Before getting into the depths of C++ interfaces, let’s start with getting tests instantiated and running.

Clone the Repo

git clone https://github.com/syclik/stan-algorithms.git

Change directories into the stan-algorithrms folder.

Update Git Submodules and Download the Stan Compiler

Option 1: Use the script in the repo

Note: this only works for Linux and Mac. If you’re on Windows, follow Option 2 below. (If enough people reach out, I’ll update the script to be cross-platform.)

./setup.sh

This does two steps, described below. If you’ve run this, skip down to the “Run the Tests” subsection.

Option 2: Manually update git submodules and download the Stan compiler

Step 1: update git submodules.

git submodule update --init --recursive

Step 2: download the v2.18.1 version of the Stan compiler

On Mac:

mkdir -p stan/bin
curl -L "https://github.com/stan-dev/stanc3/releases/download/v2.28.1/mac-stanc" -o stan/bin/stanc

On Linux:

mkdir -p stan/bin
curl -L "https://github.com/stan-dev/stanc3/releases/download/v2.28.1/linux-stanc" -o stan/bin/stanc

On Windows:

Download https://github.com/stan-dev/stanc3/releases/download/v2.28.1/windows-stanc and put in stan/bin/

Run the Tests

./test.sh

This will compile and run all the relevant algorithm tests. In this post, we looked at one of the many tests that are run using this script.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s