Release of CmdStan 2.29

We are very happy to announce that the 2.29.0 release of CmdStan is now available on GitHub! As usual, the release of CmdStan is accompanied by new releases of Stan Math, core Stan and Stanc3. A PyStan update will follow shortly.

This new release brings a new differential-algebraic equation solver, new functions, function overloading, Stan-to-C++ compiler optimization levels and new optimizations, automatic promotion of arrays of scalars, new deprecations, improved error messages, and an improved auto-formatter and canonicalizer.

Install instructions are provided at the end of the post.

Contributors

We would like to thank everyone who contributed to this release with their bug reports, feature or bug fix discussions, code or code reviews!

In total, we had code and docs contributions from 22 developers and users in this release cycle, including 2 first-time contributors. Thank you to each and every one of you for your contributions!

Sponsors and donors

We would like to express a big THANK YOU to all of our sponsors and donors who have been supporting us during this last release cycle and in years past.

You can support Stan via NumFOCUS or the GitHub sponsorship program.

Release highlights

Differential-Algebraic Equation solver

Stan now supports solving systems of differential-algebraic equations (DAEs). DAEs extend ordinary differential equations (ODEs): the system may contain algebraic equations that constrain the state variables and their derivatives, and the state derivatives need not be given explicitly as the right-hand side of an ODE. Instead, the relationship between the state variables and their derivatives is expressed implicitly through a residual function.

DAEs are solved using two new higher-order functions, dae() and dae_tol(). Their interface is similar to that of the ODE solvers and, like the ODE solvers, both support variadic signatures. For more details, see the Stan documentation.
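To make the idea of a residual function concrete, here is a minimal sketch in plain Python (not Stan code). It uses a small chemical kinetics system with two differential equations and one algebraic constraint. The function name, the rate constants and the argument order (time, state, state derivative, extra arguments) are assumptions chosen for illustration.

```python
# Sketch of a DAE residual function F(t, y, y') = 0 (illustrative; not Stan code).
# Toy system: two differential equations plus one algebraic constraint
# requiring the three concentrations to sum to 1.

def residual(t, y, yp, p1, p2, p3):
    """Return the residual vector; a solution makes every entry zero."""
    return [
        yp[0] + p1 * y[0] - p2 * y[1] * y[2],                    # differential
        yp[1] - p1 * y[0] + p2 * y[1] * y[2] + p3 * y[1] * y[1],  # differential
        y[0] + y[1] + y[2] - 1.0,                                 # algebraic
    ]

# Hypothetical rate constants and a consistent initial condition:
# both the state and its derivative must satisfy the residual at t = 0.
p1, p2, p3 = 0.04, 1.0e4, 3.0e7
y0 = [1.0, 0.0, 0.0]
yp0 = [-0.04, 0.04, 0.0]

res0 = residual(0.0, y0, yp0, p1, p2, p3)
```

Note that the third state has no derivative term of its own in the residual; it is pinned down only by the algebraic constraint, which is what distinguishes this system from an ODE.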

New functions and function signatures:

  • Von Mises CDF functions: von_mises_cdf, von_mises_lcdf and von_mises_lccdf.
  • Log-logistic distribution: loglogistic_lpdf, loglogistic_log, loglogistic_rng and loglogistic_cdf.
  • Inverse of the complementary error function: inv_erfc.
  • [RNG function for the Bernoulli-logit generalized linear model](https://mc-stan.org/docs/2_29/functions-reference/bernoulli-logit-glm.html#stan-functions-2): bernoulli_logit_glm_rng
  • Additional ordered_probit_lpmf signatures:
    • ordered_probit_lpmf(array[] int, real, vector) => real
    • ordered_probit_lpmf(array[] int, real, array[] vector) => real
  • Additional normal_id_glm signatures:
    • normal_id_glm_lpdf(real, matrix, real, vector, real) => real
    • normal_id_glm_lpdf(real, matrix, vector, vector, real) => real
    • normal_id_glm_lpdf(vector, row_vector, real, vector, real) => real
    • normal_id_glm_lpdf(vector, row_vector, vector, vector, real) => real
  • Additional signatures for lchoose, now matching those supported by the deprecated binomial_coefficient_log.

Function overloading

User-defined functions can now be overloaded and can overload core Stan functions. Multiple definitions of the same function name are allowed if the arguments are different in each definition.

Example of an overloaded function:

functions {
   real foo(row_vector p, real a) {
       return sum(p) + a;
   }
   real foo(vector p) {
       return sum(p) + 1;
   }
}

An example of an overloaded core Stan function is

functions {
   array[] row_vector transpose(array[] vector a) {
       // ...
   }
}

See the new section on overloading functions in the Stan User’s Guide for more.

Stan compiler optimization levels

The Stan-to-C++ compiler has had an experimental feature that optimizes the model before compiling it to C++. With the 2.29 release, the optimizations are split into three levels: --O0, --O1 and --Oexperimental. --O0 disables optimization and is currently the default. --O1 applies optimizations that are simple, do not dramatically change the program, and are unlikely to noticeably slow down compile times. These optimizations include dead code elimination, copy and constant propagation, automatic-differentiation level optimization, and detection of opportunities to represent parameter vectors and matrices as structs-of-arrays (see below).

Finally, with --Oexperimental the Stan compiler will use all available optimizations, some of which are not thoroughly tested.

Here is an example of a simple but powerful optimization enabled by the --O1 flag. The call

target += bernoulli_logit_lpmf(y | mx * beta);

will automatically be replaced with a call to the bernoulli-logit GLM function:

target += bernoulli_logit_glm_lpmf(y, mx, 0, beta);

This is a much more performant way of writing the exact same model.
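The equivalence of the two calls can be checked numerically. The following plain-Python sketch uses stand-in functions that borrow the Stan names purely for readability; they are not the Stan Math implementations, and the data are made up. It shows only that both forms yield the same log density, not the performance difference.

```python
import math

def log_inv_logit(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bernoulli_logit_lpmf(y, eta):
    """Sum of Bernoulli log-probabilities given one log-odds value per observation."""
    return sum(log_inv_logit(e if yi == 1 else -e) for yi, e in zip(y, eta))

def bernoulli_logit_glm_lpmf(y, x, alpha, beta):
    """Same density, but the linear predictor alpha + x * beta is formed inside."""
    eta = [alpha + sum(xij * bj for xij, bj in zip(row, beta)) for row in x]
    return bernoulli_logit_lpmf(y, eta)

# Made-up data: 4 observations, 2 predictors.
y = [1, 0, 1, 1]
mx = [[0.5, -1.0], [1.5, 0.3], [-0.7, 2.0], [0.0, 0.0]]
beta = [0.8, -0.4]

# Plain form: the caller computes mx * beta first.
eta = [sum(xij * bj for xij, bj in zip(row, beta)) for row in mx]
lp_plain = bernoulli_logit_lpmf(y, eta)

# GLM form: intercept 0, matrix and coefficients passed separately.
lp_glm = bernoulli_logit_glm_lpmf(y, mx, 0.0, beta)
```

Because the GLM form receives the matrix and the coefficients separately, an implementation is free to fuse the matrix product with the density and gradient computation, which is where the speed-up of the real function comes from.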

A web demo of the optimization is available here. There you can see the changes to the intermediate representation of your Stan model caused by the optimizations.

With CmdStan, you can enable these optimizations by adding

STANCFLAGS += --O1

to the make/local file. With CmdStanR and CmdStanPy you can use the stanc_options argument:

# cmdstanr
mod <- cmdstan_model("model.stan", stanc_options = list("O1"))

# cmdstanpy
mod = CmdStanModel(stan_file="model.stan", stanc_options={"O1": True})

New optimization to better utilize vectorization and memory throughput

By default, Stan uses a so-called array-of-structs approach to representing vectors or matrices of parameters, meaning that the value and adjoint of each element of a container are stored next to each other in memory. The opposite, struct-of-arrays, approach stores the values of all parameters in the container contiguously and the adjoints separately, in the same fashion. This layout can be used when the vector or matrix is used in a vectorized way, and it can vastly improve efficiency.

The Stan Math library, which Stan uses for automatic differentiation, has supported this new way of storing containers of parameters for a few versions now, but it has not been exposed to Stan users. With this release, Stan users are able to utilize this for models as well. This optimization can be turned on by using the --O1 stanc3 flag.
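The difference between the two memory layouts can be sketched with flat buffers in plain Python. This is only an illustration of the layout idea; the variable names are hypothetical and it does not reflect how Stan Math actually implements its types.

```python
from array import array

values = [1.0, 2.0, 3.0]
adjoints = [0.1, 0.2, 0.3]

# Array-of-structs: each element's value and adjoint sit side by side.
aos = array("d")
for v, a in zip(values, adjoints):
    aos.extend([v, a])               # memory: v0 a0 v1 a1 v2 a2

# Struct-of-arrays: all values contiguous, all adjoints contiguous.
soa_values = array("d", values)      # memory: v0 v1 v2
soa_adjoints = array("d", adjoints)  # memory: a0 a1 a2

# A vectorized operation over the values has to stride through the
# array-of-structs buffer, but reads one contiguous block in the
# struct-of-arrays layout.
sum_aos = sum(aos[0::2])
sum_soa = sum(soa_values)
```

Contiguous blocks of values are what vectorized (SIMD) instructions and the memory system handle best, which is why the second layout pays off when a whole vector or matrix is processed at once.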

For more backstory on this topic see the design doc: https://github.com/stan-dev/design-docs/blob/master/designs/0005-static-matrices.md#summary

Automatic promotion of arrays of scalars

Users can now call functions that require array[] real with integer arrays – array[] int. The types are automatically promoted in the call to the function. Similarly, arrays of reals or integers can be used with functions that expect arrays of complex values.

Users need to take care when combining these promotions with function overloading. See the User’s Guide section for more.

Deprecations

Starting with this release, the Stan compiler issues warnings when using functions or features of the Stan language that are deprecated. The warnings also note if the deprecated feature/function is scheduled to be removed and when that will occur.

Notable functions and features that will be removed in next year's January release (which will most likely be the 2.32 release):

  • The old array syntax:
// old and deprecated syntax
int a[5];
real b[4];
vector[3] c[2];

// non-deprecated array syntax
array[5] int a; 
array[4] real b;
array[2] vector[3] c;

This change is required to enable the introduction of new features like tuples.

  • Using reserved words array, upper, lower, offset and multiplier. The use of these names as variable names will not be allowed in future versions. Please make sure you replace these reserved words in time.
  • Assignment with <- and commenting the code with #.

Models can automatically be updated to use non-deprecated functions/features by using the canonicalizer (see below for more).
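To illustrate the rule behind the new array syntax (the array dimensions move in front of the element type), here is a toy regular-expression rewrite in Python. It is not the canonicalizer and does not parse Stan: it assumes it is given declaration lines only, would misfire on other statements that index a variable, and ignores constrained types. The names OLD_ARRAY_DECL and modernize are hypothetical.

```python
import re

# Matches simple declarations such as "int a[5];" or "vector[3] c[2];":
# indentation, element type (optionally sized), name, array dimensions.
OLD_ARRAY_DECL = re.compile(
    r"^(\s*)([A-Za-z_]\w*(?:\[[^\]]*\])?)\s+([A-Za-z_]\w*)\[([^\]]+)\]\s*;"
)

def modernize(line):
    """Rewrite one old-style array declaration; return other lines unchanged."""
    return OLD_ARRAY_DECL.sub(r"\1array[\4] \2 \3;", line)

old = ["int a[5];", "real b[4];", "vector[3] c[2];", "real x;"]
new = [modernize(line) for line in old]
```

For real models, use the canonicalizer rather than a text substitution like this one.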

Improved user-facing error and warning messages:

  • More informative error messages for ODE solvers.
  • When an unknown identifier is encountered Stan will suggest nearby known names you might have meant.
  • When a user tries to declare a function argument as a constrained type, the compiler will produce a more informative error.
  • More informative messages in case of incorrect variable declarations.

Improved auto-formatting and canonicalizer

  • Users can pass --max-line-length=# when auto-formatting to customize the line length.
  • Canonicalizer adds braces around single statements in if-else/for/while.
  • Modular canonicalizer: users can separately canonicalize for deprecations, braces, and/or parentheses.

An example of an input to the auto-formatter:

transformed data{
real m = 0;
for(i in 1:5)
m += 1.1* i;
}
parameters {
real
y; // comment
}
model {
y
~ std_normal();
}

The output:

transformed data {
  real m = 0;
  for (i in 1 : 5) 
    m += 1.1 * i;
}
parameters {
  real y; // comment
}
model {
  y ~ std_normal();
}

An example of an input to the canonicalizer:

parameters {
  real y[5];
  real x;
}
model {
  for (i in 1 : 5)
    target += normal_log(x, y[i], (0.5 + 1));
}

The output:

parameters {
  array[5] real y;
  real x;
}
model {
  for (i in 1 : 5) {
    target += normal_lpdf(x | y[i], 0.5 + 1);
  }
}

An online demo of the auto-formatter is available here and for the canonicalizer here. Some form of this demo will be available on the main Stan website in the future. Note that this demo compiles and formats everything locally (using JavaScript in your browser); your model is not sent to a server.

Miscellaneous

  • Support for standalone function definitions (.stanfunctions): the compiler can now compile or format standalone function definitions in a .stanfunctions file. These are compiled as if a normal Stan program were compiled with stanc3 --standalone-functions, and they can be used with #include statements in the functions block.
  • Upgraded Sundials to 6.0.0.

Stan Saplings: A Preview of Projects in Development

Below is a partial list of the many exciting projects currently being worked on in Stan. Stan is maintained and developed by volunteers. If any of these projects are of interest to you, please come join us in building the next version of Stan (GitHub, Discourse forums, Twitter)!

Complex Number Support (contact Brian Ward)

Stan’s complex number support works for complex scalars and arrays at the level of the language and math library! We’ve also added support for covariant typing, meaning that we can now assign int to real to complex, and that also works for arrays.

To do

  • Add support for vectors and matrices, which requires work in the parser and code generator and also in the math library for polymorphic arithmetic (e.g., multiplying a real and complex matrix) and covariant typing.
  • There is work to do on extending the C++ Eigen library at the level of its low-level, BLAS-style functions to support this.
  • We could also use an actual complex-number based application for our user’s guide.
  • After we get vectors and matrices, we want to add support for complex linear algebra (e.g., Schur decomposition and asymmetric eigendecomposition), and for fast Fourier transforms. Pretty much every single scalar and arithmetic function could be productively specialized for complex numbers. It’d be nice to think about the equivalent of var_matrix on the real side for the complex case.
  • There is remaining work to do to generate actual complex numbers in the Python, R, and other interfaces rather than just returning real and complex components.
  • There is a design document and issues and branches in progress in the language lib, the math library, and the interfaces.

Tuples in Stan (contact Brian Ward)

  • Tuples PR is up to date with Stan 2.29 and work is progressing on getting them into Stan

To do

  • Lots of IO work
  • C++ handling of tuples
  • Updating docs for tuples
  • Testing

Quantile functions (contact Andrew Johnson)

To do

  • Implement more foundational functions: inverses of the gamma_p and gamma_q functions

Packaging CmdStan for Linux (contact Andrew Johnson)

  • Introducing flags to link Math headers against user-specified dependencies (rather than those provided, PR)

To do

  • Implement same approach for Stan headers
  • Add compiler flag to default these paths to system locations

User-defined gradients (contact Andrew Johnson)

Very early stages. A prototype framework for using gradient functions is being tested in this Math library PR.

To do

  • Expand testing across multiple types of function inputs and outputs
  • Decide the user interface for the Stan language

How to install the new release?

Download the tar.gz file from the link above, extract it and use it the way you use any CmdStan release. We also have an online CmdStan guide available at https://mc-stan.org/docs/2_29/cmdstan-guide/

If you are using CmdStanPy, you can install using

import cmdstanpy
cmdstanpy.install_cmdstan()

With CmdStanR you can install using

cmdstanr::install_cmdstan(cores = 4)
