Spilling Your Guts to the Shell

April 24, 2022

Nowadays, most programming languages have some kind of read-eval-print loop, or REPL, which lets you enter expressions and immediately see the results of evaluating them. Using a REPL makes it easy to learn new features of the language or its libraries, since you can try out, say, a single function without having to set up a whole program and its supporting infrastructure. Unfortunately, there's no widely used REPL for C++, which generally makes learning C++ libraries harder than learning libraries for other languages.

OpenFst, a C++ library for manipulating weighted finite-state transducers (WFSTs), solves the C++-has-no-REPL problem in a slick way: it exposes most of its functionality as standalone programs that can be invoked at the shell and composed together with pipelining. In essence, the shell is the REPL for OpenFst operations.
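To make the shell-as-REPL idea concrete, here is a toy sketch of how a single operation can be packaged as a pipeline stage. It assumes OpenFst's textual transducer format, as read by fstcompile and written by fstprint: arc lines are "src dst ilabel olabel [weight]" and final-state lines are "state [weight]". On that text it performs inversion (swapping input and output labels, which is what fstinvert does). InvertLine and RunFilter are hypothetical names. This is not how OpenFst's own binaries are written; those read compiled binary FSTs and call into the C++ library.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumption: input is in the *transducer* text format (not --acceptor),
// so arc lines have 4 or 5 fields and final-state lines have 1 or 2.

// Pure function on one line: swap ilabel and olabel on arc lines and
// pass final-state (or blank) lines through unchanged.
std::string InvertLine(const std::string& line) {
  std::istringstream fields(line);
  std::vector<std::string> tok;
  for (std::string t; fields >> t;) tok.push_back(t);
  if (tok.size() < 4) return line;  // not an arc line
  std::swap(tok[2], tok[3]);        // ilabel <-> olabel
  std::string out = tok[0];
  for (std::size_t i = 1; i < tok.size(); ++i) out += "\t" + tok[i];
  return out;
}

// Generic stream-to-stream wrapper: any pure line transformation becomes
// a filter that can sit in a shell pipeline.
template <typename Fn>
void RunFilter(std::istream& in, std::ostream& out, Fn fn) {
  for (std::string line; std::getline(in, line);) out << fn(line) << '\n';
}
```

A main() for such a program would only need to call RunFilter(std::cin, std::cout, InvertLine), so it could sit between fstprint and fstcompile in a pipeline. Working line by line is valid here only because inversion touches each arc independently; an operation like composition needs the whole machine in memory.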

This approach works for OpenFst because:

WFST operations are generally pure functions: the result depends only on the input WFSTs and the options, so each operation can be wrapped in a standalone program that reads its input and writes its result.
There are enough interesting unary functions on WFSTs to make it worthwhile to chain them together with the shell's pipe operator. Even functions that aren't unary can usually be specialized by fixing the values of all but one of their parameters. For example, many speech and language processing tasks apply a fixed vocabulary or language model, represented as a WFST, to an input that is itself a WFST.
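The specialization trick in the last point is ordinary partial application. The sketch below uses a hypothetical, deliberately tiny WFST type, not OpenFst's API. Compose is a naive composition of epsilon-free transducers over the tropical semiring (weights add along a path), ComposeWith fixes its second argument to produce a unary stage, and RunPipeline chains unary stages the way a shell pipe chains programs. The "model" argument stands in for a fixed vocabulary or language model.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only.
struct Arc {
  int src, dst;
  std::string in, out;
  double weight;  // tropical semiring: weights add along a path
};

struct Wfst {
  int start = 0;
  std::vector<Arc> arcs;
  std::map<int, double> finals;  // final state -> final weight
};

// Binary operation: naive composition of epsilon-free transducers.
// Result states are pairs (state of a, state of b); only pairs reachable
// from the start pair are created. No epsilon handling or filters.
Wfst Compose(const Wfst& a, const Wfst& b) {
  Wfst c;
  std::map<std::pair<int, int>, int> id;
  std::queue<std::pair<int, int>> todo;
  auto state = [&](std::pair<int, int> p) {
    auto it = id.find(p);
    if (it != id.end()) return it->second;
    int n = static_cast<int>(id.size());
    id[p] = n;
    todo.push(p);
    return n;
  };
  c.start = state({a.start, b.start});
  while (!todo.empty()) {
    auto [qa, qb] = todo.front();
    todo.pop();
    int q = id[{qa, qb}];
    auto fa = a.finals.find(qa);
    auto fb = b.finals.find(qb);
    if (fa != a.finals.end() && fb != b.finals.end())
      c.finals[q] = fa->second + fb->second;
    for (const Arc& x : a.arcs) {
      if (x.src != qa) continue;
      for (const Arc& y : b.arcs) {
        if (y.src != qb || x.out != y.in) continue;  // labels must match
        int next = state({x.dst, y.dst});
        c.arcs.push_back({q, next, x.in, y.out, x.weight + y.weight});
      }
    }
  }
  return c;
}

// Unary operation: keep only output labels (project onto the output side).
Wfst ProjectOutput(Wfst f) {
  for (Arc& arc : f.arcs) arc.in = arc.out;
  return f;
}

using Stage = std::function<Wfst(const Wfst&)>;

// Partial application: fix Compose's second argument to get a unary stage.
Stage ComposeWith(Wfst model) {
  return [model = std::move(model)](const Wfst& in) { return Compose(in, model); };
}

// Apply unary stages left to right, like programs joined by pipes.
Wfst RunPipeline(const Wfst& input, const std::vector<Stage>& stages) {
  Wfst result = input;
  for (const Stage& stage : stages) result = stage(result);
  return result;
}
```

For example, RunPipeline(input, {ComposeWith(model), ProjectOutput}) composes an input with a fixed model and then keeps only the output labels, loosely analogous to piping the input through fstcompose with a fixed model file and then through fstproject. Real composition in OpenFst also handles epsilons and sorting requirements, which this sketch ignores.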