Polynomial parser in C++ using Boost.Spirit

Edit (07/07/2014): I’m sorry, the initial code compiled but didn’t work. It’s been updated now, along with the blog.

First of all, here is a link to the code. It depends on Boost.Spirit, which is a header-only part of the Boost library. It compiles, and it parses bivariate polynomials (e.g. xy^2 + 2x - 5y).

The input is a user-input string, and the output is a polynomial, which is represented internally as a vector of terms. A term has a coefficient and two exponents, one for each variable.

struct term
{
  boost::optional<int> coefficient; // e.g. the 2 in 2x
  boost::optional<int> x_exponent;  // e.g. the 1 in 2x
  boost::optional<int> y_exponent;  // e.g. the 2 in xy^2
};

So imagine the parser going over the string and processing each term individually, stuffing it into a vector. That’s exactly how the parser rules work.

The rules in the parser define how the user input is parsed, and we attach semantic actions, written as Phoenix lambda expressions, to each rule; these run as each part of the input is parsed. It starts at the top with start:

 namespace phx = boost::phoenix;
 ...
 start = eps[_val = std::vector<term>()]
     >> poly_term[phx::push_back(_val, qi::_1)]
     >> *(
             ('+' >> poly_term[phx::push_back(_val, qi::_1)])
           | ('-' >> negative_poly_term[phx::push_back(_val, qi::_1)])
        )
     ;

First of all, rules have attributes, and the start rule’s attribute is a std::vector of terms. The rule says to initialize the attribute with an empty vector, then expect at least one poly_term (i.e. polynomial term), followed by any number of additional terms joined by + or -.

The poly_term is another rule, whose attribute is a term. eps is a parser that consumes no input and always matches, which makes it a convenient place to hang the initialization action.

Inside the square brackets of each rule is a Phoenix lambda expression (a semantic action) that does something with the rule’s attribute. For example, the action on poly_term says to take the parsed term, stored in the placeholder qi::_1, and push it onto the vector that we initialized at the beginning.
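
To see the semantic-action machinery in isolation, here’s a minimal, self-contained sketch (not taken from the polynomial parser) that uses the same push_back pattern to collect whitespace-separated integers:

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <string>
    #include <vector>

    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phx   = boost::phoenix;

    int main()
    {
        std::string input = "1 2 3";
        std::vector<int> out;
        std::string::const_iterator first = input.begin(), last = input.end();

        // For every int matched by int_, the action pushes the value onto 'out'.
        bool ok = qi::phrase_parse(first, last,
                                   *qi::int_[phx::push_back(phx::ref(out), qi::_1)],
                                   ascii::space);

        return (ok && out.size() == 3) ? 0 : 1;
    }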

A single polynomial term carries three pieces of information: the coefficient out in front and an exponent for each of the two variables. The next rule shows how to break it down.

 poly_term = eps[_val = term()]
     >> -int_[phx::bind(&term::coefficient, _val) = qi::_1]
     >> -x_term[phx::bind(&term::x_exponent, _val) = qi::_1]
     >> -y_term[phx::bind(&term::y_exponent, _val) = qi::_1]
     ;

First, we try to pick out the coefficient. int_ is a “primitive” Spirit rule that will pick up an int from the string being parsed. The action says that the int will be taken and assigned to the coefficient field of the term attribute of this rule.

Note that the coefficient is optional, as indicated by the dash (Spirit’s optional operator) in front of int_. The Kleene star in the start rule up above is another such operator, matching any number of additional poly_terms. Here’s a complete list of the parser operators.
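
For completeness, here’s a rough sketch of how a grammar like this might be driven. The class name polynomial_grammar and the ascii::space skipper are placeholders of mine, not names from the linked code:

    #include <boost/spirit/include/qi.hpp>
    #include <string>
    #include <vector>
    #include <iostream>

    // Assumes the term struct above and a qi::grammar subclass wrapping the rules,
    // here called polynomial_grammar<Iterator>, declared with an ascii::space_type skipper.
    int main()
    {
        std::string input = "xy^2 + 2x - 5y";
        std::vector<term> poly;

        polynomial_grammar<std::string::const_iterator> grammar;
        std::string::const_iterator first = input.begin(), last = input.end();

        bool ok = boost::spirit::qi::phrase_parse(
            first, last, grammar, boost::spirit::ascii::space, poly);

        if (ok && first == last)
            std::cout << "parsed " << poly.size() << " terms\n";
        else
            std::cout << "parse failed\n";
    }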

Somehow, it is quite beautiful to look at this code.

Diary #21: Knee deep in templates, long compiles

So CGAL makes heavy use of C++ templates, and the Summer of Code project that I am working on builds a Qt4 visualization of 2D arrangements, one of the CGAL data structures for representing collections of 2D curves. I decided to make the individual UI and event-handling components templated by the arrangement type and specialize as needed, and I think it ended up saving me a lot of typing. The problem is that the compilation time for the full demo is pretty substantial.

[Screenshot: timing of a clean, parallel build of the arrangement demo]

A clean build actually took 5 minutes using a parallel build on a 4-core system. What’s hard to see is that one core gets stuck compiling a certain few files; ArrangementDemoWindow.cpp, for example, is a major culprit. This bad boy is responsible for instantiating ArrangementDemoTab, which is currently templated by six different types of arrangements. Each of these tabs instantiates another level of at least six callback components that handle various operations on arrangements. There is a Qt GraphicsItem subclass for visualizing the actual arrangement that is also templated on the arrangement type. Finally, each of the components makes use of little utility classes that are templated on types provided by the arrangement type.

Now, C++ has no choice but to generate code for implicitly instantiated classes on the spot, so it’s probably no surprise that memory usage spikes up to 4.1 GB when the compiler hits a “fat” class like ArrangementDemoWindow. Sure, ArrangementDemoWindow.cpp is 1200 lines of code, but that alone shouldn’t cause such a massive effect, unless you are a compiler that needs to instantiate a wide swath of template classes on demand to do type checking and such. The silent console fails to convey the chaos happening behind the scenes; the point is that there is a lot of redundant compilation going on.

So it’s become an absolute pain to do any debugging with the full demo. A single-line change in a fat class like ArrangementDemoWindow means sitting through the deep compilation of all those templated classes, even if none of them changed at all. I’m fed up with it. For now, I have to write a smaller UI example that instantiates only one type at a time so that recompilation is not so ridiculous.

But all this really makes me ask, is all this really unavoidable? Couldn’t I explicitly instantiate a certain set of template classes, precompile them once, and save them to a library for linking later on? How can I indicate this to the compiler?

Actually, if we can use C++11 features, the extern template mechanism lets us introduce some modularity. For example, suppose you wanted to precompile a vector-of-ints type. You can put this in your IntVector.h:

    #include <vector>
    // declaration only: tells the compiler not to implicitly instantiate
    // std::vector<int> in translation units that include this header
    extern template class std::vector<int>;
    typedef std::vector<int> IntVector;

and this in your IntVector.cpp:

    #include "IntVector.h"
    // instantiates the entire class from the default template
    template class std::vector<int>;

Then you include the header file wherever you want to use IntVector. In those translation units the compiler will not generate the template code as it normally would; instead, you compile IntVector.cpp once and link its object file. Now imagine you have a lot of templated classes that are tied together with dependent parameter types and collaborate closely. This can save a ton of compilation time.
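
A minimal sketch of a client file, just to illustrate the build flow (the file name main.cpp is mine):

    // main.cpp -- includes only the header, so no implicit instantiation happens here
    #include "IntVector.h"

    int main()
    {
        IntVector v;        // uses the std::vector<int> instantiated in IntVector.cpp
        v.push_back(42);
        return v.size() == 1 ? 0 : 1;
    }

    // Build sketch (assuming g++ in C++11 mode):
    //   g++ -std=c++11 -c IntVector.cpp main.cpp
    //   g++ IntVector.o main.o -o main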

Clang will do it, and g++ will too if you put it in -std=c++11 mode. But as always, I need to support the older compilers, so that’s right out.

Getting started with CppSharp

CppSharp is a cool project that generates C# bindings for C++ libraries. It’s under active development, and I finally got around to getting a sample up and running on OS X. Here’s a running example that follows the Getting Started guide on the GitHub page.

First, you’ll need to set yourself up with Mono. I installed mine through Homebrew, which gave me a 32-bit runtime. That means we need to make sure the C++ library we interface with is also compiled for 32 bits.

Here’s a sample library. Make it and extract the archive into the working directory. You’ll see the following structure get unpacked:

libsample
libsample/include
libsample/lib

Next, you’ll need to clone and build CppSharp. The Getting Started page is the quickest way to build it. In short, you

  1. Clone the particular revisions of LLVM and Clang into the tools subdirectory.
  2. Configure and build LLVM with CMake, enabling C++11 and libc++ standard library support by adding cache entries LLVM_ENABLE_CXX11 and LLVM_ENABLE_LIBCXX.
  3. Configure and build CppSharp. They use premake as their build system, which is a lot simpler to deal with.

The result of building CppSharp is a set of .dlls. The easiest thing is to copy all the resulting .dll files into the working directory where the executable for your binding-generator code will go. Otherwise, you will need to add the directory containing these libraries to MONO_PATH.

To generate your bindings for libsample, you implement ILibrary. Here’s a barebones example that compiles and runs. It assumes you’ve got the following in your working directory:

lib/Release_x32/
libsample/{include,lib}

The binding generator is Sample.exe; it parses the sample library and spits out bindings into the out/ directory.

Then you can proceed to use your C++ assets from C# — TestSample.exe compiled in the barebones example above will show you how. You just have to make sure the .dylib is in the working directory or visible through LD_LIBRARY_PATH.

Now that I’ve got this up and running, I’m looking into experimenting with QtSharp. From what I can see, the developer has committed bindings for Qt5, so I guess the path of least friction is to do the same rather than mess around with Qt4. I built 32-bit Qt5 overnight last night and will be testing it shortly.

Custom alternative search path for a.vim

If you’re a C++ developer, you’ll spend a lot of time toggling back and forth between your header (.h) files and the corresponding implementation (.cpp) files. In object-oriented code, classes typically map one-to-one to files: a Widget class is split into Widget.h, holding the declarations, and Widget.cpp, holding the actual implementations. On the other hand, templated code needs to go in the header file. So although the code is split between .h and .cpp, the pair really should be viewed as a single unit. Typically, I’ll be coding in vim with multiple tabs, each tab holding a .h/.cpp pair split like so:

[Screenshot: a header and its .cpp file open side by side in a vim split]

Opening files can be a pain, especially when your code is tucked away in nested directory structures. You potentially waste a lot of time typing in the filename you want to open. It’s not so bad if you keep your .h and .cpp files together. In this case, you can use the a.vim (version 2.18 at the time of this writing) plugin. Open up the header file. Then, a simple :AV will open up the corresponding .cpp file in a new split pane window.

It turns out a.vim searches a handful of locations, relative to the file you are toggling from, for the corresponding alternate file. For example, if I’m in Widget.h and toggle to the .cpp file, a.vim will look in the current directory as well as ../src (projects that build libraries often keep separate src and include directories).

If your project’s directory structure is a little more complicated (as in the screenshot above where you have to go between src/CGAL_Qt4 and include/CGAL/Qt), you can add a new relative path to search for the alternate file. Put the following in either your user .vimrc or a project-specific .vimrc:

let g:alternateSearchPath = 'sfr:../../../src/CGAL_Qt4,sfr:../../include/CGAL/Qt'

Note that the search paths start with sfr: and are comma-separated.

Actually, setting your own g:alternateSearchPath will blow away the defaults, but you can patch a.vim to append the defaults to whatever you set, much as you would extend the PATH variable:

--- a.vim	2013-12-23 13:10:42.458540027 -0800
+++ a.vim.new	2013-12-23 13:10:59.270540049 -0800
@@ -108,6 +108,8 @@
 " a path in their [._]vimrc. 
 if (!exists('g:alternateSearchPath'))
   let g:alternateSearchPath = 'sfr:../source,sfr:../src,sfr:../include,sfr:../inc'
+else
+  let g:alternateSearchPath = g:alternateSearchPath . ',sfr:../source,sfr:../src,sfr:../include,sfr:../inc'
 endif
 
 " If this variable is true then a.vim will not alternate to a file/buffer which

Note that the dot operator in Vim script is used for string concatenation.

Wrap Fortran code to use in C++

There’s a lot of useful code in Fortran, and a lot of people actually prefer to work in Fortran, so as a C++ developer it would be nice to be able to call their code from C++. Here’s a small CMake example showing how to do it, so you can follow along below: Download

I put the example together and tested it with the following versions of these tools:

  • g++ 4.7.3
  • gfortran 4.7.3
  • cmake 2.8.10.1

From a high level, the example does the following:

  1. Create a one-function Fortran library.
  2. Set up the interface to the Fortran library in C++ code.
  3. Link to the Fortran library and call the Fortran function.

Something to note is that every parameter of the Fortran function is a pointer type, even for non-array data, because Fortran passes its arguments by reference.

To demystify the part of the example that sets up the interface to Fortran code:

extern "C" 
{
    void average_(int *n, double *a, double *ave);
}

The extern "C" part tells the C++ compiler not to expect a mangled name but rather to take the function name as it is. C++ mangles function names at compile time in order to support function overloading, but this behavior has to be disabled when interfacing with code from other languages like Fortran.

Note that the average function has an underscore appended to it, whereas it does not in the Fortran file where it is defined. This is gfortran’s default naming behavior. There is a flag to disable it (-fno-underscoring), but I think it’s nice that the underscore makes it clear we’re calling a Fortran function.
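
For illustration, here’s roughly what the call site could look like on the C++ side, assuming average_ computes the mean of n doubles as in the downloadable example (the array contents here are made up):

    #include <iostream>

    extern "C"
    {
        // Fortran routine: every argument is passed by reference, hence the pointers.
        void average_(int *n, double *a, double *ave);
    }

    int main()
    {
        double data[] = { 1.0, 2.0, 3.0, 4.0 };
        int n = 4;
        double ave = 0.0;

        average_(&n, data, &ave);   // call into the Fortran library

        std::cout << "average = " << ave << std::endl;
        return 0;
    }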

Boost.Python hello world example using CMake

Somehow it seems I’m the only one among my labmates who works mainly in C++/CMake and not Python. I want to do more than help them by discussing code — I want to share my code. So let’s use Boost.Python to wrap our C++ library so it can be called from Python code. This tutorial is based on the Hello world example in the Boost.Python docs and uses CMake.

To begin with, here are the versions of everything I was using when I wrote this post, so you can judge whether it’s still applicable:

  • Ubuntu 13.04
  • cmake version 2.8.10.1
  • gcc version 4.7.3
  • Boost 1.49 with Python component
  • Python 2.7.4

Here, we create a simple C++ library called libgreet, and we make a wrapper library called greet_ext. The wrapper library can be loaded as a Python module that exposes the function in the C++ library. The files are shown below, and the full example can be downloaded here. To use it, just download it and run these commands:

tar xf BoostPythonHelloWorld.tar.gz
cd BoostPythonHelloWorld
cmake .
make
./test.py

greet.h:

#ifndef GREET_H
#define GREET_H
char const* greet( );
#endif // GREET_H

greet.cpp:

#include "greet.h"

char const* greet( )
{
    return "Hello world";
}

greet_ext.cpp:

#include "greet.h"
#include <boost/python.hpp>

BOOST_PYTHON_MODULE(greet_ext)
{
    using namespace boost::python;
    def( "greet", greet );
}
  • The parameter to BOOST_PYTHON_MODULE must match the name of the wrapper library we import in Python below. Note where greet_ext appears throughout the different files.

CMakeLists.txt:

cmake_minimum_required( VERSION 2.8 )

project( BoostPythonHelloWorld )

# Find necessary packages
find_package( PythonLibs 2.7 REQUIRED )
include_directories( ${PYTHON_INCLUDE_DIRS} )

find_package( Boost COMPONENTS python REQUIRED )
include_directories( ${Boost_INCLUDE_DIR} )

# Build our library
add_library( greet SHARED greet.cpp )

# Define the wrapper library that wraps our library
add_library( greet_ext SHARED greet_ext.cpp )
target_link_libraries( greet_ext ${Boost_LIBRARIES} greet )
# don't prepend wrapper library name with lib
set_target_properties( greet_ext PROPERTIES PREFIX "" )
  • The line with PythonLibs locates the python includes, which are necessary because the Boost.Python header pulls in a config file from there.
  • By default, CMake builds library targets with the lib prefix, but we actually want our wrapper library to be named greet_ext.so so we can import it with the name below.

test.py:

#!/usr/bin/python

import greet_ext

print greet_ext.greet()
  • This python script works for me if I put it in the same folder as the greet_ext.so wrapper library after it is built.

What I need to figure out
This is a nice, simple example to get started with, but I have to figure out more details.

  • How do I wrap functions that take/return C++ types? How about pointers and references to those types? How about global operators?
  • How does a wrapper library deal with C++ templates? Is it the case that you have to instantiate some of those template classes and just work with those when you’re in Python?
  • Is Boost.Python the right choice? There’s a fairly recent webpage that discusses SWIG versus Boost.Python, and I think I’ve made the right choice for my project.

I’m looking forward to working on this further. This probably will be one of the things I work on this coming Christmas break.

P.S. God WordPress is so annoying when it escapes your code when you toggle between visual and text editing modes. Note to self, don’t switch modes.

CMake C++ project template

Here’s a minimal CMake C++ project setup I rolled together, reflecting the project structure I’ve been working with for a while now. Hopefully you can use it to get started on a project pretty quickly:

https://dl.dropbox.com/u/508241/wordpress/barebones.tar.gz

You’ll need to have a few things to get started.

  1. CMake
  2. Doxygen
  3. GoogleTest

CMake is the build system that generates the Makefiles after you specify the build targets and libraries. The convention is to have a CMakeLists.txt file for each subdirectory of your project, specifying any build targets along the way. The root CMakeLists.txt specifies the libraries, custom compiler flags, and subdirectories containing other CMakeLists.txt files. To build the project, do:


cd build/
cmake ..
make

Doxygen is the typical way to generate documentation in C++. If you follow the Javadoc-style comment convention, you can generate decent documentation trivially by running doxygen from the root directory. This barebones setup is configured to scan the src and doc directories for documented source code and dedicated documentation files (*.dox).
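
For reference, a Javadoc-style comment that Doxygen picks up looks like this (the Widget class is purely illustrative):

    /**
     * A hypothetical widget, documented with the Javadoc-style comments
     * that Doxygen recognizes by default.
     */
    class Widget
    {
    public:
        /**
         * Resizes the widget.
         *
         * @param width  the new width in pixels
         * @param height the new height in pixels
         * @return true if the resize succeeded
         */
        bool resize(int width, int height);
    };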

GoogleTest is a unit testing framework that’s pretty similar to JUnit. The conventional use is to create a TestXXX.cpp file for every class XXX that you write and add it to the test directory. In this project, there is an AllTests build target that compiles a driver program running all the unit tests it was built with.
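
As a sketch, a test file for a hypothetical Widget class might look like this (the class, method, and file names are made up):

    // test/TestWidget.cpp -- follows the TestXXX.cpp convention described above
    #include <gtest/gtest.h>
    #include "Widget.h"   // the (hypothetical) class under test

    TEST(WidgetTest, ResizeSucceeds)
    {
        Widget w;
        EXPECT_TRUE(w.resize(640, 480));
    }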

On Ubuntu, you can grab the GoogleTest sources directly from the package manager:


sudo apt-get install libgtest-dev

The annoying thing, though, is that the package installs only the sources, so you’ll need to build the library yourself. It comes with a CMakeLists.txt that you can configure and build with; copy the resulting library files to /usr/local/lib and you should be able to compile the unit tests in this project template.