The Lazy Programmer

June 15, 2009

Dynamic C++

Filed under: C#,Dynamic-Typing,Programming — ferruccio @ 10:01 pm

A while back, I started building a PDF parser in C++. I had been using the Adobe PDF IFilter to extract text from PDF files in order to index their content, but I also wanted to extract formatting information, so I dug into the PDF format. The format itself is fairly easy to parse, but the contents can be quite complex.

The PDF format consists of a series of objects, expressed in a simple syntax based on PostScript. There are primitives such as strings and numbers, and there are collections (arrays and dictionaries) which can contain both primitives and other collections. You can see how things quickly become complicated when you have dictionaries containing arrays containing other complex objects.

So, I built a parser which would collect a stream of objects into a data structure for further processing. Since there was no way to know the shape of the final structure ahead of time, I used Boost’s variant library to hold each object. An object could be a primitive type or a shared pointer to an STL vector or map, depending on the underlying structure. The STL collections contained the same object class, so the resulting structure could be arbitrarily complex.
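To make the shape of that structure concrete, here is a minimal sketch of a recursive object type along those lines. It assumes C++17 and uses std::variant in place of boost::variant; the names `object`, `array_ptr`, `dict_ptr` and `second_as_string` are hypothetical, and dictionary keys are simplified to strings. Note how much unwrapping it takes just to read one nested string:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct object;  // forward declaration so the collection types can refer to it

// Hypothetical aliases: collections are held through shared pointers.
using array_ptr = std::shared_ptr<std::vector<object>>;
using dict_ptr  = std::shared_ptr<std::map<std::string, object>>;

// An object is either a primitive or a pointer to a collection of further
// objects, so nesting can be arbitrarily deep. std::monostate means "empty".
struct object {
    std::variant<std::monostate, int, double, std::string, array_ptr, dict_ptr> value;
};

// Reading the second element of an array as a string means unwrapping the
// variant, dereferencing the pointer, indexing, and unwrapping again.
std::string second_as_string(const object& o) {
    const array_ptr& arr = std::get<array_ptr>(o.value);  // throws std::bad_variant_access if o is not an array
    return std::get<std::string>((*arr)[1].value);       // throws if element 1 is not a string
}
```

Each level of nesting adds another std::get and dereference, which is exactly the kind of noise a wrapper class can hide.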

After I got the basic parser working, I realized that I had implemented a dynamic typing system for C++ . . . poorly. Even though you could build these complex and dynamic data structures at runtime, the syntax needed to manipulate them was a bit ugly. For example, if you had an object (named ‘o’) that happened to be an array and you wanted to access its second element, which happened to be a string, the syntax would be:

std::string s = (*o)[1];

Not exactly horrible, but it could be cleaner.

It also seemed like a good project on its own, so I tore it out of the PDF parser and rewrote it. Twice. I called the class ‘var’ so that declaring a dynamic object would look familiar to anyone who has used a dynamic language. For example, you could write:

var name = "fred";
var age = 35;

But I made a mistake in trying to make var a subclass of boost::variant<…>. I had the same issues getting the usage syntax under control. On my second attempt, I made the variant a private member of the var class and explicitly defined every supported operator. This made it simpler and more intuitive to use. The previous example became:

std::string s = o[1];
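A rough sketch of the private-member approach might look like the following. This is not the dynamic-cpp source: the class is pared down to ints, doubles, strings and arrays, it uses C++17’s std::variant instead of boost::variant, and the `var::array()` factory and `push_back` member are hypothetical. The idea is that `operator[]` performs the pointer dereference internally and an implicit conversion operator produces the std::string, so `std::string s = o[1];` compiles:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

class var {
public:
    var() = default;  // holds std::monostate, i.e. null
    var(int i) : value_(i) {}
    var(double d) : value_(d) {}
    var(const char* s) : value_(std::string(s)) {}
    var(std::string s) : value_(std::move(s)) {}

    // Hypothetical factory for an empty array.
    static var array() {
        var v;
        v.value_ = std::make_shared<std::vector<var>>();
        return v;
    }

    // Hypothetical helper for appending to an array.
    void push_back(const var& item) { as_array().push_back(item); }

    // operator[] hides the pointer dereference the raw variant required.
    var& operator[](std::size_t i) { return as_array().at(i); }

    // Implicit conversion so "std::string s = o[1];" compiles; throws
    // std::bad_variant_access if this var does not hold a string.
    operator std::string() const { return std::get<std::string>(value_); }

private:
    using array_ptr = std::shared_ptr<std::vector<var>>;

    std::vector<var>& as_array() {
        array_ptr* p = std::get_if<array_ptr>(&value_);
        if (p == nullptr) throw std::runtime_error("var is not an array");
        return **p;
    }

    // The variant is a private member rather than a base class.
    std::variant<std::monostate, int, double, std::string, array_ptr> value_;
};
```

Because the conversion relies on std::get, asking for the wrong type fails at runtime with an exception, which is the usual trade-off of this style.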

Also, because I was focused on building a stand-alone dynamic type system rather than a data structure to support another project, I put more thought into making object initialization easier. For example, you can do this:

var a = new_array(1)(2)(2.5)("hello")(new_dict("name", "fred")("age", 35));

which creates a five-element array named ‘a’ containing two integers (1 and 2), a double (2.5), a string (“hello”) and a dictionary with two name-value pairs. Note that in a dictionary both the name and the value can be of any type.
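Chained initializers like `new_array(1)(2)(2.5)` can be built with a helper object whose `operator()` appends one element and returns a reference to itself. The sketch below assumes that design and is not taken from dynamic-cpp; `array_builder`, `dict_builder` and `item` are hypothetical names, and the element type is reduced to int, double and string, so nesting a dictionary inside an array is left out:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Simplified, non-recursive element type for illustration only.
using item = std::variant<int, double, std::string>;

struct array_builder {
    std::vector<item> items;
    // Append one element and return *this so calls can be chained.
    array_builder& operator()(item v) {
        items.push_back(std::move(v));
        return *this;
    }
};

struct dict_builder {
    std::map<std::string, item> entries;
    // Add (or overwrite) one name-value pair and return *this for chaining.
    dict_builder& operator()(std::string key, item v) {
        entries[std::move(key)] = std::move(v);
        return *this;
    }
};

// Each factory seeds a fresh builder with its first element or pair;
// the returned builder is copied out of the temporary.
array_builder new_array(item first) { return array_builder{}(std::move(first)); }
dict_builder new_dict(std::string key, item v) { return dict_builder{}(std::move(key), std::move(v)); }
```

A real implementation would make the element type recursive so that a `new_dict(...)` result could itself be appended to an array.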

There is also a null type, represented by a $. The default value of any uninitialized var is $, but you can set it explicitly:

var name = $;

I picked $ because many C++ compilers accept it as a valid character in identifiers (as a non-standard extension), yet it is rarely used, so it stands out.

There is one issue with null values that I haven’t resolved. If you try to retrieve a dictionary value whose key does not exist, it will return a null value, but it is perfectly valid to place a null value into a dictionary. For example:

var d = new_dict("a", "xxx")("b", $);
cout << d["a"] << endl; // prints: xxx
cout << d["b"] << endl; // prints: $
cout << d["c"] << endl; // prints: $

When operator[] returns $ on a dictionary, you don’t know whether the item didn’t exist or existed but was $. What’s worse, if it didn’t exist before calling operator[], it will exist afterwards (set to $). This is a consequence of the fact that operator[] returns a var&. It has no idea whether it’s being used as an l-value or an r-value, so it has to create the entry in case it is being used as an l-value, as in:

d["e"] = 1;
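One way around both problems, sketched here under the assumption of C++17, is to keep the inserting `operator[]` for writes and add a separate read-only lookup that uses `find` and reports absence with std::optional (the standard counterpart of boost::optional). std::monostate stands in for $, and `value`, `dict` and `lookup` are hypothetical names. Note that a `const` overload of `operator[]` alone does not fully solve it: C++ picks the const overload based on whether the dictionary object itself is const, not on whether the result is being read or written.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>

// std::monostate plays the role of the null value $.
using value = std::variant<std::monostate, int, std::string>;
using dict  = std::map<std::string, value>;

// Non-inserting lookup: std::nullopt means "no such key", while an engaged
// optional holding std::monostate means "key present, value is null".
std::optional<value> lookup(const dict& d, const std::string& key) {
    auto it = d.find(key);
    if (it == d.end()) return std::nullopt;
    return it->second;
}
```

With this split, `d["e"] = 1;` still works for writes, while reads that must not create entries go through `lookup`.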

You can download dynamic-cpp from Google Code at: http://code.google.com/p/dynamic-cpp/

To use it, you need the following code:

#include "dynamic.h"

using namespace dynamic;

Also, you need Boost. I’ve been using 1.37.0, but older versions should work as well. You need to build the project and link against the resulting static library. The code is platform-independent, but so far I only have Visual Studio project files. I will try to resolve that soon.

It comes with a suite of unit tests which will give you a better idea of its capabilities.

Let me know if you find something useful to do with it.


9 Comments

  1. re: operator[] returning a var&, you could provide const versions of the same functions that the compiler will use if they are R-values.

    Comment by cam — June 16, 2009 @ 1:29 am

  2. cam,

    Thanks for the tip. When I was writing this post, it occurred to me that something might be possible along those lines. I’ll have to explore that later.

    Comment by Ferruccio — June 16, 2009 @ 6:08 am

  3. Great 🙂

    I was doing the very same thing, but for the PostScript language 😛

    Comment by beb0s — June 16, 2009 @ 1:22 pm

  4. ===========================================
    There is one issue with null values that I haven’t resolved. If you try to retrieve a dictionary value whose key does not exist, it will return a null value, but it is perfectly valid to place a null value into a dictionary.
    ===========================================
    Wouldn’t boost::optional solve this problem?

    Nice library btw 🙂

    Comment by AraK — September 24, 2009 @ 9:51 pm

  5. Hi, this looks neat, but what’s the difference between this and boost::any?

    Comment by Yang — November 1, 2009 @ 3:06 am

  6. Yang,

    I don’t believe a boost::any object can contain lists, sets and maps of heterogeneous objects. Otherwise, this library is just a wrapper around boost::variant.

    — Ferruccio

    Comment by Ferruccio — November 1, 2009 @ 6:46 am

  7. I’m not sure what you mean, but this works fine:

    #include <boost/any.hpp>
    #include <vector>
    #include <string>
    using namespace boost;
    using namespace std;
    int main() {
      any a = 3;
      any b = (const char *) "hello";
      vector<any> c;
      c.push_back(1);
      c.push_back((const char*) "hi");
      c.push_back(vector<any>());
      return 0;
    }
    

    It’s not quite as convenient as using `var`, to be sure, but it gets you much of the way there, which raises the question: isn’t `any` a better fit for `var` than `variant`?

    Comment by Yang — November 1, 2009 @ 1:08 pm

  8. Yang,

    Sorry, WordPress ate the angle brackets in your code. I edited your code to reflect what I think you meant.

    You’re right in that it’s possible to achieve the same effect using boost::any and STL containers. In this case, I was building data structures whose shape was unknown until run-time. I was actually using boost::variant, std::list and std::map directly in my original code. Using var objects just made the code easier to work with.

    It may have been better to build this on top of Boost.Any rather than Boost.Variant. I generally prefer to use variant simply because it forces you to specify up front which types you will be storing in it.

    Comment by Ferruccio — November 1, 2009 @ 5:07 pm

    But you can certainly bind a vector, list, or map to an any as well, e.g. `any v = vector<int>(); any_cast<vector<int>&>(v).push_back(0);`.

    Saying it “forces you to specify up front which types you will be storing in it” implies that `var` can store only certain types, not arbitrary ones. I just checked out the source, and that does seem to be the case. This isn’t necessarily a bad thing: those types cover a broad range of use cases, and they allow things like the stream operators `<<` and `>>` to work with `var` (whereas `any` requires `any_cast` calls).

    Thanks for the library!

    Comment by Yang — November 1, 2009 @ 7:27 pm
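The variant-versus-any trade-off discussed in the last few comments can be illustrated with a small C++17 sketch, using std::variant and std::any as stand-ins for their Boost counterparts (`closed` and `print` are hypothetical names). A closed set of types lets a single std::visit call print any value, while std::any has to be probed with any_cast for each type the caller anticipates:

```cpp
#include <any>
#include <cassert>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

// A closed set of alternatives, known at compile time.
using closed = std::variant<int, double, std::string>;

// One visit covers every alternative, since each supports operator<<.
std::string print(const closed& v) {
    std::ostringstream out;
    std::visit([&out](const auto& x) { out << x; }, v);
    return out.str();
}

// std::any has no list of alternatives to visit, so each expected type
// must be tried explicitly; anything else falls through.
std::string print(const std::any& a) {
    if (const int* i = std::any_cast<int>(&a)) return std::to_string(*i);
    if (const std::string* s = std::any_cast<std::string>(&a)) return *s;
    return "<unknown>";
}
```

The variant version needs no changes when it is handed a value of any of its alternatives; the any version silently returns a fallback for types it was not written to expect.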



