A while back, I started building a PDF parser in C++. I had been using the Adobe PDF IFilter to extract text from PDF files in order to index the content, but I also wanted to extract formatting information, so I dug into the PDF format. The format itself is fairly easy to parse, but the contents can be quite complex.
The PDF format consists of a series of objects, expressed in a simple syntax based on PostScript. There are primitives such as strings and numbers, and there are collections (arrays and dictionaries) which can contain both primitives and containers. You can see how things quickly become complicated when you have dictionaries containing arrays containing other complex objects.
So, I built a parser which would collect a stream of objects into a data structure for further processing. Since there was no way to know ahead of time the shape of the final structure, I used Boost’s variant library to contain each object. An object could be a primitive type or a shared pointer to an STL vector or map, depending on the underlying structure. The STL collections contained the same object class, so the resulting structure could be arbitrarily complex.
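Here is a minimal sketch of that kind of recursive object type. It is an illustration, not the parser's actual code: it assumes C++17 and uses std::variant in place of boost::variant, keys dictionaries by std::string for simplicity, and the names object, array_ptr, dict_ptr and make_example are hypothetical. Holding the collections through shared_ptr is what allows the type to contain collections of itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Forward declaration lets the container aliases mention 'object' before it is complete.
struct object;
using array_ptr = std::shared_ptr<std::vector<object>>;
using dict_ptr = std::shared_ptr<std::map<std::string, object>>;

// One parsed PDF object: a primitive, or a shared pointer to a collection
// of further objects, so the structure can nest arbitrarily deep.
struct object {
    std::variant<int, double, std::string, array_ptr, dict_ptr> value;
};

// Builds the structure for the PDF array [ 1 (two) << /Size 3 >> ].
object make_example() {
    auto dict = std::make_shared<std::map<std::string, object>>();
    (*dict)["Size"] = object{3};
    auto arr = std::make_shared<std::vector<object>>();
    arr->push_back(object{1});
    arr->push_back(object{std::string("two")});
    arr->push_back(object{dict});
    return object{arr};
}
```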
After I got the basic parser working, I realized that I had implemented a dynamic typing system for C++ . . . poorly. Even though you could build these complex and dynamic data structures at runtime, the syntax necessary to manipulate them was a bit ugly. For example, if you had an object (named ‘o’) that happened to be an array, and you wanted to access its second element, which happened to be a string, the syntax would be:
std::string s = (*o)[1];
Not exactly horrible, but it could be cleaner.
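To make the problem concrete, the sketch below reuses the same kind of hypothetical object type (again assuming C++17's std::variant instead of boost::variant) and reads the second element of an array as a string. Every level of nesting needs its own std::get with the exact type spelled out; second_as_string is an invented name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct object;
using array_ptr = std::shared_ptr<std::vector<object>>;
using dict_ptr = std::shared_ptr<std::map<std::string, object>>;
struct object {
    std::variant<int, double, std::string, array_ptr, dict_ptr> value;
};

// Reads element 1 of an array object as a string. Each level of the
// structure needs an explicit std::get, and each one throws
// std::bad_variant_access if the object holds a different type.
std::string second_as_string(const object& o) {
    const array_ptr& arr = std::get<array_ptr>(o.value);
    return std::get<std::string>(arr->at(1).value);
}
```

Written inline, the body becomes std::get<std::string>(std::get<array_ptr>(o.value)->at(1).value), which is the kind of noise a wrapper class can hide.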
It also seemed like a good project on its own, so I tore it out of the PDF parser and rewrote it. Twice. I called the class ‘var’ so that declaring a dynamic object would look familiar to anyone who has used a dynamic language. For example, you could write:
var name = "fred";
var age = 35;
But I made a mistake in trying to make var a subclass of boost::variant<…>. I had the same issues getting the usage syntax under control. On my second attempt, I made the variant a private member of the var class and explicitly defined every supported operator. This made it simpler and more intuitive to use. The previous example became:
std::string s = o[1];
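A stripped-down sketch of that second design follows. The variant is a private member, and the class defines only the operations it supports, here an operator[] plus implicit conversions to std::string and int, so that std::string s = o[1]; compiles without casts. This is a hypothetical illustration using C++17's std::variant, not dynamic-cpp's code; empty_array and push_back are invented for the example, and only ints, strings and arrays are supported.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified 'var': the variant is a private member and
// the class exposes only the operations it chooses to support.
class var {
public:
    var() : v_(0) {}
    var(int i) : v_(i) {}
    var(const char* s) : v_(std::string(s)) {}
    var(std::string s) : v_(std::move(s)) {}

    // Creates a var holding an empty array (invented for this sketch).
    static var empty_array() {
        var a;
        a.v_ = std::make_shared<std::vector<var>>();
        return a;
    }
    void push_back(const var& x) { items().push_back(x); }

    // Indexing works on the var itself; no casts at the call site.
    var& operator[](std::size_t i) { return items().at(i); }

    // Implicit conversions let 'std::string s = o[1];' and 'int n = o[0];'
    // compile; std::get throws std::bad_variant_access on a type mismatch.
    operator std::string() const { return std::get<std::string>(v_); }
    operator int() const { return std::get<int>(v_); }

private:
    using array_ptr = std::shared_ptr<std::vector<var>>;
    std::vector<var>& items() { return *std::get<array_ptr>(v_); }

    std::variant<int, std::string, array_ptr> v_;
};
```

Implicit conversions need care, which is part of why each supported operator has to be spelled out: in this sketch, assigning to an existing string (s = o[1];) would not compile, because the call is ambiguous between std::string's operator= for strings and its operator= for char, which operator int() can reach.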
Also, because I was focused on building a stand-alone dynamic type system rather than a data structure to support another project, I put a bit more thought into making object initialization easier. For example, you can do this:
var a = new_array(1)(2)(2.5)("hello")(new_dict("name", "fred")("age", 35));
which creates a five-element array named ‘a’ that contains two integers (1 and 2), a double (2.5), a string (“hello”) and a dictionary with two name-value pairs. Note that in a dictionary both the name and the value can be any type.
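The chained initialization works because each call returns an object that can be called again. The sketch below shows that pattern for a flat array only: operator() appends one element and returns the builder itself. The names array_builder, make_array and value are hypothetical, and this is not a description of how new_array or new_dict are actually implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical element type for this sketch; a real var would be recursive.
using value = std::variant<int, double, std::string>;

// Each call to operator() appends one element and returns the builder
// itself, so calls can be chained: make_array(1)(2)(2.5)("hello").
class array_builder {
public:
    array_builder& operator()(value v) {
        items_.push_back(std::move(v));
        return *this;
    }
    // Implicit conversion hands the finished vector to the caller.
    operator std::vector<value>() const { return items_; }

private:
    std::vector<value> items_;
};

// Starts a chain with its first element (hypothetical, modelled on new_array).
array_builder make_array(value first) {
    array_builder b;
    b(std::move(first));
    return b;
}
```

A dictionary builder can follow the same pattern with a two-argument operator() taking a key and a value.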
There is also a null type, represented by a $. The default value of any uninitialized var is $, but you can set it explicitly:
var name = $;
I picked $ because many C++ compilers accept it as a character in identifiers (as an extension, not standard C++), but it is rarely used. It stands out.
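A null type like $ can be modelled by making an empty alternative the first member of the variant, so that a default-constructed value is null. The sketch below is a hypothetical illustration, not the library's implementation: it uses std::monostate for null and names the constant nil, because $ in an identifier is a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical null-capable value: std::monostate plays the role of $,
// and it is the first alternative, so a default-constructed var is null.
class var {
public:
    var() = default;
    var(int i) : v_(i) {}
    var(const char* s) : v_(std::string(s)) {}

    bool is_null() const { return std::holds_alternative<std::monostate>(v_); }

    // Null values print as "$".
    friend std::ostream& operator<<(std::ostream& os, const var& x) {
        if (x.is_null()) return os << '$';
        if (const int* p = std::get_if<int>(&x.v_)) return os << *p;
        return os << std::get<std::string>(x.v_);
    }

private:
    std::variant<std::monostate, int, std::string> v_;
};

// Standard C++ does not guarantee '$' in identifiers, so the sketch
// spells its null constant 'nil' instead.
const var nil{};
```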
There is one issue with null values that I haven’t resolved. If you try to retrieve a dictionary value whose key does not exist, you get a null value, but it is also perfectly valid to store a null value in a dictionary, e.g.:
var d = new_dict("a", "xxx")("b", $);
cout << d["a"] << endl; // prints: xxx
cout << d["b"] << endl; // prints: $
cout << d["c"] << endl; // prints: $
When operator[] returns $ on a dictionary, you can’t tell whether the key was missing or was present with the value $. What’s worse, if the key didn’t exist before you called operator[], it will exist afterwards (set to $). This is a consequence of the fact that operator[] returns a var&. It has no idea whether it’s being used as an l-value or an r-value, so it has to create the entry in case it is being used as an l-value, e.g.:
d["e"] = 1;
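std::map::operator[] has the same trade-off: it inserts a default-constructed value when the key is missing, which is also why it has no const overload. One common workaround is a separate, non-inserting lookup based on find(), which can tell "missing" apart from "present but null". In this sketch std::monostate stands in for $, and dict, value and lookup are hypothetical names rather than part of dynamic-cpp.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical value type for this sketch: std::monostate stands in for $.
using value = std::variant<std::monostate, int, std::string>;
using dict = std::map<std::string, value>;

// Non-inserting lookup built on find(): returns nullptr for a missing key,
// or a pointer to the stored value (which may itself be null).
const value* lookup(const dict& d, const std::string& key) {
    auto it = d.find(key);
    return it == d.end() ? nullptr : &it->second;
}
```

With that helper, lookup(d, "c") returns nullptr for a missing key, while lookup(d, "b") returns a pointer to a stored null. Another option is for operator[] to return a proxy object that only inserts when it is assigned to, at the cost of extra complexity.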
You can download dynamic-cpp from Google Code at: http://code.google.com/p/dynamic-cpp/
To use it, you need the following code:
#include "dynamic.h"
using namespace dynamic;
You also need Boost. I’ve been using 1.37.0, but older versions should work as well. You need to build the project and link against the resulting static library. The code is platform-independent, but so far I only have Visual Studio project files. I will try to resolve that soon.
It comes with a suite of unit tests which will give you a better idea of its capabilities.
Let me know if you find something useful to do with it.