The Lazy Programmer

June 7, 2010

An XML Writer for C++

Filed under: C++,XML — ferruccio @ 9:31 am

I’ve worked on a couple of projects where XML files were generated with printf calls or iostreams, and this has always turned out to be a mistake. Such code is tricky to maintain, and you end up spending an inordinate amount of time debugging the same kinds of problems over and over again. Usually a character with special meaning to XML (such as < or >) doesn't get escaped correctly, or you forget an end tag.
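To make the escaping problem concrete, here is a minimal sketch (not code taken from xml_writer.h) of the kind of function an XML writer has to apply to every attribute value and text node. The name xml_escape is hypothetical; the five entity references are the ones predefined by XML.

```cpp
#include <string>

// Replace the five characters with special meaning in XML with their
// predefined entity references, so text and attribute values stay well-formed.
std::string xml_escape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Hand-written printf/iostream code has to remember to call something like this at every single output site; a writer class can do it once, internally.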

The project I’m currently working on involves a search engine which returns its results in XML over an HTTP connection. I decided to use an XML writer rather than make the same mistake again. After looking around for an open source solution and not finding anything suitable for my needs, I decided to roll my own and place the project on Google Code.

To use xml-writer, simply drop xml_writer.h into your project directory and #include it where needed. It is a very simple class (fewer than 200 lines of code), and the accompanying unit tests also serve as documentation for its use.

Briefly, you instantiate an xml::writer object bound to a std::ostream. This lets you send the resulting XML to a string (std::stringstream), a file (std::ofstream) or any other ostream-compatible object. In my case, I created a stream object that writes directly to an HTTP connection.
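The HTTP stream itself isn't shown here. A common way to build such a stream is a custom std::streambuf; the sketch below is an assumption about how one could look, not the code used in this project. The class name callback_buf and its sink function are hypothetical, and the sink stands in for whatever actually sends bytes over the connection.

```cpp
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// An unbuffered streambuf that hands every character written to it to a
// sink function. The sink is a stand-in for "send these bytes over the
// connection"; a real HTTP stream would add buffering and error handling.
class callback_buf : public std::streambuf {
public:
    using sink_type = std::function<void(const char*, std::size_t)>;
    explicit callback_buf(sink_type sink) : sink_(std::move(sink)) {}

protected:
    // Single characters: no put area is set up, so sputc() always lands here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        char c = traits_type::to_char_type(ch);
        sink_(&c, 1);
        return ch;
    }
    // Runs of characters, e.g. when a whole string is inserted.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        sink_(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    sink_type sink_;
};
```

Wrapping it as `std::ostream out(&buf);` gives an ordinary std::ostream, so anything that writes to an ostream, an XML writer included, can target the connection without knowing about it.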

You instantiate xml::element objects to create the actual XML element tags. An xml::element object has overloaded attr() methods for writing attributes and a contents() method for writing element contents. When an xml::element object goes out of scope, its destructor writes the necessary end tag.
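The end-tag-on-destruction behavior is plain RAII. Here is a stripped-down, hypothetical element class showing the idea; it is not the xml_writer.h implementation, skips escaping and nesting to stay short, and writes a self-closing tag when no contents were given (an assumption about desirable behavior, not documented library behavior).

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical element: the constructor opens the start tag and the
// destructor writes the matching end tag when the object leaves scope.
class element {
public:
    element(const std::string& name, std::ostream& os) : name_(name), os_(os) {
        os_ << '<' << name_;
    }
    element(const element&) = delete;             // a copy would emit a second end tag
    element& operator=(const element&) = delete;

    // Attributes must be written before contents(), while the start tag is open.
    element& attr(const std::string& key, const std::string& value) {
        os_ << ' ' << key << "=\"" << value << '"';
        return *this;
    }
    element& contents(const std::string& text) {
        if (start_open_) { os_ << '>'; start_open_ = false; }
        os_ << text;
        return *this;
    }
    ~element() {
        if (start_open_) os_ << "/>";             // no contents: self-closing tag
        else os_ << "</" << name_ << '>';
    }

private:
    std::string name_;
    std::ostream& os_;
    bool start_open_ = true;  // start tag still waiting for its closing '>'
};

// Usage: the inner scope makes the destructor run before str() is read.
std::string demo() {
    std::ostringstream ss;
    {
        element record("record", ss);
        record.attr("name", "fred").attr("age", "35");
        record.contents("yabba dabba doo!");
    }
    return ss.str();
}
```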

A simple example:

#include "xml_writer.h"
#include <sstream>
#include <iostream>

using namespace std;

int main()
{
    stringstream ss;
    xml::writer xw(ss);
    {
        xml::element record("record", xw);
        record.attr("name", "fred").attr("age", 35);
        record.contents("yabba dabba doo!");
    }
    cout << ss.str() << endl;
    return 0;
}

will result in the following output:

<?xml version="1.0" encoding="utf-8"?><record name="fred" age="35">yabba dabba doo!</record>

NB: the extra scope created to encapsulate the record element was necessary so that the xml::element destructor is called before the resulting string is output. xml::element should always be enclosed in some kind of local scope for this to work properly.

March 17, 2010

.NET DeflateStream/zlib compatibility

Filed under: /NET,Programming — ferruccio @ 3:37 pm

This is just a quick post to describe how to format compressed data with the .NET DeflateStream class so that it can be read back in a C or C++ program using zlib. Compressing data with DeflateStream is extremely simple. The original code I was using looked like this: (more…)
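Some background on why the two don't match out of the box (not necessarily the fix this post goes on to describe): DeflateStream writes a raw deflate stream (RFC 1951), while zlib's inflate(), when set up with inflateInit(), expects the zlib format (RFC 1950), which wraps the same data in a two-byte header and an Adler-32 checksum of the uncompressed bytes. One option is to read raw data on the C side by calling inflateInit2() with a negative windowBits such as -15. Another is to add the wrapper yourself. The sketch below shows that byte layout in C++ to stay in one language; the same steps could be done in C# before writing the data out. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Adler-32 checksum (RFC 1950) of the *uncompressed* data.
std::uint32_t adler32(const std::vector<std::uint8_t>& data) {
    const std::uint32_t MOD = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::uint8_t byte : data) {
        a = (a + byte) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// Wrap a raw deflate stream in the zlib container: 2-byte header,
// the compressed bytes, then Adler-32 of the original data, big-endian.
std::vector<std::uint8_t> wrap_raw_deflate(const std::vector<std::uint8_t>& raw,
                                           const std::vector<std::uint8_t>& original) {
    // 0x78 = deflate with a 32K window; 0x9C = default level, check bits
    // chosen so that (0x78 * 256 + 0x9C) is a multiple of 31.
    std::vector<std::uint8_t> out{0x78, 0x9C};
    out.insert(out.end(), raw.begin(), raw.end());
    const std::uint32_t sum = adler32(original);
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(sum >> shift));
    return out;
}
```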

Building Boost 1.42 with zlib 1.2.4 support

Filed under: Programming — ferruccio @ 7:09 am

I am currently using Boost.IOStreams to read some compressed data out of a database. When I tried to use a zlib_decompressor object to inflate the incoming stream, it failed to compile. It turns out that you have to add zlib to the Boost build process, so I grabbed a copy of the latest zlib sources and made the necessary change to my Boost build script (I added -sZLIB_SOURCE=… so that Boost.Build could find the zlib sources).

Unfortunately, the build failed: Bjam complained that it was missing a dependency (gzio.c). After a bit of trial and error with various Boost.Build options, I found the problem. According to the zlib change log, gzio.c was removed from the zlib sources in version 1.2.3.9. This was a very recent change, so Boost.Build has not yet been updated to reflect the current zlib. (more…)

March 2, 2010

Berkeley DB Viewer

Filed under: Database,Programming — ferruccio @ 11:27 am

I’m currently working on a project which uses Berkeley DB (BDB) as its data storage engine. I can’t say enough good things about BDB. It has proven to be a very fast and flexible way to store and retrieve data, it is very easy to use, and the documentation is absolutely top notch.

One issue I ran into, though, is that there is no good way to examine the databases for debugging purposes. Initially, I used the provided db_dump command-line tool. Db_dump dumps the entire contents of a database, which was fine when I was dealing with databases of only a few records. But now I am working with databases of thousands, and soon millions, of records. Db_dump just won’t do. (more…)

September 19, 2009

OpenDiff/SVN command line shortcut

Filed under: OSX,Programming,SVN — ferruccio @ 8:18 am

For cross-platform projects, I switched from Xcode to TextMate and CMake. I find this a more productive environment, but I miss the OpenDiff integration that’s built into Xcode. Most of the time svn diff is all I need, but for more complex changes the visualization provided by OpenDiff makes life so much easier. (more…)

August 9, 2009

Dynamic C++ Update

Filed under: C#,Dynamic-Typing — ferruccio @ 2:51 pm

Every so often I’ve been tinkering with my Dynamic C++ project, trying to get it to build under OSX, without much luck. Most of it built just fine, but in a number of places boost::apply_visitor() was giving me all sorts of grief.

The original problem was that I was passing an instance of a locally defined struct as the functor argument to apply_visitor(), like this:

unsigned int var::count() const {
    struct count_visitor : public boost::static_visitor<unsigned int> {
        unsigned int operator () (null_t) const { throw exception("invalid .count() operation on $"); }
        unsigned int operator () (int_t) const { throw exception("invalid .count() operation on int"); }
        unsigned int operator () (double_t) const { throw exception("invalid .count() operation on double"); }
        unsigned int operator () (string_t s) const { return s.ps->length(); }
        unsigned int operator () (list_ptr l) const { return l->size(); }
        unsigned int operator () (array_ptr a) const { return a->size(); }
        unsigned int operator () (set_ptr s) const { return s->size(); }
        unsigned int operator () (dict_ptr d) const { return d->size(); }
    };

    return boost::apply_visitor(count_visitor(), _var);
}
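A likely culprit is a C++03 rule: a local class cannot be used as a template argument, and apply_visitor() is a function template that deduces the visitor's type. Some compilers accept this as an extension, which would explain code that builds in one environment and not another (an inference; the compilers involved aren't named here). C++11 lifted the restriction. The usual portable fix is to move the visitor to namespace scope. The sketch below shows that shape, with C++17 std::variant and std::visit substituted for boost::variant and boost::apply_visitor so it runs without Boost, and with a simplified, hypothetical value type instead of the project's var. std::visit deduces the return type, so no static_visitor base class is needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Simplified, hypothetical stand-in for the project's var type.
using list_t = std::vector<int>;
using value_t = std::variant<std::monostate, int, double, std::string, list_t>;

// Defined at namespace scope instead of inside the member function, so it
// would also be a legal template argument under C++03 rules.
struct count_visitor {
    std::size_t operator()(std::monostate) const {
        throw std::runtime_error("invalid .count() operation on $");
    }
    std::size_t operator()(int) const {
        throw std::runtime_error("invalid .count() operation on int");
    }
    std::size_t operator()(double) const {
        throw std::runtime_error("invalid .count() operation on double");
    }
    std::size_t operator()(const std::string& s) const { return s.length(); }
    std::size_t operator()(const list_t& l) const { return l.size(); }
};

std::size_t var_count(const value_t& v) {
    return std::visit(count_visitor{}, v);
}
```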

(more…)

June 15, 2009

Dynamic C++

Filed under: C#,Dynamic-Typing,Programming — ferruccio @ 10:01 pm

A while back, I started building a PDF parser in C++. I had been using the Adobe PDF IFilter to extract text from PDF files in order to index the content, but I also wanted to extract formatting information, so I dug into the PDF format. The format itself is fairly easy to parse, but the contents can be quite complex.

A PDF file consists of a series of objects, expressed in a simple syntax based on PostScript. There are primitives, such as strings and numbers, and there are collections (arrays and dictionaries), which can contain both primitives and other collections. You can see how things quickly become complicated when you have dictionaries containing arrays containing other complex objects.
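To make the nesting concrete, here is a hypothetical C++17 sketch (not the parser's actual code) of one way to model PDF objects: a pdf_object holds either a primitive or a shared pointer to an array or dictionary of further pdf_objects, so any code that walks the structure has to recurse. The names and the choice of std::variant and std::shared_ptr are assumptions, and PDF names, indirect references and streams are left out.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct pdf_object;  // forward declaration for the recursive containers
using pdf_array = std::vector<pdf_object>;
using pdf_dict = std::map<std::string, pdf_object>;

struct pdf_object {
    std::variant<std::nullptr_t, bool, long, double, std::string,
                 std::shared_ptr<pdf_array>, std::shared_ptr<pdf_dict>> value;
};

// Count every object in the tree, including the nested ones.
std::size_t count_objects(const pdf_object& obj) {
    std::size_t n = 1;
    if (auto a = std::get_if<std::shared_ptr<pdf_array>>(&obj.value); a && *a) {
        for (const auto& item : **a) n += count_objects(item);
    } else if (auto d = std::get_if<std::shared_ptr<pdf_dict>>(&obj.value); d && *d) {
        for (const auto& kv : **d) n += count_objects(kv.second);
    }
    return n;
}
```

The shared pointers keep the variant itself small and sidestep the question of whether std::map may be instantiated with an incomplete value type.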

(more…)

May 28, 2009

<XAML fest>

Filed under: /NET — ferruccio @ 7:01 am

I just finished XAML fest, a two-day introduction to Silverlight, XAML and Expression Blend. The event was held at Microsoft’s New England R&D Center in Cambridge, Massachusetts. The class centered on building a small web app using Silverlight, and a lot of time was spent learning how to use Blend to build user interfaces.

Having spent a good portion of my career building Windows apps, I’ve had the opportunity to create UIs using the Win32 API, OWL, MFC, WTL and wxWidgets. I’ve dabbled in WPF but never did much with it, since I’ve been spending most of my free time tinkering with Cocoa and Cocoa Touch. What I really like about XAML is that you can lay out an entire interface, including a lot of its behavior, without writing a single line of code.

(more…)

April 5, 2009

A Python snippet for reading binary data

Filed under: Programming,Python — ferruccio @ 7:31 pm

I’ve been experimenting with using Python to read data from binary files and started to notice the following pattern in my code:

  1. Read a block of binary data.
  2. Use struct.unpack() to break out individual fields.
  3. Create a dictionary from those fields using the appropriate key names.
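For comparison, here is a hypothetical C++ analogue of steps 2 and 3 (the post itself is about Python, where the core of it would be something like dict(zip(names, struct.unpack("<BHI", block)))). Step 1 would simply be reading a fixed-size block, for example with std::istream::read. The field layout, the names, and the choice of unsigned little-endian integers of 1, 2 or 4 bytes are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical field description: a key name and a width in bytes.
struct field {
    std::string name;
    std::size_t size;  // 1, 2 or 4
};

// Break a block into unsigned little-endian fields and collect them
// under their key names, like unpack() followed by building a dict.
std::map<std::string, std::uint32_t>
unpack_record(const std::vector<std::uint8_t>& block, const std::vector<field>& layout) {
    std::map<std::string, std::uint32_t> record;
    std::size_t offset = 0;
    for (const auto& f : layout) {
        if (f.size == 0 || f.size > 4)
            throw std::invalid_argument("unsupported field size: " + f.name);
        if (offset + f.size > block.size())
            throw std::out_of_range("block too short for field: " + f.name);
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < f.size; ++i)
            value |= static_cast<std::uint32_t>(block[offset + i]) << (8 * i);
        record[f.name] = value;
        offset += f.size;
    }
    return record;
}
```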

(more…)

January 5, 2009

Returning multiple values from a function in C++

Filed under: C#,Programming — ferruccio @ 10:22 pm

Ideally, every function should return just one value. There are many times, however, when returning more than one value makes a function much more convenient. The classic example is file input: when we read data from a file, we want to know two things. Did we reach the end of the file? And if not, what is the next piece of data?

Sometimes, we can encode multiple return values into one. For many of us, the first C idiom we learned from K&R is processing input a character at a time:

int ch;
while ((ch = getchar()) != EOF) {
    // do character processing...
}

This works because the EOF macro is defined as a negative int value (usually -1) outside the range of valid characters, which is also why ch is declared as an int rather than a char. While this approach works fairly well for simple cases, it quickly breaks down as the types we wish to return get more complex.
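One present-day way to carry both answers in a single return value is C++17's std::optional, which did not exist when this was written and is not necessarily what the rest of this post recommends. The result either holds the next value or signals end of input, without reserving a sentinel like EOF. The function names next_word and split_words are hypothetical.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Returns the next whitespace-delimited word, or std::nullopt at end of input.
// "Did we reach the end?" and "what is the data?" travel in one return value.
std::optional<std::string> next_word(std::istream& in) {
    std::string word;
    if (in >> word) return word;
    return std::nullopt;
}

// The loop reads much like the getchar() idiom, but works for any value type.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    while (auto w = next_word(in)) words.push_back(*w);
    return words;
}
```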

(more…)
