The Lazy Programmer

April 5, 2009

A Python snippet for reading binary data

Filed under: Programming,Python — ferruccio @ 7:31 pm

I’ve been experimenting with using Python to read data from binary files, and I started to notice the following pattern in my code:

  1. Read a block of binary data.
  2. Use struct.unpack() to break out individual fields.
  3. Create a dictionary from those fields using the appropriate key names.

So, let’s suppose I need to read a block that contains a name (20 characters), an age (a 4-byte unsigned integer) and a salary (a 4-byte float), which comes to 28 bytes in all. The code might look something like this:

import struct

(name, age, salary) = struct.unpack("< 20s I f", src.read(28))
record = {'name' : name, 'age' : age, 'salary' : salary}

That’s not bad, but calculating the size of the block and constructing the dictionary by hand each time gets tedious, so I wrote a small function that does both automatically:

import struct

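def readStruct(src, format, names):
    # Read exactly as many bytes as the format describes, then unpack them.
    fields = struct.unpack(format, src.read(struct.calcsize(format)))
    # Pair each name with the field at the same position.
    s = {}
    for name, field in zip(names, fields):
        s[name] = field
    return s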

The code to read the record from the previous example then becomes:

record = readStruct(src, "< 20s I f", ('name', 'age', 'salary'))
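
To try the function without a real file, here is a minimal round-trip sketch. It assumes Python 3 and uses an in-memory io.BytesIO object in place of src; the sample values are made up for illustration.

```python
import io
import struct

def readStruct(src, format, names):
    s = {}
    for nv in zip(names, struct.unpack(format, src.read(struct.calcsize(format)))):
        s[nv[0]] = nv[1]
    return s

# Pack one sample record into an in-memory binary stream (values are made up).
raw = struct.pack("< 20s I f", b"Alice", 30, 1234.5)
src = io.BytesIO(raw)

record = readStruct(src, "< 20s I f", ('name', 'age', 'salary'))

# In Python 3 a "20s" field comes back as bytes, padded with NULs to 20 bytes.
name = record['name'].rstrip(b"\x00").decode("ascii")
print(name, record['age'], record['salary'])
```

With a real file, src would need to be opened in binary mode ('rb') so that read() returns bytes.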

It’s a small change, but it has made working with binary files easier and less error-prone. The next step is to combine the format and names parameters into a single format string that has the names embedded in it, using a regular expression to pull the names back out.

The updated code is:

import struct
import re

def readStruct(src, format):
    rex = re.compile(r"\{([a-zA-Z_]+)\}")
    names = rex.findall(format)
    format = rex.sub("", format)
    s = {}
    for nv in zip(names, struct.unpack(format, src.read(struct.calcsize(format)))):
        s[nv[0]] = nv[1]
    return s

and the previous example now becomes:

record = readStruct(src, "< {name} 20s {age} I {salary} f")

I think that is a lot easier to write and read. Note that you don’t have to interleave the format codes and the names: the names are matched to fields by position, so only their order matters. You could just as easily write the format string as “<20sIf{name}{age}{salary}”, but I think interleaving them makes their use much clearer.
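
To see why the two spellings are equivalent, the regular-expression step can be run on its own. This is only a sketch: split_format and NAME_RE are hypothetical names introduced here for illustration and are not part of the function above.

```python
import re
import struct

# Same pattern as in readStruct, written as a raw string.
NAME_RE = re.compile(r"\{([a-zA-Z_]+)\}")

def split_format(fmt):
    """Return (plain struct format, list of names) for a format with embedded names."""
    return NAME_RE.sub("", fmt), NAME_RE.findall(fmt)

interleaved = split_format("< {name} 20s {age} I {salary} f")
trailing = split_format("<20sIf{name}{age}{salary}")

# Both spellings yield the same names in the same order, and the leftover
# format strings differ only in whitespace, which struct ignores.
print(interleaved)
print(trailing)
```

Note that the pattern accepts only letters and underscores, so a name containing a digit would be left in the format string and struct would reject it.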


1 Comment

  1. Nice code. My only comment is that I would add a check that the format had the same number of fields/names, since zip() silently clips the longer list, and you will have a mismatch between the name and field assigned. For example:

    import struct
    import re
    
    def readStruct(src, format):
        rex = re.compile(r"\{([a-zA-Z_]+)\}")
        names = rex.findall(format)
        format = rex.sub("", format)
        fields = struct.unpack(format, src.read(struct.calcsize(format)))
        if len(fields) != len(names):
            raise ValueError("Mismatch in lengths of struct format and field names")
        s = {}
        for name, field in zip(names, fields):
            s[name] = field
        return s
    

    Comment by Jared.Grubb — April 6, 2009 @ 3:10 pm
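
The truncation described in the comment is easy to reproduce in isolation. A small sketch, assuming Python 3 and made-up sample values, with one name deliberately left out:

```python
import struct

# Three fields are unpacked...
packed = struct.pack("< 20s I f", b"Alice", 30, 1234.5)
fields = struct.unpack("< 20s I f", packed)

# ...but only two names are supplied ('salary' is missing on purpose).
names = ("name", "age")

# zip() stops at the shorter sequence, so the third field is dropped
# without any error being raised.
record = dict(zip(names, fields))
print(record)
```

On Python 3.10 and later, zip(names, fields, strict=True) raises a ValueError in this situation, which is another way to get the check the comment asks for.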

