The Lazy Programmer

April 2, 2008

C++: How to create variable length arrays on the stack

Filed under: C#,Programming — ferruccio @ 9:12 pm
Tags: , ,

A very common pattern in C++ programming is allocating a variable size array in a function and releasing it before the function exits:

void function(int size)
{
   char* pbuf = new char[size];

   ... function body ...

   delete [] pbuf;
}

This works but is error-prone. Especially if there are multiple exit points or if an exception is thrown. You could use STL vectors, but you usually don’t need to change the size of the array once it has been created. Also, you may not want the heap allocation overhead.

I think that what we would all like to be able to do is:

void function(int size)
{
   char buffer[size];

   ... function body ...

}

Unfortunately, you can’t do that in C++. You can however, use the alloca() call that is part of the C runtime to allocate space on the stack. alloca() works by increasing the size of the current stack frame to make room for the allocation. Because it’s part of the current stack frame, the memory will be freed as soon as our function returns.

Now our code looks like:

void function(int size)
{
   char *pbuf = static_cast<char *>(alloca(size * sizeof(char)));

   ... function body ...
}

We get the benefit of the fast allocation. alloca() doesn’t need to scour the heap for a block of suitable memory, and automatic deallocation. Everything’s good. Right?
Well, no it isn’t. There are a few problems with this approach.

  • The allocation could be a little clearer.
  • You have to specify the number of bytes to allocate, leading to potential errors.
  • If you’re allocating space for objects, no constructor or destructor will ever be called.

Fortunately we can solve this problem using a combination of templates and the C preprocessor.

#include <malloc.h>

#define STACK_NEW(type,size) new(static_cast<type*>(alloca(sizeof(type) * size))) type[size], size

template <typename T>
class stack_array
{
public :
   stack_array(T* p, size_t n) : p_(p), n_(n) {}

   ~stack_array()
   {
      for (size_t i = 0; i < n_; ++i)
         p_[i].~T();
   }

   operator T*() { return p_; }
   size_t size() { return n_; }

   typedef T* iterator;
   T* begin() { return p_; }
   T* end() { return p_ + n_; }

private :
   T*       p_;
   size_t   n_;
};

You can then create an array of x chars with the following syntax:

stack_array<char> buffer(STACK_NEW(char,x));

How does it work?

You instantiate an array of T by using the stack_array<T> template. There is a single constructor which takes a pointer to a block of memory and an item count. The constructor merely records these values. The real magic is in the STACK_NEW macro.

The STACK_NEW macro takes a type and a size (number of items). It calls alloca() to make room on the stack for the new array and then uses a placement new to initialize the array. A placement new is a little known form of the new operator which does not allocate any memory but it does call constructors. It’s used when you want to allocate memory yourself but still want to be able to construct C++ objects in that memory.

Note: It’s tempting to put the alloca() call inside the constructor in order to make the declaration even simpler. i.e.

stack_array<char,x> buffer;

The problem is that once the constructor exits, the memory would immediately be freed. Making the constructor inline does no good. Inline is a suggestion to the compiler. It’s still free to do what it wants.

The operator T*() function provides an automatic conversion to the array item type so that you can use the stack_array anyplace you can use a pointer of that type. There is even a simple STL-like iterator provided along with begin() and end() functions.

The next major hurdle is destruction. This is why I wrapped this whole thing in a class. When the destructor for our stack_array object is called we explicitly call the destructor for every element in the array, ensuring its proper clean-up.

And this is how you would use it:

#include <iostream>
using namespace std;

// test object
class MyObject
{
public :
   MyObject()
   {
      value = 0;
      cout << "ctor @" << this << endl;
   }

   ~MyObject()
   {
      cout << "dtor @" << this << endl;
   }

   int value;
};

int main(int argc, char** argv)
{
   size_t x = 100;

   // equivalent to: char s[x];
   stack_array<char> s(STACK_NEW(char,x));

   strcpy(s, "this is a test");
   cout << s << endl;

   strcat(s, " -- appended");
   puts(s);

   x = 5;

   // equivalent to: MyObject o[x];
   stack_array<MyObject> o(STACK_NEW(MyObject,x));

   o[0].value = 1;
   o[1] = o[0];
   o[x-1].value = x;
   for (stack_array<MyObject>::iterator it = o.begin(); it != o.end(); ++it)
      cout << it->value << " ";

   cout << endl;
   return 0;
}

Running our little test program yields the following output:

this is a test
this is a test -- appended
ctor @0012FE94
ctor @0012FE98
ctor @0012FE9C
ctor @0012FEA0
ctor @0012FEA4
1 1 0 0 5
dtor @0012FE94
dtor @0012FE98
dtor @0012FE9C
dtor @0012FEA0
dtor @0012FEA4

I tried to make the stack_array template really simple. There are potentially many more things that could be added to it such as:

  • An overloaded operator[] that does index bounds checking.
  • The full blown STL treatment (const_iterator, reverse_iterator, clear(), etc.)
  • Use Boost type_traits to eliminate constructor/destructor calls if the array object type has a trivial constructor/destructor. (I suspect most compilers may do this anyway, but it’s always good to be explicit about such things)

If you do happen to make these or any other changes, please let me know.

Update: April 18, 2008: I’ve done some benchmarking on stack_arrays.
Update: April 24, 2008: After reading some of the comments here and on reddit, I realized that I should have made something clear: This technique is not intended to be a replacement for vectors. It is meant to be a safe(r) alternative to using a naked alloca().

About these ads

16 Comments

  1. Since this blog is “The Lazy Programmer”, here’s the lazy way:

    Use GNU G which already supports variable length arrays in C until C 0x is released which AFAIK adds official support for C99 VLAs.

    =====
    I would be very happy to use the VLA feature when it becomes available. Right now the compiler I use the most, Visual C ++ doesn’t have it. vector allocates from the heap. array allocates from the stack but can only create fixed-size arrays, it just provides STL semantics for standard C arrays.

    — ferruccio

    Comment by Jose Sanchez — April 3, 2008 @ 2:11 am

  2. Hrm, your blog ate the plus symbols and changed a zero into an “o” in my above comment. Lazy programming?

    =====
    I’m sorry about the way WordPress mangles code in the comments. It was a pain to format the code in the article too. Anyone care to recommend a better programmer-oriented blogging system?

    — ferruccio

    Comment by Jose Sanchez — April 3, 2008 @ 2:13 am

  3. What’s the point of the macro, exactly? The stack_array template could handle doing the alloca() in the ctor, why make it more difficult and uglier?

    =====
    The point of the macro is that alloca() must be called directly from the function doing the allocation, because as soon as that function returns and the its stack frame is discarded, the memory it allocated is freed. If you call it from the constructor, the memory will be freed as soon as the constructor completes. Not very useful behavior.

    — ferruccio

    Comment by James Longstreet — April 3, 2008 @ 2:52 am

  4. Since you don’t allow the size to change after construction how is this any different than just declaring char[size]? If this was a “variable length” RandomAccessContainer, then there would be push_back and resize functions… Right now it is just a boost::array in terms of capabilities.

    According to my (quick) tests, there is no real speed benefit either, although I could probably use a better clock. I have to do some actual work now, someone care to finish the testing???

    O3 nodebug output:
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 0, vector: 20000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 0, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 0, vector: 20000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000
    :~/dev/container_test/Debug$ ./container_test
    array:0, stack_array: 10000, vector: 10000

    test:
    #include
    #include
    #include

    #include
    #include
    #include

    #define STACK_NEW(type,size) new(static_cast(alloca(sizeof(type) * size))) type[size], size

    template
    class stack_array
    {
    public :
    stack_array(T* p, size_t n) : p_(p), n_(n) {}

    ~stack_array()
    {
    for (size_t i = 0; i < n_; ++i)
    p_[i].~T();
    }

    operator T*() { return p_; }
    size_t size() { return n_; }

    typedef T* iterator;
    T* begin() { return p_; }
    T* end() { return p_ + n_; }

    T operator[](const size_t i) { return p_[i]; }

    private :
    T* p_;
    size_t n_;
    };

    using namespace boost;
    using namespace std;

    #define test_size 250000

    clock_t test_array()
    {
    const clock_t t(clock());
    array c;
    generate(c.begin(), c.end(), variate_generator<mt19937, uniform_real >(mt19937(time(0)), uniform_real(0, 10000)));
    return clock()-t;
    }

    clock_t test_stack_array()
    {
    const clock_t t(clock());
    stack_array c(STACK_NEW(float, test_size));
    generate(c.begin(), c.end(), variate_generator<mt19937, uniform_real >(mt19937(), uniform_real(0, 10000)));
    return clock()-t;
    }

    clock_t test_vector()
    {
    const clock_t t(clock());
    vector c; c.reserve(test_size);
    generate_n(back_inserter(c), test_size, variate_generator<mt19937, uniform_real >(mt19937(time(0)/2), uniform_real(0, 10000)));
    return clock()-t;
    }

    int main()
    {
    clock(); //not sure if in a lib

    const clock_t at(test_array());
    const clock_t sat(test_stack_array());
    const clock_t vt(test_vector());

    cout << “array:” << at << “, stack_array: ” << sat << “, vector: ” << vt << endl;
    }

    =====
    To do a real performance test you would need to do a lot more memory allocations, intermingled with allocations that aren’t freed causing a memory fragmentation. Allocating from the heap is an O(n) operation, from the stack an O(1) operation.

    — ferruccio

    Comment by csharper — April 3, 2008 @ 5:26 am

  5. comment system destroyed the code above, see here: http://www.shorttext.com/a887az

    Comment by csharper — April 3, 2008 @ 5:28 am

  6. nice post. i’ve been looking for an example of this in the last few weeks. thanks. bookmarked :)

    Comment by vilaca — April 3, 2008 @ 7:33 am

  7. regarding the performance issue, apart from stack allocation being a O(1) operation, regarding the way x86 processores handle caching, you can gain some performance by having your code and data in near to each other (IMHO)

    Comment by vilaca — April 3, 2008 @ 7:39 am

  8. In all of your examples you fail to address exceptions and yet in your original code (using new) you complained that it isn’t safe with exceptions, that’s ironic. Making the original code expection safe is trivial with auto_ptr, your new code however is bit more complicated (handling exceptions in constructors and destructors while making sure the array stays consitent)

    =====
    My fault. I wasn’t very clear. I wasn’t trying to say my way is more exception-safe. What I meant was that if you allocate stuff on the heap, then an exception will cause that memory to be orphaned. If it’s on the stack, though, it’ll get freed when the stack is unwound. I don’t know if the destructors will be called but at least the memory will be freed.

    — ferruccio

    Comment by chaotic — April 3, 2008 @ 8:07 am

  9. ferruccio, you have a very valid point in that boost array needs a compile time size

    however, correct me if I am wrong, stack_array is not safe to return (copy) from a function?

    =====
    You’re absolutely right. You can’t safely return it. My goal was to copy the semantics of a fixed-size array on the stack, but add the flexibility to determine the size at run-time.

    — ferruccio

    Comment by csharper — April 3, 2008 @ 8:58 am

  10. While this seems clever, it reduces the portability of your code. From alloca’s man page:

    $ man alloca

    BUGS
    The alloca() function is machine and compiler dependent; its use is dis-
    couraged.

    =====
    So true. But it works on every platform my code needs to run on right now. :-)

    — ferruccio

    Comment by Anon — April 3, 2008 @ 10:51 am

  11. One nitpick regarding your destructor – it should destroy the objects in the array starting at the last element down to the first (destruction should occur in reverse order of construction).

    You should also take a look at STLsoft’s auto_buffer template.

    http://synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1auto__buffer.html

    It’s a bit different from what you do in that the stack allocated array has a default size, and if you ask for a larger array it allocates it from the heap. I find in practice that this works quite well. It has the advantage of not using any non-standard constructs and handles potentially stack blowing allocations by deferring those to the heap.

    Comment by mikeb — May 27, 2008 @ 2:35 am

  12. In alloca (unlike new) you have to treat a NULL return value when the stack frame is too small.

    Comment by Motti — November 3, 2008 @ 4:10 am

  13. Tried your example, but I was having problems… when the function leaves the stack, and the memory allocated by alloca was released, the debugger was signalling corrupted stack memory, just two bytes before the memory address generated by alloca…

    With same research I found out that when you call

    double *x = new double[5];

    the compile allocates 44 bytyes, not 40! In the previous 4 bytes, it writes the number of elements, so that delete[] can magically work.

    The same thing happens when you call new with a pointer
    new (ptr) classX[n].

    It writes n before ptr!

    If we did know it is always 4 bytes, then it would be easy to compensate, but fact is we do not know for sure. Is it a size_t? Is it an int? Is it platform dependent?

    So safest thing is to call new on the pointer returned from alloca in a loop, like you do for the destructor.

    Fabio

    Comment by Fabio — April 4, 2009 @ 1:43 am

  14. Actually my example is not entirely correct, but it still ok in principle.

    new double[5]

    allocates 8×5=40 bytes, but

    new myClass[5]

    if sizeof(myClass)==8 and has a destructor will allocate 8×5 4=44 bytes.

    Fabio

    Comment by Fabio — April 4, 2009 @ 2:23 am

  15. Thanks for the feedback, Fabio.

    I had never noticed that behavior. In that case you are right, looping through the new statements would be safer. By the way, what C++ compiler are you using?

    — Ferruccio

    Comment by Ferruccio — April 5, 2009 @ 5:43 pm

  16. […] By ferruccio In my last post I presented a stack_array template/macro combination that I use to create dynamically sized arrays […]

    Pingback by Benchmarking stack_array<T> « The Lazy Programmer — November 29, 2009 @ 11:58 am


RSS feed for comments on this post.

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: