C: Power of 2 Array Sizes

I just picked up a 10-year old copy of O’Reilly’s Practical C, 3rd Edition from the library for some light reading. Mostly, I just wanted to see if I would learn anything (as a guage of my own C proficiency). I’m glad to announce that I learned almost nothing, until, that is, I read page 258, The Power of Powers of 2.

This page is in the section on optimization. Before I go further, I must say that I agree with the author, that when it comes to code optimization: don’t. With that aside, let’s proceed.

I have seen a lot of C arrays declared and allocated with a power-of-2 size (like 256):

int *values = (int*)malloc(256);

And I never really knew why coders used sizes of 256, instead of 200 or some other more “comfortable” number. I figured that it did something to make the code “better”, but I didn’t know exactly what. Today, I found the answer. Most C compilers will optimize this:

x = x * 8;

Into this:

x = x < < 3;[/cpp] The left shift operation requires fewer cycles than the multiply operation, so malloc() can get its job done faster when calculating how many bytes to allocate. The fact is also true for accessing elements of multi-dimensional arrays. Take, for example, the following matrix: [cpp]int matrix[256][256][/cpp] When accesing an element in this matrix array, say element (12,39), the compiler has to do the following computations: [cpp]base_address + 12 * 256 * sizeof(int) + 39 * sizeof(int)[/cpp] The 12 * 256 can be optimized to 12 << 8, making each array access a tiny bit faster. In code where this is happening a lot, it may even make a difference. I guess this also explains why there is no 3-byte integer type, while there are 2-byte and 4-byte integers. And there will likely never be a 6-byte integer either. Anyway, when it comes to optimization, there is so much voodoo and black magic in play that it's really hard to know what provides the most bang for its buck. I might not even be right on this account either. Please correct me if you know better.

Has one comment to “C: Power of 2 Array Sizes”

You can leave a reply or Trackback this post.
  1. http://Michael%20J%20M%20Thomson says: -#1

    I’m sorry to say, but you’re on the wrong track. The chief reason to request sizes from malloc() that are powers of two, is to reduce fragmentation in the way malloc() and free() etc. manage the heap. It has to do with the fact that often when you request memory areas with malloc() and then later free() them, instead of these functions getting the OS every single time to adjust how much heap is available (which would be incredibly expensive), free() will often instead make a note of the freed area in a list, the “free list”, that malloc() can then exploit when fulfilling later requests. malloc() searches that list for an unused area of the heap that is at least the size requested, and if it finds one and it is larger than requested, the size of the excess area will get put back in the list. Now, with all that in mind, it should make sense that the free list is much more useful when the sizes are _evenly_divisible_ by virtue of them being powers of two. E.g. a call to free(32) followed by two calls to malloc(16) will perform better than if the former was a call to free(30), or if the latter were two calls to malloc(20).

    Also worth pointing out, increasing e.g. a string’s capacity by multiplying by 2 each time (and starting with a power of 2 of course), always means far less calls to realloc() than if you were to increase it linearly.

    As for arrays, provided they’re on the stack, make them the size you actually need. There’s no performance gain to be had, the stack size will be increased once when the array comes into scope and decreased once when it leaves, regardless. I don’t think indexing them is really an issue, as you say compilers will optimize things whereever possible, and I doubt modern processors have much more trouble with integer multiplication (generally a shift followed by zero or more adds) than with integer shift.

    HTH. If not, try http://en.wikipedia.org/wiki/Dynamic_memory_allocation and the links from there, or google for Sun or IBM docs on the subject (I knew of at least one but can’t find it offhand).