Neither one nor Many

 
April 8 2013

Faster than random reads that is. I knew this to be true for harddisks and SSD drives because of this presentation. (For average devices: Random I/O 2.7 MB/s vs. Sequential I/O of 213 MB/s for HDD's, 60-300 MB/s vs. 293-366 MB/s for SSD's).

But I never realized it was similarly true for RAM access, or that the impact would be this big. Now that I do however, I realize where I must have destroyed performance in the past.. and why something else surprised me at the time in being unexpectedly fast!

SFML fast pixel read/write access!

Anyway, my goal was finding the fastest possible read and write access for pixels inside an image. In this case I used SFML and desired for no overhead by getters or setters (providing f.i. bounds checking). Just raw access.

Found out you can get a (read only) pointer to the underlying pixel data for an sf::Image object (that I force non-const):

sf::Image image;

image.create(1280, 720, sf::Color(0, 0, 0, 255));

auto *pixels      = const_cast<sf::Uint8 *>(image.getPixelsPtr());

Wrote a simple loop to initialize an image with fancy colors (you can probably guess where this is going.. img1):

for (int x = 0; x < width; x++) {
    for (int y = 0; y < height; y++) {
        int index = (x + y * width) * 4;
        // My trademark color setup
        int index = x + y;
        pixels[index + 0] = index         % 255; // R
        pixels[index + 1] = (index + 100) % 255; // G
        pixels[index + 2] = (x     + 200) % 255; // B
        pixels[index + 3] = 80;                  // A
    }
}

In the mainloop I have similar code: for each pixel, increment the RGB color values a bit. You can view the code in the screenshot a few paragraphs from now. The result was 42.65 FPS (frames per second).

Measuring FPS every 0,5 seconds 30 times, results in this average of 42.65 fps with a Standard Error of 0.08. See [1] in the following table.

Table: performance results
[1] [2] [3]
N 30 N 30 N 30
Mean 42.6518 Mean 122.4701 Mean 125.8626
S.E. Mean 0.0801 S.E. Mean 0.2189 S.E. Mean 0.3322
Std. Dev 0.4387 Std. Dev 1.1991 Std. Dev 1.8193
Variance 5.5810 Variance 41.6968 Variance 95.9866
Minimum 42.1456 Minimum 119.8428 Minimum 120.3156
Maximum 44.7471 Maximum 124.7525 Maximum 128.2051
Median 42.6357 Median 120.7921 Median 125.0000

The fix: Changing the order of pixel access

I don't have the fastest PC so initially I thought it wouldn't get that much faster, but then I ran the profiler and discovered the first write to the color values was extremely slow. Once the pointer was in position for the pixel however, successive writes to green (G) and blue (B) (of RGBA) are fast. This made me realize it was seeking for each pixel.

So I swapped the two for loops (to first Y then X), thus aligning the loop with the memory representation of the pixels to get the much better 122.47 FPS! (see [2]).

Another minor improvement was by making the intermediate "index" variable obsolete (see [3]).


Figure: Visual studio 2012's awesome profiler output.

Note that you don't really need two for loops if you don't do stuff with the colors related to x or y.

This fix may seem obvious now, but for me this mistake of swapping the for loops was one easily made. I found it subtle and it resulted in unnecessarily poor performance. That's why I hope others to find this reminder useful!


Figure: If you didn't want poor performance, you shouldn't have swapped the for loops! (image source: here)

Also, SFML stores the RGBA values this way, other libraries may do so differently.

Blog Comments (0)


Leave a Reply

Comment may not be visible immediately, because I process everything manually.**

**) I plan to automate this.., but it's on my ToDo since for ever..


Author:
Ray Burgemeestre
february 23th, 1984

Topics:
C++, Linux, Webdev

Other interests:
Music, Art, Zen