Neither one nor Many
Software engineering blog about my projects, geometry, visualization and music.
Faster than random reads that is.
I knew this to be true for hard disks and SSDs because of this presentation. (For average devices: random I/O of 2.7 MB/s vs. sequential I/O of 213 MB/s for HDDs, and 60-300 MB/s vs. 293-366 MB/s for SSDs.)
But I never realized it was similarly true for RAM access, or that the impact would be this big.
Now that I do, however, I realize where I must have destroyed performance in the past, and why something else surprised me at the time by being unexpectedly fast!
Anyway, my goal was to find the fastest possible read and write access to the pixels inside an image. In this case I used SFML and wanted no overhead from getters or setters (which provide, for instance, bounds checking). Just raw access.
I found out that you can get a (read-only) pointer to the underlying pixel data of an sf::Image object (which I force to be non-const):
sf::Image image;
image.create(1280, 720, sf::Color(0, 0, 0, 255));
auto *pixels = const_cast<sf::Uint8 *>(image.getPixelsPtr());
I wrote a simple loop to initialize the image with fancy colors (you can probably guess where this is going...):
// width and height are the image dimensions (1280 and 720 above)
for (int x = 0; x < width; x++) {
    for (int y = 0; y < height; y++) {
        // Byte offset of pixel (x, y): rows are stored back to back, 4 bytes (RGBA) per pixel
        int index = (x + y * width) * 4;
        // My trademark color setup
        int color = x + y;
        pixels[index + 0] = color % 255;         // R
        pixels[index + 1] = (color + 100) % 255; // G
        pixels[index + 2] = (x + 200) % 255;     // B
        pixels[index + 3] = 80;                  // A
    }
}
In the main loop I have similar code: for each pixel, increment the RGB color values a bit. You can view the code in the screenshot a few paragraphs from now. The result was 42.65 FPS (frames per second).
Measuring the FPS every 0.5 seconds, 30 times, gives this average of 42.65 FPS with a standard error of 0.08. See [1] in the following table.
Statistic | [1] | [2] | [3]
---|---|---|---
N | 30 | 30 | 30
Mean | 42.6518 | 122.4701 | 125.8626
S.E. Mean | 0.0801 | 0.2189 | 0.3322
Std. Dev | 0.4387 | 1.1991 | 1.8193
Variance | 5.5810 | 41.6968 | 95.9866
Minimum | 42.1456 | 119.8428 | 120.3156
Maximum | 44.7471 | 124.7525 | 128.2051
Median | 42.6357 | 120.7921 | 125.0000
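The S.E. Mean row is consistent with the usual formula, standard deviation divided by the square root of N (for [1]: 0.4387 / sqrt(30) ≈ 0.0801). A minimal sketch of how such a summary could be computed from a list of FPS samples follows. The names `FpsStats` and `summarize` are made up for illustration, and the use of the sample standard deviation (dividing by N - 1) is an assumption, since the post does not say which tool produced the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original code.
struct FpsStats {
    double mean;
    double stdDev;   // sample standard deviation (divides by N - 1, an assumption)
    double stdError; // standard error of the mean: stdDev / sqrt(N)
};

FpsStats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2); // a sample std. dev. needs at least two values
    const double n = static_cast<double>(samples.size());

    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / n;

    double squaredDiffs = 0.0;
    for (double s : samples) squaredDiffs += (s - mean) * (s - mean);
    const double stdDev = std::sqrt(squaredDiffs / (n - 1.0));

    return {mean, stdDev, stdDev / std::sqrt(n)};
}
```

Feeding it the 30 FPS readings of one run would give the Mean, Std. Dev and S.E. Mean entries for that column.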
I don't have the fastest PC so initially I thought it wouldn't get that much faster, but then I ran the profiler and discovered the first write to the color values was extremely slow. Once the pointer was in position for the pixel however, successive writes to green (G) and blue (B) (of RGBA) were fast. This made me realize it was, in effect, seeking for each pixel: every step of the inner loop jumped a whole image row ahead in memory (1280 * 4 = 5120 bytes), so the first write to each pixel was most likely a cache miss.
So I swapped the two for loops (to first Y then X), thus aligning the loop with the memory representation of the pixels to get the much better 122.47 FPS! (see [2]).
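To make the swap concrete, here is a minimal sketch of both loop orders. It uses a plain `std::vector` as a stand-in for the SFML pixel buffer (an assumption, so that it compiles without SFML), and the function names are made up for illustration. Both functions write exactly the same bytes; only the order of the memory accesses differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the buffer behind sf::Image::getPixelsPtr():
// tightly packed RGBA, one row after another (assumption for this sketch).
using PixelBuffer = std::vector<std::uint8_t>;

// Writes the color of pixel (x, y) at byte offset index.
static void writeColor(PixelBuffer& pixels, std::size_t index, int x, int y) {
    const int color = x + y;
    pixels[index + 0] = static_cast<std::uint8_t>(color % 255);         // R
    pixels[index + 1] = static_cast<std::uint8_t>((color + 100) % 255); // G
    pixels[index + 2] = static_cast<std::uint8_t>((x + 200) % 255);     // B
    pixels[index + 3] = 80;                                             // A
}

// Slow order: x outer, y inner. Consecutive pixels are width * 4 bytes apart.
void fillColumnByColumn(PixelBuffer& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height * 4);
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            writeColor(pixels, (static_cast<std::size_t>(y) * width + x) * 4, x, y);
        }
    }
}

// Fast order: y outer, x inner. Consecutive pixels are 4 bytes apart.
void fillRowByRow(PixelBuffer& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height * 4);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            writeColor(pixels, (static_cast<std::size_t>(y) * width + x) * 4, x, y);
        }
    }
}
```

Timing the two functions on a 1280 x 720 buffer should show the same kind of gap as between [1] and [2], though the exact ratio will depend on the machine.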
Another minor improvement came from making the intermediate "index" variable obsolete (see [3]).
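One way to get rid of the index variable is a running pointer that simply advances through the buffer. This is a sketch of the idea and not necessarily the exact code I measured; the function name is made up, and a `std::vector` again stands in for the SFML pixel data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: row-by-row fill without computing an index per pixel.
void fillWithRunningPointer(std::vector<std::uint8_t>& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height * 4);
    std::uint8_t* p = pixels.data(); // points at the R byte of pixel (0, 0)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const int color = x + y;
            *p++ = static_cast<std::uint8_t>(color % 255);         // R
            *p++ = static_cast<std::uint8_t>((color + 100) % 255); // G
            *p++ = static_cast<std::uint8_t>((x + 200) % 255);     // B
            *p++ = 80;                                             // A
        }
    }
}
```

This only works because the loops visit the pixels in exactly the order they are laid out in memory, which is the same property that made the swapped loops fast.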
Note that you don't really need two for loops if you don't do stuff with the colors related to x or y.
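For example, the "increment the RGB values a bit" step from the main loop does not care about x or y, so a single loop over the whole buffer is enough. A minimal sketch, with a made-up function name and a `std::vector` standing in for the SFML pixel data:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: add step to R, G and B of every pixel, leave alpha alone.
void brighten(std::vector<std::uint8_t>& pixels, std::uint8_t step) {
    assert(pixels.size() % 4 == 0); // whole RGBA pixels only
    for (std::size_t i = 0; i < pixels.size(); i += 4) {
        // The casts make the wrap-around (modulo 256) explicit.
        pixels[i + 0] = static_cast<std::uint8_t>(pixels[i + 0] + step); // R
        pixels[i + 1] = static_cast<std::uint8_t>(pixels[i + 1] + step); // G
        pixels[i + 2] = static_cast<std::uint8_t>(pixels[i + 2] + step); // B
        // pixels[i + 3] is alpha and stays unchanged
    }
}
```

Because the loop walks the buffer front to back, the access pattern is sequential by construction.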
This fix may seem obvious now, but for me the mistake of swapping the for loops was an easy one to make. I found it subtle, and it resulted in unnecessarily poor performance. That's why I hope others will find this reminder useful!
Also, SFML stores the RGBA values this way; other libraries may do so differently.