Let me start by describing what I have here: two 8 GB Raspberry Pi 5s. Both are from the same batch (Pi store in Cambridge, UK, before people started receiving them at home). One has meaningful workloads running in the background, and the other is completely blank. Both exhibit the same results, barring statistical noise.
And you are right, I neglected the number of threads. The results are still weird, but here they come:

No SDRAM_BANKLOW=1. 4 threads, increasing the total memory size tenfold to 100G, 1G blocks:
Code:
sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=4 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 72 ( 6.96 per second)

73728.00 MiB transferred (7131.56 MiB/sec)

General statistics:
    total time:                 10.3370s
    total number of events:     72

Latency (ms):
    min:                408.45
    avg:                571.50
    max:                607.71
    95th percentile:    601.29
    sum:                41148.34

Threads fairness:
    events (avg/stddev):            18.0000/1.00
    execution time (avg/stddev):    10.2871/0.05

No SDRAM_BANKLOW=1. 1 thread, total memory size 100G, 1G blocks:
Code:
sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=1 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 100 ( 12.00 per second)

102400.00 MiB transferred (12283.80 MiB/sec)

General statistics:
    total time:                 8.3350s
    total number of events:     100

Latency (ms):
    min:                82.68
    avg:                83.35
    max:                85.03
    95th percentile:    84.47
    sum:                8334.53

Threads fairness:
    events (avg/stddev):            100.0000/0.00
    execution time (avg/stddev):    8.3345/0.00

With SDRAM_BANKLOW=1. 4 threads, total memory size 100G, 1G blocks:
Code:
sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=4 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 86 ( 8.39 per second)

88064.00 MiB transferred (8593.18 MiB/sec)

General statistics:
    total time:                 10.2470s
    total number of events:     86

Latency (ms):
    min:                265.19
    avg:                472.53
    max:                719.95
    95th percentile:    612.21
    sum:                40637.52

Threads fairness:
    events (avg/stddev):            21.5000/0.50
    execution time (avg/stddev):    10.1594/0.06

With SDRAM_BANKLOW=1. 1 thread, total memory size 100G, 1G blocks:
Code:
sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=1 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 84 ( 8.38 per second)

86016.00 MiB transferred (8581.84 MiB/sec)

General statistics:
    total time:                 10.0216s
    total number of events:     84

Latency (ms):
    min:                113.32
    avg:                119.30
    max:                122.29
    95th percentile:    118.92
    sum:                10021.14

Threads fairness:
    events (avg/stddev):            84.0000/0.00
    execution time (avg/stddev):    10.0211/0.00

I also tested with 2 threads. Now, I can certainly do this more systematically (running multiple experiments across multiple workloads). I am not so much concerned about the overall performance improvement of the Pi 5 as I am trying to understand the memory impact here (write throughput in MiB/sec):
Code:
| NUMA / Threads | 1         | 2         | 4         |
|----------------|-----------|-----------|-----------|
| Off            | 12,283.80 | 9,162.35  | 7,131.56  |
| On             | 8,581.84  | 9,208.26  | 8,593.18  |
| Gain/Loss      | -30.14%   | 0.50%     | 20.50%    |

I have my own interpretation of this data, but would love to hear your thoughts. As for the concept of realistically measuring this without synthetic benchmarks, I will refrain from commenting on the likes of Geekbench. Here's something tangible: how much time does it take for the Pi (full of services) to boot?

NUMA enabled:
Code:
Startup finished in 3.482s (kernel) + 10.074s (userspace) = 13.556s
multi-user.target reached after 10.052s in userspace.

NUMA disabled:
Code:
Startup finished in 3.593s (kernel) + 10.134s (userspace) = 13.728s
multi-user.target reached after 10.099s in userspace

I wouldn't read too much into the 3% uplift on the kernel (or the 0.4% in userspace). In my world this is statistical noise.
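For anyone who wants to check the Gain/Loss row in the throughput table above, or repeat the comparison on their own runs, here is a minimal Python sketch. The helper names (`parse_throughput`, `relative_change`) are my own and purely illustrative. The regular expression assumes the "(... MiB/sec)" summary line that sysbench 1.0.20 printed in the runs above; other versions may format it differently.

```python
import re

def parse_throughput(sysbench_output: str) -> float:
    """Extract the MiB/sec figure from a sysbench memory run's output.

    Assumes a summary line such as
    '73728.00 MiB transferred (7131.56 MiB/sec)'.
    """
    match = re.search(r"\(([\d.]+) MiB/sec\)", sysbench_output)
    if match is None:
        raise ValueError("no MiB/sec figure found in sysbench output")
    return float(match.group(1))

def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0

# Write throughput in MiB/sec, copied from the results in this post.
# "off" = no SDRAM_BANKLOW=1, "on" = with SDRAM_BANKLOW=1.
throughput = {
    1: {"off": 12283.80, "on": 8581.84},
    2: {"off": 9162.35, "on": 9208.26},
    4: {"off": 7131.56, "on": 8593.18},
}

throughput_change = {
    threads: round(relative_change(result["off"], result["on"]), 2)
    for threads, result in throughput.items()
}

for threads, pct in throughput_change.items():
    print(f"{threads} thread(s): {pct:+.2f}%")
```

Feeding the captured output of each run through `parse_throughput` would avoid copying numbers by hand when repeating the test across more thread counts or workloads.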
Statistics: Posted by bytter — Sat Dec 07, 2024 12:24 pm