Channel: Raspberry Pi Forums

Advanced users • Re: NUMA Testing

Let me start by describing what I have here: two 8GB Raspberry Pi 5s, both from the same batch (Pi store in Cambridge, UK, before people started receiving them at home). One has meaningful workloads running in the background, and the other is completely blank. Both show the same results, barring statistical noise.

And you are right, I neglected the number of threads. The results are still weird, but here they come:

No SDRAM_BANKLOW=1. 4 threads, increasing the total memory size tenfold to 100G, 1G blocks:

Code:

sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=4 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global
Initializing worker threads...
Threads started!
Total operations: 72 (    6.96 per second)
73728.00 MiB transferred (7131.56 MiB/sec)
General statistics:
    total time:                          10.3370s
    total number of events:              72
Latency (ms):
         min:                                  408.45
         avg:                                  571.50
         max:                                  607.71
         95th percentile:                      601.29
         sum:                                41148.34
Threads fairness:
    events (avg/stddev):           18.0000/1.00
    execution time (avg/stddev):   10.2871/0.05

No SDRAM_BANKLOW=1. 1 thread, total memory size 100G, 1G blocks:

Code:

sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=1 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global
Initializing worker threads...
Threads started!
Total operations: 100 (   12.00 per second)
102400.00 MiB transferred (12283.80 MiB/sec)
General statistics:
    total time:                          8.3350s
    total number of events:              100
Latency (ms):
         min:                                   82.68
         avg:                                   83.35
         max:                                   85.03
         95th percentile:                       84.47
         sum:                                 8334.53
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   8.3345/0.00

With SDRAM_BANKLOW=1. 4 threads, total memory size 100G, 1G blocks:

Code:

sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=4 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global
Initializing worker threads...
Threads started!
Total operations: 86 (    8.39 per second)
88064.00 MiB transferred (8593.18 MiB/sec)
General statistics:
    total time:                          10.2470s
    total number of events:              86
Latency (ms):
         min:                                  265.19
         avg:                                  472.53
         max:                                  719.95
         95th percentile:                      612.21
         sum:                                40637.52
Threads fairness:
    events (avg/stddev):           21.5000/0.50
    execution time (avg/stddev):   10.1594/0.06

With SDRAM_BANKLOW=1. 1 thread, total memory size 100G, 1G blocks:

Code:

sysbench memory --memory-block-size=1G --memory-total-size=100G --threads=1 --memory-oper=write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 102400MiB
  operation: write
  scope: global
Initializing worker threads...
Threads started!
Total operations: 84 (    8.38 per second)
86016.00 MiB transferred (8581.84 MiB/sec)
General statistics:
    total time:                          10.0216s
    total number of events:              84
Latency (ms):
         min:                                  113.32
         avg:                                  119.30
         max:                                  122.29
         95th percentile:                      118.92
         sum:                                10021.14
Threads fairness:
    events (avg/stddev):           84.0000/0.00
    execution time (avg/stddev):   10.0211/0.00
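For anyone wanting to repeat these runs, the throughput figure can be pulled out of the report with a few lines of Python. This is only a minimal sketch: the helper names `parse_throughput` and `run_sysbench` are made up for illustration, the regex assumes the "MiB transferred (… MiB/sec)" summary line looks like the ones above (sysbench 1.0.20), and the command is the same one used in this post.

```python
import re
import subprocess

# Matches the summary line, e.g. "73728.00 MiB transferred (7131.56 MiB/sec)".
THROUGHPUT_RE = re.compile(r"\(\s*([\d.]+)\s+MiB/sec\)")


def parse_throughput(report: str) -> float:
    """Return the MiB/sec figure from a sysbench memory report (hypothetical helper)."""
    match = THROUGHPUT_RE.search(report)
    if match is None:
        raise ValueError("no 'MiB/sec' figure found in sysbench output")
    return float(match.group(1))


def run_sysbench(threads: int) -> float:
    """Run the write test from this post and return its throughput.

    Assumes sysbench is installed and on PATH; only the thread count varies.
    """
    cmd = [
        "sysbench", "memory",
        "--memory-block-size=1G",
        "--memory-total-size=100G",
        f"--threads={threads}",
        "--memory-oper=write",
        "run",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_throughput(result.stdout)
```

Calling `run_sysbench(t)` for `t` in `(1, 2, 4)`, several times per setting, would give a set of samples per thread count instead of a single number.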
I also tested with 2 threads. I can certainly do this more systematically (running multiple experiments across multiple workloads). I am not so much concerned with the overall performance improvement of the Pi 5 as with understanding the memory impact. Throughput in MiB/sec:

Code:

| NUMA / Threads |     1     |     2     |     4     |
|----------------|-----------|-----------|-----------|
| Off            | 12,283.80 |  9,162.35 |  7,131.56 |
| On             |  8,581.84 |  9,208.26 |  8,593.18 |
| Gain/Loss      | -30.14%   |   0.50%   |  20.50%   |
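The Gain/Loss row is the change of the "On" throughput relative to the "Off" throughput. A short sketch of that arithmetic, with the figures copied from the table (the names `numa_off`, `numa_on` and `relative_change` are illustrative only):

```python
# Throughput in MiB/sec, copied from the table above (keys are thread counts).
numa_off = {1: 12283.80, 2: 9162.35, 4: 7131.56}
numa_on = {1: 8581.84, 2: 9208.26, 4: 8593.18}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# "Off" is taken as the baseline, which is what the Gain/Loss row implies.
gain_loss = {t: relative_change(numa_off[t], numa_on[t]) for t in numa_off}

for threads, pct in gain_loss.items():
    print(f"{threads} thread(s): {pct:+.2f}%")
```

Rounded to two decimals, this reproduces the -30.14%, 0.50% and 20.50% shown in the table.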
I have my own interpretation of this data, but would love to hear your thoughts. As for realistically measuring this without synthetic benchmarks, I will refrain from commenting on the likes of Geekbench. Here's something tangible: how long the Pi (full of services) takes to boot.

NUMA enabled:

Code:

Startup finished in 3.482s (kernel) + 10.074s (userspace) = 13.556s
multi-user.target reached after 10.052s in userspace.

NUMA disabled:

Code:

Startup finished in 3.593s (kernel) + 10.134s (userspace) = 13.728s
multi-user.target reached after 10.099s in userspace

I wouldn't read too much into the 3% uplift in the kernel (or the 0.4% in userspace). In my world this is statistical noise.
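The size of those boot-time differences can be worked out directly from the two outputs above. A small sketch, assuming the NUMA-disabled boot as the baseline and a single boot per setting (the names `numa_enabled`, `numa_disabled` and `speedup_pct` are illustrative only):

```python
# Seconds, copied from the two "Startup finished" outputs above.
numa_enabled = {"kernel": 3.482, "userspace": 10.074, "total": 13.556}
numa_disabled = {"kernel": 3.593, "userspace": 10.134, "total": 13.728}


def speedup_pct(baseline: float, value: float) -> float:
    """Percentage by which `value` is shorter than `baseline`."""
    return (baseline - value) / baseline * 100.0


# One boot per configuration, so these are point estimates, not averages.
uplift = {k: speedup_pct(numa_disabled[k], numa_enabled[k]) for k in numa_disabled}

for stage, pct in uplift.items():
    print(f"{stage}: {pct:.2f}% faster with NUMA enabled")
```

The kernel stage comes out at roughly 3% and userspace at well under 1%. With only one sample per configuration, differences this small are hard to separate from run-to-run variation.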

Statistics: Posted by bytter — Sat Dec 07, 2024 12:24 pm


