Platform Best Practices
The KeySquare Platform can be run in a number of configurations depending on your performance requirements. The choice is a trade-off between hardware resources and latency.
- If hardware resources are constrained, you can use fewer CPU resources at the cost of higher latency
- If hardware resources are not constrained, best practice is to isolate CPU cores and pin the core threads to them, while giving every application ample memory
Core Threads
Within a platform, the following components have core threads:
- Aeron Media Driver
  - Sender Thread
  - Receiver Thread
  - Conductor Thread
- Sequencer
  - ks Thread (aka the main business logic thread)
- Relay Live
  - ks Thread (aka the main business logic thread)
- Relay Cache
  - ks Thread (aka the main business logic thread)
Configuring the Aeron Media Driver
The core property here is
aeron.threading.mode=SHARED|SHARED_NETWORK|DEDICATED
The modes are:
- SHARED
  - The Sender, Receiver and Conductor all share a single thread.
- SHARED_NETWORK
  - The Sender and Receiver share one thread. The Conductor runs on a second thread.
- DEDICATED
  - The Sender, Receiver and Conductor each run on their own thread.
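As a rough sizing aid, the threading modes can be combined with the Core Threads list to count how many core threads one platform instance runs. That is also the number of isolated cores needed if every core thread gets a core to itself. This is a minimal sketch: the function and constant names are illustrative and not part of Aeron or KeySquare, and it assumes one ks thread each for the Sequencer, Relay Live and Relay Cache, as listed earlier.

```python
# Media driver thread count per aeron.threading.mode, as described in the mode list.
THREADS_PER_MODE = {
    "SHARED": 1,          # Sender, Receiver and Conductor on one thread
    "SHARED_NETWORK": 2,  # Sender + Receiver on one thread, Conductor on another
    "DEDICATED": 3,       # one thread per agent
}

# Components that each run a single ks thread (see "Core Threads").
KS_COMPONENTS = ("Sequencer", "Relay Live", "Relay Cache")


def core_threads_needed(threading_mode: str) -> int:
    """Return the number of core threads for one platform instance."""
    try:
        driver_threads = THREADS_PER_MODE[threading_mode.upper()]
    except KeyError:
        raise ValueError(f"unknown threading mode: {threading_mode!r}") from None
    return driver_threads + len(KS_COMPONENTS)


if __name__ == "__main__":
    for mode in THREADS_PER_MODE:
        print(f"{mode}: {core_threads_needed(mode)} core threads")
```

Under these assumptions, moving from SHARED to DEDICATED costs two extra cores, which is the hardware side of the latency trade-off described at the top of this page.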
Furthermore, you can control the idle strategies for each of these agents with:
aeron.media.conductorIdleStrategy=org.agrona.concurrent.BusySpinIdleStrategy
aeron.media.receiverIdleStrategy=org.agrona.concurrent.NoOpIdleStrategy
aeron.media.senderIdleStrategy=org.agrona.concurrent.NoOpIdleStrategy
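If you have the capacity, KeySquare recommends running DEDICATED with the idle strategies shown above: BusySpinIdleStrategy for the Conductor and NoOpIdleStrategy for the Sender and Receiver. Neither strategy yields the CPU when there is no work, so each of these threads occupies a full core.
For further detail, see Thread Utilisation in Aeron Media Driver.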
Configuring the Sequencer, Relay Live and Relay Cache
To control the application idle strategy, override this property in each of these components:
KS_APP_APPLICATION_IDLE_STRATEGY
Controlling the core pinning configuration
If you are optimising for performance, KeySquare recommends isolating CPU cores that sit on the same NUMA node. For example:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.8.0-59-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro isolcpus=8-15,24-31 nohz_full=8-15,24-31
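This box has cores 8-15 and 24-31 isolated from the kernel scheduler (the isolcpus parameter). The same cores run in full dynticks mode (the nohz_full parameter), in which the kernel omits scheduling-clock ticks on a CPU that has at most one runnable task. A pinned, busy-spinning thread on one of these cores is therefore not interrupted by the periodic timer tick.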
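To confirm what a box actually booted with, the kernel command line can be parsed and the two CPU lists compared. The sketch below is illustrative only: the function names are hypothetical, it handles plain lists and ranges (skipping flag words such as managed_irq, and not supporting stride syntax), and it uses the sample command line from this page rather than reading /proc/cmdline.

```python
# Sample kernel command line copied from the example on this page.
SAMPLE_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-6.8.0-59-generic "
    "root=/dev/mapper/ubuntu--vg-ubuntu--lv ro "
    "isolcpus=8-15,24-31 nohz_full=8-15,24-31"
)


def expand_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "8-15,24-31" into sorted CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue  # skip empty parts and flag words such as "managed_irq"
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def kernel_cpu_param(cmdline: str, name: str) -> list[int]:
    """Return the CPUs given for `name=` in a kernel command line, or []."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return expand_cpu_list(value)
    return []


if __name__ == "__main__":
    isolated = kernel_cpu_param(SAMPLE_CMDLINE, "isolcpus")
    tickless = kernel_cpu_param(SAMPLE_CMDLINE, "nohz_full")
    print("isolcpus :", isolated)
    print("nohz_full:", tickless)
    # Cores that are isolated but still receive scheduler ticks.
    print("isolated but not nohz_full:", sorted(set(isolated) - set(tickless)))
```

On a live box, replace SAMPLE_CMDLINE with the contents of /proc/cmdline.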
You can configure this by editing /etc/default/grub, then regenerating the GRUB configuration (for example with update-grub on Ubuntu) and rebooting:
$ cat /etc/default/grub
...
GRUB_CMDLINE_LINUX="isolcpus=8-15,24-31 nohz_full=8-15,24-31"
...
Alternatively, on Red Hat Enterprise Linux, we recommend using tuned-adm to apply the cpu-partitioning profile.
You can then pin the core threads listed above to isolated cores with the following configuration:
AERON_MEDIA_DRIVER_CPUS=7,8,9,10
SEQUENCER_CPUS=6,11
RELAY_LIVE_CPUS=5,12
RELAY_CACHE_CPUS=4,13
ECHO_SERVICE_CPUS=
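Read against the isolcpus=8-15,24-31 example, each component in this configuration lists isolated CPUs (8-13) alongside one CPU outside the isolated set (4-7). This page does not say how the platform assigns threads to the CPUs in each list, so the sketch below only classifies the CPUs and flags any CPU shared between components. The names PINNING, classify and shared_cpus are illustrative, not KeySquare APIs.

```python
# Isolated CPUs from the isolcpus=8-15,24-31 example on this page.
ISOLATED = set(range(8, 16)) | set(range(24, 32))

# Pinning values copied from the configuration on this page.
PINNING = {
    "AERON_MEDIA_DRIVER_CPUS": "7,8,9,10",
    "SEQUENCER_CPUS": "6,11",
    "RELAY_LIVE_CPUS": "5,12",
    "RELAY_CACHE_CPUS": "4,13",
    "ECHO_SERVICE_CPUS": "",
}


def parse_cpus(value: str) -> list[int]:
    """Parse a comma-separated CPU list; an empty value means no pinning."""
    return [int(c) for c in value.split(",") if c.strip()]


def classify(pinning: dict[str, str], isolated: set[int]) -> dict[str, tuple[list[int], list[int]]]:
    """Map each variable to (cpus inside the isolated set, cpus outside it)."""
    result = {}
    for name, value in pinning.items():
        cpus = parse_cpus(value)
        inside = [c for c in cpus if c in isolated]
        outside = [c for c in cpus if c not in isolated]
        result[name] = (inside, outside)
    return result


def shared_cpus(pinning: dict[str, str]) -> dict[int, list[str]]:
    """Return CPUs that appear in more than one variable."""
    seen: dict[int, list[str]] = {}
    for name, value in pinning.items():
        for cpu in parse_cpus(value):
            seen.setdefault(cpu, []).append(name)
    return {cpu: names for cpu, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    for name, (inside, outside) in classify(PINNING, ISOLATED).items():
        print(f"{name}: isolated={inside} not isolated={outside}")
    print("shared between components:", shared_cpus(PINNING))
```

A check like this is useful after changing either the kernel parameters or the pinning values, since the two are configured in different places and can drift apart.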
Additional performance tuning
Performance tuning is a skill, and the right settings vary by environment. For example, certain HP servers offer a mode that scales CPU frequency within the base-to-turbo range and settles on a speed that can be sustained, which reduces jitter from clock speed changes.
In addition to the platform configuration above, we recommend tuning your boxes as follows:
- Configure the CPU scaling governor
- Disable power saving
- Prevent the CPU from changing its frequency dynamically
- Disable unnecessary services
- Use kernel bypass for networking
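The scaling governor can be checked from sysfs. This is a minimal sketch, assuming a Linux box that exposes cpufreq under /sys/devices/system/cpu and assuming the target governor is performance (this page does not name one). The function names are illustrative.

```python
from pathlib import Path


def read_governors(sysfs_cpu_root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Return {cpu id: scaling governor} for every CPU that exposes cpufreq."""
    governors: dict[int, str] = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in Path(sysfs_cpu_root).glob(pattern):
        cpu_dir = path.parts[-3]          # e.g. "cpu12"
        governors[int(cpu_dir[3:])] = path.read_text().strip()
    return governors


def not_performance(governors: dict[int, str]) -> list[int]:
    """Return the CPUs whose governor is anything other than "performance"."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    governors = read_governors()
    if not governors:
        print("no cpufreq information found (not Linux, or cpufreq not exposed)")
    else:
        offenders = not_performance(governors)
        print("CPUs not using the performance governor:", offenders or "none")
```

The root path is a parameter so that the check can be pointed at a copy of the sysfs tree when testing.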
Core isolation with nohz_full, combined with thread pinning, gets you most of the way there in terms of application configuration. Gains beyond those measures and the CPU scaling changes are marginal in comparison. Contact KeySquare if you would like help optimising further.