Best Practices
Java Development
There are several ways of developing Java applications against the KeySquare Platform:
- Local Windows/MacOS development connecting to a locally running KeySquare Platform
- Local Linux development connecting to a locally running KeySquare Platform
- Local development connecting to a remote KeySquare Platform
- Remote SSH development on the core KeySquare Platform box
- Remote SSH development on a secondary KeySquare Platform box
Local Windows/MacOS development connecting to a locally running KeySquare Platform
In this configuration, you can:
- Initialise a dedicated KeySquare platform using Docker Desktop isolated from everyone else
- Develop your applications locally without interfering with others
Local Linux development connecting to a locally running KeySquare Platform
In this configuration, you can:
- Initialise a dedicated KeySquare platform using Docker isolated from everyone else
- Develop your applications locally without interfering with others
Local development connecting to a remote KeySquare Platform
In this configuration, you can:
- Connect to a central environment
Remote SSH development on the core KeySquare Platform box
This is the preferred way of developing here at KeySquare. It offloads the burden from your local workstation, giving you access to better networking and faster compute.
GC Strategy
KeySquare has tested and highly recommends Generational ZGC on Java 21 and ZGC on Java 17.
For Java 21, the following configuration has been tested:
java \
-Xlog:gc*:logs/component.gc.log:time,uptime \
-XX:+UseZGC -XX:+ZGenerational -Xmx$MAX_HEAP_SIZE_TO_USE -XX:SoftMaxHeapSize=$SOFT_MAX_HEAP_SIZE_TO_USE \
-XX:-ZUncommit -XX:ZUncommitDelay=300 -XX:+AlwaysPreTouch \
--add-opens java.base/jdk.internal.misc=ALL-UNNAMED
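To confirm at startup that the JVM actually picked up the intended collector, the standard `java.lang.management` API can be queried. The following is a minimal sketch, not part of the KeySquare Platform: the class name `GcCheck` is hypothetical, and matching on the substring `ZGC` is an assumption, since the exact collector bean names vary between JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a KeySquare API.
final class GcCheck {
    /** Names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Assumption: ZGC collector beans carry "ZGC" somewhere in their name. */
    static boolean looksLikeZgc(List<String> names) {
        for (String name : names) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        if (!looksLikeZgc(names)) {
            System.err.println("Warning: ZGC does not appear to be active");
        }
    }
}
```

Running this with and without `-XX:+UseZGC` should show different collector names, which makes a missing or mistyped flag easy to spot in the logs.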
Idle Strategies
Application idle strategies interact with the Platform Idle Strategy configuration. The choice of idle strategy will be dictated by the latency budget and throughput requirements of an application.
| Transport | Idle Strategy | Pinning | 99.999% | Latency Pathway | × Over Base |
|---|---|---|---|---|---|
| UDP_IPC | org.agrona.concurrent.NoOpIdleStrategy | Y | 11.5us | ![]() | 1.00× |
| UDP_IPC | org.agrona.concurrent.SleepingMillisIdleStrategy | Y | 1.11ms | ![]() | 96.52× |
SleepingMillisIdleStrategy is dramatically slower, at 96.52× the base latency. However, the low latency of NoOpIdleStrategy is gained by busy-spinning on a core. If we were to profile a process using NoOpIdleStrategy, we would see it using 100% CPU, as the process is effectively busy-spinning while polling for additional work.
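The trade-off can be illustrated with a JDK-only sketch. It does not use Agrona: the `Idle` interface and the two constants below are hypothetical stand-ins that only mimic the spirit of a no-op strategy and a one-millisecond sleeping strategy. The loop counts how many empty polls it performs in a fixed window, which shows why the no-op variant reacts quickly but occupies a full core.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical stand-ins, not the Agrona classes named in the table.
final class IdleSketch {
    /** Called when a poll finds no work. */
    interface Idle {
        void idle();
    }

    /** Returns immediately, so the loop busy-spins. */
    static final Idle NO_OP = () -> { };

    /** Parks the thread for roughly one millisecond between polls. */
    static final Idle SLEEP_1MS = () -> LockSupport.parkNanos(1_000_000L);

    /** Counts the empty polls performed within the given window. */
    static long emptyPolls(Idle idle, long windowNanos) {
        long deadline = System.nanoTime() + windowNanos;
        long polls = 0;
        while (System.nanoTime() - deadline < 0) {
            polls++;      // a real loop would poll for work here
            idle.idle();  // nothing found, so apply the idle strategy
        }
        return polls;
    }

    public static void main(String[] args) {
        long window = 50_000_000L; // 50 ms
        System.out.println("no-op polls:    " + emptyPolls(NO_OP, window));
        System.out.println("sleeping polls: " + emptyPolls(SLEEP_1MS, window));
    }
}
```

The sleeping variant polls only a handful of times per window, so a message arriving just after a poll waits for up to the full sleep period. The no-op variant polls continuously, at the cost of a core running at 100%.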

Threading Strategy
KeySquare delivers data to the callbacks on a single thread. We recommend developing low latency applications to be single-threaded for a number of reasons:
- Avoids thread contention and synchronization overhead, reducing latency spikes
- Ensures predictable execution order, which is critical for deterministic processing
- Minimizes context switching, improving CPU cache locality and throughput
- Simplifies debugging and reasoning about code paths in latency-sensitive environments
- Reduces the risk of concurrency bugs such as race conditions and deadlocks
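As a sketch of what single-threaded handling looks like in practice, the class below keeps its state in plain fields with no locks, `volatile` or atomics, and guards against accidental use from a second thread. The class and method names (`PriceState`, `onPrice`) are hypothetical and are not KeySquare callbacks. The guard binds to whichever thread delivers the first callback and is meant as a debugging aid only.

```java
// Hypothetical handler, assuming all callbacks arrive on one thread.
final class PriceState {
    private Thread owner;      // bound on the first callback
    private long lastPrice;    // plain fields: no synchronization needed
    private long updateCount;

    /** Hypothetical callback; throws if invoked from a second thread. */
    void onPrice(long price) {
        Thread current = Thread.currentThread();
        if (owner == null) {
            owner = current;
        } else if (owner != current) {
            throw new IllegalStateException("callback invoked off the owning thread");
        }
        lastPrice = price;
        updateCount++;
    }

    long lastPrice() {
        return lastPrice;
    }

    long updateCount() {
        return updateCount;
    }

    public static void main(String[] args) {
        PriceState state = new PriceState();
        state.onPrice(100);
        state.onPrice(101);
        System.out.println(state.updateCount() + " updates, last price " + state.lastPrice());
    }
}
```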
Core pinning the main Java business logic thread
Core pinning can be controlled with a system property added to your Java command line:
-DAPP_CPUS=n
where n is the core you want to use. Specifically, as per Platform Best Practices, you should core pin your application to an isolated core.
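The platform reads this property itself, so application code does not need to. If you want to fail fast on a mistyped value before launch, a check along the following lines may help. This is a sketch under assumptions: the class `AppCpus` is hypothetical, and it assumes the value is a single non-negative integer core index, as described above. Whether the platform accepts other formats is not covered here.

```java
import java.util.OptionalInt;

// Hypothetical pre-launch check, not part of the KeySquare Platform.
final class AppCpus {
    /** Parses a core index such as the n in -DAPP_CPUS=n. */
    static OptionalInt parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return OptionalInt.empty(); // property not set: no pinning requested
        }
        int core;
        try {
            core = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("APP_CPUS must be an integer core index, got: " + raw, e);
        }
        if (core < 0) {
            throw new IllegalArgumentException("APP_CPUS must not be negative, got: " + raw);
        }
        return OptionalInt.of(core);
    }

    public static void main(String[] args) {
        OptionalInt core = parse(System.getProperty("APP_CPUS"));
        System.out.println(core.isPresent()
                ? "Pinning requested on core " + core.getAsInt()
                : "APP_CPUS not set");
    }
}
```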
Deployment strategy
KeySquare's recommendation is to favour density over sparsity when it comes to hardware selection. Specifically, prefer one very large box as your core server, with the intention of running as many services on it as possible, over components spread across many machines. This means:
- Most processes can communicate back to the Sequencer and relays over IPC allowing the lowest possible latency
- More cores available to isolate and make available to your processes
Transport selection
Always favour UDP_IPC, assuming the multicast network is available. We think of this as concentric circles around the Sequencer:
- Running on the same box as the Sequencer, which enables IPC
- Running in the same data centre as the Sequencer with UDP multicast visibility
- Running in another data centre or for remote development
The decision of where to locate a component often comes down to latency requirements too. If a component is not latency sensitive, favour cost and host it wherever is cheapest.
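The concentric circles can be read as a simple decision rule. The sketch below is one interpretation rather than platform behaviour: the enum and method names are hypothetical, and falling back to TCP when multicast is not visible (including for the outermost circle) is an assumption inferred from the transports listed in the results table.

```java
// Hypothetical decision helper, not a KeySquare API.
final class TransportChoice {
    enum Location { SAME_BOX, SAME_DATA_CENTRE, REMOTE }

    enum Transport { UDP_IPC, TCP }

    /**
     * Assumptions: the same box always allows UDP_IPC, the same data centre
     * allows it only with multicast visibility, and anything else uses TCP.
     */
    static Transport choose(Location location, boolean multicastVisible) {
        switch (location) {
            case SAME_BOX:
                return Transport.UDP_IPC;
            case SAME_DATA_CENTRE:
                return multicastVisible ? Transport.UDP_IPC : Transport.TCP;
            default:
                return Transport.TCP;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(Location.SAME_BOX, true));
        System.out.println(choose(Location.SAME_DATA_CENTRE, false));
        System.out.println(choose(Location.REMOTE, false));
    }
}
```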
Bringing it all together
This table illustrates the results of varying:
- Transport option
- Idle strategy
- Pinning of the application thread to an isolated core
| Transport | Idle Strategy | Pinning | 99.999% | Latency Pathway | × Over Base |
|---|---|---|---|---|---|
| UDP_IPC | org.agrona.concurrent.NoOpIdleStrategy | Y | 11.5us | ![]() | 1.00× |
| UDP_IPC | org.agrona.concurrent.NoOpIdleStrategy | N | 61.9us | ![]() | 5.39× |
| UDP_IPC | org.agrona.concurrent.SleepingMillisIdleStrategy | N | 1.06ms | ![]() | 92.17× |
| UDP_IPC | org.agrona.concurrent.SleepingMillisIdleStrategy | Y | 1.11ms | ![]() | 96.52× |
| TCP | org.agrona.concurrent.NoOpIdleStrategy | Y | 1.29ms | ![]() | 112.17× |
| TCP | org.agrona.concurrent.NoOpIdleStrategy | N | 1.67ms | ![]() | 145.22× |
| TCP | org.agrona.concurrent.SleepingMillisIdleStrategy | N | 2.38ms | ![]() | 206.96× |
| TCP | org.agrona.concurrent.SleepingMillisIdleStrategy | Y | 2.42ms | ![]() | 210.43× |
Going from the worst case to the best case, we gain a roughly 210× speed-up by tuning the environment and pinning our thread.
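The "× Over Base" column is each row's 99.999% latency divided by the base row (UDP_IPC, NoOpIdleStrategy, pinned, 11.5us). A small sketch of that arithmetic follows. The class name is hypothetical, and because the latencies in the table are rounded, a recomputed ratio can differ from the published figure in the last digit.

```java
import java.util.Locale;

// Hypothetical helper reproducing the ratio column from the table.
final class LatencyRatio {
    /** Ratio of a latency to the base latency, both in microseconds. */
    static double overBase(double latencyMicros, double baseMicros) {
        return latencyMicros / baseMicros;
    }

    public static void main(String[] args) {
        double base = 11.5; // UDP_IPC, NoOpIdleStrategy, pinned
        // 1.11ms = 1110us (UDP_IPC, SleepingMillisIdleStrategy, pinned)
        System.out.println(String.format(Locale.ROOT, "%.2fx", overBase(1_110.0, base)));
        // 2.42ms = 2420us (TCP, SleepingMillisIdleStrategy, pinned)
        System.out.println(String.format(Locale.ROOT, "%.2fx", overBase(2_420.0, base)));
    }
}
```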
[Figures: latency pathway charts, one per table row, for UDP_IPC and TCP with NoOpIdleStrategy and SleepingMillisIdleStrategy, pinned (Y) and unpinned (N).]