Best Practices

Java Development

There are several ways of developing Java applications against the KeySquare Platform:

Local Windows/macOS development connecting to a locally running KeySquare Platform
Local Linux development connecting to a locally running KeySquare Platform
Local development connecting to a remote KeySquare Platform
Remote SSH development on the core KeySquare Platform box
Remote SSH development on a secondary KeySquare Platform box

Local Windows/macOS development connecting to a locally running KeySquare Platform

In this configuration, you can:

Initialise a dedicated KeySquare Platform using Docker Desktop, isolated from everyone else
Develop your applications locally without interfering with others

Local Linux development connecting to a locally running KeySquare Platform

In this configuration, you can:

Initialise a dedicated KeySquare Platform using Docker, isolated from everyone else
Develop your applications locally without interfering with others

Local development connecting to a remote KeySquare Platform

In this configuration, you can:

Connect to a central environment

Remote SSH development on the core KeySquare Platform box

This is the preferred way of developing here at KeySquare. It offloads work from your local workstation and gives you access to better networking and faster compute.

GC Strategy

KeySquare has tested and strongly recommends Generational ZGC on Java 21 and ZGC on Java 17.

For Java 21, the following configuration has been tested:

java \
-Xlog:gc*:logs/component.gc.log:time,uptime \
-XX:+UseZGC -XX:+ZGenerational -Xmx$MAX_HEAP_SIZE_TO_USE -XX:SoftMaxHeapSize=$SOFT_MAX_HEAP_SIZE_TO_USE \
-XX:-ZUncommit -XX:ZUncommitDelay=300 -XX:+AlwaysPreTouch \
--add-opens java.base/jdk.internal.misc=ALL-UNNAMED
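To confirm at startup that the JVM actually picked up the collector flags above, a process can list its registered garbage collector beans. The following is a minimal sketch using only the standard `java.lang.management` API. The class name `GcCheck` and the substring match on "ZGC" are assumptions made for illustration; collector bean names are JVM-specific and are not part of the KeySquare Platform.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a KeySquare class.
class GcCheck {
    /** Returns the names of the collectors registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** True when any collector name mentions ZGC (assumed naming convention). */
    static boolean looksLikeZgc(List<String> names) {
        for (String name : names) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("ZGC in use: " + looksLikeZgc(names));
    }
}
```

Logging this once at startup makes a misconfigured launch script visible immediately, without having to inspect the GC log.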

Idle Strategies

Application idle strategies interact with the Platform Idle Strategy configuration. The choice of idle strategy will be dictated by the latency budget and throughput requirements of an application.

Transport	Idle Strategy	Pinning	99.999%	Latency Pathway	× Over Base
UDP_IPC	org.agrona.concurrent.NoOpIdleStrategy	Y	11.5us		1.00×
UDP_IPC	org.agrona.concurrent.SleepingMillisIdleStrategy	Y	1.11ms		96.52×

The sleeping strategy is dramatically slower, at 96.52× the baseline latency. However, the lower latency of NoOpIdleStrategy is gained by busy-spinning on a core. If we were to profile a process using NoOpIdleStrategy, we would see it using 100% CPU, as the process is effectively busy-spinning while polling for additional work.
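The trade-off between the two strategies can be sketched with a poll loop that never finds work. This is an illustration using only the JDK: the `Idler` interface is a hypothetical stand-in and not Agrona's actual `IdleStrategy` type, and the one-millisecond park is an assumed value chosen to resemble a sleeping strategy.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch, not Agrona or KeySquare code.
class IdleSketch {
    /** Called after each poll with the amount of work that poll found. */
    interface Idler {
        void idle(int workCount);
    }

    /** Busy-spin: returns immediately, so the loop re-polls at once and occupies a core. */
    static final Idler BUSY_SPIN = workCount -> { };

    /** Sleeping: parks for roughly one millisecond when a poll found no work. */
    static final Idler SLEEP_ONE_MILLI = workCount -> {
        if (workCount == 0) {
            LockSupport.parkNanos(1_000_000L);
        }
    };

    /** Polls an always-empty source for the given duration and returns the number of polls. */
    static long countPolls(Idler idler, long durationNanos) {
        long deadline = System.nanoTime() + durationNanos;
        long polls = 0;
        while (deadline - System.nanoTime() > 0) {
            int workCount = 0; // stand-in for polling the transport; nothing arrives here
            idler.idle(workCount);
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        long window = 50_000_000L; // 50 ms
        System.out.println("busy-spin polls: " + countPolls(BUSY_SPIN, window));
        System.out.println("sleeping polls:  " + countPolls(SLEEP_ONE_MILLI, window));
    }
}
```

The busy-spin loop polls far more often in the same window, which is why it notices new work sooner, and also why it shows as 100% CPU on its core.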


Threading Strategy

KeySquare delivers data to the callbacks on a single thread. We recommend developing low-latency applications to be single-threaded for a number of reasons:

Avoids thread contention and synchronization overhead, reducing latency spikes
Ensures predictable execution order, which is critical for deterministic processing
Minimizes context switching, improving CPU cache locality and throughput
Simplifies debugging and reasoning about code paths in latency-sensitive environments
Reduces the risk of concurrency bugs such as race conditions and deadlocks
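As an illustration of the points above, state that is only ever touched from the delivery thread can be held in plain fields with no locks, atomics or volatile. The sketch below is hypothetical: `SingleThreadedBook` and its `onPrice` callback are invented names and do not correspond to a KeySquare API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler; assumes every call arrives on the same single thread.
class SingleThreadedBook {
    // Plain, unsynchronized state: safe only because one thread ever touches it.
    private final Map<String, Long> lastPriceBySymbol = new HashMap<>();
    private long updates;

    /** Hypothetical callback invoked in arrival order on the delivery thread. */
    void onPrice(String symbol, long price) {
        lastPriceBySymbol.put(symbol, price);
        updates++;
    }

    /** Number of updates processed so far. */
    long updates() {
        return updates;
    }

    /** Last price seen for a symbol, or null if none has arrived. */
    Long lastPrice(String symbol) {
        return lastPriceBySymbol.get(symbol);
    }

    public static void main(String[] args) {
        SingleThreadedBook book = new SingleThreadedBook();
        book.onPrice("ABC", 100L);
        book.onPrice("ABC", 101L);
        System.out.println(book.updates() + " updates, last ABC price " + book.lastPrice("ABC"));
    }
}
```

Because updates are applied strictly in arrival order, replaying the same inputs yields the same state. A production low-latency path would likely also avoid the boxing that `Map<String, Long>` incurs.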

Core pinning the main Java business logic thread

Core pinning can be controlled with a system property added to your Java command line:

-DAPP_CPUS=n

where n is the core you want to use. Specifically, as per Platform Best Practices, you should core pin your application to an isolated core.
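An application may want to log, at startup, which core was requested, so that a missing or mistyped property is noticed early. The following is a minimal sketch under the assumption that `APP_CPUS` holds a single non-negative integer, as described above. The class `PinningConfig` is hypothetical; the pinning itself is assumed to be performed by the platform, not by this code.

```java
import java.util.OptionalInt;

// Hypothetical startup check; it only reads the property and does not pin anything.
class PinningConfig {
    /** Parses a core index, returning empty for a missing, blank, negative or non-numeric value. */
    static OptionalInt parseCore(String raw) {
        if (raw == null || raw.isBlank()) {
            return OptionalInt.empty();
        }
        try {
            int core = Integer.parseInt(raw.trim());
            return core >= 0 ? OptionalInt.of(core) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt core = parseCore(System.getProperty("APP_CPUS"));
        if (core.isPresent()) {
            System.out.println("Requested core: " + core.getAsInt());
        } else {
            System.out.println("APP_CPUS not set or invalid; thread will not be pinned by request");
        }
    }
}
```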

Deployment strategy

KeySquare's recommendation is to favour density over sparsity when it comes to hardware selection. Specifically, prefer one big monstrous box as your core server, with the intention of running as many services on it as possible, over components running across many machines. This means:

Most processes can communicate back to the Sequencer and relays over IPC allowing the lowest possible latency
More cores available to isolate and make available to your processes

Transport selection

Always favour UDP_IPC, assuming availability of the multicast network. We think of this as concentric circles around the Sequencer:

Running on the same box as the Sequencer enables IPC
Running in the same data centre as the Sequencer with UDP multicast visibility
Running in another data centre or for remote development

The decision of where to locate a component often comes down to latency requirements too. If a component is not latency sensitive, favour cost and host your process wherever is cheaper.

Bringing it all together

This table illustrates the results of varying:

Transport option
Idle Strategy
Pinning the application thread to an isolated core

Transport	Idle Strategy	Pinning	99.999%	× Over Base
UDP_IPC	org.agrona.concurrent.NoOpIdleStrategy	Y	11.5us	1.00×
UDP_IPC	org.agrona.concurrent.NoOpIdleStrategy	N	61.9us	5.39×
UDP_IPC	org.agrona.concurrent.SleepingMillisIdleStrategy	N	1.06ms	92.17×
UDP_IPC	org.agrona.concurrent.SleepingMillisIdleStrategy	Y	1.11ms	96.52×
TCP	org.agrona.concurrent.NoOpIdleStrategy	Y	1.29ms	112.17×
TCP	org.agrona.concurrent.NoOpIdleStrategy	N	1.67ms	145.22×
TCP	org.agrona.concurrent.SleepingMillisIdleStrategy	N	2.38ms	206.96×
TCP	org.agrona.concurrent.SleepingMillisIdleStrategy	Y	2.42ms	210.43×
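The "× Over Base" column is each row's 99.999% latency divided by the baseline row (UDP_IPC, NoOpIdleStrategy, pinned, 11.5us). The arithmetic can be reproduced with a few lines of Java; the class `LatencyMultiple` is a hypothetical helper, and the inputs are the rounded figures shown in the table, converted to microseconds.

```java
import java.util.Locale;

// Hypothetical helper reproducing the "× Over Base" arithmetic.
class LatencyMultiple {
    /** Ratio of a latency to the baseline latency, both in microseconds. */
    static double multipleOverBase(double latencyMicros, double baseMicros) {
        return latencyMicros / baseMicros;
    }

    /** Formats a multiple to two decimal places, as in the table. */
    static String format(double multiple) {
        return String.format(Locale.ROOT, "%.2f", multiple);
    }

    public static void main(String[] args) {
        double base = 11.5; // UDP_IPC, NoOpIdleStrategy, pinned
        // 1.11 ms = 1110 us (UDP_IPC, SleepingMillisIdleStrategy, pinned)
        System.out.println(format(multipleOverBase(1110.0, base)));
        // 2.42 ms = 2420 us (TCP, SleepingMillisIdleStrategy, pinned)
        System.out.println(format(multipleOverBase(2420.0, base)));
    }
}
```

Recomputing from the rounded figures can differ in the last digit (61.9us / 11.5us gives 5.38 against the table's 5.39), presumably because the table was derived from unrounded measurements.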