
High Precision Time

· 11 min read
Chief Technology Officer

In electronic trading applications we need access to high precision time - in this post we look to answer: why do we need it? what is high precision? and how do we get it?

We need to be able to answer questions like:

  • When did an order come in... and how long did it take to process?
  • How long did it take for that order to come in (from the market)?
  • What are my end-to-end latencies?

There are regulatory requirements in this space that we need to consider. MiFID II mandates timestamp granularities of at least 1ms for electronic trading and 1μs for high-frequency algorithmic trading, with a maximum divergence from UTC of 1ms and 100μs respectively.

Given how quickly operations can complete on modern computers, we generally need nanosecond granularity, which gives us enough resolution to see the impact of changes being made. Without this, a new code change could negatively impact performance, but you wouldn't be able to see it because you're not measuring with enough detail.

When measuring time we want to be mindful of the following:

  • Relativity - the time that we capture needs to be relative to something like the epoch. Otherwise we wouldn't be able to answer when something happened. This is also referred to as a wall clock time. It is possible to capture non-relative timestamps, but this limits what you can do. Specifically, you're only able to take the difference between these sorts of timestamps to work out how long an operation took, but can't answer the question of when that operation occurred - think of this as stopwatch time.

  • Clock Synchronisation - this is the coordination of clocks across a number of computers. Trading systems are made up of several computers, so it's important that their clocks are synchronised relative to each other. In practice, clocks are synchronised to Coordinated Universal Time (UTC) within a given tolerance. This allows timestamps across computers to be meaningfully compared. Getting this right is a challenging problem, and a topic that deserves its own post.

  • Precision - It's worth pointing out that granularity and precision are often confused, but they're different. We're measuring time in nanoseconds, that's just the unit or granularity - it doesn't mean our measurements are actually accurate to a nanosecond. This is because measuring time on a computer takes time itself. The computer needs to run instructions to check and report the time, which creates a delay. We therefore don't want the call to obtain time to take so long that it interferes with the measurements that we're making. This topic is discussed by Aleksey Shipilёv in Nanotrusting the Nanotime.

  • Zero-Garbage - when using Java in the electronic trading space, we want to keep the amount of object allocation to a minimum. The reason for this is that memory from any unused objects eventually gets freed up by the garbage collector - this process can lead to jitter (intermittent latency spikes) in the performance profile of an application.
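The difference between wall clock and stopwatch time, and between granularity and precision, can be seen with the JDK's own clocks. The sketch below is only illustrative (the class and method names are made up for this example). It times an operation with System.nanoTime() and then estimates the smallest step the clock is observed to make, which on most machines is larger than the 1ns unit the value is reported in.

```java
class ClockBasics {

    /** Stopwatch time: only the difference between two readings is meaningful. */
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Smallest non-zero step observed between consecutive nanoTime() readings. */
    static long smallestObservedStepNanos(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long first = System.nanoTime();
            long second;
            do {
                second = System.nanoTime(); // spin until the clock moves on
            } while (second == first);
            smallest = Math.min(smallest, second - first);
        }
        return smallest;
    }

    public static void main(String[] args) {
        // Wall clock time: relative to the epoch, so it tells us when something happened.
        System.out.println("Wall clock (ms since epoch): " + System.currentTimeMillis());

        // Stopwatch time: tells us how long something took, but not when.
        System.out.println("Elapsed (ns): " + elapsedNanos(() -> Math.sqrt(12345.678)));

        // The unit is nanoseconds, but the observed step includes the cost of the call itself.
        System.out.println("Smallest observed step (ns): " + smallestObservedStepNanos(1_000));
    }
}
```

The numbers printed depend entirely on the hardware, operating system and JVM, so they should be read as an indication rather than a measurement.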

Getting Time in Java

There are a number of ways to obtain time in Java.

Let's review these quickly...

| Mechanism | Wall Clock? | Granularity | Comment |
|---|---|---|---|
| System.currentTimeMillis() | Yes | milliseconds | Old school :) This delegates to gettimeofday on Linux. |
| new java.util.Date() | Yes | milliseconds | Internally this calls System.currentTimeMillis(). |
| System.nanoTime() | No | nanoseconds | This delegates to clock_gettime using CLOCK_MONOTONIC on Linux. The timestamp provided isn't a wall clock timestamp and is therefore only useful for measuring elapsed time between multiple invocations. |
| Instant.now() | Yes | nanoseconds | Provided as part of the Java 8 Time API. |

There's only really one choice here, which is Instant.now() - it's the only one that offers nanosecond granularity in the form of a wall clock time.

Let's take a deeper dive... internally the Instant class makes use of a combination of System.currentTimeMillis() and VM.getNanoTimeAdjustment(), which is an approximation. With the complexities of operating systems and hardware, it's actually very difficult for Java to provide a platform-independent solution here.

This therefore doesn't appear to be suitable for regulatory use. In addition, it requires the creation of a new Instant each time a timestamp is needed, and therefore falls short on the zero-garbage front.
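To make the allocation point concrete, here is a minimal sketch of obtaining nanoseconds since the epoch via Instant.now(). The class and method names are hypothetical; only the java.time calls come from the JDK. Every call creates a new Instant, which is then flattened into a long.

```java
import java.time.Instant;

class InstantToEpochNanos {

    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Flattens an Instant into nanoseconds since the epoch; throws ArithmeticException on overflow. */
    static long toEpochNanos(Instant instant) {
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), NANOS_PER_SECOND),
                instant.getNano());
    }

    public static void main(String[] args) {
        Instant now = Instant.now(); // a new Instant object on every call
        System.out.println(now + " -> " + toEpochNanos(now));
    }
}
```

Although the value is expressed in nanoseconds, how many of the trailing digits are actually populated depends on the JVM and the platform.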

So really, none of what's provided out-of-the-box meets the criteria that we're searching for...

Quick Digression: Representation of Time

If we want to achieve zero object allocation when obtaining a timestamp, then we have two options:

  • Operate with primitive types, e.g. long - but would a long be big enough to house a timestamp in nanosecond granularity?
  • Accept or return a re-usable object that can be passed/returned when obtaining a timestamp.
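As a rough sketch, the two options might translate into API shapes like the ones below. All of the type and method names are hypothetical, and the clocks in main are millisecond-backed stand-ins that exist purely to show how each shape is called.

```java
/** Option 1: return the timestamp as a primitive (nanoseconds since the epoch). */
interface EpochNanoClock {
    long nanoTime();
}

/** Option 2: a caller-owned holder that is allocated once and then reused. */
final class MutableTimestamp {
    long epochSeconds;
    int nanoOfSecond;
}

interface ReusableTimestampClock {
    void read(MutableTimestamp target);
}

class TimestampApiShapes {

    public static void main(String[] args) {
        // Stand-in implementations, only millisecond-accurate
        EpochNanoClock primitiveClock = () -> System.currentTimeMillis() * 1_000_000L;
        ReusableTimestampClock reusableClock = target -> {
            long millis = System.currentTimeMillis();
            target.epochSeconds = millis / 1_000;
            target.nanoOfSecond = (int) (millis % 1_000) * 1_000_000;
        };

        long timestamp = primitiveClock.nanoTime();          // nothing to allocate
        System.out.println(timestamp);

        MutableTimestamp holder = new MutableTimestamp();    // allocated once up front
        reusableClock.read(holder);                          // reused on every read
        System.out.println(holder.epochSeconds + "s + " + holder.nanoOfSecond + "ns");
    }
}
```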

The first option is certainly the simplest and feels more like the existing methods to obtain a timestamp. Let's explore this...

A Java long is a signed 64-bit number. With one of the bits used for the sign, it gives the range:

  • -2^63 to 2^63 - 1.

The start and end of the range are held within the constants: Long.MIN_VALUE and Long.MAX_VALUE respectively. We can use the following code to determine the earliest and latest time that we can represent in a long:

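import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class LongTimestampRange {

    public static void main(String[] args) {
        printTime(Long.MIN_VALUE);
        printTime(Long.MAX_VALUE);
    }

    // Treats the value as nanoseconds since the epoch and prints it as a UTC date-time
    public static void printTime(long timestamp) {
        long seconds = timestamp / 1_000_000_000;
        int nanos = (int) (timestamp % 1_000_000_000);

        LocalDateTime dateTime = LocalDateTime.ofInstant(
                Instant.ofEpochSecond(seconds, nanos), ZoneId.of("UTC"));
        System.out.println(dateTime);
    }
}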

The output provides us with the following time range (shown here without the fractional seconds): 1677-09-21T00:12:43 to 2262-04-11T23:47:16. This is sufficiently wide to cope with the values that would need to be represented in the electronic trading space (at least for the near term), and therefore using a long to store a high precision timestamp is viable.
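If a long is used as the representation, small helpers can convert between the packed value and its seconds/nanoseconds parts without allocating. The sketch below is one possible shape, with hypothetical names. It uses Math.floorDiv and Math.floorMod so that the nanosecond part stays in the range 0 to 999,999,999 for timestamps before the epoch, whereas plain / and % would produce a negative remainder there.

```java
class EpochNanos {

    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Packs seconds and nanoseconds into one long; overflow is reported rather than silently wrapped. */
    static long of(long epochSeconds, int nanoOfSecond) {
        return Math.addExact(Math.multiplyExact(epochSeconds, NANOS_PER_SECOND), nanoOfSecond);
    }

    /** Whole seconds since the epoch, rounded towards negative infinity. */
    static long epochSeconds(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_SECOND);
    }

    /** Nanoseconds within the second, always between 0 and 999,999,999. */
    static int nanoOfSecond(long epochNanos) {
        return (int) Math.floorMod(epochNanos, NANOS_PER_SECOND);
    }

    public static void main(String[] args) {
        long packed = of(1_700_000_000L, 123_456_789);
        System.out.println(packed);
        System.out.println(epochSeconds(packed) + "s + " + nanoOfSecond(packed) + "ns");
    }
}
```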

To put this into context, a Java Instant, is internally modelled as two parts:

  • a long that stores the seconds since epoch.
  • an int that stores the nanosecond part.

Using two fields allows a much wider range of time to be represented. The JavaDoc tells us that an Instant can represent times from -1000000000-01-01 to 1000000000-12-31, which is a span of ~2 billion years. However, that comes at the expense of needing both a long and an int, which we can opt to forgo for the simplicity and performance of just using a long.

In addition, using just a long provides the secondary benefit of not requiring an object header. We can see this using the JOL tool.

> java -jar jol-cli-0.17-full.jar internals java.time.Instant

# VM mode: 64 bits
# Compressed references (oops): 3-bit shift
# Compressed class pointers: 0-bit shift and 0x7FEA6B000000 base
# Object alignment: 8 bytes
# ref, bool, byte, char, shrt, int, flt, lng, dbl
# Field sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8
# Array base offsets: 16, 16, 16, 16, 16, 16, 16, 16, 16

Failed to find matching constructor, falling back to class-only introspection.

java.time.Instant object internals:
OFF  SZ   TYPE DESCRIPTION               VALUE
  0   8        (object header: mark)     N/A
  8   4        (object header: class)    N/A
 12   4    int Instant.nanos             N/A
 16   8   long Instant.seconds           N/A
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

The size (SZ) column in the output above shows that the total object header overhead is 12 bytes (8 for the mark word plus 4 for the class pointer). This accounts for half of the 24 bytes required for the Instant. Using just a long, which is 8 bytes, reduces the overall footprint to a third of that.

Native Solutions

At a MiFID II conference held before the regulation came into force, there were varying opinions on how to solve the clock synchronisation problem (making sure that clocks between computers are in sync). Proposed solutions included atomic clocks, GPS, etc.

Once a solution was in place to set the system clock of a computer, the next problem was how do you source that in the Java space? The consensus was to use the native clock_gettime function with CLOCK_REALTIME for applications deployed on Linux:

#include <time.h>

long get_time() {
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
        return ts.tv_sec * 1000000000L + ts.tv_nsec;
    }
    return -1; // error
}

Native Functions from Java

In the past, to invoke a C function from Java, the de facto choice was to use the Java Native Interface (JNI). This was performant and provided an approach that didn't require object creation, making it ideal for electronic trading applications.

In newer versions of Java we have the Foreign Function & Memory API (FFM). After incubating in earlier releases, it was first offered as a preview feature in Java 19 and was finalised in Java 22.

But how does FFM compare with JNI? And can we get away from writing C code and the boilerplate that comes with using JNI?

Let's try to invoke clock_gettime directly...

Based on Java 21, the current LTS (Long Term Support) version at the time of writing. FFM is still a preview API in Java 21, so this needs --enable-preview to compile and run.
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class ForeignFunctionMemoryLinuxClock {

    private static final int CLOCK_REALTIME = 0;
    private static final Linker LINKER = Linker.nativeLinker();
    private static final SymbolLookup LOOKUP = LINKER.defaultLookup();
    private static final Arena ARENA = Arena.global();
    // Backing memory for struct timespec { tv_sec; tv_nsec; }, both 8 bytes on 64-bit Linux.
    // This single segment is shared by all callers, so the class is not thread safe.
    private static final MemorySegment TIMESPEC =
            ARENA.allocate(2 * ValueLayout.JAVA_LONG.byteSize());
    // int clock_gettime(clockid_t clockid, struct timespec *tp)
    private static final MethodHandle CLOCK_GETTIME =
            LINKER.downcallHandle(LOOKUP.find("clock_gettime").orElseThrow(),
                    FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT,
                            ValueLayout.ADDRESS));

    public long getTime() {
        int result;
        try {
            result = (int) CLOCK_GETTIME.invoke(CLOCK_REALTIME, TIMESPEC);
        } catch (Throwable t) {
            throw new IllegalStateException("clock_gettime failed", t);
        }
        // Checked outside the try block so this exception isn't caught and re-wrapped
        if (result != 0) {
            throw new IllegalStateException("clock_gettime failed, result: " + result);
        }
        long seconds = TIMESPEC.get(ValueLayout.JAVA_LONG, 0);
        long nanoseconds = TIMESPEC.get(ValueLayout.JAVA_LONG,
                ValueLayout.JAVA_LONG.byteSize());
        return seconds * 1_000_000_000 + nanoseconds;
    }
}

A quick JMH benchmark shows...

Invoking get_time using JNI vs clock_gettime using FFM (non-thread-safe)
Benchmark                        Mode  Cnt   Score    Error  Units
foreignFunctionMemoryLinuxClock  avgt    3  25.726 ±  0.143  ns/op
jniLinuxClock                    avgt    3  26.125 ±  0.175  ns/op

The FFM implementation seems to be slightly faster than the JNI one. Note that this isn't an apples-to-apples comparison, since we're really doing two separate things:

  • The JNI version is invoking the get_time function we defined above which internally invokes clock_gettime.
  • The FFM version is invoking the clock_gettime directly.

Nevertheless, what's interesting is that FFM performed very well and may therefore be a viable alternative to JNI and its complexities.

The FFM solution shown above, where we directly invoke clock_gettime, isn't ideal. What lets it down is the requirement to have a MemorySegment which gets populated with the seconds and nanoseconds when the call is made. The implication is that the code shown isn't thread safe: two threads invoking the method at the same time share the same segment, so one thread could read values written by the other's call, or a mix of the two. Potential solutions:

  • Use a thread local to allow each thread to have its own MemorySegment - this comes at a cost of a map lookup.
  • Use synchronized / locking - uncontended lock costs can be low, but since we don't know the context in which this would be called, it would be best to avoid this.
  • Create a new instance of the class for each thread/call site - this has challenges and it's difficult to ensure correctness.

Since only the first option here is a contender, let's focus on that... Adding a thread local to the FFM implementation gives us the following stats:

Invoking get_time using JNI vs clock_gettime using FFM (thread-safe)
Benchmark                        Mode  Cnt   Score    Error  Units
foreignFunctionMemoryLinuxClock  avgt    3  30.113 ±  0.145  ns/op
jniLinuxClock                    avgt    3  26.130 ±  0.172  ns/op

As expected, the additional map lookup increases the time taken for each call.
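The thread-local variant isn't listed in this post, so the following is only a sketch of what it might look like and not the code that was benchmarked. It assumes 64-bit Linux and a JDK where FFM is available without preview flags (Java 22 or later). The class name, the use of Arena.ofAuto() and the switch to invokeExact are choices made for this example.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class ThreadLocalLinuxClock {

    private static final int CLOCK_REALTIME = 0; // value on Linux
    private static final long TIMESPEC_BYTES = 2 * ValueLayout.JAVA_LONG.byteSize();

    // int clock_gettime(clockid_t clockid, struct timespec *tp)
    private static final MethodHandle CLOCK_GETTIME;
    static {
        Linker linker = Linker.nativeLinker();
        CLOCK_GETTIME = linker.downcallHandle(
                linker.defaultLookup().find("clock_gettime").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS));
    }

    // One timespec buffer per thread, created lazily the first time that thread asks for the time.
    private static final ThreadLocal<MemorySegment> TIMESPEC = ThreadLocal.withInitial(
            () -> Arena.ofAuto().allocate(TIMESPEC_BYTES, ValueLayout.JAVA_LONG.byteAlignment()));

    static long getTime() {
        MemorySegment timespec = TIMESPEC.get(); // the thread local (map) lookup
        int result;
        try {
            result = (int) CLOCK_GETTIME.invokeExact(CLOCK_REALTIME, timespec);
        } catch (Throwable t) {
            throw new IllegalStateException("clock_gettime invocation failed", t);
        }
        if (result != 0) {
            throw new IllegalStateException("clock_gettime failed, result: " + result);
        }
        long seconds = timespec.get(ValueLayout.JAVA_LONG, 0);
        long nanoseconds = timespec.get(ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG.byteSize());
        return seconds * 1_000_000_000L + nanoseconds;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each thread reads the clock through its own buffer
        Runnable reader = () -> System.out.println(
                Thread.currentThread().getName() + ": " + getTime());
        Thread first = new Thread(reader, "reader-1");
        Thread second = new Thread(reader, "reader-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

The buffer is allocated once per thread, so steady-state calls stay allocation free. The price is the thread local lookup on every call, which is the extra cost discussed above.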

Conclusion

A solution to obtaining high precision time in Java on Linux, which meets regulatory requirements, is to invoke the native clock_gettime function using CLOCK_REALTIME. This can be done using JNI; however, FFM looks like a contender that may be able to replace it. Further experimentation is needed here.

Next steps: we should do an apples-to-apples comparison - FFM can invoke a native function that returns a long without the need to allocate a MemorySegment. Therefore we should profile JNI vs FFM invoking the get_time implementation defined above.

Welcome

· One min read
Chief Executive Officer
Chief Operating Officer
Chief Product Officer
Chief Technology Officer

Welcome to the KeySquare blog.

We'll use this space to provide you with updates about what's new with KeySquare!