What it means to set your clock over cross-country latencies
Have you ever wondered what process allows your phone to get the time after it has been dead for hours, or keeps all your devices reporting consistent times across versions and operating systems? The answer is a commonly used but often ignored protocol called Network Time Protocol (NTP), which was developed to allow consistent timekeeping between network-connected computers. This protocol has grown up with the internet: the first Request for Comments (RFC) was written in 1985, and was itself based on the Time Protocol, which was written in 1983. The latest version of NTP was published in 2010 and keeps computers, phones, and even internet-connected toasters around the world in sync.
Typically, even for servers in a modern data center, using NTP is very simple. It requires just a basic configuration of the Unix NTP daemon (ntpd), or the equivalent on other operating systems. So why would a tech startup today care about a problem that was solved over 30 years ago? Here at Wealthfront we are working at the intersection of finance and software, and our engineering challenges are often uniquely determined by the financial industry or financial regulation. Sometimes that means developing new ideas and technologies, but sometimes it means understanding an old solution and figuring out how to apply it to a new problem.
Last year, the Financial Industry Regulatory Authority (FINRA) passed a rule that states that any computer clocks used to trade securities must be synchronized to within 50 milliseconds of the National Institute of Standards and Technology (NIST) atomic clock. After reading that rule, I was left with a few burning questions: How does NTP really work (how is it even possible to tell time using the internet)? Is 50ms a lot or a little in practice? Is it even theoretically possible to stay within 50ms of the NIST time from any location in the country, given the locations of the eight NIST time servers?
As part of my work to implement a system to keep our time synchronized with NIST I got to answer those questions! This blog post outlines some of the things I learned along the way: the theory behind NTP and an intuition around the 50ms limit. Towards the end of the post I will also describe the time synchronization system we use at Wealthfront. Spoiler alert: it’s still pretty simple since NTP does most of the work for you!
How can you tell time using the internet?
Imagine you need to set your watch. Typically you look at another clock and set your watch to the same time. But what would you do if you didn’t have another clock? Maybe you would text your friend who has a clock. Then when they answered you could set your watch to the time they give you. But what if it took them 30 seconds between getting the time and sending their response? And what if it took 10 seconds between them sending their message and you receiving it? Now your watch is off by 40 seconds!
The algorithm behind NTP is designed to solve that exact problem. Let’s say we know of a server that is a reliable time source (Server B in the example below, NIST servers in reality). In practice there are highly accurate time-keeping methods such as GPS or atomic clocks that are used to determine the time on these servers, and then all other servers (Server A in the example below) can use NTP to synchronize their clocks against the reference servers.
In the following diagram, Server A is requesting time from Server B and Server B is responding with its time to Server A. The diagram shows the order of the messages that are sent between the two servers, with time increasing from top to bottom. Server A sends its message at time t1 and the message is received by Server B at time t2. Server B sends a response at time t3 and the response is received by Server A at time t4. All of those timestamps are recorded so when Server A receives the response from Server B it knows the values of t1 and t4 as reported by its clock, and t2 and t3 as reported by the clock on Server B.
How does knowing those times help us? Let’s assume that the travel time across the network was symmetric between the two servers. In that case, intuitively we know the following, where d is the network delay (how long each message was in flight, otherwise known as half the round trip time (RTT)):
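In the notation of the diagram, this relation can be written as follows. This is the standard NTP round-trip form: the total time Server A waited, minus the time Server B spent processing the request, split evenly between the two directions.

```latex
d = \frac{\mathrm{RTT}}{2} = \frac{(t_4 - t_1) - (t_3 - t_2)}{2}
```

Note that each difference in the numerator uses timestamps from a single clock (t1 and t4 from Server A, t2 and t3 from Server B), so this value does not depend on how the two clocks relate to each other.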
If the two servers were perfectly synchronized then d would also equal the following:
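With synchronized clocks, each one-way trip could be timed directly by subtracting the send time from the receive time, which gives:

```latex
d = t_2 - t_1
\qquad\text{and}\qquad
d = t_4 - t_3
```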
However, we need NTP because the two servers are never perfectly synchronized! The problem with those equations is that you are subtracting times from two different clocks, without knowing how the clocks relate to each other. In an extreme example, Server B could be so far ahead of Server A that t4 - t3 would give you a negative network delay.
To solve that problem let’s define f as the offset between the two servers, or the difference you would see between the times if you could check the two server clocks at the exact same instant. This is actually the value for which we want to solve; since we are assuming the time on Server B is correct, we can fix the clock on Server A using that offset. Now we can correct the above delay equations using the offset:
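Assuming the sign convention that f is Server B's clock minus Server A's clock (so f is positive when Server A is behind), the two one-way delay equations become:

```latex
d = (t_2 - t_1) - f
\qquad\text{and}\qquad
d = (t_4 - t_3) + f
```

In the request, the receive timestamp comes from Server B's clock, so the raw difference is too large by f; in the response, the send timestamp comes from Server B's clock, so the raw difference is too small by f.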
Note that f can be positive or negative depending on whether Server A is behind or ahead of Server B. Either way, it skews the time difference measured for the response in the opposite direction from the one measured for the request. Solving for the offset from those three equations we get:
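Taking f as Server B's clock minus Server A's clock (an assumed sign convention), setting the two corrected one-way delays (t2 - t1) - f and (t4 - t3) + f equal to each other and solving for f gives the standard NTP offset formula:

```latex
(t_2 - t_1) - f = (t_4 - t_3) + f
\quad\Longrightarrow\quad
f = \frac{(t_2 - t_1) + (t_3 - t_4)}{2}
```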
So when Server A receives those four times it can use them to adjust its clock with the offset! Of course the complete algorithm is more complicated in practice, since it must factor in the reality that the delays are never perfectly symmetric and that they can change a lot for the same two servers based on a shifting network topology. So while it’s fairly straightforward to get any given offset, keeping the clock consistent over time requires a good deal of statistical weighting and filtering.
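As a minimal sketch of the single-exchange calculation described above, the following Python computes the offset and one-way delay from the four timestamps. The function name and the example numbers are made up for illustration, and the sign convention (offset = Server B's clock minus Server A's clock) is an assumption; a real NTP client layers filtering and clock discipline on top of this.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from one request/response.

    t1, t4: request-sent and response-received times on Server A's clock.
    t2, t3: request-received and response-sent times on Server B's clock.
    Assumes the network delay is the same in both directions.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # B's clock minus A's clock
    delay = ((t4 - t1) - (t3 - t2)) / 2    # one-way delay (half the RTT)
    return offset, delay


# Hypothetical exchange, all values in milliseconds:
# Server B runs 5000 ms ahead, each message spends 20 ms in flight,
# and Server B takes 10 ms to process the request.
true_offset, one_way, processing = 5000, 20, 10
t1 = 100_000                       # A's clock
t2 = t1 + one_way + true_offset    # B's clock
t3 = t2 + processing               # B's clock
t4 = t3 - true_offset + one_way    # A's clock

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(f"offset = {offset} ms, delay = {delay} ms")
# offset = 5000.0 ms, delay = 20.0 ms
```

Because the example is constructed with perfectly symmetric delays, the calculation recovers the true offset exactly; any asymmetry between the two directions would show up as an error in the estimate.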
How far does 50ms get you?
Going back to our original example of setting your watch gives us some intuition that the further away your time source is, the less accurate your offset will be. When you are setting your watch based on a clock that is right in front of you, your accuracy is based on how quickly you can set the watch after the minute changes. When you are setting your watch based on a text from a friend, your accuracy is based on how good you are at estimating the time it takes for the text to reach you. Basically, your time is only as good as your measure of how quickly information travels between your reference clock and your watch.
Luckily, we already have a measure for how quickly information travels between server clocks: the network delay! Network delay is related to physical distance, so we expect reference servers that are physically closer to give us a more accurate time (e.g. to get a time within 10ms we probably need a closer server than we need to get a time within 50ms). This intuition is formulated in NTP as the “synchronization distance” or the maximum error in the time offset calculation. Let’s define this maximum error as L. Ignoring the accuracy of each time measurement (which will be several orders of magnitude smaller than the network delay), in practice we can write the synchronization distance as:
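In NTP the synchronization distance is half the round-trip delay plus a dispersion term for measurement accuracy. Dropping the dispersion term as described, and taking d to be the one-way delay (half the round trip time), this simplifies to:

```latex
L \approx \frac{\mathrm{RTT}}{2} = d
```

In other words, the worst case is that the entire round trip was spent in one direction, which would make the offset estimate wrong by half the RTT.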
The next interesting question we can ask is: how close does our reference clock have to be to guarantee that we can stay within 50ms of its time? Let’s assume once again that we have a symmetric network delay and use speed = distance/time to solve for the max distance (M) our reference server can be to have a max error of L:
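Writing s for the speed at which information travels across the network (a symbol introduced here for illustration) and taking the maximum error L to equal the one-way delay, speed = distance/time rearranges to:

```latex
M = s \cdot L
```

For example, with L = 50 ms and s equal to the speed of light in a vacuum (about 300,000 km/s), M is about 15,000 km one way.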
If information could travel between two servers at the theoretical maximum (speed of light) we could use servers with a round trip of up to roughly 30,000 kilometers (about 18,600 miles)! With realistic internet speeds, however, we can use servers a few thousand miles away at most. Even so, those distances do mean that it is possible to stay within 50ms of the NIST time from any location in the continental US, given the locations of the eight NIST time servers.
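The back-of-the-envelope numbers above can be reproduced with a short Python sketch. The function name is hypothetical, and the two-thirds factor for light in optical fiber is a common rule of thumb used here as an assumption; real network paths add routing and queuing delays, so the practical range is shorter than either figure.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_FRACTION = 2 / 3              # assumption: light in fiber travels at ~2/3 c


def max_one_way_distance_km(max_error_s, speed_km_s):
    """Farthest a reference server can be if the maximum error equals
    the one-way network delay (distance = speed * time)."""
    return speed_km_s * max_error_s


limit_s = 0.050  # the 50ms limit

vacuum_km = max_one_way_distance_km(limit_s, SPEED_OF_LIGHT_KM_S)
fiber_km = max_one_way_distance_km(limit_s, SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)

print(f"vacuum: {vacuum_km:,.0f} km one way, {2 * vacuum_km:,.0f} km round trip")
print(f"fiber (upper bound): {fiber_km:,.0f} km one way")
```

The vacuum figure corresponds to the roughly 30,000 km round trip quoted above; the fiber figure is still an optimistic bound, since it assumes a straight cable and no time spent in routers.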
What else do we need?
Back-of-the-envelope calculations like those outlined above helped us here at Wealthfront to understand which of the NIST time servers would be viable choices for keeping our clock within the 50ms limit. Our full time synchronization solution involves using NTP to calculate our offset against multiple in-range NIST servers. This redundancy allows us to prevent temporary connectivity issues between us and any one server from causing a synchronization problem. No engineering solution at Wealthfront would be complete without automated monitoring so we also built checks to alert our brokerage operations team if we are not within 50ms of NIST. However, thanks to the robustness of NTP and the built in redundancy in our solution, we haven’t had any unexpected synchronization issues since we implemented this process!
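To make the monitoring idea concrete, here is a minimal sketch of what such a check could look like. It is not Wealthfront's actual implementation: the function name, the server labels, and the choice to combine offsets with a median are all assumptions made for illustration, and the offsets are assumed to have been measured elsewhere (for example, by an NTP client).

```python
from statistics import median

LIMIT_MS = 50.0  # the 50ms limit discussed above


def check_sync(offsets_ms, limit_ms=LIMIT_MS):
    """Return (ok, message) given measured offsets per reference server.

    offsets_ms maps a server label to its measured offset in milliseconds,
    or to None if that server could not be reached.
    """
    reachable = [o for o in offsets_ms.values() if o is not None]
    if not reachable:
        return False, "no reference servers reachable"
    # Assumption: use the median so one bad measurement cannot trigger
    # (or hide) an alert on its own.
    combined = median(reachable)
    if abs(combined) > limit_ms:
        return False, f"offset {combined:+.1f} ms exceeds {limit_ms} ms limit"
    return True, f"offset {combined:+.1f} ms within {limit_ms} ms limit"


# Hypothetical measurements: one server is temporarily unreachable.
ok, message = check_sync({"server-a": 3.0, "server-b": None, "server-c": -4.0})
print(ok, message)
```

Skipping unreachable servers instead of failing outright mirrors the redundancy described above: a temporary connectivity issue with any one server should not by itself cause an alert.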
If thinking about this type of problem is interesting to you maybe it’s time you also thought about working at Wealthfront!
References and Further Reading:
- First NTP RFC
- Time Protocol RFC
- Current NTP RFC
- FINRA rule on clock synchronization
- Locations of NIST time servers