The importance of time
Very few readers will know the name Clyde Coleman. You have to be an expert on cars to know that Coleman developed the electric starter for passenger cars in 1899. The starter replaced the crank handle, which until then had to be inserted below the radiator and turned until the engine started.
This was a first step from mechanical to electric operation and, above all, towards more user-friendliness and usability in the vehicle. Today, the starter is an aid that you hardly notice and that works as a matter of course in the background when you press the start/stop button. Only when the engine does not start and the vehicle breaks down do we notice this basic technology.
A comparable basic technology in IT is the Network Time Protocol (NTP), a supporting service that synchronises the clocks of computer systems over packet-based communication networks and goes unnoticed in everyday IT operations. As long as it functions stably, it plays a minor role in IT departments. Consequently, know-how about it is often underdeveloped or missing altogether. But NTP can trigger a full-blown crisis if it is not configured and used properly. In extreme cases, a complex IT or cloud infrastructure can fail, and all services and applications running on it are swept away into digital nirvana.
The majority of IT systems cannot be operated without time synchronisation. One example is Active Directory, which uses time stamps to decide which data is the most current in the event of replication conflicts. More critical is the use of system time in various authentication mechanisms (for example Kerberos). External tokens (RSA hardware tokens, mobile apps such as Microsoft/Google Authenticator) use either their own time measurement or the time of the mobile network to enforce the defined validity limits of the tokens. Here, time deviations of more than 60 seconds between different instances can, in extreme cases, cause the login process to fail so that users cannot log in to their applications. From an audit perspective (and of course also from an operational and security perspective), an essential time-relevant aspect is the logging of information for possible evaluations. These logs are all time-based. If their time stamps are not valid and reliable, the logs can be unceremoniously kicked into the recycle bin.
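To illustrate why clock deviations break token-based logins, here is a minimal sketch of a time-based one-time password check along the lines of RFC 6238. The 30-second step is the RFC default; the function names and the acceptance window of one step in either direction are assumptions for illustration, not the behaviour of any particular product.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (HMAC-SHA1 variant of RFC 6238)."""
    counter = int(unix_time // step)  # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def accepted(secret, client_time, server_time, window=1, step=30, digits=6):
    """Hypothetical server-side check: accept codes within +/- `window` steps."""
    submitted = totp(secret, client_time, step, digits)
    return any(
        submitted == totp(secret, server_time + k * step, step, digits)
        for k in range(-window, window + 1)
    )
```

With these assumed settings, a client whose clock is 90 seconds off computes its code for a time step that the server no longer considers, so the login is rejected even though the secret is correct.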
In this article, we want to shed light on the topic of time in modern IT infrastructures and show the background and effects of NTP as well as give tips on how to check the NTP configuration in a risk-oriented manner and avoid failure escalation.
The right time
To do this, we first have to look at how the "right" time is determined. Timekeeping has always played a critical role in information processing. Since the beginning of the IT era, every physical server and PC has contained a hardware clock backed by a battery to bridge power failures and restore the correct time on restart. A weak CMOS battery could therefore lead to an incorrect time on the device.
Today's virtual servers measure the time independently with the help of the CPU clock and without additional internal hardware. The effective clock can fluctuate with the load of a server or deviate from the real time. For example, with current systems, large time jumps (> 1 minute) can occur during a reboot process. In addition, virtual machines can also be completely paused or slowed down.
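A time jump of this kind can be made visible by comparing the wall clock with a monotonic clock. The following is a minimal sketch; the function names are our own, and whether the monotonic clock keeps running while a virtual machine is paused depends on the platform, so treat the result as an indicator rather than a measurement.

```python
import time


def take_sample():
    """Return the current wall-clock time and monotonic time as a pair."""
    return time.time(), time.monotonic()


def wall_clock_step(before, after):
    """Return how far the wall clock moved beyond the elapsed monotonic time.

    A value close to zero means both clocks advanced together. A large
    positive or negative value means the wall clock was stepped in between.
    """
    wall_before, mono_before = before
    wall_after, mono_after = after
    return (wall_after - wall_before) - (mono_after - mono_before)
```

Taking one sample before and one after a reboot-free maintenance action (for example a live migration) and passing both to `wall_clock_step` shows whether the system time was stepped in the meantime.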
NTP is a powerful support tool for keeping computer clocks stable and synchronised. Because NTP is universally available and is a proven, stable protocol, internal timekeeping plays a subordinate role in the design of modern IT infrastructures.
What is NTP exactly?
NTP is a standard for synchronising the timing in computer systems with external high-precision timers and was developed by David L. Mills in 1985 to provide reliable, externally fed time synchronisation across various IT devices and networks. In common usage, NTP refers to both the protocol and the software reference implementation of it.
Time synchronisation is achieved through a hierarchical structure of public servers that synchronise their times with each other. The highest level is fed by precise time sources (atomic clocks, GPS receivers). Other time sources include servers from universities, government institutions, companies and private individuals. Currently, well over 100,000 NTP nodes are operated worldwide.
Since deviations can also occur here, especially due to latency or transit times in the network, NTP implements correction algorithms, which we will not go into here for the sake of clarity. However, it is important to understand that NTP does not work like DNS, where the first server is tried and the second is used only if the first is unavailable. Instead, NTP evaluates all configured servers together and derives a combined time from them. For robustness, at least four sources that are as different as possible are required.
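For readers who want a feel for the arithmetic, the sketch below shows the standard calculation of clock offset and round-trip delay from the four timestamps of one NTP exchange, and why several sources help. The median used here is a deliberately simplified stand-in for NTP's actual selection and combining algorithms, and the sample offsets are invented for illustration.

```python
from statistics import mean, median


def offset_and_delay(t1, t2, t3, t4):
    """Offset and round-trip delay from one client/server exchange.

    t1: client sends request     (client clock)
    t2: server receives request  (server clock)
    t3: server sends reply       (server clock)
    t4: client receives reply    (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Invented offsets in seconds from four sources; the last one is off by an hour.
offsets = [0.002, -0.001, 0.003, 3600.0]

naive_average = mean(offsets)      # dragged far away by the faulty source
robust_estimate = median(offsets)  # stays close to the three good sources
```

With only one or two sources, a faulty server cannot be outvoted in this way, which is the intuition behind the recommendation of at least four independent sources.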
For the correct implementation of NTP in a complex IT infrastructure, it must be defined who sets the "correct" time, who adopts it, and against which source a deviating clock is brought back to the "correct" time. NTP is usually integrated into the internal services, mostly as part of the DNS and domain controller design. To simplify, it can be said that modern servers today offer all the prerequisites for correct NTP integration, regardless of the operating system.
Surprisingly, however, best practices are stuck in the past and do not take cloud and virtualisation sufficiently into account. A review of Microsoft TechNet reveals contradictory and unclear recommendations (for example, on virtualised domain controllers as time sources). Furthermore, the NTP recommendation of at least four time sources is hardly implemented in any company network today, although this would easily be possible. Many companies trust only internal NTP servers and thus massively reduce robustness.
What can be done in the context of an audit?
For IT managers, it makes sense to implement the NTP reference recommendation when designing the company's internal IT infrastructures. What should be considered when implementing NTP and what pitfalls should or can be avoided? In the following, we have described some aspects for responsible persons and IT auditors on how to carry out a quick check and identify potential sources of error, e.g. in a cloud infrastructure. This is of course not exhaustive and may have further dimensions depending on the environment.
1. Assess the implementation of the NTP reference design in the architecture and validate the server-client hierarchy for NTP.
If the server-client structure is faulty, rapid propagation of the wrong system time to domain controllers and clients can be the result, especially if the NTP recommendations on four independent sources are not implemented. Running what-if scenarios shows possible effects.
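A first, quick check of the four-source recommendation can be automated. The sketch below counts the time sources in a configuration text that uses the `server`, `pool` and `peer` directives known from ntpd and chrony. The host names are placeholders, the function names are our own, and the check says nothing about whether the sources are truly independent of each other.

```python
SOURCE_DIRECTIVES = ("server", "pool", "peer")

# Hypothetical configuration excerpt; the host names are placeholders.
EXAMPLE_CONFIG = """
# internal time sources
server ntp1.example.com iburst
server ntp2.example.com iburst
# server ntp-old.example.com
server ntp3.example.com iburst
"""


def configured_sources(config_text):
    """Return the distinct source addresses named in the configuration."""
    sources = []
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] in SOURCE_DIRECTIVES:
            if fields[1] not in sources:
                sources.append(fields[1])
    return sources


def has_enough_sources(config_text, minimum=4):
    """True if at least `minimum` distinct sources are configured."""
    return len(configured_sources(config_text)) >= minimum
```

Note that a single `pool` entry usually resolves to several servers, so this simple count is conservative for pool-based configurations.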
2. External or internal timers
How is the time signal provided to the organisation's own infrastructure? Are internal time sources connected in addition to external ones? Using, for example, a physical GPS receiver in the infrastructure as a secure time source can provide additional security. This adds complexity to the infrastructure, but it also makes the infrastructure independent of external time sources.
3. Feeding the external time signal
How is an external time signal fed into the IT infrastructure? By default, NTP uses UDP port 123 for this. What happens if port 123 is blocked at the network level? And what happens if the upstream provider or an external attacker blocks port 123?
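Whether a time source can actually be reached on port 123 can be tested with a simple probe. The following is a minimal sketch of an SNTP-style client request over UDP; the function names are our own, the host to query is left to the reader, and a `None` result only tells you that no reply arrived within the timeout, not why.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """48-byte client request: LI=0, version 3, mode 3 (client) -> 0x1B."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix time in seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2 ** 32


def probe(host, port=123, timeout=2.0):
    """Return the server time, or None if no reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, port))
            packet, _ = sock.recvfrom(512)
        except socket.timeout:
            return None  # e.g. port 123 is filtered somewhere on the path
    return parse_transmit_time(packet)
```

Running `probe` from different network segments against each configured source gives a quick picture of where port 123 is open and where it is silently dropped.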
4. Checking the Hyper-V settings
The virtualisation layer has its own complexity. Virtual servers use their own rules to determine the correct time, so attention should be paid to time synchronisation e.g. in Hyper-V and Hyper-V parameterisation. In the default configuration, Hyper-V time is imposed on all virtual servers. By default, the following order of precedence for determining the reference time is found in the configuration setting in Hyper-V:
- Registry entries (e.g. VMICTimeProvider) on the host
- Hyper-V Time Synchronization setting in the Hyper-V console
- NTP registry entries on the client
- NTP setting in the client system settings
- Time from the domain controller hierarchy
- Hypervisor emulated clock
Although this setting corresponds to a Microsoft best practice recommendation, it is not recommended for service providers, for example. This can be particularly relevant when service providers provide a cloud infrastructure to service recipients who want to use their own NTP servers for log consistency reasons.
In the NTP context, virtualised domain controllers can prove fatal, as they are the NTP source for all machines connected to the domain.
5. Completeness of the operating manuals
Is the use and deployment of NTP described and regulated in the operating manuals? Is appropriate information on the service available to staff?
6. Monitoring
Is the monitoring designed to detect time deviations, misbehaviour and events related to the NTP service, and does the classification of events reflect the increased importance of NTP? Classifying time deviations or correlated effects merely as information or warning events does not reflect the criticality of the service.
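How such a classification might look can be sketched in a few lines. The thresholds below are assumptions for illustration only (the 60-second value echoes the login example earlier in this article) and need to be set according to the requirements of the services in the respective environment.

```python
def classify_offset(offset_seconds, warning_at=1.0, critical_at=60.0):
    """Map a measured clock offset to a monitoring severity.

    The default thresholds are hypothetical and only illustrate the idea
    that large deviations deserve more than an informational event.
    """
    magnitude = abs(offset_seconds)  # direction of the deviation is irrelevant
    if magnitude >= critical_at:
        return "critical"
    if magnitude >= warning_at:
        return "warning"
    return "ok"
```

The point of the sketch is the escalation path: a deviation that can break logins or invalidate logs should raise an event that somebody has to act on.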
Even if the elements shown cannot be applied to every complex IT infrastructure and, above all (disclaimer!), are not complete, they do give a few hints on how to make a quick assessment of the NTP configuration. Assessing these elements gives quick feedback on whether the implementation meets the basic requirements or whether there is a risk.
You may therefore want to include the topic of time in your next audit plan and assess the risk and impact of failures. Our tech-driven audits division will gladly support you in planning or conducting IT audits.
- Complex IT infrastructures have a critical dependency on correct timekeeping
- Deviations from the correct time and asynchrony can lead to malfunctions and failures
- An assessment of the correct time reference within an IT infrastructure is possible for Internal Audit with "on board" means