What is clockless security and how does it affect CPU design?

Last update: 13/01/2026
Author: Isaac
  • Clockless security relates to asynchronous CPU designs, where dependence on a global clock is reduced or eliminated.
  • Modern processors combine caches, MMUs, parallelism, and multiple threads, which complicates both performance and security.
  • Clockless designs can mitigate timing attacks, but they require new forms of hardware monitoring and auditing.
  • Virtualization, vCPUs, and specialized accelerators expand the attack surface, making it essential to integrate security from the silicon level.


Clockless security sounds like a futuristic concept, but it is actually closely tied to how current processors and systems are designed and protected. To understand it properly, we need to delve into how a CPU works internally, how instruction execution is organized, and what role the famous clock signal plays in setting the pace for the entire system.

In recent decades, processors have been in a race to increase clock frequency, integrate more transistors, and multiply their parallelism. At the same time, designs have emerged that attempt to break free from dependence on the global clock, either across the entire chip or in specific parts. This area, that of asynchronous or clockless designs, opens up very interesting opportunities in terms of power consumption and heat dissipation, and also specific security challenges that are often grouped under the concept of clockless security.

The CPU as the center of the system and its relationship with the clock

When we talk about clockless security, the first thing to remember is exactly what a CPU is. Essentially, the central processing unit is the brain of the computer: the component that interprets and executes program instructions and coordinates memory, input/output, and specialized coprocessors such as GPUs.

Within a modern CPU, we find several distinct blocks. On one hand, there is the arithmetic logic unit (ALU), which is responsible for mathematical and logical operations with integers. Then there are the registers, small, ultra-fast memories where the data the processor is currently working with is stored. And on top of everything, there is a control unit that decides, cycle by cycle, what to do, what to read from memory, and what to write.

Most modern processors are synchronous designs. This means that all of those internal blocks are coordinated by a periodic clock signal, a kind of electronic metronome that sets the pace of execution. Each tick of this clock advances one step of the so-called instruction cycle: the instruction is fetched, decoded, and executed, the results are stored, and the cycle begins again.

In a traditional processor, the clock is generated by an external oscillator that sends millions or billions of pulses per second. The frequency of these pulses, measured in hertz, megahertz, or gigahertz, tells us how many "ticks" the CPU has available each second to move data and perform operations. The higher the clock frequency, the more potential work per second, provided that the rest of the architecture keeps up.

Thus, performance depends not only on the clock, but also on how many instructions per cycle (IPC) the processor is capable of completing. The product of frequency and IPC gives us an idea of the millions of instructions per second it can execute, although theoretical figures are usually much more optimistic than what is actually seen with real programs.
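The frequency-times-IPC estimate mentioned above can be sketched in a few lines. The numbers here are purely illustrative, not measurements of any real chip:

```python
# Back-of-the-envelope throughput estimate: clock frequency x IPC.
# Figures are illustrative, not measurements of a real processor.

def mips(frequency_hz: float, ipc: float) -> float:
    """Millions of instructions per second for a given clock and IPC."""
    return frequency_hz * ipc / 1e6

# A 3 GHz core that sustains 2 instructions per cycle on paper:
peak = mips(3e9, 2.0)    # 6000 MIPS, the theoretical figure
# The same core stalling on memory, with an effective IPC of 0.5:
real = mips(3e9, 0.5)    # 1500 MIPS, closer to what real programs see
print(peak, real)
```

The gap between the two figures is exactly the optimism the text warns about: the clock only sets an upper bound, and memory stalls drag the effective IPC well below the theoretical issue width.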

From fixed wiring to integrated microprocessors

To put clockless designs into context, it's helpful to review how the CPU has evolved. Early electronic computers, such as the ENIAC, were hard-wired, fixed-program machines: to change tasks, the system had to be physically rewired. The revolutionary idea was the stored-program computer, in which instructions reside in memory and the processor simply reads and executes them.

That stored-program architecture, associated with John von Neumann, eventually prevailed. In it, instructions and data share the same memory space, unlike the Harvard architecture, which physically separates the two. Today almost all general-purpose CPUs follow a von Neumann architecture, although many pure or hybrid Harvard processors still exist in the embedded world.

The first processors were built with relays or vacuum tubes. They were bulky, slow, and had very limited reliability. The leap to the solid-state transistor in the 1950s and 1960s made it possible to radically increase speed while reducing power consumption and size. From there, the transition was made from discrete circuits to integrated circuits (ICs), putting more and more transistors onto a single chip.

With the advent of the integrated circuit, first small-scale (SSI), then medium-scale (MSI), large-scale (LSI), and finally very-large-scale (VLSI), the CPU was compressed until it all fit on one or a few chips. This integration culminated in the microprocessor, in which the entire processing unit is manufactured on a single silicon die.

The Intel 4004, released in 1971, was one of the first commercial microprocessors. More powerful designs soon followed, such as the Intel 8080, which became the foundation of personal computers. From that point on, the term CPU was almost always used to refer to these microprocessors.


Key internal components of a modern CPU

Modern CPUs dedicate a huge portion of their silicon area to auxiliary elements designed to get the most out of every clock cycle. For example, almost every processor incorporates several levels of cache: small but very fast memories located near the cores that store copies of the most frequently used data so that the CPU doesn't have to constantly access RAM.


In addition to the L1, L2, and often L3 caches, a complex CPU includes a memory management unit (MMU), which translates virtual addresses (those handled by the operating system) into physical addresses in RAM, manages virtual memory, and provides isolation between processes.

On the computational side we have several specialized execution units: the ALU for integers, the floating-point unit (FPU) for operations on real numbers, address generation units (AGUs) to quickly calculate memory locations, and, in many architectures, vector or SIMD units to operate on multiple data points simultaneously.

There is also a control unit, which can be hardwired logic or microcode-based, that is, an internal program that translates each high-level instruction into a sequence of internal control signals. In many processors, this microcode can be updated, allowing design errors to be corrected or behavior to be adjusted after the fact.

Finally, there is a bank of internal registers: general-purpose registers, accumulators, the program counter, and status registers with flags that indicate things like whether the result of an operation is zero, negative, or has produced an overflow. All of this is coordinated following the classic fetch, decode, and execute loop.

How to run a program step by step

The basic operation of any CPU boils down to fetching instructions from memory and processing them one after another. This happens in three main phases. First comes the fetch stage, in which the instruction whose address is given by the program counter is read from memory.

Next comes the decoding phase. The newly captured instruction passes through a binary decoder that examines its operation code (opcode) and translates that bit pattern into concrete signals that enable or disable parts of the processor. That's where it's decided whether it's an addition, a jump, a load from memory, etc., and which registers or addresses are involved.

Finally, the operation is executed. The ALU or the corresponding unit performs the calculation or data movement, and the result is usually stored in a register or in memory. If the program flow needs to be altered, for example with a conditional jump, the program counter is updated with a new address. This interplay of instructions, data, and jumps is what ends up forming the loops, functions, conditionals, and all the logic of our programs.
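The fetch-decode-execute loop just described can be sketched as a toy stored-program machine. The instruction set, register names, and tuple encoding below are invented for illustration only:

```python
# A toy stored-program machine walking the fetch-decode-execute loop.
# Opcodes, register names and the tuple encoding are invented.

def run(program, registers):
    pc = 0                              # program counter
    while pc < len(program):
        instr = program[pc]             # fetch: read the instruction at PC
        op, *args = instr               # decode: split opcode and operands
        if op == "ADD":                 # execute: dispatch on the opcode
            dst, a, b = args
            registers[dst] = registers[a] + registers[b]
            pc += 1
        elif op == "JNZ":               # conditional jump: rewrite the PC
            reg, target = args
            pc = target if registers[reg] != 0 else pc + 1
        elif op == "HALT":
            break
    return registers

regs = run([("ADD", "r0", "r1", "r2"), ("HALT",)],
           {"r0": 0, "r1": 2, "r2": 3})
print(regs["r0"])  # 5
```

Note how the conditional jump is nothing more than an update of the program counter, exactly the mechanism the paragraph above describes.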

In simple processors, everything happens linearly and sequentially. But in modern CPUs, many of these stages overlap thanks to parallelism techniques. The goal is for each clock cycle to do as much work as possible and for the hardware never to sit idle.

Parallelism, channeling, and out-of-order execution

To avoid wasting clock cycles, designers introduced instruction pipelining: the data path is divided into several stages, similar to an assembly line. While one instruction is being decoded, the next is already being fetched from memory, and yet another may be executing in the ALU.

The problem is that sometimes one instruction needs the result of another that hasn't finished yet. This creates data dependencies and forces the introduction of bubbles, or waits, in the pipeline. To minimize these delays, techniques such as operand forwarding and branch prediction appeared, and later out-of-order execution, in which the processor internally reorders instructions as long as the final result of the program is preserved.
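The cost of those bubbles is easy to quantify with the usual first-order pipeline model: the first instruction takes one cycle per stage to drain, each subsequent instruction retires one cycle later, and every hazard bubble adds a stall cycle. A minimal sketch, with illustrative numbers:

```python
# First-order cycle count for an idealized k-stage pipeline:
# fill time + one cycle per remaining instruction + stall cycles.

def pipeline_cycles(n_instructions: int, stages: int, stalls: int = 0) -> int:
    return stages + (n_instructions - 1) + stalls

# 100 instructions through a 5-stage pipeline with no hazards:
ideal = pipeline_cycles(100, 5)                    # 104 cycles
# The same stream with 20 one-cycle bubbles from data dependencies:
with_bubbles = pipeline_cycles(100, 5, stalls=20)  # 124 cycles
print(ideal, with_bubbles)
```

Forwarding and branch prediction exist precisely to shrink the `stalls` term, and out-of-order execution tries to fill those bubble slots with independent work.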

The next step was the superscalar design: equipping the processor with several execution units of the same type so it can issue multiple instructions per clock cycle, provided there are no conflicts between them. An internal dispatcher analyzes the instruction stream, detects what can be executed in parallel, and distributes the work among the different units.

All these tricks fall under so-called instruction-level parallelism (ILP). The practical limits of these techniques, together with the increasing difficulty of raising clock speeds without significantly increasing power consumption and heat, meant that at a certain point manufacturers also began to invest in task-level parallelism: multiple threads and multiple cores per chip (along with mechanisms such as core parking).

This is how multicore processors and architectures with hardware multithreading were born, where each core can maintain the state of several execution threads and quickly switch between them to make better use of internal resources while some threads wait for data from memory.

The role of clock frequency and its physical limits

Returning to the clock, it's important to note that the signal that synchronizes the processor is, ultimately, an electrical signal that propagates through the chip. As frequencies increase and the number of transistors grows, keeping that signal perfectly aligned everywhere becomes very difficult: clock distribution, phase shifts, and signal integrity problems arise.

On the other hand, each clock transition causes numerous transistors to change state, even if a given area of the processor isn't doing anything useful at that moment. This translates into energy consumption and heat dissipation simply to keep the metronome running. To alleviate this, techniques such as clock gating were introduced, which selectively turns off the clock signal in unused blocks, reducing energy consumption.


However, beyond a certain threshold, increasing the frequency ceases to be reasonable: problems with power consumption, temperature, and clock distribution skyrocket. That bottleneck is one of the reasons the idea of dispensing, totally or partially, with a global clock has been explored: this is where asynchronous or "clockless" designs come into play.

In an asynchronous design, instead of a single clock setting the pace for the entire chip, it is the data and control signals themselves that synchronize the operations. Blocks communicate using request and acknowledgment protocols (handshaking): when data is ready, the producer notifies the consumer, and the consumer reacts without waiting for a fixed clock edge.
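The request/acknowledge exchange can be sketched in software using threads and events in place of wires. This is only an analogy for the hardware protocol: the producer raises a "request" when its data is valid, and the consumer raises an "acknowledge" once it has latched the value, with no clock edge involved:

```python
import threading

# Software analogy of a hardware handshake: REQ says "data valid",
# ACK says "data taken". Each side waits on the other, not on a clock.
request = threading.Event()
acknowledge = threading.Event()
channel = {}

def producer():
    channel["data"] = 42    # drive the data lines
    request.set()           # raise REQ: the data is now valid
    acknowledge.wait()      # wait for ACK before reusing the channel

def consumer(out):
    request.wait()          # wait until REQ signals valid data
    out.append(channel["data"])  # latch the data
    acknowledge.set()       # raise ACK: safe to send the next value

received = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(received,))
t1.start(); t2.start()
t1.join(); t2.join()
print(received)  # [42]
```

The key property mirrored here is that the transfer completes as soon as both sides are ready, however long that takes, rather than on a fixed clock edge.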

Fully asynchronous processors compatible with well-known instruction sets have been built, such as the ARM-based AMULET family or MIPS-derived projects. There are also hybrid designs, where only certain units (for example, a specific ALU) operate without a global clock, while the rest of the processor remains synchronous.

What do we mean by clockless security?

When talking about clockless security, two ideas are mixed: on the one hand, asynchronous design as a technique to reduce power consumption and heat; on the other, the implications of dispensing with the clock when analyzing, monitoring, and protecting the system's behavior against attacks or failures.

In synchronous systems, many security and monitoring tools rely on the existence of a stable and predictable temporal rhythm. It is relatively easy to count cycles, measure how long a certain operation takes, or try to detect anomalous behavior by measuring variations in times that should be constant.

In an asynchronous or partially clockless system, these rigid time references become diluted. The execution time of an operation can depend on the actual availability of data, congestion on certain internal routes, or minor physical variations. From an attacker's perspective, this can make timing-based side-channel attacks harder to mount, because the global clock that serves as a common reference disappears.
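To see why data-dependent timing matters at all, consider the classic software example of a timing leak: a naive byte comparison that exits at the first mismatch, so its running time reveals how many leading bytes a guess got right. The constant-time variant below touches every byte regardless of the data (production code should use a vetted primitive such as Python's `hmac.compare_digest`):

```python
# A naive comparison leaks timing: it stops at the first mismatch,
# so its duration depends on how many leading bytes match the secret.

def naive_compare(secret: bytes, guess: bytes) -> bool:
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False     # early exit: time depends on the data
    return True

# The constant-time variant accumulates differences and never exits
# early, so its duration is independent of where the mismatch occurs.
def constant_time_compare(secret: bytes, guess: bytes) -> bool:
    if len(secret) != len(guess):
        return False
    diff = 0
    for s, g in zip(secret, guess):
        diff |= s ^ g
    return diff == 0
```

A clockless design blurs the attacker's time reference for attacks like this one, but as the following paragraphs note, it blurs the defender's reference too.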

However, this same dynamic nature also complicates matters for anyone wanting to observe and audit the system internally. Many probes and hardware counters are designed to operate based on clock cycles; without a clear global clock, measuring performance and detecting suspicious activity requires other metrics and mechanisms.

Furthermore, the asynchronous design, freed from the clock, allows data paths to be activated at slightly different times in each execution, which potentially randomizes timing leaks. But it could also open other doors, for example in the form of different, more complex energy consumption patterns that could be exploited by power analysis attacks.

Data representation, word size, and security

Another important factor related to CPU architecture is how it represents and handles data. Almost all modern processors use binary representation, with voltage levels corresponding to 0 and 1. The word size (8, 16, 32, 64 bits…) determines the range of integers that can be handled directly and the amount of addressable memory.

From a security standpoint, word size affects the address space and the probability of collisions, overflows, and pointer errors. A 32-bit system with 2^32 possible addresses has very clear limitations compared to a 64-bit system. Furthermore, many modern protection mechanisms, such as certain protected-memory extensions, rely on having a large address space.
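The 2^32 versus 2^64 gap is worth seeing in actual numbers, since it is the headroom that address-space-based defenses depend on:

```python
# Addressable bytes as a function of address width in bits.
def address_space_bytes(bits: int) -> int:
    return 2 ** bits

print(address_space_bytes(32))  # 4294967296 bytes, i.e. 4 GiB
print(address_space_bytes(64))  # 18446744073709551616 bytes, i.e. 16 EiB
```

The jump from 4 GiB to 16 EiB is what lets 64-bit systems scatter mappings far more sparsely, making blind guesses at valid addresses dramatically less likely to succeed.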

The use of an MMU and address translation also introduces an extra layer between the program and physical memory, something crucial for isolating processes, implementing virtual memory, and protecting the kernel. In asynchronous contexts, the coordination between these translations and the handshake signals between clockless blocks must be very well designed to avoid creating security holes or race conditions.

In turn, vector extensions (SIMD) and floating-point units allow working with large volumes of data in parallel. This is a double-edged sword: on the one hand, it accelerates cryptographic algorithms and analysis tasks; on the other, if exploited maliciously, it provides a large computing capacity to break weak ciphers or launch brute-force attacks.

In a clockless or partially asynchronous scenario, the way these parallel computing units are programmed and protected must take into account that execution and consumption patterns no longer follow a fixed rhythm dictated by the clock, but respond to the real dynamics of the data, which also influences the design of countermeasures against side channels.

Massive parallelism, multithreading, and vectors: impact on clockless security

Modern processors aim to increase performance not only by raising the clock speed, but also by running more work in parallel. This involves multiple cores, hardware multithreading, and vector units capable of processing multiple data points per instruction. Added to all this is the rise of specific accelerators such as GPUs, DSPs, and TPUs.

From a security perspective, each new execution block and each new level of parallelism is an additional surface to protect. It becomes necessary to coordinate cache coherence, shared memory management, and mutual exclusion mechanisms, and to avoid race conditions and information leaks between threads or concurrent processes.
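The mutual exclusion requirement mentioned above has a classic minimal illustration: a shared counter updated by several threads. The read-modify-write sequence is not atomic, so without a lock concurrent increments can be lost; the locked version below serializes each update:

```python
import threading

# A shared counter protected by a lock: "value += 1" is a non-atomic
# read-modify-write, so the lock is what prevents lost updates.
class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:     # only one thread updates at a time
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

c = Counter()
threads = [threading.Thread(target=worker, args=(c, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value)  # 40000: no increments lost
```

At the hardware level the same discipline shows up as atomic instructions and cache-coherence protocols; the software lock is just the most visible layer of that stack.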


In clockless or hybrid environments, this coordination relies more on communication protocols between blocks than on global clock cycles. For example, a core might use request and acknowledge signals to access memory or a shared resource, and the effective delay will depend on the actual traffic at that moment, not on a fixed number of cycles.

This behavior, viewed from the outside, hinders certain attacks that rely on very precise time measurements based on clock-cycle counts. But at the same time, security designers have to go beyond cycle counting and rely on event counters, traffic measurement, energy consumption, and other signals to detect suspicious behavior.

That's why many manufacturers integrate hardware performance counters, which allow real-time monitoring of things like cache misses, failed branch predictions, specific memory accesses, and so on. Used correctly, these counters are a powerful tool both for optimizing performance and for finding anomalous patterns characteristic of malware or advanced exploits, even in partially asynchronous architectures.
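As a hypothetical sketch of how counter-based detection might look, the snippet below samples cache-style counters per interval and flags intervals whose miss rate departs sharply from a baseline. The counter names, sample values, and threshold are all invented for illustration; real counters would come from interfaces such as Linux's perf_event subsystem:

```python
# Hypothetical counter-based anomaly detector: flag intervals whose
# cache-miss rate exceeds `factor` times the measured baseline.
# Counter names, thresholds and samples are invented for illustration.

def flag_anomalies(samples, baseline_miss_rate, factor=5.0):
    """Return indices of sampling intervals with an abnormal miss rate."""
    flagged = []
    for i, s in enumerate(samples):
        miss_rate = s["cache_misses"] / max(s["cache_refs"], 1)
        if miss_rate > factor * baseline_miss_rate:
            flagged.append(i)
    return flagged

samples = [
    {"cache_refs": 1_000_000, "cache_misses": 20_000},   # ~2%: normal load
    {"cache_refs": 1_000_000, "cache_misses": 450_000},  # ~45%: cache-probing-like
]
print(flag_anomalies(samples, baseline_miss_rate=0.02))  # [1]
```

The point is the shape of the approach, not the numbers: instead of counting clock cycles, the detector reasons about event ratios, which remain meaningful even when block timing is data-dependent.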

Virtualization, vCPU and isolation in modern environments

Another key ingredient in today's landscape is virtualization. In the cloud, we constantly work with virtual CPUs (vCPUs), which are logical fragments of processing capacity allocated to virtual machines or containers on top of shared physical hardware.

Each vCPU is essentially a set of threads or execution time slices that the hypervisor schedules on the physical cores. For this to work well, the physical CPU offers special privileged modes that allow hypervisors to create and isolate virtual machines, intercept certain sensitive instructions, and manage each guest's memory so that guests cannot interfere with or spy on each other.

In this context, clockless security implies that the allocation of CPU time between virtual machines depends not only on a uniform clock, but also on more dynamic scheduling mechanisms supported by the hardware. The hypervisor still sees clock cycles, but how those cycles are converted into effective work on each core can be altered by internal asynchronous blocks.

From a security standpoint, this calls for monitoring tools that do not simply count ticks, but can also interpret performance counters, usage statistics, and low-level events to detect resource abuse, virtual machine escapes, or irregular patterns that point to an intrusion.

Furthermore, in compute-intensive environments, where vector units, GPUs, and other accelerators are fully utilized, security managers must consider that these blocks, whether synchronous or asynchronous, can become tools for accelerating cryptographic attacks, mining cryptocurrency behind the user's back, or analyzing large volumes of stolen data.

Performance, power consumption, and overclocking versus a design without a clock

Finally, we must consider the relationship between performance and power consumption. Increasing the clock frequency through overclocking (validated, for example, with a stability test in OCCT) allows a CPU to perform more operations per second. However, this significantly increases power consumption and temperature. In fact, many current processors already dynamically adjust their frequency and voltage based on workload and internal temperature.
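The cost of chasing frequency is usually explained with the standard first-order model of dynamic CPU power, P = C · V² · f: because higher clocks typically also require higher voltage, power grows much faster than frequency. The capacitance and voltage figures below are illustrative only:

```python
# First-order dynamic power model: P = C * V^2 * f.
# Capacitance and voltage values are illustrative, not from a real chip.

def dynamic_power(capacitance_f: float, voltage_v: float,
                  frequency_hz: float) -> float:
    return capacitance_f * voltage_v ** 2 * frequency_hz

base = dynamic_power(1e-9, 1.0, 3e9)   # ~3.0 W at 1.0 V and 3 GHz
oc = dynamic_power(1e-9, 1.2, 4e9)     # ~5.76 W: +33% clock, ~+92% power
print(base, oc)
```

The quadratic voltage term is why a modest overclock can nearly double dissipation, and why both clock gating and asynchronous designs attack consumption by avoiding unnecessary switching rather than by lowering frequency alone.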

Asynchronous designs offer an alternative: instead of using a very fast clock and trying to keep everything in phase, they let each block work at the pace dictated by the data. During periods of low load, inactive parts barely change state, reducing consumption without the need for complex clock-based power management mechanisms.

From a security perspective, lower consumption and less heat are not just an environmental issue or a matter of electricity bills. They also mean less stress on the components, a lower likelihood of failures induced by electromigration or current leakage, and potentially less exposure to attacks that attempt to exploit the system's behavior under extreme temperature or voltage conditions.

However, designing a fully asynchronous and secure system is not trivial. It requires very rigorous verification of communication protocols between blocks, race conditions, and intermediate states to prevent non-deterministic behaviors that an attacker could exploit. The complexity of the design, the scarcity of mature tools, and the need for backward compatibility with existing software have meant that, for the time being, most commercial processors remain mostly synchronous with small asynchronous islands.

The combination of all these factors (internal architecture, clock management, parallelism, virtualization, and power) makes security in environments without a global clock a delicate balance. Asynchronous designs mitigate certain timing-based attacks and enable highly refined power-saving strategies, but they also present new challenges for monitoring, auditing, and verifying hardware behavior, so the key lies in integrating robust observability and isolation mechanisms from the silicon itself up to the highest-level software.

Related article:
How to check CPU temperature in Windows 11 without installing any programs