Speculative Execution, SPECTRE, and Why You Should Care

Abhinav Agarwal
3 min read · Dec 21, 2019


The views presented here are solely the author’s personal opinions, research, and observations, and are in no way affiliated with any organization.

Processors today are reaching the limits of clock frequency. There was a time when CPU frequency increased dramatically year after year: in less than two decades we went from a few MHz in the 1990s to a few GHz. However, clock speed has stopped increasing. Moore’s law still lets transistors shrink, allowing us to put more of them in the same area, but with the end of Dennard scaling, the power consumption of these transistors is no longer dropping at the same pace. It is becoming increasingly infeasible to cool chips at higher operating frequencies. Even Moore’s law is about to hit its limit: once transistor sizes shrink below roughly 5 nm, quantum tunneling kicks in, which creates a whole new level of complexity.

Research focus has therefore shifted toward parallelism: the number of cores per chip is increasing, Instruction Level Parallelism (ILP) is improving, CPU pipelines are getting more sophisticated, and cache hierarchies are getting deeper.

Figure: Sandy Bridge microarchitecture

A modern superscalar processor has multiple execution ports, each feeding a deep pipeline. Since each part of the chip performs a different kind of operation, much of the chip would otherwise sit idle. To alleviate this, modern processors deploy tricks like out-of-order execution, Simultaneous Multi-Threading (SMT), and speculative execution. These techniques allow the CPU to extract more ILP: a single core can be executing several loads, stores, additions, etc. simultaneously.

Enter Speculative Execution

Speculative execution is a technique in which the CPU executes instructions before it knows for certain that they will be needed. A common example is prefetching, where the CPU speculatively fetches cache lines to exploit the spatial locality of accesses. Another big obstacle to higher ILP is branches and data dependencies. The CPU tries to predict the outcome of a branch ahead of time and starts executing down the predicted path; if the prediction later turns out to be wrong, the speculatively executed instructions are rolled back.

Speculative execution does not affect the architectural state of the CPU (registers and memory), but it does affect microarchitectural state such as the contents of the Translation Lookaside Buffers (TLBs) and caches. Consider the following harmless-looking piece of code:
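
The snippet itself did not survive in this copy of the post, so the sketch below is a minimal reconstruction in C, modeled on the canonical Spectre variant-1 (bounds-check-bypass) gadget. The names foo, array1, array2, and temp, as well as the array sizes, are illustrative assumptions rather than the original author’s code.

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[1024];          /* the bounds-checked array                      */
    uint8_t array2[256 * 4096];    /* probe array that acts as the covert channel   */
    uint8_t temp;                  /* keeps the loads from being optimized away     */

    void foo(size_t x)
    {
        if (x < 1024) {                     /* (1) the bounds check                      */
            uint8_t value = array1[x];      /* (2) out-of-bounds read under speculation  */
            temp &= array2[value * 4096];   /* (3) cache footprint keyed on the value    */
        }
    }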

Function foo accesses the array only after a proper bounds check, so this should be safe, right? Say we first call foo with valid indexes, so the branch predictor is trained to predict (1) as true. Now, if we invoke foo with an invalid index (≥ 1024), speculative execution kicks in and can bring data into the cache that would ordinarily be inaccessible. When the branch is finally evaluated, the CPU register state is rolled back (the prediction was incorrect), but the cache state remains changed. This is the gist of SPECTRE. This type of bounds-check-bypass vulnerability can be mitigated by inserting a barrier (a serializing instruction) such as LFENCE or CPUID on x86. Such serializing instructions ensure that no later instruction executes, even speculatively, until all prior instructions have completed.
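
As a rough illustration of that mitigation, here is a sketch that reuses the hypothetical foo, array1, array2, and temp from the snippet above and inserts the barrier via the _mm_lfence() intrinsic from <immintrin.h> right after the bounds check:

    #include <stddef.h>
    #include <stdint.h>
    #include <immintrin.h>         /* _mm_lfence() */

    void foo_mitigated(size_t x)
    {
        if (x < 1024) {
            _mm_lfence();                   /* serializing barrier: the loads below cannot   */
                                            /* execute, even speculatively, until the bounds */
                                            /* check above has resolved                      */
            uint8_t value = array1[x];
            temp &= array2[value * 4096];
        }
    }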

JIT engines used for JavaScript were found to be vulnerable to SPECTRE: a website can read data the browser stores for another website, or the browser’s own memory. Most modern Intel CPUs are affected. Various patches have been released for the Linux kernel to mitigate these exploits, such as restricting speculative execution and kernel page table isolation. The performance hit from these mitigations has been up to 30% for some workloads.
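
The original post does not show how to inspect these mitigations, but reasonably recent kernels (roughly 4.15 onward) report per-vulnerability status under /sys/devices/system/cpu/vulnerabilities/. The following minimal C sketch prints that status; the sysfs path and its one-line-per-file format are assumptions based on that kernel interface, not something from the article:

    #include <stdio.h>
    #include <dirent.h>

    int main(void)
    {
        const char *dir = "/sys/devices/system/cpu/vulnerabilities";
        DIR *d = opendir(dir);
        if (!d) {
            perror(dir);                   /* e.g. the kernel is too old to expose this */
            return 1;
        }
        struct dirent *entry;
        while ((entry = readdir(d)) != NULL) {
            if (entry->d_name[0] == '.')
                continue;                  /* skip "." and ".." */
            char path[512];
            char status[256];
            snprintf(path, sizeof(path), "%s/%s", dir, entry->d_name);
            FILE *f = fopen(path, "r");
            if (f && fgets(status, sizeof(status), f))
                printf("%-16s %s", entry->d_name, status);   /* status ends with '\n' */
            if (f)
                fclose(f);
        }
        closedir(d);
        return 0;
    }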

Phoronix tests across various Intel platforms, including Broadwell, Skylake, and Skylake-X, show roughly 15% performance regression with the Spectre/Meltdown patches applied.

Reducing the performance penalty

For systems that are otherwise well protected and where performance is critical, these protection mechanisms can be disabled in part or in whole. This Red Hat article describes how to disable the patches to avoid the performance penalty.
