Intel processor flaw could be a virtualisation nightmare
5 January 2018
2018 is off to a very bad start for Intel after the disclosure of a flaw, dubbed Meltdown, deep in the design of its processors. And while the company has publicly said the issue won’t affect consumers, consumers aren’t the ones who need to be worried.
The issue lies in how Intel processors work with page tables, the structures that map virtual memory onto physical memory. It is believed that an attacker could use a technique called speculative execution to observe the contents of privileged memory.
Speculative execution exploit
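Speculative execution is closely tied to a methodology called out-of-order execution (OOE), in which the CPU makes an educated guess about which instructions it will need next, based on the data it has, and executes them ahead of time. If the guess is right, the work is already done; if it is wrong, the results are thrown away. The point is to keep the CPU busy rather than leaving it to burn cycles waiting for an earlier step, such as a slow memory read, to finish. It is all meant to make the CPU as efficient as possible.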
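To make the "educated guess" concrete, here is a minimal Python sketch of one textbook prediction scheme, the two-bit saturating counter. It is an illustrative assumption only: Intel has not published how its predictors work, and real designs are far more elaborate. The class `TwoBitPredictor` and the function `speculate` are hypothetical names; each wrong guess stands for speculative work the CPU has to throw away.

```python
from __future__ import annotations


class TwoBitPredictor:
    """Textbook two-bit saturating counter (illustrative, not Intel's design).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start in "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def speculate(outcomes: list[bool]) -> tuple[int, int]:
    """Return (kept guesses, squashed guesses) for a branch history."""
    predictor = TwoBitPredictor()
    kept = squashed = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            kept += 1      # work done on the guessed path is kept
        else:
            squashed += 1  # wrong path: speculative results are discarded
        predictor.update(taken)
    return kept, squashed


if __name__ == "__main__":
    # A loop branch taken nine times, then falling through, repeated ten times.
    history = ([True] * 9 + [False]) * 10
    print(speculate(history))  # prints (90, 10)
```

The counter only flips its prediction after two wrong guesses in a row, so a single loop exit does not derail the predictions for the next pass through the loop.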
Intel has been mum on how long the problem has been around, but it is believed to date back to its move to 64-bit processors and the Penryn/Merom family of processors in 2006. Intel was first informed of the problem back in June 2017 by Google researchers, and Google kept quiet about it while Intel and the OS vendors addressed the problem. Google has since published its findings.
Fixing the speculative execution flaw
All told, there are three variants of the problem, all of them rooted in speculative execution. The flaws are baked into the silicon, so there is no replacing them, and no BIOS update alone will make them go away; the main line of defence is a fix in the operating system (OS). Linux distros are already rolling out fixes, and Microsoft is expected to introduce one in a future Patch Tuesday release.
A permanent fix, however, means changing the architecture of the CPUs themselves, and how long that may take is open to debate. Jim McGregor of Tirias Research said a design fix could add six to nine months to Intel’s roadmap, while Nathan Brookwood of Insight64 said two to four years. Intel was informed last June, but it is unclear whether it was able to build changes into the chips on its 2018 roadmap.
Normally, the OS kernel is mapped into the same address space as each application to optimise performance when the application makes OS calls: when an application calls the kernel and the kernel returns data, there is no need to switch page tables. The solution is to stop applications from sharing the kernel’s memory space, and that is going to add a lot of overhead to every OS call. The CPU now has to switch from the application’s page tables to the kernel’s, and then back again.
The worst part is that this also has to happen whenever there is an interrupt. What causes an interrupt? Let us start with I/O, such as a disk read or write, or traffic on a network connection. Instead of keeping the OS kernel mapped alongside the application, the CPU has to swap between two sets of page tables, often discarding cached address translations as it goes. It will happen at CPU speeds, which is to say exceptionally fast, but it is still going to impact performance.
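A toy Python model shows why keeping the kernel out of the application’s address space is costly. It is a sketch under loudly stated assumptions: the function `simulate`, the event labels and the page names are hypothetical; the translation cache (TLB) is treated as unlimited; and every page-table switch is assumed to flush it, which is the worst case (processors with the PCID feature can avoid some of those flushes).

```python
from __future__ import annotations


def simulate(events: list[str], isolate_kernel: bool,
             user_pages: list[str], kernel_pages: list[str]) -> tuple[int, int]:
    """Toy model: count page-table switches and translation-cache misses.

    Each event (a syscall or an interrupt) enters the kernel, which touches
    its pages, then returns to the application, which touches its own pages.
    The TLB is modelled as an unbounded set of cached page translations.
    """
    tlb: set[str] = set()
    switches = 0
    misses = 0

    def touch(pages: list[str]) -> None:
        nonlocal misses
        for page in pages:
            if page not in tlb:
                misses += 1    # translation must be looked up in the page tables
                tlb.add(page)

    for _event in events:
        if isolate_kernel:
            switches += 1      # swap to the kernel's page tables...
            tlb.clear()        # ...assumed here to flush cached translations
        touch(kernel_pages)    # kernel handles the syscall or interrupt
        if isolate_kernel:
            switches += 1      # swap back to the application's page tables
            tlb.clear()
        touch(user_pages)      # application resumes
    return switches, misses


if __name__ == "__main__":
    events = ["syscall", "interrupt", "syscall"]
    print(simulate(events, False, ["u0", "u1"], ["k0", "k1"]))  # prints (0, 4)
    print(simulate(events, True, ["u0", "u1"], ["k0", "k1"]))   # prints (6, 12)
```

With a shared address space, translations are looked up once and then reused; with isolation, every trip into and out of the kernel starts cold, so the cost grows with the number of syscalls and interrupts.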
It also impacts any scenarios where the OS and an application talk to each other. Can you think of a more intensive situation than a virtualised server running dozens of VMs, each with its own OS instance, talking to the hardware through the hypervisor? Virtualisation is going to be hit the hardest by this.
How performance is impacted
How much impact? The Register estimates anywhere from 5% to 30%, depending on the task. Phoronix, an open source news site, ran tests of patched Linux systems and put the hit at between 7% and 20% for workloads such as databases, but found virtually no impact on games. One analyst has mentioned anecdotal reports of Amazon Web Services (AWS) slowing down in the past week as the fixes are rolled out, but there is, as yet, little to back that up.
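One way to see why the hit depends so heavily on the task: the extra cost is paid on every switch into and out of the kernel, so workloads that make many OS calls or trigger many interrupts pay far more than those that mostly compute. The sketch below is back-of-the-envelope arithmetic only; the function `estimated_slowdown` is hypothetical, and the rates and per-entry cost are made-up illustrative figures, not measurements.

```python
def estimated_slowdown(kernel_entries_per_sec: float, extra_cost_us: float) -> float:
    """Fraction of each second spent on extra page-table switching.

    Both inputs are hypothetical; real costs vary by CPU, kernel and workload.
    """
    return kernel_entries_per_sec * extra_cost_us / 1_000_000


if __name__ == "__main__":
    # Illustrative figures only, not benchmark results.
    io_heavy = estimated_slowdown(100_000, 1.0)    # e.g. a busy database
    compute_heavy = estimated_slowdown(1_000, 1.0)  # e.g. a game doing little I/O
    print(f"{io_heavy:.1%} vs {compute_heavy:.1%}")  # prints 10.0% vs 0.1%
```

The same per-entry cost produces a hundredfold difference purely because of how often each workload crosses into the kernel.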
Intel said it has not seen any exploits in the wild and that the flaw only allows an attacker to read the contents of memory, not alter them. But that is more than enough. The greatest threat is to multi-tenant scenarios, where multiple AWS or Microsoft Azure customers have their VMs on the same CPU and one customer is able to peek into the contents of another’s VM.
That is completely unacceptable to any customer. But so is a VM slowdown of 20% to 30%. This looks very bad for Intel’s year, and we are just three days into it.
AMD’s limited exposure
And here, perhaps, is the real rub: AMD has minimal if any exposure and said so, despite Intel saying it is at risk. Even though AMD came up with 64-bit extensions, which Intel licenses, the two firms implemented their 64-bit architectures in completely different ways.
The difference is that AMD’s chips do not perform speculative loads when there is the potential for a memory access violation. Intel’s chips do: they let the load run ahead of the permission check. Intel was more aggressive in its use of speculation, and it bit them.
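The difference can be illustrated with a deliberately simplified Python model; it is not exploit code. Everything in it (`ToyCPU`, `check_before_load`, `recover_byte`, the 256 probe slots) is a hypothetical stand-in: real attacks time memory accesses to a probe array rather than inspecting the cache directly, and real processors are vastly more complex. What the model shows is the core idea: a load that runs ahead of the permission check can leave a trace in the cache even though its result is thrown away.

```python
from __future__ import annotations

PROBE_SLOTS = 256  # one probe slot per possible byte value


class ToyCPU:
    """Toy model of the two behaviours described above; not a real exploit.

    check_before_load=True  -> permission checked before any speculative load
                               (AMD-style, as described above)
    check_before_load=False -> the load runs ahead of the permission check
                               (Intel-style, as described above)
    """

    def __init__(self, check_before_load: bool) -> None:
        self.check_before_load = check_before_load
        self.cache: set[int] = set()  # probe slots that are currently cached

    def user_reads_kernel(self, kernel_memory: dict[int, int], address: int) -> None:
        """An unprivileged read of kernel memory, which always faults."""
        if self.check_before_load:
            return  # violation spotted up front: nothing is loaded speculatively
        secret = kernel_memory[address]  # speculative load of privileged data
        self.cache.add(secret)           # a dependent access warms one probe slot
        # The fault then cancels the read and its result is discarded,
        # but the cache footprint left behind is not rolled back.


def recover_byte(cpu: ToyCPU) -> int | None:
    """Stand-in for timing each probe slot: a cached slot would load faster."""
    hot = [slot for slot in range(PROBE_SLOTS) if slot in cpu.cache]
    return hot[0] if len(hot) == 1 else None


if __name__ == "__main__":
    kernel = {0xFFFF0010: 42}  # hypothetical kernel address holding a secret byte
    for cpu in (ToyCPU(check_before_load=False), ToyCPU(check_before_load=True)):
        cpu.user_reads_kernel(kernel, 0xFFFF0010)
        print(recover_byte(cpu))  # prints 42, then None
```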
IDG News Service