The bug allows attackers to swipe data from a CPU’s registers. […] the exploit doesn’t require physical hardware access and can be triggered by loading JavaScript on a malicious website.
What are the rules on responsible disclosure? Shouldn’t they have waited until patches were ready before publicly disclosing the exploit?
I mean, this was disclosed to AMD a few months back and there actually is a patch available currently for Epyc CPUs.
It’d be nice if they waited until all the patches were out, but I’d rather this than a full zero-day exploit of this scale in the wild.
It’s very weird it takes them so long to fix this for consumers tbh. You’d think they could just take the snippet from Epyc and patch it into AGESA, since it’s exactly the same architecture. December is hardly acceptable for a critical vulnerability like this.
This is a great opportunity to remind people that NoScript, HTTPS-only modes and filter lists for malicious websites (to use in your adblock of choice) exist. Use them.
Unless it was already exposed as a zero day, in which case they need to publicize the problem immediately and provide a timeline.
I’m curious - does this kind of report make people less likely to go with an AMD cpu? The last time I was thinking about building a new pc, AMD had just definitively taken the lead in speed per dollar, and I would have gone with one of the higher end chips. I’m not sure whether this would have affected my decision, but I’d probably be concerned with performance degradation as well as the security issue. I’d have waited for the patch to buy a system with updated firmware, but I’d still want to see what the impact was as well as learn more about the exploit and whether there were additional issues.
I ended up just getting a steam deck and all of my other computers are macs, so it’s hard to put myself back into the builder’s/buyer’s headspace.
does this kind of report make people less likely to go with an AMD cpu?
I doubt it, since Intel has its share of similar CPU security issues. For several generations the understanding has been that Intel’s CPUs arrive with impressive performance on day 1, then gradually leak that performance away as security issues are patched over subsequent months.
Thanks! That’s exactly the kind of insight I was hoping for.
Honestly no. Remember the Spectre & Meltdown vulnerabilities back in 2018? Yeah, for that security bug that only affected Intel CPUs, until it was patched they seriously told consumers and enterprise customers to “please turn off hyperthreading” to prevent exposure. Fucking LOL. Voluntarily cut my CPU performance in half!? Based on a theoretical exploit that was only found in a very specific and controlled environment before everyone started FREAKING out?
Neither spectre nor meltdown is specific to Intel. They may have been discovered on Intel hardware but the same attacks work against any system with branch prediction or load speculation. The security flaw is inherent to those techniques. We can mitigate them with better address space separation and address layout randomization. That is, we can prevent one process from reading another process’s data (which was possible with the original attacks), but we can’t guarantee a way to prevent a malicious browser tab from reading data from a different tab (for example), even if they are both sandboxed. We also have some pretty cool ways to detect it using on-chip neural networks, which is a very fancy mitigation. Once it’s detected, a countermeasure can start screwing with the side channel to prevent leakage at a temporary performance cost.
Also, disabling hyper threading won’t cut your performance in half. If the programs that are running can keep the processor backend saturated, it wouldn’t make any noticeable difference. Most programs can only maintain about 70-80% saturation, and hyper threading fills in the gaps. However the result is that intensive, inherently parallelizable programs are actually penalized by hyper threading, which is why you occasionally see advice to disable it from people who are trying to squeeze performance out of gaming systems. For someone maintaining a server with critically sensitive data, that was probably good advice. For your home PC, which is low risk… you’re probably not worried about exposure in the first place. If you have a Linux computer you can probably even disable the default mitigations if you wanted.
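For anyone who wants to see which mitigations their Linux kernel currently applies before deciding whether to turn any off, here is a minimal Python sketch. It assumes the standard sysfs directory /sys/devices/system/cpu/vulnerabilities is present; the function name read_mitigation_status is made up for illustration. It only reads status and changes nothing.

```python
from pathlib import Path

# Standard Linux sysfs location; absent on other operating systems.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_line} for each file in vuln_dir.

    Returns an empty dict if the directory does not exist.
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    # Print one line per known vulnerability, e.g. its mitigation state.
    for name, state in read_mitigation_status().items():
        print(f"{name:28s} {state}")
```

Actually disabling the defaults is usually done with the mitigations=off kernel boot parameter, which is a separate step and only sensible on a machine where the risk really is low.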
Well TIL!!
I’ve been out of the builder world for long enough that I didn’t follow the 2018 bug. I’m more from the F00F generation in any case. I also took a VLSI course somewhere in the mid-90s that convinced me to do anything other than design chips. I seem to remember something else from that era, a firmware-based security bug that I want to say was browser-related, but it wasn’t the CPU iirc.
In any case, I get the point you and others are making about evaluating the risks of a security flaw before taking steps that might hurt performance or worrying about it too much.
I’m curious - does this kind of report make people less likely to go with an AMD cpu?
For me, nah. This is well within the vein of “normal” problems for a CPU these days (neither AMD nor Intel seem to be able to avoid this sort of thing 100%)… and this particular issue seems to be fixed in hardware already for their Zen 3 chips (Nov 2020-Sept 2022) and Zen 4 chips (Sept 2022 - Present).
I think the mitigations are acceptable, but for people who don’t want to worry about that, yes, it could put them off choosing AMD.
To reiterate what Tavis Ormandy (who found the bug) and other hardware engineers/enthusiasts say, getting these things right is very hard. Modern CPUs apply tons of tricks and techniques to go fast, and some of them are so beneficial that we accept that they lead to security risks (see Spectre and Hertzbleed for example). We can fully disable those features if needed, but the performance cost can be extreme. In this case, the cost is not so huge.
Plus, even if someone were to attack your home computer specifically, they’d have to know how to interpret the garbage data that they are reading. Sure, there might be an encryption key in there, but they’d have to know where (and when) to look*. Indeed, mitigations for attacks like spectre and hertzbleed typically include address space randomization, so that an attacker can’t know exactly where to look.
With Zenbleed, the problem is caused by something relatively simple, which amounts to a use-after-free of an internal processor resource. The recommended mitigation at the moment is to set a “chicken bit,” which makes the processor “chicken out” of the optimization that allocates that resource in the first place. I took a look at one of AMD’s manuals and I’d guess for most code, setting the chicken bit will have almost no impact. For some floating-point heavy code, it could potentially be major, but not catastrophic. I’m simplifying by ignoring the specifics but the concept is actually entirely accurate.
* If they are attacking a specific encrypted channel, they can just try every value they read, but this requires the attack to be targeted at you specifically. This is obviously more important for server maintainers than for someone buying a processor for their new gaming PC.
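To make the chicken-bit mitigation above a bit more concrete, here is a rough Python sketch of the bit arithmetic involved. The MSR address (0xC0011029, DE_CFG) and bit position (9) come from the public Zenbleed write-up rather than from anything in this thread, so treat them as assumptions and check AMD’s own guidance. The helper names are made up for illustration. The sketch only reads the register and never writes it, and the read needs root plus the Linux msr kernel module.

```python
import os
import struct

# Assumed values from the public Zenbleed write-up, not from this thread.
DE_CFG_MSR = 0xC0011029
CHICKEN_BIT = 9


def with_chicken_bit(msr_value, bit=CHICKEN_BIT):
    """Return msr_value with the given bit set; other bits are untouched."""
    return msr_value | (1 << bit)


def has_chicken_bit(msr_value, bit=CHICKEN_BIT):
    """Return True if the given bit is already set in msr_value."""
    return bool(msr_value & (1 << bit))


def read_msr(msr, cpu=0):
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (Linux, root, msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        current = read_msr(DE_CFG_MSR)
    except OSError as exc:
        print(f"could not read MSR: {exc}")
    else:
        print(f"current value:   {current:#x}")
        print(f"bit already set: {has_chicken_bit(current)}")
        print(f"value to write:  {with_chicken_bit(current):#x}")
```

Setting the bit is a read-modify-write so that every other configuration bit in the register keeps its value. A microcode or firmware update is the proper fix; this is only the stopgap.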
For some floating-point heavy code, it could potentially be major, but not catastrophic.
That’s a really interesting point (no pun intended)
I had run into a few situations where a particular computer architecture (eg, the Pentiums for a time) had issues with floating point errors and I remember thinking about them largely the same way. It wasn’t until later that I started working in complexity theory, by which time I completely forgot about those issues.
One of the earliest discoveries in what would eventually become chaos and complexity theory was the butterfly effect. Edward Lorenz was doing weather modeling back in the 60s. The calculations were complex enough that the model had to be run over several sessions, starting and stopping with partial results at each stage. Internally, the computer model used six significant figures for floating point data. When Lorenz entered the parameters to continue his runs, he used three sig figs. He found that the trivial difference in sig figs actually led to wildly different results. This is due to the nature of systems that use present states to determine next states and which also have feedback loops and nonlinearities. Like most complexity folks, I learned and told the story many times over the years.
I’ve never wondered until just now whether anyone working on those kinds of models ran into problems with floating point bugs. I can imagine problematic scenarios, but I don’t know if it ever actually happened or if it would have been detected. That would make for an interesting study.
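The six-versus-three significant figures effect is easy to reproduce with a toy system. The sketch below uses the logistic map as a stand-in, not Lorenz’s actual weather model, and starts two runs from 0.506127 and 0.506 (chosen here as illustrative six-digit and three-digit inputs). The function name and the choice of r = 4 are assumptions made for the illustration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Same starting state, entered with six vs. three significant figures.
full = logistic_orbit(0.506127, 50)
rounded = logistic_orbit(0.506, 50)

# Absolute gap between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(full, rounded)]

if __name__ == "__main__":
    for step in (0, 5, 10, 15, 20, 30):
        print(f"step {step:2d}: gap = {gaps[step]:.6f}")
```

The next state depends nonlinearly on the current one, so the initial gap of about 0.0001 gets amplified on average at every step until the two runs have nothing to do with each other.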
These would be performance regressions, not correctness errors. Specifically, some false dependencies between instructions. The result of that is that some instructions which could be executed immediately may instead have to wait for a previous instruction to finish, even though they don’t actually need its result. In the worst case, this can be really bad for performance, but it doesn’t look like the affected instructions are too likely to be bottlenecks. I could definitely be wrong though; I’d want to see some actual data.
The Pentium FDIV bug, on the other hand, was a correctness bug and was a catastrophic problem for some workloads.
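To illustrate the false-dependency point above, here is a toy timing model in Python. It is a deliberate simplification, not how any real core schedules work: it assumes unlimited execution units, so an instruction starts as soon as everything it depends on has finished. The function names and the 3-cycle latency are made up for the example.

```python
def finish_times(latencies, deps):
    """Return the cycle at which each instruction finishes.

    latencies: cycles per instruction, in program order.
    deps: maps an instruction index to the earlier indices it must wait for.
    """
    done = []
    for i, latency in enumerate(latencies):
        # Start when the slowest dependency is done, or at cycle 0 if none.
        start = max((done[j] for j in deps.get(i, [])), default=0)
        done.append(start + latency)
    return done


def total_cycles(latencies, deps):
    """Cycles until the last instruction finishes."""
    return max(finish_times(latencies, deps), default=0)


# Four independent 3-cycle instructions.
latencies = [3, 3, 3, 3]
independent = {}
# Same instructions, but each one falsely waits on the previous one.
false_chain = {1: [0], 2: [1], 3: [2]}

if __name__ == "__main__":
    print("independent:", total_cycles(latencies, independent))  # 3
    print("false chain:", total_cycles(latencies, false_chain))  # 12
```

The results are identical either way, which is why this is a performance regression and not a correctness bug. How much it hurts in practice depends on whether the affected instructions sit on the critical path.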
Thanks for the clarification!
I remember having to learn about fp representations in a numerical analysis class and some of the things you had to worry about back then, but by the time I ended up doing work where I’d actually have to worry about it, most of the gotchas had been taken care of so I largely stopped paying attention to the topic.