Apple may integrate new custom ARM cores into future Macs, but x86 swap still unlikely
When Apple launched its latest MacBook Pros with the Touch Bar, it also included a new ARM core, dubbed the T1, to handle the Touch Bar screen and certain other aspects of the system. Now, a new report claims that Apple will include more ARM silicon in future products.
Apple is working on a new ARM part, codenamed T310, that would take over some of the system’s low power mode functionality, according to Bloomberg. Currently, Apple’s Power Nap feature is handled by x86 processors, and is roughly analogous to Microsoft’s Connected Standby mode. Power Nap allows a system to check email, install necessary updates, and sync its calendar with the laptop closed and the system ostensibly powered off. Pushing standby connectivity without impacting battery life has been a major focus of both ARM and x86 vendors over the last decade, and we’ve seen huge gains in standby battery life as a result.
Offloading OS functions to an ARM CPU in low power mode would tie x86 and ARM cores together in the same system far more tightly than is currently the case. The T1 chip in the latest MacBook Pros doesn’t just run the Touch Bar; it also handles Touch ID and Apple Pay and has an image signal processor built into it. Instead of reprogramming Apple Pay and Touch ID for macOS, Apple put an ARM core running watchOS in the MacBook Pro.
There’s a tempting line of argument here. First, Apple introduces a new CPU core to run a specialized product while still using x86 chips. Next, Apple introduces a low-power ARM core to take over specific functions while the x86 chip is retained for high-end work. Finally, Apple removes the x86 chip and shifts entirely over to ARM.
It’s not a crazy idea, and it could still happen one day, but, again, it’s probably not happening any time in the next 18-24 months. Apple’s ARM CPU cores have improved dramatically, and much more quickly than Intel’s, but there are still some significant differences between them.
CPU design and specialization
Over most of the last decade, Apple and Intel have pursued completely different CPU philosophies. Apple started with a stock ARM CPU core and created its own high-performance ARM-compatible CPU with the highest IPC of any ARMv8 chip currently on the market. Intel, meanwhile, took its high-performance x86 chips and sought to improve their performance while also cutting their power consumption. The company has achieved both of its goals, but progress has slowed of late, and recent x86 chips have only been modest improvements on their predecessors. This lack of progress is part of what has fed the narrative of a stagnant Intel, vulnerable to disruption from Apple’s own product stack.
The principal problem with this argument is that it assumes a high-power multi-core CPU is constructed more or less identically to a low-power, dual-core CPU. This is not true. Intel has implemented a sophisticated DVFS (Dynamic Voltage and Frequency Scaling) system inside its processors to ensure they can hit a variety of clock speeds and voltages. It has sophisticated on-die microcontrollers that manage clock speeds for each individual core, and it uses technologies like Speed Shift to further optimize how quickly its CPU cores can move in and out of idle states.
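To see why voltage and frequency scaling matters so much, here is a minimal sketch based on the standard first-order model of dynamic CMOS power, P ≈ C × V² × f. The function names and the sample figures are illustrative assumptions, not Intel's or Apple's numbers, and the model ignores leakage current entirely.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order dynamic power in watts: P = C * V^2 * f.

    capacitance: effective switched capacitance in farads (assumed value)
    voltage:     supply voltage in volts
    frequency:   clock frequency in hertz
    """
    return capacitance * voltage ** 2 * frequency


def relative_power(voltage_scale, frequency_scale):
    """Power relative to a baseline when voltage and frequency are scaled.

    Capacitance cancels out, so only the two scale factors matter.
    """
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical operating point: 1 nF switched, 1.0 V, 2 GHz.
    print(dynamic_power(1e-9, 1.0, 2.0e9))
    # Drop both voltage and frequency to 80% of baseline.
    print(relative_power(0.8, 0.8))
```

In this simplified model, running at 80% of the clock speed and 80% of the voltage cuts dynamic power to roughly half of the baseline, which is why fine-grained control over both is so valuable in a thin laptop.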
Anyone can design a high-performance CPU core that uses tons of power. Designing a high-performance CPU that doesn’t is significantly more difficult. Transistor layouts, design, and leakage currents are all different for high-performance chips compared with low-power chips. DVFS itself isn’t always easy to implement. ARM’s big.LITTLE approach, which Apple mimicked with the A10 Fusion, was designed as a way to give ARM vendors access to lower power cores without forcing them to implement DVFS in the first place. It’s no accident that Intel’s pivot to address the low-power market post-Sandy Bridge also marked the end of its aggressive CPU performance improvements. The company has been trying to simultaneously improve performance and slash power consumption to fit into extremely aggressive power envelopes, and that’s no simple feat.
Consider the Core i7-7560U against the Core i7-2677M (Intel’s top-end SNB mobile CPU in the 17W bracket). Base clocks on the newer chip are about 33% higher, while the boost clock is 31% higher. The Core i7-7560U’s maximum TDP is actually 12% lower than the Core i7-2677M’s, and it supports a configurable low TDP option that the Sandy Bridge-era processor didn’t offer. It supports 4x more memory, offers 1.6x more memory bandwidth, can exit and enter sleep states more quickly, supports technologies like Speed Shift, and offers 64MB of onboard EDRAM to boost the CPU’s integrated graphics.
The rate of progression we see here is still much slower than what used to be normal in the golden age of Moore’s law, but the gains have been much larger than on the desktop side. Factor in improvements from architecture, and the Core i7-7560U should be 40-50% faster than its Sandy Bridge counterpart, while its on-board graphics could easily be 2-3x faster. Enthusiasts tend to miss this level of improvement because most of them are desktop or high-end laptop users, and the gains at those higher TDPs have been much smaller.
Adding CPU cores also adds complexity. Multi-core CPUs use a variety of strategies to maintain what’s known as cache coherency. If multiple cores have a copy of the same data residing in cache and one core updates or invalidates that data, all of the other cores must either receive the updated data or be instructed to invalidate the cache line. The more cores you have, the more complex this management process is. If it isn’t handled properly, coherency requests will eat into cache bandwidth and hamper multithreaded scaling. In severe cases, performance will actually get worse as cores are added, because cache bandwidth is saturated by coherency traffic.
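A toy model illustrates how quickly that traffic can grow. The sketch below assumes a deliberately naive write-invalidate scheme in which every core holds a copy of one shared cache line, each write broadcasts an invalidation to every other holder, and all cores then re-read the line. The function name and the scenario are hypothetical; real designs use protocols such as MESI along with directories or snoop filters specifically to avoid this worst case.

```python
def count_invalidations(num_cores, writes_per_core):
    """Count invalidation messages for one heavily shared cache line.

    Simplified write-invalidate model (an assumption for illustration):
    before a core writes, it must invalidate every other cached copy.
    """
    all_cores = set(range(num_cores))
    sharers = set(all_cores)  # every core starts with a copy of the line
    messages = 0
    for _ in range(writes_per_core):
        for writer in range(num_cores):
            # One invalidation per other core currently holding the line.
            messages += len(sharers - {writer})
            # Worst case: every core re-reads the line before the next write.
            sharers = set(all_cores)
    return messages


if __name__ == "__main__":
    for cores in (2, 4, 8, 16):
        print(cores, count_invalidations(cores, writes_per_core=1))
```

In this worst-case model the message count works out to writes × cores × (cores − 1), so doubling the core count roughly quadruples the coherency traffic on a contended line.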
Historically, Apple only moves to replace one CPU architecture with another when it can make that change across its entire product line while picking up significant performance from doing so. The fact is, Apple sells significantly more Macs now than it ever did when it was a PowerPC company, and the obvious implication is that Mac owners like knowing they can run both Windows and x86 software, even if they rarely have to do so in practice.
Apple believes it can achieve better standby power by shifting certain OS functions to an ARM core, while the system still runs on an x86 chip when awake. Assuming the company follows through and launches this chip, the big thing to watch for is whether Apple starts shifting awake functionality over to a separate CPU. Ramping up an ARM chip that could run general-purpose system code natively would be a clear sign that Apple wasn’t just porting a few capabilities over to a different architecture but could be prepping a full swap from x86 to ARM. Right now, the Mac Pro is still a potent argument for why Apple may not do this — building a 12-22 core ARM chip is no simple task — but if the Mac Pro continues to languish or is canceled altogether, it could be a sign that Apple has chosen to leave certain markets to make the transition easier.