Tuesday, August 22, 2017

Qualcomm details 10nm 48 core ARM chip for datacentre servers

By Nick Flaherty www.flaherty.co.uk

Qualcomm has shown details of its first device for datacentre servers, a 10nm, 48-core ARM chip, at the Hot Chips conference this week.

The Centriq 2400 SoC uses the custom Falkor 64-bit ARMv8 core with a 4-issue, 8-dispatch heterogeneous pipeline. This is designed to optimise performance per unit of power, with variable-length pipelines that are tuned per function for maximum throughput.

Falkor’s out-of-order and rename resources are sized to keep instruction retirement off the performance-critical path, allowing unrestricted use of the multiple execution engines. Other performance-critical elements of the micro-architecture, such as the branch prediction algorithms and the cache hierarchy, are state-of-the-art for today’s server-class processors.

A range of power management techniques is designed into the core. These include independent P-state control for each of the CPUs and the L2, entry to and exit from low-power states controlled by hardware state machines for ultra-fast state transitions, and hardware state retention for power-collapsed sleep states with ultra-fast recovery.

The Falkor core duplex includes two custom Falkor CPUs, a shared L2 cache and a shared bus interface to the Qualcomm System Bus (QSB) ring interconnect. This modular building block serves as the foundation for the 48-core Centriq 2400 SoC design.
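As a rough illustration of the duplex building block, the sketch below works out how many duplexes a 48-core part implies and which CPUs would share an L2. The CPU numbering (consecutive IDs paired into one duplex) and the helper names are assumptions made for the example, not something Qualcomm has described.

```python
CPUS_PER_DUPLEX = 2   # from the article: two Falkor CPUs per duplex
TOTAL_CPUS = 48       # from the article: 48-core Centriq 2400

# 48 CPUs in pairs gives 24 duplexes, each with its own shared L2.
NUM_DUPLEXES = TOTAL_CPUS // CPUS_PER_DUPLEX


def duplex_of(cpu_id):
    """Map a CPU ID to its duplex, assuming consecutive IDs are paired."""
    if not 0 <= cpu_id < TOTAL_CPUS:
        raise ValueError(f"cpu_id must be in 0..{TOTAL_CPUS - 1}")
    return cpu_id // CPUS_PER_DUPLEX


def shares_l2(cpu_a, cpu_b):
    """Two CPUs share an L2 only if they sit in the same duplex."""
    return duplex_of(cpu_a) == duplex_of(cpu_b)


print(NUM_DUPLEXES, shares_l2(0, 1), shares_l2(1, 2))
```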

The shift from private datacentre infrastructure to cloud computing services continues to accelerate. According to a recent IDC Cloud Server Forecast, more than 50 percent of servers sold by 2020 will be deployed for cloud computing services, and the processors that power those servers need to be optimised to address the demand for scalable performance with unique characteristics for cloud software and services.

Five years ago, the Qualcomm Datacenter Technologies team began to map out its strategy for entering the datacentre market. The key design point for the chip roadmap was to deliver 'right-sized' devices optimised for throughput performance and efficiency for emerging multi-core cloud workloads. Cloud services need to perform well in highly loaded, multi-tenant environments, and the hardware platform needs to maximise aggregate compute performance while reducing the cloud operator’s operational costs, which are largely driven by the cost of power and cooling.

To meet the demand for larger instruction footprints, Falkor uses a new split instruction cache comprising a single-cycle, low-power 24KB L0 I-cache that complements its 64KB L1 I-cache. The two caches are managed exclusively, so no line is held in both, giving a total of 88KB of low-latency I-cache. The core also has a 32KB L1 D-cache with a 3-cycle load-use latency. The L1 D-cache is augmented by a sophisticated multi-level hardware prefetch engine that dynamically adapts to system conditions.
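To show why exclusive management lets the two I-cache capacities add up to 88KB, here is a minimal toy model in Python. It is only a sketch: the 64-byte line size, the LRU replacement policy and the promote/demote behaviour are all assumptions for illustration, since the article does not describe how Falkor actually manages the two levels.

```python
from collections import OrderedDict

KB = 1024
LINE_BYTES = 64  # assumed cache-line size; not stated in the article


class ExclusiveICache:
    """Toy two-level exclusive cache: a line lives in L0 or L1, never both."""

    def __init__(self, l0_bytes=24 * KB, l1_bytes=64 * KB):
        self.l0_lines = l0_bytes // LINE_BYTES
        self.l1_lines = l1_bytes // LINE_BYTES
        self.l0 = OrderedDict()  # least recently used entry first (assumed LRU)
        self.l1 = OrderedDict()

    def access(self, line):
        """Return "L0", "L1" or "miss" and update both levels."""
        if line in self.l0:
            self.l0.move_to_end(line)
            return "L0"
        if line in self.l1:
            del self.l1[line]  # promote to L0, so drop the L1 copy
            level = "L1"
        else:
            level = "miss"
        self.l0[line] = True
        if len(self.l0) > self.l0_lines:
            victim, _ = self.l0.popitem(last=False)
            self.l1[victim] = True  # demote the L0 victim into L1
            if len(self.l1) > self.l1_lines:
                self.l1.popitem(last=False)  # L1 victim leaves the hierarchy
        return level

    def resident_bytes(self):
        return (len(self.l0) + len(self.l1)) * LINE_BYTES


cache = ExclusiveICache()
footprint = range(88 * KB // LINE_BYTES)  # an 88KB instruction footprint
first_pass = [cache.access(line) for line in footprint]
second_pass = [cache.access(line) for line in footprint]
print(cache.resident_bytes() // KB, "KB resident,",
      second_pass.count("miss"), "misses on the second pass")
```

In this model the whole 88KB footprint stays resident across the two levels after the first pass. An inclusive design, where L0 contents are duplicated in L1, would be limited to the 64KB of the larger level.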

The 48 Falkor CPUs are connected by a high-bandwidth, low-latency ring interconnect that extends out to a large L3 cache and multiple memory controllers. New shared resource management techniques include L3 Quality of Service (QoS) extensions and in-line, transparent memory compression that increases effective memory bandwidth.

The Centriq SoC also supports the most robust form of secure boot in the industry: an on-die, hardware-based immutable root of trust that authenticates firmware before the first line of firmware is ever executed. It also supports virtualised workloads with the full suite of ARM Exception Levels (EL0-EL3) and the TrustZone secure execution environment, and uses the ARMv8 instruction extensions to accelerate the cryptographic transform and secure hash operations needed for efficient performance when running network security protocols such as HTTPS. The SoC also provides the RAS mechanisms needed to keep a datacentre running, such as fault isolation, reporting, and handling techniques.

The Centriq 2400 processor series is now sampling to key customers and is expected to be commercially available in the second half of 2017.

www.qualcomm.com
