Introduction: The Emergence of Chiplet-based Data Processing Unit (DPU) Architectures

Cloud Service Providers (CSPs) are revamping their data center architectures, driven by the emergence of processors optimized for domain-specific workloads, especially Data Processing Units (DPUs). The DPU has emerged as an autonomous coprocessor that takes over many of the compute and infrastructure processing functions traditionally performed by the CPU, such as server virtualization, networking, storage, and security. One of the major goals behind deploying DPUs is to leave server CPUs wholly dedicated to running application workloads. As a result, DPUs could potentially enable optimum performance for various cloud data center use cases, including hyperscale, telco, enterprise, and hybrid clouds, with improved security, reliability, and return on investment (ROI).

DPUs have traditionally been implemented using a monolithic system-on-chip (SoC) architecture. While the shift to smaller silicon geometries has enabled larger SoC devices with big caches and many cores, each new silicon technology generation requires a huge investment and results in significantly higher development and die costs. Consequently, developing SoC-based DPUs that support the range of required use cases is becoming commercially unappealing, especially in light of several important industry trends.

For decades, Moore’s Law delivered consistent advances in transistor technology that enabled chip vendors to meet the demands of monolithic SoC architectures, but that progress has recently slowed markedly. As a result, transitioning to the next node has become much more costly while offering minimal advantage. Relatedly, because of the significant increase in the cost of architecting SoCs, the industry is seeing the emergence of chip designs that use several small dies, or chiplets, rather than one monolithic die.

Using chiplets for DPU architectures can help meet power and area goals while providing the flexibility and product modularity required to support different use cases in a single DPU package. A chiplet-based architecture can support an integrated DPU device that combines best-of-breed components developed in different silicon technologies. As a result, the use of chiplets offers a cost-effective approach for providing DPUs that support a wide range of use cases.

SoC to Chiplet Transition Enables Moore’s Law Extension 

The benefits that Moore’s Law provided by enabling steady advances in complex SoCs such as DPUs are starting to diminish. Doubling transistor density now takes three or four years instead of two. Each boost in density comes with a dramatic increase in wafer cost, yielding little or no decrease in cost per transistor, even though a falling cost per transistor was a key principle of Moore’s Law. Speed and power gains have also lessened with each new node. In short, each node transition now costs much more and delivers much less.
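The cost-per-transistor argument can be made concrete with a little arithmetic. The following is a minimal sketch in Python; the wafer costs and transistor counts are hypothetical round numbers chosen only for illustration, not foundry or vendor data. It assumes a node transition that doubles transistor density while raising wafer cost by 1.9x.

```python
def cost_per_transistor(wafer_cost, transistors_per_wafer):
    """Return the cost of a single transistor for a given wafer."""
    return wafer_cost / transistors_per_wafer


# Hypothetical, illustrative figures only (assumptions, not real data).
old_node = cost_per_transistor(wafer_cost=10_000.0, transistors_per_wafer=1.0e12)

# Next node: density doubles, but the wafer costs 1.9x as much.
new_node = cost_per_transistor(wafer_cost=19_000.0, transistors_per_wafer=2.0e12)

# Fractional saving per transistor gained by moving to the new node.
reduction = 1.0 - new_node / old_node
print(f"cost-per-transistor reduction: {reduction:.1%}")
```

Under these assumed numbers, doubling density cuts cost per transistor by only about 5%, whereas a doubling at constant wafer cost would have cut it by 50%.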

Due to the exponential increase in the cost of developing leading-edge SoCs, only the very biggest suppliers have been able to develop monolithic chip designs for CPUs and DPUs. To compensate for soaring design costs and boost manufacturing yields, leading vendors have instead begun to adopt chiplet-based designs. Intel and Marvell have introduced chiplet-based products, while AMD and Amazon Web Services (AWS) employ chiplets in their new EPYC and Graviton server processors, respectively. However, most of these early chiplets have been designed exclusively in-house.

Further cost savings can come from building different (heterogeneous) chiplets on different manufacturing nodes, which is impossible in a monolithic design. For example, DPU chiplet designs could segregate I/O functions into a separate die manufactured in an older node. Some logic circuitry, such as accelerators, may not need to run at the same maximum clock rate as the main processor and thus can be fabricated in an intermediate node. Using older process technology can reduce the manufacturing cost of those chiplets and help optimize the aggregate power consumption of the chiplet-based DPU design.
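The yield and mixed-node savings described above can be sketched with the simple Poisson yield model, in which the fraction of defect-free dies is exp(-area x defect density). The Python sketch below is illustrative only: the wafer costs, defect densities, die areas, and packaging overhead are all hypothetical assumptions, and it ignores wafer edge loss, die-to-die interface area, known-good-die testing, and package assembly yield.

```python
import math

# Usable area of a 300 mm wafer (assumption: edge loss is ignored).
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2

# Hypothetical process nodes: wafer cost (USD) and defect density (per mm^2).
NODES = {
    "leading":      {"wafer_cost": 17_000.0, "d0": 0.0020},
    "intermediate": {"wafer_cost": 9_000.0,  "d0": 0.0010},
    "older":        {"wafer_cost": 4_000.0,  "d0": 0.0005},
}


def die_yield(area_mm2, defects_per_mm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)


def cost_per_good_die(area_mm2, node):
    """Wafer cost spread over the good dies that one wafer produces."""
    params = NODES[node]
    dies_per_wafer = WAFER_AREA_MM2 / area_mm2
    good_dies = dies_per_wafer * die_yield(area_mm2, params["d0"])
    return params["wafer_cost"] / good_dies


# Monolithic DPU: one large die, entirely on the leading-edge node.
monolithic_cost = cost_per_good_die(600.0, "leading")

# Chiplet DPU: (area in mm^2, node) per chiplet. Areas are hypothetical.
chiplets = [
    (150.0, "leading"),       # compute chiplet
    (150.0, "leading"),       # compute chiplet
    (200.0, "intermediate"),  # accelerator chiplet
    (150.0, "older"),         # I/O chiplet
]
PACKAGING_OVERHEAD = 20.0  # hypothetical extra cost of multi-die packaging

chiplet_total = PACKAGING_OVERHEAD + sum(
    cost_per_good_die(area, node) for area, node in chiplets
)

print(f"monolithic die cost : ${monolithic_cost:,.0f}")
print(f"chiplet-based cost  : ${chiplet_total:,.0f}")
```

With these assumed inputs the chiplet-based device costs well under half as much as the monolithic die, because small dies yield far better and the I/O and accelerator chiplets use cheaper wafers. Real designs would need to account for the overheads listed above, which narrow the gap.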

Die-to-Die Interconnect Standardization Enables Heterogeneous Chiplet-based Designs

Heterogeneous chiplet-based designs require standardized die-to-die (D2D) interconnects so that chiplets from multiple vendors can be seamlessly integrated. Otherwise, each chiplet remains vendor-specific, which reduces the economic advantage of disaggregating the design.

Over the past few years, a broad range of cloud computing and semiconductor industry stalwarts have introduced open specifications for die-to-die interconnects between chiplets, thus reducing costs and fostering a broader ecosystem of validated chiplets. In 2019, the Open Domain Specific Architecture (ODSA) subgroup within the Open Compute Project introduced the Bunch of Wires (BoW) die-to-die interconnect, which provides a standardized connection between chiplets (such as processor cores, DPUs, memory, and I/O) that operates like an on-die connection.

Similarly, major semiconductor companies, including Intel, AMD, and Arm, have introduced the Universal Chiplet Interconnect Express (UCIe) standard, which is based on the existing PCIe and CXL protocols and will also support the latency and bandwidth requirements of rack-scale designs.

Optimum Support of Data Center Use Cases Requires Disaggregated Chiplet-based DPUs 

Cloud operators can potentially reap tremendous benefits from DPU support for bare-metal virtualization, controlling and provisioning servers while keeping them isolated and secure from their tenants. DPUs may also enable bare-metal data centers in which the entire server hypervisor is offloaded to the DPU.

By accelerating network services such as Open vSwitch (OVS) and virtual router (vRouter) functions, the DPU could also allow data centers to support Network Function Virtualization (NFV) and a range of additional security, filtering, and analytics use cases.

The DPU could also facilitate efficient resource pooling by permitting significantly higher utilization while offloading various networking, storage, and security functions. For example, enabling pools of processors, AI/GPU clusters, and storage allows cloud operators to dynamically assign resources based on the needs of a specific AI application, such as inference or training.

DPUs with embedded hardware-based security processing may provide east-west firewalls to every server in the data center to meet the zero-trust imperative. DPUs may also be beneficial for offloading inline or lookaside encryption, using IPsec and SSL/TLS to protect data in motion and accelerating encryption and decryption of data at rest.

While monolithic SoC-based DPUs may address the data center use cases discussed above, the time and cost required to design such devices (which serve smaller market segments than general-purpose CPUs) can be very demanding. Designing them requires deep expertise in the networking, storage, and security protocol domains. In addition, the investment required to fabricate monolithic DPUs on more advanced process nodes can be prohibitive.

Developing DPUs using chiplets with smaller die sizes can drastically cut semiconductor development times while lowering manufacturing costs. DPUs built using chiplets also provide additional major benefits. Chiplet-based DPUs can combine best-of-breed components from multiple vendors, each with domain expertise in a specific area (e.g., processors, networking, security, and storage). As a result, cost and time-to-market for DPUs targeting specific use cases can be optimized by using smaller chiplet dies, choosing the appropriate process node for each chiplet, and letting each vendor focus on its core competency.

DreamBig Semiconductor: Next-Generation Chiplet-based Disruptive DPU Architecture

The data center market is facing a strategic challenge as it attempts to democratize the development and manufacturing of chiplet-based DPU silicon. With connectivity between chiplets becoming standardized, there is a critical need for an open ecosystem in which chiplets remain interoperable across vendors and which supports complete end-to-end specifications to meet demanding customer DPU use cases. This is exactly where DreamBig Semiconductor, with its proven architecture and validated chiplet ecosystem, brings significant value.

DreamBig Semiconductor (DBS) is a pioneer in using the chiplet approach to dramatically reduce the cost and time required to develop disaggregated DPUs for the full range of data-intensive use cases. DBS accomplishes this by offering chiplets for differentiated DPU functions, such as compute, networking, storage, and security, based on leading-edge process technology, and by leveraging an established partner ecosystem of chiplet vendors for standard, common functions. As a result, customers can rapidly and cost-effectively develop chiplet-based DPUs that optimally meet the needs of their specific data-intensive use cases.

The DBS chiplet-based DPU architecture will provide end users with greater flexibility, open new frontiers for component reuse, and enable innovation on price, performance, and power consumption across the full continuum of DPU use cases. It will allow users to bring together design IP and process technologies from an established ecosystem of vendors. This is enabled not only by the standardization of die-to-die (D2D) interfaces, but also by interoperability across vendors and foundries, along with support for multiple process nodes (both leading-edge and established) and packaging technologies.

Toward that goal, DBS is also prioritizing a best-of-breed partner ecosystem of chiplet manufacturers that offers DPU users standardized, interoperable hardware. This ecosystem supports complete end-to-end specifications, including protocols, packaging, testing, and manufacturing, to meet aggressive and diverse customer time-to-market, cost, and use-case requirements.

Saqib Jang