Hello. My name is Cheolmin Park, and I'm honored to be here to give a presentation about CXL and UCIe, which are emerging as new standards for the industry. I work at Samsung, where I lead the new business planning team. Thank you.
Let me talk about Compute Express Link, aka CXL, the backbone driving the era of heterogeneous computing.
The fast-growing workload of AI: we all know that the workloads we run for AI, and the datasets behind them, are exploding. Dealing with that explosion of data, with the market's needs, and with people's expectations of what smart computing should do to solve our problems requires smart devices, hyperconnected networks, and super-intelligent services.
So AI is not only a workload; it is a driving force reshaping our system architecture for the future. We used to have the CPU as the central device that handled all the compute and the connectivity of the subsystems: memory, storage devices, accelerators, and network cards. Now we are at a tipping point, moving from an architecture dominated by the CPU to one where the interface becomes the essential, critical piece that connects everything: CPUs, memory that can be disaggregated, accelerators that can be software-defined, and storage pools, all serving diversified demands and all connected to the network. These infrastructures are specialized in handling the ever-growing demand for artificial intelligence and machine learning capabilities.
As we discussed, the fabric and interconnect of the subsystem is becoming the central piece in handling the workloads that customers, people, and the industry are demanding. This starts at a very low level, inside the package, where we build SoCs differently from the traditional way by putting multiple heterogeneous dies in one package. Different packages are then connected via a CXL-like fabric that is cache-coherent and flexible enough to handle I/O workloads as well, and which can scale up to a data-center-scale interconnect. We hope this kind of flexible, scalable concept can also be applied to the connections between data centers, customers, and us as users of that compute.
CXL is quickly becoming the de facto interface for the era of artificial intelligence and its new workloads and use cases. It combines different protocols in one unified interface: cache coherency, which has been the protocol among CPU cores; point-to-point links, which have been the connection between CPU sockets; and byte-addressable memory access, which is critical for existing software programming models. Add scalability for exploding data sizes and low latency, and these are the key features of the CXL interface that enable us to provide compute capability for the exploding AI workload.
One of the first implementations of CXL technology is attaching traditional memory through a CXL interface. The benefit is bigger main-memory capacity, higher memory bandwidth, and a more flexible, manageable pool of memory for the different compute engines that need it: CPUs, GPUs, smart NICs, and accelerators. Connecting memory to different compute engines via CXL is now on its way to becoming mainstream.
CXL gives the CPU a very unique capability. A single CPU used to have a hard limit on how many terabytes of memory could be attached to it. One of the most recent memory implementations enabled by a CXL fabric doubles the memory capacity available to one CPU. The example in the picture goes from 8 terabytes per CPU, in one of the most recent CPU-and-DIMM designs, to 16 terabytes per CPU using a combination of DDR and CXL-attached memory. This significantly increases system memory capacity, and it provides twice the memory-access bandwidth because the additional memory is reached through CXL ports. CXL also has the potential to give us better reliability, availability, serviceability, and security, because the CXL controller and specification are defined to provide those functions. Normally, the bigger the DRAM capacity in a system, the bigger the problems that can arise, because DRAM was not designed to be provisioned at that scale. But with the way CXL manages such a large pool of DRAM, we can stay assured that the DRAM will serve workloads reliably.
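As a back-of-the-envelope sketch of the slide's example, the capacity and bandwidth doubling can be written out directly (the 8 TB figures come from the talk; the assumption that the CXL tier matches the DDR tier one-for-one is only illustrative):

```python
# Model of the slide's example: a CPU with 8 TB of direct-attached DDR
# doubles its reachable memory by adding 8 TB of CXL-attached memory
# behind its CXL ports. The extra ports also add a second bandwidth path.
ddr_capacity_tb = 8   # direct DDR DIMMs per CPU (figure from the talk)
cxl_capacity_tb = 8   # CXL-attached memory per CPU (figure from the talk)

total_capacity_tb = ddr_capacity_tb + cxl_capacity_tb
capacity_multiplier = total_capacity_tb / ddr_capacity_tb

print(total_capacity_tb)      # 16 TB per CPU
print(capacity_multiplier)    # 2.0, i.e. double the DDR-only capacity
```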
CXL goes beyond a hardware implementation, because it provides flexibility in how memory is attached and different ways to tier the memory spaces that a CPU can manage and take advantage of. So there are many efforts across the industry to provide the best experience in using heterogeneous DRAM and CXL memory attached to one CPU. One implementation, shown on this slide, is SMDK, the Scalable Memory Development Kit, which provides a unified software interface for managing a memory subsystem built from DDR, the traditional and conventional way of providing memory space to CPUs, plus CXL-attached memory. SMDK is developed and optimized for heterogeneous memory subsystems. It provides a very easy way to integrate heterogeneous memory devices, and very intelligent tiering, so an application can direct different workloads to different memory spaces, between DRAM and CXL-attached memory. SMDK is aware of where memory is allocated, how it gets used, and how bandwidth is provisioned, so it has the potential to accelerate data services. It is an open-source effort, developed to give a better experience with CXL-based memory modules and to give the industry early access to CXL technology.
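The tiering idea can be sketched as a toy placement policy. This is an illustrative model only, not the SMDK API; the tier names, capacities, and the hot/cold split are assumptions made up for the sketch:

```python
# Toy model of heterogeneous-memory tiering: latency-sensitive ("hot")
# allocations prefer direct DRAM, capacity-oriented ("cold") ones prefer
# CXL-attached memory, and each spills to the other tier when its
# preferred tier is full. Real tiering (e.g. in SMDK) is far richer;
# this only illustrates the policy split described in the talk.
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_gb: int
    used_gb: int = 0

    def can_fit(self, size_gb: int) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb

@dataclass
class TieredAllocator:
    # Hypothetical capacities: a small fast DRAM tier, a large CXL tier.
    dram: Tier = field(default_factory=lambda: Tier("DRAM", 512))
    cxl: Tier = field(default_factory=lambda: Tier("CXL", 2048))

    def allocate(self, size_gb: int, hot: bool) -> str:
        order = [self.dram, self.cxl] if hot else [self.cxl, self.dram]
        for tier in order:
            if tier.can_fit(size_gb):
                tier.used_gb += size_gb
                return tier.name
        raise MemoryError("no tier can satisfy the request")

alloc = TieredAllocator()
print(alloc.allocate(64, hot=True))     # hot data lands in DRAM
print(alloc.allocate(1024, hot=False))  # bulk data lands in CXL memory
```

The design choice here mirrors the talk: the application (or a library acting on its behalf) chooses a tier per workload, while the allocator tracks where capacity is actually consumed.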
Let's move on to Universal Chiplet Interconnect Express, which sits one layer below CXL and provides a way to build an open silicon ecosystem on top of packaging innovations.
Gordon Moore, the originator of Moore's Law, predicted nearly 60 years ago that "it may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected."
We've enjoyed monolithically implemented silicon for the past 50 years. Now we are transitioning to assembling our architectures from functions implemented as separate chiplets, combined with advanced packaging technologies. One of the most recent efforts the industry has put together is called UCIe, Universal Chiplet Interconnect Express. It is an open, on-package platform that enables a very high-speed, standardized chip-to-chip interface, and it serves as a cookbook for how to best integrate different functions and different silicon into a product, easier and quicker than ever. The picture on the left shows a sea of heterogeneous cores, CPUs and GPUs, along with stacked memories and I/O devices, all integrated on a package and implemented to the UCIe standard specification. This enables all of us to provide more functions, which would otherwise be limited by the reticle size; to achieve better time to market; and to end up with a lower, perhaps the lowest, portfolio cost.
Only your imagination is the limit. The UCIe specification was thought out and built from the get-go to be the best way to assemble different functions and silicon on a package. It offers a range of performance points and very flexible bandwidth density, optimized for the best energy efficiency and latency. It defines different ways to drive the die shoreline for the best reliability and availability. And with cost in mind, a standards-driven implementation gives you the best economy of scale: supply-chain issues become much easier to manage because every silicon piece is built to the standard interface specification, so your back-end verification and validation can be done in a standard, certified way.
So UCIe was initiated by Intel, which started the consortium and seeded the effort by donating the initial specification. The focus of the UCIe 1.0 specification is the physical layer and a protocol layer that supports CXL and PCIe for near-term volume attach. If you follow the UCIe specification, then you as the implementer, and we as solution providers, have a standard way to build CXL- and PCIe-connected chiplets. And this is just the start of the journey; we hope and expect other protocols to come along. For example, UCIe today is better suited to I/O connectivity, but significant players in the memory space could drive memory specifications into the UCIe standard. The packaging options are equally innovative: from the current practice of placing silicon side by side, evolving into 2.5D, which mixes and matches different dies at different heights on the package, to directly stacking silicon on top of other silicon, which is 3D technology and integration.
The usage models supported by UCIe are almost limitless, because the flexibility UCIe provides connects pretty much all silicon architectures through one standard. Memories can be attached to CPU core tiles; CPU core tiles can be attached to I/O devices; and CPUs can be integrated with GPUs on the same package. UCIe is even defining ways for silicon components to be connected over an optical interface. It covers a very wide range of usages, from handhelds to high-end servers: your cell phone can have a silicon package implemented to the UCIe specification, and it scales all the way up to within a data-center rack, between racks of server systems, and even between aisles connected by the optical interfaces offered by the UCIe standard.
It is an exciting moment, because ten industry giants are putting their efforts together, taking the initiative, and driving the standard forward.
To summarize what UCIe can offer: leaders in semiconductors and packaging, IP suppliers, foundries, and cloud service vendors are combining their efforts and initiatives to drive silicon technology for the future, and to accelerate it. The experience of putting these functions together is going to be easier than ever. As we speak, the UCIe 1.0 specification is ratified and published. This new open standard establishes an open chiplet ecosystem and a ubiquitous interconnect at the package level. UCIe has only just started, and the consortium welcomes anyone interested in participating and sharing influence over the future direction of the industry. Thank you.