YouTube:https://www.youtube.com/watch?v=muBTTEise6U
Text:
Okay. All right. So I think you did a great job putting our presentations in order here, Frank, because I think we're continuing to build on the story of CXL. Great job by Michael talking about some more of the benefits and some more of the practical side: here's a real device, and I've got a few more to show.
I want to talk about SMART a little bit. SMART was started in 1988. I work for SMART Modular Technologies, and we work primarily on advanced memory products, so we are a memory company. We have sister companies, Penguin and Stratus, who work more at the system level, and then Cree LED, which is an industrial company. We're all under the umbrella of SMART Global Holdings (it's SGH on NASDAQ, and you can see some of our financials and things there). But we won't spend too much time on that.
I'm going to skip right over this; we've talked about it already. At this point we're talking about CXL memory expansion devices.
So I think this reiterates some points you may have heard, but I'll give my spin on it. As we've heard already, one of the big advantages of CXL is that it can increase memory capacity quite easily. Imagine you fully populate a server with memory DIMMs: you will run out of space. Once you have populated all of those slots, you have to start adding additional CPU sockets to increase the memory density, or you can start adding CXL devices. That's the key there. An added benefit is that you can also potentially reduce your overall memory cost, and I think we've seen one or two examples of that; I have a further case study in one of the upcoming slides. The gist of it is that when you go to 3DS, or TSV, DRAM components, you pay a price premium for those. As many of you know, that premium is well-deserved: they're more difficult to make and more difficult to test, so obviously there's going to be a price premium involved. So for maximum value when building a system with maximum memory, if you can avoid getting to those high-density components, you can really keep the memory cost down. We'll get into a specific example of that. And then the other thing we've touched on in several other presentations as well: CXL does have a higher latency, primarily due to the controller sitting in the interface between the CPU and the memory. It's not direct CPU to memory, it's CPU to controller to memory, so there is a little bit of added latency. But even with that, because you're adding another pathway to access memory, when the system or the cores are heavily loaded they don't all have to line up for that handful of memory controllers in the CPU; they have another pathway to access memory.
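As a back-of-envelope sketch of that "extra pathway" argument, the following uses illustrative figures (a DDR5-4800 server with eight channels, and a PCIe 5.0 x16 link's raw rate), not measurements of any specific system:

```python
# Back-of-envelope illustration of the "extra pathway" argument.
# All numbers are assumptions for illustration, not measurements.

ddr5_channels = 8                 # assumed per-socket channel count
gbps_per_ddr5_channel = 38.4      # DDR5-4800: 4800 MT/s * 8 bytes

native_bw = ddr5_channels * gbps_per_ddr5_channel
print(f"native DRAM bandwidth: {native_bw:.1f} GB/s")

cxl_x16_bw = 64.0                 # PCIe 5.0 x16, ~64 GB/s raw per direction
total_bw = native_bw + cxl_x16_bw
print(f"with one x16 CXL device: {total_bw:.1f} GB/s")

# The extra pathway adds roughly 20% of peak bandwidth headroom,
# which is what relieves contention when all cores are loaded.
print(f"headroom: +{cxl_x16_bw / native_bw:.0%}")
```

The point is not the exact percentages but that a loaded system gains an independent path to memory, so requests no longer all queue on the CPU's integrated controllers.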
And I think Michael did a great job of showing some examples of the performance increases that you can get there.
Okay, so we've talked about this a little bit before too: what's driving the need for all the DRAM? Obviously AI is the big one that everybody's talking about, but there are some other real applications driving it as well. One of them is purely in-memory databases, the SAP HANA and SQL models that are out there. So that's an important one, but also video processing. We're seeing applications such as security cameras in stores with those self-checkout kiosks. Apparently people have been taking low-price stickers, putting them on high-price items, and then scanning them at the self checkout. So stores want to do real-time analytics on the security camera feed so they can say, nope, that item you just scanned does not match what you just put in your shopping bag. Applications like that require video processing, and video processing requires a lot of memory. Even in biotech, we're seeing DNA sequencing machines that generate tremendous amounts of data which has to be stored and processed, and it's far faster to do that if you can do it all in memory. So there are a lot of different applications driving this forward. It's not just AI, I guess that's the point here, but AI is obviously a very important one.
Okay, so we see the adoption of CXL kind of coming in two phases. And I think the first phase is that direct attached memory, and that's going to be in a couple of different form factors which we touched on this morning, but that is going to be implemented first. And then following on, we'll have that disaggregation of CXL memory, which we talked about as well. But one thing I do want to mention is even once that disaggregation of CXL memory behind switches or in memory sharing becomes a reality, we don't see the in-system CXL devices going away. We see them working in parallel. And I think that once those are adopted and once they're available in systems, you still get a lot of performance benefits from having CXL memory in the system, as well as having an external pool available as well. So we see them coexisting.
So this is a different take on the question of when is it coming? We keep talking about it, but when is it going to get here? We heard Siamak go over the CXL 3.1 spec and the differences there; that was released late last year. So the spec is here. They're continuing to make improvements, or we're continuing to make improvements, to add additional features and clarifications, and that will be an ongoing process. But the bottom line is, from a specification point of view, we've got it. When you look at the controller manufacturers, there are a number of different companies making CXL controllers that you can use to build these devices, and the majority of them support the CXL 2.0 spec features at this point. Many of them will be transitioning to 3.0 towards the end of this year or into 2025. So that's where we are on the devices. And then the CPUs have a very long development cycle, obviously, so it takes time to get those features integrated into the CPUs. But as we learned earlier, they're all backwards compatible: you can have a 2.0 device and it will work on a 1.1-capable system. You just don't necessarily have access to all the features, but it will still work as a memory expansion device. So I talked before about the two types of adoption that we're seeing, the direct attach in a server as well as memory pooling, and we do see two ramps occurring in the industry. The first ramp is starting this year: those direct attach devices, as Micron just showed, are available now, and we see them being integrated into systems and going into production this year. I have a number of devices that we'll talk about that are also going to be available this year. And then we see that second tier, that second ramp, kicking off once the 3.0 spec becomes widely available next year, and that will ramp as well.
But I do want to reiterate, those will coexist. The direct attach and the memory pooling, we really see happening kind of side by side and working together in the ecosystem.
Okay, so this is the lineup of CXL products that SMART is going to be introducing in 2024. The top row is a family of add-in cards that we've been developing for quite some time. The first one is an 8-DIMM add-in card, then we have a 4-DIMM add-in card, and we're also looking at some different configurations of a 4-DIMM, potentially a low-profile one as I'm showing here; that's still in feasibility and design. On the bottom row you see our E3 form factors. These are lower capacity than the ones Micron showed. We have a 64 gigabyte version that was largely done as a proof-of-concept device, which has been extremely helpful for us and for our partners to get CXL up and running in real systems. Another difference is that this is a x16 device. Our E3 supports 16 lanes, and that's good for performance, but not all servers support that in the front bay yet, so there's a trade-off you're making there. And we are going to be introducing a 96 gigabyte version of that device a little later this year. The last one in the corner here: some of you may be familiar with NVDIMMs. Obviously Optane was a big persistent memory device that came out and has since reached end of life. But in addition to that, there are NVDIMM-Ns, which have been around for quite a while. They're essentially DRAM with NAND flash on the device, and they use supercapacitors to do a backup to the NAND flash in the event of a power loss. Those have been somewhat niche, but also pretty successful in the DDR4 ecosystem. Well, support for those devices is not planned to be carried forward to DDR5. Instead, what we're looking at doing is non-volatile CXL devices, which I'll talk about in a little more detail coming up.
Okay, so the first one is our 8-DIMM add-in card. This is in that add-in card, or CEM, form factor that I talked about earlier this morning. It's a full-height, half-length, dual-width card, and it uses two CXL controllers. That's so we can get the capacity of the eight DIMMs: each controller has two memory channels, and each channel can support up to two DIMMs, which is how we get to eight devices. It does exceed that 75 watt limit I was talking about this morning regarding what power can be provided from the edge connector, so you'll notice there is an auxiliary power connector here to get the additional power necessary for this device. One of these cards can support up to a terabyte of memory, depending on the capacity of the modules you install.
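The capacity arithmetic behind the "up to a terabyte" figure can be checked directly (the topology is from the talk; the 128 GB module size is an assumption chosen to reach 1 TB):

```python
# Capacity arithmetic for the 8-DIMM add-in card as described:
# two CXL controllers, two memory channels per controller, and
# up to two DIMMs per channel.
controllers = 2
channels_per_controller = 2
dimms_per_channel = 2

total_dimms = controllers * channels_per_controller * dimms_per_channel
print(total_dimms)        # 8 DIMM slots

# With 128 GB modules (module size assumed for illustration),
# the card reaches the terabyte figure from the talk.
dimm_gb = 128
total_gb = total_dimms * dimm_gb
print(total_gb)           # 1024 GB, i.e. 1 TB
```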
The next one is our 4-DIMM add-in card. I should mention that the 8-DIMM card will be sampling this quarter, meaning engineering samples will be available this quarter; this one is sampling next quarter. It's a little bit difficult to see: you see one DIMM sticking up and one DIMM pointing down towards the connectors, but there's actually a pair of DIMMs there and a pair of DIMMs on the top as well. This fits in a single width; it's a full-height, half-length, but single-width card, and it uses a single x16 controller with two memory channels. Again, engineering samples will be available next quarter, so we're really excited about that one.
We also have in development a different four DIMM option. This is more for a low profile system. Some servers require kind of this low profile option, like a 2U server. This is kind of a nice form factor for that. So this one is in development for later this year.
We've seen a couple of studies on the cost advantage, but let me walk you through the analysis we've done for one example. This is a case where you want to build a system in which one CPU has a total of one terabyte of memory. In today's ecosystem without CXL, if the CPU is capable of supporting eight DIMMs, which is a fairly common scenario for DDR5, you would need to populate it with eight 128 gigabyte modules to reach that one terabyte. Today, when you make the leap to 128 gigabyte modules, you are going to those 3DS devices, so there is a price-per-gigabit premium that you pay for those, and it's well-deserved; as I mentioned, it's a difficult device to build, but there is that cost overhead. So your total cost would be almost $13,000 just for the memory in that system. If instead you have a system capable of supporting CXL, you could add a single 8-DIMM add-in card, for example, use a total of sixteen 64 gigabyte modules, avoid that premium, and pay a total of 60% less for your memory, and that's including the cost of the add-in card itself. Now, you do end up with two memory tiers, if you will, the direct-attached memory and the CXL-attached memory, so it would be very useful to take advantage of some of those tiering solutions that have been talked about in earlier presentations. But on pure cost analysis, it has some significant advantages.
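The shape of that comparison can be sketched as follows. The per-module and card prices below are assumptions chosen only to reproduce the ballpark figures from the talk (almost $13,000 for the 3DS build, roughly 60% savings with CXL); they are not SMART's actual pricing:

```python
# Illustrative cost comparison for a 1 TB-per-CPU memory build.
# All prices are assumptions for illustration, not real quotes.

TARGET_GB = 1024  # 1 TB per CPU socket

# Option A: eight 128 GB modules built from 3DS/TSV DRAM stacks,
# filling all eight direct-attach DIMM slots.
price_per_128gb_3ds = 1600    # assumed $/module, carrying the 3DS premium
cost_a = 8 * price_per_128gb_3ds
print(f"3DS-only build:  ${cost_a}")    # almost $13,000

# Option B: sixteen 64 GB modules (no 3DS premium), eight in the
# direct-attach slots and eight on a CXL add-in card.
price_per_64gb = 260          # assumed $/module
addin_card = 1000             # assumed add-in card cost
cost_b = 16 * price_per_64gb + addin_card
print(f"CXL mixed build: ${cost_b}")

savings = 1 - cost_b / cost_a
print(f"savings: {savings:.0%}")        # roughly 60%
```

The structural point survives any particular price assumption: as long as the 3DS premium per gigabyte exceeds the amortized cost of the add-in card, spreading the same capacity across cheaper, lower-density modules wins.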
Okay, this is our E3 form factor module. This is the 64 gigabyte version; we are going to be releasing a 96 gigabyte version coming up. It has a x16 interface using a single CXL controller with two DRAM channels behind it. You get a pretty good idea of what it looks like there.
Then I want to spend a little bit of time on this one. This is our non-volatile device, which is going to be released for engineering samples late this year. What we're doing here is we have a CXL controller in this device, and the host system can talk through CXL and access the DRAM directly via that interface. So it's full speed; there's no NAND access from the host's point of view, it's just a full DRAM device. But in the event of a power failure, the device takes over: it takes control of that DRAM and copies the data to NAND flash using onboard supercapacitors. It's a fully self-contained unit. Then, when the system comes back up, it copies all of the data back into DRAM, and the system sees its data present. This is a pretty exciting device that we're looking forward to introducing later this year. We're working with a number of ecosystem partners to make sure it functions well in systems and requires minimal interaction from the host or the user. The goal is to plug it in and have it work.
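The save/restore flow the speaker describes can be sketched as a small state model. This is purely an illustrative sketch; the class, method names, and in-memory dictionaries are stand-ins, and the real behavior lives in the device's controller firmware:

```python
# Hedged sketch of the power-loss save/restore flow described above.
# Dictionaries stand in for DRAM and NAND contents; names are invented.

class NonVolatileCxlDevice:
    def __init__(self):
        self.dram = {}   # stands in for DRAM contents
        self.nand = {}   # stands in for onboard NAND flash

    def host_write(self, addr, value):
        # Normal operation: the host accesses DRAM at full speed over
        # CXL; the NAND is never on the host's data path.
        self.dram[addr] = value

    def on_power_loss(self):
        # Supercapacitors keep the device alive long enough for the
        # controller to copy DRAM contents into NAND flash.
        self.nand = dict(self.dram)

    def on_power_restore(self):
        # On the next boot, the controller copies NAND back to DRAM
        # before the host uses the device, so the data appears intact.
        self.dram = dict(self.nand)

dev = NonVolatileCxlDevice()
dev.host_write(0x1000, "journal-entry")
dev.on_power_loss()
dev.dram.clear()          # simulate DRAM losing power
dev.on_power_restore()
print(dev.dram[0x1000])   # journal-entry
```

The key design point, per the talk, is that the backup and restore are invisible to the host: software only ever sees a DRAM device whose contents survive power loss.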
Okay, so here are some of the reasons why SMART fits right into the CXL space and why we're well suited to making these devices. You can see we have quite an extensive portfolio of CXL devices coming out this year, and that's because it sits right in the sweet spot of what we do. We do specialty memory, we do memory with controllers, we've done NVDIMMs in the past, we do serial-attached memory; we've worked on memory with all of these standards previously, and we have a very experienced team that has been doing these types of devices for 30-plus years. So CXL is really exciting for us because, like I said, it falls right into the scope of what we do. And that's it. If you would like information about any of these products, don't hesitate to reach out to me. The link will bring you to our website, but you can also reach out to me directly.