All right, good afternoon everyone. I hope you are all having a great time at the summit here in Lisbon. I certainly have enjoyed all of the talks, and also connecting with friends and former colleagues after a while, so it's really good to be here. At Intel, we are truly embracing a systems-first approach to building large-scale AI infrastructure, and today I'm going to talk about that approach and our thinking around building scalable systems for AI at data center scale. Besides talking about general principles and approaches for building these systems, I will also dig a little deeper into what we are doing with our recently launched Gaudi 3 generation of data center AI accelerator products, which was announced just a couple of weeks ago, so it is fresh off the press. With that, let's jump right in.
So why does a systems-first approach really matter? Here are a few reasons; I'm sure there are plenty more, but this is the bird's-eye view. First and foremost, the data center is fast becoming the unit of compute when it comes to AI. The optimization point that both software and hardware professionals are targeting is no longer an individual ingredient or accelerator but the entire data center. The conversation is rapidly shifting away from optimizing at the teraflops-per-watt level for an accelerator toward optimizing at the petaflops- or exaflops-per-megawatt scale for the data center, and TCO optimizations are likewise no longer limited to the individual accelerator but now span the entire data center. Secondly, and this should not be news to anyone here, there has been near-exponential growth in the adoption of large-scale AI infrastructure, for two reasons. One is obviously to support the ever-growing size of the models being trained and deployed for AI. But there has also been near-exponential growth in the number of developers focusing on AI over the past five to ten years. Ever since 2012, when AlexNet revolutionized the field of computer vision, and then 2018, when BERT came along and changed the landscape of language modeling entirely, the developer count has increased quite substantially, hence the need for large-scale AI infrastructure. And finally, we cannot accomplish a true systems-based approach until we look at the stack end to end. A vertically integrated, full-stack AI platform is really the only way to take a systems-based approach that meets the ever-growing needs of developers. When I say developers, I mean both the AI and machine learning experts and the infrastructure software developer persona.
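To make that shift in optimization scope concrete, here is a back-of-the-envelope sketch of my own (illustrative placeholder numbers, not figures from the talk). Note that an exaflop per megawatt is numerically the same ratio as a teraflop per watt; what changes at data center scale is the scope of both sides of the ratio.

```python
# Back-of-the-envelope comparison of chip-level and data-center-level
# efficiency metrics. All numbers are illustrative placeholders, not
# figures from the talk.

chip_tflops = 1000            # hypothetical accelerator: 1 PFLOPS peak
chip_watts = 700              # hypothetical accelerator: 700 W TDP
print(f"chip-level: {chip_tflops / chip_watts:.2f} TFLOPS/W")

# 1 EFLOPS/MW is numerically the same ratio as 1 TFLOPS/W. At data
# center scale the denominator grows to include cooling, networking,
# and power-delivery overhead (captured by PUE), and the numerator is
# delivered rather than peak compute.
num_accelerators = 8000
utilization = 0.4             # fraction of peak actually delivered
pue = 1.4                     # total facility power / IT power
facility_mw = num_accelerators * chip_watts * pue / 1e6
delivered_eflops = num_accelerators * chip_tflops * utilization / 1e6
print(f"data-center-level: {delivered_eflops / facility_mw:.2f} EFLOPS/MW")
```

The gap between the two printed numbers is the point: a data center full of efficient chips can still be an inefficient unit of compute.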
That focus on developers is a good segue into this slide, which is about what exactly it means to take a systems-based approach. Starting with ease of use and developer productivity: I talked about the two developer personas, and we as infrastructure providers need to ensure that whatever is put out there in the data center is easy to use for all kinds of developers. Second is the aspect of multi-tenancy and all of the security and isolation guarantees that need to come along with offering multi-tenant solutions. This is especially true for the large public cloud providers, which at any given time support hundreds of thousands of customers on a single cluster, not just for mainstream infrastructure but, going forward, for AI infrastructure as well. A corollary of that is managing the infrastructure efficiently through intelligent orchestration frameworks and job schedulers. Today the world revolves around Kubernetes-based infrastructure for this, but other frameworks, including ones at a slightly higher level of abstraction, may be coming along; infrastructure providers need to be cognizant of that, and a systems-level approach needs to comprehend that aspect as well. Next is end-to-end lifecycle management for the cluster itself, starting with the initial deployment, continuing through allocating capacity to individual customers, and finally end-of-lifing it. All of that needs to happen in a seamless, automated way, which must be comprehended from day one when we start designing these clusters. Along with that comes the control plane: besides the data plane, which is the compute, the control plane is what orchestrates all of this. We talk less about the control plane, but it is very important as well. Obviously, workloads need to scale, because there is no point in building large-scale systems without also accounting for the scalability frameworks that allow workloads to run on that infrastructure. And finally, but no less important, is the hardware itself. We, as part of the OCP community, should especially align with the notion of a standards-based hardware approach to system design, because there is only a certain amount of room for diversity in the infrastructure, and if we want to make AI infrastructure truly ubiquitous, a standards-based approach is the only way to go about it.
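As a minimal sketch of what that Kubernetes-based orchestration looks like from a developer's seat: the extended resource name `habana.ai/gaudi` follows Habana's device-plugin convention as I understand it, and the image name is a placeholder.

```python
import json

# A minimal sketch of scheduling an accelerator job on Kubernetes, the
# kind of orchestration layer described above. The extended resource
# name "habana.ai/gaudi" follows Habana's device-plugin convention as
# I understand it; the container image is a placeholder.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gaudi-training-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "example.com/training-image:latest",  # placeholder
            "resources": {
                # Ask the scheduler for one full 8-accelerator node.
                "limits": {"habana.ai/gaudi": 8},
            },
        }],
    },
}

# In practice this is serialized to YAML and submitted with kubectl,
# or created through a Kubernetes API client.
print(json.dumps(pod_spec, indent=2))
```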
So with that general framing of the approach, let me jump into a few specifics about our Gaudi 3 accelerator product that was launched a couple of weeks ago.
The Gaudi family of products is a truly AI-optimized accelerator family and does not carry the encumbrances of traditional GPUs, which need to support certain graphics functionality as well. It is purpose-built for AI, with 64 tensor processor cores and eight matrix multiplication engines, which are the heart of the compute element in the accelerator. Besides compute, memory, as you can all appreciate, is extremely important, so there are two tiers of memory: HBM, with large capacity, 128 gigabytes, and tons of bandwidth, 3.7 terabytes per second; and, equally important, on-die SRAM, which offers software a caching layer for effective reuse of activations in the AI modeling context, at significantly higher bandwidth even than HBM. And finally there is networking, which forms the third pillar of the accelerator complex: 24 integrated NICs, each capable of serving traffic at 200 gigabits per second, leveraging 100-gigabit SerDes. That packs a lot of punch and creates a very efficient, well-balanced system. The host interface is PCIe Gen 5.
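Tallying the figures just quoted (my own arithmetic, to show how balanced the memory and network sides are):

```python
# Per-accelerator bandwidth budget, computed from the figures above.
hbm_bw_tb_per_s = 3.7                   # HBM bandwidth, terabytes/second

num_nics = 24
nic_gbps = 200                          # gigabits/second per NIC
net_tb_per_s = num_nics * nic_gbps / 8 / 1000   # bits -> bytes -> TB/s

print(f"network: {num_nics * nic_gbps / 1000:.1f} Tb/s "
      f"(~{net_tb_per_s:.2f} TB/s) vs HBM: {hbm_bw_tb_per_s} TB/s")
```

That works out to 4.8 terabits per second of aggregate network bandwidth, roughly 0.6 terabytes per second, sitting next to 3.7 terabytes per second of HBM bandwidth on each accelerator.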
This balanced design allows us to take a very modular approach to building these clusters, starting from a single node, which is the basic scale unit, to something like a 64-node cluster, which becomes a scalable unit for a data-center-scale design, and then expanding that to a 4,000-accelerator complex or even an 8,000-accelerator complex. I will not go into the speeds and feeds in detail; you can look at them offline. The idea here is modularity.
How we go about accomplishing this modularity is to ensure that it is baked in at the very foundational layer of the infrastructure, which in this case is a node comprising eight of these accelerators on an OCP-compliant UBB. There is an all-to-all mesh, a fully connected interconnect between all eight Gaudis, that offers an incredible amount of bandwidth for scale-up, given that the tensor-parallel domain for model parallelism typically extends to eight accelerators in today's world. And then three of the 24 links I mentioned earlier are brought out to OSFP connectors on the server to allow for scale-out.
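From those numbers, the per-accelerator link budget works out as sketched below; the three-links-per-peer detail is my inference from the 24-link total, not a quoted figure.

```python
# Link budget per accelerator: 24 links split between the intra-node
# all-to-all mesh (scale-up) and the OSFP ports (scale-out).
total_links = 24
scale_out_links = 3                        # brought out to OSFP
scale_up_links = total_links - scale_out_links   # 21
peers = 7                                  # other Gaudis on the baseboard
links_per_peer = scale_up_links // peers   # 3 (inferred, not quoted)

link_gbps = 200
print(f"scale-up:  {scale_up_links} x {link_gbps} Gb/s = "
      f"{scale_up_links * link_gbps / 1000:.1f} Tb/s per accelerator")
print(f"scale-out: {scale_out_links} x {link_gbps} Gb/s = "
      f"{scale_out_links * link_gbps} Gb/s per accelerator")
```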
That brings me to the node-level architecture showing the OSFPs, which can then be leveraged to build out what we are calling here a subcluster comprising 128 accelerators. In this case, this becomes the scalable unit, and as you can see, it is quite flexible. There is a leaf switch tier: for AI use cases we typically resort, as you may know, to a fully non-blocking fat-tree-class network with two layers of switching, and this is the leaf layer.
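As a rough sanity check on that subcluster, here is my own port math, assuming a non-blocking leaf tier; the actual switch radix and cabling are not specified in the talk.

```python
# Port-count math for the 128-accelerator subcluster, assuming a
# non-blocking leaf tier (illustrative; not figures from the talk).
accelerators = 128
nodes = accelerators // 8                  # 16 nodes per subcluster
scale_out_links_per_accel = 3
downlinks = accelerators * scale_out_links_per_accel   # 384 x 200 GbE

# Non-blocking means every downlink is matched by equal uplink
# capacity toward the next switch tier.
uplinks = downlinks
print(f"{nodes} nodes -> {downlinks} downlinks + {uplinks} uplinks at 200 Gb/s")
```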
And then, building on top of this, we can build even larger clusters with a two-tier switch hierarchy, showcased here as a 4,000-accelerator Gaudi 3 cluster, a fully non-blocking fat tree as I said earlier, which allows us to offer customers the ability to train the largest of models. We are talking about the trillion-parameter range on this kind of infrastructure.
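To give a sense of why trillion-parameter training pushes the accelerator count into the thousands, here is a rough footprint estimate using a standard mixed-precision rule of thumb (my illustration, not figures from the talk):

```python
# Rough memory footprint of trillion-parameter training (rule-of-thumb
# mixed-precision accounting: bf16 weights + fp32 master weights +
# Adam optimizer moments, roughly 16 bytes per parameter).
params = 1e12
bytes_per_param = 16
state_tb = params * bytes_per_param / 1e12       # ~16 TB of training state

hbm_per_accel_tb = 128 / 1000                    # 128 GB HBM per Gaudi 3
min_accels = state_tb / hbm_per_accel_tb
print(f"~{state_tb:.0f} TB of state -> at least {min_accels:.0f} accelerators"
      " just to hold it, before activations and data-parallel replicas")
```

That minimum of roughly 125 accelerators covers only the weights and optimizer state; activations, batching, and data-parallel replication are what drive real deployments to the thousands.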
Now, I would be remiss if I did not talk about our sustainability efforts on Gaudi, especially in the context of this OCP regional summit. Besides the traditional air-cooled version, we also support two flavors of liquid cooling: single-phase and two-phase liquid-cooled designs based on cold plates. The uniqueness of this design is that a common cold plate serves both cooling solutions. We are on our way to launching the air-cooled product first, and quickly following that, we will be launching the liquid-cooled products as well. So we are very committed to the sustainability effort here.
Speaking of sustainability, we are obviously not confined to the Gaudi family of products here. At Intel, as you all know, we also have a huge CPU business focused on Xeon, so this slide covers our overall cooling infrastructure solutions for both the Gaudi family and the Xeon family. More notably, this includes both liquid-cooled solutions and immersion cooling, which is fast emerging as a far more efficient cooling subsystem. It will take some time to gain traction in mainstream deployments, but we need to pay attention to it from the get-go, because the numbers are staggering: we are talking about a roughly 30% reduction in power consumption just by virtue of leveraging immersion cooling. You saw in other sessions earlier today that with immersion, people are talking about PUEs close to one, and that is probably the only path for us to get there. As I said, our Xeon family supports single-phase and two-phase liquid cooling, and immersion cooling is supported with the third, fourth, and fifth generations of Xeon, with the recently announced Xeon 6 family set to support it as well.

With that, this is my last slide. In conclusion, I would just like to say that we are super excited to continue our journey toward a systems-based approach for AI infrastructure. The call to action is for you all to come participate in that journey with us. We are looking forward to collaborations across the industry to create an open ecosystem for AI systems, as opposed to a closed, proprietary set of technologies. With that, I thank you for listening, and I'm happy to take any questions. Thank you.