George Cozma of Chips and Cheese joined Bryan, Adam, and the Oxide Friends to talk about AMD's new 5th generation EPYC processor, codename: Turin. What's new in Turin, and how is Oxide's Turin-based platform coming along?

In addition to Bryan Cantrill and Adam Leventhal, we were joined by special guest George Cozma, as well as Oxide colleagues Robert Mustacchi, Eric Aasen, Nathanael Huffman, and the quietly observant Aaron Hartwig.

Some of the topics we hit on, in the order that we hit them:

- Chips and Cheese: AMD's Turin: 5th Gen EPYC Launched
- End of the Road: An AnandTech Farewell
- Centaur Technology
- AVX-512
- Zen5's AVX512 Teardown + More...
- Thermal Design Power (TDP)
- OxF: Rack Scale Networking (use of P4)
- P4
- AGESA
- OxF: The Network Behind the Network (Oxide server recovery)
- openSIL
- Phoronix: openSIL
- PCB backdrilling
- OxF: AMD's MI300 (APUs)
- dtrace.conf(24) -- The DTrace unconference, December 11th, 2024

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
I noticed that you changed up the title. You did not like my... Okay, yeah, go ahead. Go ahead, explain yourself.
Here's where I wanted to start. I love your title. Okay.
I mean, why would someone... Obviously, I love my title. I love your title. Ah, there's a but coming.
There's not even a but coming.
Listen, pal, we've been rejected by enough VCs. I know a breakup email when I see it.
No, like unequivocally, I love it. Period. The end.
We love meeting Oxide. We're very excited about Oxide. Okay. Yeah. Go on.
Look, I'm not cheering from the sidelines. I want to be in the game with you. I'm in the game.
Okay. Go on. So you love my title. My title was Unshrouding Turin.
Yeah. Love it.
But I noticed the title here is Benvenuto a Torino.
That was for Robert.
That was for Robert.
I thought he'd enjoy that more. Robert, which one do you prefer?
Oh, put Robert on the spot. Go ahead, Robert. There we go. What an all pro. You know what this reminds me of? This reminds me of when my now 20-year-old was four. We understood from one of his friends in the neighborhood that he and this girl were going to get married. And they were like, okay, that seems like a little bit heavy for four. Yeah.
And we were talking to another parent at the preschool, and she was saying that her daughter and Tobin were going to get married. I'm like, God, this kid's a real, gets around, real gigolo here. Well, as long as, you know, he's got his, I guess he's got, you know, when you're a four-year-old, I guess you have a playmate in every port. And we are at the beach.
We're at Chrissy Field with one of these girls, and the other girl comes up.
Oh, man.
Oh, and you're like, okay, what? And I'm reminded a little bit of Robert on the spot over here. And like Robert, my four-year-old takes the hand of one of the girls, takes the hand of the other girl, and then the three of them all go running off together.
Delightful.
I'm like, all right, you know, go for it.
You're like, I'm just going to write this down to tell at your wedding.
Absolutely, or weddings. I mean, who's to say that this, you know, who's to say that this won't carry into adulthood? Yeah, so I'm raising a bigamist. Anyway, I am, regardless of the title and who it was designed to appease, I am very, we're very excited to be talking about Turin and the Turin launch. This is AMD's latest part. George, thank you very much for joining us. Really appreciate it.
You are a repeat friend of Oxide. It's good to have you back.
Good to be back. Excited to be here.
So you had a great blog entry that I was excited to see it at the top of Hacker News over the weekend. Were you surprised by that?
No. I've noticed that Hacker News has been picking us up more and more. That's great. And, ironically enough, we recently moved over to Substack, and we noticed that, seemingly, the SEO for Substack is a lot better. So that article got a lot more traction.
Oh, interesting.
So it being at the top of multiple sites, aggregator sites, doesn't surprise me.
You know, maybe the SEO, it may just be also that you just got a great article on a hot topic.
You know, this could be... Yeah, and what's funny was the video actually did really well.
So, okay, I'm glad you brought up the video.
Yeah. Oh, so there was the video part of the article.
No, no, I'm glad you brought it up. Because those, you, the comments on that video were the nicest YouTube comments I've ever seen in my life.
Yeah.
I didn't even know YouTube comments could be nice.
Yeah.
And most... What I've noticed is that, so you guys can't see the like ratio, but it currently has a 100% like ratio.
I mean, what is going on? This is, this can't be a YouTube video.
It's not YouTube.
It's not YouTube. There's something, this thing has fallen into some alternate reality and like, and the comments are all like, you know, thanks for all of your diligent work. And you know, I just, I, I love, I mean, it just, it's great. God, like we talk about lightning in a bottle.
I don't know if you saw these, Adam. I think a lot of it is because, I mean, what, last month or just the month before, AnandTech closed?
Yes.
For the last time. A lot of the older folks, like Ace's Hardware, Real World Tech — David Kanter doesn't really write anymore. So a lot of the in-depth stuff has sort of disappeared over time.
Yeah, so you think that this has given the internet some gratitude? You've managed to domesticate the internet.
More so that I think people want an alternative to... That really goes in depth.
And look, it was like if there was a YouTube video that we're going to start having nice comments on, that was a good one to start on. That was a great video, went in depth. I love that you kind of had the surprise ending where you set a world record in your hotel room. Let's start with there. What was that benchmark that you were running? And you were running that on Turin, obviously.
So that was y-cruncher, the hundred-billion-digit BBP benchmark. Excuse me, it's hard to say. But basically, all it does is it's a compute benchmark. It just wants as many threads and as high clocks as you can get. It's not memory bound at all. But the prior record at the time of that video was about 10 seconds, and that was a sub-five-second result.
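For context: y-cruncher's BBP benchmark is built around Bailey-Borwein-Plouffe-style digit-extraction formulas for π. The classic BBP formula, shown here only to illustrate the kind of pure-compute arithmetic involved (the benchmark's exact variant may differ), is:

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^{k}} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)$$

Terms can be computed largely independently of one another, which is why the workload scales with thread count and clock speed rather than memory bandwidth.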
And I see Jordan in the chat, or as one of the audience members — he was running it with someone else in the room, Jeff from Graph Computing. And I was doing my video, and I see him out of the corner of my eye trying to wave this laptop.
I'm like, I couldn't really say anything. I was like, what's going on? And he's like, do you want to show the audience that we just broke a record? I'm like, okay, completely unplanned. I had no idea that was going to happen.
It was pretty great. It was pretty great. And I love that you're like kind of wrapping it up. You're like, no, no, actually, wait a minute. Hold on. I'm not being, this laptop just being handed to me.
Wait, there's literally... I pulled the Lisa Su: wait, there's more.
There's more. Yeah. Okay. So, and obviously that setting, and it's helpful to know this is a very compute intensive workload. Because one of the things that I think that we've heard from a bunch of folks is this thing is so much more compute that now you've got to really ask questions about balance of the system and memory bandwidth and so on. So I want to get into all that.
Um, I guess one thing I would kind of ask you from the top, um, just what is your kind of top takeaway from the Turin launch? And was there anything that surprised you? Was there anything that you either didn't know what was coming or didn't know the kind of magnitude or you're still, yeah.
Sort of three things here, if you don't mind. What surprised me most about Turin specifically was, one, the fact that they hit five gigahertz on some SKUs.
Yeah, I was going to ask you that.
On a server CPU. Yeah. But not just hit five gigahertz. Like I wrote in the article, Wendell from Level1Techs, in essentially a web-server workload, was hitting 4.9 gigahertz all-core. That's nutty. That's utterly, utterly nutty.
That is crazy. I feel like the last time we were really, it's been a minute since we've seen clocks that high from anywhere, I feel. I feel it's been like, I mean, IBM was hitting it with power.
Well, no, IBM Z is really the only folks that do that sort of over-five-gigahertz consistently in server. Intel, back in the day, if you remember those Black Ops CPUs, they were doing five gigahertz. And I believe that the last Oracle SPARC CPU, the M8, went up to five gigahertz. Did it really? But other than that, I'm having to draw upon some fairly niche CPUs here.
The fact that what is effectively a mainstream CPU can do this is crazy. But add on to that, just... The fact that Zen 5C, so sort of those compact cores, have the full 512-bit FPU, I think that that's really impressive considering that they're sticking 16 of them into a single CCD now.
Yeah, so let's elaborate on this a little bit because this is, I think, a really interesting point. So we're seeing this from Intel too, right, where once you get the density up to a certain level, you've got to make some compromises. But the compromises that AMD seems to be making are much less than the compromises you're seeing. I mean, the Zen 5C cores are – they're still Zen 5 cores.
Yeah, so here's the difference between Zen 5 and Zen 5C. From an architecture perspective, nothing. Until you hit L3, there's no difference. Now, are they on different nodes? Yes. Is there an Fmax difference? Yes. But the fact that they're still hitting 3.7...
Fmax — so that's your top clock — is really impressive. And the no-difference part... I think you're on, what, three nanometer for the 5C and four nanometer for the Zen 5, is that right? Yes. These are all TSMC, obviously. So on their C cores, they jumped 600 megahertz, from 3.1 to 3.7. God, that is amazing. Yeah. And there was a good point made, and I think the biggest jump
in recent AMD history in terms of the server performance. It's not Zen 4 to Zen 5. It's Zen 4C to Zen 5C.
Interesting.
I think that performance jump is far from... Like, I think that that's really, really exciting performance. But what I really like about Turin is not just that there are these big old high-end SKUs, the 128- and 192-core SKUs, but that they've paid attention to this mid-range, right?
With the 9575F — that's the high-frequency 64-core SKU that I was talking about, the one that was getting 4.9 from Wendell's testing on all cores. I think that sort of SKU, the fact that they still are paying attention to the mid-range, is really good. It's expensive, no doubt, but... Yeah.
And the fact that you've got this kind of SKU stack now that is kind of a uniform SKU stack where you can start to really make some interesting trade-offs as you look at workload. And it just, it feels like, you know, and Robert here is our resident SKU stackologist. Yeah.
which definitely, I mean, there were times with Intel where you've got like these three different, where you had like gold, silver, and platinum, and you've got- Also bronze. Bronze, right. And it just, it did require like you to get a postdoc to figure out like which part you want. And Robert, what do you make of this Turin SKU stack?
I mean, it seems like it's a pretty clean SKU stack in terms of making different trade-offs as you go up and down it.
Yeah, I think AMD's biggest strength is the fact that you're basically only choosing the number of cores, what the frequency is, and cache size. Otherwise, all the other features are the same. And I think that actually ends up being pretty powerful. You're not getting into a case where it's like, oh, do you want to have fast memory? Different SKU. Do you want to have a different...
Do you want to have RAS features? Ooh, sorry. That's going to cost you. That's going to cost you.
So you're not having to make trade-offs. You kind of, you kind of take from the buffet table, but you're not needing to compromise. Is that right?
Yeah. I mean, certainly. And I think that, that, you know, when on the Intel side, when you go from these P cores to E cores, Adam, you're going from the P is for performance and the E is for efficiency. You're like, Oh, as, uh, uh,
As Joe Macri from AMD put it: economy cores. Economy cores. When he said that on stage, Ian and I were laughing for about five minutes straight.
That's really good.
Yeah, because in particular, you don't have AVX-512 on those.
No. That's the most shocking bit to me, is that you basically still never, despite Intel championing it and trying to put all that energy into everything over the years, you still just got nothing.
And I... let's just say that's something I've harped on Intel about: the fact that they now have ISA segmentation.
Yes. Yeah.
It's bad. It's bad. Don't segment your ISA, please. Yeah. Like that's, that's how you shoot yourself in the foot. Something fierce.
Just to elaborate on that. So what that means is like the operating system needs to make scheduling decisions that, within the same chip about whether certain workloads can run on certain of the cores.
Is that right? So on the client side, yes, that's true. On the server side, that's not correct. All server chips have the same cores. So there are two Xeon 6 lineups, the Xeon 6000Ps and the Xeon 6000Es.
And the Ps are for the P-cores. There's no merger of... Yeah. So just to be clear, it's like Ryanair: it's all economy. Yes, on this one. But the thing with that is — and here's sort of the cleverness of Zen 5 versus Zen 5C — AMD can just have one big SKU stack and that's it. Yeah. Intel, you need these two different SKU stacks.
And it can start getting confusing with what has what and where.
Well, and then it's like, so it's really tough too, because if you are, you know, in our position of like, we are selling a rack scale computer to someone who is making compute available to their kind of customers, right? We're asking, if you have to ask them, well, are you using AVX 512? They'll be like, I don't know. I have to go ask my users that.
And it makes it really hard, and you're kind of at this big fork in the road. So to be able to have a...
to not have to give up... I mean, just as you said, George, to not have to compromise on ISA, and to get the same ISA everywhere. And yes, you're making some trade-offs in terms of max frequency and so on, but you're not, like, dropping off... your trade-off is in area, right? Right. Yeah, you have to spend that area, but the...
The thing here is, I think the area would be... even if it was a bare-bones implementation. I don't know if you guys remember Centaur, Centaur CNS. If you guys remember VIA, that was a VIA CPU that never actually hit market, but you were able to test one, thanks to a couple of guys who acquired some during the Centaur buyout days about three years ago.
It had a very basic and bare bones AVX 512 implementation. But that's fine.
I feel like the last time you were on, I was getting a bunch of grief for dropping some dated references. This is an old part, just to be clear.
Well, here's the thing, right? It's an old part, absolutely. But Intel bought the team, and it was essentially an acqui-hire. They got the folks from Centaur three years ago. Oh, so this is a recent thing.
This is not Centaur.
No, no, no. So Centaur was broken up essentially in 2021, as the Wikipedia article says. VIA still has x86 licensing, but the Centaur team isn't at VIA anymore.
So this was in 2021. So I assumed you were talking about the, like the, okay, this is, you're not talking about a chip that was made in like 1999. No, no, no, no, no.
I'm talking about a chip that was due for release in 2022.
And they were broken.
But was canceled just before launch.
Okay, so I've got so many questions about this Wikipedia page. So they were broken up. First of all, like nice use of the passive voice. Like they were, like what broke them up?
The iceberg. The iceberg broke the liner. So VIA, who was the parent company, basically shuttered the Austin headquarters. And the team was acquired by Intel for $125 million. That's a hell of an acqui-hire. Yeah. Here's the thing.
Nobody was expecting it, and it was very... the amount of information that has been shared about it has been close to zero. Interesting. And if you ask anyone that was formerly there, they don't say anything. It's kind of weird. Okay, interesting. And so then... VIA. I didn't realize that VIA was an x86 license holder.
So they had licenses from IDT, which is what Centaur used to be, and Cyrix, because they acquired Cyrix. Right. So they had licensing, and I think they still technically do, which is how Zhaoxin, which is a Chinese x86 manufacturer, has the ability to make x86 CPUs — which, that's a whole history in and of itself.
God, thank you for opening up all of these doors. Did you see this? There's this documentary, The Rise of the Centaur, covering the early history of the company. It's like, okay, that's must-see TV. I mean...
But the reason why I brought them up is because what was supposed to be their newest core, CNS, was supposed to have had AVX-512 capabilities, but it was a very bare-bones implementation of AVX-512.
Oh, interesting. Okay.
And if the E-cores had had a bare-bones implementation...
Yeah, interesting. And the fact that it's not there at all really does mean, as you say, it's a separate ISA. Yeah. Is that a good segue to the AVX-512 improvements on Turin? I mean, I felt like going into this launch that that was one of the headliners, the improvements to AVX-512.
Yeah. Alexander Yee, the creator of y-cruncher, did a very good write-up on Zen 5's AVX-512 implementation. Oh, yeah. And he went very, very deep into it and basically said, yeah, this is the best AVX-512 implementation so far.
Yeah, really. There are some... The data path, obviously, is a big part of that, I assume. The fact that it's going from a 256-bit-wide data path to a 512-bit-wide data path. Or deeper than that.
That's part of it. Another part is just the increase in the number of registers — they doubled the number of registers. Yeah. They made a lot of ops single-cycle, which is nice. There were some trade-offs that were made: some of the integer stuff was made two-cycle, which was a bit of a fly in the ointment, so to speak. But other than that, it's...
The way that AMD can just not have to... So with Intel, you always had that sort of clock offset where, if you run any AVX-512 code, you would suddenly decrease your clocks, right? AMD doesn't have that. How they accomplish that, I have no idea. But you can throw in AVX-512 instructions and...
any clock-speed pullback is driven by thermals and power. It won't have this turbo-clock thing where, if you introduce any AVX-512 instructions — even if they're just load/store instructions — they'll decrease the CPU clock regardless. You don't have that with Zen 5, or Zen 4 for that matter.
And was that on, was that Sandy Bridge or Haswell, Robert, where it was like, I mean, AVX-512 has always been kind of had this kind of problematic property that if one thread starts using it, it kind of like browns out the rest of the part.
It's Skylake.
Skylake is what did that. But you also had this problem on Broadwell with AVX2 and the others. And it's actually worse than just running an instruction. If you actually just leave the AVX save state bit such that Intel thinks it's modified in the register state, that's enough to trigger this slowdown sometimes. Oh, man. You might not remember this. We had a nasty bug back at Joyent.
where we had a guest running Windows, and we just weren't properly clearing one part of the save state in the initial state. So the initial state basically said that the 256-bit register state, the YMM state, was valid. It's like, okay, I'm going to no longer boost.
Wow. And so you'd basically run this guest and it would crater performance for the whole box for no actual appreciable gain, because no one's actually using it.
Yeah. I mean, it was basically just that one bit. And that, I think, to me is the even more telling part: even if you just leave the state marked in the save area, you're toast.
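To make that failure mode concrete, here is a minimal user-space sketch (not the Joyent code, just an illustration of the mechanism): XGETBV with ECX = 1, on CPUs that support it, reports which extended state components the processor currently considers in use, and VZEROUPPER clears the upper YMM halves so that AVX state is no longer marked live. Build with something like `cc -mavx -mxsave`.

```c
/* Sketch: observe and clear "in-use" AVX upper state (illustrative only). */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* Touch a YMM register so the CPU marks the AVX state component in use. */
    __m256d v = _mm256_set1_pd(1.0);
    volatile double sink = _mm256_cvtsd_f64(v);
    (void)sink;

    /* XGETBV with ECX=1 (where supported) returns the XINUSE bitmap;
       bit 2 is the AVX (upper-YMM) state component. This is the bit the
       bug described above left set for a guest that never used AVX. */
    unsigned long long in_use = _xgetbv(1);
    printf("XINUSE = 0x%llx (bit 2 set means YMM state is live)\n", in_use);

    /* VZEROUPPER tells the core the upper halves are clean again. */
    _mm256_zeroupper();
    printf("XINUSE after vzeroupper = 0x%llx\n", _xgetbv(1));
    return 0;
}
```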
And it's really hard to have a feature like that, where if you use this feature, it has this adverse effect on the rest of the part. It's a monkey's paw kind of feature.
Yes. Yeah. Yeah.
It's very hard to reason about the performance when you have these kinds of problems. And AMD has not needed to do that. And I mean, you know, it sounds like, George, you've had the same reaction: you just become so accustomed to these kinds of intense compromises that come with AVX-512 that it's kind of amazing that we can have it all.
Yeah. It's, How do I put this? So I think what Zen... And if you read the initial sort of coverage of Turin... Sorry, not of Turin, of Granite Rapids. So that's Intel's newest Xeon chip.
Right.
They're... All the coverage was like, yeah, it's good. It competes with Genoa. But we were all briefed before this. And we were just sort of thinking, yeah, but is this actually going to compete with what's coming up next? And it's, let's put it this way. At least AMD isn't competing with itself anymore, if you get what I mean. But it's not a good competition for Intel.
Like, it's not a winning one for Intel. They can at least bid on something and not be laughed immediately out of the room, but...
Yeah, and you would expect that to, like, flip a little bit with Sierra Forest, but then you end up with this kind of e-core business. And I mean, there are going to be things that are interesting over there, but Turin is a very hard part to compete with. It's done a pretty good job across the board. Yeah. Yeah.
And so I think, George, I'm not sure you got to all three of the things that you had that were, so the frequency, the high number just in terms of the F cores and getting that up to five gigahertz, especially across all cores.
And the fact that there are only two 500-watt SKUs. I actually really like that. The fact that while they are going up to that 500-watt SKU, they're only really reserving it for the highest-end parts, whereas everything else is 400 or below. And I really actually respect that.
I can tell you that we at Oxide also like that. Folks, think about the rack level, because we're the ones left with the rack-level power budget. And yeah, it's definitely nice to have a SKU stack that is not all sitting at 500 watts or 500 watts plus, right? I mean, I think of that.
Yeah, yeah. It has... 400 watts is still a large amount of power, no doubt. But if you remember the slide that AMD showed with a seven to one consolidation. If you could do that, right? If you can go from a thousand racks of 8280s to 131 racks of- Oxide, what?
Yeah, but if you can do that reduction, even though the single SKU power has gone up, your total power savings has gone, like your total power of the data center has gone down.
It has, and I think economically, too, it's interesting. These are expensive parts, but they can do so much, especially at that high core count level, when you're not having to sacrifice on what those cores can go do, that you can make it make economic sense, I think. Yeah. it's a big step function over where they've been.
I mean, I think that we, you know, I think like a lot of people, like the Genoa SKU stack was a little less interesting for us.
It was obviously a lot more focused towards the high end. Yeah. Like it was very much skewed, pun intended, towards the hyperscalers, right? Towards those people who can take that much power and just not care. But I think especially with the Turin SKU stack, I think it's a lot more sort of a refresh across the board. Yeah. Which is really good. Yeah.
Yeah, it is a good segue, Robert, to the kind of our thinking on Turin, because we, so George, as we're kind of thinking about our, I mean, our next gen sled is obviously a Turin-based sled. We did deliberately elect to kind of bypass Genoa and to intersect with Turin. Maybe describe our thinking a little bit there, Robert, and we've got
Sure, yeah, we can go a bunch of different places. I think the starting place is really actually going back to Milan.
Yeah.
Because actually like 64 core Milan, like the 7713P, or even if you go up a little bit, getting that in 225 watts or 240 watts, that was actually really nice. Really nice, yeah. That was really nice. Performance per watt on Milan is really pretty great.
Uh, yeah. I mean, it's hard to compare to, you know, 192 Zen 5C cores in that range. But, um...
Once you kind of get to Zen 5 — I mean, I still miss that there's no 225-watt, 64-core part, but I think this is where you're going to see, from our end, trying to think about how we get a little more flexibility: leverage the fact that base DRAM has increased in capacity without going to 3D-stacked RDIMMs, just because that part of the balance-and-price equation starts getting
really thorny. So the fact that you have 128-gig RDIMMs is useful, especially when you start looking at the fact that two DIMMs per channel gets challenging fast there. So I think there's a bunch of different SKUs you can start to look at. You know, one thing I've been keeping my eye on is actually the 160-core Zen 5C as one thing to look at.
Because that kind of keeps you below 400 watts, so I think it's still a Group E CPU, as opposed to a Group G, in the IRM. So do you want to describe those terms a little bit? When you say Group G versus Group E, what are those things?
Yeah, when AMD creates a new socket, they put out what they call an infrastructure roadmap, or IRM, and then they basically predefine different TDP ranges into these groups. So, for example, Group E probably has some range, off the top of my head, from like 320 to 400 watts. These new 500-watt CPUs, I sometimes joke, are Group G guzzlers, just because they definitely take a lot.
But you can design your platform to different TDP thresholds, these kinds of different infrastructure ranges, and then you'll get different CPU and core counts. So, like, if we look at it, there are like three or four different 64-core CPUs. I think there's the 9535, which is kind of like the, you know, almost-300-watt 64-core.
It's like, that'll be like a Group A CPU. Yeah. And that kind of tells you what the TDP range is and what you, as the platform designer, can kind of tweak — what the min to max is on there.
Whereas, like, the others will come in at Group E, or some of these even smaller ones — like some of the 32- and 16-cores, if they're not getting cranked up for frequency — might even be in like Group B. And related...
But as we were thinking about Cosmo, Cosmo is our codename for our next-gen Turin-based sled, SP5-based sled. What were we thinking in terms of what groups we wanted to target and kind of like the trade-offs there in terms of flexibility?
Yeah, that's a good point. So a lot of this is work that we do with Eric, who's also on here: trying to figure out, hey, what's the right balance of how many stages, how many components do we need to reach which power group?
So, for example, when we did SP3, we designed to what they called Group X, which was the group they added later for the 3D V-Cache and max-frequency SKUs — maybe it's like a 240- or 280-watt max. But then we ran a 225-watt CPU in there the entire time, giving us plenty of margin, plenty of headroom, which meant that our power subsystem was very clean.
So here we kind of said, hey, we're going to start with Group E as our target, and we're going to see what it costs us to fit Group G. Does it actually cost us more stages, more inductors, more other parts? And then there's the first question of: can you air-cool 500-plus watts? Which is a different question entirely.
Yeah. Eric, do you want to describe some of the thinking there, as we were looking at what the PDN for this thing was going to look like?
Yeah. So generally the, the trade-off is, um, not even so much cost, it's space. So if you look at that, in the chat there was a server that had two sockets and like 48 DIMMs or some insanity on it. And if you look at that thing, there's a whole lot of power stages in it.
And if you look at the board designs for those things, having that many power stages basically creates a giant wall for you to route around, which sucks. So you have to pull out all those lovely PCIe lanes we love to use and route them around all those power stages, because they don't really like going through them.
And when you mean route, we're talking the physical layout.
Traces on the PCB.
Yeah, traces on the PCB, right.
Yeah. And so for our power design, I tend to bias towards the conservative side of things. One, because we're not building a million of these and I don't have a finance person beating down my door over $5 in extra components. And two, because it gives us flexibility, right? So if we wanted to run those 500-watt chips, I'm going to be able to do that. Yeah.
Yeah, and I feel that, like, also, Eric — I think whatever size Oxide becomes, even if we're selling millions of them, I will help slay the person that comes to your door over the $5. Because I feel like on so many of these parts, I mean, yes, they add up and it's part of the BOM. But, man, the cost of these CPUs is so much greater, and getting the flexibility is so much more important.
Having the reliability of having margin, yes, not having to worry about it. It's like, okay, I can push it down to, like, four millivolts of margin to the spec — but why? So I can save five dollars? That's dumb. Don't do that. And then, so, I'd love to get a bunch of commodity servers and just start throwing them through power testing and see what they do. Oh, well, I don't know how close they are.
Oh, and I think, I mean, George, I'm sure your experience has been up there, but I do love a bunch of the reviews online of Turin cautioning people to not do exactly this. Like, by the way, your SP5 motherboard may not be able to take some of these SKUs.
Yeah, basically what AMD has said is, there are these 500-watt SKUs. But if you are an end user, like if you're a small-medium business and you bought, say, four Turin servers from Dell or whoever, don't just swap out the chips.
Please make sure that your boards can actually support these chips.
Yeah. And the problem, too, will be that if you push these things to the margin, you can get misbehavior. It won't be as simple as, like, burning the house down.
And you can get some very weird, weird, bizarre behavior that you're just going, why is it doing this? And you'll tear your hair out for a week trying to figure it out. And it's just because of the power.
Yes, or if you recall us on the Tales from the Bring Up Lab episode where Eric was regaling us with some of our adventures on Gimlet, where our power was already pretty good, but we could not figure out why this chip would reset itself after 1.25 seconds. So we made our power even better. And Eric, my recollection of this was AMD's like, we have never seen power. This power's amazing.
You've got amazing margin. It can't be that. And sure enough, it was not that. It was firmware, of course.
Oh, yeah. It was firmware on a power stage. Oh, on a power stage. Yes. Yeah. Yeah. It was the control interface that the AMD processor uses to tell the power stages what voltage it wants. Turned out to need a firmware update. It's very much a face-slapping thing.
George, it's kind of hilarious, because SVI2 was the protocol this thing uses — that the part uses to talk to the regulator. And the SDLE, which we had used, this great part from AMD that we had used to actually model all this stuff, as it turns out, didn't have a hard dependency on getting the ACK back from the controller when we set the voltage to a specific level.
The part, as it turns out, wants to hear that ACK, as we learned. We learned that the hard way. We learned that the hardest, most time-consuming, most expensive possible way. But we did learn in the process.
I actually, Eric, I thought it was super interesting to learn that our power margins were really good on that because that was like a first natural line of attack was our power margins aren't like, that's why this thing is resetting because it is in a reset loop because our power's not good enough. But we actually learned in the process of doing that, like, no, no, this power's actually quite good.
Yeah, it was rock solid and it was just stupid margin.
So, Eric, as you're kind of thinking about like, okay, so we need to, you know, there are things we need to do. And were you coming to the conclusion that, okay, I think we can make this all fit? I mean, as you're doing that kind of that trade-off?
Yeah. And so the big trade-off is: okay, are we going to have customers that need it? Are we going to even want to run a 500-watt part? Can we air-cool 500 watts? Because we're not water cooling. And to me, as the power person,
It basically came down to, it doesn't hurt anything for me to put another power stage on this thing and I can always turn it off and then it's just not doing anything, but it's also not contributing any heat to the system. So if I wanted to, I could turn it off and I wouldn't pay that much of a penalty.
And importantly, we're able to use the same regulator; it's not like we're having to swap regulators to accommodate this. We were able to use the same Renesas parts for this. Okay, so that's kind of like, all right, we've got the insurance there.
And then from a thermal perspective, we also needed to do the modeling, because, as you said, we're not water-cooling this thing. And at 500 watts, I think we definitely know we will not be talking about how quiet the fans are, because you'll be lucky if you can hear us talk over the fans. Yeah.
We know the fans be cranking, but I think that the, I mean, we've done that and Doug and crew have done the model.
I'm calling it. We can air cool it.
Yeah.
Yep.
No, that's the thing.
Yeah.
I mean, right now we've done all of our worst-case studies, which is basically saying: assume the CPU is going at 500 watts, right? All the DIMMs are going at their maximum, you've got every SSD going at its maximum, and the NIC, and you're paying some amount of loss for all the stages — we still think we can cool that.
And then practically speaking, even though the CPUs with turbo boosting have a good way to eat up the rest of your power, you're usually not getting all of those devices maxed out all at the same time.
In particular, it's real hard to max out the draw on your DIMMs and the draw on your CPU at the same time without being mean-spirited. I mean, you have to be really...
Maybe AVX-512 will let us do it this time.
They've also gotten a lot more clever about how they do all the hashing across DIMMs.
Substantially so. You're right. I should not be tempting the gods here.
One thing that surprised me, coming into this from outside the industry, is seeing, like, okay, you've got your 500-watt TDP, but that's not actually the peak power you can draw. They can draw over 800 watts transient as they're scaling things up and down. And that just blew my mind.
It's like, wait, this 500-watt part can just spike up over 800 watts and then scale itself back down? Like, yeah, that's great thermally, but dammit, I've got to provide that. Right. Right.
Yeah. And with the voltages that current server CPUs are running, at those 800-watt spikes it's not 800 amps, it's 1,000-plus amps, which means more power stages.
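To put rough numbers on that (purely illustrative, assuming a core rail somewhere around 0.8 V rather than any specific Turin rail spec):

$$I = \frac{P}{V} \approx \frac{800\,\text{W}}{0.8\,\text{V}} = 1000\,\text{A}$$

which is why a transient well above TDP shows up on the board as four-digit currents and, in turn, as more parallel power stages.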
And just turning your board into a giant resistor, essentially.
Yes.
It turns out copper does not like 1,000 amps running through it in normal PCB thicknesses.
So actually, sort of the thing — related but unrelated to Turin directly — that was announced at the event was Pensando, new Pensando stuff.
Yes.
And I sort of want your take on this.
Interesting is our take. I mean, it's like definitely interesting. I mean, I think that we would love to be able to get some parts. The draw does become an issue back there for us. Yeah.
Yeah, I mean, the P4 programmable nature of it for us is something that's actually really powerful. We leverage that in our switching silicon a lot and have been looking for something to get that into the NIC. The big challenge is just, I think where we're a little different is a lot of the DPUs have been designed to basically be like, we're the compute, the DPU is the computer in charge.
And, hey, you big CPU that's, like, running guests over there: you're subordinate to me. So, like, yeah, you exist, but only at my pleasure. And we're not quite as split-brain there — slash, we're not trying to sell the entire server.
So, like, you know, it just gets thorny when it's like, okay, that device also needs its own DDR5.
Yes.
Uh, some questions around like, Oh, but yeah,
Think of how much it could offload your processor. It's like, yeah, but the processor is still going to get maxed out. So now I've just increased my overall power.
Totally. Yeah. So we end up designing not for absolute density, but for trying to get the best density in a fixed power budget, because, you know, unlike the hyperscalers, we're not basically building a power plant next to every new DC. Yeah. That's where it gets a little more challenging.
And so we're trying to work with folks to figure out, you know, hey, if I don't need, say, all of the Arm cores that show up there, or let's say I don't run with DDR5, what can I still get out of there? How can we kind of change this from — some of these parts are, and I don't remember what this one was, you know, 50, 75 watts.
And, you know, maybe I play games and I say, you know, I've got a lot of SSDs, but maybe I don't need all of the IOPS of all those SSDs, so I can double up on capacity instead.
And that gets me back some of the power, and I can send that to the NIC. But it's definitely... even with just increasing power for the CPU, I'm already trying to think about, like, well, what do I do for folks who don't have all that power? If I've got 32 sleds, how do I...
Well, and I think that — you know, Ellie in the chat is saying, well, I don't think people realize how restrictive the Oxide power budget is. And I don't necessarily... it is restrictive. It's more that we are really taking that rack-scale approach, and so we're kind of the ones that are always adding up the Visa bill.
And, you know, when you have 30, 40, 50, 60, 70, 80 watts in your NIC, that adds up in a hurry. And yes, you can offset it elsewhere, but what we're trying to do is get you the maximum amount of useful work out of that rack-scale power budget.
And, you know, by being the ones that are doing that, we're the ones that are, you know, sometimes having to deliver some tough messages to folks about like, we like this, this is interesting, but it's drawing way, way, way too much power.
But yeah, overall, really good, really nice to see, excited to see that kind of P4 continue there and hoping someday we can find a way to make it make sense for us. But I think there's a lot of other folks who it does make a lot of sense for.
Yeah, I mean, we're huge P4 fans, as folks know. And I think we've got... actually, if folks can see it out there, we've got an exciting announcement in terms of Xsight Labs and using their part as our next-gen switch, and its P4 programmability. So we're really excited. We've been using P4 on the switch, and we're going to continue to do that.
And using P4 or programmability at the NIC, we're really interested in. But it's got to happen in a way that we can accommodate everything else we need to go do with the rack. So, George, a long answer to your question there, but it's interesting for sure.
So, speaking of how you guys don't want to do DPUs: AMD didn't just announce a DPU, which is the Pensando Salina 400. They also announced Pollara, which I think is just a standard NIC. Like, it's P4, but it's not a DPU.
Yeah. And that is something that we're definitely interested in.
Okay. Yeah. Cause I was going to say, is that the one that you're more interested in?
Yeah, I think so. I mean, that's not going to intersect our first cut here of Cosmo. But no, we're really, really interested in it. And again, great to see that P4 programmability.
There were some moments where it felt like we were a bit of a lonely voice, but I think other folks are beginning to realize, and I think as the hyperscalers themselves have known, that having that network programmability is really essential.
Brian, we've talked about P4 in the past, but for folks who maybe haven't listened to the back catalog, should we give a little overview? Sure.
Robert, you'll do the honors. I'm happy to give my P4 spiel.
Sure, I'll see if I can do it justice. Effectively, the way I think about P4 is that it's basically a programming language that you can use to compile a program that operates on the NIC's data plane. And I think this is an important part, because for a lot of these things, the value is to actually run at line rate.
So you've got 100 gig, 200 gig, 400 gig, especially with all these 112-gig SerDes coming along. You basically can't treat that as a general-purpose program that's coming in, DMAing everything back to a normal core's memory, processing it, and sending it back out. Instead, this kind of lets it process the packets inline, in that hardware receive path.
And having that higher level of abstraction really allows you to kind of express something programmatically that they can then use hardware resources efficiently. Our big challenge has been working with vendors to give us a substrate upon which we can build a true P4 compiler. Honestly, the biggest challenge in that part of the ecosystem has been the proprietariness of the compilers.
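For listeners who haven't seen P4: the mental model Robert is describing is a pipeline of match-action tables that the compiler maps onto the switch or NIC hardware. The toy C sketch below is not P4 and not Oxide's x4c output; it's just meant to show the shape of the abstraction — look up a header field in a table, and the matching entry names the action to apply.

```c
/* Conceptual sketch of a match-action table -- the abstraction P4 exposes.
 * Real P4 programs compile to pipeline stages in the ASIC, not to C. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { ACT_FORWARD, ACT_DROP } action_t;

struct entry {
    uint32_t dst_ip;    /* exact-match key (real tables often use LPM/ternary) */
    action_t action;    /* which action to run on a hit */
    uint16_t out_port;  /* action parameter */
};

static const struct entry table[] = {
    { 0x0a000001, ACT_FORWARD, 7 },  /* 10.0.0.1 -> forward to port 7 */
    { 0x0a000002, ACT_DROP,    0 },  /* 10.0.0.2 -> drop */
};

static void apply(uint32_t dst_ip)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].dst_ip == dst_ip) {
            if (table[i].action == ACT_FORWARD)
                printf("0x%08x: forward to port %u\n", dst_ip, table[i].out_port);
            else
                printf("0x%08x: drop\n", dst_ip);
            return;
        }
    }
    printf("0x%08x: miss, default action\n", dst_ip);  /* table miss */
}

int main(void)
{
    apply(0x0a000001);
    apply(0x0a000003);
    return 0;
}
```

In actual P4, this would be a table with a key, a set of actions, and entries installed at runtime by the control plane; the point of the language is that the lookup and the action run in the hardware pipeline at line rate rather than on a general-purpose core.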
So, George, I'll tell you that one thing that will be a factor for us as we're looking at kind of a P4-based NIC is what we have been looking for is what is that kind of x86-like substrate going to be? Something that is a documented, committed ISA that we can write our P4 compiler against.
So what we are not looking for — because we are coming out of a bad relationship in this regard — is a proprietary compiler. We really want to... and we have written our own P4 compiler; we developed our own P4 compiler and have open-sourced it, x4c, for purposes of doing development, software development and testing and so on.
But we really want to take that and use it to actually compile for these parts — for the switch, for sure, and then ultimately the NIC. That would definitely be our vision for where it's going.
Yeah. And I know you guys, as were a lot of us, were disappointed with Intel discontinuing the Tofino line of switches.
George, I really appreciate your sensitivity of taking us into the kind of the grieving room and your bedside manner here is really exemplary. I was really feeling you kind of passing the tissues to us as you, as you really felt our loss. I really appreciate that.
Yes. But yeah. And one question I have with AMD is: does it make sense for them to make their own switch eventually?
I mean, are we in charge of AMD now? Because we got lots of ideas.
Actually, George, this is the pitch I've tried to make to them. Just because, to me, it's like: if you actually look at NVIDIA and what they've done with NVLink and basically buying Mellanox — at the end of the day, to really be able to deal with what they're doing with Ultra Ethernet, it feels like, you have a P4 engine. Yes. It's going to be a big change.
Take it from, you know, the NIC kind of two-port form factor to a switch ASIC, and kind of dealing with power. But I think that if you really want to do well in that space, you can't rely on just, like, hey, I'm going to convince Broadcom to let me pass XGMI through my switch.
Or whatever they're calling that Infinity Fabric transport these days.
It's now UA Link.
Yeah, exactly.
Which, ironically enough, Intel is now a part of.
Right. So yeah. So, so yeah, I think it's like, it's good that you have that consortium and you'll be able to push some stuff there. But I also feel like at the end of the day, you know, where you see a lot of value from Nvidia is that they are building, you know, where they've been successful because they have vertically integrated a whole lot of that stuff. Yes.
Yeah. And, you know, so yeah, I mean, absolutely, it would be great for them to do that. Although, that said, they need to do it in the right way. And the right way, from our perspective, is really establishing a substrate that people can build an open ecosystem on top of. And this is something that, you know,
I always find vexing: you would think, if you make hardware, it's enormously in your best interest to allow many, many software stacks to bloom by having a well-documented, committed interface. But they really don't. It's a challenge, I would say. I wouldn't say they don't — that's too reductive. I think that they fight their own instincts on it.
And so we're very excited with Xsight Labs. Again, you can see our announcement today — or their announcement today, actually, but it features Oxide for sure. And we definitely see eye to eye with them on their X2. We're looking forward to moving forward with that part. And we think that there should be... we want to see programmable networking everywhere.
We want to see this open ecosystem everywhere. On the note of the lowest levels of the platform that can be hard to get at: so, George, you may recall that we have no BIOS in our system. So there is no AGESA, there is no AMI BIOS. So when we say BIOS — buy-us, buy-us, am-I? Nailed it. Oh, thank you. We have lots of bias. Up and at them, up and at them. Uh, better.
Do you wake up your kids that way, Adam? As Rainier Wolfcastle, saying up and at them?
I'm not as good a parent as you in that regard.
Feels very loaded. I think my kids have definitely gotten sick of that particular Simpsons reference. Rainier Wolfcastle, no longer welcome in our abode. But we have no BIOS. And so, Robert, that lowest-level platform enablement has fallen to us. What are some of the differences in Turin from — or even from Genoa, but then from Milan?
Yeah, I think actually we've been talking about PCIe a bit. So I actually think one of the things that I find has been both fun and sometimes a little vexing, but is ultimately good for the platform, not always as fun for us in how the register numbers sort themselves out, is that they've actually increased the number of IOMS entries in there.
So basically, in the past, where you had a group of 32 PCIe lanes, which are basically two x16 cores, they were consolidated into one connection to the data fabric. Actually, one of the more interesting things is that we've seen that in Turin, each x16 group is connected to the data fabric independently, through its own kind of IOMS slash IOHC, which are, I guess, internal data fabric items.
Yeah, those are effectively hidden cores, right?
Or those are core-ish? I mean... Yeah, I don't know how much. I'm sure there's a Z80 hidden in everything, or an 8051. So I'm sure everything's a core at the end of the day. But actually, if you just kind of look at it, this part is less hidden. Because if you just look at, hey, show me the PCI devices on Turin, you'll see, hey, there's eight AMD root complexes where there used to be four.
There used to be four. Yeah, interesting. Yeah. And that is presumably, so you are, you're just increasing the parallelism there. And I mean, is that the.
Yeah, that would be my theory is that basically it's getting you more because there's more data fabric ports that you can have just more transactions in flight to different groups of devices.
And so but but those kind of changes, which if we were at some level of software, that's an implementation detail you don't need to see. But at the level of software we're at, like you actually need to go accommodate those differences.
Yeah, yeah, that's definitely it. Otherwise — Milan to Genoa was more; there are more changes from Milan to Genoa than from Genoa to Turin. Interesting. In kind of some of the lower-level stuff. Some of these bits, like how you do PCIe initialization and hot plug, have stayed more the same. From Genoa. From kind of Genoa to Turin. Yeah. They have some different firmware blobs that you talk to. So, like...
The SMU interfaces stay the same across these, but they moved to a new what they call MPIO framework, which is what goes and programs the DXIO crossbar. PCIe device training is kind of a collaborative effort between that core and X86 cores.
And could you describe what device training is? Like, why does a link need to be trained? What is that?
Yeah. So there are two different pieces here. First off, when AMD sells these, in all their materials they say, hey, we've got 128 PCIe lanes, which is great. But the first thing you have to figure out is, well, actually, how do those work on the board? Are these x16 slots? Are they x4 slots that are actually connected to an SSD?
What's their size and width, and how do they actually fit across the board? So one of the first things that everyone has to do is kind of tell AMD's firmware, hey, here's how this is actually connected — these logical lanes, these physical PHYs. You know, I've got an x16 slot; in our case, we've got ten x4 slots, basically one for every front-facing
U.2, right, and an x16 slot for a NIC. You'll have other things for other folks. Or if you have a board like the one that showed up in the chat, you've got some number of x16 slots that map to things, some probably M.2 slot. So you have to tell it what all is there.
So it can basically go and reprogram the internal crossbar to say, okay, these lanes should be PCIe. You know, George mentioned earlier that when you have a two-processor configuration, some of those lanes are being used for that. So that's part of it. If you use SATA, which is, you know... getting less and less common.
Yeah, that's right.
You know, some of those lanes come from those same PCIe lanes. So that's the first step; you're kind of doing that. Then the next phase, after you've done that, is: if you open up the PCIe base specification, in like chapter two there is a very long state machine. It has a lot of different states and a lot of different phases.
Obviously, what does it mean to basically have a PCIe device end up at the other end and have both sides be able to talk? Um, and so device training is basically going through that process, discovering, is there even a device there? Right. And from there trying to say, okay, let's start talking to one another, figure out how we can talk, then what speed we can talk at.
Once we're good at a certain speed, then they'll increase to additional speeds, sending these things called ordered sets and training sequences and lots of different acronyms that you hope generally just work and you don't need to think about. And then, unfortunately, sometimes you do need to think about them.
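As a rough picture of what "training" walks through (heavily simplified; the real link training and status state machine in the PCIe base spec has many more states and substates than this):

```c
/* Heavily simplified sketch of the PCIe link-training happy path.
 * The real LTSSM has many more states, substates, and timeout arcs. */
enum ltssm_state {
    LTSSM_DETECT,        /* is a receiver electrically present on the lanes?  */
    LTSSM_POLLING,       /* exchange training sequences, get bit/symbol lock  */
    LTSSM_CONFIGURATION, /* negotiate link width, lane numbering, link number */
    LTSSM_L0,            /* link up: normal traffic at the agreed speed       */
    LTSSM_RECOVERY,      /* re-train from L0, e.g. to move to a higher speed  */
};
```

The "start slow, then step up" behavior Robert describes is the L0-to-Recovery-and-back loop: links come up at the lowest data rate and then retrain through Recovery to reach higher speeds.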
Right, when they misbehave. So there's a lot of low-level work that we need to go do. And how do we — I mean, Eric and Nathanael and crew are working on Cosmo as we speak, kind of finishing up Cosmo — how do we work on that before we have our own board in hand?
Yeah, so the main way we do this is that there are often reference platforms. So, you know, if you look at George's article, all of his testing was done on a volcano platform, which is the name of a platform that AMD developed that was specific for Turin. There's a couple older generations.
Is it too much to hope that someone has a sense of humor that named that thing volcano?
Honestly, it would not surprise me if somebody did have a sense of humor.
I mean, you had Volcano, you had Purico. I'm trying to remember what the other two were. The ones before that were all minerals.
You had, like, Onyx. Yeah, no sense of humor. I just like the idea that, like, Inferno, you're going with the... So, yeah, so we got the Volcano reference platform from AMD.
Yeah, so we are doing ours mostly on a bunch of Ruby platforms, which were the ones that first came out for Zen 4. And so we have those, which gives us generally most of the schematics and other bits there. You generally get most of the firmware, but not all of it. So you can't always do all the things on the board that you think you should, like a reference platform that you do.
But that gives us a development platform. So we're fortunate that we were able to get some early silicon from AMD, so we could actually start doing development of that ahead of launch. Right.
And then, Nathanael, do you want to talk about how we use those dev platforms? Because we've got a great little board there.
Yeah. So notably, we were able to take the BMC out. On Ruby, AMD made this BMC board called a Hawaii board that has an ASPEED BMC and kind of all your traditional BMC stuff, but it's on an OCP card, so we can pull that out. And so we developed our own OCP-form-factor card, and we call that Grapefruit.
And that goes into that BMC slot and connects to the OCP connector, but it has our SP and RoT and a Xilinx FPGA there, and some flash, and, you know, kind of basic I3C level translators, that kind of thing. So we can kind of hotwire into the Ruby dev platform with what is effectively our BMC topology and do some development there.
And so that's kind of... I've been working on Grapefruit a lot the last few months, I guess, doing some of the FPGA work and trying to get the thing integrated so that Hubris runs on it, and our Ethernet stack runs on it, and all of that. And that's all going pretty smoothly. And we're getting close to being able to use our Grapefruit board as an eSPI boot target.
So I don't know that we have talked a lot about eSPI boot.
Yeah, we should talk about eSPI, because this is definitely a difference in Turin.
Yeah, so Turin supports... you can do the standard SPI NOR boot, like all of these devices have done for a while. But they added eSPI boot, which is based off an Intel standard. And so it's an extension to eSPI, which is kind of like an LPC replacement. But it's an extension to that that allows you to have...
what they call slave-attached storage, or slave-attached flash storage, something like that. And that allows you to boot over the eSPI bus, and it's basically built exactly for the server use case. So, you know, you have a BMC or some kind of device sitting there, and the flash is hiding behind that. And so you talk eSPI to that device, and then it goes off and fetches the flash and does whatever it needs to do, and...
you know, out of SPI NOR, or I guess you could do it off of NVMe or something if you'd like, and you just feed it the bytes that it requested back over the eSPI interface. And that's how we're planning... So why did we need to enhance SPI to do that?
Why can't we do it on SPI?
Well, I mean, you can talk SPI to a NOR flash, but that's basically all you can do. And the eSPI protocol kind of sits on top of what looks like a fairly standard SPI, like a quad SPI interface. But it allows you to go request transactions, and then you just wait until the device goes and gets them. So, you know, flash is notoriously slow.
And so if you are the only device talking to, like, a quad SPI flash and you ask it for something, or you ask it to go do an erase or whatever, you basically just have to hang out and spin your tires until it finishes and wants to give you that data back. And over the eSPI interface, they do posted and non-posted transactions.
So you can do these non-posted transactions and say, hey, I want a kilobyte of flash from this address. And you send that message. And then you can continue on using eSPI, talking to the device, doing other things, while whatever the eSPI target is goes off and does the work to get you your kilobyte of flash. And then it'll let you know with an alert that it has data for you.
And you can come by and fetch it when you want.
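To make that posted/non-posted flow concrete, here's a minimal sketch in Rust with hypothetical types and method names; it's not a real eSPI driver, just the shape of the exchange as described above:

```rust
// Minimal sketch of the eSPI flash-access flow described above.
// All types and methods here are hypothetical, not Oxide's actual code.

struct FlashReadRequest {
    address: u32,
    length: u16, // e.g. 1 KiB
}

enum TargetEvent {
    CompletionReady, // target raises its alert: requested data is ready
    Other,           // other eSPI traffic (virtual wires, OOB messages, ...)
}

trait EspiController {
    fn send_flash_read(&mut self, address: u32, length: u16); // non-posted: does not stall the bus
    fn poll(&mut self) -> TargetEvent;
    fn read_completion(&mut self) -> Vec<u8>;
    fn service_other_channels(&mut self);
}

fn boot_flash_read(espi: &mut impl EspiController, req: FlashReadRequest) -> Vec<u8> {
    // 1. Ask for the bytes; the target (SP/FPGA) goes off to its backing
    //    store (SPI NOR, or conceivably NVMe) while the link stays usable.
    espi.send_flash_read(req.address, req.length);
    loop {
        match espi.poll() {
            // 2. The target signals that the completion is ready...
            TargetEvent::CompletionReady => return espi.read_completion(),
            // 3. ...and in the meantime we can keep servicing other traffic.
            TargetEvent::Other => espi.service_other_channels(),
        }
    }
}
```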
That's right. And SPI, as my kids would say, SPI has no chill. SPI, you need to give it what it needs; you're on the clock. There's no clock stretching in SPI. And so SPI interposition becomes a real nightmare, because everything you need to get done, you need to get done in that one clock cycle. It's like, yeah, that's really hard.
Right. And on Gimlet, what we did is we put actual analog SPI muxes in so we could flip between our A-boot and B-boot flash images. So we'd actually just swap between chips that way. With eSPI, none of these images are very large, so you end up buying a commodity flash part with, say, a gigabit of flash storage. And you only need 32 megabytes or something like that.
You know, you need these small, like, PSP images there. So with eSPI, we can also go down to one flash part. And then the FPGA that's acting as the eSPI target will just translate, you know, into the high or low pages, basically, of the flash. But the AMD side doesn't really have to know the difference. So that makes things a little simpler on our end.
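A minimal sketch of that translation, with made-up sizes and names (not the actual Cosmo or Grapefruit logic):

```rust
// Sketch of the A/B trick as described: the eSPI target maps the host's
// flash addresses into the low or high half of one commodity flash part.
// Sizes and names are illustrative only.

const IMAGE_SIZE: u32 = 32 * 1024 * 1024; // what the host believes it has

#[derive(Clone, Copy)]
enum BootSlot {
    A,
    B,
}

/// The SP5 asks for `host_addr`; the eSPI target picks the physical offset
/// in the single flash part. The host never knows which slot it booted from.
fn translate(slot: BootSlot, host_addr: u32) -> u32 {
    let base = match slot {
        BootSlot::A => 0,
        BootSlot::B => IMAGE_SIZE,
    };
    base + (host_addr % IMAGE_SIZE)
}
```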
Yeah, and the other nice bit there is that as you get into DDR5, one of the big problems is training time. Yes. So one of the things that AMD has is that after you train the first time, you actually end up writing back a bunch of this data into that SPI flash. Yeah.
And without that virtualization, if you're trying to kind of hash or figure out, like, how do I make sure these contents are all what I thought I wrote down, it just gets gnarlier.
And this kind of indirection layer, you know, computer science's one contribution is adding another layer of indirection, it just comes in handy.
Yeah, and just on the training times: DIMM training, where you're searching for the constants that are going to allow you to not have interference when you're talking to these DIMMs, that search can take a long time. And Robert, how long did it take you with that first Genoa that you got...
Yeah, the first thing I had, which admittedly was early, A0 silicon, and this is no shade on AMD, but I think it was one DIMM of DDR5, not that big. It definitely felt like minutes. It was 11 minutes, I believe, was the number. Was it really?
Yeah, it was. I've put it out of my brain. Yeah, George, I don't know if you've seen some long boot times, but DDR5 takes a long time to train.
So I know when I was first booting my 9950X system, I put in the DIMMs, I turned it on, I went and grabbed a cup of coffee, came back,
noticed it was still booting. It was like, okay, let me go feed my cats. Fed my cats, came back, it was still booting. It's like, okay, let me go run to the bathroom. Come back, still booting. I'm like, is it actually booting? Like, what's taking it? And I was about to turn it off when I saw in the corner of my eye that my monitor flashed up. I'm like, okay, it finally booted. It's alive.
Yeah, I'm like, okay, thank God it's actually done. And I did screw something up, because I had an update to the BIOS, which is always a bit of a nerve-wracking experience when you have to do it while just watching a flashing light go. And you're like, is it done? Are you done? Did you work? I hope it worked. And then you turn it on and you're just like, okay, hopefully this works.
Adam apparently really prefers your pronunciation, George, to mine.
Yeah, that was delightful. Suffering through BIOS, right?
All right, I'll do my Duolingo with George on there.
I will say, one of the other major reasons for using eSPI was that it gains us back our second UART channel, which we lost in Turin.
Because we like hardware handshaking, and the second UART in Turin doesn't have hardware handshaking. And so we're planning to do our IPCC protocol between the SP and the SP5, the Turin processor, over eSPI as well. So that'll be a multiplexed path, and that was something that we had to solve regardless of eSPI boot.
As a former colleague of mine who retired was fond of saying: why do we have pins for Azalia on SP5, and they couldn't give me two pins to have a second flow-controlled UART?
It was, yeah. And I'm not sure, you know, maybe we're a bit unusual on this, but boy, were we unusual. So, George, as Nathanael just mentioned, we were using one of the UARTs as the IPCC, which is the inter-processor communication channel.
But this is our protocol for the socket, the host OS, to be able to speak to the SP, which is our replacement for the BMC.
And we were specifically looking for something that we could use that didn't require PCIe training, or very much from the peripheral space, which kind of keeps us stuck in the FCH. Even USB obviously requires a lot of bring-up and shenanigans, so that was out of the picture. So we ended up with the UART, and actually the AMD UARTs can go up to three megabaud.
Which is more than... well, okay, so they actually can't go up to three megabaud by default. We actually needed... well, the RS-232 level translator, to go from the 3.3 volts to, I don't know, minus 12, plus 12, could not do three megabaud.
But the three megabaud ended up being very load bearing for us, because when we were doing DIMM training and DIMM margining, the PSP is spewing output. It was just happily going at 115,200. Yes. And there was no token to change it to anything. Yeah. That's for real.
It was not at three megabaud, so it was very, very slow.
It was very slow. And our friends at AMD fortunately got us a fix to the PSP to operate at three megabaud, and that was life changing. I mean, a 30x makes a big difference. When it's 30 minutes, that 30x is a real, actionable, human 30x. Not all 30xs are the same.
And when something takes 30 minutes, taking 30x off of that is a big deal.
So the good news is, with eSPI, I think we can get significantly faster than three meg too. So it'll be interesting to see what that looks like in practice. eSPI is a little weird: it's simplex, so you can only transmit in one direction at any one time. But you can do quad at 66 megahertz, so we should be able to get something a little bit faster than three meg, I think. And that is... yeah, sorry, George, go ahead.
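For a rough sense of scale on the link rates being discussed, here's some back-of-the-envelope math (raw ceilings only: 8N1 UART framing is assumed, and eSPI protocol overhead, turnaround, and link sharing are ignored, so real throughput will be lower):

```rust
// Back-of-the-envelope rates for the links discussed above. These are
// raw ceilings, not measured numbers.
fn main() {
    let uart_slow = 115_200.0 / 10.0;         // 8N1: ~10 bits/byte -> ~11.5 KB/s (the PSP default)
    let uart_fast = 3_000_000.0 / 10.0;       // ~300 KB/s: the 3 megabaud fix (roughly the 30x discussed)
    let espi_quad = 4.0 * 66_000_000.0 / 8.0; // quad eSPI at 66 MHz -> ~33 MB/s raw, half-duplex and shared

    println!("115.2 kbaud UART : {:>12.0} bytes/s", uart_slow);
    println!("3 Mbaud UART     : {:>12.0} bytes/s", uart_fast);
    println!("quad eSPI, 66 MHz: {:>12.0} bytes/s (raw ceiling)", espi_quad);
}
```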
It's really funny that you guys are talking about like three megabaud and whatnot, because, so slight tangent. So I used to work, back when I was in college, I used to work at the on-campus observatory. And there was a data uplink from the observatory to the lab, which was about a mile and a half distance. It was still running 800 baud for some serial connection.
Oof, 800 baud.
Yes, yes.
You could practically run a message up there faster.
Yeah, but mind you, this was just a, all it was was basically just the go signal to start the power up for everything.
Yeah, the go signal still takes several minutes to transmit.
Yeah, so basically you would send the message and then you would either walk or drive up, and by the time you got there, it was done. And then the way that you had to connect to all of it was through a BBS.
I'm not even joking. This sounds like a dream that I would have that I would describe to Adam.
The system was built in like the 1980s. It was not updated until 2020. Yeah. Wow. Yeah.
I'm sure. I would like to believe that original designers and Oxide and Friends listeners would be like, oh God, that was still in use? That was supposed to be for a weekend. That was not supposed to be. That was a temporary fix. Totally.
Oh no, this was no temporary fix. It was designed like that. The... So...
And so Nathanael, maybe worth elaborating a little bit on why the three megabaud is so actionable for us, beyond just the margining and the MBIST results. Because this is our conduit for the SP to talk to the host CPU, we use this in the recovery path. So if you've got a system that can't talk to anything else, it's going to load its image
via that link, and being able to go faster than three megabaud is going to be really, really nice.
Right, yeah. I mean, that's kind of the big thing. I think the big hope here is that for Cosmo, when we do recovery, we could potentially use... you know, I'm hoping to get it somewhere up in the 12 megabaud range, but it's going to depend on how busy we are doing other things on that link too, because it's a shared resource.
When we talk about recovery, think about, like, DFUing your phone or whatever. We use this during the manufacturing process. So if a server has kind of gone out to lunch in some way, or we just want to wipe it clean, we're using this mechanism, going at three megabaud.
Yeah, and I think we're replacing the SPI NOR image, basically, over three megabaud. So it's, you know, slow. Yeah.
It takes a minute, as the kids say. But we're actually doing two different things, Nathanael.
So we first are writing the SPI NOR, which actually goes much faster.
Yes.
That part is quick. But then, instead of sending the full M.2 image that we would boot from, which would be like a gig and basically be, you know, an eternity in that world, we have a slimmed down, basically kind of phase-two image. So unlike a traditional BIOS, where, you know, the BIOS is in your SPI flash.
It sits there and then kind of goes and pretends to reset the world back to 1970 after waking up and changing everything and, you know, turning on all the CPUs so it can turn them off again. We basically have a continuous operating system image instead. We just kind of say, hey, you find half your RAM disk, half your modules, somewhere else.
So when we end up doing the recovery, we end up sending kind of a slimmed down... just, you know, a measly hundred megabytes over this small link.
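Rough transfer-time math for that recovery image (a sketch assuming a ~100 MiB phase-two image, 8N1 framing, and no protocol overhead, so real recoveries will take somewhat longer):

```rust
// Rough transfer-time estimate for the ~100 MB recovery image
// described above; assumptions are simplistic by design.
fn main() {
    let image_bytes = 100.0 * 1024.0 * 1024.0;
    for (label, baud) in [("3 Mbaud", 3_000_000.0_f64), ("12 Mbaud (hoped for)", 12_000_000.0)] {
        let bytes_per_sec = baud / 10.0; // 8N1: ~10 bits on the wire per byte
        let secs = image_bytes / bytes_per_sec;
        println!("{label:>20}: ~{secs:.0} s (~{:.1} min)", secs / 60.0);
    }
}
```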
And actually, so George, in all honesty, part of the rationale for this is to get us out of those moments of terror when you are flashing a BIOS and often have no recourse if that goes sideways. And so this gets us out of that, because we know that, at the absolute lowest layers of the system, we can get the system to be able to boot. And we,
it gives us much more control over the reliability of the system, the upgradability of the system, the manageability of the system. That's how we're able to have an Oxide rack arrive, power on, get going, and provision VMs in minutes instead of days, weeks, months, whatever.
Speaking of sort of... Again, sort of a question to you guys, because this is stuff that I... I know a lot more about CPUs and GPUs than I do sort of the networking and the sort of lower level intricacies of all this.
Well, I feel like we're the sewer people, and you, you know, you've got this glorious palace in terms of the cores that have been built. And meanwhile, the sewer people are happy about not being at three megabaud. Like, what's going on? No, it's a big deal down here in the sewer.
Yeah. But so, I asked you this back when I was in San Francisco meeting you guys in person: what do you think of sort of the updates to openSIL, and how that's been going to get rid of AGESA?
Yeah, so we are all in favor. And actually it was funny, because I first heard the code name Turin when it was accidentally blurted out on one of those openSIL calls. I'm like, okay, what is Turin? And I remember asking Robert, like, that's a city in Italy, so it must be the next thing, but we hadn't heard of it yet.
And openSIL was going to intersect with Turin, which of course, when we were first hearing that, it's like, oh my God, that just feels like Buck Rogers. It's like, in the year 2041... But of course, now Turin is here. And that work we are very, very supportive of. We are not actually using any of it, because it's a different model.
It's still a traditional model of a bootloader that's going to effectively make the system look like it's gone backwards, or send the system backwards, to boot a host operating system. And we've got this staged approach where we are running a single operating system the entire time.
So it doesn't fit our model, but we're extremely supportive of it, because we believe that we want these lowest levels of the system to be completely documented. And we want there to be room for many different approaches. And so I think that we're very supportive of openSIL in that regard.
Yeah, because I know when we last talked about openSIL, it was very much in sort of the initial stage of being ramped up and what was happening with it. It does seem like AMD is adopting more open standards, because they also announced Caliptra, which is the open source root of trust stuff.
Yep. Yeah, so we're excited to see how all that starts to change. I mean, there's a recent... I dropped a link in there. I think they did this at OSFC. They talked about how they're going to have openSIL kind of be the mainstay more so for Venice. Yes. And so I think, you know, from our perspective, this is all good. It kind of gets us out there. We can start to point to things that...
you know, are in openSIL, and it ends up being a win for everyone. So I think basically we're excited to see it. Parts of it we may be able to leverage directly, but if not, you know, we can be inspired by it, they can be inspired by us, and vice versa.
Yeah, and it's very nice to be able to go compare notes, especially when things aren't working. It's helpful to have multiple implementations out there. And I also think that the model for AGESA, the programming model, makes it very difficult to reason about the overall system. This is where Robert's eyes are going to start to twitch, because...
Robert spent a lot of time in the absence of documentation having to really understand what this code was doing.
I mean, I still think the best bit for me is on SP3, where the SMU, to do hot plug, speaks over I2C. And I'm pretty sure the SMU itself does not reset the I2C peripheral to run at a hundred kilohertz.
When the I2C peripheral resets, it starts in basically fast mode plus, which is this weird push-pull mode at, like, 1 megahertz, which basically means it doesn't work. Yeah. And there was definitely no explicit initialization.
It's just that, hey, it depended on this DXE module, which probably just did a generic blanket I2C initialization, which reset everything to 100 kilohertz.
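The general lesson from that story, sketched with entirely made-up types (this is not AMD's code, just the shape of the fix you'd want), is that the hot-plug path should set the bus speed it needs explicitly instead of inheriting it from an unrelated module:

```rust
// Sketch only: if the hot-plug path needs the I2C controller at 100 kHz,
// say so explicitly rather than relying on whatever some unrelated
// module happened to leave behind.

#[derive(Clone, Copy, PartialEq)]
enum I2cSpeed {
    Standard100k,   // what the hot-plug devices can actually talk
    FastModePlus1M, // the controller's reset default in this story
}

struct I2cController {
    speed: I2cSpeed,
}

impl I2cController {
    /// Explicit initialization: no dependence on a DXE module that may
    /// or may not have run before us.
    fn init_for_hotplug(&mut self) {
        self.speed = I2cSpeed::Standard100k;
    }
}
```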
Well, and that's it. I mean, this is why it's so healthy to have different software ecosystems on the same hardware, because you don't want things to be working by accident. You want things to be well-documented, with well-defined abstractions. And failing that, it's good to have the software out there.
So it's, no, it's been, George, it's been good. And we're, I think, excited to continue to see that.
Some of the other, like, lowest level differences on Turin: you mentioned the DIMMs per channel, and we kind of had a fork in the road in front of us in terms of two DIMMs per channel, 2DPC, versus one, 1DPC. And there's a trade-off there to be made. And Robert, what was the... I mean...
Yeah, so the big... I think to help understand making the trade-off for DDR5, you kind of have to go back to DDR4. So when you have two DIMMs per channel, the way it works (and if you go back even further in time, you'll actually find platforms with three DIMMs per channel) is that you basically are daisy chaining the channel.
So the traces will literally go up to the first DIMM, then continue on to the second DIMM, or to the third DIMM on those platforms. So just the presence of that second slot, of having two DIMMs on there, sometimes changes the SI. In DDR4, it often didn't. So if you only had one DIMM populated, you could still get the maximum memory speed possible.
However, in DDR5, just the presence of two DIMM slots per channel dramatically drops the maximum speed you can hit. And then if you actually make the mistake of populating it, that drops the speed further.
Well, not just populating it, the fact of having that second slot, right? So for Turin, it's 6,000, up to 6,400 with validation, but 6,000 with 1DPC; 4,400 with 2DPC; and then 5,200 if you're running 1DPC in a 2DPC board. So just from having that second slot, you're losing a whole bunch of your memory clocks.
Yeah, then for us, the other big change is that from SP3 to SP5, you went from 8 channels to 12 channels.
Yeah.
And so for us, since we kind of have this half-width system... I don't know, Eric, do you remember what the width is on that?
The PCB is 10 inches wide. Yeah, 300 millimeters-ish.
So basically, we were in a place where you could fit 16 DIMM slots, but you weren't going to make 24 DIMM slots magically appear in the space of 16, not unless you got very creative. So we ended up saying, okay, between that and the fact that you now had 96 gig and 128 gig RDIMMs without going to 3DS, which means you can actually purchase them and pay for them without a lot of blood money, then...
Or really, you're not fighting against the GPUs and HBM for them, which means you can actually get them. Then that kind of put us down to: okay, you want the memory bandwidth, that's definitely one of the big values here. And the memory latency for a number of applications can definitely matter. And you can still get to capacity in other ways.
So we ended up going with a 12 channel, 1DPC kind of configuration. Because we looked at saying, okay, was that better? The other option was, hey, eight channel, 2DPC. And that just seemed like the worst of all worlds.
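To put rough numbers on that comparison, here's a simple peak-bandwidth sketch. It treats each DDR5 channel as eight bytes of data per transfer and ignores ECC, real-world efficiency, and the validation caveats mentioned above, so it's only meant to size the trade-off, not to quote spec numbers:

```rust
// Theoretical peak DRAM bandwidth for the populations being weighed.
// Simplification: channels * MT/s * 8 bytes, nothing else.
fn peak_gb_s(channels: u32, mt_s: u32) -> f64 {
    channels as f64 * mt_s as f64 * 8.0 / 1000.0 // MT/s * 8 B = MB/s -> GB/s
}

fn main() {
    println!("12 ch, 1DPC @ 6000 MT/s: {:>6.1} GB/s", peak_gb_s(12, 6000));
    println!("12 ch, 2DPC @ 4400 MT/s: {:>6.1} GB/s", peak_gb_s(12, 4400));
    println!(" 8 ch, 2DPC @ 4400 MT/s: {:>6.1} GB/s", peak_gb_s(8, 4400));
}
```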
Yeah, I think that the 12 channel, 1DPC move is probably the right move. I do like that AMD is giving the option for a 2DPC setup with all 12 channels, and I could definitely see how people would really want that. But especially if in the future we go to, say, 16 memory channels, there's no way you're doing 2DPC on that. Right. Yeah. Right? We're going to have to go to 1DPC.
Now, stuff like MRDIMMs can help with capacity and bring back that sort of capacity that two DIMMs per channel would get you. But yeah, I think the writing has been on the wall for 2DPC for a long time now.
And I think just in general, when we had a trade-off where we'd have to give up memory latency, we have always felt that memory latency is really important. You want to minimize memory latency, and you don't want to take a hit there.
Well, in the DDR4 world, where you went from 3200 to 2933, that was a very easy cost to pay. If you were telling me to go from 6400 to 6000...
You could probably convince yourself that that is actually worthwhile. But, you know, 5,200, 4,400, that's a lot farther from 6,000. It's a big, yeah, right, it's a big, big chunk to take out. And in terms of MRDIMMs, because this is a domain where, you know, Intel is still basically... I mean, are they pretty standard on MRDIMMs?
Let me get on my soapbox for about 30 seconds.
Because Intel's MRDIMMs on Granite Rapids are not the JEDEC MRDIMMs. They are different. It's essentially just MCR DIMMs relabeled, which made me tear my hair out, because they're not technically compatible standards.
Yes.
So I wanted to scream and shout and let it all out, as the famous song goes, because that was utterly infuriating to me. You're saying you have MRDIMMs, but they're not really MRDIMMs per the JEDEC spec.
So, you know, George, what I love about Oxide and Friends is that when it comes to the soapbox of DIMMs not being per the JEDEC standard, there's actually a line for that soapbox here at Oxide and Friends. Robert, you know, you've been on that soapbox.
And yeah, it's frustrating. But MRDIMMs, I think, when they are the JEDEC standard, right, because they will be...
Yeah, the MR part of it is there. I think then the question, you know, as that slowly enters the market, and memory controller support, and, you know, seeing the costs, assuming you can get the cost not to be ridiculous, because volume is definitely one of the big parts of the DRAM business...
But, you know, for us, because we have a platform that's not trying to scrunch everything into a 1U, a taller DIMM just means a new thermoformed airflow shroud. Right. And that's pretty easy to go fit in. You know, for us, the added height is not a problem. For other platforms and other chassis, you could be kind of SOL.
Yeah, well, I think that, you know, one of our kind of big revelations, and again, this is not original to us, I think the other hyperscalers have done this as well, is that...
the way to have maximal density is not necessarily to have maximal physical density. You want to open up some room for airflow, and you can actually get higher density by using a little bit more space, you know, where the rack is nine feet tall. And so we're trying to use some of that space to get higher density.
Actually, can you just talk about back-drilled vias for a second, Eric, just because you mentioned it in the chat. I don't know, George, do you know about back-drilled vias? This is truly amazing stuff.
I don't.
Yeah. Oh, yeah.
Yeah.
I'm sure somebody will put a link to the definition in the chat as well. But basically, whenever you design a circuit board: a circuit board is essentially a set of two-dimensional layers that are interconnected in a third dimension. So it's kind of like two and a half D. And so to interconnect between these layers, you have what are called vias.
And these vias are, in their most fundamental form, just a tiny hole drilled in the board that's plated with copper that connects multiple layers together. Unfortunately, when you get server motherboards and these bigger, higher density things, you have to use more layers and they get thicker.
And it turns out that when you run high enough speeds, the length of that via from the top of the board to the bottom matters. And so if you have a signal from the very top of the board going through a via to the very bottom of the board, you can make that via look kind of like a trace, like a wire to the signal, and it won't really notice it.
However, if you go, like, from the top layer to one routing layer down, which is like layer three, skipping a ground layer, going from one to three: great, fine. But then you have this via barrel, this wire, essentially, hanging off this trace, that goes from layer three all the way down to the bottom. And that piece of wire looks a whole lot like a capacitor, if I remember my RF right.
And so it basically creates a stub that causes a resonance. And when it resonates, it sucks all the energy out of your signal. And it turns out that things like DDR5 have high enough frequencies in them to require backdrilling. And to give you an idea of when you need to do backdrilling: I've designed boards that run, like, 10 gigabit lanes, so 10 gig on a single lane.
That's a normal, you know, two millimeters thick kind of thing and didn't need backdrilling on it. And two millimeters thick is fairly standard for a server motherboard. 1.6 is like a commodity PCB. When you run like 28 gig, you have to back drill it. And certainly when you run higher than 28 gig, you have to back drill. So PCIe is 32 gig in Gen 5 now, so that has to be back drilled.
But what's crazy is even DDR running at six gig per lane, double data rate, whatever, the frequency isn't that high, but the frequency content is high enough. And because it's single-ended, those vias start adding up, especially when you're routing in the top layers. And so we now have to backdrill a whole ton more vias than we used to.
It used to be just those super high speed lanes like PCIe and, you know, 100 gig Ethernet and stuff like that had to be back drilled. But now you're doing like thousands of these things. And it turns out back drilling a via is really hard. So what they do is they take and remove that stub that's left over by shoving a bigger drill bit in from the other side.
So they literally drill it out twice, once from the top, plate it, and then put it back in the drill, and then drill it again from the bottom. And what's absolutely crazy about this is getting those two drill holes aligned to within like a thousandth of an inch or 25 microns. And they have to do that because otherwise they'll short things out on the rest of our board.
And that takes a fabricator with very high skill.
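For intuition on why that stub matters, here's a rule-of-thumb quarter-wave estimate of where a via stub resonates. It's a gross simplification, real vias are 3D structures and the field solvers Eric describes next get the final word, but it shows why thick boards plus fast signals force backdrilling:

```rust
// Quarter-wave stub resonance rule of thumb: f ~ c / (4 * L * sqrt(Er)).
// This is only an approximation; actual via behavior needs a 3D solver.
fn stub_resonance_ghz(stub_len_mm: f64, dielectric_er: f64) -> f64 {
    let c_mm_per_ns = 299.79; // speed of light in mm/ns, so the result is in GHz
    c_mm_per_ns / (4.0 * stub_len_mm * dielectric_er.sqrt())
}

fn main() {
    // Stubs left in a roughly 2 mm-thick board, with an FR-4-ish Er of about 4:
    for stub_mm in [0.5, 1.0, 1.8] {
        println!(
            "{stub_mm:.1} mm stub -> resonance around {:.0} GHz",
            stub_resonance_ghz(stub_mm, 4.0)
        );
    }
}
```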
I just love the fact that we're going to take a drill to the underside of the board. It does feel like an Adam Leventhal, PCB engineer, kind of approach to this: like, here, hold on, pass me the drill, I'll take this. Exactly. I need a drill and a running start, and we'll take care of it. Yeah, I'll fix your signal integrity issue with my drill. I'll fix it real good with this here drill.
But it is total precision, and it is 25 microns. Just amazing, Eric, that this is possible. And then we've got simulation tools. I mean, how do we kind of figure out where this needs to be done? And this is...
Yeah, so we use both ANSYS and ADS. ANSYS is our full 3D, full wave solver, and then ADS is used for most everything else. But basically, we take the board geometry or even just a theoretical VIA and put it into something like ANSYS HFSS, and we can simulate what the effect of that VIA design will be on our overall channel.
And you can do that by basically creating a fake channel in ADS and then putting the extracted performance of your vias into that tool. And it'll tell you, versus a nice, perfect transmission line, how good it is. And your goal is to get that via structure as close to a perfect transmission line as possible, so that essentially the signal doesn't see it.
It doesn't notice a difference when it goes through a via.
And yeah, that takes a lot of time and a lot of simulations, tweaking vias around by thousandths of an inch here and there.
I was going to ask: these F-series parts may actually be relevant for you, Eric, you and Tom.
Yes, yeah, exactly. On the ANSYS side of things, it's a per-core license, and I think we have a license for, like, 10 cores or something. So, like, on my personal machine, when I run ANSYS, I've got the...
I can't remember the part number offhand, but it's the 12-core variant of the AM5, the Zen, I think it's the Zen 4.
So the 7900.
Yeah, the 7900. So it's not the 3D cache one, but it's just the normal one. But that one will boost up to over 5 gigs. So idling right now, I'm at 5.2 gig. And that turns out to be really helpful when you're running... When you're running these simulations in ANSYS that are insanely single-threaded.
The 9175F is a 16-core Turin part up to 5 gigahertz. But it's 16 CCDs with one core per CCD. It's designed for EDA. It is. Absolutely.
And I just love the fact that you've got to think, like, you know, who wants that thing in the SKU stack? It's like, oh, the engineer that actually, like...
Exactly. It's all the EDA folks. It's all the EDA folks who are like.
Put this thing under his desk.
Exactly. So we got to get that SKU for Eric and Tom and the other folks that are running these simulations.
I've got one of those monster SKUs sitting around, one of the 500-watters. I'm like, yeah, that's cool, it'll pull a lot of power. I want the five gig one, man.
Yeah, we got to get that for you. Not a cheap part, by the way, but still. No. I remember we had the ANSYS folks, along with Tom, on talking about our use of simulation, which was another great episode. I really enjoyed talking to those folks. And you just learn about the physicality of this stuff, it just...
blows me away. And, like, don't you feel bad that we end up running our dumb software on top of this stuff at the end of the day? I just feel like we're kind of... Yeah, seriously. Well, I've been taking all these gigahertz for granted too, and just the level of complexity underpinning this is bananas. It's like, all of this just for the thing to run PHP? It's a good look. Yes. Sorry. Yes.
Well, I can serve up cat videos on YouTube faster.
Totally, totally. But it's just amazing. And this part is a great part. You know, I think that we're, you know, I think, George, we were really excited to see, I mean, obviously your in-depth review was terrific. But I mean, I think, George, from your perspective, like this is a part that has really hit the mark in kind of like every dimension it feels like.
Yeah, so the 9175F, I think, for any EDA workloads, is sort of the Turin part for that. Then you have the 9575F, which to me feels like the drop-in replacement for all the OEMs for Genoa. You just take out all your Genoa chips and you put that in, and it's just...
You get better ST, so single thread and sort of low thread count workload performance, compared to the 9654, which is the top end Genoa SKU, but basically just as good multi-thread performance while pulling similar power numbers. So to me, that feels like the drop-in replacement. And then the big boys, the 9965 and the 9755.
Those are the top end, big performance parts that the hyperscalers and all the people who can use that power will grab. Right.
And another thing that we're looking at is partnering with Murata for those folks that actually do want to go more than 15 kW for the rack, which was our original design target, but which felt very aggressive in 2019. But then, you know, it feels like Nvidia is like, that's like two GPUs now for you. Yeah.
I can get that in for you. Exactly. I can get that to you.
Yeah.
For folks that can go up above 15 kW for the rack, we'll be able to go do that. And again, it won't necessarily be quiet, but we think we're going to be able to air cool that. And that's where you get that kind of 7x consolidation that AMD was talking about. And I think that there's... yeah. Yeah.
Speaking of GPUs, something that wasn't covered in the media, even by us, was that when AMD gave their Turin presentation to the media, while AI was a big part, they didn't just... When asked about HPC and FP64, they were like, yeah, we're absolutely supporting that. Do not worry. And that was sort of a big relief, because it was like, thank God you're not just talking about AI.
Like, there's HPC going on here. There's more than just... like low data types, there's FP64 things happening, thankfully.
Yeah, and I think that we're excited for the AI workloads too, and I think they're going to get a nice pop from AVX-512, certainly the 512-bit data path there, and you're going to see a lot of those. There are nice pops to be had, we think. But you're right, it was not just AI. There are, as it turns out, other... We also need the workloads to simulate the computer for the AI, as it turns out.
Eric needs the...
So yeah, the 9575F was targeted towards sort of the head node CPU for AI; that was what AMD was targeting it as. But I honestly think that in a general compute sense, it's sort of the all-rounder, in my opinion. Yeah.
Yeah, we think so too. And I think that, unlike with our first gen, where we only had one SKU, the 7713P, we're going to allow for some flexibility for Oxide customers inside that SKU stack. We're excited to kind of extend that and then do some of the work around dynamic power control.
We've got a bunch of ideas on how... you know, we've got the right foundation to actually manage power holistically across the rack, and there are a lot of knobs to turn. And I think it's going to yield a pretty great product. I mean, hats off to AMD for sticking the landing.
I mean, we are definitely wedded to AMD in a lot of ways in terms of our lowest level of platform initialization and so on. So we're always relieved when they execute well.
Yeah.
Um, and great decision in 2019.
Yeah. And sort of wrapping up with Instinct, because a lot of people were concerned about the APU chip, which I think you and I had talked about.
Yeah, we talked about the APU. Yeah, boy, we're hitting all the sympathy cards here. Hold on. Hand me the other sympathy card now.
So I think there was some misunderstanding, or something was misheard in what was said. Because when I went and asked for clarification after the presentation, what it sounded like was: they aren't making APUs every generation right now, because their customers see sort of the X SKUs as the AI chip and the A SKUs as the HPC chip.
So all of the big hyperscalers are only looking at the X SKUs. And when you have to fight between the hyperscalers and this slightly more niche part, it's like, yeah, unfortunately that will win. But from what I was able to gather, they do see the APUs as the future for not just AI, but for HPC moving forward. So they are continuing development. There's no ending going on.
Much like 3D V-Cache: it was announced that Turin-X is not coming. And it's because the cadences are different, and there are certain dials that you get to pick, so to speak.
When you were asking about the APU, did the oxide people put you up to this?
No, so this was something that I've been bugging them for a while about. And by the way, if you guys see an AMD MI300A dev kit come out, I'm going to claim some level of responsibility for that.
That's great. We've asked for... Yeah, that'd be great. You will deserve responsibility for an MI300A dev kit. We'd love it.
Yeah, I've been trying to get them to put that out and sell it on Newegg, like Ampere Computing sells their Altra Max bundle, where it's the board and a CPU. I'm like, just sell that on Newegg for, I don't really care how much money, just have one.
George, I don't know what we're going to do for you if an MI300A dev kit is for sale on Newegg, but we're going to do something very, very nice for you. I don't know when it's going to be yet. It is going to be... That would be great. We get George a whole wheel of his favorite cheese?
Yes, absolutely.
That's right. For sure. But I totally agree. And yeah, we definitely noticed that the... You know, we have brought up APUs so frequently with them. I feel we've kind of overstayed our welcome with respect to that. Sort of like, you know what? We're going to let you guys say the next thing about APUs. We're going to stop telling you how much we love APUs and we'll let you do the next thing.
Or we'll let George do it for us, which is great. Much more effective. Much more effective. Well, we'll say, yeah, again, if we get an MI300A dev kit on Newegg, it's going to be, we're going to have to do, it's going to be something spectacular. Yeah.
But yeah, I've been pushing them to do that. And in general, I've been pushing them, as much as it's within my ability, to fix certain parts of their software stack. ROCm.
Yes. Yeah.
Yeah.
Well, you know, and honestly with openSIL too, I think one of the things that we really like about AMD is that it's a company that listens. They know what the right direction is. It takes them a while to get there sometimes, because it's...
Yeah.
It's a big vessel. But, you know, I remember when we were getting Naples stood up way back in the day, and there was a lot that still needed to be done. But you could see, like, okay, this is a trajectory that's really interesting. And then with Rome, it's like, okay, this has just got a lot more interesting. And it was clear to us in 2019, Robert, that they were,
they were on a trajectory where they were surpassing Intel, effectively. And then obviously with Milan, and with Genoa, and now Turin, I mean, we've seen them continue to execute, execute, execute. And so, yeah, let's keep at 'em on the APU side.
Yeah. On the APU side, and just in general on getting there: the perennial problem for AMD has been software. And to the best of my ability, I've been trying to get through to them that they need to have ROCm support on every single piece of AMD hardware. Anything that has the AMD logo on it should run ROCm.
Period.
With the one exception maybe being consoles, because those are a special little thing. But that's a different argument over there.
That's for Sony and Microsoft to take up.
Absolutely. Yeah. No, love it. And I totally agree.
And you know, it helps, them not hearing it from just us.
But if any AMD folks are listening to this: at Supercomputing and at CES, I'm going to be harping on you guys.
Yes. So we love the parts, and we've got some ideas for things to be even better, but Turin is a great part. We're really excited about it. And George, thank you very much for joining us. It's been great to have you. Thank you for having me. Oh yeah, and it's great to have the team, Nathanael and Eric here and Aaron as well.
And obviously Robert here in the studio with me. And then Adam, of course, to correct my pronunciations and to inform me that my running start is not quite big enough for the backdrilling. You know, we're really excited about what's forthcoming.
One thing we're excited about, of course: you're going to be able to take a Turin sled and put it into an unused cubby in an Oxide rack that has Mulholland sleds and just have the whole thing just work. So we're really excited about that. And onward, great part. And George, thanks again. Really, really appreciate it.
And thank you all for joining us.
Speaking of excitement.
Speaking of excitement, we do have one very exciting announcement.
That's right. Take it away. dtrace.conf, we're back. So dtrace.conf is our approximately quadrennial, Olympics-like... Let's see, we need the Olympics theme in here.
The Olympiad has arrived.
Our last Olympiad... I'm sure there'll be no copyright violation on the YouTube video if we do that. So we started in 2008, did it in 2012 and 2016. We were excited for 2020: canceled. And now we're back. So we're going to put the link in the notes. The link is going to go out to the folks who are here live, but it is December 11th, coming right up. So it's going to be an unconference.
It's going to be an unconference. If you're a DTrace user, you want to come hang out at Oxide. We are going to charge you for tickets. We're not going to charge you too much money, but we do have to charge you something; otherwise, it'll be immediately consumed by teenagers. Teenagers will consume every ticket if we don't charge anything. Very limited supply.
So hop in there if you're interested in joining us. Yeah, I'm really excited for this. It's going to be fun.
Oh, it's gonna be great. I mean, I mean, I feel like it's rude for me to say it's my favorite conference, but I've always loved it. I've loved it. It's been terrific. I know.
And it's going to have a different complexion and flavor this year for sure. It's going to be a lot of fun. So I'm looking forward to it. That's for sure. That's right. And I know, Robert, everyone is just like, okay, what do I need to get done now? I've got until December 11th to get my... but we've got a lot of things to talk about. So.
It's going to be fun. So join us, dtrace.conf 2024. And I will not violate any more copyrights by humming.
With your humming? I think we dodged the bullet on that one. I don't know that it was so recognizable.
Exactly. Awesome. All right. Well, George, thanks again. Thank you, everybody. And yes, see you at dtrace.conf 2024.