AMD Epyc has problems when you max out PCIe lanes

Soldato
Joined
1 Apr 2014
Posts
18,610
Location
Aberdeen
Linus Tech Tips has an interesting video about the problems they had when they maxed out the PCIe lanes on their EPYC server with umpteen NVMe drives.


TL;DR: there are major performance issues; it's all too fast and the bandwidth is overloaded. I'm wondering if Intel solutions have the same problems?
 
Caporegime
Joined
17 Mar 2012
Posts
47,559
Location
ARC-L1, Stanton System
Linus Tech Tips has an interesting video about the problems they had when they maxed out the PCIe lanes on their EPYC server with umpteen NVMe drives.


TL;DR: there are major performance issues; it's all too fast and the bandwidth is overloaded. I'm wondering if Intel solutions have the same problems?

Put simply, they built a system with mass storage that is faster than RAM.

You need to watch the first part of this build video to understand what is going on. It's simply a case of EPYC CPUs having so many PCIe lanes that it's possible to RAID enough NVMe drives together for the array to be faster than DDR4 can keep up with, causing transfers between memory and drives to stall out, because the memory cannot match the speed of the drives.

They RAIDed 24 NVMe drives and were pushing storage transfer rates of nearly 30GB/s. That's faster than the memory can keep up with, or at least it's roughly equivalent to DDR4 at 3800MT/s.

Intel's CPUs don't have anywhere near as many PCIe lanes, so it's not possible to RAID 0 anywhere near as many drives, and Intel CPUs can't possibly get anywhere near the speed needed to make the memory stall.

If you actually watch the video, they explain what the problem is; the cure is to slow the transfer rates down, to a speed that's probably still faster than Intel can manage.

This is the video where they built this monster....

Edit: if they used the page file they would get higher memory performance than using RAM :D
Edit 2: the CPU has 128 PCIe 4.0 lanes; 24 NVMe drives running at 4 lanes each uses 96 PCIe lanes. Memory is 4TB, 8-channel, at 3200MT/s.
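
For anyone who wants to sanity-check those figures, here's a rough back-of-envelope sketch (theoretical peaks only, using the usual ~2GB/s per PCIe 4.0 lane and 8 bytes per DDR4 transfer; the ~30GB/s they actually measured is well below the array's theoretical ceiling and works out to roughly one channel of DDR4-3800):

```python
# Back-of-envelope theoretical peaks; real-world throughput is lower.
PCIE4_GBPS_PER_LANE = 2.0        # ~2 GB/s per PCIe 4.0 lane, before overhead
BYTES_PER_TRANSFER = 8           # 64-bit DDR4 channel = 8 bytes per transfer

drives, lanes_per_drive = 24, 4
array_peak = drives * lanes_per_drive * PCIE4_GBPS_PER_LANE        # GB/s
mem_8ch_3200 = 8 * 3200e6 * BYTES_PER_TRANSFER / 1e9               # GB/s
one_ch_3800 = 3800e6 * BYTES_PER_TRANSFER / 1e9                    # GB/s

print(f"24x NVMe @ PCIe 4.0 x4, theoretical peak: {array_peak:.0f} GB/s")
print(f"8-channel DDR4-3200, theoretical peak:    {mem_8ch_3200:.0f} GB/s")
print(f"One channel of DDR4-3800:                 {one_ch_3800:.1f} GB/s")
```

These are theoretical ceilings rather than benchmark numbers; the ~30GB/s in the video is measured throughput with filesystem and copy overhead on top.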

 
Last edited:
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
More to the point: 24 NVMe drives in RAID 0, who takes that sort of risk in the real world? At the point where you're even considering it you should be considering SAN storage, at which point the bottleneck becomes the FC network.

It's a good technical exercise, but really very little more.
 
Caporegime
Joined
17 Mar 2012
Posts
47,559
Location
ARC-L1, Stanton System
I thought they put ZFS and RAID 5 on it or something, not RAID 0.

Initially yes, but the software couldn't deal with it, so they switched to RAID 0.

More to the point.. 24 nvme in raid 0, who takes that sort of risk in the real world?

Apparently AWS (Amazon Web Services) have been trying something similar and running into the same transfer rate bottlenecks.

It's useful if you're remote video editing, provided you have a 40Gb LAN to keep up. lol
 
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
Initially yes, but the software couldn't deal with it, so they switched to RAID 0.



Apparently AWS (Amazon Web Services) have been trying something similar and running into the same transfer rate bottlenecks.

It's useful if you're remote video editing, provided you have a 40Gb LAN to keep up.

I seriously doubt any hyperscaler would be using directly attached storage; it just makes scaling up very difficult. Even a business like mine doesn't bother: we use a single lane on Rome, and that houses the SD card the machine boots from :)
 
Caporegime
Joined
17 Mar 2012
Posts
47,559
Location
ARC-L1, Stanton System
I seriously doubt any hyperscaler would be using directly attached storage; it just makes scaling up very difficult. Even a business like mine doesn't bother: we use a single lane on Rome, and that houses the SD card the machine boots from :)

Right, but if you have the tools to push the boundaries of what's possible, why not play with it, you know, for science... one day it might become something you as a company can deploy, so why not have a look?
 
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
Right, but if you have the tools to push the boundaries of what's possible, why not play with it, you know, for science... one day it might become something you as a company can deploy, so why not have a look?

I'm not knocking it as a technical exercise, more that it's just not how hyperscale or anybody else works (that I know of) right now. Imagine deploying thousands of EPYC Rome servers with disks directly attached and not common to all machines. Imagine the carnage!! What happens if you lose a host? That's a single point of failure for all attached disks. Pretty risky stuff.
 
Last edited:
Caporegime
Joined
17 Mar 2012
Posts
47,559
Location
ARC-L1, Stanton System
I'm not knocking it as a technical exercise, more that it's just not how hyperscale or anybody else works (that I know of) right now. Imagine deploying thousands of EPYC Rome servers with disks directly attached and not common to all machines. Imagine the carnage!!

Point taken; I'm guessing they're just testing the bandwidth.
 
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
Point taken; I'm guessing they're just testing the bandwidth.

Still, though, those speeds are insane!! It's impressive that we're at the point where the memory subsystem can't keep up with I/O. It's pretty clear what they can do to make it even better, and banging against the limits of, well, everything is pretty awesome!!
 
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
The CPU was struggling to keep up with the parity calculations too, and that's on a 24c/48t part.

What's funny is you would saturate even 128Gb FC, which has a max throughput of something like 26gb/s. You would effectively need to load-balance some 4 connections per controller, per server, to get close to the available throughput :D

That is insane and expensive!! Expensive as in something like 10k for a switch that even gets close (for just one server) :D
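
To put rough numbers on that, here's a quick sketch of how many FC ports you would need just to carry the ~30GB/s the array was pushing. The per-generation figures are the approximate per-direction throughputs commonly quoted, so treat the whole thing as ballpark (it ignores protocol overhead and multipathing details):

```python
import math

storage_gbps = 30.0   # roughly what the array was pushing, in GB/s, per the video

# Approximate usable per-direction throughput per FC port (ballpark figures).
fc_ports = {"32GFC": 3.2, "64GFC": 6.4, "128GFC": 12.8}   # GB/s per port

for gen, per_port in fc_ports.items():
    needed = math.ceil(storage_gbps / per_port)
    print(f"{gen}: ~{per_port} GB/s per port -> {needed} port(s) to carry {storage_gbps} GB/s")
```

Even on the fastest generation you are bonding multiple links per server before the fabric stops being the bottleneck, which is the point about cost above.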
 
Soldato
Joined
5 Oct 2009
Posts
13,835
Location
Spalding, Lincs
More to the point: 24 NVMe drives in RAID 0, who takes that sort of risk in the real world? At the point where you're even considering it you should be considering SAN storage, at which point the bottleneck becomes the FC network.

It's a good technical exercise, but really very little more.

They use it for editing their 8K RED footage with multiple editors, so it's being put to good use. There's only a small risk involved: in the worst case they wouldn't lose anything hugely important, as it's not being used for storage, just active projects.

The CPU was struggling to keep up with the parity calculations too, and that's on a 24c/48t part.

They even upgraded it to a 64-core CPU, then changed to a 32-core CPU in the end.

It's seriously impressive how far storage has come lately. From always being the massive bottleneck in a system to being bottlenecked by the system, insane really.
 
Soldato
Joined
28 May 2007
Posts
18,241
For this kind of workload a dual-socket setup would probably be cheaper and faster. I'd look at a pair of EPYC 7262s or 7302s.
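
If the appeal of dual socket here is memory bandwidth, a rough sketch of the theoretical numbers (assuming all 8 DDR4-3200 channels per socket are populated; whether the array traffic actually spreads nicely across both NUMA nodes is a separate question):

```python
# Rough theoretical peaks only; real-world numbers depend heavily on NUMA
# placement, so treat this as a sketch of why a second socket could help.
DDR4_3200_PER_CHANNEL = 3200e6 * 8 / 1e9     # ~25.6 GB/s per channel

single_socket = 8 * DDR4_3200_PER_CHANNEL    # EPYC Rome: 8 channels per socket
dual_socket = 16 * DDR4_3200_PER_CHANNEL     # 2P: 16 channels, if both populated

print(f"1P, 8 x DDR4-3200:  ~{single_socket:.0f} GB/s theoretical")
print(f"2P, 16 x DDR4-3200: ~{dual_socket:.0f} GB/s theoretical")
```

In practice cross-socket traffic over the inter-socket links would eat into that, so drive and process placement would matter a lot.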
 
Man of Honour
Joined
30 Oct 2003
Posts
13,251
Location
Essex
For this kind of workload a dual-socket setup would probably be cheaper and faster. I'd look at a pair of EPYC 7262s or 7302s.

That's interesting. I wonder if the same number of drives in a dual-socket system would alleviate the issue. I'm really going to have to watch the video, aren't I? Commenting on here when I haven't even looked. Noob.

Do we know what controller they were using?
 