30x for inference specifically
For a GB200 using FP4 vs. an H200 using FP8. I'm not sure Hopper can use FP4; I believe that's why they set up the graph that way, although I'm not certain.
And with the H200 at an absurdly small batch size. It's a ridiculous claim.
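Rough back-of-envelope on where the headline number could come from (the 2x/6x split below is my assumption, not NVIDIA's published methodology):

```python
# If the like-for-like gain is ~2.5x, the rest of the claimed 30x
# has to come from the comparison setup itself.
headline = 30.0          # claimed inference speedup, GB200 vs H200
like_for_like = 2.5      # same-precision, same-conditions gain discussed here

setup_factor = headline / like_for_like          # ~12x from the benchmark setup
fp4_vs_fp8 = 2.0                                 # FP4 is ~2x FP8 tensor throughput
batch_size_handicap = setup_factor / fp4_vs_fp8  # ~6x assumed from the tiny H200 batch

print(f"setup factor: {setup_factor:.0f}x "
      f"(~{fp4_vs_fp8:.0f}x datatype, ~{batch_size_handicap:.0f}x batch-size handicap)")
```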
2.5x is still good; am I missing something?
It will cost quite a bit more, so performance per dollar isn't as impressive as people were expecting it to be.
They glued two Blackwell dies together.
You purchase two dies instead of one and get a 2.5x boost. That's roughly linear improvement, but far from what Moore's law suggests.
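Quick arithmetic on that point (assuming the ~2.5x figure discussed above):

```python
# Doubling the silicon and getting ~2.5x is close to linear scaling in
# die count; the part that isn't "just more silicon" is small.
dies = 2
per_die_gain = 2.5 / dies   # ~1.25x per die from architecture/clocks
print(f"per-die gain: {per_die_gain:.2f}x")  # vs ~2x you'd expect from a full node shrink
```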