NVIDIA Working on New Driver For GeForce GTX 970 To Tune Memory Allocation Problems and Improve Performance

After the recent update to the specifications of the GeForce GTX 970,
it seems NVIDIA is under pressure from consumers who feel betrayed that
they didn't get what they paid for. While NVIDIA gave a brief, reasonable
explanation a few days ago of why the GeForce GTX 970 has trouble
allocating its entire 4 GB VRAM pool to games, it did so very late, and
it has now been revealed that the GPU has fewer ROPs and less L2 cache
than was falsely advertised at launch five months ago.

NVIDIA To Fine Tune GeForce GTX 970 Memory Allocation Issues To Improve Performance

With NVIDIA under pressure and GTX 970 owners angry, NVIDIA's Thomas
Peterson (Director of Technical Marketing at NVIDIA) has admitted that
his company got the GeForce GTX 970's specifications wrong and that it
will soon release a driver that tunes how the GeForce GTX 970 allocates
memory in games, to further improve performance. It's not known how the
new GeForce driver will work, but some have suggested that it may be
similar to the GeForce 337.50 BETA driver, focusing its optimizations on
the GeForce GTX 970 in memory-bound conditions such as crossing the
3.5 GB VRAM threshold, beyond which games are reported to lag or stutter.
Following are Peterson's comments on the GeForce forums:

Comment #1: Hey,

First, I want you to know that I’m not just a mod, I work for NVIDIA in Santa Clara.

I totally get why so many people are upset. We messed up some of the
stats on the reviewer kit and we didn’t properly explain the memory
architecture. I realize a lot of you guys rely on product reviews to
make purchase decisions and we let you down.

It sucks because we’re really proud of this thing. The GTX970 is an
amazing card and I genuinely believe it’s the best card for the money
that you can buy. We’re working on a driver update that will tune what’s
allocated where in memory to further improve performance.

Having said that, I understand that this whole experience might have
turned you off to the card. If you don’t want the card anymore you
should return it and get a refund or exchange. If you have any problems
getting that done, let me know and I’ll do my best to help.

–Peter

Comment #2: Actually I’m not sure as that’s not a simple issue with
just one cause. Card memory is not just used for the frame buffer,
plenty of driver stuff gets loaded into it as well. We’re looking at
sticking as much of that stuff as possible into the 0.5GB space to leave
the rest available.

Comment #3: The GTX970 really does have 4GB of memory and can access
all of it. And we’re looking at ways to tweak the driver to better
understand where to put stuff to make it even faster. But I totally get
that it might not be the right product for your specific situation. If
you really want to return it and are getting denied, let me know and
I’ll do my best to help.

The issue started in late November, when users on several forums began
reporting that their GTX 970s would not go past 3.5 GB of VRAM usage in
certain games. Within a few days we published our own analysis showing
that the card can use its full 4 GB of VRAM, the total memory available
on the PCB, but only under highly stressful conditions. Then users who
were breaking past 3.5 GB started reporting another problem: their cards
could exceed the 3.5 GB buffer, but games would lag, stutter or show
artifacts. That prompted several sites to start testing their GeForce
GTX 970 cards and to ask NVIDIA for a response. NVIDIA's first reply was
that the card's crossbar had two pools of memory: a faster 3.5 GB pool
used by gaming applications, and a slower 0.5 GB pool that was still
faster than system memory. This arrangement was meant to make the best
of the GM204 chip's Maxwell core layout on the GeForce GTX 970, which is
a partially disabled SKU.

Just two days after NVIDIA's initial explanation, and five months after
the launch of the GeForce GTX 970, Jonah Alben, SVP of GPU Engineering,
admitted that the card's specifications were not what had originally
been published and that the chip was more cut down than previously
stated. Using a block diagram of the GTX 970's GM204 chip, he showed
that it has just 56 ROPs rather than 64, and 1792 KB of L2 cache rather
than the advertised 2048 KB. He attributed the error to a
miscommunication about the specifications between NVIDIA's marketing
and technical teams.

[Figure: GM204 architecture block diagram of the GTX 970]
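The corrected figures are consistent with one of eight equal ROP/L2 slices being switched off. The eight-way split is an assumption for this sketch rather than something the article spells out: 64 ROPs and 2048 KB of L2 divided eight ways give 8 ROPs and 256 KB per slice, and seven enabled slices give exactly the 56 ROPs and 1792 KB Alben described. The helper name `remaining` is hypothetical.

```python
def remaining(advertised: int, units: int = 8, disabled: int = 1) -> int:
    """Capacity left when `disabled` of `units` equal slices are switched off.

    Assumes the advertised total is split evenly across `units` slices.
    """
    per_unit = advertised // units
    return (units - disabled) * per_unit


print(remaining(64))    # ROPs: 56
print(remaining(2048))  # L2 cache in KB: 1792
```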

A quick note about the GTX 980 here: it uses a 1 KB memory access
stride to walk across the memory bus from left to right, which lets it
reach all 4 GB this way. But the GTX 970 and its altered design have to
do things differently. If you walked across the memory interface in
exactly the same way, over the same 4 GB capacity, the 7th crossbar port
would always tend to get twice as many requests as the other ports
(because it has two memories attached). In the short term that could be
OK due to queuing in the memory path. But in the long term, if the 7th
port is fully busy and is getting twice as many requests as the other
ports, then the other six must be only half busy, to match the 2:1
ratio. So the overall bandwidth would be roughly half of peak. This
would cause dramatic underutilization and would prevent optimal
performance and efficiency for the GPU.
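The 2:1 imbalance can be reproduced with a toy model. This is a simplified sketch, not NVIDIA's actual memory controller: it assumes DRAMs 0-5 each own a crossbar port, DRAMs 6 and 7 share port 6 (the "7th" port), each port serves one request per cycle, and "peak" means all eight DRAMs busy every cycle. The helper names are hypothetical.

```python
from collections import Counter

NUM_DRAMS = 8
STRIDE_BYTES = 1024  # 1 KB access stride, as described for the GTX 980


def port_for_dram(dram: int) -> int:
    # Simplified GTX 970 model: DRAMs 0-5 each have their own crossbar
    # port, while DRAMs 6 and 7 share port 6 (the "7th" port).
    return min(dram, 6)


def port_loads(addresses, drams):
    """Count requests per crossbar port for addresses interleaved over `drams`."""
    loads = Counter()
    for addr in addresses:
        dram = drams[(addr // STRIDE_BYTES) % len(drams)]
        loads[port_for_dram(dram)] += 1
    return loads


def requests_per_cycle(loads):
    # Each port serves at most one request per cycle, so the busiest
    # port determines how long the whole batch takes.
    return sum(loads.values()) / max(loads.values())


# Walk all eight DRAMs the same way a GTX 980 would
addresses = range(0, 8000 * STRIDE_BYTES, STRIDE_BYTES)
loads = port_loads(addresses, list(range(NUM_DRAMS)))
print(loads[6] / loads[0])                     # 2.0: the 7th port gets twice the requests
print(requests_per_cycle(loads) / NUM_DRAMS)   # 0.5: half of the eight-DRAM peak
```

In this model the six single-DRAM ports sit idle half the time while they wait for the overloaded 7th port, which is the underutilization described above.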

Let’s be blunt here: access to the 0.5 GB of memory, on its own and in
a vacuum, would occur at 1/7th of the speed of the 3.5 GB pool. If you
look at the Nai benchmarks floating around, this is what you are seeing.
To avoid the 2:1 imbalance described above, NVIDIA divided the memory
into two pools: a 3.5 GB pool that maps to seven of the DRAMs and a
0.5 GB pool that maps to the eighth DRAM. The larger, primary pool is
given priority and is accessed in the expected
1-2-3-4-5-6-7-1-2-3-4-5-6-7 pattern, with equal request rates on each
crossbar port, so bandwidth is balanced and can be maximized. And since
the vast majority of gaming situations stay well under 3.5 GB of memory
use, this decision makes perfect sense. It is in those instances where
memory above 3.5 GB needs to be accessed that things get more
interesting.
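Using the same toy model as a sketch (DRAMs numbered 0-7 here rather than 1-8, DRAMs 6 and 7 assumed to share one crossbar port, one request per port per cycle), striping the 3.5 GB pool over seven DRAMs keeps every port equally busy, while the 0.5 GB pool on its own runs at 1/7th of that rate. The function and variable names are hypothetical.

```python
from collections import Counter


def port_for_dram(dram: int) -> int:
    # Simplified model: DRAMs 0-5 own ports 0-5; DRAMs 6 and 7 share port 6.
    return min(dram, 6)


def requests_per_cycle(drams, num_requests=7000):
    """Throughput when requests are striped round-robin over `drams`,
    assuming each crossbar port serves one request per cycle."""
    loads = Counter(port_for_dram(drams[i % len(drams)]) for i in range(num_requests))
    return num_requests / max(loads.values())


fast_pool = list(range(7))  # 3.5 GB pool: seven DRAMs, the 1-2-3-4-5-6-7 pattern
slow_pool = [7]             # 0.5 GB pool: the eighth DRAM only

fast = requests_per_cycle(fast_pool)
slow = requests_per_cycle(slow_pool)
print(fast, slow, slow / fast)  # slow / fast equals 1/7
```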

*To those wondering how peak bandwidth can remain at 224 GB/s despite
the division of memory controllers on the GTX 970, Alben stated that the
card reaches that speed only when memory is being accessed in both
pools. (via PCPer)
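A back-of-the-envelope split of the 224 GB/s figure shows why both pools must be active to reach it. This is nominal arithmetic under an assumption, not a measurement: it assumes the peak is divided evenly across the eight 32-bit DRAM channels.

```python
PEAK_GBPS = 224   # GTX 970 peak bandwidth quoted in the article
NUM_DRAMS = 8

# Assumption: bandwidth is split evenly, one share per 32-bit DRAM channel.
per_dram = PEAK_GBPS / NUM_DRAMS   # 28.0 GB/s per DRAM
fast_pool = 7 * per_dram           # 3.5 GB pool (seven DRAMs): 196.0 GB/s
slow_pool = 1 * per_dram           # 0.5 GB pool (one DRAM): 28.0 GB/s

print(per_dram, fast_pool, slow_pool, fast_pool + slow_pool)
```

Only the sum of both pools reaches the full 224 GB/s; either pool on its own tops out at its share.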

The reason for this cut-down design is that the last two 0.5 GB DRAM
chips are connected to two 32-bit memory controllers that sit behind
just one L2 cache module, because the other L2 module had to be
disabled. That single L2 module therefore has to share and handle the
traffic for both chips. To use the last memory block effectively, NVIDIA
had to separate out a single DRAM chip, effectively turning the card
into a 3.5 GB model that only uses the last section of VRAM when it is
really needed. NVIDIA is now going to fine-tune the performance of this
specific block and how it manages resource sharing across applications.
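The kind of placement tuning Peterson describes (moving driver data into the 0.5 GB space and touching it for game data only when needed) can be illustrated with a simple two-pool allocator. This is a hypothetical sketch of such a policy, not NVIDIA's driver logic; `Pool`, `TwoPoolAllocator` and the `driver_internal` flag are invented for illustration.

```python
from dataclasses import dataclass, field

MB = 1024 ** 2
GB = 1024 * MB


@dataclass
class Pool:
    name: str
    capacity: int
    used: int = 0

    def try_alloc(self, size: int) -> bool:
        """Reserve `size` bytes if they fit; report whether it succeeded."""
        if self.used + size <= self.capacity:
            self.used += size
            return True
        return False


@dataclass
class TwoPoolAllocator:
    """Hypothetical policy: game data prefers the fast 3.5 GB pool, driver
    data prefers the slow 0.5 GB pool, and either spills into the other."""
    fast: Pool = field(default_factory=lambda: Pool("fast", 7 * GB // 2))
    slow: Pool = field(default_factory=lambda: Pool("slow", GB // 2))

    def alloc(self, size: int, driver_internal: bool = False) -> str:
        order = (self.slow, self.fast) if driver_internal else (self.fast, self.slow)
        for pool in order:
            if pool.try_alloc(size):
                return pool.name
        raise MemoryError("out of video memory")


allocator = TwoPoolAllocator()
placements = [
    allocator.alloc(256 * MB, driver_internal=True),  # driver data goes to "slow"
    allocator.alloc(3072 * MB),                        # game data goes to "fast"
    allocator.alloc(400 * MB),                         # still fits in "fast"
    allocator.alloc(200 * MB),                         # fast pool full, spills to "slow"
]
print(placements)
```

Keeping driver data in the slow pool leaves more of the fast 3.5 GB for games, so game allocations spill into the slow pool later than they otherwise would.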

It’s not known when the new driver will launch, but given the amount of
criticism NVIDIA is currently receiving, it will likely arrive next
month.
