cichy45 & InquiringMind
You guys are missing the point!
Let me explain:
Windows will automatically use any spare DRAM as a file-aware read cache, regardless of whether you have PrimoCache installed or not.
(That's assuming your system drive is an HDD, or that you have edited the registry to force said read caching.)
ie:
~all your DRAM is used (dynamically) anyway.
What is NOT used as extra/spare RAM is any spare GPU VRAM...
So, with GpuRamDrive you can add a lower (slower) tier of disk caching, to further speed up I/O, in much the same way PrimoCache normally uses a SSD...
but no SSD reqd = wider customer base for Romex...
Ideally, said cache would also be dynamic, decreasing as the GPU's VRAM is required for graphics.
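That dynamic sizing could be as simple as "free VRAM minus a fixed headroom". A minimal Python sketch of such a policy (all names are illustrative; neither GpuRamDrive nor PrimoCache exposes an API like this):

```python
# Hypothetical sizing policy for a dynamic VRAM cache tier: keep a fixed
# headroom free for graphics, and give whatever is left to the disk cache.

def vram_cache_budget(total_vram_mb, vram_in_use_mb, headroom_mb=1024):
    """Return how many MB of VRAM the cache tier may occupy right now."""
    free = total_vram_mb - vram_in_use_mb
    return max(0, free - headroom_mb)

# Idle desktop: an 8 GB card using 500 MB leaves ~6.5 GB for caching.
print(vram_cache_budget(8192, 500))    # 6668
# Mid-game: 7 GB in use -> the cache collapses to zero instead of
# fighting the game for VRAM.
print(vram_cache_budget(8192, 7168))   # 0
```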
GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
-
- Level SS
- Posts: 477
- Joined: Wed Oct 06, 2010 11:10 pm
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
This is actually a bad thing because:
- it results in multiple levels of caching which then can mean the same data being stored multiple times and multiple checks to determine if a particular item has been cached or not;
- Windows' file cache takes primacy over PrimoCache - initially both will start empty and fill up with identical copies of disk data. However since Windows' file cache gets checked first, the disk reads it satisfies will not be seen by PrimoCache and PrimoCache will flush its copy of this data out, resulting (eventually) in Windows' cache holding the most frequently requested data and PrimoCache the next most frequent. That is, until an application needs a large amount of memory which is then taken from Windows' file cache - this then results in the most frequently accessed data no longer being cached until it is read again and stored by PrimoCache.
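The dynamic in the second bullet can be sketched with two stacked LRU caches in Python (LRU in both tiers is an assumption; the real replacement policies differ). Note how both tiers end up holding identical copies of the hottest blocks, which is exactly the duplication the first bullet describes:

```python
# Toy model of two stacked read caches: upper = Windows file cache, checked
# first; lower = PrimoCache, which only ever sees the upper tier's misses.
from collections import OrderedDict

class LRU:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()
    def get(self, k):
        if k in self.d:
            self.d.move_to_end(k)
            return True
        return False
    def put(self, k):
        self.d[k] = True
        if len(self.d) > self.size:
            self.d.popitem(last=False)   # evict least-recently-used

upper, lower = LRU(2), LRU(2)

def read(block):
    if upper.get(block):
        return "upper hit"      # lower never sees this read
    if lower.get(block):
        upper.put(block)        # promoted; now duplicated in both tiers
        return "lower hit"
    upper.put(block)            # a disk read fills both tiers at once
    lower.put(block)
    return "disk"

# 'A' and 'B' are hot, 'C' and 'D' less so.
for b in ["A", "B", "A", "B", "C", "D", "A", "B"]:
    read(b)

print(list(upper.d))   # ['A', 'B']
print(list(lower.d))   # ['A', 'B'] -- identical copies in both tiers
```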
This would add a third level of caching (with the results above) but with the added complications of a slower speed (so PrimoCache would have to figure out which cached data was less important in order to relegate it to GPU RAM), less capacity and, if a static size, potentially crippling performance for games which, if out of GPU RAM, would try to use system RAM instead. Since it is only gamers that are likely to *have* GPUs with large amounts of RAM, the loss of performance here is likely to be far more significant than any gain on disk speeds.
Comparing this feature with PrimoCache's L2 caching is also missing the point. L2's plus point is that it is limited by SSD size rather than RAM size so can be an order of magnitude larger, and since it is non-volatile it poses less risk of data loss than L1.
Making the cache dynamic would avoid the game-crippling performance issue, but then you'd have poorer cache performance, since a dynamic cache would be harder to index/search than a static one, and the cached data would be lost after running any GPU-intensive game.
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
Would be really cool, nice idea! I've used GpuRamDisk. I have a 4GB graphics card of which I only ever use 1-2GB, so it would be great if the other 2GB could be used as cache. Companies could also use 8GB/16GB cards to speed up DDR2/DDR3 servers that are already at their maximum RAM, which is needed by the OS and other software.
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
InquiringMind wrote: ↑Fri Jun 04, 2021 6:19 pm I'm in agreement with Cichy45 here. GPU VRAM does offer greater bandwidth than system RAM, but only for GPU-specific work (graphics rendering, GPGPU, etc). Using GPU RAM more generally (as a ramdisk or disk cache) leads to it being restricted by PCIe bandwidth, which is far less than that of system RAM.
In addition, GPU RAM tends to be more expensive, and it is only comparatively recently that graphics cards have shipped with large quantities (>2GB). Even then, the amount included is fairly small compared to most motherboards' RAM capacity.
So this would represent a more expensive, slower and more limited option. A waste of time to implement, I would suggest.
I don't give a flying-uck if it's slower than DRAM, InquiringMind! So are my SSD, my HDD, and my flash drives.
But ALL of them are slower than GpuRamDrive.
Especially in the low-queue-depth, random 4K department that makes up 66% of Windblows I/O...
My RAM is full and busy elsewhere, while my GPU and its RAM sit idle, doing F-all..! Get it???
Not only that, I have a spare GPU, gathering dust, that could be used.
Seems to me you're so keen on a good argument that logical, deductive thinking escapes you when you eagerly start banging your 'superior intellect' into a keyboard!?
This may be difficult for you to grasp, but it's OK for people besides you to have a good idea... really it is!
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
It *may* be a good idea, but I think InquiringMind mentioned it should be for a separate product which I would agree with, or at best an add-on product. And the market segment for it would be much smaller than what Primocache currently has, so the idea of developing it as a paid product (for a demographic that typically doesn't like to pay high prices for utility software) doesn't seem to be a very attractive one.
Even if produced as a recurring subscription add-on to pay for development, it would only be useful for gamers. Rendering uses large amounts of drive space, and the GPU is already in use doing other tasks, so digital artists probably wouldn't benefit. When gaming, vRAM is more in-use which limits the cache space you could use. The only utility you might see out of it is as a Gamer, when not gaming, which sorta goes against the "useful for a demographic" idea.
Theoretically it's good on paper; realistically it's a mess. No one's going to go purchase a 30XX or equivalent card just to utilize its vRAM, when half the cost could be invested in more system RAM (or RAM + motherboard for still less than the video card). I currently own a 2080ti with 11GB of vRAM, and I can pretty confidently say I wouldn't enable a vRAM cache, since a good chunk of the vRAM is in use while I game. When I'm not gaming, I wouldn't really need an additional ~6 to 8 GB of cache space, due to having greater than 32GB RAM in the system (64 right now). Primocache on a correctly built/configured system is more than effective enough right now.
The fallacy behind thinking the product is needed, is that gaming enthusiasts spend almost as much on their video card as they do on the rest of their system, which produces a poor build most of the time. Some people still try to get away with 8GB of RAM, which I just have to facepalm at. ~12 years ago I started recommending to friends that they get no less than 16GB, and ~7 years ago I recommended no less than 32GB. Right now for enthusiast gamers I recommend 64GB or more, and that they try to source slightly used video cards (due to mining driving up the prices). Part of that recommendation is for the use of Primocache, and the longevity of the build.
Bottom line for me: despite being an avid optimization nut, I still don't think the product would be profitable, nor do I think the market for it would be significant. And for gamers who actively use their vRAM, it sorta goes contrary to how to build a system effectively. i.e. buying a 30XX video card and then putting ~16 GB of RAM into the motherboard is absolutely a mis-configured system.
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
Then accept that you have over-specified/over-spent on your GPU, sell it and replace it with one that only has the VRAM you need, and use the money raised to get more motherboard RAM.
And sell that spare GPU also, before prices return to normality.
This reinforces the point made above - the only people with large amounts of VRAM will be gaming enthusiasts or GPGPU users, both of whom will want/need that VRAM for gaming/mining.
Jaga wrote: ↑Mon Aug 09, 2021 1:50 am Theoretically it's good on paper, realistically it's a mess. No one's going to go purchase a 30XX or equivalent card just to utilize its vRAM, when half the cost could be invested in more system RAM (or RAM + Motherboard for still less than the video card). I currently own a 2080ti with 11GB of vRAM, and I can pretty confidently say I wouldn't enable a vRAM cache, since a good chunk of the vRAM is in use while I game...
If VRAM dropped in price enough to allow for GPUs with more RAM than what motherboards can currently accommodate (128-256GB at the moment) then this could change matters (given the current price for an Nvidia 3080Ti it *ought* to come with that much memory...) but it's more likely that by then we'll be looking at motherboard capacity in TBs.
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
A GeForce RTX 3070 Mobile with 8GB of 256-bit GDDR6 (448 GB/s bandwidth) lying around doing nothing, so why not test it?
Bad performance, but a good idea. For a proof of concept, it's quite amazing; imagine how it would perform with proper optimizations. By the way, I also tested building a ramdisk in normal system RAM, and the performance was almost as bad as with VRAM, which means ImDisk Virtual Disk Driver is not a good ramdisk driver. It also means GpuRamDrive isn't the only one to blame for the bad performance.
Also, using VRAM as the cache means you do NOT have to spend system RAM on PrimoCache, and the more system RAM left available, the better. People with less RAM but lots of unused VRAM could have the best of both worlds. You could also add some kind of management to switch the VRAM cache off when the GPU starts using more VRAM, or toggle it based on a list of executables.
You guys are missing two important technologies available today that would definitely help with using VRAM as a cache: Resizable BAR and DirectStorage.
1. Resizable BAR:
https://www.rockpapershotgun.com/what-i ... you-use-it
https://docs.microsoft.com/en-us/window ... ar-support
https://en.wikipedia.org/wiki/PCI_configuration_space
2. DirectStorage:
https://devblogs.microsoft.com/directx/ ... ing-to-pc/
https://devblogs.microsoft.com/directx/ ... ble-on-pc/
2.1. Nvidia RTX IO (Nvidia's DirectStorage implementation):
https://techreport.com/news/3473104/wha ... ia-rtx-io/
2.2. AMD Smart Access Storage (AMD's DirectStorage implementation):
https://www.digitaltrends.com/computing ... s-storage/
And if you compare the raw speeds of a PCI Express 4.0 x16 link against DDR4-3200 (my laptop uses both), PCI Express has the upper hand:
DDR4-3200: 25.6 GB/s (https://en.wikipedia.org/wiki/DDR4_SDRAM)
PCI Express 4.0 x16: 31.5 GB/s (https://en.wikipedia.org/wiki/PCI_Express)
So I think it's worth giving it a go, using these newest technologies.
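For anyone wondering where those two figures come from, here is the arithmetic in Python (rates taken from the Wikipedia pages cited above):

```python
# DDR4-3200 moves 3200 MT/s over a 64-bit (8-byte) channel; PCIe 4.0 runs
# 16 GT/s per lane with 128b/130b encoding, and an x16 link has 16 lanes.

ddr4_3200 = 3200e6 * 8                   # bytes/s, single memory channel
pcie4_lane = 16e9 * (128 / 130) / 8      # bytes/s per lane, after encoding
pcie4_x16 = pcie4_lane * 16

print(f"DDR4-3200 (1 ch): {ddr4_3200 / 1e9:.1f} GB/s")   # 25.6 GB/s
print(f"PCIe 4.0 x16:     {pcie4_x16 / 1e9:.1f} GB/s")   # 31.5 GB/s
```

Note the comparison is against a single memory channel; a dual-channel DDR4-3200 setup doubles to 51.2 GB/s, putting system RAM back ahead of the x16 link.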
Here are my results with a 1280MB GpuRamDrive:
------------------------------------------------------------------------------
CrystalDiskMark 8.0.4 x64 (C) 2007-2021 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes
[Read]
SEQ 1MiB (Q= 8, T= 1): 2150.918 MB/s [ 2051.3 IOPS] < 3758.15 us>
SEQ 1MiB (Q= 1, T= 1): 1997.493 MB/s [ 1905.0 IOPS] < 524.66 us>
RND 4KiB (Q= 32, T= 1): 267.776 MB/s [ 65375.0 IOPS] < 473.60 us>
RND 4KiB (Q= 1, T= 1): 129.255 MB/s [ 31556.4 IOPS] < 31.59 us>
[Write]
SEQ 1MiB (Q= 8, T= 1): 3145.389 MB/s [ 2999.7 IOPS] < 2448.16 us>
SEQ 1MiB (Q= 1, T= 1): 2904.572 MB/s [ 2770.0 IOPS] < 360.73 us>
RND 4KiB (Q= 32, T= 1): 382.770 MB/s [ 93449.7 IOPS] < 331.38 us>
RND 4KiB (Q= 1, T= 1): 152.177 MB/s [ 37152.6 IOPS] < 26.80 us>
[Mix] Read 70%/Write 30%
SEQ 1MiB (Q= 8, T= 1): 2218.394 MB/s [ 2115.6 IOPS] < 3757.53 us>
SEQ 1MiB (Q= 1, T= 1): 2121.866 MB/s [ 2023.6 IOPS] < 493.77 us>
RND 4KiB (Q= 32, T= 1): 279.845 MB/s [ 68321.5 IOPS] < 453.09 us>
RND 4KiB (Q= 1, T= 1): 135.992 MB/s [ 33201.2 IOPS] < 30.01 us>
Profile: Default
Test: 1 GiB (x5) [R: 0% (0/1280MiB)]
Mode: [Admin]
Time: Measure 5 sec / Interval 5 sec
Date: 2022/07/16 17:35:43
OS: Windows 11 [10.0 Build 22000] (x64)
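As a sanity check on the table above, throughput, IOPS and mean latency are three views of the same measurement (CrystalDiskMark's MB is 1,000,000 bytes). Cross-checking the RND 4KiB (Q=1) read row:

```python
# Convert the reported MB/s into IOPS and per-request latency.
BLOCK = 4096          # 4 KiB in bytes
mbps = 129.255        # MB/s from the [Read] RND 4KiB (Q=1) row

iops = mbps * 1e6 / BLOCK
latency_us = 1e6 / iops    # at queue depth 1, latency ~= 1 / IOPS

print(f"{iops:.0f} IOPS")      # 31556, matching the table
print(f"{latency_us:.2f} us")  # 31.69 us, close to the reported 31.59 us
```

By definition, at queue depth 1 those ~30 microseconds per 4K request, rather than raw link bandwidth, are what cap the low-queue-depth numbers.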
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
Now here's a wild idea: suppose you can get a good VRAM cache performance that is about 60% of the L1 performance. Now suppose you can mirror the L1 to VRAM, making a kind of a RAID1 cache with L1+VRAM. Now you can read both at the same time, at VRAM speed, getting 120% of that original L1 performance. Doesn't sound bad at all.
Now imagine you can optimize the VRAM cache to the point of getting 80-90% L1 performance. And you make a mirror. And your read speeds are 160-180% of the single L1 cache. What about that?
You could even make a new tier in the overall cache: the L1+VRAM tier on top (180% of system RAM performance, up to the VRAM size, as long as the GPU isn't using that VRAM), L1 below it (a lot more GBs than VRAM), then L2. Sounds cool.
Edit: getting even wilder: L1+VRAM RAID-0 cache on top, L1 below it. Squeezing all the I/O throughput you can get from both system RAM and PCI Express x16 at the same time, reading or writing.
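The percentages above can be made concrete. For a mirrored pair, the combined read rate depends on how reads are split across the two copies (all numbers hypothetical, with L1 speed normalized to 1.0):

```python
# Combined read rate of an L1 + VRAM mirror, L1 normalized to 1.0.
#   even split:         each copy serves half the data, so the slower copy
#                       gates you and the pair runs at 2x the slower device.
#   proportional split: each copy serves a share matching its speed, so the
#                       rates simply add (the best case).

def mirror_even(l1, vram):
    return 2 * min(l1, vram)

def mirror_proportional(l1, vram):
    return l1 + vram

print(mirror_even(1.0, 0.6))           # 1.2 -> the "120%" figure
print(mirror_proportional(1.0, 0.6))   # 1.6
print(mirror_even(1.0, 0.9))           # 1.8 -> the "160-180%" upper end
```

Two caveats: this only helps reads, since a mirror must commit every write to both copies; and the quoted figures assume the VRAM side can actually sustain that fraction of L1 speed, which the benchmark earlier in the thread suggests is optimistic today.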
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
And here is a very informative walkthrough from the Nvidia Developer blog on the CPU accessing VRAM through the PCIe bus:
https://developer.nvidia.com/blog/optim ... ible-vram/
"...effectively use a CPU thread as a copy engine. This can be achieved by creating the DX12 UPLOAD heap in CVV by using NVAPI. CPU writes to this special UPLOAD heap are then forwarded directly to VRAM, over the PCIe bus (Figure 3)."
Figure 3. Preloading a VB to VRAM using CPU writes in a CPU thread
"For DX12, the following NVAPI functions are available for querying the amount of CVV available in the system, and for allocating heaps of this new flavor (CPU-writable VRAM, with fast CPU writes and slow CPU reads):
NvAPI_D3D12_QueryCpuVisibleVidmem
NvAPI_D3D12_CreateCommittedResource
NvAPI_D3D12_CreateHeap2
These new functions require recent drivers: 466.11 or later."
Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?
And here is a nine-year-old article on CUDA Unified Memory and Unified Virtual Addressing:
https://developer.nvidia.com/blog/unifi ... in-cuda-6/
And here are a few articles on CUDA Unified Memory for beginners, and maximizing Unified Memory performance in CUDA:
https://developer.nvidia.com/blog/unifi ... beginners/
https://developer.nvidia.com/blog/maxim ... ance-cuda/
"Performance Through Data Locality
By migrating data on demand between the CPU and GPU, Unified Memory can offer the performance of local data on the GPU, while providing the ease of use of globally shared data. The complexity of this functionality is kept under the covers of the CUDA driver and runtime, ensuring that application code is simpler to write. The point of migration is to achieve full bandwidth from each processor; the 250 GB/s of GDDR5 memory is vital to feeding the compute throughput of a Kepler GPU."
We are at CUDA 11.7.99 nowadays, by the way.