Request: Separate L1 read and L2 write cache with safe deferred writes
Posted: Tue Feb 06, 2024 12:26 pm
I'd like to have separate caches for reads and writes, where the read cache runs in L1 only, and the write cache in L2 only.
The read scenario
Data in the L1 cache and the L2 cache can both be used for cached reads, but the L1 cache is never flushed to L2.
Blocks read from disk are never stored in L2. When a block is read from L2, it is not stored in L1, as it is already cached by L2.
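The read path above can be sketched as follows. This is a minimal illustration, assuming simple dict-backed caches; the names (`read_block`, `l1`, `l2`, `disk`) are mine, not an existing API.

```python
# Sketch of the proposed read path: L1 first, then L2, then the
# underlying disk. A disk read populates L1 only (never L2), and an
# L2 hit is NOT promoted into L1, since L2 already caches it.

def read_block(block_id, l1, l2, disk):
    """Serve a read from L1, then L2, then the underlying disk."""
    if block_id in l1:          # L1 hit: RAM read cache
        return l1[block_id]
    if block_id in l2:          # L2 hit: return directly, no L1 copy
        return l2[block_id]
    data = disk[block_id]       # miss: read the underlying disk
    l1[block_id] = data         # populate L1 only, never L2
    return data
```

The key point the sketch shows is that the two caches never exchange blocks on the read path, so a block is cached in at most one tier.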
The write scenario
Data is immediately written to L2 (never to L1); a write is only completed once it has been fully written to L2. The written data stays available in L2 until the space is needed for new writes (no free space remains), at which point the oldest data is overwritten first, or until the blocks are deleted.
L2 is deferred-written to the underlying disk. If a blue screen or power loss happens, the unwritten data in L2 is written to the disk during the boot sequence, or as soon as the disk is connected (e.g. iSCSI).
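A sketch of that write path, under my own assumptions: the class name and methods are illustrative, eviction is oldest-first, and a dirty block is flushed before it may be overwritten (otherwise the "all writes reach the disk" guarantee would break). In a real implementation the dirty set would have to be recoverable from on-NVMe metadata after a crash.

```python
import collections

class L2WriteCache:
    """Sketch of a write-only, persistent L2 with deferred disk flush.

    Writes complete once stored in L2; flush() pushes unwritten blocks
    to the underlying disk later (or after a crash/reconnect).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = collections.OrderedDict()  # block_id -> data, oldest first
        self.dirty = set()                       # not yet flushed to disk

    def write(self, block_id, data, disk):
        # Need room for a new block? Overwrite the oldest one, but
        # flush it first if it never reached the disk.
        if block_id not in self.blocks and len(self.blocks) >= self.capacity:
            oldest, old_data = self.blocks.popitem(last=False)
            if oldest in self.dirty:
                disk[oldest] = old_data          # never lose unflushed data
                self.dirty.discard(oldest)
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        self.dirty.add(block_id)
        # Only now is the write acknowledged as complete: the data is
        # on persistent L2, even though the disk hasn't seen it yet.

    def flush(self, disk):
        """Deferred flush; rerun on the surviving L2 contents after a crash."""
        for block_id in list(self.dirty):
            disk[block_id] = self.blocks[block_id]
        self.dirty.clear()
```

Because L2 is non-volatile, calling `flush()` again at boot time on whatever the cache still holds is exactly the crash-recovery behaviour requested above.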
This caching scenario makes sure that:
1. As much space as possible remains in the L1 cache for reads, and the cache won't be thrashed by big writes.
2. All writes are completed to disk, never just to memory, making sure that all writes are persistent even before they reach the destination disk.
3. All writes will be flushed to the underlying disk, even after a power loss or blue screen.
4. All written data is cached, so backups (which only back up changes) are always read-cached from L2, won't thrash the L1 cache, and won't touch the underlying disks, keeping uncached reads on those drives fast.
5. A much higher write delay can be used without the risk of losing data; in Hyper-V replication scenarios more than 50% of the data will be trimmed and never be saved to the underlying disk, saving a lot of wear on those drives (and a lot of bandwidth in the case of iSCSI drives).
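Point 5 can be shown with a toy example, assuming only that a longer write delay lets repeated writes to the same block coalesce in L2, so only the final version is flushed. The block ids and counts below are made up for illustration.

```python
# Toy illustration of write trimming: five writes are issued, but
# because newer data overwrites the pending copy in the deferred
# cache, only one copy per block ever reaches the underlying disk.

writes = [("A", 1), ("B", 1), ("A", 2), ("A", 3), ("B", 2)]  # 5 writes issued

deferred = {}
for block, version in writes:
    deferred[block] = version    # newer data replaces the pending copy

flushed = len(deferred)          # blocks that actually hit the disk
trimmed = len(writes) - flushed  # writes absorbed by the cache
```

The longer the delay, the more rewrites of hot blocks are absorbed before the flush, which is where the wear and bandwidth savings come from.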
Explanation
Using a very fast NVMe drive, you can speed up writes without wasting memory on them, and in this scenario all data is safe, even deferred writes. Keeping the read cache in L1 ensures the drive always performs well (no L1-to-L2 flushes), and the full L2 size remains available for write-back caching.