Background
There are two storage caching implementations that have recently been added to Linux: bcache and dm-cache. They are similar in that both allow a fast device (e.g., an SSD) to act as a cache for a slow device (e.g., a mechanical hard drive), creating a hybrid volume. The dm-cache implementation is exposed through LVM, which provides a much easier interface.
Two excellent blog posts comparing bcache and lvm-cache are here and here. I won't repeat their content; this post is about how I used lvm-cache on both my desktop and my laptop.
Desktop
This was fairly easy. I have an SSD with a 35 GiB partition allocated for the cache (/dev/sda6), and an HDD with a 270 GiB partition that I want to be cached (/dev/sdc3). Skipping the partitioning steps, I set up the partitions as LVM physical volumes:
# pvcreate /dev/sdc3
# pvcreate /dev/sda6
Then I create a volume group and add both physical volumes to it:
# vgcreate letovg /dev/sda6
# vgextend letovg /dev/sdc3
An optional step is to tag the physical volumes so I can remember which is which:
# pvchange --addtag @hdd /dev/sdc3
# pvchange --addtag @ssd /dev/sda6
Next I create the logical volume. In the parlance of the lvmcache man page, this is the "origin LV". I specify that the logical volume takes up the entire physical volume; this PV is on the HDD:
# lvcreate -l 100%PVS -n cargo letovg /dev/sdc3
One of the advantages of lvm-cache over bcache is that at this point I can already create a filesystem and copy data to and from it; I don't have to wait for the cache to be set up:
# mkfs.ext4 -m 0 -v /dev/letovg/cargo
# mount /dev/letovg/cargo /mnt/cargo
Now on the SSD I create the "cache metadata LV". This should be roughly one-thousandth of the cache data LV size (with a minimum of 8 MiB), so with a 35 GiB cache, I set a 35 MiB metadata volume:
# lvcreate -n cargo-cache-meta -L 35M letovg /dev/sda6
Then I create the "cache data LV" on the remaining space on the SSD PV:
# lvcreate -n cargo-cache -l 100%PVS letovg /dev/sda6
A later command gave me an error that there wasn't enough free space in the volume group (the cache-pool conversion needs a little spare room for its pool metadata spare LV), so I shrank this LV by 9 extents to make room:
# lvresize -l -9 letovg/cargo-cache
Now I combine the "cache data" and "cache metadata" logical volumes into the "cache pool LV". I opted for a writeback cache rather than the default writethrough mode:
# lvconvert --cachemode writeback --type cache-pool --poolmetadata letovg/cargo-cache-meta letovg/cargo-cache
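If you want to double-check which mode the pool ended up in, recent LVM releases can report it. This check is my own addition, not one of the original steps, and the cache_mode report field may not exist on older versions:

# lvs -o name,cache_mode letovg/cargo-cache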
And now I attach it to the "origin LV" to obtain the "cache LV":
# lvconvert --type cache --cachepool letovg/cargo-cache letovg/cargo
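Two maintenance commands are worth knowing about, although I didn't need them here and their availability depends on the LVM version. The cache mode of the cache LV can be switched on the fly:

# lvchange --cachemode writethrough letovg/cargo

And the cache can be removed entirely, turning cargo back into a plain logical volume:

# lvconvert --uncache letovg/cargo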
And that's it for the setup! Here is the output of the commands that display the LVM PV, VG, and LVs:
# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda6  letovg lvm2 a--   35.00g    0
  /dev/sdc3  letovg lvm2 a--  270.00g    0
# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  letovg   2   1   0 wz--n- 304.99g    0
# lvs
  LV    VG     Attr       LSize   Pool          Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  cargo letovg Cwi-aoC--- 270.00g [cargo-cache] [cargo_corig] 61.40  18.53           0.00
This is after several weeks of use. The cache is 61.4% full. Here's a little more information (I trimmed some of the columns out):
# lvs -a -o +cache_total_blocks,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses,cache_write_hits,cache_write_misses
  LV    CacheTotalBlocks CacheUsedBlocks CacheDirtyBlocks CacheReadHits CacheReadMisses CacheWriteHits CacheWriteMisses
  cargo           572224          351374                0        454928         2492192         449854          1934418
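The hit rates quoted below are just hits divided by hits plus misses; this awk one-liner (my addition, not part of the original output) does the arithmetic on the counters above:

$ awk 'BEGIN { printf "reads: %.1f%%  writes: %.1f%%\n", 100*454928/(454928+2492192), 100*449854/(449854+1934418) }'
reads: 15.4%  writes: 18.9%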
In this case, about 15% of the block reads and about 19% of the block writes on the cache LV have been serviced by the SSD. Not bad. Here's a simple benchmark. I repeatedly scan through about 200 MiB of binary files for a non-existent string, and each time through the loop I first drop the caches Linux holds in RAM:
$ while true; do echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null; time grep -q blah ASCA_SZF_1B_M02_2011103000*; sleep 1; done
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.34s user 0.36s system 27% cpu 2.541 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.31s user 0.38s system 27% cpu 2.514 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.34s user 0.32s system 26% cpu 2.550 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.34s user 0.34s system 26% cpu 2.536 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.36s user 0.26s system 5% cpu 11.184 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.17s user 0.20s system 77% cpu 0.479 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.19s user 0.19s system 79% cpu 0.485 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.22s user 0.17s system 78% cpu 0.485 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.25s user 0.14s system 80% cpu 0.492 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.19s user 0.20s system 79% cpu 0.489 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.21s user 0.18s system 79% cpu 0.493 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.21s user 0.18s system 79% cpu 0.493 total
It takes about 2.5 seconds to read the data when it's on the HDD. Once it has been accessed enough times, lvm-cache (well, dm-cache) decides it's time to copy it to the SSD cache. Interestingly, that promotion causes a large one-time slowdown: the run takes over 11 seconds, presumably because the reads are competing with the copy to the SSD. All subsequent accesses come from the SSD cache and take just under 0.5 seconds. For comparison, this is the speed when Linux serves the data from its RAM cache:
$ while true; do time grep -q blah ASCA_SZF_1B_M02_2011103000*; sleep 1; done
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.24s user 0.11s system 84% cpu 0.422 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.12s user 0.05s system 99% cpu 0.172 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.12s user 0.04s system 97% cpu 0.164 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.12s user 0.05s system 98% cpu 0.166 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.11s user 0.05s system 99% cpu 0.168 total
grep -q blah ASCA_SZF_1B_M02_2011103000*  0.13s user 0.04s system 98% cpu 0.166 total
In this case, it takes about 0.16 seconds to read the data.
Laptop
This setup was more complicated: on my laptop I also have an SSD and an HDD, but I'm using LUKS to encrypt everything, with LVM running on top of LUKS. The basic layout looks like this (this is modified output from lsblk):
NAME               SIZE TYPE  MOUNTPOINT
sdb               59.6G disk
└─sdb2              45G part
  └─ssdmain         45G crypt
    ├─ssdvg-swap     4G lvm   [SWAP]
    └─ssdvg-root    41G lvm   /
sda              298.1G disk
└─sda3           238.1G part
  └─hdd          238.1G crypt
    └─hddvg-home 238.1G lvm   /home
I have to use two volume groups because I don't bring up the HDD VG until after the root partition (on the SSD VG) is mounted.
For lvm-cache, the tricky part here is that the cache pool LV and the origin LV must be in the same volume group. So the SSD has to be split into two PVs. The first PV is used, as before, for / and swap. The second PV holds the cache pool LV and is added to the second VG, alongside the HDD PV:
NAME                    SIZE TYPE  MOUNTPOINT
sdb                    59.6G disk
├─sdb2                   45G part
│ └─ssdmain              45G crypt
│   ├─ssdvg-swap          4G lvm   [SWAP]
│   └─ssdvg-root         41G lvm   /
└─sdb3                 14.1G part
  └─ssdcache           14.1G crypt
    ├─ssdvg-home-data  14.1G lvm
    └─ssdvg-home-meta    16M lvm
From there, creating the cache pool LV and attaching it to the origin LV works just as before. I must say, it's nice using LVM because it supports online resizing. I did have to boot into installation media to shrink the ext4 filesystem that / lives on (ext4 supports online expansion but not shrinking), but the other operations, shrinking the PV, making a new PV, creating a new LV, adding it to a VG, and so on, I could do "live".
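For the record, here is a sketch of those cache steps, mirroring the desktop commands. I'm assuming the names from the lsblk output above: the new crypt device is /dev/mapper/ssdcache and the origin LV is hddvg/home; the cache LV names (home-cache, home-cache-meta) are illustrative rather than exactly what I used, LUKS options and passphrase handling are omitted, and, as on the desktop, the data LV may need to be shrunk by a few extents so the cache-pool conversion has free space to work with:

# cryptsetup luksFormat /dev/sdb3
# cryptsetup luksOpen /dev/sdb3 ssdcache
# pvcreate /dev/mapper/ssdcache
# vgextend hddvg /dev/mapper/ssdcache
# lvcreate -n home-cache-meta -L 16M hddvg /dev/mapper/ssdcache
# lvcreate -n home-cache -l 100%PVS hddvg /dev/mapper/ssdcache
# lvconvert --cachemode writeback --type cache-pool --poolmetadata hddvg/home-cache-meta hddvg/home-cache
# lvconvert --type cache --cachepool hddvg/home-cache hddvg/home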
Conclusion
I don't see much written online about lvm-cache, but I tried it out and it works well for me. On one system it's backing an ext4 filesystem, and on the other I use it with btrfs. I haven't noticed any problems in either case.