Disk mirroring in Linux -- MD-RAID or LVM? Advantages and disadvantages?
Hi all; In Linux I can do software RAID using LVM or with MD (i.e., mdadm) as a basis. I'm currently thinking of a simple mirror of two conventional SATA disks. The machine is being built to do a compute-workload involving examination of many small(ish?) files, and will not be a desktop or a recreational/gaming machine. I don't know anything about the files or how they'll be examined. How would YOU setup a simple mirror in whatever Linux you use? Why do you prefer your selection? Thanks, --MCV.
To answer your question by asking another one, have you considered ZFS? This is what I use for a RAID10 setup in my NAS, in part because it's just what I know how to use, but its checksumming, CoW snapshotting (unlike LVM, you do not have to allocate space for one explicitly), and array management features are quite nice. The only downside is that it's out-of-tree, so if you're using a distro that updates the kernel fairly frequently, you have to wait a little while for the ZFS folks to update it.

On my desktop (which runs Fedora, and that made using ZFS annoying since it upgrades kernels often), I use mdadm with two drives in RAID1, with xfs on top of them. IIRC, the reason I went with mdadm over LVM was simply because of maturity, but that's perhaps not the reason you're looking for :)

Speaking about performance is where I step out of my depth a little bit, but to my understanding, xfs has very good support for a large number of small/parallel reads (don't quote me on this). Might be worth considering using this on top of mdadm.
_______________________________________________
WLUG mailing list -- wlug@lists.wlug.org
To unsubscribe send an email to wlug-leave@lists.wlug.org
Create Account: https://wlug.mailman3.com/accounts/signup/
Change Settings: https://wlug.mailman3.com/postorius/lists/wlug.lists.wlug.org/
Web Forum/Archive: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/message/2IJCRBZRRY2V6JH3DTD5BGZW6SA7VLUB/
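For readers who have not used ZFS, here is a rough sketch of what a two-disk ZFS mirror with a snapshot might look like. It only prints the commands rather than running them. The pool name "tank", the dataset "tank/work", the snapshot name, and the disk paths are all made up for illustration and do not come from the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints a plan instead of touching any disks.
# "tank", "tank/work" and the snapshot name are hypothetical.
zfs_mirror_plan() {
    disk_a=$1
    disk_b=$2
    cat <<EOF
zpool create tank mirror $disk_a $disk_b
zfs create tank/work
zfs snapshot tank/work@before-run
zpool status tank
EOF
}

# Placeholder device names; substitute real disks before using the plan.
zfs_mirror_plan /dev/sdX /dev/sdY
```

Note that the snapshot line takes no size argument, which is the difference from classic LVM snapshots mentioned above.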
On Wed, Jan 20, 2021 at 03:34:25PM -0500, Michael Voorhis via WLUG wrote:
How would YOU setup a simple mirror in whatever Linux you use? Why do you prefer your selection?
MD RAID is set up by the installers I've used (Fedora, Red Hat, CentOS). LVM is used on top of MD RAID to provide flexible partitioning. I've never used LVM RAID.
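As a rough illustration of the "LVM on top of MD RAID" layering when done by hand instead of by an installer, the sketch below prints the usual sequence of commands. It is a dry run by default. The partition names, the volume group "vg_data", the logical volume "lv_work", and the 500G size are assumptions for the example, not anything an installer would pick.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Setting DRY_RUN=0 would really run them (needs root, and wipes the disks).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_mirror() {
    part_a=$1   # first mirror member (hypothetical, e.g. /dev/sdX1)
    part_b=$2   # second mirror member
    # 1. MD RAID1 across the two partitions
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$part_a" "$part_b"
    # 2. LVM on top of the array
    run pvcreate /dev/md0
    run vgcreate vg_data /dev/md0
    run lvcreate -L 500G -n lv_work vg_data
    # 3. File system on the logical volume
    run mkfs.xfs /dev/vg_data/lv_work
}

build_mirror /dev/sdX1 /dev/sdY1
```

Leaving part of the volume group unallocated keeps room to grow a file system later, for example with `lvextend -r`.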
Separate from the OS on the SD card, my RPI4 file server uses both MD raid and LVM. There are two 4TB powered, USB3 external disks, sda and sdb. They are the mirrors in a RAID1 array.

    # parted -sm /dev/sda p
    BYT;
    /dev/sda:4001GB:scsi:512:4096:gpt:WD Elements 25A3:;
    1:41.0kB:4001GB:4001GB::RAID2:raid;

    # cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sda1[3] sdb1[2]
          3906853808 blocks super 1.2 [2/2] [UU]
          bitmap: 3/30 pages [12KB], 65536KB chunk

Then LVM is used to manage the space in /dev/md0. About 1TB of md0 is not allocated yet. The nice thing about LVM is that I can easily use that to extend any existing file system or make additional ones. My file systems are ext4.

    # lvs
      LV        VG      Attr       LSize
      LV_Home   VG_data -wi-ao----  <1.82t
      LV_WebCam VG_data -wi-ao---- 931.84g

    # lsblk
    NAME                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda                       8:0    0   3.7T  0 disk
    └─sda1                    8:1    0   3.7T  0 part
      └─md0                   9:0    0   3.7T  0 raid1
        ├─VG_data-LV_WebCam 253:0    0 931.9G  0 lvm   /WebCam
        └─VG_data-LV_Home   253:1    0   1.8T  0 lvm   /home
    sdb                       8:16   0   3.7T  0 disk
    └─sdb1                    8:17   0   3.7T  0 part
      └─md0                   9:0    0   3.7T  0 raid1
        ├─VG_data-LV_WebCam 253:0    0 931.9G  0 lvm   /WebCam
        └─VG_data-LV_Home   253:1    0   1.8T  0 lvm   /home
    mmcblk0                 179:0    0  29.7G  0 disk
    ├─mmcblk0p1             179:1    0   256M  0 part  /boot
    └─mmcblk0p2             179:2    0  29.5G  0 part  /

No automatic setup here.

-RE
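The `[UU]` in the /proc/mdstat output above means both mirror members are up; an underscore in that map (for example `[U_]`) marks a missing or failed member. A small sketch of a health check built on that convention follows. The function name is made up, and the parsing is deliberately simple, so treat it as a starting point and not a replacement for `mdadm --detail` or `mdadm --monitor`.

```shell
#!/bin/sh
# Print the name of every md array whose member map (e.g. [UU]) contains
# an underscore. Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /\[[U_]+\]/ && name != "" {          # line carrying the member map
            if (match($0, /\[[U_]+\]/) &&
                index(substr($0, RSTART, RLENGTH), "_"))
                print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Only query the live system when the md driver is actually loaded.
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

With no output, every array it recognized is fully populated; any name printed is worth a closer look.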
Michael> In Linux I can do software RAID using LVM or with MD (i.e.,
Michael> mdadm) as a basis. I'm currently thinking of a simple mirror
Michael> of two conventional SATA disks.

Michael> The machine is being built to do a compute-workload involving
Michael> examination of many small(ish?) files, and will not be a
Michael> desktop or a recreational/gaming machine. I don't know
Michael> anything about the files or how they'll be examined.

Will all the working set be local to the machine, or will they be stored on a NAS, but copied to the system and processed locally?

Since it's a dedicated compute box, if I knew I could rebuild it easily, and I needed maximum local disk performance, I'd be tempted to set up a RAID0 stripe across the two disks. But... with SATA SSDs, I'd probably just go with a RAID1 mirror using mdadm, then set up LVM on top of the /dev/md0 device to create my local filesystems.

Michael> How would YOU setup a simple mirror in whatever Linux you
Michael> use? Why do you prefer your selection?

I prefer mdadm on the bottom, then LVM on top, then ext4/xfs on top of that.

But... in this case, the workload will impact the design. One thing to keep in mind is that directories with lots and lots of files will have scaling problems past a certain point. The old NNTP news spool servers used to set up a directory structure with three, four or more directory layers so they didn't end up with too many files in any one directory.

Now, if all the data is local, will you be doing backups? Can you do backups of just intermediate results? Does it all need to be backed up?

Again, if there will be lots of IO, going with SSDs will be best, otherwise spinning disks will limit you to roughly 100-120 IOPS each. SSDs handle it so much better.

But of course... under sustained load, some SSDs will slow down to a crawl as they hit the internal cache limits and they need to start doing garbage collection while still writing.

But in general, your question didn't give us enough info to give you a good answer. Maybe try it top down? I.e.:

    I've got an application which needs to process hundreds (thousands? millions?) of small files which are downloaded, parsed with a compute light/heavy process, and intermediate/final results then saved for further processing.

Describe the problem you're trying to solve, without assuming a design.

John
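One way to keep any single directory from growing too large is to fan files out into subdirectories. The sketch below derives a two-level path from a hash of the file name. This is a hash-based variant chosen for illustration, not a description of how the news spools actually laid out their trees, and the function name and sample file name are made up.

```shell
#!/bin/sh
# Map a file name to a two-level subdirectory taken from the first four
# hex digits of its MD5 hash, giving up to 256 * 256 leaf directories.
shard_path() {
    name=$1
    hash=$(printf '%s' "$name" | md5sum | cut -c1-4)
    first=$(printf '%s' "$hash" | cut -c1-2)
    second=$(printf '%s' "$hash" | cut -c3-4)
    printf '%s/%s/%s\n' "$first" "$second" "$name"
}

# Example: where a (hypothetical) input file would be stored.
shard_path "sample-0001.dat"
```

The same name always maps to the same path, so a lookup needs no index, and a million files spread over 65536 directories averages about 15 files per directory.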
If you want a simple mirror, you can't get simpler than mdadm with xfs on top of it.

Now, is that your best option? Not sure. It *sounds* like your application is going to become I/O bound really quickly. To John's point, I'd like to understand what you're doing better.

One option you might want to investigate is something like a HighPoint NVMe card with a bunch of fast NVMe drives on it. I have such a setup with 4 NVMe drives in a RAID0 (mdadm) and it'll do 800k IOPS.

Tim.
-- I am leery of the allegiances of any politician who refers to their constituents as "consumers".
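To put the IOPS figures from this thread side by side, here is a back-of-the-envelope sketch. It assumes one random I/O per file, no caching, and no parallelism, all of which are simplifications. The one-million-file working set is a made-up number, and 110 IOPS is just the midpoint of the 100-120 range quoted for a spinning disk.

```shell
#!/bin/sh
# Rough seconds needed to touch N files at a given IOPS rate,
# assuming exactly one random I/O per file (a simplification).
seconds_for() {
    files=$1
    iops=$2
    echo $(( (files + iops - 1) / iops ))   # integer ceiling of files/iops
}

files=1000000   # hypothetical working set size
echo "one spinning disk (110 IOPS): $(seconds_for "$files" 110) s"
echo "NVMe RAID0 (800000 IOPS):     $(seconds_for "$files" 800000) s"
```

Under those assumptions the single spinning disk needs about two and a half hours for one pass, while the NVMe stripe finishes in a couple of seconds, which is why the workload details matter more than the choice between mdadm and LVM.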
On 1/22/21 7:05 PM, Tim Keller via WLUG wrote:
To John's point, I'd like to understand what you're doing better.
The reason for the lack of detail is that I don't yet have a huge amt of detail on how the customer's software will work. With COVID19 I still haven't been able to have a detailed discussion. But what I've read on this list so far has been very useful. I like KISS design and I hadn't thought (very hard) about the use of RAID0 before now. XFS also sounds like it may be a part of the solution. --MCV.
"Michael" == Michael Voorhis via WLUG <wlug@lists.wlug.org> writes:
Michael> On 1/22/21 7:05 PM, Tim Keller via WLUG wrote:
To John's point, I'd like to understand what you're doing better.
Michael> The reason for the lack of detail is that I don't yet have a huge amt of
Michael> detail on how the customer's software will work. With COVID19 I still
Michael> haven't been able to have a detailed discussion.

Michael> But what I've read on this list so far has been very useful.
Michael> I like KISS design and I hadn't thought (very hard) about the
Michael> use of RAID0 before now. XFS also sounds like it may be a
Michael> part of the solution.

KISS is good. But you really need to talk with the customer to define their workflow better. Also, there's a big difference between SSD and SATA disks.

Heck, you have that pile of six SAS disks, why not use them in a RAID0 for your working set, but then a pair of mirrored larger disks to store your data.

John
On 1/23/21 10:29 AM, John Stoffel wrote:
KISS is good. But you really need to talk with the customer to define their workflow better. Also, there's a big difference between SSD and SATA disks.
Oh definitely. I'll be talking to them. I'm just enjoying the design part, which has always been my favorite part of this sort of work -- the designing and building of stuff, before the inevitable descent into maintenance issues, misuse and neglect. :)
Heck, you have that pile of six SAS disks, why not use them in a RAID0 for your working set, but then a pair of mirrored larger disks to store your data.
I'd love to use my stack of SAS disks ... but that would mean purchasing a SAS controller for the project, and that would mean $$$. Hmmm, that's another thing I hadn't thought about looking for -- a JBOD SAS Controller ...??
"Michael" == Michael Voorhis <mvoorhis@mcvau.net> writes:
Michael> On 1/23/21 10:29 AM, John Stoffel wrote:
KISS is good. But you really need to talk with the customer to define their workflow better. Also, there's a big difference between SSD and SATA disks.
Michael> Oh definitely. I'll be talking to them. I'm just enjoying the design Michael> part, which has always been my favorite part of this sort of work -- the Michael> designing and building of stuff, before the inevitable descent into Michael> maintenance issues, misuse and neglect. :)
Heck, you have that pile of six SAS disks, why not use them in a RAID0 for your working set, but then a pair of mirrored larger disks to store your data.
Michael> I'd love to use my stack of SAS disks ... but that would mean
Michael> purchasing a SAS controller for the project, and that would
Michael> mean $$$. Hmmm, that's another thing I hadn't thought about
Michael> looking for -- a JBOD SAS Controller ...??

They're cheap, look at the LSI stuff you can find on the internet. Generally, if it can do SAS, it will do SATA too (though not the other way around).

https://www.ebay.com/itm/133137220890?_trkparms=ispr%3D1&amdata=enc%3AAQAFAA...

This might do the trick... though I'd check the cables.
participants (6)
- Chuck Anderson
- John Stoffel
- Krichevsky, Nicholas J.
- Michael Voorhis
- Rob Evans
- Tim Keller