If you want a simple mirror, you can't get simpler than mdadm with XFS on top of it. Now, is that your best option? Not sure. It *sounds* like your application is going to become I/O bound really quickly. To John's point, I'd like to better understand what you're trying to do.

One option you might want to investigate is something like a HighPoint NVMe card with a bunch of fast NVMe drives on it. I have such a setup with 4 NVMe drives in RAID0 (mdadm), and it'll do 800k IOPS.

Tim

On Fri, Jan 22, 2021 at 5:01 PM John Stoffel via WLUG <wlug@lists.wlug.org> wrote:
Michael> In Linux I can do software RAID using LVM or with MD (i.e., mdadm) as a basis. I'm currently thinking of a simple mirror of two conventional SATA disks.
Michael> The machine is being built to do a compute workload involving examination of many small(ish?) files, and will not be a desktop or a recreational/gaming machine. I don't know anything about the files or how they'll be examined.
Will the whole working set be local to the machine, or will the files be stored on a NAS and copied to the system to be processed locally?
Since it's a dedicated compute box, if I knew I could rebuild it easily and I needed maximum local disk performance, I'd be tempted to set up a RAID0 stripe across the two disks. But... with SATA SSDs, I'd probably just go with a RAID1 mirror using mdadm, then set up LVM on top of the /dev/md0 device to create my local filesystems.
Michael> How would YOU set up a simple mirror in whatever Linux you use? Why do you prefer your selection?
I prefer mdadm on the bottom, then lvm on top, then ext4/xfs on top of that.
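A minimal sketch of that layering, written in Python so it can be checked safely: it only builds and prints the command lines rather than running them, because these commands wipe the target disks. The device names /dev/sda and /dev/sdb, the volume group vg0 and the logical volume data are hypothetical placeholders; check the mdadm, pvcreate, vgcreate, lvcreate and mkfs man pages on your distribution before running anything.

```python
import shlex

def mirror_stack_commands(disks, vg="vg0", lv="data", fs="xfs"):
    """Return shell commands for an mdadm RAID1 -> LVM -> filesystem stack.

    Device, VG and LV names are hypothetical placeholders; nothing is executed.
    """
    if len(disks) != 2:
        raise ValueError("a simple mirror needs exactly two disks")
    if fs not in ("xfs", "ext4"):
        raise ValueError("fs must be 'xfs' or 'ext4'")
    md = "/dev/md0"
    cmds = [
        # Bottom layer: a two-disk RAID1 mirror managed by mdadm.
        ["mdadm", "--create", md, "--level=1", "--raid-devices=2", *disks],
        # Middle layer: make the md device an LVM physical volume and volume group.
        ["pvcreate", md],
        ["vgcreate", vg, md],
        # One logical volume using all free space in the volume group.
        ["lvcreate", "-n", lv, "-l", "100%FREE", vg],
        # Top layer: the filesystem on the logical volume.
        [f"mkfs.{fs}", f"/dev/{vg}/{lv}"],
    ]
    # shlex.join quotes arguments only where the shell would need it.
    return [shlex.join(c) for c in cmds]

if __name__ == "__main__":
    for line in mirror_stack_commands(["/dev/sda", "/dev/sdb"]):
        print(line)
```

Giving the whole volume group to one logical volume is the simplest choice; leaving some space unallocated makes it easier to add or grow filesystems later.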
But... in this case, the workload will impact the design. One thing to keep in mind is that directories with lots and lots of files will have scaling problems past a certain point. The old NNTP news spool servers used to set up a directory structure with three, four or more directory layers so they didn't end up with too many files in any one directory.
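A minimal sketch of that kind of fan-out in Python, assuming you hash each filename and use the leading hex digits of the hash as directory names. This is an illustrative scheme, not the exact layout any particular news server used, and the /data/spool root is a hypothetical path. With two hex digits per level and three levels, files spread over 256**3 (about 16.7 million) leaf directories, which keeps each directory small even for very large file counts.

```python
import hashlib
from pathlib import Path

def fanout_path(root, filename, levels=3, width=2):
    """Map a filename to root/xx/yy/zz/filename using its SHA-256 hash.

    Hashing spreads files evenly even when names share long common
    prefixes (e.g. sequential IDs), so no single directory grows too large.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    if levels * width > len(digest):
        raise ValueError("levels * width exceeds the hash length")
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, filename)

if __name__ == "__main__":
    target = fanout_path("/data/spool", "sample-000123.txt")
    print(target)
    # Before writing the file, create the intermediate directories, e.g.:
    # target.parent.mkdir(parents=True, exist_ok=True)
```

Because the path is derived from the name alone, any process can find a file again without keeping an index.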
Now, if all the data is local, will you be doing backups? Can you do backups of just intermediate results? Does it all need to be backed up?
Again, if there will be lots of I/O, going with SSDs will be best; otherwise each spinning disk will limit you to roughly 100-120 IOPS. SSDs handle it so much better.
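For a sense of scale, a back-of-envelope sketch: if each file costs about one random I/O (optimistic, since metadata lookups usually add more), one million files (a hypothetical workload size) take roughly 8,300 seconds, or about 2.3 hours, on a single disk at 120 IOPS, versus about 1.25 seconds at the ~800k IOPS Tim reports for his 4-drive NVMe RAID0. Queue depth and caching change real throughput a lot, so treat these as rough estimates, not predictions.

```python
def seconds_to_touch(files, iops, ios_per_file=1):
    """Rough wall-clock seconds when every file needs ios_per_file random
    I/Os and the device sustains iops random I/Os per second."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return files * ios_per_file / iops

if __name__ == "__main__":
    files = 1_000_000  # hypothetical workload size
    for label, iops in [("single HDD, ~120 IOPS", 120),
                        ("4x NVMe RAID0, ~800k IOPS", 800_000)]:
        secs = seconds_to_touch(files, iops)
        print(f"{label}: {secs:,.1f} s ({secs / 3600:.2f} h)")
```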
But of course... under sustained load, some SSDs will slow to a crawl once they fill their internal cache and have to start garbage collection while still accepting writes.
But in general, your question didn't give us enough info to give you a good answer. Maybe try it top down? I.e.:
I've got an application which needs to process hundreds (thousands? millions?) of small files which are downloaded and parsed with a compute-light/heavy process, and whose intermediate/final results are then saved for further processing.
Describe the problem you're trying to solve, without assuming a design.
John
-- I am leery of the allegiances of any politician who refers to their constituents as "consumers".