On Wed, May 16, 2007 at 11:38:09AM -0400, Jeff Moyer wrote:
==> On Wed, 16 May 2007 14:53:16 +0000, brad noyes <maitre@ccs.neu.edu> said:
brad> Hello All,
brad> I am seeing some really slow performance regarding large files on Linux.
brad> I write a lot of data points from a light sensor. The stream is about
brad> 53 Mb/s and I need to keep this rate for 7 minutes; that's a total of
brad> about 22Gb. I can sustain 53Mb/s pretty well until the file grows to
brad> over 1Gb or so, then things hit the wall and the writes to the filesystem
brad> can't keep up. The writes go from 20ms in duration to 500ms. I assume
brad> the filesystem/operating system is caching writes. Do you have any
brad> suggestions on how to speed up these writes (filesystem options, kernel
brad> options, other strategies, etc.)?
Of course. Your data set is larger than the page cache, so when you hit the low watermark, it starts write-back. You can deal with this a few different ways, and I'll throw out the easiest ways first:

1) Get more memory
2) Get a faster disk
Ha :). I have 12GB of memory, which actually brings me to another question: how do I alter the per-process memory limit? I can only allocate a memory buffer of about 3GB, and I'd like to make use of the other 8GB left in the machine. If I could double my buffer size, I think I could sustain the 53MB/s for the 7 minutes that I need.
If those are not options, then you can tweak your application by using AIO and O_DIRECT. This will allow you to drive your disk queue depths a bit further and avoid the page cache. Check the man pages for io_setup, io_submit, and io_getevents to get started.
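For reference, here is a minimal sketch of the io_setup/io_submit/io_getevents loop described above, using the libaio wrappers (link with -laio). The file name, buffer size, and queue depth below are made up for illustration, and O_DIRECT additionally requires the buffer, offset, and length to be aligned to the device's sector size.

/* Sketch only: AIO + O_DIRECT write loop.
 * Build with: gcc -o aiowrite aiowrite.c -laio
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE    (1 << 20)   /* 1MB per request; must be sector-aligned */
#define QUEUE_DEPTH 32

int main(void)
{
    io_context_t ctx = 0;                   /* must be zeroed before io_setup */
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;
    off_t off = 0;
    int fd, i, ret;

    fd = open("data.out", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT needs a sector-aligned buffer. */
    if (posix_memalign(&buf, 512, BUF_SIZE)) { fprintf(stderr, "posix_memalign failed\n"); return 1; }
    memset(buf, 0, BUF_SIZE);

    ret = io_setup(QUEUE_DEPTH, &ctx);
    if (ret < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-ret)); return 1; }

    for (i = 0; i < 100; i++) {             /* write 100MB as a demo */
        io_prep_pwrite(&cb, fd, buf, BUF_SIZE, off);
        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); break; }

        /* A real capture loop would keep QUEUE_DEPTH requests in flight and
         * reap completions as they arrive; here we wait for each write just
         * to keep the sketch short. */
        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) { fprintf(stderr, "io_getevents failed\n"); break; }
        off += BUF_SIZE;
    }

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}

The point of driving the disk this way is that the writes bypass the page cache entirely, so there is no multi-gigabyte pile of dirty data for the kernel to flush later.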
I'll check out these options and man pages.
brad> Things I have tried:
brad> - I have tried this on an ext3 filesystem as well as an xfs filesystem,
brad>   with the same result.
You may not want to use a journalled file system. If you must, though, with ext3 you could try running with the data=writeback option.
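Something along these lines, if it helps; the device and mount point are placeholders, and I believe ext3 only accepts the data= mode at mount time (it cannot be changed on a remount), so it either goes on the initial mount or in the options field of /etc/fstab:

    mount -o data=writeback /dev/sdb1 /data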
Yup, I'll check this option out.
brad> - I have also tried spooling over several files (a la multiple volumes),
brad>   but I see no difference in performance. In fact, I think this actually
brad>   hinders performance a bit.
I'm not sure I fully understand what you mean. Are you saying you write to separate physical volumes,
Not physical volumes, but different files. By the end of the data acquisition I end up with the files data.01, data.02, data.03, etc. Each file is 1GB in size, or whatever I set the limit to be. The reason I did this is that I thought that, as a file grows larger, several layers of indirection in the inode are needed to reach the actual data blocks on disk, and perhaps that might hinder performance.
and that you don't see any performance increase from doing so?
Correct, I don't see any improvement; at least, no measurable improvement at the kind of rates I'm dealing with.
brad> - I keep my own giant memory buffer where all the data is stored and
brad>   then written to disk in a background thread. This helps, but I run
brad>   out of space in the buffer before I finish taking data.
Right, this is exactly what happens in the OS. ;) Speaking of which, you don't mention which kernel you are using. Could you please provide that information? There are a few vm tunables that you could try tweaking, but I really don't think they will help if your data set is larger than memory. We can explore that option, though, if you like.
I'm using the 2.6.20 kernel from the Ubuntu source tree. I recompiled it to get large memory support, up to 64GB. I was looking for some tunable vm options in sysctl, but I didn't see much that made sense to me. If nothing else helps, perhaps I will ask about the vm options.
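For reference, the tunables Jeff is presumably referring to are the vm.dirty_* sysctls; the values below are only illustrative, not a recommendation:

    # fraction of memory that may be dirty before writers are throttled
    vm.dirty_ratio = 10
    # fraction of memory at which background write-back starts
    vm.dirty_background_ratio = 5
    # how often pdflush wakes up, and how old dirty data may get (centisecs)
    vm.dirty_writeback_centisecs = 100
    vm.dirty_expire_centisecs = 1000

These go in /etc/sysctl.conf or can be set with "sysctl -w". Lowering the background ratio makes write-back start earlier and in smaller chunks rather than in one large burst, though as noted above it won't help if the data set simply does not fit in memory.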
p.s. In your head, is Mb Megabit or Megabyte?
The latter. Jamie already pointed this typo out to me :). Perhaps this time around my unit abbreviations are correct.

Thanks for your input. I'll keep the list posted.

--
Brad