On Wed, May 16, 2007 at 11:38:09AM -0400, Jeff Moyer wrote:
==> On Wed, 16 May 2007 14:53:16 +0000, brad noyes <maitre@ccs.neu.edu> said:
brad> Hello All,
brad> I am seeing some really slow performance regarding large files on Linux.
brad> I write a lot of data points from a light sensor. The stream is about
brad> 53 Mb/s and I need to keep this rate for 7 minutes; that's a total of
brad> about 22Gb. I can sustain 53Mb/s pretty well until the file grows to
brad> over 1Gb or so, then things hit the wall and the writes to the filesystem
brad> can't keep up. The writes go from 20ms in duration to 500ms. I assume
brad> the filesystem/operating system is caching writes. Do you have any
brad> suggestions on how to speed up these writes (filesystem options, kernel
brad> options, other strategies, etc.)?
Of course. Your data set is larger than the page cache, so when you hit the low watermark, it starts write-back. You can deal with this a few different ways, and I'll throw out the easiest ways first:

1) Get more memory
2) Get a faster disk
Ha :). I have 12GB of memory, which actually brings me to another question: how do I alter the per-process memory limit? I can only allocate a memory buffer of about 3GB, and I'd like to make use of the other 8GB left in the machine. If I could double my buffer size, I think I could sustain the 53MB/s for the 7 minutes that I need.
If those are not options, then you can tweak your application by using AIO and O_DIRECT. This will allow you to drive your disk queue depths a bit further and avoid the page cache. Check the man pages for io_setup, io_submit, and io_getevents to get started.
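For reference, here is a minimal sketch of the io_setup/io_submit/io_getevents loop described above, using the libaio wrappers (link with -laio). The file name, buffer size, and queue depth below are made up for illustration, and O_DIRECT additionally requires the buffer, offset, and length to be aligned to the device's sector size.

/* Sketch only: AIO + O_DIRECT write loop.
 * Build with: gcc -o aiowrite aiowrite.c -laio
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE    (1 << 20)   /* 1MB per request; must be sector-aligned */
#define QUEUE_DEPTH 32

int main(void)
{
    io_context_t ctx = 0;                   /* must be zeroed before io_setup */
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;
    off_t off = 0;
    int fd, i, ret;

    fd = open("data.out", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT needs a sector-aligned buffer. */
    if (posix_memalign(&buf, 512, BUF_SIZE)) { fprintf(stderr, "posix_memalign failed\n"); return 1; }
    memset(buf, 0, BUF_SIZE);

    ret = io_setup(QUEUE_DEPTH, &ctx);
    if (ret < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-ret)); return 1; }

    for (i = 0; i < 100; i++) {             /* write 100MB as a demo */
        io_prep_pwrite(&cb, fd, buf, BUF_SIZE, off);
        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); break; }

        /* A real capture loop would keep QUEUE_DEPTH requests in flight and
         * reap completions as they arrive; here we wait for each write just
         * to keep the sketch short. */
        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) { fprintf(stderr, "io_getevents failed\n"); break; }
        off += BUF_SIZE;
    }

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}

The point of driving the disk this way is that the writes bypass the page cache entirely, so there is no multi-gigabyte pile of dirty data for the kernel to flush later.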
I'll check out these options and man pages.
brad> Things I have tried:
brad> - I have tried this on an ext3 filesystem as well as an xfs filesystem,
brad>   with the same result.
You may not want to use a journalled file system. If you must, though, with ext3 you could try running with the data=writeback option.
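Something along these lines, if it helps; the device and mount point are placeholders, and I believe ext3 only accepts the data= mode at mount time (it cannot be changed on a remount), so it either goes on the initial mount or in the options field of /etc/fstab:

    mount -o data=writeback /dev/sdb1 /data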
Yup, I'll check this option out.
brad> - I have also tried spooling over several files (a la multiple volumes),
brad>   but I see no difference in performance. In fact, I think this actually
brad>   hinders performance a bit.
I'm not sure I fully understand what you mean. Are you saying you write to separate physical volumes,
Not physical volumes, but different files. By the end of the data acquisition I end up with the files data.01, data.02, data.03, etc. Each file is 1GB in size, or whatever I set the limit to be. The reason I did this is that I thought that, as a file grows larger, several layers of indirection in the inode are needed to reach the actual data blocks on disk, and perhaps that might hinder performance.
and that you don't see any performance increase from doing so?
Correct, I don't see any improvement; at least, no measurable improvement at the kind of rates I'm dealing with.
brad> - I keep my own giant memory buffer where all the data is stored and
brad>   then written to disk in a background thread. This helps, but I run
brad>   out of space in the buffer before I finish taking data.
Right, this is exactly what happens in the OS. ;) Speaking of which, you don't mention which kernel you are using. Could you please provide that information? There are a few vm tunables that you could try tweaking, but I really don't think they will help if your data set is larger than memory. We can explore that option, though, if you like.
I'm using the 2.6.20 kernel from the Ubuntu source tree. I recompiled it to get large memory support, up to 64GB. I was looking for some tunable vm options in sysctl, but I didn't see much that made sense to me. If nothing else helps, perhaps I will ask about the vm options.
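For reference, the tunables Jeff is presumably referring to are the vm.dirty_* sysctls; the values below are only illustrative, not a recommendation:

    # fraction of memory that may be dirty before writers are throttled
    vm.dirty_ratio = 10
    # fraction of memory at which background write-back starts
    vm.dirty_background_ratio = 5
    # how often pdflush wakes up, and how old dirty data may get (centisecs)
    vm.dirty_writeback_centisecs = 100
    vm.dirty_expire_centisecs = 1000

These go in /etc/sysctl.conf or can be set with "sysctl -w". Lowering the background ratio makes write-back start earlier and in smaller chunks rather than in one large burst, though as noted above it won't help if the data set simply does not fit in memory.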
p.s. In your head, is Mb Megabit or Megabyte?
The latter. Jamie already pointed this typo out to me :). Perhaps this time around my unit abbreviations are correct.

Thanks for your input. I'll keep the list posted.

--
Brad