==> On Wed, 16 May 2007 14:53:16 +0000, brad noyes <maitre@ccs.neu.edu> said:

brad> Hello All,
brad> I am seeing some really slow performance regarding large files on linux. I
brad> write a lot of data points from a light sensor. The stream is about 53 Mb/s and
brad> i need to keep this rate for 7 minutes, that's a total of about 22Gb. I
brad> can sustain 53Mb/s pretty well until the file grows to over 1Gb or so, then
brad> things hit the wall and the writes to the filesystem can't keep up. The writes
brad> go from 20ms in duration to 500ms. I assume the filesystem/operating system
brad> is caching writes. Do you have any suggestions on how to speed up performance
brad> on these writes, filesystem options, kernel options, other strategies, etc?

Of course. Your data set is larger than the page cache, so when you hit the
low watermark, it starts write-back. You can deal with this a few different
ways, and I'll throw out the easiest ways first:

1) Get more memory
2) Get a faster disk

If those are not options, then you can tweak your application by using AIO
and O_DIRECT. This will allow you to drive your disk queue depths a bit
further and avoid the page cache. Check the man pages for io_setup,
io_submit, and io_getevents to get started.

brad> Things I have tried:
brad> - I have tried this on a ext3 file system as well as an xfs filesystem
brad> with the same result.

You may not want to use a journalled file system. If you must, though, with
ext3 you could try running with the data=writeback option.

brad> - I have also tried spooling over several files (a la multiple volumes)
brad> but i see no difference in performance. In fact, i think this actually
brad> hinders performance a bit.

I'm not sure I fully understand what you mean. Are you saying you write to
separate physical volumes, and that you don't see any performance increase
from doing so?

brad> - I keep my own giant memory buffer where all the data is stored and
brad> then it is written to disk in a background thread. This helps, but
brad> i run out of space in the buffer before i finish taking data.

Right, this is exactly what happens in the OS. ;) Speaking of which, you
don't mention which kernel you are using. Could you please provide that
information? There are a few vm tunables that you could try tweaking, but I
really don't think they will help if your data set is larger than memory. We
can explore that option, though, if you like.

For now, my suggestion is to try using AIO with the open flag O_DIRECT. This
will require you to align your data on 512 byte boundaries (and the size of
the I/Os has to be a multiple of 512 as well). If you need any help
converting your app, feel free to contact me off-list.

-Jeff

p.s. In your head, is Mb Megabit or Megabyte?
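
To make the O_DIRECT alignment requirement above a bit more concrete, here
is a rough sketch in Python. It is not the original poster's code and it is
not AIO: the Python standard library has no binding for io_setup/io_submit,
so this only shows a plain synchronous write through O_DIRECT. The names
padded_length, aligned_buffer and write_direct are made up for illustration,
and the 512-byte figure is the one quoted in the mail; some devices and
filesystems want a larger alignment (for example 4096), and some filesystems
reject O_DIRECT outright.

```python
import mmap
import os

SECTOR = 512  # alignment quoted in the mail; an assumption for your device


def padded_length(nbytes, align=SECTOR):
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align


def aligned_buffer(payload, align=SECTOR):
    """Copy payload into a zero-padded buffer suitable for O_DIRECT.

    An anonymous mmap starts on a page boundary, so its address is
    aligned; the length is padded so the I/O size is a multiple of align.
    """
    size = padded_length(max(len(payload), 1), align)
    buf = mmap.mmap(-1, size)        # anonymous, zero-filled mapping
    buf[:len(payload)] = payload     # remaining bytes stay zero
    return buf


def write_direct(path, payload, align=SECTOR):
    """Write payload to path while bypassing the page cache (Linux only).

    Returns the number of bytes written, including the zero padding.
    """
    buf = aligned_buffer(payload, align)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
    fd = os.open(path, flags, 0o644)
    try:
        return os.write(fd, buf)     # may be a short write; not retried here
    finally:
        os.close(fd)
        buf.close()
```

Note that the file ends up padded with zeros to a multiple of the alignment,
so a real application would either record the true payload length somewhere
or size its sensor records so that each write is already a multiple of 512
bytes.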