I have an interesting problem. I have a /home directory on a file server, and several login servers mount this /home directory via NFS. This /home directory holds a ton of files and directories (about 2000 user accounts). I noticed that when I copy a large file (I tested with a 50 MB file) on a login server from something like /tmp to the NFS-mounted /home, the load average on the file server jumps to over 7 and the copy seems to take forever (around 3-4 minutes). If I copy a large file (once again 50 MB) from the NFS-mounted /home to /tmp on the login server, there is no noticeable load average increase and the copy finishes in approximately 10-15 seconds. So it seems that writes to the NFS-mounted /home are taking longer than they should and are spiking the load average. Both the file server and the login servers are dual P3s with 2 GB of RAM. Has anyone ever seen anything like this, or knows of any solutions to this problem?

Thanks,
Chuck

--
Chuck Haines
chaines@gmail.com
-------------------------------------------
Tau Kappa Epsilon Fraternity
WPI Class of 2005
-------------------------------------------
AIM: CyberGrex
YIM: CyberGrex_27
ICQ: 3707881
-------------------------------------------
On Thursday 09 September 2004 11:59 am, Chuck Haines wrote:
Has anyone ever seen anything like this, or knows of any solutions to this problem?
i haven't seen anything like that before, but i have had random NFS issues ... i found that using NFSv3 over TCP resolved pretty much all of the problems i experienced while using NFSv2 or NFSv3 over UDP. you could also try tweaking the rsize and wsize parameters for the mount.

-mike
With what options did you mount the volume on the client?
The current line from the fstab is as follows:

hostname:/home /home nfs rsize=4096,wsize=4096,bg,intr,rw,actimeo=15 0 0

Where hostname is replaced with the name of the machine I am mounting /home from.

Chuck

On Thu, 09 Sep 2004 12:23:52 -0400, Dwight A. Ernest <dwight@significant.com> wrote:
With what options did you mount the volume on the client?
Chuck> The current line from the fstab is as follows:
Chuck> hostname:/home /home nfs
Chuck> rsize=4096,wsize=4096,bg,intr,rw,actimeo=15 0 0

Here's one of the problems: you're not specifying TCP here, and you're using pretty low numbers. Try changing rsize and wsize to 32768 and adding tcp to the options.

Also, have you checked the duplex on the server to make sure it's ok? You mention you're using RAID on the server, I assume RAID5? If so, what kind of stripe size are you using?

What does 'vmstat 1' (on the server) say from about 15 seconds before the copy kicks in to about 15 seconds after the copy is done?

Does the time scale up linearly when you copy a 100 MB file? Or drop by half when you copy a 25 MB file?

What filesystem are you using on the NFS server, and do you have quotas or anything else like that set up? How full is the filesystem?

What happens if you write a pair of 50 MB files at the same time from two different clients? Does the time double? Does the load double?

Basically, I don't know what's going on here, but I suspect:

1. A network speed/duplex mismatch, so that you're seeing lots of timeouts and retries on writes, but not on reads.
2. NFS needs tuning on the clients to write in bigger chunks.
3. Your RAID stinks. Which reminds me, how much time does a write from a non-RAID disk on the server to the home directory/RAID disk take? Can you time the write of a 50 MB file locally? If it's still ugly, then you've narrowed down the issue.

Basically, there are a lot of potential problems here and we need more details.

John

John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
stoffel@lucent.com - http://www.lucent.com - 978-952-7548
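As an illustration of the change John suggests, here is a minimal Python sketch that rewrites the options field of Chuck's fstab entry to use rsize=32768, wsize=32768 and tcp. The function name `retune_nfs_options` is made up for this example, and the sketch only edits a string; it does not touch /etc/fstab or remount anything.

```python
def retune_nfs_options(fstab_line, size=32768, proto="tcp"):
    """Return fstab_line with rsize/wsize set to `size` and `proto` added.

    Hypothetical helper for illustration only: it rewrites the text of
    one fstab entry and leaves every other option untouched.
    """
    fields = fstab_line.split()
    if len(fields) < 4 or fields[2] != "nfs":
        raise ValueError("not an NFS fstab entry: %r" % fstab_line)

    kept = []
    for opt in fields[3].split(","):
        name = opt.split("=", 1)[0]
        # Drop the old transfer sizes and any existing transport choice.
        if name in ("rsize", "wsize", "tcp", "udp", "proto"):
            continue
        kept.append(opt)

    new_opts = ["rsize=%d" % size, "wsize=%d" % size, proto] + kept
    fields[3] = ",".join(new_opts)
    return " ".join(fields)


current = "hostname:/home /home nfs rsize=4096,wsize=4096,bg,intr,rw,actimeo=15 0 0"
print(retune_nfs_options(current))
# hostname:/home /home nfs rsize=32768,wsize=32768,tcp,bg,intr,rw,actimeo=15 0 0
```

The new entry only takes effect after /home is unmounted and mounted again on the client, and whether these values actually help depends on the kernel and nfs-utils versions on both ends.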
I will try all of these things and get back to everyone on what I find out. Thanks a bunch for your help.

Chuck
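John's questions about whether the write time scales with file size, and how a local write on the server compares with a write over NFS, can be answered with a small timing script. The following is a rough Python sketch, not a proper benchmark: the function name `time_write`, the temporary file name and the 32 KB chunk size are all assumptions made for this example. It writes zeros, calls fsync so the data is actually pushed out, and removes the file afterwards.

```python
import os
import sys
import time


def time_write(directory, size_mb, chunk_kb=32):
    """Write size_mb of zeros into directory, fsync, return elapsed seconds."""
    path = os.path.join(directory, "nfs_write_test_%d.tmp" % os.getpid())
    chunk = b"\0" * (chunk_kb * 1024)
    chunks = size_mb * 1024 // chunk_kb
    start = time.time()
    try:
        with open(path, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data has left the client cache
        return time.time() - start
    finally:
        # Clean up the test file whether or not the write succeeded.
        if os.path.exists(path):
            os.remove(path)


# Only runs when a target directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
    for size_mb in (25, 50, 100):
        secs = time_write(sys.argv[1], size_mb)
        print("%3d MB: %.1f s (%.1f MB/s)" % (size_mb, secs, size_mb / max(secs, 1e-9)))
```

Running it once against the NFS-mounted /home on a login server and once against the same filesystem locally on the file server would show whether the slowdown is in the network/NFS layer or in the server's disks.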
==> Regarding [Wlug] NFS Trouble; Chuck Haines <chaines@gmail.com> adds:

chaines> Has anyone ever seen anything like this, or knows of any solutions
chaines> to this problem?

Kernel version (client and server)? Version of nfs-utils?

What does top show as taking up CPU when you do the copy and your load average spikes?

-Jeff
participants (5)
- Chuck Haines
- Dwight A. Ernest
- Jeff Moyer
- John Stoffel
- Mike Frysinger