Fwd: [Wlug] NFS Trouble

Sept. 9, 2004

      Meant to send this to the whole list.

---------- Forwarded message ----------
From: Chuck Haines <chaines@gmail.com>
Date: Thu, 9 Sep 2004 13:44:53 -0400
Subject: Re: [Wlug] NFS Trouble
To: jmoyer@redhat.com

Server:
    Kernel = 2.4.21-15.ELsmp
    nfs-utils = 1.0.6-21EL

Client:
    Kernel = 2.4.21-15.ELsmp
    nfs-utils = 1.0.6-21EL

Both machines are running Red Hat Advanced Server 3.0 (Taroon Update 2).

When performing the copy, top on the server looks like this after
about 75% of the transfer:

 13:43:19  up 26 days,  6:05,  2 users,  load average: 7.19, 3.30, 1.62
125 processes: 124 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.2%    0.0%    1.6%   0.0%     0.4%   48.3%   49.3%
           cpu00    0.0%    0.0%    1.6%   0.0%     0.0%   66.6%   31.6%
           cpu01    0.4%    0.0%    1.6%   0.0%     0.8%   30.0%   67.0%
Mem:  2061576k av, 2044428k used,   17148k free,       0k shrd,  351044k buff
                   1425940k actv,  270704k in_d,   31124k in_c
Swap: 1052248k av,   19780k used, 1032468k free                 1371100k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
24242 bkmcd     15   0  4648 4220  3012 S     0.4  0.2   0:03   1 smbd
  165 root      17   0     0    0     0 SW    0.2  0.0  26:10   0 raid5d
 4783 root      15   0     0    0     0 DW    0.2  0.0  80:48   0 nfsd
 4790 root      15   0     0    0     0 DW    0.2  0.0  80:29   1 nfsd
 4788 root      15   0     0    0     0 DW    0.2  0.0  82:35   0 nfsd
 4785 root      15   0     0    0     0 DW    0.2  0.0  80:40   0 nfsd
 4784 root      15   0     0    0     0 DW    0.2  0.0  81:18   1 nfsd
24515 chaines   15   0  1256 1256   900 R     0.2  0.0   0:00   0 top
    1 root      15   0   508  468   448 S     0.0  0.0   0:24   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0

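As you can see, roughly half of the CPU time is spent in I/O wait (48.3%
overall, 66.6% on cpu00), and the nfsd threads are all in uninterruptible
sleep (state D).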
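
For anyone who wants to pull the I/O wait figure out of output like this
automatically, here is a minimal sketch. It assumes the "CPU states" column
layout shown in the top output above; the function name parse_cpu_states is
made up for illustration and is not part of any tool.

```python
# Header and "total" row copied from the top output above.
HEADER = "CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle"
TOTAL = "           total    0.2%    0.0%    1.6%   0.0%     0.4%   48.3%   49.3%"


def parse_cpu_states(header, row):
    """Return (label, {column: percent}) for one row under the header."""
    # The first three header tokens are 'CPU', 'states:' and 'cpu';
    # the remaining tokens name the percentage columns.
    names = header.split()[3:]
    fields = row.split()
    label, values = fields[0], fields[1:]
    stats = {name: float(value.rstrip("%")) for name, value in zip(names, values)}
    return label, stats


label, stats = parse_cpu_states(HEADER, TOTAL)

# Time spent actually executing code (everything except iowait and idle).
busy = sum(stats[k] for k in ("user", "nice", "system", "irq", "softirq"))

print(f"{label}: iowait={stats['iowait']}% idle={stats['idle']}% busy={busy:.1f}%")
```

On the numbers above, the CPUs spend only about 2% of their time executing
code; the rest is split between idle and waiting on I/O.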

Hope this helps,
Chuck

On Thu, 9 Sep 2004 12:54:28 -0400, Jeff Moyer <jmoyer@redhat.com> wrote:
...
==> Regarding [Wlug] NFS Trouble; Chuck Haines <chaines@gmail.com> adds:
chaines> I have an interesting problem.  I have a /home directory on a
chaines> file server, and several login servers mount this /home directory
chaines> via NFS.  This /home directory has a ton of files and directories
chaines> (about 2000 user accounts).  I noticed that when I copy a large
chaines> file (I tested with a 50 MB file) on a login server from a local
chaines> directory such as /tmp to the NFS-mounted /home, the load average
chaines> on the file server jumps to over 7 and the copy seems to take
chaines> forever (around 3-4 minutes).  If I copy a large file (once again
chaines> 50 MB) from the NFS-mounted /home to /tmp on the login server,
chaines> there is no noticeable load average increase and the copy
chaines> finishes in approximately 10-15 seconds.  So it seems that writes
chaines> to the NFS-mounted /home are taking longer than they should and
chaines> are spiking the load average.  Both the file server and the login
chaines> servers are dual P3s with 2 GB of RAM.  Has anyone seen anything
chaines> like this, or does anyone know of a solution to this problem?
Kernel version (client and server)?  Version of nfs-utils?  What does top
show as taking up CPU when you do the copy and your load average spikes?
-Jeff
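
To put the timings quoted above in perspective, here is a rough
back-of-the-envelope calculation. It uses only the figures from the original
report (a 50 MB file, 3-4 minutes to write, 10-15 seconds to read); the
helper name throughput_mb_s is made up for illustration, and the results are
estimates rather than measurements.

```python
SIZE_MB = 50.0                 # test file size from the report

WRITE_SECONDS = (180.0, 240.0)  # "around 3-4 minutes" to the NFS mount
READ_SECONDS = (10.0, 15.0)     # "approximately 10-15 seconds" from it


def throughput_mb_s(size_mb, seconds):
    """Average throughput in MB/s for a transfer of size_mb in seconds."""
    return size_mb / seconds


# Slowest and fastest estimates for each direction.
write_range = (throughput_mb_s(SIZE_MB, WRITE_SECONDS[1]),
               throughput_mb_s(SIZE_MB, WRITE_SECONDS[0]))
read_range = (throughput_mb_s(SIZE_MB, READ_SECONDS[1]),
              throughput_mb_s(SIZE_MB, READ_SECONDS[0]))

# How many times longer a write takes than a read (best and worst case).
slowdown = (WRITE_SECONDS[0] / READ_SECONDS[1],
            WRITE_SECONDS[1] / READ_SECONDS[0])

print(f"write: {write_range[0]:.2f}-{write_range[1]:.2f} MB/s")
print(f"read:  {read_range[0]:.2f}-{read_range[1]:.2f} MB/s")
print(f"writes are {slowdown[0]:.0f}x to {slowdown[1]:.0f}x slower than reads")
```

In other words, writes to the NFS mount are running well under 1 MB/s, while
reads of the same file manage several MB/s.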
--
Chuck Haines
chaines@gmail.com
-------------------------------------------
Tau Kappa Epsilon Fraternity
WPI Class of 2005
-------------------------------------------
AIM: CyberGrex
YIM: CyberGrex_27
ICQ: 3707881
-------------------------------------------
