RE: [Wlug] NFS question
Jeff,

Thanx for the continued feedback and suggestions. RHCM sounds cool, but I've read through some of the bug reports on it and it sounds like there are some bad corruption problems when using it. It also sounds like there can be a considerable lag in the time it takes for the shares to come up on the secondary server. Are there many people using this in a production environment?

Thanx,
Don

-----Original Message-----
From: wlug-bounces@mail.wlug.org [mailto:wlug-bounces@mail.wlug.org] On Behalf Of Jeff Moyer
Sent: Wednesday, October 13, 2004 3:48 PM
To: Worcester Linux Users Group
Subject: RE: [Wlug] NFS question

==> Regarding RE: [Wlug] NFS question; "John Stoffel" <stoffel@lucent.com> adds:

Don> Thanx for the info, John. Unfortunately I can't use Red Hat
Don> GFS; it requires v3.0 and we cannot upgrade to that yet.

stoffel> Since it's open source, can't you get it from
stoffel> http://sources.redhat.com/cluster/gfs/ and see if it will
stoffel> install on your systems?

Sounds like he wants a supported solution.

Don> I have been pushing for it, but Oracle won't give me a definite
Don> answer as to whether their apps are certified for RH 3.0.

stoffel> That's silly, but very understandable. From looking at your
stoffel> corp website, I can see why you're interested in only deploying
stoffel> supportable systems, especially in such a production environment.

Don> We could go with Veritas' Storage Foundation for Oracle RAC, but
Don> that would require considerable change, and no one here knows the software.

Don> For the time being I'll probably have to go with a NAS device and
Don> make it as redundant as possible.

stoffel> I know Veritas. :] And I'm looking for a job. :] Well, I know
stoffel> VxVM/VxFS quite well and I've been exposed to their Clustering
stoffel> software as well. Good stuff all around.

stoffel> But that doesn't solve the question here, which is how to get
stoffel> good, reliable NFS file storage (would the storage be on the
stoffel> SAN, or local to the server?) for a good price.

stoffel> Here's a thought: buy a pair of cheap 2U servers and install
stoffel> RHEL 3.0 along with the GFS filesystem and clustering software.
stoffel> Export via NFS to the other servers. If one node goes down,
stoffel> you've got a backup and failover. And it would give you
stoffel> exposure/experience with RHEL 3.0, GFS and clustering, so you
stoffel> would be working to migrate the Oracle instances to the same
stoffel> type of setup down the road.

If you only need to serve NFS, then you can do this without GFS. You can use GFS, and it will mean that clients can mount from either server, but it sounds like overkill for this case. You simply want failover of NFS filesystems, and that can be accomplished quite easily with AS 2.1 and RHCM. In fact, you can probably work this into your existing environment without buying any more hardware (and I think without purchasing new software licenses, too).

-Jeff

_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
How do I get my SanDisk Cruzer Mini 256 MB pen drive to mount on my Fedora Core 2 laptop?

Thanks in advance,

Walt
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 14 October 2004 9:55 am, Walt Sawyer wrote:
> How do I get my SanDisk Cruzer Mini 256 MB pen drive to mount on my
> Fedora Core 2 laptop? Thanks in advance, Walt
Hi Walt,

I always double check to ensure that USB devices are supported by Linux. One place to check is: http://www.qbik.ch/usb/devices/

It helps to know the vendor ID and device ID of your device before checking this site, since sometimes the marketing name of the device changes (or varies in different countries). You can find the vendor and device IDs with the command "lsusb", with the GUI "usbview", or with "cat /proc/bus/usb/devices". In my experience, with this info and the aforementioned website, if the site says it works, it will work. I checked your device and it appears that it should work, but double check with the vendor and device IDs to be sure.

If usbview (or one of the others) sees the device, that's good, but it doesn't always mean that the device will work with Linux. The hotplug system should have installed the kernel modules for you. If you try "fdisk -l" (letter ell), you should see a new SCSI device (that's how USB "disks" (pen drives, etc.) show up). It would be in the form /dev/sdXY. If fdisk doesn't show it, check your kernel modules and your system log (/var/log/messages).

Yes...I've been through this a few dozen times..... :-) often to no avail. Be careful what you purchase. I've also discovered that some flavors of kernel 2.6.x have problems with USB. After communicating with Greg Kroah-Hartman (kernel USB developer), he fixed a couple of things, and then I ended up moving to kernel version 2.6.8.1. I had fewer problems with the later 2.4 kernels, but thankfully 2.6.8.1 is working nicely for my USB devices.

Later,

Andy

- --
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA USA
http://www.wlug.org

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBbytZHl0iXDssISsRAtLgAJ9gR8OtvONdQKkfmVi44KVrVst4RQCeKo4p
eBJojmRxM3/E1dpGEvgMpGQ=
=6hEI
-----END PGP SIGNATURE-----
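To put the steps Andy describes into concrete commands, here is a rough sketch only: the device node /dev/sda1 and the mount point /mnt/pendrive are placeholders, so substitute whatever fdisk actually reports on your laptop.

    # Find the vendor and device IDs of the attached pen drive
    lsusb
    grep -i vendor /proc/bus/usb/devices

    # See which SCSI device node the drive was assigned
    fdisk -l

    # Mount the first partition (most pen drives ship FAT-formatted)
    mkdir -p /mnt/pendrive
    mount -t vfat /dev/sda1 /mnt/pendrive

    # Unmount before unplugging
    umount /mnt/pendrive

If the device never shows up in the fdisk output, check /var/log/messages as suggested above.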
==> Regarding RE: [Wlug] NFS question; "Don Peterson" <dpeterson@sterilite.com> adds: dpeterson> Jeff, Thanx for the continued feedback and suggestions, RHCM dpeterson> sounds cool but, I've read through some of the bug tracks on it dpeterson> and it sounds like there are some bad corruption problems when dpeterson> using it. It also sounds like there can be a considerable lag dpeterson> in the time it takes for the shares to come up on the secondary dpeterson> server. Are there many people using this in a production dpeterson> environment? Are you certain you are looking at the right product? In bugzilla, query Product: Red Hat Enterprise Linux Version: 2.1AS Component: clumanager There are no open bugs. You are probably looking at the clustering infrastructure related to GFS, which is a very new offerring for Red Hat. There are at least hundreds of deployments of AS2.1 and clumanager. As for the lag, it is, of course, configurable (at least it was last I worked on it, several years ago). There are a couple of components to this, of course. First, how long does it take to detect a failure, and second, how long does it take for application/filesystem recovery. Using a journaled filesystem, you will see fairly quick recovery times on that end. To simply failover NFS, you won't be looking at too much overhead. The one hardware component you may need to buy would be a fencing device. Hope this helps. -Jeff dpeterson> -----Original Message----- From> wlug-bounces@mail.wlug.org [mailto:wlug-bounces@mail.wlug.org] On dpeterson> Behalf Of Jeff Moyer Sent> Wednesday, October 13, 2004 3:48 PM To> Worcester Linux Users Group Subject> RE: [Wlug] NFS question ==> Regarding RE: [Wlug] NFS question; "John Stoffel"
stoffel@lucent.com> adds:
Don> Thanx for the info John. Unfortunately I can't use the Red Hat GFS, Don> it requires v3.0 and we cannot upgrade to that yet. stoffel> Since it's open source, can't you get it from stoffel> http://sources.redhat.com/cluster/gfs/ and see if it will install stoffel> on your systems? dpeterson> Sounds like he wants a supported solution. Don> I have been pushing for it, but Oracle won't give me an definite Don> answer as to whether their apps are certified for RH 3.0. stoffel> That's silly, but very understandable. From looking at your corp stoffel> website, I can see why you're interested in only deploying stoffel> supportable systems, esp in such a production environment. Don> We could go with Veritas' Storage Foundation for Oracle RAC, but that Don> would require considerable change, and no one here knows the dpeterson> software. Don> For the time being I'll probably have to go with a NAS device and make Don> it as redundant as possible. stoffel> I know Veritas. :] And I'm looking for a job. :] Well, I know stoffel> VxVM/VxFS quite well and I've been exposed to their Clustering stoffel> software as well. Good stuff all around. stoffel> But that doesn't solve the question here, which is how to get a stoffel> good reliable NFS file storage (would the storage be on the SAN, stoffel> or local to the server) for a good price. stoffel> Here's a thought, but a pair of cheap 2U servers and install RHEL stoffel> 3.0 along with the GFS filesystem and clustering software. stoffel> Export via NFS to the other servers. If one node goes down, stoffel> you've got a backup and failover. And it would give you stoffel> exposure/experience with RHEL 3.0, GFS and clustering so you would stoffel> be working to migrate the Oracle instances to the same type of stoffel> setup down the road. dpeterson> If you only need to server NFS, then you can do this without dpeterson> GFS. You can use GFS, and it will mean that clients can mount dpeterson> from either server, but it sounds like overkill for this case. dpeterson> You simply want failover of NFS filesystems, and that can be dpeterson> accomplished quite easily with AS 2.1 and RHCM. In fact, you dpeterson> can probably work this into your existing environment without dpeterson> buying any more hardware (and I think without purchasing new dpeterson> software licenses, too). dpeterson> -Jeff _______________________________________________ Wlug dpeterson> mailing list Wlug@mail.wlug.org http> //mail.wlug.org/mailman/listinfo/wlug
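For anyone wondering what "failover of NFS filesystems" amounts to in practice, below is a rough sketch of the steps a cluster manager such as clumanager automates once the failed node has been fenced. The interface, addresses, device, and export path are placeholders chosen for illustration, not anything from Don's environment:

    # Bring up the floating service IP on the surviving node
    ifconfig eth0:1 192.168.10.50 netmask 255.255.255.0 up

    # Mount the shared storage the failed node had been exporting
    mount /dev/sdb1 /export/data

    # Re-export the filesystem and make sure the NFS daemons are running
    exportfs -o rw,sync 192.168.10.0/24:/export/data
    service nfs start

Clients that mounted the share via the floating IP retry their in-flight requests and carry on once the new server answers. How quickly that happens is the "lag" discussed above, and it is dominated by failure detection time plus filesystem recovery time.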
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 14 October 2004 10:09 am, Jeff Moyer wrote:
> The one hardware component you may need to buy would be a fencing device.
I have a "fencing device" - its a 4' long wooden replica of a German longsword. You could beat the computer with it if you get frustrated. ;-) Cost: $90 - handmade - -- Andy Stewart, Founder Worcester Linux Users' Group Worcester, MA USA http://www.wlug.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFBbyxZHl0iXDssISsRAikiAJ9GoocRinLAkLfUDaRNJxIk6OJv5wCfeLKE nVJuse9fSsw7FCJuWnnAOyg= =XOsI -----END PGP SIGNATURE-----
==> Regarding RE: [Wlug] NFS question; Andy Stewart <andystewart@comcast.net> adds:

andystewart> On Thursday 14 October 2004 10:09 am, Jeff Moyer wrote:
>> The one hardware component you may need to buy would be a fencing device.
andystewart> I have a "fencing device" - its a 4' long wooden replica of a andystewart> German longsword. You could beat the computer with it if you andystewart> get frustrated. ;-) andystewart> Cost: $90 - handmade Everyone's a comedian. ;) I/O fencing - n. The act of isolating a rogue program or computer from a shared storage medium. This is a term used in clustering, whereby one node is determined to be in an unknown state and, in order to perform recovery, it must be kept from writing any further data to the shared storage. Common methods of fencing include, but are not limited to: STONITH - "The big hammer" approach. Stands for Shoot The Other Node In The Head. Also known as STOMITH, where M is Member. Effected by the use of remote power switches, this is a common fencing method in Linux clusters. Essentially, each node has access to the power switch for each other node. In the event of a node failure, a surviving member will reset power on the failed node. SCSI Reservations - A part of the SCSI standard, reservations restrict access to storage to the device which issued the reservation. In Digital Clusters (later Compaq TruClusters), DAIO (pronounced day-o, Direct Access I/O) disks were turned into served disks, and "owned" by one member of the cluster. This member would issue the SCSI RESERVE command for the disk, and other members wishing to initiate I/O to this disk would have to go through the server. This is not anything as crude as NFS. Remember Digital had its proprietary Memory Channel bus, which is used to issue the I/O requests and get responses. SCSI Reservations can be broken by the holder issuing a Release command, or by any type of reset (bus, power). Because of this, it has been historically problematic (most O/S's will reset the SCSI bus on boot). So, when a node is determined to be in an unknown/unsafe state, the cluster will clear its reservation (by issuing a reset to the device) and another node will take over serving the disk. Persistent Reservations - A newer form of the Reservation above, persistent reserves persist across bus resets and, on some (most) devices, across power cycles. The concept of a "group reservation" was introduced to support clustering. Basically, when a node boots, it can "register" itself with the disk. When each member of the cluster has booted and registered (and, of course, has quorum), then one member will issue the group reservation command. This restricts access to the disks to those nodes registered. (In this case, I use the term node, but really mean initiator). In order to "fence" a member of the cluster, a node can preempt another node's reservation. Fencing at the Fibre Channel switch level - Most fibre channel switches allow partitioning of targets. One can, for example, limit access to a given target on the FC switch to a subset of initiators. This is done through the management interface for the device. GFS, for example, can be configured to use this method of fencing. Watchdog Timers - A watchdog timer is a means to determine application health. Essentially, an application starts the timer, and then must "pet the dog" at a given interval. Failure to pet the dog (reset the timer) will result in the watchdog rebooting the system. Watchdog timers come in two forms, hardware and software. Hardware timers are obviously preferred to software timers. I would not recommend watchdog timers in general as a fencing device because there is no way for another node to guarantee that a member has removed itself from the cluster. 
While communications with the node may have been cut off, it is not guaranteed that the node will not issue any further I/O to shared storage. Note that a hardware watchdog timer can be used safely as a fencing device, but it depends highly on the implementation. It's easier to mess this one up than the other methods. Also, I use the term "fencing device" loosely, here, since no other node is performing an action which isolates this node from shared storage. Now, back to my regularly scheduled coding... -Jeff
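To make the persistent reservation mechanism a little more concrete, the sg_persist utility from the sg3_utils package exposes these SCSI commands from the shell. This is only a sketch for illustration: the device name /dev/sdc, the registration keys, and the reservation type are placeholders, and how cleanly a given disk or array honors preemption varies by implementation.

    # Each node registers its own key with the shared disk at boot
    sg_persist --out --register --param-sark=0x1 /dev/sdc    # run on node A
    sg_persist --out --register --param-sark=0x2 /dev/sdc    # run on node B

    # One member takes a "Write Exclusive, Registrants Only" reservation
    sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdc

    # Inspect the registered keys and the current reservation holder
    sg_persist --in --read-keys /dev/sdc
    sg_persist --in --read-reservation /dev/sdc

    # To fence node A, node B preempts node A's registration; node A can
    # no longer write to the disk until it re-registers
    sg_persist --out --preempt --param-rk=0x2 --param-sark=0x1 --prout-type=5 /dev/sdc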
participants (4):

- Andy Stewart
- Don Peterson
- Jeff Moyer
- Walt Sawyer