==> Regarding RE: [Wlug] NFS question; Andy Stewart <andystewart@comcast.net> adds:

andystewart> On Thursday 14 October 2004 10:09 am, Jeff Moyer wrote:
andystewart> > The one hardware component you may need to buy would be a fencing device.
andystewart> I have a "fencing device" - it's a 4' long wooden replica of a
andystewart> German longsword. You could beat the computer with it if you
andystewart> get frustrated. ;-)

andystewart> Cost: $90 - handmade

Everyone's a comedian. ;)

I/O fencing - n. The act of isolating a rogue program or computer from a shared storage medium. The term is used in clustering: when one node is determined to be in an unknown state, it must be kept from writing any further data to the shared storage before recovery can proceed.

Common methods of fencing include, but are not limited to:

STONITH - The "big hammer" approach. Stands for Shoot The Other Node In The Head; also known as STOMITH, where M is Member. Implemented with remote power switches, this is a common fencing method in Linux clusters. Essentially, each node has access to the power switch for every other node. In the event of a node failure, a surviving member resets power on the failed node.

SCSI Reservations - Part of the SCSI standard, reservations restrict access to a storage device to the initiator that issued the reservation. In Digital Clusters (later Compaq TruClusters), DAIO (pronounced day-o, Direct Access I/O) disks were turned into served disks, "owned" by one member of the cluster. This member would issue the SCSI RESERVE command for the disk, and other members wishing to initiate I/O to the disk had to go through the server. This was nothing as crude as NFS: Digital had its proprietary Memory Channel bus, which was used to issue the I/O requests and return the responses.

SCSI reservations can be broken by the holder issuing a RELEASE command, or by any type of reset (bus, power). Because of this, they have historically been problematic (most OSes reset the SCSI bus on boot). So, when a node is determined to be in an unknown/unsafe state, the cluster clears its reservation (by issuing a reset to the device) and another node takes over serving the disk.
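To make the reservation behavior above concrete, here is a toy Python model. It is only a sketch under stated assumptions, not a driver or any real SCSI API: it assumes SCSI-2 style semantics as described (a single holder, cleared by RELEASE from the holder or by any reset, with I/O from other initiators refused). The class `ReservableDisk`, the `fail_over` helper, and the initiator names are hypothetical.

```python
from typing import Optional


class ReservableDisk:
    """Toy model (hypothetical) of a disk with SCSI-2 style RESERVE/RELEASE.

    Simplified: one holder at a time; the reservation is cleared by RELEASE
    from the holder or by any reset. Initiators are plain strings.
    """

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def reserve(self, initiator: str) -> bool:
        # Succeeds if unreserved or already held by this initiator.
        if self.holder in (None, initiator):
            self.holder = initiator
            return True
        return False  # a real device would report a reservation conflict

    def release(self, initiator: str) -> None:
        # Only the holder's RELEASE clears the reservation.
        if self.holder == initiator:
            self.holder = None

    def reset(self) -> None:
        # Any bus or power reset drops the reservation.
        self.holder = None

    def write(self, initiator: str, data: bytes) -> bool:
        # I/O is accepted from the holder, or from anyone if unreserved.
        return self.holder in (None, initiator)


def fail_over(disk: ReservableDisk, new_server: str) -> bool:
    """Fence the old server by resetting the device, then take over serving it."""
    disk.reset()
    return disk.reserve(new_server)


if __name__ == "__main__":
    disk = ReservableDisk()
    disk.reserve("node-a")                # node-a serves the disk
    print(disk.write("node-b", b"data"))  # False: node-b must go through node-a
    print(fail_over(disk, "node-b"))      # True: reset fences node-a, node-b reserves
    print(disk.write("node-a", b"data"))  # False: node-a is now fenced off
    disk.reset()                          # e.g. some node reboots and resets the bus
    print(disk.write("node-a", b"data"))  # True: the reservation was silently lost
```

The last two lines of the demo show the weakness mentioned above: the same reset that enables takeover also lets any rebooting node wipe out the reservation without anyone noticing.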
Persistent Reservations - A newer form of the reservation above, persistent reservations survive bus resets and, on most devices, power cycles. The concept of a "group reservation" was introduced to support clustering. When a node boots, it can "register" itself with the disk. Once every member of the cluster has booted and registered (and, of course, the cluster has quorum), one member issues the group reservation command. This restricts access to the disk to the registered nodes. (Here I use the term "node", but really mean "initiator".) To "fence" a member of the cluster, a surviving node can preempt the failed node's registration, cutting off its access.

Fencing at the Fibre Channel switch level - Most Fibre Channel switches allow partitioning of targets. One can, for example, limit access to a given target on the FC switch to a subset of initiators. This is done through the switch's management interface. GFS, for example, can be configured to use this method of fencing.

Watchdog Timers - A watchdog timer is a means of determining application health. Essentially, an application starts the timer and then must "pet the dog" at a given interval. Failure to pet the dog (reset the timer) results in the watchdog rebooting the system. Watchdog timers come in two forms, hardware and software; hardware timers are preferred, since a software timer can hang along with the system it is supposed to be watching.

In general, I would not recommend watchdog timers as a fencing method, because no other node can be sure that a member has actually removed itself from the cluster. Even if communication with the node has been cut off, there is no guarantee that it will not issue further I/O to shared storage. A hardware watchdog timer can be used safely for fencing, but that depends heavily on the implementation; it is easier to get this one wrong than the other methods.
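The "pet the dog" cycle can be sketched as a toy software watchdog in Python. This is an illustration under assumptions, not a real watchdog driver: `SoftwareWatchdog`, its `on_expire` hook (standing in for a reboot), and the injectable `clock` are hypothetical, and expiry is detected by polling `check()` rather than by an independent timer. (On Linux, a hardware watchdog is typically petted by writing to `/dev/watchdog`.)

```python
import time
from typing import Callable


class SoftwareWatchdog:
    """Toy software watchdog (hypothetical): fires if not petted within `timeout` s.

    `on_expire` stands in for rebooting the system; `clock` is injectable so
    the behavior can be exercised without actually sleeping.
    """

    def __init__(self, timeout: float, on_expire: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_expire = on_expire
        self.clock = clock
        self.last_pet = clock()  # starting the timer counts as the first pet
        self.fired = False

    def pet(self) -> None:
        # The application calls this periodically to show it is still healthy.
        self.last_pet = self.clock()

    def check(self) -> bool:
        # Polled periodically; fires at most once after the deadline passes.
        if not self.fired and self.clock() - self.last_pet > self.timeout:
            self.fired = True
            self.on_expire()
        return self.fired


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo runs instantly
    dog = SoftwareWatchdog(10.0, lambda: print("watchdog expired: reboot"),
                           clock=lambda: now[0])
    now[0] = 8.0
    dog.pet()           # healthy application pets the dog at t=8
    now[0] = 15.0
    print(dog.check())  # False: only 7 s since the last pet
    now[0] = 19.0
    print(dog.check())  # prints the expiry message, then True (11 s > 10 s)
```

Because `check()` runs on the same machine as the application, a hung system also stops checking, and nothing in this loop lets another node confirm that the reboot actually happened. That is the gap described above.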
Also, I use the term "fencing device" loosely for watchdog timers, since no other node performs an action that isolates this node from shared storage.

Now, back to my regularly scheduled coding...

-Jeff