IBM AIX HACMP Hardware User Manual

Manual is about: Certification Study Guide

Summary of AIX HACMP

  • Page 1

    SG24-5131-00. International Technical Support Organization, http://www.redbooks.ibm.com. IBM Certification Study Guide AIX HACMP. David Thiessen, Achim Rehor, Reinhard Zettler.

  • Page 3

    IBM Certification Study Guide AIX HACMP. May 1999. SG24-5131-00. International Technical Support Organization.

  • Page 4

    © Copyright International Business Machines Corporation 1999. All rights reserved. Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. First Edition (May 1999) This...

  • Page 5

    Contents. Figures (ix). Tables (xi). Preface ...

  • Page 6

    Chapter 3. Cluster Hardware and Software Preparation (51). 3.1 Cluster Node Setup (51). 3.1.1 Adapter Slot Placement ...

  • Page 7

    5.1.3 Event Notification (122). 5.1.4 Event Recovery and Retry (122). 5.1.5 Notes on Customizing Event Processing (123). 5.1.6 Event Emulator ...

  • Page 8

    8.1.1 The clstat Command (152). 8.1.2 Monitoring Clusters Using HAView (152). 8.1.3 Cluster Log Files ...

  • Page 9

    9.3 VSDs - RVSDs (190). 9.3.1 Virtual Shared Disk (VSDs) (190). 9.3.2 Recoverable Virtual Shared Disk (193). 9.4 SP Switch ...

  • Page 11

    Figures. 1. Basic SSA Configuration (17). 2. Hot-Standby Configuration (31). 3. Mutual Takeover Configuration ...

  • Page 13

    Tables. 1. AIX Version 4 HACMP Installation and Implementation (4). 2. AIX Version 4 HACMP System Administration (5). 3. Hardware Requirements for the Different HACMP Versions (8). 4...

  • Page 15

    Preface. The AIX and RS/6000 certifications offered through the Professional Certification Program from IBM are designed to validate the skills required of technical professionals who work in the powerful and often complex environments of AIX and RS/6000. A complete set of professional certifica...

  • Page 16

    • AIX parameters that are affected by an HACMP installation, and their correct settings • The cluster and resource configuration process, including how to choose the best resource configuration for a customer requirement • Customization of the standard HAC...

  • Page 17

    POWERparallel Systems area, known as the SP1 at that time. In 1997 he began working on HACMP as the service groups for HACMP and RS/6000 SP merged into one. He holds a diploma in computer science from the University of Frankfurt in Germany. This is his first redbook. Reinhard Zettler is an AIX so...

  • Page 19

    Chapter 1. Certification Overview. This chapter provides an overview of the skill requirements for obtaining an IBM Certified Specialist - AIX HACMP certification. The following chapters are designed to provide a comprehensive review of specific topics that are essential ...

  • Page 20

    1.2 Certification Exam Objectives. The following objectives were used as a basis for what is required when the certification exam was developed. Some of these topics have been regrouped to provide better organization when discussed in this publication. Sectio...

  • Page 21

    • Create an application server. • Set up event notification. • Set up event notification and pre/post event scripts. • Set up error notification. • Post configuration activities. • Configure a client notification and ARP update. • Implement a test plan. • Create a snapshot. ...

  • Page 22

    1.3 Certification Education Courses. Courses and publications are offered to help you prepare for the certification tests. These courses are recommended, but not required, before taking a certification test. At the printing of this guide, the following course...

  • Page 23

    The following table outlines information about the next course. Table 2. AIX Version 4 HACMP System Administration. Course number: Q1150 (USA); AU50 (worldwide). Course duration: five days. Course abstract: This course teaches the student the skills required to administer an HACMP...

  • Page 25

    Chapter 2. Cluster Planning. The area of cluster planning is a large one. Not only does it include planning for the types of hardware (CPUs, networks, disks) to be used in the cluster, but it also includes other aspects. These include resource planning, that is, planning ...

  • Page 26

    RISC System/6000 models as nodes in an HACMP 4.1 for AIX, HACMP 4.2 for AIX, or HACMP 4.3 for AIX cluster. Table 3. Hardware requirements for the different HACMP versions. 1 AIX 4.3.2 required. For a detailed description of system models supported by HACMP/600...

  • Page 27

    Much of the decision centers around the following areas: • Processor capacity • Application requirements • Anticipated growth requirements • I/O slot requirements. These paradigms are certainly not new ones, and are also important considerations when choosing a processor for a sing...

  • Page 28

    Your slot configuration must also allow for the disk I/O adapters you need to support the cluster’s shared disk (volume group) configuration. If you intend to use disk mirroring for shared volume groups, which is strongly recommended, then you will need to ...

  • Page 29

    2.2 Cluster Networks. HACMP differentiates between two major types of networks: TCP/IP networks and non-TCP/IP networks. HACMP utilizes both of them for exchanging heartbeats. HACMP uses these heartbeats to diagnose failures in the cluster. Non-TCP/IP networks are used to distingu...

  • Page 30

    • FDDI • SP Switch • SLIP • SOCC • Token-Ring. As an independent, layered component of AIX, the HACMP for AIX software works with most TCP/IP-based networks. HACMP for AIX has been tested with standard Ethernet interfaces (en*) but not with IEEE 802.3 Ethern...

  • Page 31

    Network types also differentiate themselves in the maximum distance they allow between adapters, and in the maximum number of adapters allowed on a physical network. • Ethernet supports 10 and 100 Mbps currently, and supports hardware address swapping. Alternate hardware addresse...

  • Page 32

    • SP Switch is a high-speed packet switching network, running on the RS/6000 SP system only. It runs bidirectionally up to 80 MBps, which adds up to 160 MBps of capacity per adapter. This is node-to-node communication and can be done in parallel between eve...

  • Page 33

    2.2.2.2 Special Considerations. As for TCP/IP networks, there are a number of restrictions on non-TCP/IP networks. These are explained for the three different types in more detail below. Serial (RS232): A serial (RS232) network needs at least one available serial port per cluster n...

  • Page 34

    2 A PCI Multiport Async Card is required in an S7X model, no native ports. 3 Only one serial port available for customer use, i.e. HACMP. In case the number of native serial ports doesn’t match your HACMP cluster configuration needs, you can extend it by addi...

  • Page 35

    SSA subsystems are built up from loops of adapters and disks. A simple example is shown in Figure 1. Figure 1. Basic SSA configuration. Here, a single adapter controls one SSA loop of eight disks. Data can be transferred around the loop, in either direction, at 20 MBps. Consequent...

  • Page 36

    • 7133 Serial Storage Architecture (SSA) Disk Subsystem Models 010, 500, 020, 600, D40 and T40. The 7133 Models 010 and 500 were the first SSA products announced in 1995 with the revolutionary new Serial Storage Architecture. Some IBM customers still use th...

  • Page 37

    2.3.1.1 Disk Capacities. Table 8 lists the different SSA disks, and provides an overview of their characteristics. Table 8. SSA disks. 2.3.1.2 Supported and Non-Supported Adapters. Table 9 lists the different SSA adapters and presents an overview of their characteristics. Table 9. S...

  • Page 38

    1 See 2.3.1.3, “Rules for SSA Loops” on page 20 for more information. The following rules apply to SSA adapters: • You cannot have more than four adapters in a single system. • The MCA SSA 4-Port RAID Adapter (FC 6217) and PCI SSA 4-Port RAID Adapter (FC 62...

  • Page 39

    • A maximum of 48 devices can be connected in a particular SSA loop. • Only one pair of adapter connectors can be connected in a particular SSA loop. • Member disk drives of an array can be on either SSA loop. For SSA loops that include a Micro Channel Enhanced SSA Multi-Initiato...

  • Page 40

    2.3.1.4 RAID vs. Non-RAID. RAID Technology: RAID is an acronym for Redundant Array of Independent Disks. Disk arrays are groups of disk drives that work together to achieve higher data-transfer and I/O rates than those provided by single large drives. Arrays ...

  • Page 41

    RAID Levels 2 and 3: RAID 2 and RAID 3 are parallel process array mechanisms, where all drives in the array operate in unison. Similar to data striping, information to be written to disk is split into chunks (a fixed amount of data), and each chunk is written out to the same physi...

  • Page 42

    As with RAID 3, in the event of disk failure, the information can be rebuilt from the remaining drives. A RAID level 5 array also uses parity information, though it is still important to make regular backups of the data in the array. RAID level 5 stripes data...

  • Page 43

    • Array member drives and spares must be on the same loop (cannot span A and B loops) on the adapter. • You cannot boot (IPL) from a RAID. 2.3.1.5 Advantages. Because SSA allows SCSI-2 mapping, all functions associated with initiators, targets, and logical units are translatable. Ther...

  • Page 44

    2.3.2 SCSI Disks. After the announcement of the 7133 SSA disk subsystems, the SCSI disk subsystems became less common in HACMP clusters. However, the 7135 RAIDiant Array (Model 110 and 210) and other SCSI subsystems are still in use at many customer sites. W...

  • Page 45

    • Enhanced SCSI-2 Differential Fast/Wide Adapter/A (MCA, FC: 2412, Adapter Label: 4-C); not usable with 7135-110 • SCSI-2 Fast/Wide Differential Adapter (PCI, FC: 6209, Adapter Label: 4-B) • DE Ultra SCSI Adapter (PCI, FC: 6207, Adapter Label: 4-L); not usable with 7135-110. 2.3.2...

  • Page 46

    withdraw the 7135 RAIDiant systems from marketing because it is equally possible to configure RAID on the SSA subsystems. 2.4 Resource Planning. HACMP provides a highly available environment by identifying a set of cluster-wide resources essential to uninter...

  • Page 47

    • Cascading • Rotating • Concurrent. Each of these types describes a different set of relationships between nodes in the cluster, and a different set of behaviors upon nodes entering and leaving the cluster. Cascading resource groups: All nodes in a cascading resource group are as...

  • Page 48

    reintegration, a node remains as a standby and does not take back any of the resources that it had initially served. Concurrent resource groups: A concurrent resource group may be shared simultaneously by multiple nodes. The resources that can be part of a ...

  • Page 49

    Figure 2. Hot-standby configuration. In this configuration, there is one cascading resource group consisting of the four disks, hdisk1 to hdisk4, and their constituent volume groups and file systems. Node 1 has a priority of 1 for this resource group while node 2 has a priority of...

  • Page 50

    the cluster becomes a standby node. You must choose a rotating standby configuration if you do not want a break in service during reintegration. Since takeover nodes continue providing services until they have to leave the cluster, you should configure your...

  • Page 51

    When a failed node reintegrates into the cluster, it takes back the resource group for which it has the highest priority. Therefore, even in this configuration, there is a break in service during reintegration. Of course, if you look at it from the point of view of performance, t...
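
The reintegration behaviour described above can be sketched as a simple ownership rule. This is only an illustrative model, not HACMP code: the function name and node names are hypothetical, and it assumes that priority 1 is the highest priority, as in the hot-standby example.

```python
def cascading_owner(priorities, active_nodes):
    """Return the active node with the highest priority for a cascading group.

    priorities: dict mapping node name -> priority (assumption: 1 is highest).
    active_nodes: set of node names currently up in the cluster.
    Returns None when no participating node is active.
    """
    candidates = [node for node in priorities if node in active_nodes]
    if not candidates:
        return None
    return min(candidates, key=lambda node: priorities[node])


# Hypothetical two-node layout for one resource group.
group_a = {"node1": 1, "node2": 2}

print(cascading_owner(group_a, {"node1", "node2"}))  # both nodes up
print(cascading_owner(group_a, {"node2"}))           # node1 has failed
print(cascading_owner(group_a, {"node1", "node2"}))  # node1 has reintegrated
```

In this model the owner changes twice, once at takeover and once at reintegration, which is where the break in service mentioned in the text comes from.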

  • Page 52

    Here the resource groups are the same as the ones in the mutual takeover configuration. Also, similar to the previous configuration, nodes 1 and 2 each have priorities of 1 for one of the resource groups, A or B. The only thing different in this configurati...

  • Page 53

    • Design the network topology • Define a network mask for your site • Define IP addresses (adapter identifiers) for each node’s service and standby adapters. • Define a boot address for each service adapter that can be taken over, if you are using IP address takeover or rotating ...

  • Page 54

    Dual network: A dual-network setup has two separate networks for communication. Nodes are connected to two networks, and each node has two service adapters available to clients. If one network fails, the remaining network can still function, connecting nodes...

  • Page 55

    The following diagram shows a cluster consisting of two nodes and a client. A single public network connects the nodes and the client, and the nodes are linked point-to-point by a private high-speed SOCC connection that provides an alternate path for cluster and lock traffic shou...

  • Page 56

    SLIP are considered public networks. Note that a SLIP line, however, does not provide client access. Private: A private network provides communication between nodes only; it typically does not allow client access. An SOCC line or an ATM network are also priv...

  • Page 57

    until it assumes the shared IP address. Consequently, Clinfo makes known the boot address for this adapter. In an HACMP for AIX environment on the RS/6000 SP, the SP Ethernet adapters can be configured as service adapters but should not be configured for IP address takeover. For ...

  • Page 58

    service label (address) instead of the boot label. If the node should fail, a takeover node acquires the failed node’s service address on its standby adapter, thus making the failure transparent to clients using that specific service address. During the rei...

  • Page 59

    If you do not use hardware address takeover, the ARP cache of clients can be updated by adding the clients’ IP addresses to the PING_CLIENT_LIST variable in the /usr/sbin/cluster/etc/clinfo.rc file. 2.4.4 NFS Exports and NFS Mounts. There are two items concerning NFS when doing th...

  • Page 60

    application on the takeover node when a fallover occurs. For more information about creating application server resources, see the HACMP for AIX, Version 4.3: Installation Guide, SC23-4278. 2.5.1 Performance Requirements. In order to plan your application’s ...

  • Page 61

    2.5.3 Licensing Methods. Some vendors require a unique license for each processor that runs an application, which means that you must license-protect the application by incorporating processor-specific information into the application when it is installed. As a result, it is possi...

  • Page 62

    2.6 Customization Planning. The Cluster Manager’s ability to recognize a specific series of events and subevents permits a very flexible customization scheme. The HACMP for AIX software provides an event customization facility that allows you to tailor clust...

  • Page 63

    event to inform system administrators that traffic may have to be rerouted. Afterwards, you can use a network_up notification event to inform system administrators that traffic can again be serviced through the restored network. 2.6.1.3 Predictive Event Error Correction. You can s...

  • Page 64

    2.6.2.1 Single Point-of-Failure Hardware Component Recovery. As described in 2.2.1.2, “Special Network Considerations” on page 12, the HPS Switch network is one resource that has to be considered as a single point of failure. Since a node can support only on...

  • Page 65

    The above example screen will add a notification method to the ODM, so that upon appearance of the HPS_FAULT9_ER entry in the error log, the error notification daemon will trigger the execution of the /usr/sbin/cluster/utilities/clstop -grsy command, which shuts HACMP down gracef...

  • Page 66

    2.7 User ID Planning. The following sections describe various aspects of user ID planning. 2.7.1 Cluster User and Group IDs. One of the basic tasks any system administrator must perform is setting up user accounts and groups. All users require accounts to gai...

  • Page 67

    2.7.2 Cluster Passwords. While user and group management is very much facilitated with C-SPOC, the password information still has to be distributed by some other means. If the system is not configured to use NIS or DCE, the system administrator still has to distribute the password...

  • Page 68

    2.7.3.3 NFS-Mounted Home Directories on Shared Volumes. So, a combined approach is used in most cases. In order to make home directories a highly available resource, they have to be part of a resource group and placed on a shared volume. That way, all cluste...

  • Page 69

    Chapter 3. Cluster Hardware and Software Preparation. This chapter covers the steps that are required to prepare the RS/6000 hardware and AIX software for the installation of HACMP and the configuration of the cluster. This includes configuring adapters for TCP/IP, setti...

  • Page 70

    mirroring rootvg in order to avoid the impact of the failover time involved in a node failure. In terms of maximizing availability, this technique is just as valid for increasing the availability of a cluster as it is for increasing single-system availabili...

  • Page 71

    mirrored. If the dump devices are not the paging device, that dump logical volume will not be mirrored. 3.1.2.1 Procedure. The following steps assume the user has rootvg contained on hdisk0 and is attempting to mirror the rootvg to a new disk: hdisk1. 1. E...

  • Page 72

    “-m” option. You should consult documentation on the usage of the “-m” option for mklvcopy. 4. Synchronize the newly created mirrors with the following command: 5. Bosboot to initialize all boot records and devices by executing the following command: where...

  • Page 73

    3.1.2.2 Necessary APAR Fixes. Table 11. Necessary APAR fixes. To determine if either fix is installed on a machine, execute the following: 3.1.3 AIX Prerequisite LPPs. In order to install HACMP and HACMP/ES the AIX setup must be in a proper state. The follow...

  • Page 74

    • nv6000.database.obj 4.1.0.0 • nv6000.features.obj 4.1.2.0 • nv6000.client.obj 4.1.0.0 and for HAView 4.3 • xlC.rte 3.1.4.0 • nv6000.base.obj 4.1.2.0 • nv6000.database.obj 4.1.2.0 • nv6000.features.obj 4.1.2.0 • nv6000.client.obj 4.1.2.0. 3.1.4 AIX Paramete...

  • Page 75

    and low-water marks. If a process tries to write to a file at the high-water mark, it must wait until enough I/O operations have finished to make the low-water mark. Use the smit chgsys fastpath to set high- and low-water marks on the Change/Show Characte...
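
The high- and low-water mark behaviour described above can be modelled in a few lines. This is a minimal sketch of the rule as stated in the text, not of the AIX implementation; the class name and the mark values used in the demonstration are hypothetical.

```python
class PacedFile:
    """Toy model of I/O pacing for a single file."""

    def __init__(self, high_water, low_water):
        self.high = high_water   # pending writes at which the writer must wait
        self.low = low_water     # pending writes at which the writer may resume
        self.pending = 0
        self.blocked = False

    def try_write(self):
        """Queue one write; return False if the writer has to wait."""
        if self.blocked or self.pending >= self.high:
            self.blocked = True
            return False
        self.pending += 1
        return True

    def complete_io(self):
        """Record that one pending write has finished."""
        if self.pending > 0:
            self.pending -= 1
        # The writer resumes only once the backlog has drained to the low mark.
        if self.blocked and self.pending <= self.low:
            self.blocked = False


f = PacedFile(high_water=4, low_water=2)   # illustrative values only
results = [f.try_write() for _ in range(5)]
print(results)
```

Note that a single completed I/O is not enough to release the writer; the backlog has to fall all the way to the low-water mark.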

  • Page 76

    3.1.4.3 Editing the /etc/hosts File and Nameserver Configuration. Make sure all nodes can resolve all cluster addresses. See the chapter on planning TCP/IP networks (the section Using HACMP with NIS and DNS) in the HACMP for AIX, Version 4.3: Planning Guide,...

  • Page 77

    3.1.4.5 Editing the /.rhosts File. Make sure that each node’s service adapters and boot addresses are listed in the /.rhosts file on each cluster node. Doing so allows the /usr/sbin/cluster/utilities/clruncmd command and the /usr/sbin/cluster/godm daemon t...
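
A quick way to check the rule above is to compare the contents of the file with the list of labels that should appear in it. The following is a hedged sketch: the function name and all adapter labels are hypothetical, and it assumes the common layout of one host entry per line with the host name as the first field.

```python
def missing_rhosts_entries(rhosts_text, required_labels):
    """Return the required labels that are not listed in the given /.rhosts text."""
    listed = set()
    for line in rhosts_text.splitlines():
        fields = line.split()
        if fields:                    # skip blank lines
            listed.add(fields[0])     # first field is the host name or label
    return [label for label in required_labels if label not in listed]


# Hypothetical labels for a two-node cluster.
sample = "node1_svc\nnode1_boot\nnode2_svc\n"
required = ["node1_svc", "node1_boot", "node2_svc", "node2_boot"]
print(missing_rhosts_entries(sample, required))
```

Running the same check against the file from every node would show whether any node is missing an entry.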

  • Page 78

    3.2 Network Connection and Testing. The following sections describe important aspects of network connection and testing. 3.2.1 TCP/IP Networks. Since there are several types of TCP/IP networks available within HACMP, there are several different characteristic...

  • Page 79

    Figure 9. Connecting networks to a hub. 3.2.1.2 IP Addresses and Subnets. The design of the HACMP for AIX software specifies that: • All client traffic be carried over the service adapter • Standby adapters be hidden from client applications and carry onl...

  • Page 80

    To comply with these rules, pay careful attention to the IP addresses you assign to standby adapters. Standby adapters must be on a separate subnet from the service adapters, even though they are on the same physical network. Placing standby adapters on a d...
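
The subnet rule above is easy to verify before the addresses are configured. The sketch below uses Python's standard `ipaddress` module; the helper name, addresses, and netmask are hypothetical examples, not values from the guide.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both addresses fall in the same subnet for the given netmask."""
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


# Hypothetical addresses for one node.
service = "192.168.10.1"
standby = "192.168.11.1"
netmask = "255.255.255.0"

if same_subnet(service, standby, netmask):
    print("Conflict: standby adapter shares the service subnet")
else:
    print("OK: standby adapter is on a separate subnet")
```

The same helper can be used in the other direction, to confirm that the service adapters of all nodes on one network do share a subnet.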

  • Page 81

    • Scan the /tmp/hacmp.out file to confirm that the /etc/rc.net script has run successfully. Look for a zero exit status. • If IP address takeover is enabled, confirm that the /etc/rc.net script has run and that the service adapter is on its service addres...

  • Page 82

    TMSSA: Target-mode SSA is only supported with the SSA Multi-Initiator RAID Adapters (Feature #6215 and #6219), microcode level 1801 or later. You need at least HACMP Version 4.2.2 with APAR IX75718. 3.2.2.2 Configuring RS232. Use the smit tty fastpath to crea...

  • Page 83

    3.2.2.4 Configuring Target Mode SSA. The node number on each system needs to be changed from the default of zero to a non-zero number. All systems on the SSA loop must have a unique node number. To change the node number use the following command: chdev -l ssar -a ...
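
The two node-number rules above (not left at the default of zero, and unique on the loop) can be checked with a short script before target mode SSA is tested. This is an illustrative sketch only; the function name and the system names are hypothetical, and the node numbers would have to be collected from each system by other means.

```python
def check_ssa_node_numbers(node_numbers):
    """Return a list of problems found in a {system name: SSA node number} mapping."""
    problems = []
    seen = {}
    for system, number in node_numbers.items():
        if number == 0:
            problems.append(f"{system}: node number is still the default 0")
        elif number in seen:
            problems.append(
                f"{system}: node number {number} is already used by {seen[number]}"
            )
        else:
            seen[number] = system
    return problems


# Hypothetical systems sharing one SSA loop.
for problem in check_ssa_node_numbers({"nodeA": 1, "nodeB": 1, "nodeC": 0}):
    print(problem)
```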

  • Page 84

    cat /etc/environment > /dev/tmssaY.im on the corresponding node for writing. X and Y correspond to the appropriate opposite node number. You should see the first command hanging until the second command is issued, and then showing its output. Target mode SCS...

  • Page 85

    For more information regarding adapters and cabling rules see 2.3.1, “SSA Disks” on page 16 or the following documents: • 7133 SSA Disk Subsystems: Service Guide, SY33-0185-02 • 7133 SSA Disk Subsystem: Operator Guide, GA33-3259-01 • 7133 Models 010 and 0...

  • Page 86

    Adapter definitions: By issuing the following command, you can check the correct adapter configuration. In order to work correctly, the adapter must be in the “Available” state: The third column in the adapter device line shows the location of the adapter. D...

  • Page 87

    SSA physical disks: • Are configured as pdisk0, pdisk1,...,pdiskN. • Have errors logged against them in the system error log. • Support a character special file (/dev/pdisk0, /dev/pdisk1,...,/dev/pdiskN). • Support the ioctl subroutine for servicing and...

  • Page 88

    Configuration Verification: This option enables you to display the relationships between physical (pdisk) and logical (hdisk) disks. Format Disk: This option enables you to format SSA disk drives. Certify Disk: This option enables you to test whether data on a...

  • Page 89

    12. Run cfgmgr to install the microcode to adapters. 13. To complete the device driver upgrade, you must now reboot your system. 14. To confirm that the upgrade was a success, type lscfg -vl ssaX where X is 0,1... for all SSA adapters. Check the ROS Level li...

  • Page 90

    18. To confirm that the upgrade was a success, type lscfg -vl pdiskX where X is 0,1... for all SSA disks. Check the ROS Level line to see that each disk has the appropriate microcode level (for the correct microcode level see the above mentioned Web site). 3...

  • Page 91

    3.3.2.1 Cabling. The following sections describe important information about cabling. SCSI adapters: An overview of SCSI adapters that can be used on a shared SCSI bus is given in 2.3.2.3, “Supported SCSI Adapters” on page 26. For the necessary adapter chang...

  • Page 92

    FC: 2902 or 9202 (2.4m), PN: 67G1260 - or - FC: 2905 or 9205 (4.5m), PN: 67G1261 - or - FC: 2912 or 9212 (12m), PN: 67G1262 - or - FC: 2914 or 9214 (14m), PN: 67G1263 - or - FC: 2918 or 9218 (18m), PN: 67G1264 • Terminator (T) included in FC 2422 (Y-cable),...

  • Page 93

    FC: 2426 (0.94m), PN: 52G4234 • 16-bit SCSI-2 Differential System-to-System Cable FC: 2424 (0.6m), PN: 52G4291 - or - FC: 2425 (2.5m), PN: 52G4233. This cable is used only if there are more than two nodes attached to the same shared bus. • 16-bit Different...

  • Page 94

    [Figure: shared 16-bit SCSI bus cabling, showing terminators (T) and cables #2416, #2424 and #2426.] Maximum total cable length: 25m.

  • Page 95

    Figure 11. 7135-110 RAIDiant arrays connected on two shared 16-bit SCSI buses. 3.3.2.3 Adapter SCSI ID and Termination Change. The SCSI-2 Differential Controller is used to connect to 8-bit disk devices on a shared bus. The SCSI-2 Differential Fast/Wide Ada...

  • Page 96

    SCSI-2 Differential Fast/Wide Adapter/A and Enhanced SCSI-2 Differential Fast/Wide Adapter/A) are shown in Figure 12 and Figure 13 respectively. Figure 12. Termination on the SCSI-2 Differential Controller. Figure 13. Termination on the SCSI-2 Differential F...

  • Page 97

    The ID of a SCSI adapter, by default, is 7. Since each device on a SCSI bus must have a unique ID, the ID of at least one of the adapters on a shared SCSI bus has to be changed. The procedure to change the ID of a SCSI-2 Differential Controller is: 1. ...
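
The reason at least one adapter ID has to change can be seen by listing the IDs on a shared bus and looking for duplicates. The sketch below is illustrative: the device names, the disk IDs, and the replacement ID 6 are hypothetical choices, and only the default adapter ID of 7 comes from the text.

```python
from collections import Counter

DEFAULT_ADAPTER_ID = 7  # default SCSI adapter ID, as stated above


def duplicate_scsi_ids(bus_devices):
    """Return the sorted SCSI IDs used by more than one device on the bus."""
    counts = Counter(bus_devices.values())
    return sorted(scsi_id for scsi_id, count in counts.items() if count > 1)


# Two nodes attached to one shared bus, both adapters left at the default ID.
bus = {
    "node1 adapter": DEFAULT_ADAPTER_ID,
    "node2 adapter": DEFAULT_ADAPTER_ID,
    "disk A": 0,
    "disk B": 1,
}
print(duplicate_scsi_ids(bus))   # the default ID clashes

bus["node2 adapter"] = 6         # hypothetical new ID for one adapter
print(duplicate_scsi_ids(bus))   # no clashes remain
```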

  • Page 98

    4. Reboot the machine to bring the change into effect. The same task can be executed from the command line by entering: Also with this method, a reboot is required to bring the change into effect. The procedure to change the ID of a SCSI-2 Differential Fas...

  • Page 99

    The command line version of this is: As in the case of the SCSI-2 Differential Controller, a system reboot is required to bring the change into effect. The maximum length of the bus, including any internal cabling in disk subsystems, is limited to 19 mete...

  • Page 100

    3.4.1 Creating Shared VGs. The following sections contain information about creating non-concurrent VGs and VGs for concurrent access. 3.4.1.1 Creating Non-Concurrent VGs. This section covers how to create a shared volume group on the source node using the SM...

  • Page 101

    Creating a Concurrent Access Volume Group on Serial Disk Subsystems. To use a concurrent access volume group, defined on a serial disk subsystem such as an IBM 7133 disk subsystem, you must create it as a concurrent-capable volume group. A concurrent-capab...

  • Page 102

    Use the smit mkvg fastpath to create a shared volume group. Use the default field values unless your site has other requirements, or unless you are specifically instructed otherwise. Table 15. smit mkvg options (concurrent, RAID). 3.4.2 Creating Shared LVs a...

  • Page 103

    The journaled file system log (jfslog) is a logical volume that requires a unique name in the cluster. To make sure that logical volumes have unique names, rename the logical volume associated with the file system and the corresponding jfslog logical volu...

  • Page 104

    That is, you enter this command for each disk. In the resulting display, locate the line for the logical volume for which you just added copies. For copies placed on separate disks, the numbers in the logical partitions column and the physical partitions co...

  • Page 105

    The TaskGuide uses a graphical interface to guide you through the steps of adding nodes to an existing volume group. For more information on the TaskGuide, see 3.4.6, “Alternate Method - TaskGuide” on page 90. Importing the Volume Group onto the Destinati...

  • Page 106

    3.4.4.4 Varying Off the Volume Group on the Destination Nodes. Use the varyoffvg command to deactivate the shared volume group so that it can be imported onto another destination node or activated as appropriate by the cluster event scripts. Enter: varyoffvg...

  • Page 107

    command succeeds. If exactly half the copies are available, as with two of four, quorum is not achieved and the varyonvg command fails.
    3.4.5.2 Quorum after vary on
    If a write to a physical volume fails, the VGSAs on the other physical volumes within the ...
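The varyon quorum rule on this page amounts to a strict-majority test: exactly half of the copies is not enough. The Python sketch below is illustrative only. The function name `has_quorum` and the idea of passing plain counts of copies are assumptions made for this example; it does not reproduce the real varyonvg logic.

```python
def has_quorum(available_copies: int, total_copies: int) -> bool:
    """Return True when strictly more than half of the copies are available.

    Hypothetical helper: it models only the majority rule described in the
    text (exactly half fails), not the behavior of any AIX command.
    """
    if total_copies <= 0 or not 0 <= available_copies <= total_copies:
        raise ValueError("need total > 0 and 0 <= available <= total")
    # Integer arithmetic avoids any rounding question at the 50% boundary.
    return 2 * available_copies > total_copies


if __name__ == "__main__":
    # Two of four copies is exactly half, so quorum is not achieved.
    print(has_quorum(2, 4))
    # Three of four copies is a strict majority.
    print(has_quorum(3, 4))
```

Comparing `2 * available` against `total` keeps the "exactly half fails" case explicit without floating-point division.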

  • Page 108

    Forcing a varyon
    A volume group with quorum disabled and one or more physical volumes unavailable can be “forced” to vary on by using the -f flag with the varyonvg command. Forcing a varyon with missing disk resources can cause unpredictable results, includ...

  • Page 109

    Cluster hardware and software preparation 91 conflict with the cluster’s configuration. Online help panels give additional information to aid in each step. 3.4.6.1 taskguide requirements before starting the taskguide, make sure: • you have a configured hacmp cluster in place. • you are on a graphics...

  • Page 111

    Chapter 4. HACMP installation and cluster definition
    This chapter describes issues concerning the actual installation of HACMP version 4.3 and the definition of a cluster and its resources. It concentrates on the HACMP part of the installation, so we will assume AIX is...

  • Page 112

    cluster.base.server.utils HACMP Base Server Utilities
    • cluster.cspoc
    This component includes all of the commands and environment for the C-SPOC utility, the Cluster-Single Point of Control feature. These routines are responsible for centralized administrat...

  • Page 113

    • cluster.vsm
    The Visual Systems Management fileset contains icons and bitmaps for the graphical management of HACMP resources, as well as the xhacmpm command: cluster.vsm HACMP X11 Dependent
    • cluster.haview
    This fileset contains the files for including ...

  • Page 114

    This fileset contains the application heart beat daemon. Oracle Parallel Server is an application that makes use of it: cluster.hc.rte Application Heart Beat Daemon
    The installation of CRM requires the following software: bos.rte.lvm.usr.4.3.2.0 AIX run-tim...

  • Page 115

    Hacmp installation and cluster definition 97 hacmp software to hacmp for aix, version 4.3. The comments on upgrading the operating system are not included. If you are already running aix 4.3, see the special note at the end of this section. 4.1.2.1 upgrading from version 4.1.0 through 4.2.2 to versi...

  • Page 116

    Install HACMP 4.3 for AIX on Node A
    5. After upgrading AIX and verifying that the disks are correctly configured, install the HACMP 4.3 for AIX software on Node A. For a short description of the filesets, please refer to 4.1.1, “First time installs” on page...

  • Page 117

    file on Node A using the following command: /usr/sbin/cluster/utilities/cllsif -x >> /.rhosts
    This command will append information to the /.rhosts file instead of overwriting it. Then, you can ftp this file to the other nodes as necessary.
    12. Verify the cl...

  • Page 118

    100 ibm certification study guide aix hacmp 2. If you wish to save your cluster configuration, see the chapter saving and restoring cluster configurations in the hacmp for aix, version 4.3: administration guide, sc23-4279. 3. Commit your current hacmp for aix software on all nodes. 4. Shut down one ...

  • Page 119

    • The network modules
    You define the cluster topology by entering information about each component into HACMP-specific ODM classes. You enter the HACMP ODM data by using the HACMP SMIT interface or the VSM utility xhacmpm. The xhacmpm utility is an X Wi...

  • Page 120

    102 ibm certification study guide aix hacmp adding or changing a node name after the initial configuration if you want to add or change a node name after the initial configuration, use the change/show cluster node name screen. See the chapter on changing the cluster topology of the hacmp for aix, ve...

  • Page 121

    Network Name: Enter an ASCII text string that identifies the network. The network name can include alphabetic and numeric characters and underscores. Use no more than 31 characters. The network name is arbitrary, but must be used consistently for adapters...
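The naming rule for the Network Name field (letters, digits, and underscores, at most 31 characters) can be checked before the value is typed into SMIT. The following Python sketch is only an illustration: `is_valid_network_name` is a hypothetical helper, not an HACMP utility, and it encodes nothing beyond the rule stated above.

```python
import re

# ASCII letters, digits, and underscores only; 1 to 31 characters.
_NETWORK_NAME = re.compile(r"[A-Za-z0-9_]{1,31}")


def is_valid_network_name(name: str) -> bool:
    """Check a candidate network name against the rule quoted in the text.

    Hypothetical helper for illustration; HACMP itself may apply further
    checks that are not described in this excerpt.
    """
    return _NETWORK_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("ether1", "token_ring_a", "my-net", "x" * 32):
        print(candidate, is_valid_network_name(candidate))
```

Using `fullmatch` rather than `match` ensures that trailing invalid characters, such as a hyphen or a space, reject the whole name.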

  • Page 122

    Adapter Identifier: Enter the IP address in dotted decimal format or a device file name. IP address information is required for non-serial network adapters only if the node’s address cannot be obtained from the domain name server or the local /etc/hosts fil...

  • Page 123

    Hacmp installation and cluster definition 105 adding or changing adapters after the initial configuration if you want to change the information about an adapter after the initial configuration, use the change/show an adapter screen. See the chapter on changing the cluster topology in the hacmp for a...

  • Page 124

    106 ibm certification study guide aix hacmp • slip • sp switch • atm it is highly unlikely that you will add or remove a network module. For information about changing a characteristic of a network module, such as the failure detection rate, see the chapter on changing the cluster topology in the ha...

  • Page 125

    Hacmp installation and cluster definition 107 configuration. If the cluster manager is active on some other cluster nodes but not on the local node, the synchronization operation is aborted. Before attempting to synchronize a cluster configuration, ensure that all nodes are powered on, that the hacm...

  • Page 126

    108 ibm certification study guide aix hacmp 4.3 defining resources the hacmp for aix software provides a highly available environment by identifying a set of cluster-wide resources essential to uninterrupted processing, and then by defining relationships among nodes that ensure these resources are a...

  • Page 127

    Hacmp installation and cluster definition 109 4.3.1.1 configuring resources for resource groups once you have defined resource groups, you further configure them by assigning cluster resources to one resource group or another. You can configure resource groups even if a node is powered down. However...

  • Page 128

    110 ibm certification study guide aix hacmp these settings also have to be synchronized throughout the cluster. Therefore synchronize cluster resources has to be chosen from the corresponding smit menu. If the cluster manager is running on the local node, synchronizing cluster resources triggers a d...

  • Page 129

    Hacmp installation and cluster definition 111 as the path locations for start and stop scripts for the application. These scripts have to be in the same location on every service node. Just as for pre- and post-events, these scripts can be adapted to specific nodes. They don’t need to be equal in co...

  • Page 131

    For cascading resource groups, the failed node is going to reacquire its resources once it is up and running again. So, you have to restart HACMP on it through smitty clstart and check the logfile again, as well as the cluster’s status. Further and mor...

  • Page 132

    114 ibm certification study guide aix hacmp essentially, a snapshot saves all the odm classes hacmp has generated during its configuration. It does not save user customized scripts, such as start or stop scripts for an application server. However, the location and names of these scripts are in an ha...

  • Page 135

    © copyright ibm corp. 1999 117 chapter 5. Cluster customization within an hacmp for aix cluster, there are several things that are customizable. The following paragraphs explain the customizing features for events, error notification, network modules and topology services. 5.1 event customization an...

  • Page 136

    acquire_service_addr: (If configured for IP address takeover.) Configures boot addresses to the corresponding service address, and starts TCP/IP servers and network daemons by running the telinit -a command.
    acquire_takeover_addr: The script checks to see if...

  • Page 137

    Cluster customization 119 event occurs only after a node_up_remote event has successfully completed. Sequence of node_down events node_down this event occurs when a node intentionally leaves the cluster or fails. Depending on whether the exiting node is local or remote, this event initiates either t...

  • Page 138

    node_down_local_complete: Instructs the cluster manager to exit when the local node has left the cluster. This event occurs only after a node_down_local event has successfully completed.
    node_down_remote_complete: Starts takeover application servers. This even...

  • Page 139

    Cluster customization 121 no actions since appropriate actions depend on the local network configuration. 5.1.1.3 network adapter events swap_adapter this event occurs when the service adapter on a node fails. The swap_adapter event exchanges or swaps the ip addresses of the service and a standby ad...

  • Page 140

    reconfig_resource_complete: This event indicates that a cluster resource dynamic reconfiguration has completed.
    5.1.2 Pre- and post-event processing
    To tailor event processing to your environment, specify commands or user-defined scripts that should execute ...

  • Page 141

    Cluster customization 123 for example, a file system cannot be unmounted, because of a process running on it. Then, you might want to kill that process first, before unmounting the file system, in order to get the event script done. Now, since the event script didn’t succeed in its first run, the re...

  • Page 142

    124 ibm certification study guide aix hacmp each time an error is logged in the system error log, the error notification daemon determines if the error log entry matches the selection criteria. If it does, an executable is run. This executable, called a notify method, can range from a simple command...

  • Page 143

    The failure detection rate of networks varies, depending on their characteristics. For example, for an Ethernet, the normal failure detection rate is two keepalives per second; fast is about four per second; slow is about one per second. For an HPS network, because no network traffic...
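The Ethernet figures quoted on this page translate directly into the time between keepalives. The Python sketch below does that arithmetic. The table `KEEPALIVES_PER_SECOND` and the function `keepalive_interval` are names invented for this example, and the rates are the approximate values from the text ("about four", "about one"), not exact HACMP settings.

```python
# Approximate Ethernet keepalive rates taken from the text above.
KEEPALIVES_PER_SECOND = {"slow": 1.0, "normal": 2.0, "fast": 4.0}


def keepalive_interval(setting: str) -> float:
    """Return the seconds between keepalives for a named detection rate.

    Hypothetical helper: it only inverts the rates listed in the text.
    """
    try:
        rate = KEEPALIVES_PER_SECOND[setting]
    except KeyError:
        raise ValueError(f"unknown failure detection rate: {setting!r}") from None
    return 1.0 / rate


if __name__ == "__main__":
    for name in ("slow", "normal", "fast"):
        print(f"{name}: one keepalive every {keepalive_interval(name)} s")
```

A faster setting shortens the interval, so failures are noticed sooner at the cost of more keepalive traffic.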

  • Page 144

    To prevent problems with NFS file systems in an HACMP cluster, make sure that each shared volume group has the same major number on all nodes. The lvlstmajor command lists the free major numbers on a node. Use this command on each node to find a major numb...
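Choosing a major number that is free on every node is a set intersection. The following Python sketch assumes the free major numbers reported on each node have already been collected and converted into sets of integers; it does not parse lvlstmajor output. The function name `lowest_common_free_major` and the node names in the usage example are hypothetical.

```python
from typing import Dict, Optional, Set


def lowest_common_free_major(free_by_node: Dict[str, Set[int]]) -> Optional[int]:
    """Return the smallest major number that is free on every node.

    free_by_node maps a node name to the set of major numbers that are
    free on that node (assumed to be gathered beforehand on each node).
    Returns None when no number is free on all nodes.
    """
    if not free_by_node:
        return None
    common = set.intersection(*free_by_node.values())
    return min(common) if common else None


if __name__ == "__main__":
    # Illustrative values only, not real command output.
    free = {
        "nodeA": {43, 44, 50, 51},
        "nodeB": {44, 45, 50},
    }
    print(lowest_common_free_major(free))
```

Picking the lowest common value is just one convention; any number in the intersection satisfies the requirement that the shared volume group use the same major number on all nodes.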

  • Page 145

    Cluster customization 127 figure 14. Nfs cross mounts when node a fails, node b uses the cl_nfskill utility to close open files in node a:/afs, unmounts it, mounts it locally, and re-exports it to waiting clients. After takeover, node b has: /bfs locally mounted /bfs nfs-exported /afs locally mounte...

  • Page 146

    128 ibm certification study guide aix hacmp • ensure that node name and the service adapter label are the same on each node in the cluster or • alias the node name to the service adapter label in the /etc/hosts file. 5.4.5 cross mounted nfs file systems and the network lock manager if an nfs client ...

  • Page 147

    Cluster customization 129 ######## add for nfs lock removal (start) ######## ######## add for nfs lock removal (finish) ######## ############################################################################### # # name: cl_deactivate_nfs # # given a list of nfs-mounted filesystems, we try and unmount...

  • Page 148

    fi
    /bin/rm -f /etc/sm.bak/$host
    /bin/rm -f /etc/sm/$host
    /bin/rm -f /etc/state
    fi
    ######## add for NFS lock removal (finish) ########
    # Send a SIGKILL to all processes having open file
    # descriptors within this logical volume to allow
    # the unmount to succ...

  • Page 149

    Chapter 6. Cluster testing
    Before you start to test the HACMP configuration, you need to guarantee that your cluster nodes are in a stable state. Check the state of the:
    • Devices
    • System parameters
    • Processes
    • Network adapters
    • LVM
    • Cluster
    • Other items such as ...

  • Page 150

    132 ibm certification study guide aix hacmp 6.1.2 system parameters • type date on all nodes to check that all the nodes in the cluster are running with their clocks on the same time. • ensure that the number of user licenses has been correctly set (lslicense ). • check high water mark and other sys...

  • Page 151

    Cluster testing 133 • check that all interfaces communicate ( ping or ping -r ). • list the arp table entries with arp -a . • check the status of the tcp/ip daemons ( lssrc -g tcpip ). • ensure that there are no bad entries in the /etc/hosts file, especially at the bottom of the file. • verify that,...

  • Page 152

    • Verify the cluster configuration by running /usr/sbin/cluster/diag/clconfig -v '-tr'.
    • To show cluster configuration, run: /usr/sbin/cluster/utilities/cllscf.
    • To show the clstrmgr version, type: snmpinfo -m dump -o /usr/sbin/cluster/hacmp.defs clstr...

  • Page 153

    • Use ifconfig to swap the service address back to the original service interface (ifconfig en1 down). This will cause the service IP address to fail over back to the service adapter on NodeF.
    6.2.1.2 Ethernet or Token Ring adapter or cable failure
    Perform the following ste...

  • Page 154

    136 ibm certification study guide aix hacmp • generate the switch error in the error log which is being monitored by hacmp error notification (for configuration see 2.6.2.1, “single point-of-failure hardware component recovery” on page 46), or, if the network_down event has been customized, bring do...

  • Page 155

    Cluster testing 137 • verify that all sharedvg file systems and paging spaces are accessible ( df -k and lsps -a ). 6.2.2 node failure / reintegration the following sections deal with issues of node failure and reintegration. 6.2.2.1 aix crash perform the following steps in the event of an aix crash...

  • Page 156

    138 ibm certification study guide aix hacmp • verify that failover has occurred ( netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -u > for application processes). • power cycle nodef. If hacmp is not configured to start from /etc/inittab (on restart), start ...

  • Page 157

    Cluster testing 139 • monitor the cluster log files on nodet. • disconnect the network cable from the appropriate service and all the standby interfaces at the same time (but not the administrative sp ethernet) on nodef. This will cause hacmp to detect a network_down event. • hacmp triggers events d...

  • Page 158

    140 ibm certification study guide aix hacmp • reconnect hdisk0, close the casing, and turn the key to normal mode. • power on nodef then verify that the rootvg logical volumes are no longer stale ( lsvg -l rootvg ). 6.2.4.2 7135 disk failure perform the following steps in the event of a disk failure...

  • Page 159

    Cluster testing 141 • monitor cluster logfiles on nodet if hacmp has been customized to monitor 7133 disk failures. • since the 7133 disk is hot pluggable, remove a disk from drawer 1 associated with nodef's shared volume group. • the failure of the 7133 disk will be detected in the error log ( errp...

  • Page 161

    © copyright ibm corp. 1999 143 chapter 7. Cluster troubleshooting typically, a functioning hacmp cluster requires minimal intervention. If a problem occurs, however, diagnostic and recovery skills are essential. Thus, troubleshooting requires that you identify the problem quickly and apply your unde...

  • Page 162

    144 ibm certification study guide aix hacmp for a more detailed description of the cluster log files consult chapter 2 of the hacmp for aix, version 4.3: troubleshooting guide, sc23-4280. 7.2 config_too_long if the cluster manager recognizes a state change in the cluster, it acts upon it by executin...

  • Page 163

    hang. After a certain amount of time, by default 360 seconds, the cluster manager will issue a config_too_long message into the /tmp/hacmp.out file. The message issued looks like this: The cluster has been in reconfiguration too long;something may be wrong. In most cases,...
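A simple way to spot this condition is to scan the log text for the message quoted above. The Python sketch below is a hedged illustration: `find_config_too_long` is a hypothetical helper, it matches only the phrase "reconfiguration too long" taken from the quoted message, and the sample lines in the usage example are invented for demonstration rather than copied from a real /tmp/hacmp.out file.

```python
from typing import Iterable, List

# Phrase taken from the message quoted in the text; matched case-insensitively.
_MARKER = "reconfiguration too long"


def find_config_too_long(lines: Iterable[str]) -> List[int]:
    """Return the 1-based line numbers that contain the warning phrase.

    Hypothetical helper: it does simple substring matching and knows
    nothing else about the log format.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if _MARKER in line.lower()
    ]


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "event script started",
        "The cluster has been in reconfiguration too long;something may be wrong.",
        "event script still running",
    ]
    print(find_config_too_long(sample))
```

To apply it to a real log, the lines could be read with `open(path)` and passed in directly, since a file object is an iterable of lines.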

  • Page 164

    146 ibm certification study guide aix hacmp 7.3.1 tuning the system using i/o pacing use i/o pacing to tune the system so that system resources are distributed more equitably during large disk writes. Enabling i/o pacing is required for an hacmp cluster to behave correctly during large disk writes, ...

  • Page 165

    Cluster troubleshooting 147 7.3.4 changing the failure detection rate use the smit change/show a cluster network module screen to change the failure detection rate for your network module only if enabling i/o pacing or extending the syncd frequency did not resolve deadman problems in your cluster. B...

  • Page 166

    148 ibm certification study guide aix hacmp and control messages so that the cluster manager has accurate information about the status of its partner. When a cluster becomes partitioned, and the network problem is cleared after the point when takeover processing has begun so that keepalive packets s...

  • Page 167

    7.6 User ID problems
    Within an HACMP cluster, you always have more than one node potentially offering the same service to a specific user or a specific user ID. As the node providing the service can change, the system administrator has to ensure that the same user and gro...
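Checking that IDs agree across nodes is a straightforward comparison once the per-node data has been collected. The Python sketch below assumes each node's user-to-ID mapping has already been gathered (for example, from that node's user database) into a dictionary. The function name `find_id_mismatches` and the node and user names in the example are hypothetical. The same function can be used for group IDs.

```python
from typing import Dict, Optional


def find_id_mismatches(
    ids_by_node: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Report names whose numeric ID differs, or is missing, on some node.

    ids_by_node maps node name -> {user or group name: numeric ID}.
    The result maps each inconsistent name to {node: ID or None}.
    """
    names = set()
    for mapping in ids_by_node.values():
        names.update(mapping)

    mismatches: Dict[str, Dict[str, Optional[int]]] = {}
    for name in sorted(names):
        seen = {node: mapping.get(name) for node, mapping in ids_by_node.items()}
        # More than one distinct value means a differing or missing ID.
        if len(set(seen.values())) > 1:
            mismatches[name] = seen
    return mismatches


if __name__ == "__main__":
    # Illustrative data only.
    nodes = {
        "nodeA": {"dbadmin": 205, "appuser": 300},
        "nodeB": {"dbadmin": 205, "appuser": 301},
    }
    print(find_id_mismatches(nodes))
```

An empty result means every listed name has the same ID on every node, which is the state the administrator has to maintain.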

  • Page 168

    150 ibm certification study guide aix hacmp • go from the simple to the complex. Make the simple tests first. Do not try anything complex and complicated until you have ruled out the simple and obvious. • do not make more than one change at a time. If you do, and one of the changes corrects the prob...

  • Page 169

    © copyright ibm corp. 1999 151 chapter 8. Cluster management and administration this chapter covers all aspects of monitoring and managing an existing hacmp cluster. This includes a description of the different monitoring methods and tools available, how to start and stop the cluster, changing clust...

  • Page 170

    152 ibm certification study guide aix hacmp consult the hacmp for aix, version 4.3: troubleshooting guide, sc23-4280, for help if you detect a problem with an hacmp cluster. 8.1.1 the clstat command hacmp for aix provides the /usr/sbin/cluster/clstat command for monitoring a cluster and its componen...

  • Page 171

    Cluster management and administration 153 more details on how to configure haview and on how to monitor your cluster with haview can be found in chapter 3, “monitoring an hacmp cluster” in hacmp for aix, version 4.3: administration guide, sc23-4279. 8.1.3 cluster log files hacmp for aix writes the m...

  • Page 172

    8.1.3.5 /tmp/cm.log
    Contains timestamped, formatted messages generated by HACMP for AIX clstrmgr activity. This file is typically used by IBM support personnel.
    8.1.3.6 /tmp/cspoc.log
    Contains timestamped, formatted messages generated by HACMP for AIX C-SP...

  • Page 173

    Cluster management and administration 155 (c-spoc) utility can be used to start and stop cluster services on all nodes in cluster environments. Starting cluster services refers to the process of starting the hacmp for aix daemons that enable the coordination required between nodes in a cluster. Star...

  • Page 174

    8.2.1.4 Cluster information program daemon (clinfo)
    This daemon provides status information about the cluster to cluster nodes and clients and invokes the /usr/sbin/cluster/etc/clinfo.rc script in response to a cluster event. The clinfo daemon is optional ...

  • Page 175

    Cluster management and administration 157 are started in sequential order - not in parallel. The output of the command run on the remote node is returned to the originating node. Because the command is executed remotely, there can be a delay before the command output is returned. 8.2.2.1 automatical...

  • Page 176

    158 ibm certification study guide aix hacmp node. Because the command is executed remotely, there can be a delay before the command output is returned. 8.2.3.1 when to stop cluster services you typically stop cluster services in the following situations: • before making any hardware or software chan...

  • Page 177

    prevents unpredictable behavior from corrupting the data on the shared disks. See the clexit.rc man page for additional information.
    8.2.4 Starting and stopping cluster services on clients
    Use the /usr/sbin/cluster/etc/rc.cluster script or the startsrc comma...

  • Page 178

    160 ibm certification study guide aix hacmp 8.3 replacing failed components from time to time, it will be necessary to perform hardware maintenance or upgrades on cluster components. Some replacements or upgrades can be performed while the cluster is operative, while others require planned downtime....

  • Page 179

    Cluster management and administration 161 • the new adapter must be of the same type or a compatible type as the replaced adapter. • when replacing or adding an scsi adapter, remove the resistors for shared buses. Furthermore, set the scsi id of the adapter to a value different than 7. 8.3.3 disks d...

  • Page 180

    4. Logically remove the disk from the system (rmdev -l hdiskX -d; rmdev -l pdiskY -d if an SSA disk) on all nodes.
    5. Physically remove the failed disk and replace it with a new disk.
    6. Add the disk to the ODM (mkdev or cfgmgr) on all nodes.
    7. Add the d...

  • Page 181

    Cluster management and administration 163 8.4 changing shared lvm components changes to vg constructs are probably the most frequent kind of changes to be performed in a cluster. As a system administrator of an hacmp for aix cluster, you may be called upon to perform any of the following lvm-related...

  • Page 182

    164 ibm certification study guide aix hacmp when changing shared lvm components manually, you will usually need to run through the following procedure: 1. Stop hacmp on the node owning the shared volume group (sometimes a stop of the applications using the shared volume group may be sufficient). 2. ...

  • Page 183

    Cluster management and administration 165 lazy update has some limitations, which you need to consider when you rely on lazy update in general: • if the first disk in a sharedvg has been replaced, the importvg command will fail as lazy update expects to be able to match the hdisk number for the firs...

  • Page 184

    166 ibm certification study guide aix hacmp • shared volume groups • list all volume groups in the cluster. • import a volume group (with hacmp 4.3 only). • extend a volume group (with hacmp 4.3 only). • reduce a volume group (with hacmp 4.3 only). • mirror a volume group (with hacmp 4.3 only). • un...

  • Page 185

    Cluster management and administration 167 to use the smit shortcuts to c-spoc, type smit cl_lvm or smit cl_conlvm for concurrent volume groups. Concurrent volume groups must be varied on in concurrent mode to perform tasks. 8.4.4 taskguide the taskguide is a graphical interface that simplifies the t...

  • Page 186

    168 ibm certification study guide aix hacmp to change the nodes associated with a given resource group, or to change the priorities assigned to the nodes in a resource group chain, you must redefine the resource group. You must also redefine the resource group if you add or change a resource assigne...

  • Page 187

    Cluster management and administration 169 • if the cluster manager is active on the local node, synchronization triggers a cluster-wide, dynamic reconfiguration event. In dynamic reconfiguration, the configuration data stored in the dcd is updated on each cluster node, and, in addition, the new odm ...

  • Page 188

    170 ibm certification study guide aix hacmp 8.5.3.1 resource migration types before performing a resource migration, decide if you will declare the migration sticky or non-sticky . Sticky resource migration a sticky migration permanently attaches a resource group to a specified node. The resource gr...

  • Page 189

    Cluster management and administration 171 inactive_takeover flag set to false and has not yet started because its primary node is down. In general, however, only rotating resource groups should be migrated in a non-sticky manner. Such migrations are one-time events and occur similar to normal rotati...

  • Page 190

    172 ibm certification study guide aix hacmp if you do not include a location specifier in the location field, the dare resource migration utility performs a default migration, again making the resources available for reacquisition. Stop location the second special location keyword, stop , causes a r...

  • Page 191

    Cluster management and administration 173 note that you cannot add nodes to the resource group list with the dare resource migration utility. This task is performed through smit. Stopping resource groups if the location field of a migration contains the keyword stop instead of an actual nodename, th...

  • Page 192

    174 ibm certification study guide aix hacmp be aware that persistent sticky location markers are saved and restored in cluster snapshots. You can use the clfindres command to find out if sticky markers are present in a resource group. If you want to remove sticky location markers while the cluster i...

  • Page 193

    Cluster management and administration 175 5. Restart the hacmp for aix software on the node using the smit clstart fastpath and verify that the node successfully joined the cluster. 6. Repeat steps 1 through 5 on the remaining cluster nodes. Figure 15 below shows the procedure: figure 15. Applying a...

  • Page 194

    176 ibm certification study guide aix hacmp • cluster nodes should be running the same hacmp maintenance levels. There might be incompatibilities between various maintenance levels of hacmp, so you must ensure that consistent levels are maintained across all cluster nodes. The cluster must be taken ...

  • Page 195

    Cluster management and administration 177 8.7.1.1 how to do a split-mirror backup this same procedure can be used with just one mirrored copy of a logical volume. If you remove a mirrored copy of a logical volume (and file system), and then create a new logical volume (and file system) using the all...

  • Page 196

    178 ibm certification study guide aix hacmp 9. After the backup is complete and verified, unmount and delete the new file system and the logical volume you used for it. 10.Use the mklvcopy command to add back the logical volume copy you previously split off to the fslv logical volume. 11.Resynchroni...

  • Page 197

    they don’t match, the user won’t get anything done after a failover has happened. So, the administrator has to keep definitions equal throughout the cluster. Fortunately, the C-SPOC utility, as of HACMP version 4.3, does this for you. When you create a...

  • Page 198

    180 ibm certification study guide aix hacmp to add a user on one or more nodes in a cluster, you can either use the aix mkuser command in a rsh to one clusternode after the other, or use the c-spoc cl_mkuser command or the add a user to the cluster smit screen. The cl_mkuser command calls the aix mk...

  • Page 199

    To remove a user account from one or more cluster nodes, you can either use the AIX rmuser command on one cluster node after the other, or use the C-SPOC cl_rmuser command or the C-SPOC Remove a User from the Cluster SMIT screen. The cl_rmuser command execute...

  • Page 201

    Chapter 9. Special RS/6000 SP topics
    This chapter will introduce you to some special topics that only apply if you are running HACMP on the SP system.
    9.1 High Availability Control Workstation (HACWS)
    If you are thinking about what could happen to your SP whenever the ...

  • Page 202

    need to have the frame supervisors support dual tty lines in order to get both control workstations connected at the same time. Contact your IBM representative for the necessary hardware (see Figure 16 on page 184). Both the tty network and the RS/6000 SP...

  • Page 203

    Special rs/6000 sp topics 185 the backup cws has to be installed with the same level of aix and pssp. Depending on the kerberos configuration of the primary cws, the backup cws has to be configured either as a secondary authentication server for the authentication realm of your rs/6000 sp when the p...

  • Page 204

    186 ibm certification study guide aix hacmp ordinary hacmp cluster, as it is described in chapter 7 of the hacmp for aix, version 4.3: installation guide, sc23-4278. Now the cluster environment has to be configured. Define a cluster id and name for your hacws cluster and define the two nodes to hacm...

  • Page 205

    Special rs/6000 sp topics 187 after that, identify the hacws event scripts to hacmp by executing the /usr/sbin/hacws/spcw_addevents command, and verify the configuration with the /usr/sbin/hacws/hacws_verify command. You should also check the cabling from the backup cws with the /usr/sbin/hacws/spcw...

  • Page 206

    188 ibm certification study guide aix hacmp the following is simply a shortened description on how kerberos works. For more details, the redbook inside the rs/6000 sp, sg24-5145, covers the subject in much more detail. When dealing with authentication and kerberos, three entities are involved: the c...

  • Page 207

    Special rs/6000 sp topics 189 allow the clients to get service tickets to be used with other servers without the need to give them the password every time they request services. So, given a user has a ticket-granting ticket, if a user requests a kerberized service, he has to get a service ticket for...

  • Page 208

    190 ibm certification study guide aix hacmp after setting the cluster’s security settings to enhanced for all these nodes, you can verify that it is working as expected, for example, by running clverify, which goes out to the nodes and checks the consistency of files. 9.3 vsds - rvsds vsds (virtual ...

  • Page 209

    Special rs/6000 sp topics 191 with reference to figure 17 above, imagine two nodes, node x and node y, running the same application. The nodes are connected by the switch and have locally-attached disks. On node x’s disk resides a volume group containing the raw logical volume lv_x. Similarly, node ...

  • Page 210

    192 ibm certification study guide aix hacmp the vsds in this scenario are mapped to the raw logical volumes lv_x and lv_y. Node x is a client of node y’s vsd, and vice versa. Node x is also a direct client of its own vsd (lv_x), and node y is a direct client of vsd lv_y. Vsd configuration is flexibl...

  • Page 211

    Special rs/6000 sp topics 193 impact of servicing a local i/o request through vsd relative to the normal vmm/lvm pathway is very small. Ibm supports any ip network for vsd, but we recommend the switch for performance. Vsd provides distributed data access, but not a locking mechanism to preserve data...

  • Page 212

    operation that was in progress, as well as new I/O operations against rvsd_x, are suspended until failover is complete. When Node X is repaired and rebooted, RVSD switches the rvsd_x back to its primary, Node X. The RVSD subsystems are shown in Figure 20 o...
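
The failover behavior described here can be sketched as a small state model. It is a toy under stated assumptions: the class `ToyRVSD` and its method names are hypothetical and are not RVSD commands or APIs, and the choice of Node Y as the secondary server is an assumption based on the two-node scenario above. The sketch only shows that I/O against the disk is held while the server changes, and that the disk returns to its primary once that node is back.

```python
class ToyRVSD:
    """Toy model of one recoverable virtual shared disk (hypothetical names)."""

    def __init__(self, name, primary, secondary):
        self.name = name
        self.primary = primary
        self.secondary = secondary
        self.server = primary    # node currently serving the disk
        self.suspended = False   # True while a change of server is in progress
        self.pending = []        # I/O requests held back during the change

    def submit(self, request):
        """Serve the request now, or hold it while failover is in progress."""
        if self.suspended:
            self.pending.append(request)
            return None
        return (self.server, request)

    def begin_failover(self):
        """The primary has failed: suspend I/O until failover is complete."""
        self.suspended = True

    def complete_failover(self):
        """The secondary takes over and the held requests are served in order."""
        self.server = self.secondary
        return self._resume()

    def primary_repaired(self):
        """The primary is repaired and rebooted: switch the disk back to it."""
        self.server = self.primary
        return self._resume()

    def _resume(self):
        self.suspended = False
        served = [(self.server, request) for request in self.pending]
        self.pending = []
        return served


disk = ToyRVSD("rvsd_x", primary="node_x", secondary="node_y")
before = disk.submit("write-1")      # served by node_x
disk.begin_failover()
held = disk.submit("write-2")        # held, nothing is served yet
after = disk.complete_failover()     # node_y serves the held request
disk.primary_repaired()              # rvsd_x returns to node_x
```

The point of the design is that callers never see an error during the switch; their requests simply wait until a server is available again.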

  • Page 213

    9.4 SP Switch as an HACMP network One of the fascinating things about an RS/6000 SP is the switch network. It has developed over time, so there are currently two types of switches at customer sites. The “older” HPS or HiPS Switch (High Performance Switch), also known as...

  • Page 214

    9.4.2 Eprimary management The SP Switch has an internal primary/backup concept, where the primary node, known as the Eprimary, is backed up automatically by a backup node. So, in case any serious failure happens on the primary, it will resign from work, an...

  • Page 215

    In case this node was the Eprimary node on the switch network, and it is an SP Switch, then the RS/6000 SP software would have chosen a new Eprimary independently from the HACMP software as well.

  • Page 217

    Chapter 10. HACMP Classic vs. HACMP/ES vs. HANFS So, why would you prefer to install one version of HACMP instead of another? This chapter summarizes the differences between them, to give you an idea of which one best matches your needs in a given situation. The cert...

  • Page 218

    handling membership and event management by using heartbeats. On the SP, the original high availability infrastructure was built on this technology, and HACMP/ES Version 4.3 is now another instance relying on it. As of AIX 4.3.2 and PSSP 3.1, the high ava...

  • Page 219

    See Part 4 of HACMP for AIX, Version 4.3: Enhanced Scalability Installation and Administration Guide, SC23-4284, for more information on these services. 10.2.2 Enhanced cluster security With HACMP Version 4.3 comes an option to switch security mode between st...

  • Page 220

    10.4 Similarities and differences All three products have the basic structure in common. They all use the same concepts and structures. So, a cluster or a network, in the HACMP context, is the same, no matter what product is being used. There is always a c...

  • Page 221

    For switchless RS/6000 SP systems or SPs with the newer SP Switch, the decision will be based on a more functional level. Event management is much more flexible in HACMP/ES, since you can define custom events. These events can act on anything that haemd can d...

  • Page 223

    Appendix A. Special notices This publication is intended to help system administrators, system engineers and other system professionals to pass the IBM HACMP certification exam. The information in this publication is not intended as the specification for any of the fol...

  • Page 224

    been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. Any pointers in ...

  • Page 225

    Java and HotJava are trademarks of Sun Microsystems, Incorporated. Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation. PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation unde...

  • Page 227

    Appendix B. Related publications The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook. B.1 International Technical Support Organization publications For information on ordering...

  • Page 228

    B.3 Other publications These publications are also relevant as additional sources of information: • IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment, GA22-7281 • IBM PSSP for AIX: Installation and Migration Guide, GA22-7347 ...

  • Page 229

    How to get ITSO redbooks This section explains how both customers and IBM employees can find out about ITSO redbooks, CD-ROMs, workshops, and residencies. A form for ordering books and CD-ROMs is also provided. This information was current at the time of publication, b...

  • Page 230

    How customers can get ITSO redbooks Customers may request ITSO deliverables (redbooks, BookManager books, and CD-ROMs) and information about redbooks, workshops, and residencies in the following ways: • Online orders – send orders to: • Telephone orders • ...

  • Page 233

    List of abbreviations AIX: Advanced Interactive Executive. APA: All Points Addressable. APAR: Authorized Program Analysis Report (the description of a problem to be fixed by IBM defect support; this fix is delivered in a PTF, see below). ARP: Address Resolution Protocol. ASCII...

  • Page 234

    NetBIOS: Network Basic Input/Output System. NFS: Network File System. NIM: Network Interface Module (this is the definition of NIM in the HACMP context; NIM in the AIX 4.1 context stands for Network Installation Manager). NIS: Network Information Service. NVRAM n...

  • Page 235

    Index Symbols /.rhosts file editing 59 /etc/hosts file and adapter label 38 /sbin/rc.boot file 146 /usr/sbin/cluster/godm daemon 59 A abbreviations 215 abnormal termination 158 acronyms 215 adapter failure 134 adapter function 38 adapter hardware address 104 adapter id...

  • Page 236

    DGSP message 148 disk capacities 19 disk failure 139 dual-network 36 dynamic reconfiguration 169 E editing /.rhosts file 59 emsvcsd 156 enhanced cluster security 201 Eprimary 196 error notification 45, 123 Ethernet 13 event customization 44, 117 event em...

  • Page 237

    network topology 35 networks point-to-point 36 NFS mounting filesystems 126 takeover issues 126 NFS cross mount 41 NFS exports 41 NFS mount 41 NIM 199 NIS 58 node events 117 node failure / reintegration 137 node isolation 147 node relationships 108 non-concurrent access quorum 90 non-sticky reso...

  • Page 238

    Token-Ring 13 topology service 200 topsvcsd 156 U upgrading 96 user accounts adding 179 changing 180 creating 179 removing 180 user and group IDs 48 V VGDA 88 VGSA 88 Virtual Shared Disk (VSDs) 190 X xhacmpm 101

  • Page 239

    ITSO redbook evaluation IBM Certification Study Guide AIX HACMP SG24-5131-00 Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete this questionnaire and return it using one of the following methods: • Use the online evaluati...

  • Page 240

    Printed in the U.S.A. SG24-5131-00 IBM Certification Study Guide AIX HACMP SG24-5131-00