Summary of Storwize V7000

  • Page 1

    IBM Storwize V7000 Version 6.3.0 Troubleshooting, Recovery, and Maintenance Guide GC27-2291-02.

  • Page 2

    Note: Before using this information and the product it supports, read the general information in “Notices” on page 143, the information in the “Safety and environmental notices” on page ix, as well as the information in the IBM Environmental Notices and User Guide on the documentation DVD. This editi...

  • Page 3: Contents

    Contents: Figures, v; Tables, vii; Safety and environmental notices, ix; Sound pressure, ix; About this guide, xi; Who should use this guide, xi; Summary of changes for GC27-2291-02...

  • Page 4

    Procedure: Finding the status of the Ethernet connections, 55; Procedure: Removing system data from a node canister, 56; Procedure: Deleting a system completely, 56; Procedure: Fixing node errors, 56; Procedure: Changing the...

  • Page 5: Figures

    Figures: 1. 12 drives on either 2076-112 or 2076-312, 2; 2. 24 drives on either 2076-124 or 2076-324, 2; 3. LED indicators on a single 3.5" drive, 3; 4. LED indicators on a single 2.5" drive, 3; 5. 12 drives and two end caps, 4; 6. Left enclosure end cap...

  • Page 6

    vi Storwize V7000: Troubleshooting, Recovery, and Maintenance Guide

  • Page 7: Tables

    Tables: 1. Storwize V7000 library, xiii; 2. Other IBM publications, xv; 3. IBM documentation and related websites, xv; 4. Drive LEDs, 3; 5. LED descriptions, 5; 6. Power supply unit LEDs in the rear of the control enclosure...

  • Page 9

    Safety and environmental notices: Review the multilingual safety notices for the IBM® Storwize® V7000 system before you install and use the product. Suitability for telecommunication environment: This product is not intended to connect directly or indirectly by any means whatsoever to interfaces of...

  • Page 11: About This Guide

    About this guide: This guide describes how to service, maintain, and troubleshoot the IBM Storwize V7000. The chapters that follow introduce you to the hardware components and to the tools that assist you in troubleshooting and servicing the Storwize V7000, such as the management GUI and the service ...

  • Page 12

    New information: This topic describes the changes to this guide since the previous edition, GC27-2291-00. The following sections summarize the changes that have since been implemented from the previous version. This version includes the following new information: support statements for the 2076-312...

  • Page 13

    publib.boulder.ibm.com/infocenter/storwize/ic/index.jsp. Storwize V7000 library: Unless otherwise noted, the publications in the Storwize V7000 library are available in Adobe Portable Document Format (PDF) from the following website: Support for Storwize V7000 website at www.ibm.com/storage/support/st...

  • Page 14

    Table 1. Storwize V7000 library (continued). Columns: Title, Description, Order number. IBM Storwize V7000 Safety Notices: This guide contains translated caution and danger statements. Each caution and danger statement in the Storwize V7000 documentation has a number that you can use to locate the corresponding s...

  • Page 15

    Table 2. Other IBM publications. Columns: Title, Description, Order number. IBM Storage Management Pack for Microsoft System Center Operations Manager User Guide: This guide describes how to install, configure, and use the IBM Storage Management Pack for Microsoft System Center Operations Manager (SCOM). GC27-390...

  • Page 16

    To submit any comments about this book or any other Storwize V7000 documentation: Go to the feedback page on the website for the Storwize V7000 Information Center at publib.boulder.ibm.com/infocenter/storwize/ic/index.jsp?topic=/com.ibm.storwize.v7000.doc/feedback.htm. There you can use the feedb...

  • Page 17

    Chapter 1. Storwize V7000 hardware components: A Storwize V7000 system consists of one or more machine type 2076 rack-mounted enclosures. There are several model types. The main differences among the model types are the following items: the number of drives that an enclosure can hold. Drives are lo...

  • Page 18

    The number of ports at the rear of the enclosure: control enclosures have Ethernet ports, Fibre Channel ports, and USB ports; expansion enclosures do not have any of these ports. The number of LEDs on the power supplies: control enclosure power supplies have six; expansion enclosure power suppli...

  • Page 19

    Callouts: 1 Fault LED; 2 Activity LED. Table 4 shows the status descriptions for the two LEDs. Table 4. Drive LEDs (columns: Name, Description, Color). Activity: Indicates if the drive is ready or active. If the LED is on, the drive is ready to be used. If the LED is off, the drive is not ready. If the LED is flashing, ...

  • Page 20

    Enclosure end cap indicators: This topic describes the indicators on the enclosure end cap. Figure 5 shows where the end caps are located on the front of an enclosure with 12 drives. The end caps are located in the same position for an enclosure with 24 drives. Callouts: 1 Left end cap; 2 Drives; 3 Right e...

  • Page 21

    Table 5. LED descriptions (columns: Name, Description, Color, Symbol). Power (1): The power LED is the upper LED. When the green LED is lit, it indicates that the main power is available to the enclosure. Color: green. Fault (2): The fault LED is the middle LED. When the amber LED is lit, it indicates that one of the enclosure c...

  • Page 22

    Callouts: 1 Power supply unit 1; 2 Power supply unit 2; 3 Canister 1; 4 Canister 2. Power supply unit and battery for the control enclosure: The control enclosure contains two power supply units, each with an integrated battery. The two power supply units in the enclosure are installed with one unit top side up an...

  • Page 23

    Table 6 identifies the LEDs in the rear of the control enclosure. Table 6. Power supply unit LEDs in the rear of the control enclosure (name: color): AC power failure: amber; Power supply OK: green; Fan failure: amber; DC power failure: amber; Battery failure: amber (symbol + -); Battery state: green (symbol + -). See “proced...

  • Page 24

    There is a power switch on each of the power supply units. The switch must be on for the power supply unit to be operational. If the power switches are turned off, the power supply units stop providing power to the system. Figure 11 shows the locations of the LEDs (1) in the rear of the power supply u...

  • Page 25

    Each node canister has four Fibre Channel ports located on the left side of the canister, as shown in Figure 12. The ports are in two rows of two ports and are numbered 1 - 4 from left to right and top to bottom. Note: The reference to the left and right locations applies to canister 1, which ...
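
    The numbering rule can be checked with a small Python sketch. On canister 1, ports run left to right and then top to bottom, so port 1 is top-left and port 4 is bottom-right. The helper name `fc_port_position` is hypothetical. Modeling canister 2 as rotated 180 degrees is an assumption, based on the guide's notes that port locations are inverted for the lower canister.

```python
def fc_port_position(port: int, canister: int) -> tuple:
    """Return (row, column) of a Fibre Channel port as seen from the rear.

    Hypothetical helper. Canister 1 (upper) follows the guide's numbering:
    left to right, then top to bottom. Canister 2 (lower) is ASSUMED to be
    mounted inverted, i.e. rotated 180 degrees, so both row and column flip.
    """
    if port not in (1, 2, 3, 4):
        raise ValueError("port must be 1-4")
    if canister not in (1, 2):
        raise ValueError("canister must be 1 or 2")
    row = "top" if port <= 2 else "bottom"
    col = "left" if port % 2 == 1 else "right"
    if canister == 2:
        # Assumption: "inverted" means a 180-degree rotation of the canister.
        row = "bottom" if row == "top" else "top"
        col = "right" if col == "left" else "left"
    return row, col


for canister in (1, 2):
    for port in (1, 2, 3, 4):
        print(f"canister {canister} port {port}: {fc_port_position(port, canister)}")
```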

  • Page 26

    Table 8. Fibre Channel port LED locations on canister 1
    Associated port | LED location | LED status
    Port 3 (3) | First LED between ports 1 and 3 (1) | Speed
    Port 1 (1) | Second LED between ports 1 and 3 (2) | Speed
    Port 3 (3) | Third LED between ports 1 and 3 (3) | Link
    Port 1 (1) | Fourth LED between ports 1 and 3 (4) | Link
    Port 4 (4)...

  • Page 27

    The USB ports are numbered 1 on the left and 2 on the right, as shown in Figure 14. One port is used during installation. Note: The reference to the left and right locations applies to canister 1, which is the upper canister. The port locations are inverted for canister 2, which is the lower canister...

  • Page 28

    Table 10 provides a description of the two LEDs. Table 10. 1 Gbps Ethernet port LEDs (columns: Name, Description, Color). Link speed (LED on right of upper canister): The LED is on when there is a link connection; otherwise, the LED is off. Color: green. Activity (LED on left of upper canister): The LED is flashing when th...

  • Page 29

    Table 11. 10 Gbps Ethernet port LEDs (columns: Name, Description, Color). Link speed: The LED is on when there is a link connection; otherwise, the LED is off. Color: amber. Activity: The LED is flashing when there is activity on the link; otherwise, the LED is off. Color: green. Node canister SAS ports and indicators: Two serial-a...

  • Page 30

    Node canister LEDs: Each node canister has three LEDs that provide status and identification for the node canister. The three LEDs are located in a horizontal row near the upper right of the canister (1). Figure 18 shows the rear view of the node canister LEDs. Note: The reference to the left and right...

  • Page 31

    Table 13. Node canister LEDs (continued) (columns: Name, Description, Color, Symbol). Fault: Indicates if a fault is present and identifies which canister. The on status indicates that the node is in service state or that an error exists that might be stopping the software from starting. Do not assume that this status...

  • Page 32

    The SAS ports are numbered 1 on the left and 2 on the right, as shown in Figure 19. Use of port 1 is required. Use of port 2 is optional. Each port connects four data channels. Note: The reference to the left and right locations applies to canister 1, which is the upper canister. The port locations a...

  • Page 33

    Table 15. Expansion canister LEDs (columns: Name, Description, Color, Symbol). Status: Indicates if the canister is active. If the LED is on, the canister is active. If the LED is off, the canister is not active. If the LED is flashing, there is a vital product data (VPD) error. Color: green. Fault: Indicates if a fau...

  • Page 35

    Chapter 2. Best practices for troubleshooting: Troubleshooting is easier when you take advantage of certain configuration options and ensure that you have recorded the vital information that is required to access your system. Record access information: It is important that anyone who has responsibility...

  • Page 36

    Follow power management procedures: Access to your volume data can be lost if you incorrectly power off all or part of a system. Use the management GUI or the CLI commands to power off a system. Using either of these methods ensures that the data that is cached in the node canister memory is correctl...

  • Page 37

    Rather than reporting a problem, an email is sent to IBM that describes your system hardware and critical configuration information. Object names and other information, such as IP addresses, are not sent. The inventory email is sent on a regular basis. Based on the information that is received, IBM ...

  • Page 38

    The release notes provide information about new function in a release plus any issues that have been resolved. Update your code regularly if the release notes indicate an issue that you might be exposed to. Keep your records up to date: Record the location information for your enclosures. If you have...

  • Page 39

    Chapter 3. Understanding the Storwize V7000 battery operation for the control enclosure: Storwize V7000 node canisters cache volume data and hold state information in volatile memory. If the power fails, the cache and state data is written to a local solid-state drive (SSD) that is held within the ca...

  • Page 40

    Completed charging, then the system starts in service state and does not permit I/O operations to be restarted until the batteries are half charged. The recharging takes approximately 30 minutes. In a system with a failed battery, an AC power failure causes both canisters to save critical data and c...
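
    The restart rule above works as a simple gate. While a battery is below half charge, the node stays in service state and I/O is not restarted. This Python sketch assumes that charge is reported as a percentage per battery and that every battery must reach 50%. The function name is hypothetical, and the sketch is not the firmware's actual logic.

```python
HALF_CHARGED_PCT = 50.0  # "half charged" from the guide, expressed as a percentage


def io_restart_permitted(battery_charge_pct):
    """Return True if I/O may restart after power returns.

    Hypothetical helper: assumes one charge percentage per battery in the
    control enclosure and that all of them must be at least half charged.
    """
    charges = list(battery_charge_pct)
    if not charges:
        raise ValueError("need at least one battery reading")
    for pct in charges:
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"charge out of range: {pct}")
    return all(pct >= HALF_CHARGED_PCT for pct in charges)


print(io_restart_permitted([62.0, 55.0]))  # both batteries at least half charged
print(io_restart_permitted([62.0, 41.0]))  # one battery still below half
```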

  • Page 41

    Maintenance discharges are scheduled for the following situations: a battery has been powered on for three months without a maintenance discharge; a battery has provided protection for saving critical data at least twice; a battery has provided protection for at least 10 brownouts, which last...
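
    A maintenance discharge is due when any one of the three listed conditions holds, as this Python sketch shows. Two assumptions apply. "Three months" is approximated as 90 powered-on days. The brownout count includes only brownouts that meet the duration criterion, which is truncated in this summary and therefore not modeled. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "three months" approximated as 90 powered-on days.
THREE_MONTHS_DAYS = 90


@dataclass
class BatteryHistory:
    powered_on_days_since_last_discharge: int
    critical_data_saves: int
    # Brownouts that meet the duration criterion given in the full guide
    # (the criterion is truncated in this summary, so it is not modelled here).
    qualifying_brownouts: int


def maintenance_discharge_due(history: BatteryHistory) -> bool:
    """Return True if any of the three listed conditions is met."""
    return (
        history.powered_on_days_since_last_discharge >= THREE_MONTHS_DAYS
        or history.critical_data_saves >= 2
        or history.qualifying_brownouts >= 10
    )


print(maintenance_discharge_due(BatteryHistory(10, 2, 0)))
print(maintenance_discharge_due(BatteryHistory(10, 1, 9)))
```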

  • Page 43

    Chapter 4. Understanding the medium errors and bad blocks: A storage system returns a medium error response to a host when it is unable to successfully read a block. The Storwize V7000 response to a host read follows this behavior. The volume virtualization that is provided extends the time when a me...

  • Page 44

    The recommended actions for these alerts guide you in correcting the situation. Bad blocks are cleared by deallocating the volume disk extent, either by deleting the volume or by issuing write I/O to the block. It is good practice to correct bad blocks as soon as they are detected. This action prevents the ...

  • Page 45: System

    Chapter 5. Storwize V7000 user interfaces for servicing your system: Storwize V7000 provides a number of user interfaces to troubleshoot, recover, or maintain your system. The interfaces provide various sets of facilities to help resolve situations that you might encounter. The interfaces for servici...

  • Page 46

    Some events require a certain number of occurrences in 25 hours before they are displayed as unfixed. If they do not reach this threshold in 25 hours, they are flagged as expired. Monitoring events are below the coalesce threshold and are usually transient. You can also sort events by time or error ...
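
    The threshold rule can be sketched in simplified form in Python. The sketch assumes that the 25-hour window starts at an event's first occurrence, because the guide does not say whether the window is fixed or sliding. The function name, the `threshold` parameter, and the returned labels are hypothetical, and real per-event thresholds are not given in this summary.

```python
from bisect import bisect_left

WINDOW_HOURS = 25.0


def classify_event(occurrence_hours, threshold, now_hours):
    """Classify an event from its occurrence times (in hours).

    Returns "unfixed" if at least `threshold` occurrences fall within
    25 hours of the first one, "expired" if that window has closed
    without reaching the threshold, and "monitoring" while it is still open.
    """
    times = sorted(occurrence_hours)
    if not times:
        raise ValueError("no occurrences")
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    first = times[0]
    # Count occurrences in the half-open window [first, first + 25 h).
    in_window = bisect_left(times, first + WINDOW_HOURS)
    if in_window >= threshold:
        return "unfixed"
    if now_hours - first >= WINDOW_HOURS:
        return "expired"
    return "monitoring"


print(classify_event([0.0, 1.0, 2.0], threshold=3, now_hours=5.0))
```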

  • Page 47

    4. When you have logged on, select Monitoring > Events.
    5. Ensure that the event log is filtered using Recommended Actions.
    6. Select the recommended action and run the fix procedure.
    7. Continue to work through the alerts in the order suggested, if possible. After all the alerts are fixed, check t...

  • Page 48

    Recover a system if it fails. Install a software package from the support site or rescue the software from another node. Upgrade software on node canisters manually rather than performing a standard upgrade procedure. Configure a control enclosure chassis after replacement. Change the service i...

  • Page 49

    Cluster (system) command-line interface: Use the command-line interface (CLI) to manage a clustered system using the task commands and information commands. For a full description of the commands and how to start an SSH command-line session, see the “Command-line interface” topic in the “Reference” s...

  • Page 50

    The initialization tool is a Windows application. Use the initialization tool to set up the USB key to perform the most common tasks. When a USB key is inserted into one of the USB ports on a node canister in a control enclosure, the node canister searches for a control file on the USB key and runs ...

  • Page 51

    The initialization tool is available on the USB key that is shipped with the control enclosures. The name of the application file is InitTool.exe. If you cannot locate the USB key, you can download the application from the support website: Support for Storwize V7000 website at www.ibm.com/storage/su...

  • Page 52

    satask chserviceip -default -resetpassword
    Parameters:
    -serviceip (Required) The IPv4 address for the service assistant.
    -gw (Optional) The IPv4 gateway for the service assistant.
    -mask (Optional) The IPv4 subnet for the service assistant.
    -serviceip_6 (Required) The IPv6 address for the service assi...
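
    The IPv4 parameters above can be made concrete with a Python sketch. It validates the addresses with the standard `ipaddress` module and assembles a `satask chserviceip` argument list in the order shown. It only builds the list and does not run anything on a node. The helper name and the example addresses are hypothetical, and only the IPv4 parameters listed above are covered.

```python
import ipaddress


def chserviceip_argv(serviceip, gw=None, mask=None):
    """Build an argv list for `satask chserviceip` (IPv4 form).

    Hypothetical helper: -serviceip is required; -gw and -mask are optional,
    matching the parameter list in the guide. Raises ValueError on bad input.
    """
    ipaddress.IPv4Address(serviceip)  # rejects malformed addresses
    argv = ["satask", "chserviceip", "-serviceip", serviceip]
    if gw is not None:
        ipaddress.IPv4Address(gw)
        argv += ["-gw", gw]
    if mask is not None:
        # A dotted netmask is accepted in the prefix position; non-contiguous
        # masks such as 255.0.255.0 raise ValueError.
        ipaddress.IPv4Network(f"0.0.0.0/{mask}")
        argv += ["-mask", mask]
    return argv


print(" ".join(chserviceip_argv("192.168.70.121", "192.168.70.1", "255.255.255.0")))
```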

  • Page 53

    Description: This command resets the service assistant password to the default value passw0rd. If the node canister is active in a system, the superuser password for the system is reset; otherwise, the superuser password is reset on the node canister. If the node canister becomes active in a system, ...

  • Page 54

    Create cluster command: Use this command to create a storage system.
    Syntax:
    satask mkcluster -clusterip ipv4 -gw ipv4 -mask ipv4 -name cluster_name
    satask mkcluster -clusterip_6 ipv6 -gw_6 ipv6 -prefix_6 int -name cluster_name
    Parameters:
    -clusterip (Required) The IPv4 address for Ethernet port 1 on t...
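
    The two syntax forms above differ only in which address family they use. This Python sketch picks the form from the cluster IP's version and builds the argument list. The helper name is hypothetical. The check that the gateway lies inside the cluster subnet is an extra sanity check added for illustration and is not a rule quoted from the guide.

```python
import ipaddress


def mkcluster_argv(clusterip, gw, name, mask=None, prefix=None):
    """Build an argv list for one of the two `satask mkcluster` forms.

    Hypothetical helper. An IPv4 -clusterip selects the -gw/-mask form; an
    IPv6 address selects the -clusterip_6/-gw_6/-prefix_6 form. The check that
    the gateway lies in the cluster subnet is an extra sanity check, not a
    rule quoted from the guide.
    """
    ip = ipaddress.ip_address(clusterip)
    gateway = ipaddress.ip_address(gw)
    if ip.version != gateway.version:
        raise ValueError("cluster IP and gateway must use the same IP version")
    argv = ["satask", "mkcluster"]
    if ip.version == 4:
        if mask is None:
            raise ValueError("the IPv4 form needs a subnet mask")
        subnet = ipaddress.IPv4Network(f"{clusterip}/{mask}", strict=False)
        flags = ["-clusterip", clusterip, "-gw", gw, "-mask", mask]
    else:
        if prefix is None:
            raise ValueError("the IPv6 form needs a prefix length")
        subnet = ipaddress.IPv6Network(f"{clusterip}/{int(prefix)}", strict=False)
        flags = ["-clusterip_6", clusterip, "-gw_6", gw, "-prefix_6", str(int(prefix))]
    if gateway not in subnet:
        raise ValueError(f"gateway {gw} is outside {subnet}")
    return argv + flags + ["-name", name]


print(" ".join(mkcluster_argv("10.0.0.10", "10.0.0.1", "cluster1", mask="255.255.255.0")))
```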

  • Page 55

    Chapter 6. Resolving a problem: This topic describes the procedures that you follow to resolve fault conditions that exist on your system. This topic assumes that you have a basic understanding of the Storwize V7000 system concepts. The following procedures are often used to find and resolve problems...

  • Page 56

    When you have logged on, select Monitoring > Events. Depending on how you choose to filter alerts, you might see only the alerts that require attention, alerts and messages that are not fixed, or all event types whether they are fixed or unfixed. Select the recommended alert, or any other alert, and...

  • Page 57

    Getting node canister and system information using the service assistant” on page 48; otherwise, go to “Procedure: Getting node canister and system information using a USB key” on page 49 and obtain the state of each of the node canisters from the data that is returned. If there is not a node canist...

  • Page 58

    The failure is reported regardless of the method that you used to create a clustered storage system: USB key, service assistant, or service command line. The create clustered-system function protects the system from loss of volume data. If you create a clustered system on a control enclosure that wa...

  • Page 59

    Use a USB key to find the service address of a node. For more information, go to “Procedure: Getting node canister and system information using a USB key” on page 49. Problem: Cannot connect to the service assistant. This topic provides assistance if you are unable to display the service assistant ...

  • Page 60

    If you are unable to change the service address, for example, because you cannot use a USB key in the environment, see “Procedure: Accessing a canister using a directly attached Ethernet cable” on page 59. Problem: Management GUI or service assistant does not display correctly. This topic provides as...

  • Page 61

    No SAS cable can be connected between ports in the same enclosure. For any enclosure, the cables that are connected to SAS port 1 on each canister must attach to the same enclosure. Similarly, for any enclosure, the cables that are connected to SAS port 2 on each canister must attach to the same...
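
    The two SAS cabling rules stated here can be checked mechanically, as this Python sketch shows. It treats a cable as a pair of (enclosure, canister, port) endpoints. It flags any cable that joins two ports in the same enclosure, and any enclosure whose port-1 (or port-2) cables from different canisters lead to different enclosures. The type and function names are hypothetical, and the full guide lists further rules that are not modeled here.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class SasEnd(NamedTuple):
    enclosure: int  # enclosure ID
    canister: int   # 1 = upper canister, 2 = lower canister
    port: int       # SAS port 1 or 2


def check_sas_cabling(cables: List[Tuple[SasEnd, SasEnd]]) -> List[str]:
    """Return violations of the two SAS cabling rules quoted above."""
    errors = []
    # (enclosure, port number) -> enclosures reached from that port on any canister
    reached = defaultdict(set)
    for a, b in cables:
        if a.enclosure == b.enclosure:
            errors.append(f"{a} <-> {b}: cable joins ports in the same enclosure")
            continue
        reached[(a.enclosure, a.port)].add(b.enclosure)
        reached[(b.enclosure, b.port)].add(a.enclosure)
    for (enclosure, port), targets in sorted(reached.items()):
        if len(targets) > 1:
            errors.append(
                f"enclosure {enclosure} SAS port {port}: canisters attach to "
                f"different enclosures {sorted(targets)}"
            )
    return errors


good = [
    (SasEnd(1, 1, 2), SasEnd(2, 1, 1)),
    (SasEnd(1, 2, 2), SasEnd(2, 2, 1)),
]
bad = [
    (SasEnd(1, 1, 2), SasEnd(2, 1, 1)),
    (SasEnd(1, 2, 2), SasEnd(3, 2, 1)),
]
print(check_sas_cabling(good))
print(check_sas_cabling(bad))
```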

  • Page 62

    Problem: Mirrored volume copies no longer identical. The management GUI provides options either to check that the copies are identical or to check that the copies are identical and process any differences that are found. To confirm that the two copies of a mirrored volume are still identical, choose t...

  • Page 63

    Procedure: Resetting superuser password. You can reset the superuser password to the default password of passw0rd by using a USB key command action. You can use this procedure to reset the superuser password if you have forgotten the password. This command runs differently depending on whether you ru...

  • Page 65

    2. View the information about the node canister that you connected to, the other node canister in the same enclosure, or any other node in the same system that you are able to access over the SAN. Note: If the node that you want to see information about is not the current node, change it to the ...

  • Page 66

    System is not showing any information about a device. For information about the LEDs, go to “Power supply unit and battery for the control enclosure” on page 6, “Power supply unit for the expansion enclosure” on page 7, “Fibre Channel ports and indicators” on page 8, “Ethernet ports and indicators” ...

  • Page 67

    Table 18. Power-supply unit LEDs
    Power supply OK | AC failure | Fan failure | DC failure | Status | Action
    On | On | On | On | Communication failure between the power supply unit and the enclosure chassis | Replace the power supply unit. If the failure is still present, replace the enclosure chassis.
    Off | Off | Off | Off | No AC ...

  • Page 68

    Table 18. Power-supply unit LEDs (continued)
    Power supply OK | AC failure | Fan failure | DC failure | Status | Action
    Off | On | Off | On | No AC power to this power supply | 1. Check that the switch on the power supply unit is on. 2. Check that the AC power is on. 3. Reseat and replace the power cable.
    On | Off | Off | Off...
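
    Reading Table 18 amounts to a lookup keyed on the four LED states. This Python sketch encodes only the two rows whose status and action are fully legible in this summary: all four LEDs on, and the off/on/off/on pattern. Any other combination returns None, so consult the full table for those. The dictionary and function names are hypothetical.

```python
# LED order: (power supply OK, AC failure, fan failure, DC failure); True = on.
# Only the rows whose text survives in this summary are included.
PSU_LED_TABLE = {
    (True, True, True, True): (
        "Communication failure between the power supply unit and the enclosure chassis",
        "Replace the power supply unit. If the failure is still present, "
        "replace the enclosure chassis.",
    ),
    (False, True, False, True): (
        "No AC power to this power supply",
        "1. Check that the switch on the power supply unit is on. "
        "2. Check that the AC power is on. 3. Reseat and replace the power cable.",
    ),
}


def diagnose_psu(ok: bool, ac_fail: bool, fan_fail: bool, dc_fail: bool):
    """Return (status, action) for one LED combination, or None if not listed here."""
    return PSU_LED_TABLE.get((ok, ac_fail, fan_fail, dc_fail))


status, action = diagnose_psu(False, True, False, True)
print(status)
print(action)
```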

  • Page 69

    Table 19. Power LEDs (continued)
    Power LED status | Description
    Slow flashing (1 Hz) | Power is available, but the canister is in standby mode. Try to start the node canister by reseating it. Go to “Procedure: Reseating a node canister” on page 60.
    Fast flashing (2 Hz) | The canister is running its power-...

  • Page 70

    Table 20. System status and fault LEDs (continued)
    System status LED | Fault LED | Status and action
    On | On | The code is active and is in starting state. However, it does not have enough resources to form the clustered system. The node canister cannot become active in a clustered system. There are no detected pro...

  • Page 71

    Table 21. Control enclosure battery LEDs (continued)
    Battery good (+ -) | Battery fault (+ -) | Description | Action
    Off | Flashing | Recoverable battery fault. | None
    Flashing | Flashing | The battery cannot be used because the firmware for the power supply unit is being downloaded. | None
    Procedure: Finding the status ...

  • Page 72

    Procedure: Removing system data from a node canister. This procedure guides you through the process to remove system information from a node canister. The information that is removed includes configuration data, cache data, and location data. Attention: If the enclosure reaches a point where the syst...

  • Page 73

    Node errors are reported when an error is detected that affects a specific node canister.
    1. Use the service assistant to view the current node errors on any node.
    2. If available, use the management GUI to run the recommended action for the alert.
    3. Follow the fix procedure instructi...

  • Page 74

    2. Select Change Service IP from the menu.
    3. Complete the panel.
    Use one of the following procedures if you cannot connect to the node canister from another node:
    – Use the initialization tool to write the correct command file to the USB key. Go to “Using the initialization tool” on page 34.
    – Us...

  • Page 75

    time_to_charge field for the battery. The results provide an estimate of the time, in minutes, before the system can start. If the time is not 0, wait for the required time. Check that the node canister that you inserted the USB key into has its clustered-state LED on permanently. For additional inf...

  • Page 76

    Default service IP addresses 192.168.70.121 (subnet mask 255.255.255.0) and 192.168.70.122 (subnet mask 255.255.255.0) cannot be accessed on your network. Note: Do not attempt to use a directly attached Ethernet cable to a canister that is active in a clustered system. You might disrupt access from ho...
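
    On a direct Ethernet link with no router, the personal computer generally needs an address on the same subnet as the default service addresses. This Python sketch uses the standard `ipaddress` module to check a candidate address against 192.168.70.121 and 192.168.70.122 with mask 255.255.255.0. It also rejects the canister addresses themselves and the subnet's network and broadcast addresses. The function name and the example laptop addresses are hypothetical.

```python
import ipaddress

DEFAULT_SERVICE_IPS = ("192.168.70.121", "192.168.70.122")
DEFAULT_MASK = "255.255.255.0"


def laptop_ip_usable(laptop_ip: str, laptop_mask: str = DEFAULT_MASK) -> bool:
    """True if laptop_ip can reach both default service IPs on a direct link."""
    laptop = ipaddress.IPv4Address(laptop_ip)
    subnet = ipaddress.IPv4Network(f"{laptop_ip}/{laptop_mask}", strict=False)
    canisters = {ipaddress.IPv4Address(ip) for ip in DEFAULT_SERVICE_IPS}
    if laptop in canisters:
        return False  # would clash with a canister's service address
    if laptop in (subnet.network_address, subnet.broadcast_address):
        return False  # not a usable host address
    return all(ip in subnet for ip in canisters)


for candidate in ("192.168.70.10", "192.168.71.10", "192.168.70.121"):
    print(candidate, laptop_ip_usable(candidate))
```

    As the note above says, use this kind of direct connection only with a canister that is not active in a clustered system.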

  • Page 77

    5. Pull out the handle to its full extension.
    6. Grasp the canister and pull it out 2 or 3 inches.
    7. Push the canister back into the slot until the handle starts to move.
    8. Finish inserting the canister by closing the handle until the locking catch clicks into place.
    9. Verify that the cables were...

  • Page 78

    Special tools that are only available to the support teams are required to interpret the contents of the support package. The files are not designed for customer use. Always follow the instructions that are given by the support team to determine whether to collect the package by using the management...

  • Page 79

    Fibre Channel link failures: When a failure occurs on a single Fibre Channel link, the small form-factor pluggable (SFP) transceiver might need to be replaced. The following items can indicate that a single Fibre Channel link has failed: the customer's SAN monitoring tools; the Fibre Channel statu...

  • Page 81

    Chapter 7. Recovery procedures: This topic describes these recovery procedures: recover a system, and back up and restore a system configuration. Recover system procedure: The recover system procedure recovers the entire storage system if the data has been lost from all control enclosure node canisters...

  • Page 82

    When to run the recover system procedure: Attempt a recover procedure only after a complete and thorough investigation of the cause of the system failure. Attempt to resolve those issues by using other service procedures first. Attention: If you experience failures at any time while you are runni...

  • Page 83

    Enclosure, ensure that it has SAS connectivity to the listed enclosure. If the node canister that is reporting the fault is in a different I/O group from the listed enclosure, ensure that the listed enclosure has SAS connectivity to both node canisters in the control enclosure in its I/O group. Afte...

  • Page 84

    Any data that was in the cache at the point of failure is lost. The loss of data can result in data corruption on the affected volumes. If the volumes are corrupted, call the IBM Support Center. Fix hardware errors: Before you can run a system recovery procedure, it is important that the root cause...

  • Page 85

    Attention: This service action has serious implications if not performed properly. If at any time during the procedure you encounter an error that is not covered by this procedure, stop and call IBM Support. Note: Your web browser must not block pop-up windows; otherwise, progress windows cannot op...

  • Page 86

    Recovering from offline VDisks using the CLI: If a recovery procedure (T3 procedure) completes with offline volumes, you can use the command-line interface (CLI) to access the volumes. If you have performed the recovery procedure and it has completed successfully but there are offline volumes, you c...

  • Page 87

    Run the application consistency checks. Backing up and restoring the system configuration: You can back up and restore the configuration data for the clustered system after preliminary tasks are completed. Configuration data for the system provides information about your system and the objects that...

  • Page 88

    You can restore the configuration by using any node as the configuration node. However, if you do not use the node that was the configuration node when the system was first created, the unique identifier (UID) of the volumes that are within the I/O groups can change. This action can affect IBM Tivol...

  • Page 89

    where ssh_private_key_file is the name of the SSH private key file for the superuser and cluster_ip is the IP address or DNS name of the clustered system for which you want to back up the configuration. 4. Issue the following CLI command to remove all of the existing configuration backup and restore...

  • Page 90

    You must copy these files to a location outside of your system because the /tmp directory on this node becomes inaccessible if the configuration node changes. The configuration node might change in response to an error recovery action or to a user maintenance activity. Tip: To maintain controlled ac...

  • Page 91

    a. Point your browser to the service IP address of one of the nodes, for example, https://node_service_ip_address/service/.
    b. Log on to the service assistant.
    c. From the System page, put the node into service state if it is not already in that state.
    d. Select Manage System.
    e. Click Remove System...

  • Page 92

    d. Continue to follow the on-screen instructions to add the control enclosures. Decline the offer to configure storage for the new enclosures when asked if you want to do so.
    7. From the management GUI, click Access > Users to set up your system and configure an SSH key for the superuser. This allow...

  • Page 93

    pscp -i ssh_private_key_file superuser@cluster_ip:/tmp/svc.config.restore.prepare.log full_path_for_where_to_copy_log_files
    14. Open the log file from the server where the copy is now stored.
    15. Check the log file for errors.
    If there are errors, correct the condition that caused the errors and r...

  • Page 94

    where ssh_private_key_file is the name of the SSH private key file for the superuser and cluster_ip is the IP address or DNS name of the clustered system from which you want to delete the configuration. 2. Issue the following CLI command to erase all of the files that are stored in the /tmp director...

  • Page 95

    Chapter 8. Removing and replacing parts: You can remove and replace field-replaceable units (FRUs) from the control enclosure or the expansion enclosure. Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the repla...

  • Page 96

    If both the power LED and the system status LED are on, do not remove a node canister unless directed to do so by a service procedure. If the system status is off, it is acceptable to remove a node canister. However, do not remove a node canister unless directed to do so by a service procedure. If...

  • Page 97

    7. Pull out the handle to its full extension.
    8. Grasp the canister and pull it out.
    9. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed.
    10. Push the canister back into the slot until the h...

  • Page 98

    If the power LED is on, do not remove an expansion canister unless directed to do so by a service procedure. If the power LED is flashing or off, it is safe to remove an expansion canister. However, do not remove an expansion canister unless directed to do so by a service procedure. Attention: E...

  • Page 99

    6. Pull out the handle to its full extension.
    7. Grasp the canister and pull it out.
    8. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed.
    9. Push the canister back into the slot until the ha...

  • Page 100

    1. Carefully determine the failing physical port connection. Important: The Fibre Channel links in the enclosures are supported with both longwave SFP transceivers and shortwave SFP transceivers. A longwave SFP transceiver has some blue components that are visible even when the SFP transceiver is pl...

  • Page 101

    Replacing a power supply unit for a control enclosure: You can replace either of the two 764-watt hot-swap redundant power supplies in the control enclosure. These redundant power supplies operate in parallel, one continuing to power the canister if the other fails. DANGER: When working on or around t...

  • Page 102

    Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Attention: A powered-on enclosure must not have...

  • Page 103

    b. Grip the handle to pull the power supply out of the enclosure, as shown in Figure 30.
    6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed.
    Figure 29...

  • Page 104

    7. Push the power supply unit back into the enclosure until the handle starts to move.
    8. Finish inserting the power supply unit into the enclosure by closing the handle until the locking catch clicks into place.
    9. Reattach the power cable and cable retention bracket.
    10. Turn on the power switch t...

  • Page 105

    Replacing a power supply unit for an expansion enclosure: You can replace either of the two 580-watt hot-swap redundant power supplies in the expansion enclosure. These redundant power supplies operate in parallel, one continuing to power the canister if the other fails. DANGER: When working on or aro...

  • Page 106

    Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Attention: A powered-on enclosure must not have...

  • Page 107

    b. Grip the handle to pull the power supply out of the enclosure, as shown in Figure 32. 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed. Figure 31...

  • Page 108

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit in the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10. Turn on the power switch to ...

  • Page 109

    Replacing a battery in a power supply unit: This topic describes how to replace the battery in the control enclosure power supply unit. DANGER: When working on or around the system, observe the following precautions: Electrical voltage and current from power, telephone, and communication cables are ha...

  • Page 110

    CAUTION: The battery is a lithium-ion battery. To avoid possible explosion, do not burn the battery. Exchange it only with the IBM-approved part. Recycle or discard the battery as instructed by local regulations. In the United States, IBM has a process for the collection of this battery. For information, call 1-80...

  • Page 111

    a. Press the catch to release the handle (1). b. Lift the handle on the battery (2). c. Lift the battery out of the power supply unit (3). 4. Install the replacement battery. Attention: The replacement battery has protective end caps that must be removed before use. a. Remove the battery from the packag...

  • Page 112

    d. Place the replacement battery in the opening on top of the power supply in its proper orientation. e. Press the battery to seat the connector. f. Place the handle in its downward position. 5. Push the power supply unit back into the enclosure until the handle starts to move. 6. Finish inserting th...

  • Page 113

    3. Open the handle to the full extension. 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place. Figure 34. Unlocking the 3.5" drive. Figure 35. R...

  • Page 114

    Replacing a 2.5" drive assembly or blank carrier: This topic describes how to remove a 2.5" drive assembly or blank carrier. Attention: If your drive is configured for use, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix pro...

  • Page 115

    4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place. Replacing an enclosure end cap: This topic describes how to replace an enclosure end cap. To replace the encl...

  • Page 116

    2. Pull the tab with the arrow away from the connector. 3. Plug the replacement cable into the specific port. 4. Ensure that the SAS cable is fully inserted; a click is heard when the cable is seated. Replacing a control enclosure chassis: This topic describes how to replace a control ...

  • Page 117

    DANGER: When working on or around the system, observe the following precautions: Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: • Connect power to this unit only with the IBM-provided power cord. Do not use the IBM-provided power...

  • Page 118

    Attention: Perform this procedure only if instructed to do so by a service action or the IBM support center. If you have a single control enclosure, this procedure requires that you shut down your system to replace the control enclosure. If you have more than one control enclosure, you can keep part...

  • Page 119

    For each of the canisters, verify the status of the system status LED. If the LED is lit on either canister, do not continue, because the system is still online. Determine why the node canisters did not shut down in step 3 on page 102 or step 4 on page 102. Note: If you continue while the sys...
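
    The LED check above is a stop-or-continue rule: a lit system status LED on either node canister means the system is still online. A minimal sketch of that rule, assuming the LED states have already been read by eye and recorded; the function name and canister labels are hypothetical.

```python
def safe_to_continue(status_led_lit):
    """Decide whether the chassis replacement may proceed past the LED check.

    status_led_lit maps each node canister (hypothetical labels) to True
    when its system status LED is lit. A lit LED on either canister means
    the system is still online, so the procedure must stop.
    """
    if len(status_led_lit) != 2:
        # A control enclosure holds two node canisters; check both of them.
        raise ValueError("expected the status of both node canisters")
    return not any(status_led_lit.values())


if __name__ == "__main__":
    print(safe_to_continue({"canister 1": False, "canister 2": False}))  # True
    print(safe_to_continue({"canister 1": False, "canister 2": True}))   # False
```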

  • Page 120

    25. Turn on the power to the enclosure by using the switches on the power supply units. The node canisters boot up. The fault LEDs are on because the new enclosure has not been set with the identity of the old enclosure. The node canisters report that they are in the wrong location. a. Connect to the s...

  • Page 121

    28. From the Actions menu, select Remove Enclosure and confirm the action. The physical hardware has already been removed. You can ignore the messages about removing the hardware. Verify that the original enclosure is no longer listed in the tree view. 29. Add the new enclosure to the system. a. Sel...

  • Page 122

    DANGER: When working on or around the system, observe the following precautions: Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: • Connect power to this unit only with the IBM-provided power cord. Do not use the IBM-provided power...

  • Page 123

    Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data. Even though many of these procedures are hot-swapp...

  • Page 124

    15. Reinstall the drives in the new enclosure. The drives must be inserted back into the same location from which they were removed on the old enclosure. 16. Reinstall the canisters in the enclosure. 17. Install the power supply units. 18. Reattach the data cables to each canister by using the infor...

  • Page 125

    5. From one side of the rack cabinet, grip the rail and slide the rail pieces together to shorten the rail. 6. Disengage the rail location pins (2). 7. From the other side of the rack cabinet, grip the rail and slide the rail pieces together to shorten the rail. 8. Disengage the rail location pins (2). 9. ...

  • Page 126

    Table 22. Replaceable units (continued)
    Part | Part number | Applicable models | FRU or customer replaced
    3 m SAS cable | 44V4163 | 212, 224 | Customer replaced
    6 m SAS cable | 44V4164 | 212, 224 | Customer replaced
    1 m Fibre Channel cable | 39M5699 | 112, 124, 312, 324 | Customer replaced
    5 m Fibre Channel cable | 39M5700 | 1...

  • Page 127

    Table 22. Replaceable units (continued)
    Part | Part number | Applicable models | FRU or customer replaced
    2.5" SSD, 300 GB, in carrier assembly | 85Y5861 | 124, 224, 324 | Customer replaced
    2.5" 10K, 300 GB, in carrier assembly | 85Y5862 | 124, 224, 324 | Customer replaced
    2.5" 10K, 450 GB, in carrier assembly | 85Y5...
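
    Table 22 is effectively a lookup from enclosure model to orderable parts. A minimal sketch of that lookup, using only the complete Table 22 rows shown in these excerpts (rows cut off by the summary are left out) and assuming "model" means the suffix of a 2076-xxx machine type; the `Part` type and `parts_for_model` helper are hypothetical.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    part_number: str
    models: tuple        # applicable 2076-xxx model suffixes
    replaced_by: str     # "FRU" or "Customer replaced", as in Table 22


# Complete rows copied from the Table 22 excerpts; truncated rows are omitted.
TABLE_22 = [
    Part("3 m SAS cable", "44V4163", ("212", "224"), "Customer replaced"),
    Part("6 m SAS cable", "44V4164", ("212", "224"), "Customer replaced"),
    Part("1 m Fibre Channel cable", "39M5699",
         ("112", "124", "312", "324"), "Customer replaced"),
    Part('2.5" SSD, 300 GB, in carrier assembly', "85Y5861",
         ("124", "224", "324"), "Customer replaced"),
    Part('2.5" 10K, 300 GB, in carrier assembly', "85Y5862",
         ("124", "224", "324"), "Customer replaced"),
]


def parts_for_model(model):
    """Return the part numbers in this excerpt that apply to a 2076-<model>."""
    return [p.part_number for p in TABLE_22 if model in p.models]


if __name__ == "__main__":
    print(parts_for_model("112"))
    print(parts_for_model("224"))
```

    Model suffixes are kept as strings so that they compare exactly as printed in the table.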

  • Page 129

    Chapter 9. Event reporting. Events that are detected are saved in an event log. As soon as an entry is made in this event log, the condition is analyzed. If any service activity is required, a notification is sent. Event reporting process: The following methods are used to notify you and the IBM suppo...

  • Page 130

    To prevent a repeated event from filling the event log, some records in the event log refer to multiple occurrences of the same event. When event log entries are coalesced in this way, the time stamps of the first and last occurrences of the problem are saved in the log entry. A count...
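
    Coalescing can be pictured as folding repeated occurrences into one record that keeps the first and last time stamps. A minimal sketch, assuming that occurrences count as "the same event" when their event IDs match and, because the excerpt breaks off at "A count...", that the record also keeps an occurrence count; the matching rule, names, and record layout are hypothetical, not the product's internal format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CoalescedEntry:
    """One event log record that may stand for many occurrences of an event."""
    event_id: str
    first_seen: datetime
    last_seen: datetime
    count: int = 1


def coalesce(events):
    """Fold (event_id, timestamp) pairs into one entry per event ID.

    Assumption: two occurrences are the same event when their IDs match.
    """
    log = {}
    for event_id, ts in sorted(events, key=lambda e: e[1]):
        entry = log.get(event_id)
        if entry is None:
            log[event_id] = CoalescedEntry(event_id, ts, ts)
        else:
            entry.last_seen = ts   # time stamp of the latest occurrence
            entry.count += 1       # how many times the event has occurred
    return list(log.values())


if __name__ == "__main__":
    sample = [
        ("010025", datetime(2012, 1, 1, 10, 0)),
        ("010027", datetime(2012, 1, 1, 10, 2)),
        ("010025", datetime(2012, 1, 1, 10, 9)),
    ]
    for entry in coalesce(sample):
        print(entry.event_id, entry.first_seen, entry.last_seen, entry.count)
```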

  • Page 131

    of service actions that are being performed. If a recommended service action is active, notifications for these events are sent only if the events are still unfixed when the service action completes. Each event that Storwize V7000 detects is assigned a notification type of Error, Warning, or Information. When you confi...
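
    Because every event carries one of three notification types, a recipient can be thought of as subscribing to a subset of those types. A minimal sketch, assuming that is what configuring a recipient means (the excerpt breaks off at "When you confi..."); the event IDs and their E/W/I types are taken from Tables 25 and 26 of this guide, while the enum, table, and `should_notify` helper are hypothetical. Real notification settings are made in the management interfaces, not in code like this.

```python
from enum import Enum


class NotificationType(Enum):
    ERROR = "E"
    WARNING = "W"
    INFORMATION = "I"


# A few (event ID -> notification type) pairs from Tables 25 and 26.
EVENT_TYPES = {
    "980350": NotificationType.INFORMATION,
    "987400": NotificationType.WARNING,
    "988100": NotificationType.WARNING,
    "010027": NotificationType.WARNING,
    "010097": NotificationType.ERROR,
}


def should_notify(event_id, subscribed):
    """Return True if the event's notification type is one the recipient chose.

    Unknown event IDs (not in this excerpt) never trigger a notification.
    """
    return EVENT_TYPES.get(event_id) in subscribed


if __name__ == "__main__":
    wanted = {NotificationType.ERROR, NotificationType.WARNING}
    print(should_notify("010097", wanted))  # True: an error event
    print(should_notify("980350", wanted))  # False: information only
```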

  • Page 132

    Understanding the error codes: Error codes are generated by the event-log analysis and system configuration code. Error codes help you to identify the cause of a problem, the failing field-replaceable units (FRUs), and the service actions that might be needed to solve the problem. Event IDs: The Storw...

  • Page 133

    Table 25. Informational events (continued)
    Event ID | Notification type | Description
    980350 | I | The node is now a functional member of the cluster (system).
    980351 | I | A noncritical hardware error occurred.
    980352 | I | Attempt to automatically recover offline node starting.
    980370 | I | Both nodes in the I/O grou...

  • Page 134

    Table 25. Informational events (continued)
    Event ID | Notification type | Description
    981101 | I | SAS discovery occurred; no configuration changes were detected.
    981102 | I | SAS discovery occurred; configuration changes are pending.
    981103 | I | SAS discovery occurred; configuration changes are complete.
    981104 | W...

  • Page 135

    Table 25. Informational events (continued)
    Event ID | Notification type | Description
    984512 | I | A component firmware update is needed but is prevented from running.
    985001 | I | The Metro Mirror or Global Mirror background copy is complete.
    985002 | I | The Metro Mirror or Global Mirror is ready to restart.
    9850...

  • Page 136

    Table 25. Informational events (continued)
    Event ID | Notification type | Description
    987400 | W | The node unexpectedly lost power but has now been restored to the cluster (system).
    988100 | W | An overnight maintenance procedure has failed to complete. Resolve any hardware and configuration problems that you ...

  • Page 137

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    009173 | W | The FlashCopy feature has exceeded the amount that is licensed. | 3032
    009174 | W | The Metro Mirror or Global Mirror feature has exceeded the amount that is licensed. | 3032
    009175 | W | The usage for...

  • Page 138

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    010025 | W | A disk I/O medium error has occurred. | 1320
    010026 | W | A suitable MDisk or drive for use as a quorum disk was not found. | 1330
    010027 | W | The quorum disk is not available. | 1335
    010028 | W | A control...

  • Page 139

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    010060 | E | A solid-state drive (SSD) exceeded the warning temperature threshold. | 1217
    010061 | E | A solid-state drive (SSD) exceeded the offline temperature threshold. | 1218
    010062 | E | A drive exceeded the ...

  • Page 140

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    010097 | E | A drive is reporting excessive errors. | 1685
    010098 | W | There are too many drives presented to a cluster (system). | 1200
    020001 | E | There are too many medium errors on the managed disk. | 1610
    0200...

  • Page 141

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    045021 | E | A canister was removed from the system. | 1036
    045022 | E | A canister has been in a degraded state for too long and cannot be recovered. | 1034
    045023 | E | A canister is encountering communication pr...

  • Page 142

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    045058 | E | An enclosure battery is at end of life. | 1113
    045062 | W | An enclosure battery conditioning is required but not possible. | 1131
    045063 | E | There was an enclosure battery communications error. | 1116...

  • Page 143

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    062001 | W | Unable to mirror medium error during volume copy synchronization. | 1950
    062002 | W | The mirrored volume is offline because the data cannot be synchronized. | 1870
    062003 | W | The repair process for t...

  • Page 144

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    071510 | E | Detected memory size does not match the expected memory size. | 1032
    071523 | W | The internal disk file system of the node is damaged. | 1187
    071524 | E | Unable to update the BIOS settings. | 1034
    0715...

  • Page 145

    Table 26. Error event IDs and error codes (continued)
    Event ID | Notification type | Condition | Error code
    074001 | W | Unable to determine the vital product data (VPD) for an FRU. This is probably because a new FRU has been installed and the software does not recognize that FRU. The cluster (system) continu...
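
    Table 26 maps many event IDs onto a smaller set of error codes, so one error code can stand for several conditions (3032 and 1034 each appear twice in these excerpts). A minimal sketch of a two-way lookup, built only from the complete Table 26 rows shown here; the dictionary and helper names are hypothetical.

```python
# Event ID -> (notification type, error code), from the complete Table 26
# rows in these excerpts. Rows cut off by the summary are omitted.
TABLE_26 = {
    "009173": ("W", 3032),
    "009174": ("W", 3032),
    "010025": ("W", 1320),
    "010026": ("W", 1330),
    "010027": ("W", 1335),
    "010060": ("E", 1217),
    "010061": ("E", 1218),
    "010097": ("E", 1685),
    "010098": ("W", 1200),
    "020001": ("E", 1610),
    "045021": ("E", 1036),
    "045022": ("E", 1034),
    "045058": ("E", 1113),
    "045062": ("W", 1131),
    "062001": ("W", 1950),
    "062002": ("W", 1870),
    "071510": ("E", 1032),
    "071523": ("W", 1187),
    "071524": ("E", 1034),
}


def error_code_for(event_id):
    """Return the error code for an event ID, or None if not in this excerpt."""
    entry = TABLE_26.get(event_id)
    return entry[1] if entry else None


def events_for_error_code(code):
    """Return every event ID in this excerpt that reports the given error code."""
    return sorted(eid for eid, (_, c) in TABLE_26.items() if c == code)


if __name__ == "__main__":
    print(error_code_for("010027"))
    print(events_for_error_code(1034))
```

    Event IDs are stored as strings rather than integers so that leading zeros such as "009173" survive.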

  • Page 146

    Node error code overview: Node error codes describe failures that relate to a specific node canister. Because node errors are specific to a node (for example, a memory failure), the errors are reported only on that node. However, some of the conditions that the node detects relate to the shared compon...

  • Page 147

    Table 27. Message classification number range (continued)
    Message classification | Range
    Error codes when recovering a clustered system | 920, 990
    Node errors
    500 Incorrect enclosure. Explanation: The node canister has saved cluster information, which indicates that the canister is now located in a diffe...

  • Page 148

    a replacement enclosure chassis, and restart the hardware remove and replace control enclosure chassis procedure. 3. If this action does not resolve the issue, contact IBM technical support. They will work with you to ensure that the cluster state data is not lost while the problem is resolved. Possib...

  • Page 149

    Install the second node canister in this enclosure. Restart the node canister. The two node canisters should show node error 504; follow the actions for that error. 3. If you are sure this node canister came from the enclosure that is being replaced, and that the original partner canis...

  • Page 150

    User response: Follow troubleshooting procedures to reload the software. 1. Follow the procedures to rescue the software of a node from another node. 2. If the node rescue does not succeed, use the hardware remove and replace procedures for the node canister. Possible Cause-FRUs or other: • Node can...

  • Page 151

    diagnostics the system provides to diagnose problems on SAS cables and expansion enclosures. 4. If a quorum disk on an external storage system is shown as missing, find the storage controller and confirm that the LUN is available, and check the Fibre Channel connections between the storage controller and t...

  • Page 152

    The node error does not persist across restarts of the node software and operating system. User response: Follow troubleshooting procedures to reload the software: 1. Get a support package (snap), including dumps, from the node by using the management GUI or the service assistant. 2. If more than one n...

  • Page 153

    4. If all nodes have either node error 578 or 550, follow the cluster recovery procedures. 5. Attempt to determine what caused the nodes to shut down. Possible Cause-FRUs or other: • None. 671: The available battery charge is not enough to allow the node canister to start. Two batteries are charging. ...
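
    Step 4 is an "all nodes" condition: the cluster recovery procedures apply only when every node reports node error 578 or 550, not when just some of them do. A minimal sketch of that check, assuming the node error codes have already been collected (for example, from the service assistant); the function and node names are hypothetical.

```python
# Node errors that, per step 4, indicate cluster recovery when every
# node reports one of them.
RECOVERY_NODE_ERRORS = {550, 578}


def needs_cluster_recovery(node_errors):
    """Return True only if all nodes report node error 550 or 578.

    node_errors maps a node name (hypothetical) to its node error code,
    or None if that node shows no node error.
    """
    if not node_errors:
        return False  # no nodes reported, so there is nothing to decide
    return all(code in RECOVERY_NODE_ERRORS for code in node_errors.values())


if __name__ == "__main__":
    print(needs_cluster_recovery({"node1": 578, "node2": 550}))   # True
    print(needs_cluster_recovery({"node1": 578, "node2": None}))  # False
```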

  • Page 154

    either because of a service assistant user action or because the node was deleted from the cluster. User response: When it is no longer necessary to hold the node in the service state, exit the service state to allow the node to run: 1. Use the service assistant action to release the service state. ...

  • Page 155

    User response: Follow troubleshooting procedures to fix the hardware: 1. Determine the status of the other node. 2. Restart or replace the node if it has failed (the partner node should show a node error). 860: The Fibre Channel network fabric is too large. Explanation: This is a noncritical node error. The node will...

  • Page 157: Appendix. Accessibility

    Appendix. Accessibility. Accessibility features help a user who has a physical disability, such as restricted mobility or limited vision, to use software products successfully. Features: This list includes the major accessibility features in the management GUI: • You can use screen-reader software and...

  • Page 158

    – Press Tab to navigate to the magnifying glass icon in the filter pane and press Enter. – Type the filter text. – Press Tab to navigate to the red X icon and press Enter to reset the filter. • For information areas: – Press Tab to navigate to information areas. – Press Tab to navigate to the fields...

  • Page 159: Notices

    Notices. This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your a...

  • Page 160

    IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs ...

  • Page 161

    programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs...

  • Page 162

    Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at his own expense. Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not respon...

  • Page 163

    IBM Technical Regulations, Department M456, IBM-Allee 1, 71137 Ehningen, Germany. Tel: +49 7032 15-2937. E-mail: tjahn@de.ibm.com. Germany Electromagnetic Compatibility Directive. German-language EU notice: Notice for Class A devices, EU Directive on Electromagnetic Compatibility...

  • Page 164

    General information: The device meets the protection requirements of EN 55024 and EN 55022 Class A. Japan VCCI Council Class A statement. People's Republic of China Class A electronic emission statement. International Electrotechnical Commission (IEC) statement: This product has been designed and b...

  • Page 165

    Russia Electromagnetic Interference (EMI) Class A statement. Taiwan Class A compliance statement. European contact information: This topic contains the product service contact information for Europe. European Community contact: IBM Technical Regulations, Pascalstr. 100, Stuttgart, Germany 70569. Tele: 00...

  • Page 167: Index

    Index. Numerics: 2.5" drive assembly, replacing 98; 3.5" drive assembly, replacing 96. A: about this document, sending comments xvi; accessibility, keyboard 141, repeat rate up and down buttons 141, shortcut keys 141; accessing canisters, Ethernet cable 59; cluster (system) CLI 33; management GUI 30; publications 14...

  • Page 168

    End cap indicators 4; environmental notices ix; error: control enclosure 45, expansion enclosure 45, node canister 44, not detected 45, SAS cabling 44, USB key 46; error codes 120, understanding 116; error event IDs 120; error events 113; errors logs, describing the fields 114, error events 113, managing 114, unders...

  • Page 169

    parts (continued): replacing, overview 79, preparing 79; passwords, best practices 19; People's Republic of China, electronic emission statement 148; performing node rescue 62; ports: Ethernet 11, port names, worldwide 10, port numbers, Fibre Channel 10, SAS 13, 16; POST (power-on self-test) 115; power management...

  • Page 170

    U: understanding, clustered-system recovery codes 130, error codes 116, event log 113; United Kingdom electronic emission notice 148; USB key, detection error 46, using 34, when to use 34; USB ports 11; using GUI interfaces 29, initialization tool 35, initialization tool interface 34, management GUI 29, service as...

  • Page 172

    Printed in USA GC27-2291-02.