IBM Storwize V7000 Troubleshooting And Maintenance Manual

Manual is about: Flex System Storage Node

Summary of Storwize V7000

  • Page 1

    IBM Storwize V7000 Troubleshooting, Recovery, and Maintenance Guide, GC27-2291-05.

  • Page 2

    Note: Before using this information and the product it supports, read the general information in “Notices” on page 161, the information in the “Safety and environmental notices” on page iii, as well as the information in the IBM Environmental Notices and User Guide, which is provided on a DVD. This ...

  • Page 3

    Safety and environmental notices. Review the safety notices, environmental notices, and electronic emission notices for IBM® Storwize® V7000 before you install and use the product. Here are examples of a caution and a danger notice: Caution: A caution notice indicates the presence of a hazard that ...

  • Page 4

    The following notices and statements are used in IBM documents. They are listed in order of decreasing severity of potential hazards. Danger notice definition: A special note that emphasizes a situation that is potentially lethal or extremely hazardous to people. Caution notice definition: A special no...

  • Page 5

    Caution: electrical current from power, telephone, and communication cables can be hazardous. To avoid personal injury or equipment damage, disconnect the attached power cords, telecommunication systems, networks, and modems before you open the machine covers, unless instructed otherwise in the inst...

  • Page 6

    Caution: Removing components from the upper positions in the rack cabinet improves rack stability during a relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building.
      • Reduce the weight of the rack cabinet by removing equipment starting at t...

  • Page 7

    Caution: Do not place any object on top of a rack-mounted device unless that rack-mounted device is intended for use as a shelf. (R008) Caution: If the rack is designed to be coupled to another rack, only the same model rack should be coupled together with another same model rack. (R009) Danger notic...

  • Page 8

    Danger: When working on or around the system, observe the following precautions: Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard:
      • If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. ...

  • Page 9

    Observe the following precautions when working on or around your IT rack system:
      • Heavy equipment: personal injury or equipment damage might result if mishandled.
      • Always lower the leveling pads on the rack cabinet.
      • Always install stabilizer brackets on the rack cabinet.
      • To avoid hazardous cond...

  • Page 10

    Danger: Main protective earth (ground): This symbol is marked on the frame of the rack. The protective earthing conductors should be terminated at that point. A recognized or certified closed loop connector (ring terminal) should be used and secured to the frame with a lock washer using a bolt or stu...

  • Page 11

      • Do not wear loose clothing that can be trapped in the moving parts of a device. Ensure that your sleeves are fastened or rolled up above your elbows. If your hair is long, fasten it.
      • Insert the ends of your necktie or scarf inside clothing or fasten it with a nonconducting clip, approximately 8 ...

  • Page 12

    The IBM Systems Environmental Notices and User Guide (ftp://public.dhe.ibm.com/systems/support/warranty/envnotices/environmental_notices_and_user_guide.pdf), Z125-5823 document includes statements on limitations, product information, product recycling and disposal, battery information, flat panel ...

  • Page 13

    About this guide. This guide describes how to service, maintain, and troubleshoot the IBM Storwize V7000. The chapters that follow introduce you to the hardware components and to the tools that assist you in troubleshooting and servicing the Storwize V7000, such as the management GUI and the service ...

  • Page 14

    Each of the PDF publications in Table 2 is also available in the information center by clicking the number in the “Order number” column. Table 2. Storwize V7000 library (columns: Title, Description, Order number)
      • IBM Storwize V7000 Quick Installation Guide: This guide provides instructions for unpacking your ...

  • Page 15

    Table 2. Storwize V7000 library (continued) (columns: Title, Description, Order number)
      • IBM License Agreement for Machine Code: This multilingual guide contains the license agreement for machine code for the Storwize V7000 product. Order number: SC28-6872 (contains Z125-5468).
    IBM documentation and related websites. Table 3 list...

  • Page 16

    www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss Related websites. The following websites provide information about Storwize V7000 or related products or technologies (columns: Type of information, Website):
      • Storwize V7000 support: www.ibm.com/storage/support/storwize/v7000
      • Technical support for IBM st...

  • Page 17

    Table 4. IBM websites for help, services, and information (continued) (columns: Website, Address)
      • Support for IBM System Storage and IBM TotalStorage products: www.ibm.com/storage/support/
    Note: Available services, telephone numbers, and web links are subject to change without notice. Help and service. Before cal...

  • Page 18

      • Check all cables to make sure that they are connected.
      • Check all power switches to make sure that the system and optional devices are turned on.
      • Use the troubleshooting information in your system documentation. The troubleshooting section of the information center contains procedures to help y...

  • Page 19

    Chapter 1. Storwize V7000 hardware components. A Storwize V7000 system consists of one or more machine type 2076 rack-mounted enclosures. There are several model types. The main differences among the model types are the following items:
      • The number of drives that an enclosure can hold. Drives are lo...

  • Page 20

      • The number of ports at the rear of the enclosure. Control enclosures have Ethernet ports, Fibre Channel ports, and USB ports. Expansion enclosures do not have any of these ports.
    Components in the front of the enclosure. This topic describes the components in the front of the enclosure. Drives: An e...

  • Page 21

    Callouts: 1 Fault LED; 2 Activity LED. Table 5 shows the status descriptions for the two LEDs. Table 5. Drive LEDs (columns: Name, Description, Color)
      • Activity: Indicates if the drive is ready or active.
        • If the LED is on, the drive is ready to be used.
        • If the LED is off, the drive is not ready.
        • If the LED is flashing, ...

  • Page 22

    Enclosure end cap indicators. This topic describes the indicators on the enclosure end cap. Figure 5 shows where the end caps are located on the front of an enclosure with 12 drives. The end caps are located in the same position for an enclosure with 24 drives.
      • 1 Left end cap
      • 2 Drives
      • 3 Right e...

  • Page 23

    Table 6. LED descriptions (columns: Name, Description, Color, Symbol)
      • Power (1): The power LED is the upper LED. When the green LED is lit, it indicates that the main power is available to the enclosure. Color: green.
      • Fault (2): The fault LED is the middle LED. When the amber LED is lit, it indicates that one of the enclosure c...

  • Page 24

    Callouts: 1 Power supply unit 1; 2 Power supply unit 2; 3 Canister 1; 4 Canister 2. Power supply unit and battery for the control enclosure. The control enclosure contains two power supply units, each with an integrated battery. The two power supply units in the enclosure are installed with one unit top side up an...

  • Page 25

    Table 7 identifies the LEDs in the rear of the control enclosure. Table 7. Power supply unit LEDs in the rear of the control enclosure (columns: Name, Color, Symbol)
      • AC power failure: amber
      • Power supply OK: green
      • Fan failure: amber
      • DC power failure: amber
      • Battery failure: amber, symbol + -
      • Battery state: green, symbol + -
    See “proced...

  • Page 26

    There is a power switch on each of the power supply units. The switch must be on for the power supply unit to be operational. If the power switches are turned off, the power supply units stop providing power to the system. Figure 11 shows the locations of the LEDs (1) in the rear of the power supply u...

  • Page 27

    Each node canister has four Fibre Channel ports located on the left side of the canister as shown in Figure 12. The ports are in two rows of two ports. The ports are numbered 1 - 4 from left to right and top to bottom. Note: The reference to the left and right locations applies to canister 1, which ...

  • Page 28

    Table 9. Fibre Channel port LED locations on canister 1 (columns: Associated port, LED location, LED status; callout numbers in parentheses)
      • Port 3 (3): first LED between ports 1 and 3 (1), speed
      • Port 1 (1): second LED between ports 1 and 3 (2), speed
      • Port 3 (3): third LED between ports 1 and 3 (3), link
      • Port 1 (1): fourth LED between ports 1 and 3 (4), link
      • Port 4 (4)...

  • Page 29

    Note: The reference to the left and right locations applies to canister 1, which is the upper canister. The port locations are inverted for canister 2, which is the lower canister. The USB ports have no indicators. Ethernet ports and indicators. Ethernet ports are located side by side on the rear of ...

  • Page 30

    Table 11 provides a description of the two LEDs. Table 11. 1 Gbps Ethernet port LEDs (columns: Name, Description, Color)
      • Link speed (LED on right of upper canister): The LED is on when there is a link connection; otherwise, the LED is off. Color: green.
      • Activity (LED on left of upper canister): The LED is flashing when th...

  • Page 31

    Table 12. 10 Gbps Ethernet port LEDs (columns: Name, Symbol, Description, Color)
      • Activity (Tx/Rx): The LED is flashing when there is activity on the link; otherwise, the LED is off. Color: green.
      • Link (LNK): The LED is on when there is a link connection; otherwise, the LED is off. Color: amber.
    Node canister SAS ports and indicators. T...

  • Page 32

    Node canister LEDs. Each node canister has three LEDs that provide status and identification for the node canister. The three LEDs are located in a horizontal row near the upper right of the canister (1). Figure 18 shows the rear view of the node canister LEDs. Note: The reference to the left and righ...

  • Page 33

    Table 14. Node canister LEDs (continued) (columns: Name, Description, Color, Symbol)
      • Fault: Indicates if a fault is present and identifies which canister.
        • The on status indicates that the node is in service state or an error exists that might be preventing the code from starting. Do not assume that this status i...

  • Page 34

    The SAS ports are numbered 1 on the left and 2 on the right as shown in Figure 19. Use of port 1 is required. Use of port 2 is optional. Each port connects four data channels. Note: The reference to the left and right locations applies to canister 1, which is the upper canister. The port locations a...

  • Page 35

    Table 16. Expansion canister LEDs (columns: Name, Description, Color, Symbol)
      • Status: Indicates if the canister is active. Color: green.
        • If the LED is on, the canister is active.
        • If the LED is off, the canister is not active.
        • If the LED is flashing, there is a vital product data (VPD) error.
      • Fault: Indicates if a fau...

  • Page 36

    18 Storwize V7000: Troubleshooting, Recovery, and Maintenance Guide.

  • Page 37

    Chapter 2. Best practices for troubleshooting. Taking advantage of certain configuration options, and ensuring vital system access information has been recorded, makes the process of troubleshooting easier. Record access information. It is important that anyone who has responsibility for managing the ...

  • Page 38

    Table 17. Access information for your system (continued) (columns: Item, Value, Notes)
      • Control enclosure service IP address: node canister 1
      • Control enclosure service IP address: node canister 2
      • The control enclosure superuser password (the default is passw0rd)
    Follow power management procedures. Access to your ...

  • Page 39

    Hardware replacement is detected. This mechanism is called Call Home. When this event is received, IBM automatically opens a problem report, and if appropriate, contacts you to verify if replacement parts are required. If you set up Call Home to IBM, ensure that the contact details that you configur...

  • Page 40

    If there are a number of unfixed alerts, fixing any one alert might become more difficult because of the effects of the other alerts. Keep your software up to date. Check for new code releases and update your code on a regular basis. This can be done using the management GUI or check the IBM support ...

  • Page 41

    Support personnel also ask for your customer number, machine location, contact details, and the details of the problem.

  • Page 43

    Chapter 3. Understanding the Storwize V7000 battery operation for the control enclosure. Storwize V7000 node canisters cache volume data and hold state information in volatile memory. If the power fails, the cache and state data is written to a local solid-state drive (SSD) in the canister. The batte...

  • Page 44

    completed charging, then the system starts in service state and does not permit I/O operations to be restarted until the batteries are half charged. The recharging takes approximately 30 minutes. In a system with a failed battery, an AC power failure causes both canisters to save critical data and c...

  • Page 45

      • A battery has been powered on for three months without a maintenance discharge.
      • A battery has provided protection for saving critical data at least twice.
      • A battery has provided protection for at least 10 brownouts, which lasted up to 10 seconds each.
    A maintenance discharge takes approximate...

  • Page 47

    Chapter 4. Understanding the medium errors and bad blocks. A storage system returns a medium error response to a host when it is unable to successfully read a block. The Storwize V7000 response to a host read follows this behavior. The volume virtualization that is provided extends the time when a me...

  • Page 48

    Table 18. Bad block errors (continued) (columns: Error code, Description)
      • 1225: The system has failed to create a bad block because the system already has the maximum number of allowed bad blocks.
    The recommended actions for these alerts guide you in correcting the situation. Clear bad blocks by deallocating the...

  • Page 49

    Chapter 5. Storwize V7000 user interfaces for servicing your system. Storwize V7000 provides a number of user interfaces to troubleshoot, recover, or maintain your system. The interfaces provide various sets of facilities to help resolve situations that you might encounter. The interfaces for servici...

  • Page 50

      • Mark an event as fixed.
      • Filter the entries to show them by specific minutes, hours, or dates.
      • Reset the date filter.
      • View the properties.
    Some events require a certain number of occurrences in 25 hours before they are displayed as unfixed. If they do not reach this threshold in 25 hours, the...
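
    The occurrence threshold described above can be sketched as a count over a 25-hour window. This is only an illustration: the guide does not state the required number of occurrences or how the window is anchored, so `threshold`, `reaches_threshold`, and the "last 25 hours before `now`" interpretation are assumptions rather than the system's actual logic.

```python
from datetime import datetime, timedelta

# Window length taken from the guide; everything else here is hypothetical.
WINDOW = timedelta(hours=25)


def reaches_threshold(occurrences, threshold, now):
    """Return True if at least `threshold` occurrences fall in the 25 hours before `now`."""
    recent = [t for t in occurrences if timedelta(0) <= now - t <= WINDOW]
    return len(recent) >= threshold
```

    With occurrences 1, 10, and 30 hours before `now`, a threshold of 2 is reached but a threshold of 3 is not, because the 30-hour-old occurrence falls outside the window.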

  • Page 51

    About this task. You must use a supported web browser. Verify that you are using a supported web browser from the following website: www.ibm.com/storage/support/storwize/v7000 You can use the management GUI to manage your system as soon as you have created a clustered system. Procedure: 1. Start a sup...

  • Page 52

    The node canister might be in service state because it has a hardware issue, has corrupted data, or has lost its configuration data. Use the service assistant in the following situations:
      • When you cannot access the system from the management GUI and you cannot access the Storwize V7000 to ...

  • Page 53

    Procedure: 1. Start a supported web browser and point your web browser to <serviceaddress>/service for the node canister that you want to work on. For example, if you set a service address of 11.22.33.44 for a node canister, point your browser to 11.22.33.44/service. If you are unable to connect to t...

  • Page 54

    Service command-line interface. Use the service command-line interface (CLI) to manage a node canister in a control enclosure using the task commands and information commands. For a full description of the commands and how to start an SSH command-line session, see the “Command-line interface” topic i...

  • Page 55

    About this task. When a USB flash drive is plugged into a node canister, the node canister code searches for a text file named satask.txt in the root directory. If the code finds the file, it attempts to run a command that is specified in the file. When the command completes, a file called satask_res...
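
    As a minimal sketch of preparing such a drive, the helper below writes one command line to satask.txt in the root directory of a mounted USB flash drive. The function name `write_satask_file`, the single-line check, and the ASCII encoding are assumptions made for illustration; the guide only states the file name, its location, and that the file specifies a command.

```python
from pathlib import Path


def write_satask_file(usb_root, command):
    """Write a single command line to satask.txt in the root of the USB drive.

    `usb_root` is the mount point of the flash drive (hypothetical parameter).
    """
    line = command.strip()
    if not line or "\n" in line:
        raise ValueError("expected exactly one command line")
    target = Path(usb_root) / "satask.txt"
    # ASCII plain text is an assumption; the guide only says "text file".
    target.write_text(line + "\n", encoding="ascii")
    return target
```

    The returned path can be used to confirm that the file sits in the root directory rather than in a subfolder, which is where the node canister code looks for it.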

  • Page 56

    The initialization tool is available on the USB flash drive that is shipped with the control enclosures. The name of the application file is InitTool.exe. If you cannot locate the USB flash drive, you can download the application from the support website (search for initialization tool): www.ibm.co...

  • Page 57

    satask chserviceip -default -resetpassword. Parameters:
      • -serviceip (optional): The IPv4 address for the service assistant.
      • -gw (optional): The IPv4 gateway for the service assistant.
      • -mask (optional): The IPv4 subnet for the service assistant.
      • -serviceip_6 (optional): The IPv6 address for the service assi...

  • Page 58

    Description: This command resets the service assistant password to the default value passw0rd. If the node canister is active in a system, the superuser password for the system is reset; otherwise, the superuser password is reset on the node canister. If the node canister becomes active in a system,...

  • Page 59

    Create cluster command. Use this command to create a storage system. Syntax:
      satask mkcluster -clusterip ipv4 -gw ipv4 -mask ipv4 -name cluster_name
      satask mkcluster -clusterip_6 ipv6 -gw_6 ipv6 -prefix_6 int -name cluster_name
    Parameters:
      • -clusterip (optional): The IPv4 address for Ethernet port 1 on t...
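
    The two syntax forms above can be assembled programmatically. The following sketch builds the command string from validated addresses; the function name `mkcluster_command`, its parameter names, and the choice to always require a name are illustrative assumptions, and only the flags shown in the syntax lines are used.

```python
import ipaddress


def mkcluster_command(cluster_ip, gateway, name, mask=None, prefix=None):
    """Build a `satask mkcluster` command line in its IPv4 or IPv6 form."""
    ip = ipaddress.ip_address(cluster_ip)  # raises ValueError on bad input
    gw = ipaddress.ip_address(gateway)
    if ip.version != gw.version:
        raise ValueError("cluster IP and gateway must use the same IP version")
    if ip.version == 4:
        if mask is None:
            raise ValueError("the IPv4 form needs a subnet mask")
        ipaddress.IPv4Address(mask)  # checks dotted-quad syntax only
        args = ["-clusterip", str(ip), "-gw", str(gw), "-mask", mask]
    else:
        if prefix is None or not 0 <= int(prefix) <= 128:
            raise ValueError("the IPv6 form needs a prefix between 0 and 128")
        args = ["-clusterip_6", str(ip), "-gw_6", str(gw),
                "-prefix_6", str(int(prefix))]
    return " ".join(["satask", "mkcluster", *args, "-name", name])
```

    Validating the addresses before the command is written out catches typing mistakes early, which matters when the command is delivered by USB flash drive and feedback only arrives in a result file.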

  • Page 61

    Chapter 6. Resolving a problem. Described here are some procedures to help resolve fault conditions that might exist on your system and which assume a basic understanding of the Storwize V7000 system concepts. The following procedures are often used to find and resolve problems:
      • Procedures that inv...

  • Page 62

    When you have logged on, select Monitoring > Events. Depending on how you choose to filter alerts, you might see only the alerts that require attention, alerts and messages that are not fixed, or all event types whether they are fixed or unfixed. Select the recommended alert, or any other alert, an...

  • Page 63

    page 53; otherwise, go to “Procedure: Getting node canister and system information using a USB flash drive” on page 53 and obtain the state of each of the node canisters from the data that is returned. If there is not a node canister with a state of active, resolve the reason why it is not in active...

  • Page 64

    Problem: Cannot initialize or create a system. This topic helps if your attempt to create a system has failed. The failure is reported regardless of the method that you used to create a clustered storage system:
      • USB flash drive
      • Management console
      • Service assistant
      • Service command line
    The cre...

  • Page 65

    1. Point your browser at the /service directory of the management IP address of the system. If your management IP address is 11.22.33.44, point your web browser to 11.22.33.44/service. 2. Log into the service assistant. 3. The service assistant home page lists the node canister that can communicat...

  • Page 66

      • You cannot connect to the service assistant if the node canister is not able to start the Storwize V7000 code. To verify that the LEDs indicate that the code is active, see “Procedure: Understanding the system status using the LEDs” on page 54.
      • The service assistant is configured on Ethernet por...

  • Page 67

    A number of different conditions are reported as location errors. Each condition is indicated by a different node error. To find out how to resolve the node error, go to “Procedure: Fixing node errors” on page 61. Be aware that after a node canister has been used in a system, the node canister must no...

  • Page 68

      • Verify that the SAS cabling to the expansion enclosure is correctly installed. To review the requirements, see “Problem: SAS cabling not valid” on page 49.
    Problem: Control enclosure not detected. If a control enclosure is not detected by the system, this procedure may help you resolve the problem....

  • Page 69

    You might encounter this problem during initial setup or when running commands if you are using your own USB flash drive rather than the USB flash drive that was packaged with your order. If you encounter this situation, verify the following items:
      • That an satask_result.html file is in the root di...

  • Page 70

    Procedure: Identifying which enclosure or canister to service. Use this procedure to identify which enclosure or canister must be serviced. Procedure: Use the following options to identify an enclosure. An enclosure is identified by its ID and serial number.
      • The ID is shown on the lc...

  • Page 71

    Procedure: Use the following management GUI functions to find a more detailed status:
      • Monitoring > System Details
      • Pools > MDisks by Pools
      • Volumes > Volumes
      • Monitoring > Events, and then use the filtering options to display alerts, messages, or event types.
    Procedure: Getting node canister an...

  • Page 72

    satask_result.html file. Delete this file if you no longer want the previous output. Procedure: 1. Insert the USB flash drive in one of the USB ports of the node canister from which you want to collect data. 2. The node canister fault LED flashes while information is collected and written to the USB ...

  • Page 73

    Table 20. Power-supply unit LEDs (columns: Power supply OK, AC failure, Fan failure, DC failure, Status, Action)
      • On, On, On, On: Communication failure between the power supply unit and the enclosure chassis. Action: Replace the power supply unit. If failure is still present, replace the enclosure chassis.
      • Off, Off, Off, Off: No AC ...

  • Page 74

    Table 20. Power-supply unit LEDs (continued) (columns: Power supply OK, AC failure, Fan failure, DC failure, Status, Action)
      • Off, On, Off, On: No AC power to this power supply. Action: 1. Check that the switch on the power supply unit is on. 2. Check that the AC power is on. 3. Reseat and replace the power cable.
      • On, Off, Off, Off...
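
    A table of this shape maps naturally onto a lookup keyed by the four LED states. The sketch below encodes only the two rows of Table 20 that appear in full in these excerpts; the remaining rows are truncated in the source and are deliberately left out. The names `PSU_LED_STATES` and `diagnose_psu` are hypothetical.

```python
# Key order follows the table columns:
# (power supply OK, AC failure, fan failure, DC failure)
PSU_LED_STATES = {
    ("on", "on", "on", "on"): (
        "Communication failure between the power supply unit and the enclosure chassis",
        "Replace the power supply unit. If failure is still present, "
        "replace the enclosure chassis.",
    ),
    ("off", "on", "off", "on"): (
        "No AC power to this power supply",
        "Check that the switch on the power supply unit is on; check that "
        "the AC power is on; reseat and replace the power cable.",
    ),
}


def diagnose_psu(power_ok, ac_failure, fan_failure, dc_failure):
    """Return (status, action) for a known LED combination, else None."""
    key = tuple(s.lower() for s in (power_ok, ac_failure, fan_failure, dc_failure))
    return PSU_LED_STATES.get(key)
```

    Returning `None` for combinations that are not in the dictionary keeps the sketch honest about the rows it does not cover.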

  • Page 75

    Table 21. Power LEDs (continued) (columns: Power LED status, Description)
      • Slow flashing (1 Hz): Power is available, but the canister is in standby mode. Try to start the node canister by reseating it. Go to “Procedure: Reseating a node canister” on page 65.
      • Fast flashing (2 Hz): The canister is running its power-...

  • Page 76

    Table 22. System status and fault LEDs (continued) (columns: System status LED, Fault LED, Status, Action)
      • On, On: Code is active and is in starting state. However, it does not have enough resources to form the clustered system. The node canister cannot become active in a clustered system. There are no detected pro...

  • Page 77

    Table 23. Control enclosure battery LEDs (continued) (columns: Battery good + -, Battery fault + -, Description, Action)
      • Off, On: Nonrecoverable battery fault. Action: Replace the battery. If replacing the battery does not fix the issue, replace the power supply unit.
      • Off, Flashing: Recoverable battery fault. Action: None.
      • Flashing f...

  • Page 78

    About this task. Attention: Do not remove the system data from a node unless instructed to do so by a service procedure. Do not use this procedure to remove the system data from the only online node canister in a system. If the system data is removed or lost from all node canisters in the system, the...

  • Page 79

    Procedure: Fixing node errors. To fix node errors that are detected by node canisters in your system, use this procedure. About this task. Node errors are reported in the service assistant when a node detects erroneous conditions in a node canister. Procedure: 1. Carry out “Procedure: Getting node cani...

  • Page 80

      • Use the service assistant when you can connect to the service assistant on either the node canister that you want to configure or on a node canister that can connect to the node canister that you want to configure: 1. Make the node canister that you want to configure the current node. 2. Select ch...

  • Page 81

    6. The system detects the USB flash drive, reads the satask.txt file, runs the command, and writes the results to the USB flash drive. The satask.txt file is deleted after the command is run. 7. Wait for the fault LED on the node canister to stop flashing before removing the USB flash drive. 8. Remo...

  • Page 82

    6. Point a supported browser to the management IP address that you specified to start the management GUI. The management GUI logon panel is displayed. 7. Log on as superuser. Use passw0rd for the password. 8. Follow the on-screen instructions. Results. Attention: Without a USB flash drive to service ...

  • Page 83

    7. Set the service address of the canister to one that can be accessed on the network as soon as possible. 8. Wait for the action to complete. 9. Disconnect your personal computer. 10. Reconnect the node canister to the Ethernet network. Procedure: Reseating a node canister. Use this procedure to res...

  • Page 84

    2. Shut down the system by using the management GUI. Click Monitoring > System Details. From the Actions menu, select Shut Down System. 3. Wait for the power LED on both node canisters in all control enclosures to start flashing, which indicates that the shutdown operation has completed. The follo...

  • Page 85

    Logs option from the navigation. You can collect a support package or copy an individual file from the node canister. Follow the instructions to collect the information. Procedure: Rescuing node canister software from another node (node rescue). Use this procedure to perform a node rescue. About this...

  • Page 86

    a. Verify that the host adapter is in good state. You can unload and load the device driver and see the operating system utilities to verify that the device driver is installed, loaded, and operating correctly. SAN problem determination. About this task. SAN failures might cause Storwize V7000 volumes...

  • Page 87

    replace is a longwave SFP transceiver, for example, you must provide a suitable replacement. Removing the wrong SFP transceiver could result in loss of data access. 4. Perform the Fibre Channel switch service procedures for a failing Fibre Channel link. This might involve replacing the SFP transceiv...

  • Page 89

    Chapter 7. Recovery procedures. This topic describes these recovery procedures: recover a system and back up and restore a system configuration. Recover system procedure. The recover system procedure recovers the entire system if the block cluster state has been lost from all nodes. The recover system ...

  • Page 90

    When to run the recover system procedure. Attempt a recover procedure only after a complete and thorough investigation of the cause of the system failure. Attempt to resolve those issues by using other service procedures. Attention: If you experience failures at any time while running the recover sys...

  • Page 91

      • Quorum drive identifiers in the format: <enclosure_serial>:<drive slot id>[] (7 characters, colon, 1 or 2 numbers, open square bracket, 22 characters, close square bracket), for example, 01234a9:21[11s1234567890123456789]
      • Quorum MDisk identifier in the format: WWPN/LUN (16 hexadecimal digits fo...
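
    The quorum drive identifier layout given above (7 characters, colon, 1 or 2 digits, 22 characters in square brackets) can be checked with a regular expression. In this sketch the alphanumeric character class and the group name `bracket_value` are assumptions, since the excerpt says only "characters" and does not name the bracketed field; the MDisk format is not covered because its description is truncated.

```python
import re

# 7 characters, ":", 1 or 2 digits, "[", 22 characters, "]"
QUORUM_DRIVE_ID = re.compile(
    r"(?P<enclosure_serial>[0-9A-Za-z]{7})"
    r":(?P<drive_slot_id>\d{1,2})"
    r"\[(?P<bracket_value>[0-9A-Za-z]{22})\]"
)


def parse_quorum_drive_id(text):
    """Return the identifier's fields as a dict, or None if the layout does not match."""
    match = QUORUM_DRIVE_ID.fullmatch(text)
    return match.groupdict() if match else None
```

    Applied to the example from the guide, `parse_quorum_drive_id("01234a9:21[11s1234567890123456789]")` yields enclosure serial `01234a9` and drive slot `21`.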

  • Page 92

    data from those nodes. This action acknowledges the data loss and puts the nodes into the required candidate state. Removing system information for node canisters with error code 550 or error code 578 using the service assistant. The system recovery procedure works only when all node canisters are in...

  • Page 93

    Attention: This service action has serious implications if not performed properly. If at any time an error is encountered not covered by this procedure, stop and call IBM support. Note: The web browser must not block pop-up windows, otherwise progress windows cannot open. Run the recovery from any n...

  • Page 94

    T3 recovery completed with errors: One or more of the volumes are offline because there was fast write data in the cache. To bring the volumes online, see “Recovering from offline VDisks using the CLI” for details.
      • T3 failed: Call IBM support. Do not attempt any further action. Verify the environme...

  • Page 95

    What to check after running the system recovery. Several tasks must be performed before you use the system. The recovery procedure performs a recreation of the old system from the quorum data. However, some things cannot be restored, such as cached data or system data managing in-flight I/O. This lat...

  • Page 96

    Backing up and restoring the system configuration. You can back up and restore the configuration data for the system after preliminary tasks are completed. Configuration data for the system provides information about your system and the objects that are defined in it. The backup and restore functions...

  • Page 97

      • For each node entry, make a note of the value of the following properties: io_group_id, canister_id, enclosure_serial_number.
      • Use the CLI sainfo lsservicenodes command and the data to determine which node canisters previously belonged in each I/O group.
    Restoring the system configuration should be p...

  • Page 98

    Configuration> procedure, also known as a tier 4 (T4) recovery. Both of these procedures require a recent backup of the configuration data. Perform the following steps to back up your configuration data: Procedure: 1. Back up all of the application data that you stored on your volumes using your prefe...

  • Page 99

    The cluster_ip is the IP address or DNS name of the system and offclusterstorage is the location where you want to store the backup files. Tip: To maintain controlled access to your configuration data, copy the backup files to a location that is password-protected. Restoring the system configuration...

  • Page 100

    All nodes previously in this system must have a node status of candidate and have no errors listed against them. Note: A node that is powered off might not show up in this list of nodes for the system. Diagnose hardware problems directly on the node using the service assistant IP address and by phys...

  • Page 101

    The file can be either a local copy of the configuration backup XML file that you saved when backing up the configuration or an up-to-date file on one of the nodes. Configuration data is automatically backed up daily at 01:00 system time on the configuration node. Download and check the configuratio...

  • Page 102

    This CLI command creates a log file in the /tmp directory of the configuration node. The name of the log file is svc.config.restore.execute.log. 17. Issue the following command to copy the log file to another server that is accessible to the system: pscp superuser@cluster_ip:/tmp/svc.config.resto...

  • Page 103

    Chapter 8. Replacing parts. You can remove and replace customer-replaceable units (CRUs) in control enclosures or expansion enclosures. Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions wit...

  • Page 104

    Attention: Do not replace one type of node canister with another type. For example, do not replace a model 2076-112 node canister with a model 2076-312 node canister. Be aware of the following canister LED states:
      • If both the power LED and system status LED are on, do not remove a node canister un...

  • Page 105

    7. Pull out the handle to its full extension. 8. Grasp canister and pull it out. 9. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed. 10. Push the canister back into the slot until the h...

  • Page 106

    Be careful when you are replacing the hardware components that are located in the back of the system that you do not inadvertently disturb or remove any cables that you are not instructed to remove. Be aware of the following canister led states: v if the power led is on, do not remove an expansion c...

  • Page 107

    6. Pull out the handle to its full extension. 7. Grasp canister and pull it out. 8. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed. 9. Push the canister back into the slot until the ha...

  • Page 108

    Caution: some laser products contain an embedded class 3a or class 3b laser diode. Note the following information: laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam. (c030) about this task perform the following...

  • Page 109

    Replacing a power supply unit for a control enclosure you can replace either of the two 764 watt hot-swap redundant power supplies in the control enclosure. These redundant power supplies operate in parallel, one continuing to power the canister if the other fails.

  • Page 110

    Before you begin danger when working on or around the system, observe the following precautions: electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v if ibm supplied a power cord(s), connect power to this unit only with the ibm prov...

  • Page 111

    Attention: a powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement available, and unpacked, before you remove th...

  • Page 112

    B. Grip the handle to pull the power supply out of the enclosure as shown in figure 30. 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed. Svc00633 figure 29...

  • Page 113

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit into the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10. Turn on the power switch t...

  • Page 114

    Before you begin danger when working on or around the system, observe the following precautions: electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v if ibm supplied a power cord(s), connect power to this unit only with the ibm prov...

  • Page 115

    Attention: a powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement available, and unpacked, before you remove th...

  • Page 116

    B. Grip the handle to pull the power supply out of the enclosure as shown in figure 32. 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed. Svc00633 figure 31...

  • Page 117

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit in the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10. Turn on the power switch to ...

  • Page 118

    Before you begin danger when working on or around the system, observe the following precautions: electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v if ibm supplied a power cord(s), connect power to this unit only with the ibm prov...

  • Page 119

    Attention: if your system is powered on and performing i/o operations, go to the management gui and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data. Even though many of these procedures ar...

  • Page 120

    A. Press the catch to release the handle 1 . B. Lift the handle on the battery 2 . C. Lift the battery out of the power supply unit 3 . 4. Install the replacement battery. Attention: the replacement battery has protective end caps that must be removed prior to use. A. Remove the battery from the pac...

  • Page 121

    D. Place the replacement battery in the opening on top of the power supply in its proper orientation. E. Press the battery to seat the connector. F. Place the handle in its downward location 5. Push the power supply unit back into the enclosure until the handle starts to move. 6. Finish inserting th...

  • Page 122

    The drives can be distinguished from the blank carriers by the color-coded striping on the drive. The drives are marked with an orange striping. The blank carriers are marked with a blue striping. To replace the drive assembly or blank carrier, perform the following steps: procedure 1. Read the safe...

  • Page 123

    4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place. Replacing a 2.5" drive assembly or blank carrier this topic describes how to remove a 2.5" drive assembly or...

  • Page 124

    3. Open the handle to the full extension. 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place. Svc00614 figure 36. Unlocking the 2.5" drive svc00615 figure 37. R...

  • Page 125

    Replacing enclosure end caps. To replace enclosure end caps, use this procedure. About this task. Attention: the left end cap is printed with information that helps identify the enclosure: machine type and model, enclosure serial number, and machine part number. The information on the end cap shoul...

  • Page 126

    3. Plug the replacement cable into the specific port. 4. Ensure that the sas cable is fully inserted. A click is heard when the cable is successfully inserted. Replacing a control enclosure chassis this topic describes how to replace a control enclosure chassis. Before you begin note: ensure that yo...

  • Page 127

    Danger when working on or around the system, observe the following precautions: electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v if ibm supplied a power cord(s), connect power to this unit only with the ibm provided power cord. ...

  • Page 128

    Attention: perform this procedure only if instructed to do so by a service action or the ibm support center. If you have a single control enclosure, this procedure requires that you shut down your system to replace the control enclosure. If you have more than one control enclosure, you can keep part...

  • Page 129

    stopsystem -force -node <node id> c. Wait for the shutdown to complete. 5. Verify that it is safe to remove the power from the enclosure. For each of the canisters, verify the status of the system status LED. If the LED is lit on either of the canisters, do not continue because the system is still o...

  • Page 130

    24. Write the old enclosure machine type and model (mtm) and serial number on the repair identification (rid) tag that is supplied. Attach the tag to the left flange at the back of the enclosure. 25. Turn on the power to the enclosure using the switches on the power supply units. The node canisters ...

  • Page 131

    Verify that it is offline and managed and that the serial number is correct. 28. From the actions menu, select remove enclosure and confirm the action. The physical hardware has already been removed. You can ignore the messages about removing the hardware. Verify that the original enclosure is no lo...

  • Page 132

    Danger when working on or around the system, observe the following precautions: electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v if ibm supplied a power cord(s), connect power to this unit only with the ibm provided power cord. ...

  • Page 133

    Attention: if your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data. Even though many of these procedures are hot-swapp...

  • Page 134

    14. Replace the end caps. Use the new right end cap and use the left end cap that you removed in step 9 on page 115. Using the left end cap that you removed preserves the model and serial number identification. 15. Reinstall the drives in the new enclosure. The drives must be inserted back into the ...

  • Page 135

    4. Working from the front of the rack cabinet, remove the clamping screw from the rail assembly on both sides of the rack cabinet. 5. From one side of the rack cabinet, grip the rail and slide the rail pieces together to shorten the rail. 6. Disengage the rail location pins 2 . 7. From the other sid...

  • Page 136

    Table 24. Replaceable units part part number applicable models fru or customer replaced 2u24 enclosure chassis (empty chassis) 85y5897 124, 224, 324 fru 2u12 enclosure chassis (empty chassis) 85y5896 112, 212, 312 fru type 100 node canister 85y5899 112, 124 customer replaced type 300 node canister w...

  • Page 137

    Table 24. Replaceable units (continued) part part number applicable models fru or customer replaced 2.8 m power cord (group 1 including the united states) 39m5081 all customer replaced 2.8 m power cord (argentina) 39m5068 all customer replaced 2.8 m power cord (china) 39m5206 all customer replaced 2...

  • Page 138

    Table 24. Replaceable units (continued) part part number applicable models fru or customer replaced 3.5" 7.2 k nearline sas - 3 tb in carrier assembly 85y6187 112, 212, 312 customer replaced blank 2.5" carrier 85y5893 124, 224, 324 customer replaced blank 3.5" carrier 85y5894 112, 212, 312 customer ...
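
The replaceable-units table lends itself to a simple lookup. The Python sketch below encodes only a handful of rows that are fully visible in this excerpt; the class and function names are hypothetical, and part numbers are written in uppercase on the assumption that the lowercase form here is an extraction artifact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaceableUnit:
    part: str
    part_number: str
    models: frozenset        # empty set stands for "all" models
    customer_replaced: bool  # False means the part is listed as FRU


# A few rows copied from Table 24 as shown in this excerpt (not the full table).
UNITS = [
    ReplaceableUnit("2U24 enclosure chassis (empty chassis)", "85Y5897", frozenset({"124", "224", "324"}), False),
    ReplaceableUnit("2U12 enclosure chassis (empty chassis)", "85Y5896", frozenset({"112", "212", "312"}), False),
    ReplaceableUnit("Type 100 node canister", "85Y5899", frozenset({"112", "124"}), True),
    ReplaceableUnit("2.8 m power cord (group 1 including the United States)", "39M5081", frozenset(), True),
    ReplaceableUnit('3.5" 7.2 K nearline SAS - 3 TB in carrier assembly', "85Y6187", frozenset({"112", "212", "312"}), True),
    ReplaceableUnit('Blank 2.5" carrier', "85Y5893", frozenset({"124", "224", "324"}), True),
]


def units_for_model(model: str) -> list:
    """Return the units whose applicable models include `model` (or are 'all')."""
    return [u for u in UNITS if not u.models or model in u.models]


def customer_can_replace(part_number: str) -> bool:
    """Look up whether a part is customer replaced; raises KeyError if unknown."""
    wanted = part_number.upper()
    for unit in UNITS:
        if unit.part_number == wanted:
            return unit.customer_replaced
    raise KeyError(part_number)
```

For example, `units_for_model("124")` returns the 2U24 chassis, the type 100 node canister, the power cord, and the blank 2.5" carrier from the rows above.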

  • Page 139

    Chapter 9. Event reporting. Events that are detected are saved in an event log. As soon as an entry is made in this event log, the condition is analyzed. If any service activity is required, a notification is sent. Event reporting process. The following methods are used to notify you and the IBM suppo...

  • Page 140

    Managing the event log the event log has a limited size. After it is full, newer entries replace entries that are no longer required. To avoid having a repeated event that fills the event log, some records in the event log refer to multiple occurrences of the same event. When event log entries are c...
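
The two behaviors described above, a size-limited log and a single record that stands for repeated occurrences of the same event, can be sketched in Python. This is an illustrative model only, not the product's implementation: the class name, the `(event_id, source)` key, and the rule that only entries marked as fixed count as "no longer required" are all assumptions.

```python
from collections import OrderedDict


class EventLog:
    """Bounded log in which repeats of an event share one counted record."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # (event_id, source) -> {"count", "fixed"}

    def __len__(self) -> int:
        return len(self._entries)

    def record(self, event_id: str, source: str) -> bool:
        """Log an occurrence; returns False if the log is full and nothing can be replaced."""
        key = (event_id, source)
        if key in self._entries:
            entry = self._entries[key]
            entry["count"] += 1      # repeated event: bump the existing record
            entry["fixed"] = False   # a recurrence makes the record relevant again
            return True
        if len(self._entries) >= self.capacity and not self._evict_one():
            return False
        self._entries[key] = {"count": 1, "fixed": False}
        return True

    def mark_fixed(self, event_id: str, source: str) -> None:
        self._entries[(event_id, source)]["fixed"] = True

    def count(self, event_id: str, source: str) -> int:
        return self._entries[(event_id, source)]["count"]

    def _evict_one(self) -> bool:
        """Drop the oldest entry that is marked fixed, if there is one."""
        victim = next((k for k, e in self._entries.items() if e["fixed"]), None)
        if victim is None:
            return False
        del self._entries[victim]
        return True
```

Counting repeats in place keeps a single noisy event from filling the log, which is the motivation the text gives for records that refer to multiple occurrences.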

  • Page 141

    Event notifications. The Storwize V7000 product can use Simple Network Management Protocol (SNMP) traps, syslog messages, emails, and call homes to notify you and IBM® remote technical support when significant events are detected. Any combination of these notification methods can be used simultaneou...

  • Page 142

    When the code is loaded, additional testing takes place, which ensures that all of the required hardware and code components are installed and functioning correctly. Understanding the error codes error codes are generated by the event-log analysis and system configuration code. Error codes help you ...

  • Page 143

    Table 27. Informational events (continued) event id notification type description 980343 w all ports in this host are now offline. 980349 i a node has been successfully added to the cluster (system). 980350 i the node is now a functional member of the cluster (system). 980351 i a noncritical hardwar...

  • Page 144

    Table 27. Informational events (continued) event id notification type description 981022 i managed disk offline imminent, offline prevention started 981025 i drive firmware download started 981026 i drive fpga download started 981101 i sas discovery occurred; no configuration changes were detected. ...

  • Page 145

    Table 27. Informational events (continued) event id notification type description 984509 i the component firmware update paused to allow the battery charging to finish. 984511 i the update for the component firmware paused because the system was put into maintenance mode. 984512 i a component firmwa...

  • Page 146

    Table 27. Informational events (continued) event id notification type description 986206 w a mirror disk repair is complete and the differences are marked as medium errors. 986207 i the mirror disk repair has been started. 986208 w a mirror copy repair, using the set medium error option, cannot comp...
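
Because Table 27 appears here as flattened text, a small parser can help recover its rows. The Python sketch below assumes each row has the shape "six-digit event ID, one-letter notification type, description", which matches the rows visible in this excerpt; the function name and the regular expression are illustrative, not part of any IBM tool.

```python
import re

# One row: event ID, notification type letter, then a description that runs
# until the next "ID type" pair or the end of the text.
ROW = re.compile(
    r"(?P<event_id>\d{6}) (?P<type>[iwe]) (?P<description>.+?)(?= \d{6} [iwe] |$)",
    re.IGNORECASE,
)


def parse_informational_events(text: str) -> list:
    """Return (event_id, notification_type, description) tuples from flattened rows."""
    return [
        (m["event_id"], m["type"].upper(), m["description"].strip())
        for m in ROW.finditer(text)
    ]
```

The parser expects the table caption and column headings to be removed first, and it cannot recover rows that are cut off by the excerpt's trailing ellipsis.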

  • Page 147

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 009052 w the following causes are possible: the node is missing; the node is no longer a functional member of the system. 1196 009053 e a node has been missing for 30 minutes. 1195 009100 w the ...

  • Page 148

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 009191 w a trial of a licensable feature will expire in 10 days. 3084 009192 w a trial of a licensable feature will expire in 15 days. 3085 009193 w a trial of a licensable feature will expire in 45...

  • Page 149

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 010030 e a managed disk error recovery procedure (erp) has occurred. The node or controller reported the following: v sense v key v code v qualifier 1370 010031 e one or more mdisks on a controller ...

  • Page 150

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 010067 w too many enclosures were presented to a cluster (system). 1200 010068 e the solid-state drive (ssd) format was corrupted. 1204 010069 e the block size for the solid-state drive (ssd) was in...

  • Page 151

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 029002 e the system failed to create a bad block because mdisk already has the maximum number of allowed bad blocks. 1226 029003 e the system failed to create a bad block because the clustered syste...

  • Page 152

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 045027 e the drive slot is not running at 6 gbps 1686 045028 e the drive slot is dropping frames. 1686 045029 e the drive is visible through only one sas port. 1686 045031 e the drive power control ...

  • Page 153

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 045066 e the fru identity of the enclosure is not valid. 1008 045067 w a new enclosure fru was detected and needs to be configured. 1041 045068 e the internal device on a node canister was excluded ...

  • Page 154

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 062001 w unable to mirror medium error during volume copy synchronization 1950 062002 w the mirrored volume is offline because the data cannot be synchronized. 1870 062003 w the repair process for t...

  • Page 155

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 073007 w there are fewer fibre channel ports operational than are configured. 1061 073305 w one or more fibre channel ports are running at a speed that is lower than the last saved speed. 1065 07331...

  • Page 156

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 073700 e fc adapter missing. 1045 073701 e fc adapter failed. 1046 073702 e fc adapter pci error. 1046 073703 e fc adapter degraded. 1045 073704 w fewer fibre channel ports operational. 1061 073705 ...

  • Page 157

    Table 28. Error event ids and error codes (continued) event id notification type condition error code 076403 e one of the two power supply units in the node is without power. 1097 076502 e degraded pcie lanes on a high-speed sas adapter. 1121 076503 e a pci bus error occurred on a high-speed sas ada...
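
Several event IDs in Table 28 map to the same error code. The Python sketch below groups event IDs by error code from flattened rows of the shape "event ID, notification type, condition, four-digit error code"; that row shape is inferred from the rows visible in this excerpt, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# One row: event ID, type letter, condition text, four-digit error code,
# followed by the next "ID type" pair or the end of the text.
ERROR_ROW = re.compile(
    r"(\d{6}) ([iwe]) (.+?) (\d{4})(?= \d{6} [iwe] |$)",
    re.IGNORECASE,
)


def event_ids_by_error_code(text: str) -> dict:
    """Map each error code to the list of event IDs that report it."""
    groups = defaultdict(list)
    for event_id, _type, _condition, error_code in ERROR_ROW.findall(text):
        groups[error_code].append(event_id)
    return dict(groups)
```

Grouping this way shows, for instance, that the drive-slot events 045027, 045028, and 045029 all lead to error code 1686, so one set of service actions covers them.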

  • Page 158

    Error occurs because part of the hardware has failed or the system detects that the code is corrupt. If it is possible to communicate with the canister with a node error, an alert that describes the error is logged in the event log. If the system cannot communicate with the node canister, a node mi...

  • Page 159

    500 incorrect enclosure explanation: the node canister has saved cluster information, which indicates that the canister is now located in a different enclosure from where it was previously used. Using the node canister in this state might corrupt the data held on the enclosure drives. User response:...

  • Page 160

    Possible cause—frus or other: v enclosure midplane (100%) 504 no enclosure identity and partner node matches. Explanation: the enclosure vital product data indicates that the enclosure midplane has been replaced. This node canister and the other node canister in the enclosure were previously operati...

  • Page 161

    Relocate the nodes to the correct location. 1. Check the status of the other node in the enclosure. It should show node error 506. Unless it also shows error 507, check the errors on the other node and follow the corresponding procedures to resolve the errors. 2. If the other node in the enclosure i...

  • Page 162

    523 the internal disk file system is damaged. Explanation: the node startup procedures have found problems with the file system on the internal disk of the node. User response: follow troubleshooting procedures to reload the software. 1. Follow the procedures to rescue the software of a node from an...

  • Page 163

    If a LUN on an external storage system is the missing quorum disk, it is listed as WWWWWWWWWWWWWWWW/LL, where WWWWWWWWWWWWWWWW is a worldwide port name (WWPN) on the storage system that contains the missing quorum disk and LL is the logical unit number (LUN). User response: follow troub...
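
The identifier format described above can be split into its two parts with a short Python sketch. The function name is hypothetical. The sketch assumes the WWPN is written as 16 hexadecimal digits and keeps the LUN as a string, because this excerpt does not say whether the LUN is decimal or hexadecimal.

```python
import re

# 16 hex digits (a 64-bit WWPN), a slash, then the LUN field.
QUORUM_ID = re.compile(r"^([0-9A-Fa-f]{16})/(\w+)$")


def parse_quorum_lun(identifier: str) -> tuple:
    """Split 'WWPN/LUN' into (wwpn, lun); raises ValueError on other shapes."""
    match = QUORUM_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not in WWPN/LUN form: {identifier!r}")
    return match.group(1).upper(), match.group(2)
```

The WWPN value can then be compared against the ports of the external storage systems to find the one that holds the missing quorum disk.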

  • Page 164

    6. If you are unable to find a storwize v7000 node canister with the same wwnn as the node canister showing the error, use the san monitoring tools to determine whether there is another device on the san with the same wwnn. This device should not be using a wwnn assigned to a storwize v7000, so you ...

  • Page 165

    Possible cause—frus or other: v none 578 the state data was not saved following a power loss. Explanation: on startup, the node was unable to read its state data. When this happens, it expects to be automatically added back into a cluster. However, if it has not joined a cluster in 60 sec, it raises...

  • Page 166

    V if the error persists for more than an hour when the ambient temperature is normal, use the remove and replace procedures to replace the battery. Possible cause—frus or other cause v canister battery 654 the canister battery’s temperature is too high explanation: the canister battery’s temperature...

  • Page 167

    1. Wait for the node to automatically fix the error when sufficient charge becomes available. 2. If possible, determine why one battery is not charging. Use the battery status shown in the node canister hardware details and the indicator leds on the psus in the enclosure to diagnose the problem. If ...

  • Page 168

    1. If possible, this noncritical node error should be serviced using the management gui and running the recommended actions for the service error code. 2. As the adapter is located on the system board, replace the node canister using the remove and replace procedures. There are a number of possibili...

  • Page 169

    704 fewer fibre channel ports operational. Explanation: a fibre channel port that was previously operational is no longer operational. The physical link is down. This node error does not, in itself, stop the node canister becoming active in the system. However, the fibre channel network might be bei...

  • Page 170

    Active in a clustered system. A Fibre Channel I/O port might be established on either an FC platform port or an Ethernet platform port using Fibre Channel over Ethernet (FCoE). Data: three numeric values are listed: the ID of the first FC I/O port that does not have connectivity. This is a decimal ...

  • Page 171

    713 a sas adapter is degraded. Explanation: a sas adapter is degraded. The adapter is located on the node canister system board. Data: v a number indicating the adapter location. Location 0 indicates that the adapter integrated into the system board is being reported. User response: 1. If possible, ...

  • Page 172

    724 fewer ethernet ports active. Explanation: an ethernet port that was previously operational is no longer operational. The physical link is down. Data: three numeric values are listed: v the id of the first unexpected inactive port. This is a decimal number. V the ports that are expected to be act...

  • Page 173

    Because of a lack of cluster resources is reported on the node canister. Data: v a number indicating the adapter location. Location 0 indicates that the adapter integrated into the system board is being reported. User response: 1. If possible, this noncritical node error should be serviced using the...

  • Page 174

    860 fibre channel network fabric is too big. Explanation: the number of fibre channel (fc) logins made to the node canister exceeds the allowed limit. The node canister continues to operate, but only communicates with the logins made before the limit was reached. The order in which other devices log...

  • Page 175

    1189 the node is held in the service state. Explanation: the cluster is reporting that a node is not operational because of critical node error 690. See the details of node error 690 for more information. User response: see node error 690. 1202 a solid-state drive is missing from the configuration. ...

  • Page 176

    158 storwize v7000: troubleshooting, recovery, and maintenance guide.

  • Page 177

    Appendix. Accessibility features for IBM Storwize V7000. Accessibility features help users who have a disability, such as restricted mobility or limited vision, to use information technology products successfully. Accessibility features. These are the major accessibility features in Storwize V7000: v ...

  • Page 179

    Notices. This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your a...

  • Page 180

    Ibm may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs ...

  • Page 181

    This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to ibm, for the purposes of developing, using, marketing or distrib...

  • Page 182

    This device complies with part 15 of the fcc rules. Operation is subject to the following two conditions: (1) this device might not cause harmful interference, and (2) this device must accept any interference received, including interference that might cause undesired operation. Industry canada comp...

  • Page 183

    Um dieses sicherzustellen, sind die geräte wie in den handbüchern beschrieben zu installieren und zu betreiben. Des weiteren dürfen auch nur von der ibm empfohlene kabel angeschlossen werden. Ibm übernimmt keine verantwortung für die einhaltung der schutzanforderungen, wenn das produkt ohne zustimmu...

  • Page 184

    Taiwan class a compliance statement taiwan contact information this topic contains the product service contact information for taiwan. Ibm taiwan product service contact information: ibm taiwan corporation 3f, no 7, song ren rd., taipei taiwan tel: 0800-016-888 japan vcci council class a statement t...

  • Page 185

    This explains the jeita statement for greater than 20 a per phase. Korean communications commission class a statement this explains the korean communications commission (kcc) statement. Russia electromagnetic interference class a statement this statement explains the russia electromagnetic interfere...

  • Page 188

    Printed in USA. GC27-2291-05.