R/Evolution 2000 Series Troubleshooting Manual

Summary of 2000 Series

  • Page 1

    2000 Series Troubleshooting Guide, P/N 83-00004287-12, Revision A, May 2008.

  • Page 2

    Copyright protected material 2002-2008. All rights reserved. R/Evolution and the R/Evolution logo are trademarks of Dot Hill Systems Corp. All other trademarks and registered trademarks are proprietary to their respective owners. The material in this document is for information only and is subject t...

  • Page 3: Contents

    Contents
    Preface (p. 9)
    1. System Architecture (p. 11)
    Architecture Overview ...

  • Page 4

    4 R/Evolution 2000 Series Troubleshooting Guide • May 2008
    3. Troubleshooting Using System LEDs (p. 21)
    LED Names and Locations (p. 21)
    Using LEDs to Check ...

  • Page 5

    Resetting Expander Error Counters (p. 55)
    Disabling or Enabling a PHY (p. 55)
    Disabling or Enabling PHY Isolation ...

  • Page 6

    Cooling Fan Sensors (p. 74)
    Temperature Sensors (p. 75)
    Power-and-Cooli...

  • Page 7

    Updating Disk Drive Firmware (p. 101)
    Removing and Replacing a Drive Module (p. 104)
    Replacing a Drive Module When the Virtual Disk Is Rebuilding ...

  • Page 8

    set protocols (p. 121)
    show debug-log ...

  • Page 9: Preface

    This guide describes how to diagnose and troubleshoot an R/Evolution™ storage system, and how to identify, remove, and replace field-replaceable units (FRUs). It also describes critical, warning, and informational events that can occur during system operation. This guide applies to the foll...

  • Page 10

    Typographic Conventions
    Typeface (1) | Meaning | Examples
    AaBbCc123 | Book title, new term, or emphasized word | See the release notes. A virtual disk (vdisk) can .... You must ....
    (1) The fonts used in your viewer might differ.
    Related Documentation ...

  • Page 11: System Architecture

    Chapter 1 System Architecture
    This chapter describes the R/Evolution™ storage system architecture. Before troubleshooting any system, it is important to understand the architecture, including each of the system components, how they relate to each other, and how data passes through the syst...

  • Page 12

    FRUs include:
    ■ Chassis-and-midplane. An enclosure’s 2U metal chassis and its midplane circuit board form a single FRU. All other FRUs connect and interact through the midplane.
    ■ Drive module. An enclosure can contain 12 SATA or SAS dr...

  • Page 13

    Enclosure ID Display
    The enclosure ID (EID) display provides a visual single-digit identifier for each enclosure in a storage system. The EID display is located on the left ear, as viewed from the front of the chassis. For a storage system that includes a controller ...

  • Page 14

    Drive Modules
    The drive module has a front bezel with a latch that is used to insert or remove the drive module. When any component of a drive module fails, the entire module is replaced. Each drive module is inserted into a drive slot (or ...

  • Page 15

    Controller Modules
    A controller module is a FRU that contains two connected circuit boards: a RAID I/O module and a host interface module (HIM). The RAID I/O module is a hot-pluggable board that mates with the enclosure midplane and provides all RAID controller funct...

  • Page 16

    Power Supply Unit
    Each 750-watt AC power supply unit (PSU) is auto-sensing and runs in a load-balanced configuration to ensure that the load is distributed evenly across both power supplies.
    Cooling Fans
    The cooling fans are integrated in...
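
To make the even split concrete, the sketch below computes each PSU's share of the enclosure load and how much of its 750-watt rating that share uses. It is an illustration only: the 600 W load is a hypothetical figure, and the single-PSU case assumes, beyond what this summary states, that one supply carries the whole load when its partner is out of service.

```python
PSU_RATING_W = 750  # per-PSU rating stated in the text


def per_psu_load(total_load_w: float, healthy_psus: int = 2) -> float:
    """Watts carried by each PSU when the load is split evenly."""
    if healthy_psus < 1:
        raise ValueError("at least one working PSU is required")
    return total_load_w / healthy_psus


def utilization(total_load_w: float, healthy_psus: int = 2) -> float:
    """Fraction of one PSU's rating in use under an even split."""
    return per_psu_load(total_load_w, healthy_psus) / PSU_RATING_W


if __name__ == "__main__":
    load = 600  # hypothetical enclosure draw in watts
    print(per_psu_load(load))                 # 300.0 W each with both PSUs sharing
    print(utilization(load))                  # 0.4
    # Assumption: with one PSU out of service, the remaining PSU carries everything.
    print(utilization(load, healthy_psus=1))  # 0.8
```

Dividing by the number of working supplies is simply what an even split means; actual draw depends on the drives and modules installed.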

  • Page 17

    Airflow is controlled and optimized over the power supply by using the power supply chassis as its air duct. This ensures that there are no dead air spaces in the power supply core and increases the airflow velocity (LFM, linear feet per minute) by controlling the cross secti...

  • Page 19: Fault Isolation Methodology

    Chapter 2 Fault Isolation Methodology
    The R/Evolution storage system provides many ways to isolate faults within the system. This chapter presents the basic methodology used to locate faults and the associated FRUs. The basic fault isolation steps are:
    ■ Gather fault information
    ■ Determine ...

  • Page 20

    Use RAIDar to verify any faults found while viewing the LEDs. RAIDar is also useful for determining where a fault is occurring if the LEDs cannot be viewed because of the system's location. RAIDar provides you with a visual re...

  • Page 21

    Chapter 3 Troubleshooting Using System LEDs
    The first step in troubleshooting your storage system is to check the status of its LEDs. System LEDs can help you identify the FRU that is experiencing a fault. This chapter includes the following topics:
    ■ “LED Names and Locations” on page 21
    ■ “...

  • Page 22

    Figure 3-2 2730 Controller Module LEDs
    Figure 3-3 2330 Controller Module LEDs
    Figure 3-4 2530 Controller Module LEDs

  • Page 23

    Figure 3-5 Expansion Module LEDs
    Figure 3-6 Power-and-Cooling Module LEDs
    Using LEDs to Check System Status
    Check the enclosure status LEDs periodically or after you have received an error notification. If a yellow LED is on, the enclosure has experienc...

  • Page 24

    Using Enclosure Status LEDs
    During normal operation, the FRU OK LED is green and the other enclosure status LEDs are off. If the FRU OK LED is off, the enclosure is not powered on. If the enclosure should be powered on, verify that its po...

  • Page 25

    Using Controller Module Host Port LEDs
    During normal operation, when a controller module host port is connected to a data host, the port’s Host Link Status LED and Host Link Activity LED are green. For FC, if the link speed is set to 2 Gbit/s, the host...

  • Page 26

    4. Move the SFP and cable to a port with a known good link status. This step isolates the problem to the external data path (SFP, host cable, and host-side devices) or to the controller module port. Is the Host Link Status LED on?
    ■ Yes – ...

  • Page 27

    Isolating a Host-Side Connection Fault on a SAS Storage System
    During normal operation, when a controller module host port is connected to a data host, the port’s Host Link Status LED and Host Link Activity LED are green. If there is I/O activity, the h...

  • Page 28

    Is the Host Link Status LED on?
    ■ Yes – You have isolated the fault to the HBA. Replace the HBA.
    ■ No – It is likely that the controller module needs to be replaced.
    6. Move the cable back to its original port. Is the Host Link Status LED o...

  • Page 29

    Isolating a Host-Side Connection Fault on an iSCSI Storage System
    This procedure requires scheduled downtime.
    Note – Do not perform more than one step at a time. Changing more than one variable at a time can complicate the troubleshooting process.
    1. Ha...

  • Page 30

    6. Replace the HBA/NIC with a known good HBA/NIC, or move the host-side cable to a known good HBA/NIC. Is the Host Link Status LED on?
    ■ Yes – You have isolated the fault to the HBA/NIC. Replace the HBA/NIC.
    ■ No – It is likely that the con...

  • Page 31

    4. Move the expansion cable to a port on the RAID enclosure with a known good link status. This step isolates the problem to the expansion cable or to the controller module’s expansion port. Is the Expansion Port Status LED on?
    ■ Yes – You now know that...

  • Page 32

    Using Controller Module Status LEDs
    During normal operation, the FRU OK LED is green, the Cache Status LED can be green or off, and the other controller module status LEDs are off. If the FRU OK LED is off, either:
    ■ The controller module i...

  • Page 33

    Using Power-and-Cooling Module LEDs
    During normal operation, the AC Power Good LED is green. If the AC Power Good LED is off, the module is not receiving adequate power. Verify that the power cord is properly connected and check the power source it is c...

  • Page 35: Troubleshooting Using Raidar

    Chapter 4 Troubleshooting Using RAIDar
    This chapter describes how to use RAIDar to troubleshoot your storage system and its FRUs. It also describes solutions to problems you might experience when using RAIDar. Topics covered in this chapter include:
    ■ “Problems Using RAIDar to Access a Stora...

  • Page 36

    Problems Using RAIDar to Access a Storage System
    The following table lists problems you might encounter when using RAIDar to access a storage system.
    Table 4-1 Problems Using RAIDar to Access a Storage System
    Problem | Solution
    You cannot acc...

  • Page 37

    Determining Storage System Status and Verifying Faults
    The System Summary page shows you the overall status of the storage system. To view storage system status:
    1. Select Monitor > Status > Status Summary.
    2. Check the status icon at the upper left corner o...

  • Page 38

    4. Look for red text in the panels. Red text indicates where the fault is occurring. In Figure 4-1, for example, the panels indicate a fault related to controller module B.
    5. To gather more details about the failure, click linked text n...

  • Page 39

    3. Click your browser’s Refresh button to ensure that current data is displayed.
    4. In the Host-Generated I/O & Bandwidth Totals for All Virtual Disks panel, verify that both indicators display 0 (no activity).
    Clearing Metadata From Leftover Disk Drives
    A d...

  • Page 40

    Isolating Faulty Disk Drives
    When a drive fault occurs, the basic troubleshooting actions are:
    ■ Identify the faulty drive
    ■ Review the drive error statistics
    ■ Review the event log
    ■ Replace the faulty drive
    ■ Reconstruct the associated virtua...

  • Page 41

    Reviewing Disk Drive Error Statistics
    The Disk Error Stats page provides specific drive fault information. It shows a graphical representation of the enclosures and disks installed in the system. The Disk Error Stats page can be used to gather drive informat...

  • Page 42

    Capturing Error Trend Data
    To capture error trend data for one or more drives:
    1. Perform the procedure in “Reviewing Disk Drive Error Statistics” on page 41.
    2. Create a baseline by clearing the current error statistics. To clear the stati...
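
The baseline-and-compare approach in these steps amounts to subtracting the error counts recorded at the baseline from the counts read later. The sketch below assumes you have transcribed per-drive counts into a dictionary keyed by a hypothetical "enclosure.slot" label; it does not read from RAIDar, and the function names are made up for illustration.

```python
from typing import Dict, List

# Hypothetical snapshot: drive label ("enclosure.slot") -> cumulative error count,
# e.g. transcribed from the Disk Error Stats page. Not a RAIDar export format.
Snapshot = Dict[str, int]


def error_trend(baseline: Snapshot, current: Snapshot) -> Dict[str, int]:
    """Errors each drive accumulated since the baseline was captured."""
    trend = {}
    for drive, count in current.items():
        delta = count - baseline.get(drive, 0)
        # A count lower than the baseline means the statistics were cleared
        # again, so the current value already is the number of new errors.
        trend[drive] = delta if delta >= 0 else count
    return trend


def drives_to_watch(baseline: Snapshot, current: Snapshot, threshold: int = 1) -> List[str]:
    """Drives whose error count grew by at least `threshold`, largest increase first."""
    trend = error_trend(baseline, current)
    flagged = [drive for drive, delta in trend.items() if delta >= threshold]
    return sorted(flagged, key=lambda drive: trend[drive], reverse=True)


if __name__ == "__main__":
    baseline = {"1.1": 0, "1.10": 0, "2.4": 0}  # all zero right after clearing
    later = {"1.1": 0, "1.10": 7, "2.4": 2}
    print(error_trend(baseline, later))      # {'1.1': 0, '1.10': 7, '2.4': 2}
    print(drives_to_watch(baseline, later))  # ['1.10', '2.4']
```

A drive whose count keeps rising between snapshots is the one to carry forward into the event log review.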

  • Page 43

    Reviewing the Event Logs
    If you have performed all the steps in “Identifying a Faulty Disk Drive” on page 40 and “Reviewing Disk Drive Error Statistics” on page 41, you have determined the following:
    ■ A disk drive has encountered a fault
    ■ The location of ...

  • Page 44

    ■ If two drives fail and only one properly sized spare is available, an event indicates that reconstruction is about to start. The reconstruct utility starts to run, using the spare, but its progress remains at 0% until a second properly si...

  • Page 45

    Isolating Data Path Faults
    When isolating data path faults, first determine whether the fault is in an internal data path or an external data path. This helps you target your troubleshooting efforts. Internal data paths include the following:
    ■ Controller to ...

  • Page 46

    Problem PHYs can cause a host or controller to continually rescan drives, which disrupts I/O or causes I/O errors. I/O errors can result in a failed drive, causing a virtual disk to become critical or causing complete loss of a virtual disk...

  • Page 47

    Checking PHY Status
    RAIDar's Expander Status page includes an Expander Controller PHY Detail panel. This panel shows the internal data paths for the storage controller, expander controller, disks, and expansion ports. Review this pag...

  • Page 48

    ■ Disabled – The PHY has been disabled by a diagnostic manage user or by the system.
    ■ Non-critical – The PHY is not coming to a ready state, or the PHY at the other end of the cable is disabled.
    ■ Not Used – The module is not installed.
    ■ ...

  • Page 49

    ■ CRC Error Count – In a sequence of SAS transfers (frames), the data is protected by a cyclic redundancy check (CRC) value. This error count specifies the number of times the computed CRC does not match the CRC stored in the frame, which indicates that the ...
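
The CRC Error Count compares a CRC recomputed on receipt with the CRC carried in the frame. The following minimal illustration uses Python's zlib.crc32 as a stand-in and a simplified (payload, CRC) frame; it shows the counting idea only and is not how SAS hardware computes or tallies CRC errors. The make_frame and crc_error_count helpers are hypothetical.

```python
import zlib
from typing import Iterable, Tuple

# Simplified frame: (payload bytes, CRC value stored with the frame).
Frame = Tuple[bytes, int]


def make_frame(payload: bytes) -> Frame:
    """Sender side: attach a CRC computed over the payload (zlib CRC-32 as a stand-in)."""
    return payload, zlib.crc32(payload)


def crc_error_count(frames: Iterable[Frame]) -> int:
    """Receiver side: count frames whose recomputed CRC differs from the stored CRC."""
    errors = 0
    for payload, stored_crc in frames:
        if zlib.crc32(payload) != stored_crc:
            errors += 1
    return errors


if __name__ == "__main__":
    good = make_frame(b"READ LBA 0x1000")
    payload, crc = make_frame(b"WRITE LBA 0x2000")
    # Flip one bit "in transit" to simulate a noisy link; the stored CRC is unchanged.
    corrupted = (bytes([payload[0] ^ 0x01]) + payload[1:], crc)
    print(crc_error_count([good, corrupted, good]))  # 1
```

CRC-32 always detects a single flipped bit, which is why the corrupted frame is counted exactly once.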

  • Page 50

    Reviewing the Event Log for Disabled PHYs
    If the fault isolation firmware disables a PHY, the event log shows a message reporting that the PHY was disabled. When a PHY has been disabled manually, the event log shows a similar message with a different reaso...

  • Page 51

    Isolating External Data Path Faults on an FC Storage System
    To troubleshoot external data path faults, perform the following steps:
    1. Select Monitor > Status > Advanced Settings > Host Port Status. This page provides a graphical representation of controller...

  • Page 52

    Isolating External Data Path Faults on an iSCSI Storage System
    To troubleshoot external data path faults, perform the following steps:
    1. Select Monitor > Status > Advanced Settings > Host Port Status. This page provides a graphical represe...

  • Page 53

    Isolating External Data Path Faults on a SAS Storage System
    To troubleshoot external data path faults, perform the following steps:
    1. Select Monitor > Status > Advanced Settings > Host Port Status. This page provides a graphical representation of controller...

  • Page 54

    Resetting a Host Channel on an FC Storage System
    For a Fibre Channel system using loop topology, you might need to reset a host port (channel) to fix a host connection or configuration problem. As an advanced manage user, you can use this c...

  • Page 55

    Resetting Expander Error Counters
    If PHYs have errors, you can reset the expander error counters and then observe error activity during normal operation. If a PHY continues to accumulate errors, you can disable it in the Expander Controller PHY Detail panel. To r...

  • Page 56

    Using Recovery Utilities
    This section describes recovering data from a virtual disk that is quarantined or offline (failed).
    Removing a Virtual Disk From Quarantine
    The quarantine icon indicates that a previously fault-tolerant virtual disk...

  • Page 57

    Caution – If the virtual disk does not have enough drives to continue operation when it is dequarantined, the virtual disk goes offline and its data cannot be recovered.
    To remove a virtual disk from quarantine:
    1. Select Manage > Utilities > Recovery Ut...

  • Page 58

    To enable and use Trust Vdisk:
    1. Select Manage > Utilities > Recovery Utilities > Enable Trust Vdisk.
    2. Select Enabled.
    3. Click Enable/Disable Trust Vdisk. The option remains enabled until you trust a virtual disk or restart the storage ...

  • Page 59

    Problems Scheduling Tasks
    If your task does not run at the times you specified, check the schedule specifications. It is possible to create conflicting specifications.
    ■ Start Time is the first time the task will run.
    ■ If you use the Between option, the sta...

  • Page 60

    Effect of Changing the Date and Time
    Resetting the storage system date or time might affect scheduled tasks. Because the schedule begins with the start time, no scheduled tasks run until the date and time are set. If the system is configured...

  • Page 61

    Selecting Individual Events for Notification
    As described in the reference guide, you can configure how and under what conditions the storage system alerts you when specific events occur. In addition to selecting event categories, as a diagnostic manage user...

  • Page 62

    ■ Informational status events – Represent device status changes related to the storage system’s status that usually do not require attention.
    ■ Informational configuration events – Represent device status changes related to the storage syst...

  • Page 63

    Correcting Enclosure IDs
    When installing a system with drive enclosures attached, the enclosure IDs might differ from the physical cabling order. This is because the controller might have been previously attached to some of the same enclosures, and it attempt...

  • Page 65

    Chapter 5 Troubleshooting Using Event Logs
    Event logs capture reported events from components throughout the storage system. Each event consists of an event code, the date and time the event occurred, which controller reported the event, and a description of what occurred. This chapter inclu...

  • Page 66

    Viewing the Event Log in RAIDar
    Key warning and error events that can appear in the event log during operation include the following:
    ■ Disk detected error
    ■ Disk channel error
    ■ Drive down
    ■ Virtual disk critical
    ■ Virtual disk offlin...

  • Page 67

    2. Click one of the buttons in the Select Event Table to View panel to see the corresponding events. The available buttons differ for dual-controller and single-controller systems. The page shows up to 200 events for a single controller or up to 400 events for b...

  • Page 68

    Viewing an Event Log Saved From RAIDar
    You can save event log data to a file on your network as described in “Saving Log Information to a File” on page 70. A saved log file has the following sections:
    ■ Contact information and ...

  • Page 69

    Reviewing Event Logs
    When reviewing events, do the following:
    1. Review the critical and warning events. Identify the primary events and any events that might have caused them. For example, an over-temperature event could cause a drive...
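
The first review step can be pictured as a filter followed by a look-back. The sketch below uses a hypothetical Event record whose fields mirror the event contents this guide lists (event code, date and time, reporting controller, description) plus an assumed severity field. The event codes, severities, and the 10-minute look-back window are arbitrary choices for illustration, not RAIDar values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Event:
    # Hypothetical record; code values and severity strings are illustrative only.
    code: int
    time: datetime
    controller: str
    severity: str  # assumed values: "critical", "warning", "informational"
    description: str


def critical_and_warning(events: List[Event]) -> List[Event]:
    """Critical and warning events, oldest first, so earlier causes come before effects."""
    selected = [e for e in events if e.severity in ("critical", "warning")]
    return sorted(selected, key=lambda e: e.time)


def possible_causes(events: List[Event], primary: Event, window_s: int = 600) -> List[Event]:
    """Critical/warning events logged within `window_s` seconds before `primary`."""
    start = primary.time - timedelta(seconds=window_s)
    return [e for e in critical_and_warning(events) if start <= e.time < primary.time]


if __name__ == "__main__":
    t0 = datetime(2008, 5, 1, 10, 0, 0)
    log = [
        Event(1, t0 + timedelta(minutes=5), "A", "warning", "Disk detected error"),
        Event(2, t0, "A", "warning", "Over temperature"),
        Event(3, t0 + timedelta(minutes=1), "B", "informational", "Status change"),
    ]
    for e in possible_causes(log, log[0]):
        print(e.description)  # Over temperature
```

Sorting oldest first matters because a cause is logged before its effects; the informational event is dropped by the severity filter.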

  • Page 70

    Saving Log Information to a File
    You can save the following types of log information to a file:
    ■ Device status summary, which includes basic status and configuration information for the system.
    ■ Event logs from both controllers when in ac...

  • Page 71

    8. If prompted to specify the file location and name, use a .logs extension. The default file name is store.logs. If you intend to capture multiple event logs, name each file so that it can be identified later.
    9. If y...
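
One way to keep multiple captures distinct is to build each name from the system name, a timestamp, and an optional note, always ending in the .logs extension. The pattern and the log_file_name helper below are a hypothetical suggestion, not something RAIDar generates.

```python
from datetime import datetime


def log_file_name(system: str, when: datetime, note: str = "") -> str:
    """Build a distinct .logs file name such as 'store-20080501-143000.logs'.

    The system-timestamp-note pattern is only a suggestion; the guide asks
    only for a .logs extension and names that identify each capture.
    """
    parts = [system, when.strftime("%Y%m%d-%H%M%S")]
    if note:
        # Keep the name filesystem-friendly: spaces become underscores.
        parts.append(note.replace(" ", "_"))
    return "-".join(parts) + ".logs"


if __name__ == "__main__":
    print(log_file_name("store", datetime(2008, 5, 1, 14, 30, 0)))
    # store-20080501-143000.logs
    print(log_file_name("store", datetime(2008, 5, 1, 15, 0, 0), "after drive swap"))
    # store-20080501-150000-after_drive_swap.logs
```

A sortable timestamp keeps captures in chronological order in a directory listing.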

  • Page 72

    ■ No Debug Tracing – Collects no debug data.
    ■ Custom Debug Tracing – Shows that specific events are selected for inclusion in the log. This is the default. If no events are selected, this option is not displayed.
    3. Click Change Debug Logg...

  • Page 73

    Chapter 6 Voltage and Temperature Warnings
    The storage system provides voltage and temperature warnings, which generally reflect input or environmental conditions. Voltage warnings can occur if the input voltage is too low or if a FRU is receiving too little or too much power from the power-and-...

  • Page 74

    Sensor Locations
    The storage system monitors conditions at different points within each enclosure to alert you to problems. Power, cooling fan, temperature, and voltage sensors are located at key points in the enclosure. In each controll...

  • Page 75

    ...the fan speed remains under the 4000 RPM threshold, the internal enclosure temperature may continue to rise. Replace the power-and-cooling module reporting the fault. During a shutdown, the cooling fans do not shut off. This allows the enclosure to conti...
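
The 4000 RPM threshold lends itself to a simple check. In this sketch the fan labels, the readings, and the fan-to-module mapping are all hypothetical; only the threshold comes from the text. A reading of exactly 4000 RPM is treated as not under the threshold.

```python
from typing import Dict, List

FAN_RPM_THRESHOLD = 4000  # threshold quoted in the text


def slow_fans(readings: Dict[str, int]) -> List[str]:
    """Fans reporting a speed under the threshold, sorted by name."""
    return sorted(name for name, rpm in readings.items() if rpm < FAN_RPM_THRESHOLD)


def modules_to_replace(readings: Dict[str, int], fan_to_module: Dict[str, str]) -> List[str]:
    """Power-and-cooling modules that house at least one slow fan."""
    return sorted({fan_to_module[name] for name in slow_fans(readings)})


if __name__ == "__main__":
    # Hypothetical readings and labels, not actual RAIDar sensor names.
    readings = {"fan 0": 5200, "fan 1": 3900, "fan 2": 5100, "fan 3": 5150}
    housing = {"fan 0": "PCM 0", "fan 1": "PCM 0", "fan 2": "PCM 1", "fan 3": "PCM 1"}
    print(slow_fans(readings))                    # ['fan 1']
    print(modules_to_replace(readings, housing))  # ['PCM 0']
```

Mapping fans to their module reflects the text's instruction to replace the power-and-cooling module reporting the fault, since the fans are integrated into that module.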

  • Page 76

    When a power supply sensor goes out of range, the Fault/ID LED illuminates amber and an event is logged to the event log. To view the controller enclosure’s temperature status in RAIDar, as an advanced manage user:
    ● Select Monitor > Statu...

  • Page 77

    Power-and-Cooling Module Voltage Sensors
    Power supply voltage sensors ensure that an enclosure’s power supply voltage is within normal ranges. There are three voltage sensors per power-and-cooling module.
    Table 6-5 Voltage Sensor Descriptions
    Sensor | Even...

  • Page 79

    Chapter 7 Troubleshooting and Replacing FRUs
    This chapter describes how to troubleshoot and replace field-replaceable units. A field-replaceable unit (FRU) is a system component that is designed to be replaced onsite. This chapter contains the following sections:
    ■ “Static Electricity Precau...

  • Page 80

    Static Electricity Precautions
    To prevent damaging a FRU, make sure you follow these static electricity precautions:
    ■ Remove plastic, vinyl, and foam from the work area.
    ■ Wear an antistatic wrist strap, attached to a ground.
    ■ Before hand...

  • Page 81

    Table 7-1 lists the faults you might encounter with a controller module or expansion module.
    Table 7-1 Controller Module or Expansion Module Faults
    Problem | Solution
    FRU OK LED is off | • Verify that the controller module is properly seated in the slot an...

  • Page 82

    Removing and Replacing a Controller or Expansion Module
    In a dual-controller configuration, controller and expansion modules are hot-swappable, which means you can replace one module without halting I/O to the storage system or powering it...

  • Page 83

    The configuration file does not include configuration data for virtual disks and volumes. You do not need to save this data before replacing a controller or expansion module because the data is saved as metadata in the first sectors of associated disk ...

  • Page 84

    Shutting Down a Controller Module
    Shut down a controller module before you remove it from an enclosure, or before you power off its enclosure for maintenance, repair, or a move. Shutting down a controller module halts I/O to that module, en...

  • Page 85

    Removing a Controller Module or Expansion Module
    As long as the other module in the enclosure remains online and active, you can remove a module without powering down the enclosure; however, you must shut down a controller module as des...

  • Page 86

    a. Select Manage > General Config > Enclosure Management.
    b. Click Illuminate Locator LED.
    5. For the controller module, locate the enclosure whose Unit Locator LED (front) is blinking, and within it, the module whose OK to Remove LED is bl...

  • Page 87

    9. Pull the module straight out of the enclosure.
    Replacing a Controller Module or Expansion Module
    You can install a controller module or expansion module into an enclosure that is powered on.
    Caution – When replacing a controller module, ensure that ...

  • Page 88

    To install a controller module or an expansion module:
    1. Follow all static electricity precautions as described in “Static Electricity Precautions” on page 80.
    2. Loosen the thumbscrews and press the latches downward.
    3. Slide the controller ...

  • Page 89

    Fault/Service Required
    If the yellow Fault/Service Required LED is illuminated, the module has not gone online and has likely failed its self-test. Try to put the module online (see “Shutting Down a Controller Module” on page 84) or check for errors that w...

  • Page 90

    ...reorder the enclosure IDs. To minimize issues with enclosure IDs, always move a complete set of expansion modules and reconnect them in the same order in which they were connected to the original controller module. To rescan, as an advanced manag...

  • Page 91

    Disabling Partner Firmware Upgrade
    The Partner Firmware Upgrade option is enabled by default in RAIDar. Disable this function only if a service technician tells you to do so.
    1. Select Manage > General Config > System Configuration.
    2. For Partner Firmwa...

  • Page 92

    5. Review the current and new software versions, and then click Proceed With Code Update. A Code Load Progress window is displayed to show the progress of the update, which can take several minutes to complete. Do not power off the storage ...

  • Page 93

    Removing and Replacing an SFP Module
    This section provides steps to remove and replace an SFP module.
    Caution – Mishandling fiber-optic cables can degrade performance. Do not twist, fold, pinch, or step on fiber-optic cables. Do not bend the fiber-opti...

  • Page 94

    2. The SFP is held in place by a small wire bail actuator; flip the actuator up and gently pull on it to remove the SFP from the controller.
    Installing an SFP Module
    To install an SFP module, perform the following steps:
    1. If the SFP has a...

  • Page 95

    Identifying Cable Faults
    When identifying cable faults, remember that the controller has two sides: the input/output to the host and the input/output to the drive enclosures. It is also important to remember that identifying a cable fa...

  • Page 96

    ■ If at least 15 seconds elapse between disconnecting a cable and connecting it to a different port in the same enclosure or in a different enclosure, no further action is required.
    Identifying Drive Module Faults
    When identifying faults i...

  • Page 97

    When a disk detects an error, it reports the error to the controller by returning a SCSI sense key and, if appropriate, additional information. This information is recorded in the RAIDar event log. Table 7-2 lists some of the most common SCSI sense key descri...

  • Page 98

    Example
    In an example error reported in the event log, the drive in slot 10 of enclosure 1 reported a sense key error of 2 and an ASC/ASCQ of 04/11.
    Disk Drive Errors in General
    Media errors (sense key 3), recovery errors (sense...
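
Decoding the example by hand means looking up sense key 2 and reading the ASC/ASCQ pair as hexadecimal bytes. The sketch below does this with the sense key names defined in the SCSI Primary Commands (SPC) standard. It assumes the ASC/ASCQ pair is printed as two hex bytes separated by a slash, and it leaves ASC/ASCQ meanings to Table 7-2 or the SPC tables; decode_sense is a hypothetical helper.

```python
from typing import Dict, Tuple

# Sense key values defined by the SCSI Primary Commands (SPC) standard.
SENSE_KEYS: Dict[int, str] = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
}


def decode_sense(sense_key: int, asc_ascq: str) -> Tuple[str, int, int]:
    """Return (sense key name, ASC, ASCQ) for values such as 2 and '04/11'.

    Assumes the ASC/ASCQ pair is written as two hexadecimal bytes separated by '/'.
    """
    asc_text, ascq_text = asc_ascq.split("/")
    name = SENSE_KEYS.get(sense_key, "Reserved or obsolete")
    return name, int(asc_text, 16), int(ascq_text, 16)


if __name__ == "__main__":
    print(decode_sense(2, "04/11"))  # ('Not Ready', 4, 17)
```

Sense keys 3 (Medium Error) and 1 (Recovered Error) in the table match the media and recovery errors the text goes on to describe.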

  • Page 99

    Disk Channel Errors
    Disk channel errors are similar to disk-detected errors, except that they are detected by the controllers instead of the disk drive. Some disk channel errors are displayed as text strings; others are displayed as hexadecimal codes. If th...

  • Page 100

    Identifying Faulty Drive Modules
    To identify faulty drive modules, perform the following steps:
    1. Does the fault involve a single drive?
    ■ If yes, perform Step 2 through Step 4.
    ■ If an entire enclosure of disk drives is faulty, che...

  • Page 101

    Updating Disk Drive Firmware
    You can update disk drive firmware by loading a firmware update file obtained from the disk drive manufacturer or your reseller.
    Note – Updating the firmware of disk drives in a virtual disk risks the loss of data and caus...

  • Page 102

    6. Stop host I/O by either disconnecting data cables from the storage system controllers or powering down all hosts connected to the system.
    To update disk drive firmware:
    1. Select Manage > Update Software > Disk Drive Firmware > Update F...

  • Page 103

    8. To start the firmware update, click Start Firmware Update. To cancel the firmware update, click Cancel. The file is transferred to the controller, where it is temporarily stored before being downloaded to the disk drives. Once the firmware update process ...

  • Page 104

    Removing and replacing a drive module
    A drive module consists of a disk drive in a sled. Drive modules are hot-swappable, which means they can be replaced without halting I/O to the storage system or powering it off.
    Caution – To prevent a...

  • Page 105

    ■ Replace the defective drive and make the new drive a global spare while the rebuilding process continues. This procedure installs the new drive and assigns it as a global spare so that an automatic rebuild can occur if a drive module fails on anothe...
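
A global spare is not tied to one virtual disk, so any virtual disk that loses a member can draw on it. A minimal sketch of that idea, assuming a hypothetical `SparePool` class with first-in, first-out spare selection; the controller's actual selection policy is not described in this excerpt, and the drive IDs used in the usage note are made up.

```python
from collections import deque
from typing import Optional


class SparePool:
    """Hypothetical model of global spares shared by all virtual disks."""

    def __init__(self) -> None:
        self._spares: deque = deque()
        self.rebuilds: list = []  # (vdisk, failed_drive, spare) tuples, for illustration

    def add_global_spare(self, drive_id: str) -> None:
        self._spares.append(drive_id)

    def on_drive_failure(self, vdisk: str, failed_drive: str) -> Optional[str]:
        """Assign the oldest global spare to the vdisk that lost a member, if any remain."""
        if not self._spares:
            return None  # no spare available; the vdisk waits for a replacement drive
        spare = self._spares.popleft()
        self.rebuilds.append((vdisk, failed_drive, spare))
        return spare
```

For example, after `pool.add_global_spare("1.5")`, a failure of drive "1.3" in vdisk "vd02" returns "1.5", and a second failure elsewhere returns `None` until another spare is added.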

  • Page 106

    4. Replace the failed module by following the instructions in “Removing a drive module” on page 106.
    You can also use the CLI show enclosure-status command. If the drive status is “absent,” the drive might have failed, or it has been remove...

  • Page 107

    4. Wait 20 seconds for the internal disks to stop spinning.
    5. Pull the drive module out of the enclosure.
    Installing a drive module
    To install a drive module, perform the following steps:
    1. Follow all static electricity precautions as described ...

  • Page 108

    Table 7-5 Disk drive status
    Status           Description                                                         Action
    Online           The vdisk is online and does not have fault-tolerant attributes.   None
    Fault tolerant   The vdisk is online and fault tolerant.                             None
    Offline          The vdisk is offline either because of initializatio...

  • Page 109

    6. After replacing a failed drive, save the configuration settings as described in “Saving configuration settings” on page 82. The saved configuration includes configuration information for all the drive modules in the virtual disk. When you save the ...

  • Page 110

    c. Data hosts last (if they were powered down for maintenance)
    2. In RAIDar, select Monitor > Status > Vdisk Status to display the Virtual Disk Overview panel. This panel displays an icon for each virtual disk with information...

  • Page 111

    failover causes a virtual disk to become critical when one of its drives “disappears.”
    • In general, controller failover is not supported if a disk drive is in a drive enclosure that is connected to the controller enclosure with only one cable. This i...
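
The single-cable restriction can be checked ahead of time from a cabling inventory. A minimal sketch, assuming you have recorded (by hand or with your own tooling) how many cables connect each drive enclosure to the controller enclosure and which enclosures hold each virtual disk's members; the function name and input shapes are hypothetical.

```python
def failover_risks(cables_per_enclosure: dict[int, int],
                   vdisk_members: dict[str, list[int]]) -> dict[str, list[int]]:
    """Map each vdisk to the single-cabled drive enclosures that hold its members.

    cables_per_enclosure: drive enclosure ID -> cables to the controller enclosure.
    The controller enclosure itself is not listed, so drives in it are never flagged.
    """
    single_cabled = {enc for enc, count in cables_per_enclosure.items() if count < 2}
    risks: dict[str, list[int]] = {}
    for vdisk, enclosures in vdisk_members.items():
        hits = sorted(set(enclosures) & single_cabled)
        if hits:
            risks[vdisk] = hits
    return risks
```

With drive enclosure 2 on one cable and enclosure 3 on two, `failover_risks({2: 1, 3: 2}, {"vd01": [1, 2], "vd02": [1, 3]})` flags only `vd01`, via enclosure 2.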

  • Page 112

    Clearing metadata from a disk drive
    All of the member disk drives in a virtual disk contain metadata in their first sectors. The storage system uses the metadata to identify virtual disk members after enclosures are restarted or replaced. Clea...

  • Page 113

    Caution – Because removing the power-and-cooling module significantly disrupts the enclosure’s airflow, do not remove the power-and-cooling module until you have the replacement module on hand.
    Table 7-7 lists possible power-and-cooling module faults. Table 7...

  • Page 114

    Removing and replacing a power-and-cooling module
    A single power-and-cooling module is sufficient to maintain operation of the enclosure. It is not necessary to halt operations and completely power off the enclosure when replacing only one...

  • Page 115

    Installing a power-and-cooling module
    To install a power-and-cooling module, perform the following steps:
    1. Slide the module into the slot as far as it will go.
    2. Press the latch upward to engage the module; turn the thumbscrews finger-tight.
    3. Rec...

  • Page 116

    Replacing an enclosure
    The enclosure consists of the metal housing and the midplane that connects the controller/expansion modules, drive modules, and power-and-cooling modules. This FRU replaces an enclosure that has been damaged o...

  • Page 117

    Appendix A Troubleshooting Using the CLI
    This appendix briefly describes CLI commands that are useful for troubleshooting storage system problems. For detailed information about command syntax and using the CLI, see the CLI Reference Guide. Topics covered in this appendix include:
      ■ “View...

  • Page 118

    Viewing command help
    To view brief descriptions of all commands that are available to the user level you logged in as, type:
    To view help for a specific command, type either:
    To view information about the syntax to use for specifying disk ...

  • Page 119

    Appendix A Troubleshooting Using the CLI 119
    ping
    Tests communication with a remote host. The remote host is specified by IP address. ping sends ICMP echo request packets and waits for replies. For details about using ping, see the CLI Reference Guide.
    rescan
    When installing a system with drive en...
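
ping sends ICMP echo request messages (type 8) and expects echo replies (type 0), as specified in RFC 792. To illustrate what goes over the wire, the sketch below builds an echo request packet with the RFC 1071 Internet checksum. It only constructs bytes and sends nothing (sending raw ICMP usually requires elevated privileges), and it is not a description of how the storage system's ping is implemented.

```python
import struct

ICMP_ECHO_REQUEST = 8  # type 8 = echo request; type 0 = echo reply (RFC 792)


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    # Header: type (1 byte), code (1), checksum (2), identifier (2), sequence (2).
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload
```

A receiver can validate the packet by recomputing the checksum over the whole message, checksum field included; a correct packet yields 0.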

  • Page 120

    Note – If the storage system is connected to a Microsoft Windows host, the following event is recorded in the Windows event log: Initiator failed to connect to the target.
    For details about using restart, see the CLI Reference Guide.
    Rest...

  • Page 121

    set expander-fault-isolation
    When fault isolation is enabled, the expander controller isolates PHYs that fail to meet certain criteria. When fault isolation is disabled, the errors are noted in the logs but the PHYs are not isolated. For details about...
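
The difference between the two modes can be modelled in a few lines. The manual does not state the actual isolation criteria, so the sketch below assumes a hypothetical per-PHY error-count threshold; only the enabled/disabled behavior (isolate versus log only) comes from the text above, and `ExpanderModel` is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class ExpanderModel:
    """Hypothetical model of the two fault-isolation modes."""
    fault_isolation: bool
    error_threshold: int = 10  # hypothetical criterion; the real criteria are not documented here
    isolated: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def report_errors(self, phy: int, error_count: int) -> None:
        # Errors are logged in both modes.
        self.log.append(f"PHY {phy}: {error_count} errors")
        # Only with isolation enabled is a PHY that fails the criterion taken out of service.
        if self.fault_isolation and error_count >= self.error_threshold:
            self.isolated.add(phy)
```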

  • Page 122

    show debug-log
    Note – This command should only be used by service technicians or with the advice of a service technician.
    Shows the debug logs for the storage controller (SC), the management controller (MC), the semaphore trace, task logs...

  • Page 123

    show events
    Shows events for an enclosure, including events from each management controller and each storage controller. A separate set of event numbers is maintained for each controller module. Each event number is prefixed with a letter identifying the ...
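
Because each controller module keeps its own sequence, event numbers are only comparable within the same prefix. The sketch below groups event numbers by prefix, assuming a hypothetical ID format of one uppercase letter followed by decimal digits (for example, A12); the excerpt does not show the real format, and `group_events` is an illustrative name.

```python
import re
from collections import defaultdict

EVENT_ID = re.compile(r"^([A-Z])(\d+)$")  # hypothetical format: one letter + decimal number


def group_events(event_ids: list[str]) -> dict[str, list[int]]:
    """Group event numbers by prefix letter and sort each group numerically."""
    groups: dict[str, list[int]] = defaultdict(list)
    for eid in event_ids:
        match = EVENT_ID.match(eid)
        if not match:
            raise ValueError(f"unrecognized event ID: {eid!r}")
        groups[match.group(1)].append(int(match.group(2)))
    # Never merge the groups: each prefix is an independent sequence.
    return {prefix: sorted(nums) for prefix, nums in sorted(groups.items())}
```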

  • Page 124

    show redundancy-mode
    Shows the redundancy status of the system. For details about using show redundancy-mode, see the CLI Reference Guide.
    trust
    Enables an offline virtual disk to be brought online for emergency data collection only. It m...

  • Page 125

    Problems scheduling tasks
    There are two parts to scheduling a task: you must create the task and then create the schedule that runs it.
    Create the task
    There are three tasks you can create: takesnapshot, resetsnapshot, and volumecopy. Perform the ope...
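
The two-part dependency (task first, then schedule) can be sketched as a small model. The `Scheduler`, `Task`, and `Schedule` names are hypothetical and do not reflect CLI syntax; only the three task types and the ordering rule come from the text above.

```python
from dataclasses import dataclass

TASK_TYPES = {"takesnapshot", "resetsnapshot", "volumecopy"}


@dataclass(frozen=True)
class Task:
    name: str
    task_type: str


@dataclass(frozen=True)
class Schedule:
    name: str
    task_name: str


class Scheduler:
    """Hypothetical model: a schedule may only reference a task that already exists."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}
        self.schedules: dict[str, Schedule] = {}

    def create_task(self, name: str, task_type: str) -> Task:
        if task_type.lower() not in TASK_TYPES:
            raise ValueError(f"unknown task type {task_type!r}")
        task = Task(name, task_type.lower())
        self.tasks[name] = task
        return task

    def create_schedule(self, name: str, task_name: str) -> Schedule:
        # Part two depends on part one: the task must be created first.
        if task_name not in self.tasks:
            raise LookupError(f"create task {task_name!r} before scheduling it")
        schedule = Schedule(name, task_name)
        self.schedules[name] = schedule
        return schedule
```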

  • Page 126

    Errors associated with scheduling tasks
    The following table describes error messages associated with scheduling tasks.
    Missing parameter data error
    If you try to use a command that has a name parameter and the CLI displays “error: the comm...

  • Page 127: Index

    Index
    A  air management module, installing, 110; architecture, system overview, 11
    B  bad block list size, displaying, 42; reassignments, displaying, 42; boot handshake, 89
    C  cables identifying faults drive enclosure side, 95; host side, 95; cache clearing, 118; size, 82; CLI help, view command ...

  • Page 128

    no response count, 41; non-media errors, 42; reviewing error statistics, 41; capturing trend data, 42; spin-up retries, 41; understanding errors, 96; updating firmware, 101; disk drives, scan for changes, 63, 90; disk error stats, 41; driv...

  • Page 129

    Index 129
    I  I/O checking status, 38; displaying timeout count, 41; icons, system status, 37; informational events, 65; enabling, 65; selecting to monitor, 61; installing air management modules, 110; controller modules, 87; drive modules, 107; expansion modules, 87; power-and-cooling modules, 115; in...

  • Page 130

    saving log information, 70; scheduling tasks, 59; SCSI Enclosure Services. See SES; sensors cooling fan, 74; locating, 74; power supply, 74; temperature, 75; voltage, 77; SES displaying firmware version, 47; setting the time, 89; shutting d...