R/Evolution 2000 Series Troubleshooting Manual - page 80
R/Evolution 2000 Series Troubleshooting Guide • May 2008
Static Electricity Precautions
To prevent damaging a FRU, make sure you follow these static electricity
precautions:
■ Remove plastic, vinyl, and foam from the work area.
■ Wear an antistatic wrist strap, attached to a ground.
■ Before handling a FRU, discharge any static electricity by touching a ground surface.
■ Do not remove a FRU from its antistatic protective bag until you are ready to install it.
■ When removing a FRU from a controller enclosure, immediately place the FRU in an antistatic bag and in antistatic packaging.
■ Handle a FRU only by its edges and avoid touching the circuitry.
■ Do not slide a FRU over any surface.
■ Limit body movement (which builds up static electricity) during FRU installation.
Identifying Controller or Expansion Module Faults
If a subcomponent of a controller or expansion module fails, the entire FRU
must be replaced. Each controller and expansion module has LEDs that can be
used to identify a fault. You can also use RAIDar to locate and isolate
controller and expansion module faults. (See
“Troubleshooting Using RAIDar” on page 35.)
Note – When troubleshooting, review the reported events carefully. The
controller module is often the FRU that reports a fault, but it is not always
the FRU where the fault is occurring.
Summary of 2000 Series
Page 1
2000 Series Troubleshooting Guide, P/N 83-00004287-12, Revision A, May 2008.
Page 2
Copyright protected material 2002-2008. All rights reserved. R/Evolution and the R/Evolution logo are trademarks of Dot Hill Systems Corp. All other trademarks and registered trademarks are proprietary to their respective owners. The material in this document is for information only and is subject t...
Page 3: Contents
Contents: Preface, 9; 1. System Architecture, 11; Architecture Overview ...
Page 4
3. Troubleshooting Using System LEDs, 21; LED Names and Locations, 21; Using LEDs to Check ...
Page 5
Resetting Expander Error Counters, 55; Disabling or Enabling a PHY, 55; Disabling or Enabling PHY Isolation ...
Page 6
Cooling Fan Sensors, 74; Temperature Sensors, 75; Power-and-Cooli...
Page 7
Updating Disk Drive Firmware, 101; Removing and Replacing a Drive Module, 104; Replacing a Drive Module When the Virtual Disk Is Rebuilding ...
Page 8
set protocols, 121; show debug-log ...
Page 9: Preface
Preface. This guide describes how to diagnose and troubleshoot an R/Evolution™ storage system, and how to identify, remove, and replace field-replaceable units (FRUs). It also describes critical, warning, and informational events that can occur during system operation. This guide applies to the foll...
Page 10
10 r/evolution 2000 series troubleshooting guide • may 2008 typographic conventions related documentation typeface 1 1 the fonts used in your viewer might differ. Meaning examples aabbcc123 book title, new term, or emphasized word see the release notes. A virtual disk (vdisk) can .... You must .... ...
Page 11: System Architecture
Chapter 1: System Architecture. This chapter describes the R/Evolution™ storage system architecture. Before troubleshooting any system, it is important to understand the architecture, including each of the system components, how they relate to each other, and how data passes through the syst...
Page 12
12 r/evolution 2000 series troubleshooting guide • may 2008 frus include: ■ chassis-and-midplane. An enclosure’s 2u metal chassis and its midplane circuit board comprise a single fru. All other frus connect and interact through the midplane. ■ drive module. An enclosure can contain 12 sata or sas dr...
Page 13
Chapter 1 system architecture 13 enclosure id display the enclosure id (eid) display provides a visual single-digit identifier for each enclosure in a storage system. The eid display is located on the left ear, as viewed from the front of the chassis. For a storage system that includes a controller ...
Page 14
14 r/evolution 2000 series troubleshooting guide • may 2008 drive modules the drive module has a front bezel with a latch that is used to insert or remove the drive module. When any component of a drive module fails, the entire module is replaced. Each drive module is inserted into a drive slot (or ...
Page 15
Chapter 1 system architecture 15 controller modules a controller module is a fru that contains two connected circuit boards: a raid i/o module and a host interface module (him). The raid i/o module is a hot-pluggable board that mates with the enclosure midplane and provides all raid controller funct...
Page 16
16 r/evolution 2000 series troubleshooting guide • may 2008 power supply unit each 750-watt, ac power supply unit (psu) is auto-sensing and runs in a load- balanced configuration to ensure that the load is distributed evenly across both power supplies. Cooling fans the cooling fans are integrated in...
Page 17
Chapter 1 system architecture 17 airflow is controlled and optimized over the power supply by using the power supply chassis as the air-duct for the power supply, ensuring that there are no dead air spaces in the power supply core and increasing the velocity flow (lfm) by controlling the cross secti...
Page 19: Fault Isolation Methodology
Chapter 2: Fault Isolation Methodology. The R/Evolution storage system provides many ways to isolate faults within the system. This chapter presents the basic methodology used to locate faults and the associated FRUs. The basic fault isolation steps are: ■ Gather fault information ■ Determine ...
Page 20
20 r/evolution 2000 series troubleshooting guide • may 2008 use raidar to verify any faults found while viewing the leds. Raidar is also a good tool to use in determining where the fault is occurring if the leds cannot be viewed due to the location of the system. Raidar provides you with a visual re...
Page 21
Chapter 3: Troubleshooting Using System LEDs. The first step in troubleshooting your storage system is to check the status of its LEDs. System LEDs can help you identify the FRU that is experiencing a fault. This chapter includes the following topics: ■ “LED Names and Locations” on page 21 ■ “...
Page 22
22 r/evolution 2000 series troubleshooting guide • may 2008 figure 3-2 2730 controller module leds figure 3-3 2330 controller module leds figure 3-4 2530 controller module leds 10/100 base-t status activity dirty clean cache cli mui link speed link speed fc port 0 fc port 1 host link status host lin...
Page 23
Chapter 3 troubleshooting using system leds 23 figure 3-5 expansion module leds figure 3-6 power-and-cooling module leds using leds to check system status check the enclosure status leds periodically or after you have received an error notification. If a yellow led is on, the enclosure has experienc...
Page 24
24 r/evolution 2000 series troubleshooting guide • may 2008 using enclosure status leds during normal operation, the fru ok led is green and the other enclosure- status leds are off. If the fru ok led is off , the enclosure is not powered on. If the enclosure should be powered on, verify that its po...
Page 25
Chapter 3 troubleshooting using system leds 25 using controller module host port leds during normal operation, when a controller module host port is connected to a data host, the port’s host link status led and host link activity led are green. For fc, if the link speed is set to 2 gbit/sec the host...
Page 26
26 r/evolution 2000 series troubleshooting guide • may 2008 4. Move the sfp and cable to a port with a known good link status. This step isolates the problem to the external data path (sfp, host cable, and host- side devices) or to the controller module port. Is the host link status led on? ■ yes – ...
Page 27
Chapter 3 troubleshooting using system leds 27 isolating a host-side connection fault on a sas storage system during normal operation, when a controller module host port is connected to a data host, the port’s host link status led and host link activity led are green. If there is i/o activity, the h...
Page 28
28 r/evolution 2000 series troubleshooting guide • may 2008 is the host link status led on? ■ yes – you have isolated the fault to the hba. Replace the hba. ■ no – it is likely that the controller module needs to be replaced. 6. Move the cable back to its original port. Is the host link status led o...
Page 29
Chapter 3 troubleshooting using system leds 29 isolating a host-side connection fault on an iscsi storage system this procedure requires scheduled downtime. Note – do not perform more than one step at a time. Changing more than one variable at a time can complicate the troubleshooting process. 1. Ha...
Page 30
30 r/evolution 2000 series troubleshooting guide • may 2008 6. Replace the hba/nic with a known good hba/nic, or move the host side cable to a known good hba/nic. Is the host link status led on? ■ yes – you have isolated the fault to the hba/nic. Replace the hba/nic. ■ no – it is likely that the con...
Page 31
Chapter 3 troubleshooting using system leds 31 4. Move the expansion cable to a port on the raid enclosure with a known good link status. This step isolates the problem to the expansion cable or to the controller module’s expansion port. Is the expansion port status led on? ■ yes – you now know that...
Page 32
32 r/evolution 2000 series troubleshooting guide • may 2008 using controller module status leds during normal operation, the fru ok led is green, the cache status led can be green or off, and the other controller module status leds are off. If the fru ok led is off, either: ■ the controller module i...
Page 33
Chapter 3 troubleshooting using system leds 33 using power-and-cooling module leds during normal operation, the ac power good led is green. If the ac power good led is off, the module is not receiving adequate power. Verify that the power cord is properly connected and check the power source it is c...
Page 35: Troubleshooting Using Raidar
Chapter 4: Troubleshooting Using RAIDar. This chapter describes how to use RAIDar to troubleshoot your storage system and its FRUs. It also describes solutions to problems you might experience when using RAIDar. Topics covered in this chapter include: ■ “Problems Using RAIDar to Access a Stora...
Page 36
36 r/evolution 2000 series troubleshooting guide • may 2008 problems using raidar to access a storage system the following table lists problems you might encounter when using raidar to access a storage system. Table 4-1 problems using raidar to access a storage system problem solution you cannot acc...
Page 37
Chapter 4 troubleshooting using raidar 37 determining storage system status and verifying faults the system summary page shows you the overall status of the storage system. To view storage system status: 1. Select monitor > status > status summary. 2. Check the status icon at the upper left corner o...
Page 38
38 r/evolution 2000 series troubleshooting guide • may 2008 4. Look for red text in the panels. Red text indicates where the fault is occurring. In figure 4-1 for example, the panels indicate a fault related to controller module b. 5. To gather more details regarding the failure, click linked text n...
Page 39
Chapter 4 troubleshooting using raidar 39 3. Click your browser’s refresh button to ensure that current data is displayed. 4. In the host-generated i/o & bandwidth totals for all virtual disks panel, verify that both indicators display 0 (no activity). Clearing metadata from leftover disk drives a d...
Page 40
40 r/evolution 2000 series troubleshooting guide • may 2008 isolating faulty disk drives when a drive fault occurs, basic troubleshooting actions are: ■ identify the faulty drive ■ review the drive error statistics ■ review the event log ■ replace the faulty drive ■ reconstruct the associated virtua...
Page 41
Chapter 4 troubleshooting using raidar 41 reviewing disk drive error statistics the disk error stats page provides specific drive fault information. It shows a graphical representation of the enclosures and disks installed in the system. The disk error stats page can be used to gather drive informat...
Page 42
42 r/evolution 2000 series troubleshooting guide • may 2008 capturing error trend data to capture error trend data for one or more drives: 1. Perform the procedure in “reviewing disk drive error statistics” on page 41. 2. Create a baseline by clearing the current error statistics. To clear the stati...
Page 43
Chapter 4 troubleshooting using raidar 43 reviewing the event logs if all the steps in “identifying a faulty disk drive” on page 40 and “reviewing disk drive error statistics” on page 41 have been performed, you have determined the following: ■ a disk drive has encountered a fault ■ the location of ...
Page 44
44 r/evolution 2000 series troubleshooting guide • may 2008 ■ if two drives fail and only one properly sized spare is available, an event indicates that reconstruction is about to start. The reconstruct utility starts to run, using the spare, but its progress remains at 0% until a second properly si...
Page 45
Chapter 4 troubleshooting using raidar 45 isolating data path faults when isolating data path faults, you must first isolate the fault to an internal data path or an external data path. This will help to target your troubleshooting efforts. Internal data paths include the following: ■ controller to ...
Page 46
46 r/evolution 2000 series troubleshooting guide • may 2008 problem phys can cause a host or controller to continually rescan drives, which disrupts i/o or causes i/o errors. I/o errors can result in a failed drive, causing a virtual disk to become critical or causing complete loss of a virtual disk...
Page 47
Checking PHY Status. RAIDar's expander status page includes an expander controller PHY detail panel. This panel shows the internal data paths for the storage controller, expander controller, disks, and expansion ports. Review this pag...
Page 48
48 r/evolution 2000 series troubleshooting guide • may 2008 ■ disabled – the phy has been disabled by a diagnostic manage user or by the system. ■ non-critical – the phy is not coming to a ready state or the phy at the other end of the cable is disabled. 3. Not used – the module is not installed. ■ ...
Page 49
■ CRC error count – In a sequence of SAS transfers (frames), the data is protected by a cyclic redundancy check (CRC) value. This error count specifies the number of times the computed CRC does not match the CRC stored in the frame, which indicates that the ...
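The CRC error count described above can be illustrated with a small Python sketch. It uses `zlib.crc32` as a stand-in checksum, which is an assumption made for illustration only; it is not the SAS frame CRC as computed by the expander hardware. The frame layout (payload followed by a 4-byte big-endian CRC) and the function names are also hypothetical.

```python
import struct
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload (hypothetical frame layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def count_crc_errors(frames) -> int:
    """Count frames whose recomputed CRC differs from the stored CRC."""
    errors = 0
    for frame in frames:
        payload = frame[:-4]
        stored = struct.unpack(">I", frame[-4:])[0]
        if zlib.crc32(payload) != stored:
            errors += 1
    return errors


good = make_frame(b"block 0")
# Flip one bit in the payload to simulate corruption in transit.
bad = bytes([good[0] ^ 0x01]) + good[1:]
print(count_crc_errors([good, bad, good]))
```

In this sketch only the corrupted frame is counted. A counter that keeps rising during normal operation is the kind of trend the PHY error statistics are meant to expose.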
Page 50
50 r/evolution 2000 series troubleshooting guide • may 2008 reviewing the event log for disabled phys if the fault isolation firmware disables a phy, the event log shows a message like the following:. When a phy has been disabled manually, the event log shows a similar message with a different reaso...
Page 51
Chapter 4 troubleshooting using raidar 51 isolating external data path faults on an fc storage system to troubleshoot external data path faults, perform the following steps: 1. Select monitor > status > advanced settings > host port status. This page provides a graphical representation of controller...
Page 52
52 r/evolution 2000 series troubleshooting guide • may 2008 isolating external data path faults on an iscsi storage system to troubleshoot external data path faults, perform the following steps: 1. Select monitor > status > advanced settings > host port status. This page provides a graphical represe...
Page 53
Chapter 4 troubleshooting using raidar 53 isolating external data path faults on a sas storage system to troubleshoot external data path faults, perform the following steps: 1. Select monitor > status > advanced settings > host port status. This page provides a graphical representation of controller...
Page 54
54 r/evolution 2000 series troubleshooting guide • may 2008 resetting a host channel on an fc storage system for a fibre channel system using loop topology, you might need to reset a host port (channel) to fix a host connection or configuration problem. As an advanced manage user, you can use this c...
Page 55
Chapter 4 troubleshooting using raidar 55 resetting expander error counters if phys have errors, you can reset expander error counters and then observe error activity during normal operation. If a phy continues to accumulate errors you can disable it in the expander controller phy detail panel. To r...
Page 56
56 r/evolution 2000 series troubleshooting guide • may 2008 using recovery utilities this section describes recovering data from a virtual disk that is quarantined or offline (failed). Removing a virtual disk from quarantine the quarantine icon indicates that a previously fault-tolerant virtual disk...
Page 57
Chapter 4 troubleshooting using raidar 57 caution – if the virtual disk does not have enough drives to continue operation, when a dequarantine is done, the virtual disk goes offline and its data cannot be recovered. To remove a virtual disk from quarantine: 1. Select manage > utilities > recovery ut...
Page 58
58 r/evolution 2000 series troubleshooting guide • may 2008 to enable and use trust vdisk: 1. Select manage > utilities > recovery utilities > enable trust vdisk. 2. Select enabled. 3. Click enable/disable trust vdisk. The option remains enabled until you trust a virtual disk or restart the storage ...
Page 59
Chapter 4 troubleshooting using raidar 59 problems scheduling tasks if your task does not run at the times you specified, check the schedule specifications. It is possible to create conflicting specifications. ■ start time is the first time the task will run. ■ if you use the between option, the sta...
Page 60
Effect of Changing the Date and Time. Resetting the storage system date or time might affect scheduled tasks. Because the schedule begins with the start time, no schedules will run until the date and time are set. If the system is configured...
Page 61
Chapter 4 troubleshooting using raidar 61 selecting individual events for notification as described in the reference guide, you can configure how and under what conditions the storage system alerts you when specific events occur. In addition to selecting event categories, as a diagnostic manage user...
Page 62
62 r/evolution 2000 series troubleshooting guide • may 2008 ■ informational status events – represent device status changes related to the storage system’s status that usually do not require attention. ■ informational configuration events – represent device status changes related to the storage syst...
Page 63
Chapter 4 troubleshooting using raidar 63 correcting enclosure ids when installing a system with drive enclosures attached, the enclosure ids might differ from the physical cabling order. This is because the controller might have been previously attached to some of the same enclosures and it attempt...
Page 65
Chapter 5: Troubleshooting Using Event Logs. Event logs capture reported events from components throughout the storage system. Each event consists of an event code, the date and time the event occurred, the controller that reported the event, and a description of what occurred. This chapter inclu...
Page 66
66 r/evolution 2000 series troubleshooting guide • may 2008 viewing the event log in raidar some of the key warning and error events included in the event log during operation include the following: ■ disk detected error ■ disk channel error ■ drive down ■ virtual disk critical ■ virtual disk offlin...
Page 67
Chapter 5 troubleshooting using event logs 67 2. Click one of the following buttons in the select event table to view panel to see the corresponding events. For a dual-controller system: for a single-controller system: the page shows up to 200 events for a single controller or up to 400 events for b...
Page 68
68 r/evolution 2000 series troubleshooting guide • may 2008 for example: viewing an event log saved from raidar you can save event log data to a file on your network as described in “saving log information to a file” on page 70. A saved log file has the following sections: ■ contact information and ...
Page 69
Chapter 5 troubleshooting using event logs 69 for example: reviewing event logs when reviewing events, do the following: 1. Review the critical/warning events. Identify the primary events and any that might be the cause of the primary event. For example, an over temperature event could cause a drive...
Page 70
70 r/evolution 2000 series troubleshooting guide • may 2008 saving log information to a file you can save the following types of log information to a file: ■ device status summary, which includes basic status and configuration information for the system. ■ event logs from both controllers when in ac...
Page 71
8. If prompted to specify the file location and name, do so using a .logs extension. The default file name is store.logs. If you intend to capture multiple event logs, name the files so that they can be identified later. 9. If y...
Page 72
72 r/evolution 2000 series troubleshooting guide • may 2008 ■ no debug tracing – collects no debug data. ■ custom debug tracing – shows that specific events are selected for inclusion in the log. This is the default. If no events are selected, this option is not displayed. 3. Click change debug logg...
Page 73
Chapter 6: Voltage and Temperature Warnings. The storage system provides voltage and temperature warnings, which generally indicate input or environmental conditions. Voltage warnings can occur if the input voltage is too low or if a FRU is receiving too little or too much power from the power-and-...
Page 74
Sensor Locations. The storage system monitors conditions at different points within each enclosure to alert you to problems. Power, cooling fan, temperature, and voltage sensors are located at key points in the enclosure. In each controll...
Page 75
Chapter 6 voltage and temperature warnings 75 the fan speed remains under the 4000 rpm threshold, the internal enclosure temperature may continue to rise. Replace the power-and-cooling module reporting the fault. During a shutdown, the cooling fans do not shut off. This allows the enclosure to conti...
Page 76
76 r/evolution 2000 series troubleshooting guide • may 2008 when a power supply sensor goes out of range, the fault/id led illuminates amber and an event is logged to the event log. To view the controller enclosure’s temperature status, in raidar, as an advanced manage user: ● select monitor > statu...
Page 77
Chapter 6 voltage and temperature warnings 77 power-and-cooling module voltage sensors power supply voltage sensors ensure that an enclosure’s power supply voltage is within normal ranges. There are three voltage sensors per power-and-cooling module. Table 6-5 voltage sensor descriptions sensor even...
Page 79
Chapter 7: Troubleshooting and Replacing FRUs. This chapter describes how to troubleshoot and replace field-replaceable units. A field-replaceable unit (FRU) is a system component that is designed to be replaced onsite. This chapter contains the following sections: ■ “Static Electricity Precau...
Page 81
Chapter 7 troubleshooting and replacing frus 81 table 7-1 lists the faults you might encounter with a controller module or expansion module. Table 7-1 controller module or expansion module faults problem solution fru ok led is off • verify that the controller module is properly seated in the slot an...
Page 82
82 r/evolution 2000 series troubleshooting guide • may 2008 removing and replacing a controller or expansion module in a dual-controller configuration, controller and expansion modules are hot- swappable, which means you can replace one module without halting i/o to the storage system or powering it...
Page 83
Chapter 7 troubleshooting and replacing frus 83 the configuration file does not include configuration data for virtual disks and volumes. You do not need to save this data before replacing a controller or expansion module because the data is saved as metadata in the first sectors of associated disk ...
Page 84
84 r/evolution 2000 series troubleshooting guide • may 2008 shutting down a controller module shut down a controller module before you remove it from an enclosure, or before you power off its enclosure for maintenance, repair, or a move. Shutting down a controller module halts i/o to that module, en...
Page 85
Chapter 7 troubleshooting and replacing frus 85 removing a controller module or expansion module as long as the other module in the enclosure you are removing remains online and active, you can remove a module without powering down the enclosure; however you must shut down a controller module as des...
Page 86
86 r/evolution 2000 series troubleshooting guide • may 2008 a. Select manage > general config > enclosure management. B. Click illuminate locator led. 5. For the controller module, locate the enclosure whose unit locator led (front) is blinking, and within it, the module whose ok to remove led is bl...
Page 87
Chapter 7 troubleshooting and replacing frus 87 9. Pull the module straight out of the enclosure. Replacing a controller module or expansion module you can install a controller module or expansion module into an enclosure that is powered on. Caution – when replacing a controller module, ensure that ...
Page 88
88 r/evolution 2000 series troubleshooting guide • may 2008 to install a controller module or an expansion module: 1. Follow all static electricity precautions as described in “static electricity precautions” on page 80. 2. Loosen the thumbscrews; press the latches downward. 3. Slide the controller ...
Page 89
Chapter 7 troubleshooting and replacing frus 89 fault/service required if the fault/service required yellow led is illuminated, the module has not gone online and likely failed its self-test. Try to put the module online (see “shutting down a controller module” on page 84) or check for errors that w...
Page 90
90 r/evolution 2000 series troubleshooting guide • may 2008 reorder the enclosure ids. To minimize issues with enclosure ids, always move a complete set of expansion modules and reconnect them in the same order as they were connected to the original controller module. To rescan, as an advanced manag...
Page 91
Chapter 7 troubleshooting and replacing frus 91 disabling partner firmware upgrade the partner firmware upgrade option is enabled by default in raidar. Only disable this function if told to do so by a service technician. 1. Select manage > general config > system configuration. 2. For partner firmwa...
Page 92
92 r/evolution 2000 series troubleshooting guide • may 2008 5. Review the current and new software versions, and then click proceed with code update. A code load progress window is displayed to show the progress of the update, which can take several minutes to complete. Do not power off the storage ...
Page 93
Chapter 7 troubleshooting and replacing frus 93 removing and replacing an sfp module this section provides steps to remove and replace an sfp module. Caution – mishandling fiber-optic cables can degrade performance. Do not twist, fold, pinch, or step on fiber-optic cables. Do not bend the fiber-opti...
Page 94
94 r/evolution 2000 series troubleshooting guide • may 2008 2. The sfp is held in place by a small wire bail actuator; flip the actuator up and gently pull on it to remove the sfp from the controller. Installing an sfp module to install an sfp module, perform the following steps: 1. If the sfp has a...
Page 95
Chapter 7 troubleshooting and replacing frus 95 identifying cable faults when identifying cable faults you must remember that there are two sides of the controller: the input/output to the host and the input/output to the drive enclosures. It is also important to remember that identifying a cable fa...
Page 96
■ If at least 15 seconds elapse between disconnecting a cable and connecting it to a different port in the same enclosure or in a different enclosure, no further action is required. Identifying Drive Module Faults. When identifying faults i...
Page 97
Chapter 7 troubleshooting and replacing frus 97 when a disk detects an error, it reports it to the controller by returning a scsi sense key, and if appropriate, additional information. This information is recorded in the raidar event log. Table 7-2 lists some of the most common scsi sense key descri...
Page 98
Example. In the following error reported in the event log, the drive in slot 10 of enclosure 1 reported a sense key error of 2 and an ASC/ASCQ of 04/11. Disk Drive Errors. In general, media errors (sense key 3), recovery errors (sense...
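As a rough illustration of how the sense key in an event like the one above might be read, the following Python sketch maps the numeric key to the standard SCSI sense key names. The function name `describe_sense` and its output format are hypothetical and are not part of RAIDar or the CLI. ASC/ASCQ values are only formatted as hexadecimal here, not interpreted.

```python
# Standard SCSI sense key names, keyed by the 4-bit sense key value.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense(sense_key: int, asc: int, ascq: int) -> str:
    """Return a readable summary of a sense key and ASC/ASCQ pair."""
    name = SENSE_KEYS.get(sense_key, "UNKNOWN")
    return f"sense key {sense_key:X}h ({name}), ASC/ASCQ {asc:02X}/{ascq:02X}"


# The event from the example: sense key 2, ASC/ASCQ 04/11.
print(describe_sense(0x2, 0x04, 0x11))
```

For the example event this reports the key as NOT READY, which points to a drive that could not service the request rather than to a media defect (sense key 3).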
Page 99
Chapter 7 troubleshooting and replacing frus 99 disk channel errors disk channel errors are similar to disk-detected errors, except they are detected by the controllers instead of the disk drive. Some disk channel errors are displayed as text strings. Others are displayed as hexadecimal codes. If th...
Page 100
Identifying Faulty Drive Modules. To identify faulty drive modules, perform the following steps: 1. Does the fault involve a single drive? ■ If yes, perform step 2 through step 4. ■ If an entire enclosure of disk drives is faulty, che...
Page 101
Chapter 7 troubleshooting and replacing frus 101 updating disk drive firmware you can update disk drive firmware by loading a firmware update file obtained from the disk drive manufacturer or your reseller. Note – updating the firmware of disk drives in a virtual disk risks the loss of data and caus...
Page 102
102 r/evolution 2000 series troubleshooting guide • may 2008 6. Stop host i/o by either disconnecting data cables from the storage system controllers or powering down all hosts connected to the system. To update disk drive firmware: 1. Select manage > update software > disk drive firmware > update f...
Page 103
8. To start the firmware update, click Start Firmware Update. To cancel the firmware update, click Cancel. The file is transferred to the controller, where it is temporarily stored before being downloaded to the disk drives. Once the firmware update process ...
Page 104
Removing and Replacing a Drive Module
A drive module consists of a disk drive in a sled. Drive modules are hot-swappable, which means they can be replaced without halting I/O to the storage system or powering it off.
Caution –
To prevent a...
Page 105
■ Replace the defective drive and make the new drive a global spare while the rebuilding process continues. This procedure installs the new drive and assigns it as a global spare so that an automatic rebuild can occur if a drive module fails on anothe...
Page 106
4. Replace the failed module by following the instructions in “Removing a Drive Module” on page 106. You can also use the CLI show enclosure-status command. If the drive status is “absent”, the drive might have failed, or it has been remove...
Page 107
4. Wait 20 seconds for the internal disks to stop spinning.
5. Pull the drive module out of the enclosure.
Installing a Drive Module
To install a drive module, perform the following steps:
1. Follow all static electricity precautions as described ...
Page 108
Table 7-5 Disk Drive Status
Status | Action
Online: The vdisk is online and does not have fault tolerant attributes. | None
Fault tolerant: The vdisk is online and fault tolerant. | None
Offline: The vdisk is offline either because of initializatio...
Page 109
6. After replacing a failed drive, save the configuration settings as described in “Saving Configuration Settings” on page 82. The saved configuration includes configuration information for all the drive modules in the virtual disk. When you save the ...
Page 110
c. Data hosts last (if they had been powered down for maintenance purposes)
2. In RAIDar, select Monitor > Status > Vdisk Status to display the virtual disk overview panel. This panel displays an icon for each virtual disk with information...
Page 111
failover causes a virtual disk to become critical when one of its drives “disappears.”
• In general, controller failover is not supported if a disk drive is in a drive enclosure that is connected with only one cable to the controller enclosure. This i...
Page 112
Clearing Metadata From a Disk Drive
All of the member disk drives in a virtual disk contain metadata in the first sectors. The storage system uses the metadata to identify virtual disk members after restarting or replacing enclosures. Clea...
Page 113
Caution –
Because removing the power-and-cooling module significantly disrupts the enclosure’s airflow, do not remove the power-and-cooling module until you have the replacement module.
Table 7-7 lists possible power-and-cooling module faults.
Table 7...
Page 114
Removing and Replacing a Power-and-Cooling Module
A single power-and-cooling module is sufficient to maintain operation of the enclosure. It is not necessary to halt operations and completely power off the enclosure when replacing only one...
Page 115
Installing a Power-and-Cooling Module
To install a power-and-cooling module, perform the following steps:
1. Slide the module into the slot as far as it will go.
2. Press the latch upward to engage the module; turn the thumbscrews finger-tight.
3. Rec...
Page 116
Replacing an Enclosure
The enclosure FRU consists of an enclosure’s metal housing and the midplane that connects controller/expansion modules, drive modules, and power-and-cooling modules. This FRU replaces an enclosure that has been damaged o...
Page 117
Appendix A
Troubleshooting Using the CLI
This appendix briefly describes CLI commands that are useful for troubleshooting storage system problems. For detailed information about command syntax and using the CLI, see the CLI reference guide. Topics covered in this appendix include:
■ “View...
Page 118
Viewing Command Help
To view brief descriptions of all commands that are available to the user level you logged in as, type:
To view help for a specific command, type either:
To view information about the syntax to use for specifying disk ...
Page 119
ping
Tests communication with a remote host. The remote host is specified by IP address. Ping sends ICMP echo request packets and waits for replies. For details about using ping, see the CLI reference guide.
rescan
When installing a system with drive en...
Page 120
Note –
If the storage system is connected to a Microsoft Windows host, the following event is recorded in the Windows event log: Initiator failed to connect to the target.
For details about using restart, see the CLI reference guide.
Rest...
Page 121
set expander-fault-isolation
When fault isolation is enabled, the expander controller isolates PHYs that fail to meet certain criteria. When fault isolation is disabled, the errors are noted in the logs but the PHYs are not isolated. For details about...
Page 122
show debug-log
Note –
This command should be used only by service technicians, or with the advice of a service technician.
Shows the debug logs for the Storage Controller (SC), the Management Controller (MC), the semaphore trace, task logs...
Page 123
show events
Shows events for an enclosure, including events from each Management Controller and each Storage Controller. A separate set of event numbers is maintained for each controller module. Each event number is prefixed with a letter identifying the ...
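Because each event number carries a letter prefix and each controller module keeps its own numbering, event numbers are easier to review once they are grouped by prefix. The following is a minimal sketch in Python that runs on a management host, not in the CLI. The function name `group_by_prefix` and the sample identifiers such as `A29` are hypothetical; the sketch assumes only the letter-plus-number shape described above and makes no assumption about what each letter stands for.

```python
import re
from collections import defaultdict

# One letter followed by a decimal event number, for example "A29" (hypothetical).
EVENT_ID = re.compile(r"^([A-Za-z])(\d+)$")


def group_by_prefix(event_ids):
    """Group event numbers by their prefix letter, sorted within each group."""
    groups = defaultdict(list)
    for event_id in event_ids:
        match = EVENT_ID.match(event_id.strip())
        if match is None:
            raise ValueError(f"unexpected event identifier: {event_id!r}")
        groups[match.group(1).upper()].append(int(match.group(2)))
    return {letter: sorted(numbers) for letter, numbers in groups.items()}


# Hypothetical identifiers copied by hand from show events output.
print(group_by_prefix(["A29", "B7", "A3"]))
```

Grouping first avoids comparing numbers across prefixes, which would be misleading because the numbering sequences are independent.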
Page 124
show redundancy-mode
Shows the redundancy status of the system. For details about using show redundancy-mode, see the CLI reference guide.
trust
Enables an offline virtual disk to be brought online for emergency data collection only. It m...
Page 125
Problems Scheduling Tasks
There are two parts to scheduling tasks: you must create the task and then create the schedule to run the task.
Create the Task
There are three tasks you can create: takesnapshot, resetsnapshot, and volumecopy. Perform the ope...
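The two-part ordering described above (create the task first, then the schedule that runs it) is a common source of scheduling errors. The following is a minimal sketch in Python that models only that ordering constraint. It is not CLI syntax and does not talk to the storage system; the class `Scheduler`, its methods, and the sample names `snap1` and `nightly` are hypothetical. Only the three task type names come from this guide.

```python
# Task types named in this guide; everything else is a hypothetical model.
TASK_TYPES = {"takesnapshot", "resetsnapshot", "volumecopy"}


class Scheduler:
    """Toy model of the create-task-then-create-schedule ordering."""

    def __init__(self):
        self.tasks = {}      # task name -> task type
        self.schedules = {}  # schedule name -> task name

    def create_task(self, name, task_type):
        if not name:
            raise ValueError("task name is required")
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task_type}")
        self.tasks[name] = task_type

    def create_schedule(self, name, task_name):
        # Part two depends on part one: the task must already exist.
        if task_name not in self.tasks:
            raise LookupError(f"create task {task_name!r} before scheduling it")
        self.schedules[name] = task_name


scheduler = Scheduler()
scheduler.create_task("snap1", "takesnapshot")
scheduler.create_schedule("nightly", "snap1")
print(scheduler.schedules)
```

Reversing the two calls raises `LookupError` in this model, which is the situation to check for first when a schedule cannot be created.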
Page 126
Errors Associated With Scheduling Tasks
The following table describes error messages associated with scheduling tasks.
Missing Parameter Data Error
If you try to use a command that has a name parameter and the CLI displays “error: the comm...
Page 127: Index
Index
A air management module, installing, 110 architecture, system overview, 11
B bad block list size, displaying, 42 reassignments, displaying, 42 boot handshake, 89
C cables identifying faults drive enclosure side, 95 host side, 95 cache clearing, 118 size, 82 CLI help, view command ...
Page 128
no response count, 41 non-media errors, 42 reviewing error statistics, 41 capturing trend data, 42 spin-up retries, 41 understanding errors, 96 updating firmware, 101 disk drives, scan for changes, 63, 90 disk error stats, 41 driv...
Page 129
I I/O checking status, 38 displaying timeout count, 41 icons, system status, 37 informational events, 65 enabling, 65 selecting to monitor, 61 installing air management modules, 110 controller modules, 87 drive modules, 107 expansion modules, 87 power-and-cooling modules, 115 in...
Page 130
saving log information, 70 scheduling tasks, 59 SCSI Enclosure Services. See SES sensors cooling fan, 74 locating, 74 power supply, 74 temperature, 75 voltage, 77 SES displaying firmware version, 47 setting the time, 89 shutting d...