
1 Failure scenarios

How to find failures.

Recommendation: perform the following checks daily on the Admin PC:
•           Start CMC and check the status of the SANs.
•           Connect to the iLO of Server 1 and check the hardware status.
•           Connect to the iLO of Server 2 and check the hardware status.
•           Connect to the iLO of the Admin Server and check the hardware status.


If the system does not work as expected:

1. Step: Check the storage system.
The CMC is the first place to look for information about the storage system. Check the status of the storage system for failures.
2. Step: If necessary, check the hardware of all servers.
Log in to the iLO port of each server.
3. Step: Check the virtualized system (hypervisor).
Start viClient and connect to vCenter. Check whether each server is running.
4. Step: Check the virtual machines.
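
The order of the steps above (storage first, virtual machines last) can be sketched as a small triage helper. This is only an illustration: the function name `first_failing_layer` and the check callables are hypothetical placeholders and do not query CMC, iLO or vCenter; each callable stands for the result of the corresponding manual check.

```python
from typing import Callable, List, Optional, Tuple

# A layer pairs a label from the steps above with a callable that
# returns True when that layer looks healthy. The callables are
# hypothetical placeholders for the manual checks.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the label of the first failing one.

    Returns None when every check passes.
    """
    for label, is_healthy in checks:
        if not is_healthy():
            return label
    return None


# Example layer list in the order given by the steps above.
# Replace each lambda with the outcome of the manual check.
EXAMPLE_CHECKS: List[Check] = [
    ("storage system (CMC)", lambda: True),
    ("server hardware (iLO)", lambda: True),
    ("hypervisor (vCenter)", lambda: False),
    ("virtual machines", lambda: True),
]
```

Calling `first_failing_layer(EXAMPLE_CHECKS)` stops at the first unhealthy layer, which mirrors the idea of not inspecting virtual machines before the layers beneath them have been confirmed.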

1.1 SAN: Hard disk failure

Symptoms: Error message shown in CMC, e.g.:

File:CMC showing PredictiveFailure.PNG
Solution:
Do not turn off the node. Replace the failed hard disk with a new one.

1.2 SAN: Complete node failure

Symptoms: Error message shown in CMC.

Solution:
Check the error state with CMC.
Contact HP or Brueckner on how to proceed.


1.3 Core switch failure

Possible symptoms: Error indicator on the back of the switch.

To check:
Start a browser, connect to http://10.17.22.192 (HP-Switch 1, east) or http://10.17.22.191 (HP-Switch 2, west), and navigate to the status page.
Solution:
Replace the switch and restore its configuration. See procedure "Restore switch configuration" [[1]]
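
Before opening a browser, it can help to confirm that the web interface of each switch answers at all. The following is a minimal sketch using only the Python standard library; the function names and the 5 second timeout are assumptions, and the check only tells whether something answers on HTTP, not what the switch status page reports.

```python
import http.client
import urllib.error
import urllib.request

# Addresses taken from the section above.
SWITCHES = {
    "HP-Switch 1 (east)": "http://10.17.22.192",
    "HP-Switch 2 (west)": "http://10.17.22.191",
}


def web_interface_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at url, even with an error code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with a login or error page),
        # so the device is reachable on the network.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # No answer, connection refused, timeout or a broken response.
        return False


def report() -> None:
    """Print one line per switch; intended to be run from the Admin PC."""
    for name, url in SWITCHES.items():
        state = "reachable" if web_interface_reachable(url) else "NOT reachable"
        print(f"{name}: {state}")
```

Calling `report()` prints the state of both switches. A switch that is not reachable this way still needs the visual check of the error indicator described above.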


1.4 Server failure

See [[2]]

1.5 Operator Station failure

See the instructions for backup and restore of the Operator Station: Windows_7_Image.doc, or in KIWI: [[3]]

Remark: Currently the image is saved to a USB drive, not to DVD.

1.6 Firewall failure

To troubleshoot connection problems, check the following:

•           The power LED is steady green.
•           The HDD LED flashes orange from time to time.
•           The link LED of each used network port, especially the WAN port, is steady green.
•           During normal boot-up and operation, ensure that no USB device is connected to the appliance.

Test of internet connectivity:

•           Remove the WAN cable from the appliance and plug it into a notebook.
•           Temporarily configure the network settings of the notebook to match those configured in the appliance (DHCP or fixed IP).
•           Open a command prompt. You should be able to ping the IP addresses 195.145.159.52 and 195.145.159.10.
•           Ports 691, 692 and 693 should be open. To check, enter the following commands:
telnet 195.145.159.52 691
telnet 195.145.159.52 692
telnet 195.145.159.52 693
With ports 691 and 692 you should get a blank screen.
With port 693 you should get an error message after some time.
telnet 195.145.159.10 691
telnet 195.145.159.10 692
telnet 195.145.159.10 693
With port 691 you should get a blank screen.
With ports 692 and 693 you should get an error message after some time.
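
The telnet checks above can also be sketched as a script, which is useful when no telnet client is installed on the notebook. This sketch rests on an assumption: a "blank screen" in telnet is read as "the TCP connection opens", and an "error message after some time" as "the connection attempt fails or times out". The names `EXPECTED`, `port_opens` and `deviations` and the 10 second timeout are illustrative choices.

```python
import socket

# Expected outcome per (host, port), taken from the telnet description:
# True  = connection opens (blank screen in telnet),
# False = connection fails or times out (error message in telnet).
EXPECTED = {
    ("195.145.159.52", 691): True,
    ("195.145.159.52", 692): True,
    ("195.145.159.52", 693): False,
    ("195.145.159.10", 691): True,
    ("195.145.159.10", 692): False,
    ("195.145.159.10", 693): False,
}


def port_opens(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def deviations(probe=port_opens):
    """Return the (host, port) pairs whose result differs from EXPECTED."""
    return [
        target
        for target, expected in EXPECTED.items()
        if probe(*target) != expected
    ]
```

An empty list from `deviations()` means every port behaved as described. Any pair in the list is worth noting before contacting support.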


Test of the firewall operating system:

•           Connect the notebook to the appliance with a serial cable.
•           Start HyperTerminal (a recent version of PuTTY with serial support also works) with the following settings:
- 19200 bit/s
- 8 data bits
- no parity
- 1 stop bit
- no flow control
•           Restart the appliance.
•           After approx. 1-2 minutes you should see the boot-up messages. Note all errors.
•           After a few minutes you should see a login prompt such as BenRou login:.
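
When PuTTY is used, the serial settings listed above can be passed on the command line through its `-serial` and `-sercfg` options. The sketch below only builds that command line; the port name `COM1` in the usage note is an assumption and depends on the notebook, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SerialSettings:
    """Serial parameters from the list above, in PuTTY's -sercfg notation."""
    baud: int = 19200
    data_bits: int = 8
    parity: str = "n"        # n = no parity
    stop_bits: int = 1
    flow_control: str = "N"  # N = no flow control


def putty_command(port: str, settings: SerialSettings = SerialSettings()) -> List[str]:
    """Build the PuTTY argument list for a serial session on the given port."""
    sercfg = (
        f"{settings.baud},{settings.data_bits},{settings.parity},"
        f"{settings.stop_bits},{settings.flow_control}"
    )
    return ["putty", "-serial", port, "-sercfg", sercfg]
```

For example, `putty_command("COM1")` yields the arguments for `putty -serial COM1 -sercfg 19200,8,n,1,N`, which can be passed to `subprocess.run` or typed by hand.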


Restoration of FW

Restoration is done by inserting the FW configuration USB stick and booting the firewall.
Attention: This should only be done after contacting Brückner.


1.7 Admin Station Failures

To check the status, start a browser, connect to the iLO service port at http://10.30.20.172, and check "System Status":

File:AdminPC iLO Drives.PNG

2 Examples

2.1 "Degraded RAM" on server 2

2.1.1 Diagnosis

The alarm was discovered during a periodic check of the server hardware via iLO.

After connecting to the iLO port of server 2 (10.30.20.171) and navigating through the pages, the following was shown:

File:HP S2 MemmoryDegraded.PNG



2.1.2 First Step: Replace RAM

HP-China service was then called. They first wanted to upgrade all the firmware, as they always do. Because we did not believe this step would help, we insisted on exchanging the RAM, and they followed this request.

After the restart of server 2, connecting with viClient to vCenter showed that server 2 was disconnected.

2.1.3 Further problems

Server 2 has two IP addresses: 10.30.20.102 and 10.30.20.103. A ping command executed in a command window on the Admin Server showed that these IPs were not responding.

The connection to iLO showed that, in contrast to server 1, there were no port entries:
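
The ping check of both addresses can be repeated with a short script, for instance while the server is being restarted. This is a sketch under assumptions: it calls the operating system's `ping` command, chooses `-n` on Windows and `-c` elsewhere for the packet count, and treats exit code 0 as "responding". The function names are illustrative.

```python
import platform
import subprocess
from typing import List

# The two addresses of server 2 named above.
SERVER2_ADDRESSES = ["10.30.20.102", "10.30.20.103"]


def ping_command(address: str, count: int = 2) -> List[str]:
    """Build the ping command line for the current operating system."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), address]


def responds(address: str) -> bool:
    """Return True if ping exits with code 0 for the given address.

    Caveat: on Windows, ping can exit with 0 when a router answers with
    "Destination host unreachable", so a True result should be confirmed
    by reading the ping output.
    """
    result = subprocess.run(
        ping_command(address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Running `responds(a)` for each entry of `SERVER2_ADDRESSES` shows whether either interface has come back.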


File:Server2 ilo Network IntegratedNicMacAddresses.PNG


For comparison, server 1:


File:Server1 ilO Network IntegratedNicMacAddresses.PNG

2.1.4 Solution

In this case even the log file output did not help. It looked as if the system simply did not see the 10 Gb network card.

The solution was to clean the server, take out the card, then reinsert it and fasten it firmly.

So the problem appears to have been a contact problem.


2.2 "Predictive Failure" of a SAN-Disk

See section 1.1 above.