Task #10729
closed
Broken disk on dlib33x.dom0.research-infrastructures.eu
100%
Description
From the nagios check:
CRITICAL: mdstat:[md3(931.39 GiB raid1):F:sdg1:_U, md2(931.39 GiB raid1):UU, md1(931.15 GiB raid1):UU, md0(242.81 MiB raid1):UU]
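A check of this kind can be approximated by parsing /proc/mdstat. The following is a minimal Python sketch, not the actual nagios plugin used on this host; the function name degraded_arrays and the sample text (including the sdh1 member and the block counts) are illustrative assumptions.

```python
import re

# Hypothetical /proc/mdstat excerpt, shaped like the failure reported above.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md3 : active raid1 sdh1[1] sdg1[0](F)
      976630336 blocks super 1.2 [2/1] [_U]

md2 : active raid1 sde1[0] sdf1[1]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (.*)$")
STATUS_RE = re.compile(r"\[\d+/\d+\] \[([U_]+)\]")
FAILED_RE = re.compile(r"(\w+)\[\d+\]\(F\)")


def degraded_arrays(mdstat_text):
    """Map each degraded array to its status string and failed members."""
    result = {}
    current, members = None, ""
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Remember the array name and its member list for the next line.
            current, members = header.group(1), header.group(2)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            # An underscore marks a missing or failed member slot.
            if "_" in status.group(1):
                result[current] = {
                    "status": status.group(1),
                    "failed": FAILED_RE.findall(members),
                }
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live host the same function could be fed the contents of /proc/mdstat instead of the sample string.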
Updated by Tommaso Piccioli over 7 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 50
From the dmesg:
[Mon Dec 18 01:57:03 2017] sd 0:0:6:0: [sdg] Device not ready
...
[Mon Dec 18 01:57:03 2017] md/raid1:md3: Disk failure on sdg1, disabling device.
...
[Mon Dec 18 01:58:23 2017] sd 0:0:6:0: Attached scsi generic sg6 type 0
[Mon Dec 18 01:58:23 2017] sd 0:0:6:0: [sdi] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
...
[Mon Dec 18 01:58:23 2017] sdi: sdi1
The disk disappeared from the OS for about a minute and then reappeared as a new device (sdi).
I added a different disk (sdf1) to the RAID; it is now in recovery/resync. In the meantime I will check the old sdg (now sdi).
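The ticket does not say how the recovery/resync was followed. One possible way is to read the progress line that the kernel prints in /proc/mdstat while an array rebuilds. This is a rough Python sketch under that assumption; the function name rebuild_progress and the sample line with its numbers are hypothetical.

```python
import re

# Hypothetical progress line as shown in /proc/mdstat during a rebuild.
SAMPLE_LINE = (
    "      [==>..................]  recovery = 12.6% "
    "(123055426/976630336) finish=127.5min speed=111572K/sec"
)

PROGRESS_RE = re.compile(
    r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min"
)


def rebuild_progress(line):
    """Return (kind, percent_done, minutes_left), or None if no rebuild is shown."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    print(rebuild_progress(SAMPLE_LINE))
```

A line without a progress indicator, such as a plain status line, yields None, which can be read as "no rebuild in progress".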
Updated by Tommaso Piccioli over 7 years ago
From the idrac log:
2017-12-18T01:55:05-0600 PDR3
Disk 6 in Backplane 1 of Integrated RAID Controller 1 is not functioning correctly.
2017-12-18T01:55:05-0600 PDR87
Disk 6 in Backplane 1 of Integrated RAID Controller 1 was reset.
2017-12-18T01:55:05-0600 PDR5
Disk 6 in Backplane 1 of Integrated RAID Controller 1 is removed.
2017-12-18T01:55:40-0600 PDR8
Disk 6 in Backplane 1 of Integrated RAID Controller 1 is inserted.
Still checking Disk 6 from the OS.
Updated by Andrea Dell'Amico over 7 years ago
That's worrisome: the OMSA tools should have reported these events to nagios. Or maybe the problem was too short-lived to be reported?
Updated by Tommaso Piccioli over 7 years ago
Resync done; the disk sdi seems to be OK (2/3 tested).
Updated by Tommaso Piccioli over 7 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100