Incident #11870

Two huge problems on the oVirt cluster

Added by Andrea Dell'Amico about 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Immediate
Category: System Application
Start date: Jun 01, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Development, Pre-Production, Production

Description

Yesterday, a node upgrade followed by a restart crashed the Gluster file system. The restarted node had no unsynchronized bricks, but some of the other nodes did. oVirt stops the procedure only when the node about to be restarted has unsynchronized bricks itself; it raises no alert when the unsynchronized bricks are on other nodes.

The Gluster failure caused the shutdown of all the VMs configured on oVirt: the DNS resolver, the authoritative DNS server, the SMTP relay, and the VPN gateways.
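The missing safeguard described above, checking every node's bricks and not only those of the node being restarted, could be sketched roughly as follows. This is a minimal sketch, not what oVirt does: it assumes the text comes from `gluster volume heal <VOLNAME> info`, that each brick section starts with a `Brick ` line and ends with a `Number of entries:` line, and the function names and sample host names are hypothetical.

```python
import re
from typing import Dict

def pending_heal_entries(heal_info_output: str) -> Dict[str, int]:
    """Map each brick to its pending heal entry count.

    A count of -1 means the count was not numeric (for example when
    the brick is not connected), so the state is unknown.
    """
    pending: Dict[str, int] = {}
    brick = None
    for raw_line in heal_info_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif brick is not None:
            match = re.match(r"Number of entries:\s*(\S+)$", line)
            if match:
                value = match.group(1)
                pending[brick] = int(value) if value.isdigit() else -1
                brick = None
    return pending

def safe_to_restart(heal_info_output: str) -> bool:
    """True only if at least one brick was parsed and ALL bricks,
    on every node, report zero pending entries."""
    counts = pending_heal_entries(heal_info_output)
    return bool(counts) and all(n == 0 for n in counts.values())

# Hypothetical sample output: node2 still has one entry to heal.
SAMPLE = """\
Brick node1:/gluster/brick1
Status: Connected
Number of entries: 0

Brick node2:/gluster/brick1
/images/disk.img
Status: Connected
Number of entries: 1
"""

if __name__ == "__main__":
    print(pending_heal_entries(SAMPLE))
    print("safe to restart:", safe_to_restart(SAMPLE))
```

Running such a check for every volume before restarting any node would have flagged the unsynchronized bricks on the other nodes.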


Related issues

Related to D4Science Infrastructure - Task #11873: Fix the networking bug introduced by cloud-init on Ubuntu 16.04 oVirt guests (Closed, _InfraScience Systems Engineer, Jun 04, 2018)

Actions #1

Updated by Andrea Dell'Amico about 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

I managed to fix the Gluster file system by explicitly starting the volumes again. After that, the VMs did not start correctly. As I was in a rush, I rebuilt the most important VMs from scratch.
Then, today, I investigated what happened: there is a bug in the cloud-init service that resets the interface configuration to DHCP. As we do not have a password for console access, the only way to fix the problem is to cold mount the VM disks and fix both the network configuration and cloud-init.
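One way the cloud-init part of the cold-mount fix could look is sketched below. The ticket does not say which change was actually applied, so this is an assumption: it uses the cloud-init setting `network: {config: disabled}`, placed in a drop-in file under `/etc/cloud/cloud.cfg.d/`, to stop cloud-init from managing the interfaces. The function name, the drop-in file name and the mount point are hypothetical, and the static network configuration of the guest still has to be restored separately.

```python
from pathlib import Path

# cloud-init setting that turns off its network configuration handling.
DISABLE_NETWORK_CFG = "network: {config: disabled}\n"

def disable_cloud_init_networking(mounted_root: str) -> Path:
    """Write a cloud-init drop-in inside a cold-mounted guest root.

    `mounted_root` is the directory where the VM's root file system
    is mounted on the rescue host (hypothetical, e.g. /mnt/vmroot).
    Returns the path of the file that was written.
    """
    cfg_dir = Path(mounted_root) / "etc" / "cloud" / "cloud.cfg.d"
    cfg_dir.mkdir(parents=True, exist_ok=True)
    cfg_file = cfg_dir / "99-disable-network-config.cfg"
    cfg_file.write_text(DISABLE_NETWORK_CFG)
    return cfg_file

if __name__ == "__main__":
    import tempfile

    # Demonstrate against a throwaway directory instead of a real disk.
    with tempfile.TemporaryDirectory() as fake_root:
        written = disable_cloud_init_networking(fake_root)
        print(written.read_text(), end="")
```

Writing a separate drop-in file, instead of editing `cloud.cfg` in place, keeps the change easy to spot and to revert.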

Now all the VMs are operational again. I wrote up all the troubleshooting steps needed to restart Gluster here: https://support.d4science.org/projects/aginfraplut/wiki/Gluster_management and the ones needed to fix the VMs here: https://support.d4science.org/projects/aginfraplut/wiki/Virtual_Machines_Management

I still have to add a task to the base playbook to fix the cloud-init behaviour on the newly created VMs.

Actions #2

Updated by Andrea Dell'Amico about 7 years ago

  • Related to Task #11873: Fix the networking bug introduced by cloud-init on Ubuntu 16.04 oVirt guests added