Incident #12326: GARR OpenStack dashboard and CT1 regions are unreachable - D4Science Infrastructure - D4science

Incident #12326

closed

GARR OpenStack dashboard and CT1 regions are unreachable

Added by Andrea Dell'Amico almost 8 years ago. Updated almost 8 years ago.

Status:

Closed

Priority:

High

Assignee:

_InfraScience Systems Engineer

Category:

System Application

Target version:

No Sprint

Start date:

Aug 16, 2018

Due date:

% Done:

100%

Estimated time:

Infrastructure:

Development, Pre-Production, Production

Description

Since yesterday, none of the VMs hosted in the GARR CT1 region are reachable, and neither is the OpenStack management dashboard.
I opened a ticket with GARR support yesterday evening.

Updated by Andrea Dell'Amico almost 8 years ago

I just asked again.

Updated by Andrea Dell'Amico almost 8 years ago

Status changed from New to In Progress
% Done changed from 0 to 30

Updated by Pasquale Pagano almost 8 years ago

With the operations performed so far, are the DM clusters working again (even with reduced capacities)?

Actions

Copy link

Updated by Andrea Dell'Amico almost 8 years ago

Pasquale Pagano wrote:

With the operations performed so far, are the DM clusters working again (even with reduced capacities)?

The dataminer service endpoint should be changed from dataminer.garr.d4science.org to dataminer-cluster1.d4science.org, but I don't know how to do it.

Updated by Andrea Dell'Amico almost 8 years ago

% Done changed from 30 to 40

Updated by Andrea Dell'Amico almost 8 years ago

The GARR staff found that migrating a VM that has connection problems restores its networking.

VMs that are now working:

ns-cache-ct1
ns-auth-ct1
vocbench-1
kibana-dnet
dataminer-2.1.3-4.7.0-2
dataminer-2.1.3-4.7.0-6
dataminer-2.1.3-4.7.0-4
dataminer-2.1.3-4.7.0-3

Updated by Andrea Dell'Amico almost 8 years ago

The situation has worsened again. I asked the GARR staff for news again today; they did not answer at all yesterday.

Updated by Andrea Dell'Amico almost 8 years ago

Update: all the servers except those in error state are up and reachable again.

The up-to-date list of VMs in error state is:

dli-elasticsearch5-4
dli-elasticsearch5-1
rstudio-8 90.147.166.171
rstudio-9 90.147.166.173
nextcloud-test

But still no answer from the GARR staff.

Updated by Andrea Dell'Amico almost 8 years ago

% Done changed from 40 to 70

Updated by Alessia Bardi almost 8 years ago

virtuoso-parthenos is still not responding, even though it was not in the list of VMs in error state.

Updated by Andrea Dell'Amico almost 8 years ago

You're right. I can see that it's stuck in the early boot phase. I tried stopping and starting it, without success. I'm adding it to the list I sent to the GARR ticket system.
