Incident #12326

closed

GARR OpenStack dashboard and CT1 regions are unreachable

Added by Andrea Dell'Amico almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee: _InfraScience Systems Engineer
Category: System Application
Target version:
Start date: Aug 16, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Development, Pre-Production, Production

Description

Since yesterday, all the VMs hosted in the GARR CT1 region have been unreachable, and so has the OpenStack management dashboard.
I opened a ticket with the GARR support yesterday evening.

Actions #1

Updated by Andrea Dell'Amico almost 7 years ago

I just asked again.

Actions #2

Updated by Andrea Dell'Amico almost 7 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

Actions #4

Updated by Pasquale Pagano almost 7 years ago

With the operations performed so far, are the DM clusters working again (even with reduced capacities)?

Actions #5

Updated by Andrea Dell'Amico almost 7 years ago

Pasquale Pagano wrote:

With the operations performed so far, are the DM clusters working again (even with reduced capacities)?

The dataminer service endpoint should be changed from dataminer.garr.d4science.org to dataminer-cluster1.d4science.org, but I don't know how to do it.

Actions #6

Updated by Andrea Dell'Amico almost 7 years ago

  • % Done changed from 30 to 40

Actions #9

Updated by Andrea Dell'Amico almost 7 years ago

The GARR staff found that migrating a VM with connection problems restores its networking.

VMs that are now working:

ns-cache-ct1
ns-auth-ct1
vocbench-1
kibana-dnet
dataminer-2.1.3-4.7.0-2
dataminer-2.1.3-4.7.0-6
dataminer-2.1.3-4.7.0-4
dataminer-2.1.3-4.7.0-3

Actions #10

Updated by Andrea Dell'Amico almost 7 years ago

The situation worsened again. I asked the GARR staff for news again today; they did not answer at all yesterday.

Actions #11

Updated by Andrea Dell'Amico almost 7 years ago

Update: all the servers except the ones in error state are up and reachable again.

The up-to-date list of VMs in error state is:

dli-elasticsearch5-4
dli-elasticsearch5-1
rstudio-8 90.147.166.171
rstudio-9 90.147.166.173
nextcloud-test

There is still no answer from the GARR staff.
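
For reference, a status list like the one above could be rebuilt with a small reachability probe. The following is only a minimal sketch, not the procedure actually used during this incident: it assumes that a successful TCP connection to port 22 is an acceptable liveness signal, and the function names (`tcp_probe`, `check_hosts`, `unreachable`) are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, List


def tcp_probe(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is an assumption; use whatever port the VM is expected to serve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_hosts(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Probe every host and map it to True (reachable) or False."""
    return {host: probe(host) for host in hosts}


def unreachable(results: Dict[str, bool]) -> List[str]:
    """Return the sorted list of hosts whose probe failed."""
    return sorted(host for host, ok in results.items() if not ok)
```

Calling `unreachable(check_hosts(["90.147.166.171", "90.147.166.173"]))` would return whichever of the two rstudio addresses still fail the probe. The `probe` argument is injectable so that the check can be exercised without network access.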

Actions #12

Updated by Andrea Dell'Amico almost 7 years ago

  • % Done changed from 40 to 70

Actions #13

Updated by Alessia Bardi almost 7 years ago

virtuoso-parthenos is still not responding, even though it was not in the list of VMs in error state.

Actions #14

Updated by Andrea Dell'Amico almost 7 years ago

You're right. I see that it's stuck in the early boot phase. I tried to stop and start it, but without results. I'm adding it to the list I sent to the GARR ticket system.

Actions #15

Updated by Andrea Dell'Amico almost 7 years ago

  • % Done changed from 70 to 90

The situation is back to normal except for the virtuoso instance; the GARR staff is investigating the problem.

Actions #16

Updated by Andrea Dell'Amico almost 7 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 90 to 100

The virtuoso instance is back online too. I just asked Alessia to check whether it's working correctly.

Actions #17

Updated by Andrea Dell'Amico almost 7 years ago

  • Status changed from Feedback to Closed