Project

General

Profile

Actions

Support #12737

closed

Social-ISTI- mongoR4-si heartbeat fails from mongoR2-si

Added by Roberto Cirillo over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Assignee:
_InfraScience Systems Engineer
Category:
System Application
Start date:
Oct 19, 2018
Due date:
% Done:

100%

Estimated time:
Infrastructure:
Production

Description

The mongodb server mongoR4-si.isti.cnr.it is often restarted because of communication problem with the primary node: mongoR2-si.isti.cnr.it.

The error reported in the mongoR4 log is the following:

Wed Oct 17 11:48:02.981 [conn6735] end connection 146.48.85.144:36635 (3 connections now open)
Wed Oct 17 11:48:03.110 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Wed Oct 17 11:48:03.121 [signalProcessingThread] shutdown: closing all files...
Wed Oct 17 11:48:03.136 [signalProcessingThread] closeAllFiles() finished
Wed Oct 17 11:48:03.136 [signalProcessingThread] journalCleanup...
Wed Oct 17 11:48:03.136 [signalProcessingThread] removeJournalFiles
Wed Oct 17 11:48:03.236 [signalProcessingThread] shutdown: removing fs lock...
Wed Oct 17 11:48:03.236 dbexit: really exiting now


***** SERVER RESTARTED ***** 

while the error reported on mongoR2-si.isti.cnr.it is the following:

Wed Oct 17 11:48:04.917 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet info mongoR4-si.isti.cnr.it:27017 is down (or slow to respond):
Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet member mongoR4-si.isti.cnr.it:27017 is now in state DOWN
Wed Oct 17 11:48:06.069 [conn16451613] end connection 146.48.85.140:50491 (7 connections now open)
Wed Oct 17 11:48:06.922 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:08.924 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:10.925 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying 

I've notice in the recent past, other problems between the mongoR4-si.isti.cnr.it and the primary node that was mongoR3 and mongoR4.
I've verified this kind of problem only on mongoR4 server: mongoR3, mongoR4 were never restarted in the last months while mongoR4 is restarted more than one time as day.

Actions #1

Updated by Andrea Dell'Amico over 6 years ago

  • Status changed from New to In Progress

We had no new alerts since when @tommaso.piccioli@isti.cnr.it disabled ipv6 on the hypervisor that had it enabled, am I correct?

Actions #2

Updated by Roberto Cirillo over 6 years ago

It's not correct I guess.
Latest mongo restart has happened yesterday as reported in the mongo log:

Tue Nov  6 22:52:03.714 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Tue Nov  6 22:52:03.754 [signalProcessingThread] shutdown: closing all files...
Tue Nov  6 22:52:03.770 [signalProcessingThread] closeAllFiles() finished
Tue Nov  6 22:52:03.770 [signalProcessingThread] journalCleanup...
Tue Nov  6 22:52:03.770 [signalProcessingThread] removeJournalFiles
Tue Nov  6 22:52:03.856 [signalProcessingThread] shutdown: removing fs lock...
Tue Nov  6 22:52:03.856 dbexit: really exiting now


***** SERVER RESTARTED *****

So I think the problem is still there.

I've also notice another interesting thing:the previous restarts has happened more or less at the same time:

Mon Nov  5 22:52:03.307 [conn13216] end connection 146.48.85.116:35020 (3 connections now open)
Mon Nov  5 22:52:03.435 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Mon Nov  5 22:52:03.441 [signalProcessingThread] shutdown: closing all files...
Mon Nov  5 22:52:03.456 [signalProcessingThread] closeAllFiles() finished
Mon Nov  5 22:52:03.456 [signalProcessingThread] journalCleanup...
Mon Nov  5 22:52:03.456 [signalProcessingThread] removeJournalFiles
Mon Nov  5 22:52:03.560 [signalProcessingThread] shutdown: removing fs lock...
Mon Nov  5 22:52:03.561 dbexit: really exiting now


***** SERVER RESTARTED *****

Sun Nov  4 22:52:10.454 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Sun Nov  4 22:52:10.454 [conn11245] end connection 146.48.85.145:37590 (0 connections now open)
Sun Nov  4 22:52:10.506 [signalProcessingThread] shutdown: closing all files...
Sun Nov  4 22:52:10.506 [signalProcessingThread] closeAllFiles() finished
Sun Nov  4 22:52:10.506 [signalProcessingThread] journalCleanup...
Sun Nov  4 22:52:10.506 [signalProcessingThread] removeJournalFiles
Sun Nov  4 22:52:10.608 [signalProcessingThread] shutdown: removing fs lock...
Sun Nov  4 22:52:10.608 dbexit: really exiting now


***** SERVER RESTARTED *****


Sat Nov  3 22:51:02.492 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Sat Nov  3 22:51:02.497 [signalProcessingThread] shutdown: closing all files...
Sat Nov  3 22:51:02.498 [signalProcessingThread] closeAllFiles() finished
Sat Nov  3 22:51:02.498 [signalProcessingThread] journalCleanup...
Sat Nov  3 22:51:02.498 [signalProcessingThread] removeJournalFiles
Sat Nov  3 22:51:02.611 [signalProcessingThread] shutdown: removing fs lock...
Sat Nov  3 22:51:02.611 dbexit: really exiting now


***** SERVER RESTARTED *****



Fri Nov  2 22:52:03.272 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Fri Nov  2 22:52:03.283 [signalProcessingThread] shutdown: closing all files...
Fri Nov  2 22:52:03.298 [signalProcessingThread] closeAllFiles() finished
Fri Nov  2 22:52:03.298 [signalProcessingThread] journalCleanup...
Fri Nov  2 22:52:03.298 [signalProcessingThread] removeJournalFiles
Fri Nov  2 22:52:03.394 [signalProcessingThread] shutdown: removing fs lock...
Fri Nov  2 22:52:03.394 dbexit: really exiting now


***** SERVER RESTARTED *****


Thu Nov  1 22:52:03.326 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Thu Nov  1 22:52:03.336 [signalProcessingThread] shutdown: closing all files...
Thu Nov  1 22:52:03.351 [signalProcessingThread] closeAllFiles() finished
Thu Nov  1 22:52:03.351 [signalProcessingThread] journalCleanup...
Thu Nov  1 22:52:03.351 [signalProcessingThread] removeJournalFiles
Thu Nov  1 22:52:03.446 [signalProcessingThread] shutdown: removing fs lock...
Thu Nov  1 22:52:03.446 dbexit: really exiting now


***** SERVER RESTARTED *****

What's going to happen on this machine at 22:52 ?

Actions #3

Updated by Andrea Dell'Amico over 6 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 100

As it has been verified that the restarts are caused by the backup system and it's the way the procedure is supposed to work, I guess that this ticket can be closed?

Actions #4

Updated by Roberto Cirillo over 6 years ago

  • Status changed from Feedback to Closed
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 8.91 MB)