Support #12737
closed
Social-ISTI- mongoR4-si heartbeat fails from mongoR2-si
100%
Description
The mongodb server mongoR4-si.isti.cnr.it is often restarted because of communication problem with the primary node: mongoR2-si.isti.cnr.it.
The error reported in the mongoR4 log is the following:
Wed Oct 17 11:48:02.981 [conn6735] end connection 146.48.85.144:36635 (3 connections now open) Wed Oct 17 11:48:03.110 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Wed Oct 17 11:48:03.121 [signalProcessingThread] shutdown: closing all files... Wed Oct 17 11:48:03.136 [signalProcessingThread] closeAllFiles() finished Wed Oct 17 11:48:03.136 [signalProcessingThread] journalCleanup... Wed Oct 17 11:48:03.136 [signalProcessingThread] removeJournalFiles Wed Oct 17 11:48:03.236 [signalProcessingThread] shutdown: removing fs lock... Wed Oct 17 11:48:03.236 dbexit: really exiting now ***** SERVER RESTARTED *****
while the error reported on mongoR2-si.isti.cnr.it is the following:
Wed Oct 17 11:48:04.917 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet info mongoR4-si.isti.cnr.it:27017 is down (or slow to respond): Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet member mongoR4-si.isti.cnr.it:27017 is now in state DOWN Wed Oct 17 11:48:06.069 [conn16451613] end connection 146.48.85.140:50491 (7 connections now open) Wed Oct 17 11:48:06.922 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying Wed Oct 17 11:48:08.924 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying Wed Oct 17 11:48:10.925 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
I've notice in the recent past, other problems between the mongoR4-si.isti.cnr.it and the primary node that was mongoR3 and mongoR4.
I've verified this kind of problem only on mongoR4 server: mongoR3, mongoR4 were never restarted in the last months while mongoR4 is restarted more than one time as day.
Updated by Andrea Dell'Amico over 6 years ago
- Status changed from New to In Progress
We had no new alerts since when @tommaso.piccioli@isti.cnr.it disabled ipv6 on the hypervisor that had it enabled, am I correct?
Updated by Roberto Cirillo over 6 years ago
It's not correct I guess.
Latest mongo restart has happened yesterday as reported in the mongo log:
Tue Nov 6 22:52:03.714 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Tue Nov 6 22:52:03.754 [signalProcessingThread] shutdown: closing all files... Tue Nov 6 22:52:03.770 [signalProcessingThread] closeAllFiles() finished Tue Nov 6 22:52:03.770 [signalProcessingThread] journalCleanup... Tue Nov 6 22:52:03.770 [signalProcessingThread] removeJournalFiles Tue Nov 6 22:52:03.856 [signalProcessingThread] shutdown: removing fs lock... Tue Nov 6 22:52:03.856 dbexit: really exiting now ***** SERVER RESTARTED *****
So I think the problem is still there.
I've also notice another interesting thing:the previous restarts has happened more or less at the same time:
Mon Nov 5 22:52:03.307 [conn13216] end connection 146.48.85.116:35020 (3 connections now open) Mon Nov 5 22:52:03.435 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Mon Nov 5 22:52:03.441 [signalProcessingThread] shutdown: closing all files... Mon Nov 5 22:52:03.456 [signalProcessingThread] closeAllFiles() finished Mon Nov 5 22:52:03.456 [signalProcessingThread] journalCleanup... Mon Nov 5 22:52:03.456 [signalProcessingThread] removeJournalFiles Mon Nov 5 22:52:03.560 [signalProcessingThread] shutdown: removing fs lock... Mon Nov 5 22:52:03.561 dbexit: really exiting now ***** SERVER RESTARTED *****
Sun Nov 4 22:52:10.454 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Sun Nov 4 22:52:10.454 [conn11245] end connection 146.48.85.145:37590 (0 connections now open) Sun Nov 4 22:52:10.506 [signalProcessingThread] shutdown: closing all files... Sun Nov 4 22:52:10.506 [signalProcessingThread] closeAllFiles() finished Sun Nov 4 22:52:10.506 [signalProcessingThread] journalCleanup... Sun Nov 4 22:52:10.506 [signalProcessingThread] removeJournalFiles Sun Nov 4 22:52:10.608 [signalProcessingThread] shutdown: removing fs lock... Sun Nov 4 22:52:10.608 dbexit: really exiting now ***** SERVER RESTARTED *****
Sat Nov 3 22:51:02.492 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Sat Nov 3 22:51:02.497 [signalProcessingThread] shutdown: closing all files... Sat Nov 3 22:51:02.498 [signalProcessingThread] closeAllFiles() finished Sat Nov 3 22:51:02.498 [signalProcessingThread] journalCleanup... Sat Nov 3 22:51:02.498 [signalProcessingThread] removeJournalFiles Sat Nov 3 22:51:02.611 [signalProcessingThread] shutdown: removing fs lock... Sat Nov 3 22:51:02.611 dbexit: really exiting now ***** SERVER RESTARTED *****
Fri Nov 2 22:52:03.272 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Fri Nov 2 22:52:03.283 [signalProcessingThread] shutdown: closing all files... Fri Nov 2 22:52:03.298 [signalProcessingThread] closeAllFiles() finished Fri Nov 2 22:52:03.298 [signalProcessingThread] journalCleanup... Fri Nov 2 22:52:03.298 [signalProcessingThread] removeJournalFiles Fri Nov 2 22:52:03.394 [signalProcessingThread] shutdown: removing fs lock... Fri Nov 2 22:52:03.394 dbexit: really exiting now ***** SERVER RESTARTED *****
Thu Nov 1 22:52:03.326 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017 Thu Nov 1 22:52:03.336 [signalProcessingThread] shutdown: closing all files... Thu Nov 1 22:52:03.351 [signalProcessingThread] closeAllFiles() finished Thu Nov 1 22:52:03.351 [signalProcessingThread] journalCleanup... Thu Nov 1 22:52:03.351 [signalProcessingThread] removeJournalFiles Thu Nov 1 22:52:03.446 [signalProcessingThread] shutdown: removing fs lock... Thu Nov 1 22:52:03.446 dbexit: really exiting now ***** SERVER RESTARTED *****
What's going to happen on this machine at 22:52 ?
Updated by Andrea Dell'Amico over 6 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
As it has been verified that the restarts are caused by the backup system and it's the way the procedure is supposed to work, I guess that this ticket can be closed?
Updated by Roberto Cirillo over 6 years ago
- Status changed from Feedback to Closed