Support #12737
closed
Social-ISTI- mongoR4-si heartbeat fails from mongoR2-si
Status: Closed
Priority: High
Assignee: _InfraScience Systems Engineer
Category: System Application
Start date: Oct 19, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production
Description
The MongoDB server mongoR4-si.isti.cnr.it is restarted frequently because of a communication problem with the primary node, mongoR2-si.isti.cnr.it.
The error reported in the mongoR4 log is the following:
Wed Oct 17 11:48:02.981 [conn6735] end connection 146.48.85.144:36635 (3 connections now open)
Wed Oct 17 11:48:03.110 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongoR2-si.isti.cnr.it:27017
Wed Oct 17 11:48:03.121 [signalProcessingThread] shutdown: closing all files...
Wed Oct 17 11:48:03.136 [signalProcessingThread] closeAllFiles() finished
Wed Oct 17 11:48:03.136 [signalProcessingThread] journalCleanup...
Wed Oct 17 11:48:03.136 [signalProcessingThread] removeJournalFiles
Wed Oct 17 11:48:03.236 [signalProcessingThread] shutdown: removing fs lock...
Wed Oct 17 11:48:03.236 dbexit: really exiting now
***** SERVER RESTARTED *****
while the error reported on mongoR2-si.isti.cnr.it is the following:
Wed Oct 17 11:48:04.917 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet info mongoR4-si.isti.cnr.it:27017 is down (or slow to respond):
Wed Oct 17 11:48:04.919 [rsHealthPoll] replSet member mongoR4-si.isti.cnr.it:27017 is now in state DOWN
Wed Oct 17 11:48:06.069 [conn16451613] end connection 146.48.85.140:50491 (7 connections now open)
Wed Oct 17 11:48:06.922 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:08.924 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
Wed Oct 17 11:48:10.925 [rsHealthPoll] replset info mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying
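To line up the two logs, the timestamps can be parsed and subtracted. The following is only a minimal Python sketch, not something that was run on the servers: the helper name parse_log_line is hypothetical, the year 2018 is taken from the ticket start date because the log lines carry no year, and the comparison assumes the clocks of the two hosts are synchronized.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in the logs above,
# e.g. "Wed Oct 17 11:48:03.236", followed by the message.
LOG_LINE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) (.*)$"
)


def parse_log_line(line, year=2018):
    """Return (timestamp, message), or None if the line has no timestamp.

    The year is an assumption: the log format shown above omits it.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{year} {match.group(1)}", "%Y %a %b %d %H:%M:%S.%f"
    )
    return stamp, match.group(2)


# Last timestamped line on mongoR4 and first heartbeat failure on mongoR2,
# both copied from the logs above.
r4_exit = "Wed Oct 17 11:48:03.236 dbexit: really exiting now"
r2_fail = (
    "Wed Oct 17 11:48:04.917 [rsHealthPoll] replset info "
    "mongoR4-si.isti.cnr.it:27017 heartbeat failed, retrying"
)

exit_time, _ = parse_log_line(r4_exit)
fail_time, _ = parse_log_line(r2_fail)
gap_seconds = (fail_time - exit_time).total_seconds()
print(f"first failed heartbeat {gap_seconds:.3f}s after mongoR4 exit")
```

With these two lines, the first failed heartbeat on mongoR2 is logged about 1.7 seconds after mongoR4 logs its exit, provided the clocks agree.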
I've also noticed, in the recent past, other problems between mongoR4-si.isti.cnr.it and the primary node, which was mongoR3 and mongoR4.
I've seen this kind of problem only on the mongoR4 server: mongoR2 and mongoR3 were never restarted in the last months, while mongoR4 is restarted more than once a day.
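The restart frequency can be checked by counting the "***** SERVER RESTARTED *****" markers per day in the mongoR4 log. The following is a minimal Python sketch under some assumptions: the function name restarts_per_day is hypothetical, the marker is assumed to sit on its own line without a timestamp (so each restart is attributed to the day of the last timestamped line before it), and the sample input is only the excerpt quoted above, not the full log.

```python
import re
from collections import Counter

RESTART_MARKER = "***** SERVER RESTARTED *****"
# Captures the "Oct 17" part of a line such as "Wed Oct 17 11:48:03.236 ...".
DAY_PREFIX = re.compile(r"^\w{3} (\w{3} +\d{1,2}) \d{2}:\d{2}:\d{2}")


def restarts_per_day(lines):
    """Count restart markers, keyed by the day of the preceding log line."""
    counts = Counter()
    last_day = None
    for line in lines:
        match = DAY_PREFIX.match(line)
        if match:
            last_day = match.group(1)
        elif RESTART_MARKER in line and last_day is not None:
            counts[last_day] += 1
    return counts


# Excerpt from the mongoR4 log quoted in this ticket.
sample = [
    "Wed Oct 17 11:48:03.236 [signalProcessingThread] shutdown: removing fs lock...",
    "Wed Oct 17 11:48:03.236 dbexit: really exiting now",
    "***** SERVER RESTARTED *****",
]

for day, count in restarts_per_day(sample).items():
    print(day, count)
```

On a real log file the same function could be fed the open file object directly, since it only iterates over lines.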