Incident #10306
closed
mongo3-p-d4s.d4science.org went out of memory more than once
100%
Description
I see that it was OOM-killed, and restarted after that. There are Nagios alerts about the sync lag that follow the restarts.
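For reference, the OOM kill and the subsequent restart can be confirmed from the kernel ring buffer. A minimal sketch, assuming shell access to the node and a dmesg that supports the -T (human-readable timestamps) flag:

    # Scan the kernel log for OOM-killer activity (assumes `dmesg -T` is available).
    import subprocess

    dmesg = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
    for line in dmesg.splitlines():
        if "Out of memory" in line or "oom-killer" in line:
            print(line)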
Related issues
Updated by Andrea Dell'Amico almost 8 years ago
- Related to Task #10279: Large file upload limit on the workspace added
Updated by Andrea Dell'Amico almost 8 years ago
- Priority changed from Normal to Urgent
Updated by Roberto Cirillo almost 8 years ago
- Status changed from New to In Progress
Updated by Roberto Cirillo almost 8 years ago
- Status changed from In Progress to Feedback
- Assignee changed from Roberto Cirillo to _InfraScience Systems Engineer
It cannot be the cause of issue #10279, since this is a secondary member and the writes are done on the primary member. But, of course, the high workload of these days could be the cause of the high lag on that node. The high lag happened only on this node, and not on mongo4 and mongo5 (both secondary nodes) as I would have expected. This may be due to the fact that this node has slower disk storage than the other two secondaries. Could that be possible?
Anyway, to prevent this kind of problem we should increase the HW resources of the cluster nodes. Each node currently has 3 GB of RAM and 4 CPUs. I think we should increase the RAM to 4 GB on every node.
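For reference, the per-member lag can be read from the replica set status. A minimal pymongo sketch, assuming network access to any member (the connection string below reuses mongo3's hostname; credentials, if any, are omitted):

    # Print each member's replication lag relative to the primary's optime.
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongo3-p-d4s.d4science.org:27017")
    status = client.admin.command("replSetGetStatus")

    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    for member in status["members"]:
        lag = (primary["optimeDate"] - member["optimeDate"]).total_seconds()
        print(f"{member['name']}: {member['stateStr']}, lag {lag:.0f}s")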
Updated by Andrea Dell'Amico almost 8 years ago
Roberto Cirillo wrote:
It cannot be the cause of issue #10279, since this is a secondary member and the writes are done on the primary member. But, of course, the high workload of these days could be the cause of the high lag on that node. The high lag happened only on this node, and not on mongo4 and mongo5 (both secondary nodes) as I would have expected. This may be due to the fact that this node has slower disk storage than the other two secondaries. Could that be possible?
The lag alerts followed the crash and the restart, so I linked them to the restart. I did not check whether the mongo server also crashed on mongo4 and mongo5.
Anyway, to prevent this kind of problem we should increase the HW resources of the cluster nodes. Each node currently has 3 GB of RAM and 4 CPUs. I think we should increase the RAM to 4 GB on every node.
It's doable. Both Tommaso and I will be back in the office on Thursday, but we can manage to do it anyway. Not today, but maybe tomorrow morning?
Updated by Roberto Cirillo almost 8 years ago
Andrea Dell'Amico wrote:
Roberto Cirillo wrote:
It cannot be the cause of issue #10279, since this is a secondary member and the writes are done on the primary member. But, of course, the high workload of these days could be the cause of the high lag on that node. The high lag happened only on this node, and not on mongo4 and mongo5 (both secondary nodes) as I would have expected. This may be due to the fact that this node has slower disk storage than the other two secondaries. Could that be possible?
The lag alerts followed the crash and the restart, so I linked them to the restart. I did not check whether the mongo server also crashed on mongo4 and mongo5.
Anyway, to prevent this kind of problem we should increase the HW resources of the cluster nodes. Each node currently has 3 GB of RAM and 4 CPUs. I think we should increase the RAM to 4 GB on every node.
It's doable. Both Tommaso and I will be back in the office on Thursday, but we can manage to do it anyway. Not today, but maybe tomorrow morning?
It's OK for me. If only a VM restart is needed, you can do it at any time; if it requires more time, it's better to do this operation after 6:00 PM, when the workload decreases.
Updated by Andrea Dell'Amico almost 8 years ago
Roberto Cirillo wrote:
It's OK for me. If only a VM restart is needed, you can do it at any time; if it requires more time, it's better to do this operation after 6:00 PM, when the workload decreases.
Only a restart is needed.
Updated by Tommaso Piccioli almost 8 years ago
I'm going to restart mongo3 with 4 GB of RAM right now.
Updated by Tommaso Piccioli almost 8 years ago
- Status changed from Feedback to In Progress
- % Done changed from 0 to 20
Done for mongo3-p-d4s
Updated by Roberto Cirillo almost 8 years ago
- Status changed from In Progress to Closed
- % Done changed from 20 to 100
Thank you @tommaso.piccioli@isti.cnr.it. I'm going to close this ticket.
Updated by Andrea Dell'Amico almost 8 years ago
- Status changed from Closed to In Progress
- % Done changed from 100 to 20
I'm reopening it. Let's increase the RAM amount on all the other nodes so that they are identical.
Updated by Roberto Cirillo almost 8 years ago
I think the problem is with the mongo3 node. It has the same configuration and the same workload as mongo2, but the load on mongo3 is very high while the load on mongo2 is very low.
I think this depends on the physical machine where mongo3 is running. If that is the case, I think we should move mongo3 to another machine.
Updated by Tommaso Piccioli almost 8 years ago
Last-hour news: there are connections to mongod on mongo3-p-d4s.d4science.org from the social-isti portal.
portal-si.isti.cnr.it shows established connections on port 27017 to all the social-isti mongod nodes except mongoR4-si.isti.cnr.it; instead, there are connections to mongo3-p-d4s.
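For reference, the check described above amounts to listing the established TCP connections to port 27017 on portal-si (e.g. with netstat or ss). A minimal Python sketch of the same check, assuming the psutil package and enough privileges to inspect other processes' sockets:

    # List established TCP connections to port 27017 and the owning process.
    import psutil

    for conn in psutil.net_connections(kind="tcp"):
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr and conn.raddr.port == 27017:
            proc = psutil.Process(conn.pid).name() if conn.pid else "?"
            print(f"{proc}: {conn.laddr.ip} -> {conn.raddr.ip}:{conn.raddr.port}")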
Updated by Tommaso Piccioli almost 8 years ago
Tommaso Piccioli wrote:
portal-si.isti.cnr.it shows established connections on port 27017 to all the social-isti mongod nodes except mongoR4-si.isti.cnr.it; instead, there are connections to mongo3-p-d4s.
Could someone fix this and restart the service to stop these connections?
It is the Jackrabbit Tomcat app on portal-si.isti.cnr.it.
Updated by Roberto Cirillo almost 8 years ago
Tommaso Piccioli wrote:
Tommaso Piccioli wrote:
portal-si.isti.cnr.it shows established connections on port 27017 to all the social-isti mongod nodes except mongoR4-si.isti.cnr.it; instead, there are connections to mongo3-p-d4s.
Could someone fix this and restart the service to stop these connections?
It is the Jackrabbit Tomcat app on portal-si.isti.cnr.it.
I've restarted the service. @tommaso.piccioli@isti.cnr.it, could you please check whether there are other connections from portal-si to mongo3?
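For reference, the same verification can also be done from the mongod side, by listing the client addresses of the current operations. A minimal pymongo sketch, assuming a MongoDB version (3.6+) where the currentOp command is available:

    # Collect the remote client hosts currently connected to mongo3.
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongo3-p-d4s.d4science.org:27017")
    ops = client.admin.command({"currentOp": 1, "idleConnections": True})

    hosts = {op["client"].split(":")[0] for op in ops["inprog"] if op.get("client")}
    print(sorted(hosts))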
Updated by Tommaso Piccioli almost 8 years ago
- Status changed from In Progress to Closed
- % Done changed from 20 to 100
No more connections from portal-si to mongo3.
mongo2-p-d4s and mongo4-p-d4s have been restarted with 4 GB of RAM.