Task #10662

orientdb01-d4s.d4science.org keeps crashing badly

Added by Andrea Dell'Amico over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: Application
Target version:
Start date: Dec 12, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

It ends up in a state from which a Nagios event handler cannot restart it: the process has to be killed forcibly. The server runs for six days at most and then stops responding, always with the same errors:

Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
Error during fuzzy checkpoint
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Can you investigate what's needed? More heap? A different GC configuration? A newer version? I don't want to spend time killing and restarting the process every few days.
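The log excerpt above suggests a simple detection rule for the wedged state. Here is a minimal sketch, assuming the OrientDB log is readable by the monitoring host, of a check that looks for the two OutOfMemoryError messages and maps them to the standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL). `count_oom_errors` and `nagios_status` are hypothetical helper names, not part of any existing plugin, and treating any OutOfMemoryError as critical is an assumed policy.

```python
import re

# The two OutOfMemoryError causes seen in the log excerpt of this task.
OOM_PATTERN = re.compile(
    r"java\.lang\.OutOfMemoryError: (Java heap space|GC overhead limit exceeded)"
)

# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def count_oom_errors(lines):
    """Return a dict mapping each OutOfMemoryError cause to its number of occurrences."""
    counts = {}
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            cause = match.group(1)
            counts[cause] = counts.get(cause, 0) + 1
    return counts


def nagios_status(lines):
    """Hypothetical policy: any OutOfMemoryError means the JVM is wedged, so report CRITICAL."""
    return NAGIOS_CRITICAL if count_oom_errors(lines) else NAGIOS_OK
```

For the nine log lines quoted above, `count_oom_errors` finds two "Java heap space" and four "GC overhead limit exceeded" errors, so `nagios_status` returns 2. The location of the log file depends on the deployment.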

#2

Updated by Andrea Dell'Amico over 7 years ago

So they say that we need more heap and more memory. The tuning documentation seems to have some good advice; maybe you could try some of it on the dev instances?

#3

Updated by Luca Frosini over 7 years ago

Andrea Dell'Amico wrote:

> So they say that we need more heap and more memory.

Yes, please provide 4 GB if possible.

> The tuning doc seems to have some good advice, you should try something with the dev instances maybe?

????

#4

Updated by Andrea Dell'Amico over 7 years ago

  • % Done changed from 0 to 30

I just increased the RAM of the three OrientDB production servers to 6 GB each. You can comfortably add another GB of heap on each server, and maybe tune the disk buffer parameters to relieve memory pressure.

#5

Updated by Luca Frosini over 7 years ago

  • Assignee changed from Luca Frosini to _InfraScience Systems Engineer

Running the playbook overwrites the startService script.
This causes problems because instances 2 and 3 must stay down.
Can you please fix it?

#6

Updated by Andrea Dell'Amico over 7 years ago

@luca.frosini@isti.cnr.it just remove the two nodes from the inventory file; we'll add them back when we are able to use them.

#7

Updated by Andrea Dell'Amico over 7 years ago

  • Assignee changed from _InfraScience Systems Engineer to Luca Frosini

#8

Updated by Luca Frosini over 7 years ago

Thank you. I'm testing version 2.2.30 in dev and hope to add all the nodes back soon.

#9

Updated by Luca Frosini over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 30 to 100

For the moment, the solution was to add -Dstorage.diskCache.bufferSize=7200 (the OrientDB disk cache size, in megabytes).

So we have:
orientdb_java_heap: '-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200'
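To double-check what this playbook variable actually hands to the JVM, the option string can be split into heap limits and `-D` system properties. This is a minimal sketch: `parse_jvm_options` is a hypothetical helper, and it only understands `-Xms`/`-Xmx` with the standard JVM size suffixes (`k`, `m`, `g`, or none for bytes) plus `-Dkey=value` properties. All other flags are ignored.

```python
import re
import shlex

# Standard JVM size suffixes; no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_HEAP_FLAG = re.compile(r"-Xm([sx])(\d+)([kKmMgG]?)")


def parse_jvm_options(options):
    """Split a JVM option string into heap sizes (in bytes) and -D system properties."""
    heap = {}
    props = {}
    for token in shlex.split(options):
        match = _HEAP_FLAG.fullmatch(token)
        if match:
            kind, value, unit = match.groups()
            # Key is "Xms" (initial heap) or "Xmx" (maximum heap).
            heap["Xm" + kind] = int(value) * _UNITS[unit.lower()]
        elif token.startswith("-D") and "=" in token:
            key, _, value = token[2:].partition("=")
            props[key] = value
    return heap, props


heap, props = parse_jvm_options(
    "-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200"
)
print(heap["Xmx"] // 1024 ** 2, props["storage.diskCache.bufferSize"])  # 3072 7200
```

Property values are kept as strings because the JVM itself passes them through unparsed. Interpreting `storage.diskCache.bufferSize` as a number is left to OrientDB.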
