Task #10662
closed
orientdb01-d4s.d4science.org keeps crashing badly
Added by Andrea Dell'Amico over 7 years ago.
Updated over 7 years ago.
Infrastructure:
Production
Description
It stops in a state where a Nagios handler cannot restart it: the process must be forcibly killed. The server runs for 6 days at most and then stops responding, always with the same errors:
Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
Error during fuzzy checkpoint
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Can you investigate what's needed? More heap? A different GC configuration? A newer version? I don't want to spend time killing and restarting the process every few days.
- Status changed from New to In Progress
So they say that we need more heap and more memory. The tuning doc seems to have some good advice, you should try something with the dev instances maybe?
Andrea Dell'Amico wrote:
So they say that we need more heap and more memory.
Yes, please provide 4G if possible
The tuning doc seems to have some good advice, you should try something with the dev instances maybe?
????
- % Done changed from 0 to 30
I just increased the RAM of the three orientdb production servers to 6GB each. You can comfortably add another GB of heap on each server, and maybe play with the disk buffer parameters to relieve memory pressure.
- Assignee changed from Luca Frosini to _InfraScience Systems Engineer
When the playbook is launched, the startService script is overwritten.
This causes problems because instances 2 and 3 must stay down.
Can you please fix it?
@luca.frosini@isti.cnr.it just remove the two nodes from the inventory file. When we are able to use them, we'll add them again.
- Assignee changed from _InfraScience Systems Engineer to Luca Frosini
Thank you. I'm testing version 2.2.30 in dev and I hope to add all of them again soon.
- Status changed from In Progress to Closed
- % Done changed from 30 to 100
For the moment, the solution was to add -Dstorage.diskCache.bufferSize=7200
So we have:
orientdb_java_heap: '-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200'
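
To make the memory footprint of that setting easier to read, here is a minimal Python sketch that pulls the heap and disk cache sizes out of the options string and adds them up. It assumes that storage.diskCache.bufferSize is expressed in megabytes; the function name parse_memory_options and the dictionary keys are hypothetical, introduced only for this illustration, and are not part of OrientDB or the playbook.

```python
import re


def parse_memory_options(java_opts: str) -> dict:
    """Extract heap and disk cache sizes, in MB, from a JVM options string.

    Hypothetical helper for illustration only. It assumes that
    storage.diskCache.bufferSize is given in megabytes.
    """
    sizes = {}
    # -Xms / -Xmx take a k, m or g suffix; normalise everything to megabytes.
    factors = {"k": 1 / 1024, "m": 1, "g": 1024}
    for flag, key in (("-Xms", "heap_min_mb"), ("-Xmx", "heap_max_mb")):
        match = re.search(re.escape(flag) + r"(\d+)([kmgKMG])", java_opts)
        if match:
            value, unit = int(match.group(1)), match.group(2).lower()
            sizes[key] = int(value * factors[unit])
    match = re.search(r"-Dstorage\.diskCache\.bufferSize=(\d+)", java_opts)
    if match:
        sizes["disk_cache_mb"] = int(match.group(1))
    return sizes


# The value of orientdb_java_heap quoted in this ticket.
opts = "-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200"
sizes = parse_memory_options(opts)

# Maximum heap plus disk cache, under the megabyte assumption stated above.
total_mb = sizes["heap_max_mb"] + sizes["disk_cache_mb"]
print(sizes)
print(f"heap + disk cache: {total_mb} MB")
```

Comparing that total with the RAM of the host (6GB after the resize mentioned earlier in this ticket) is one way to sanity-check the values before rolling them out to the other nodes.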