Task #10662
closed
orientdb01-d4s.d4science.org keeps crashing badly
100%
Description
It stops in a state in which a Nagios handler cannot restart it: the process must be forcibly killed. The server runs for 6 days at most and then stops responding, always with the same errors:
Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Error during WAL background flush
java.lang.OutOfMemoryError: Java heap space
Error during fuzzy checkpoint
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Can you investigate what's needed? More heap? A different GC configuration? A newer version? I don't want to spend time killing and restarting the process every few days.
Updated by Luca Frosini over 7 years ago
- Status changed from New to In Progress
Searching for the error, I found this solution:
Moreover, I found:
https://stackoverflow.com/questions/40013369/orientdb-java-heap-error/
The answer by Oleksandr Gubchenko points to this:
http://orientdb.com/docs/last/Performance-Tuning.html
Updated by Andrea Dell'Amico over 7 years ago
So they say that we need more heap and more memory. The tuning doc seems to have some good advice, you should try something with the dev instances maybe?
Updated by Luca Frosini over 7 years ago
Andrea Dell'Amico wrote:
So they say that we need more heap and more memory.
Yes, please provide 4G if possible
The tuning doc seems to have some good advice, you should try something with the dev instances maybe?
????
Updated by Andrea Dell'Amico over 7 years ago
- % Done changed from 0 to 30
I just increased the RAM of the three orientdb production servers to 6GB each. You can comfortably add another GB of heap on each server, and maybe play with the disk buffer parameters to relieve pressure from the memory.
Updated by Luca Frosini over 7 years ago
- Assignee changed from Luca Frosini to _InfraScience Systems Engineer
Launching the playbook overwrites the startService script.
This causes problems because instances 2 and 3 must stay down.
Can you please fix it?
Updated by Andrea Dell'Amico over 7 years ago
@luca.frosini@isti.cnr.it just remove the two nodes from the inventory file. When we are able to use them again, we'll add them back.
Updated by Andrea Dell'Amico over 7 years ago
- Assignee changed from _InfraScience Systems Engineer to Luca Frosini
Updated by Luca Frosini over 7 years ago
Thank you. I'm testing version 2.2.30 in dev and I hope to add all of them back soon.
Updated by Luca Frosini over 7 years ago
- Status changed from In Progress to Closed
- % Done changed from 30 to 100
For the moment the solution was adding -Dstorage.diskCache.bufferSize=7200
So we have:
orientdb_java_heap: '-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200'
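As a rough sanity check on the setting above, the following sketch parses the option string and adds the heap to the disk cache. It is only an illustration: the helper names are hypothetical, and it assumes that storage.diskCache.bufferSize is expressed in megabytes and that the servers still have the 6 GB of RAM mentioned earlier in this ticket.

```python
# Sketch: compare JVM heap plus OrientDB disk cache with the host RAM.
# Assumptions (not confirmed in this ticket): bufferSize is in megabytes,
# and each server still has 6 GB of RAM.
import re


def parse_memory_settings(java_opts):
    """Return the -Xmx value and the disk cache size, both in MB (None if absent)."""
    xmx = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    cache = re.search(r"-Dstorage\.diskCache\.bufferSize=(\d+)", java_opts)
    heap_mb = None
    if xmx:
        value, unit = int(xmx.group(1)), xmx.group(2).lower()
        heap_mb = value * 1024 if unit == "g" else value
    cache_mb = int(cache.group(1)) if cache else None
    return {"heap_mb": heap_mb, "disk_cache_mb": cache_mb}


def fits_in_ram(settings, ram_mb):
    """True if heap plus disk cache do not exceed the given amount of RAM."""
    total = (settings["heap_mb"] or 0) + (settings["disk_cache_mb"] or 0)
    return total <= ram_mb


opts = "-Xms3072m -Xmx3072m -Dstorage.diskCache.bufferSize=7200"
settings = parse_memory_settings(opts)
total_mb = settings["heap_mb"] + settings["disk_cache_mb"]
print(settings, total_mb, fits_in_ram(settings, 6 * 1024))
```

Under those assumptions the heap and the disk cache together would exceed 6 GB, so it may be worth confirming the unit of the parameter and the RAM currently assigned to the servers.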