Incident #10910
The accounting fallback logs are killing a lot of services
Status: Closed (100% done)
Description
Here is a probably incomplete list (not all servers are monitored by Nagios):
workspace-repository-prod1.d4science.org
dataminer0-proto.d4science.org
dataminer1-proto.d4science.org
dataminer2-proto.d4science.org
dataminer3-proto.d4science.org
dataminer4-proto.d4science.org
dataminer5-proto.d4science.org
dataminer1-p-d4s.d4science.org
dataminer2-p-d4s.d4science.org
dataminer3-p-d4s.d4science.org
geoserver-protectedareaimpactmaps.d4science.org
geoserver1-protectedareaimpactmaps.d4science.org
geoserver2-protectedareaimpactmaps.d4science.org
thredds.d4science.org
The non-working workspace is causing a lot of dataminer jobs to fail.
Files
Related issues
Updated by Andrea Dell'Amico over 7 years ago
- Related to Incident #10895: accounting bloat on workspace-repository-prod1.d4science.org - AGAIN added
Updated by Andrea Dell'Amico over 7 years ago
- Related to Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org added
Updated by Andrea Dell'Amico over 7 years ago
- Related to Incident #10651: dataminer3-p-d4s.d4science.org filled the disk added
Updated by Andrea Dell'Amico over 7 years ago
- Blocks Incident #10909: Regular Failure of Dataminer "Garr" (2 out of 3 execution attempts) - Internal Server Error added
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from New to In Progress
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
I stopped, cleaned up, and restarted the services on all the above hosts. Things seem back to normal; a fallback accounting file has appeared on the workspace, but that one seems under control for now:
$ ls -l SmartGears/state/
total 72
-rw-r--r-- 1 gcube gcube 9375 Jan 13 15:24 _d4science.research-infrastructures.eu_SmartArea_SmartCamera.fallback.log
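Checking for oversized fallback logs before the disk fills up can be done with a quick one-liner like the sketch below. The state directory path and the 100 MB threshold are my assumptions for illustration, not values taken from the ticket:

```shell
# List fallback accounting logs above a size threshold.
# STATE_DIR and the +100M threshold are assumed values, not from the ticket.
STATE_DIR="${STATE_DIR:-$HOME/SmartGears/state}"
find "$STATE_DIR" -name '*.fallback.log' -size +100M -print 2>/dev/null || true
```

Running this from cron (or a Nagios check) on the affected hosts would give early warning instead of discovering the problem via a full disk.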
Updated by Andrea Dell'Amico over 7 years ago
- Status changed from Closed to In Progress
- % Done changed from 100 to 90
The disk on workspace-repository-prod1.d4science.org was full again. I cleaned it up and restarted the services.
Updated by Andrea Dell'Amico over 7 years ago
- File fix-accounting-crap.yml fix-accounting-crap.yml added
It happened again, and I guess it will happen again in the next few hours. I'm attaching a playbook that cleans up the involved hosts. It has to be run from inside d4science-ghn-cluster this way:

./run.sh fix-accounting-crap.yml -i inventory/hosts.production

Anyone who only has access as the gcube user can change the remote_user directive to remote_user: gcube and comment out the become and become_user occurrences.
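For reference, a playbook of this kind might look like the minimal sketch below. The hosts group, service name, and state path are assumptions on my part; the actual fix-accounting-crap.yml attached to this ticket is the authoritative version:

```yaml
# Hypothetical sketch of the cleanup playbook; the hosts group, service
# name, and paths are assumed, not taken from the attached file.
- hosts: accounting_affected_hosts
  remote_user: root        # gcube-only users: set remote_user: gcube and
  become: true             # comment out the become/become_user lines
  become_user: gcube
  tasks:
    - name: Stop the SmartGears container
      service:
        name: tomcat-instance    # assumed service name
        state: stopped

    - name: Remove the fallback accounting logs
      shell: rm -f /home/gcube/SmartGears/state/*.fallback.log

    - name: Start the SmartGears container again
      service:
        name: tomcat-instance
        state: started
```

Stopping the container before deleting the logs matters: removing a file that the service still holds open frees no disk space until the process is restarted.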
Updated by Roberto Cirillo over 7 years ago
- Status changed from In Progress to Closed
- % Done changed from 90 to 100
I've upgraded the accounting libraries on "workspace-repository-prod1" as suggested by @luca.frosini@isti.cnr.it :
accounting-lib-3.2.0-4.10.0-162088.jar
document-store-lib-2.2.0-4.10.0-162084.jar