Incident #10910

closed

The accounting fallback logs are killing a lot of services

Added by Andrea Dell'Amico over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Immediate
Category: System Application
Target version:
Start date: Jan 13, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

Here is a probably incomplete list of the affected hosts (not all servers are monitored by Nagios):

workspace-repository-prod1.d4science.org
dataminer0-proto.d4science.org
dataminer1-proto.d4science.org
dataminer2-proto.d4science.org
dataminer3-proto.d4science.org
dataminer4-proto.d4science.org
dataminer5-proto.d4science.org
dataminer1-p-d4s.d4science.org
dataminer2-p-d4s.d4science.org
dataminer3-p-d4s.d4science.org
geoserver-protectedareaimpactmaps.d4science.org
geoserver1-protectedareaimpactmaps.d4science.org
geoserver2-protectedareaimpactmaps.d4science.org
thredds.d4science.org

The non-working workspace is causing many DataMiner jobs to fail.


Files

fix-accounting-crap.yml (1018 Bytes), Andrea Dell'Amico, Jan 14, 2018 02:19 PM

Related issues

Related to D4Science Infrastructure - Incident #10895: accounting bloat on workspace-repository-prod1.d4science.org - AGAIN (Closed, Roberto Cirillo, Jan 12, 2018)
Related to D4Science Infrastructure - Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org (Closed, Andrea Dell'Amico, Dec 14, 2017)
Related to D4Science Infrastructure - Incident #10651: dataminer3-p-d4s.d4science.org filled the disk (Closed, _InfraScience Systems Engineer, Dec 11, 2017)
Blocks D4Science Infrastructure - Incident #10909: Regular Failure of Dataminer "Garr" (2 out of 3 execution attempts) - Internal Server Error (Closed, _InfraScience Systems Engineer, Jan 13, 2018)

#1

Updated by Andrea Dell'Amico over 7 years ago

  • Related to Incident #10895: accounting bloat on workspace-repository-prod1.d4science.org - AGAIN added

#2

Updated by Andrea Dell'Amico over 7 years ago

  • Related to Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org added

#3

Updated by Andrea Dell'Amico over 7 years ago

  • Related to Incident #10651: dataminer3-p-d4s.d4science.org filled the disk added

#4

Updated by Andrea Dell'Amico over 7 years ago

  • Blocks Incident #10909: Regular Failure of Dataminer "Garr" (2 out of 3 execution attempts) - Internal Server Error added

#5

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from New to In Progress

#6

Updated by Andrea Dell'Amico over 7 years ago

  • Description updated (diff)

#7

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

I stopped, cleaned up, and restarted the services on all the hosts listed above. Things seem back to normal. A fallback accounting file has appeared on the workspace, but it seems under control for now:

$ ls -l SmartGears/state/
total 72
-rw-r--r-- 1 gcube gcube 9375 Jan 13 15:24 _d4science.research-infrastructures.eu_SmartArea_SmartCamera.fallback.log
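The file above is small, but this incident shows such fallback logs can grow until the disk fills. A minimal sketch for spotting oversized ones, assuming the logs sit in SmartGears/state and end in .fallback.log as in the listing; the helper name large_fallback_logs and any threshold you pass are illustrative, not part of SmartGears:

```shell
# Hypothetical helper (not part of SmartGears): list accounting fallback logs
# larger than a byte threshold. Assumes the naming seen in SmartGears/state.
large_fallback_logs() {
    state_dir=$1   # e.g. "$HOME/SmartGears/state" (assumed location)
    max_bytes=$2   # size threshold in bytes (assumed value, tune per host)
    # POSIX find: -size +Nc matches files strictly larger than N bytes.
    find "$state_dir" -type f -name '*.fallback.log' -size +"${max_bytes}"c
}

# Usage sketch: report fallback logs above 100 MiB.
# large_fallback_logs "$HOME/SmartGears/state" 104857600
```

Run from cron or a Nagios check, a non-empty result would flag a host before its disk fills.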

#8

Updated by Andrea Dell'Amico over 7 years ago

  • Status changed from Closed to In Progress
  • % Done changed from 100 to 90

The disk on workspace-repository-prod1.d4science.org was full again. I cleaned it up and restarted the services.
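A minimal sketch of a check that could warn before the disk fills again. The helper names and the 90% threshold are hypothetical; the only thing relied on is the POSIX `df -P` output format, where the fifth column of the second line is the capacity (for example `87%`):

```shell
# Hypothetical helper: print the usage percentage (without "%") of the
# filesystem that holds the given path, using POSIX df -P output.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit 0) when usage is above the limit.
disk_above() {
    path=$1
    limit=$2   # percentage, e.g. 90 (assumed threshold)
    pct=$(disk_usage_pct "$path")
    [ "$pct" -gt "$limit" ]
}

# Usage sketch:
# if disk_above "$HOME/SmartGears/state" 90; then echo "disk almost full"; fi
```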

#9

Updated by Andrea Dell'Amico over 7 years ago

It happened again, and I expect it will keep happening over the next few hours. I'm attaching a playbook that cleans up the affected hosts. It has to be run from inside d4science-ghn-cluster, like this:

./run.sh fix-accounting-crap.yml -i inventory/hosts.production

Anyone who only has access as the gcube user can change the remote_user directive to remote_user: gcube and comment out the become and become_user lines.
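The same edit can be scripted with sed. This is a sketch assuming each directive sits on its own line as `key: value` (not as the first key of a `- ` list item); the function gcube_variant and the output file name are hypothetical:

```shell
# Hypothetical helper: write a copy of a playbook that connects as gcube
# and does not escalate privileges. Indentation is preserved, so the
# commented-out lines remain valid YAML comments.
gcube_variant() {
    src=$1
    dst=$2
    sed -e 's/^\([[:space:]]*\)remote_user:.*/\1remote_user: gcube/' \
        -e 's/^\([[:space:]]*\)become:/\1# become:/' \
        -e 's/^\([[:space:]]*\)become_user:/\1# become_user:/' \
        "$src" > "$dst"
}

# Usage sketch (output file name is an example):
# gcube_variant fix-accounting-crap.yml fix-accounting-crap-gcube.yml
# ./run.sh fix-accounting-crap-gcube.yml -i inventory/hosts.production
```

Editing the file rather than passing `-u gcube` on the command line matters here: a `remote_user` set in the play takes precedence over the command-line option.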

#10

Updated by Roberto Cirillo over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 90 to 100

I've upgraded the accounting libraries on "workspace-repository-prod1" as suggested by @luca.frosini@isti.cnr.it:

accounting-lib-3.2.0-4.10.0-162088.jar
document-store-lib-2.2.0-4.10.0-162084.jar
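To confirm which versions a container actually picks up, and that no older copy is left next to the new jars, they can be listed by name. A sketch only: the helper accounting_jars is hypothetical and the library directory is an assumption, since the issue does not say where the jars were installed:

```shell
# Hypothetical helper: list accounting-related jars under a directory,
# sorted by name, so duplicate or stale versions stand out.
accounting_jars() {
    find "$1" -type f \( -name 'accounting-lib-*.jar' -o -name 'document-store-lib-*.jar' \) \
        -exec basename {} \; | sort
}

# Usage sketch (path is hypothetical):
# accounting_jars /path/to/container/lib
```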