Incident #10651

dataminer3-p-d4s.d4science.org filled the disk

Added by Andrea Dell'Amico over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Urgent
Assignee: _InfraScience Systems Engineer
Category: System Application
Target version:
Start date: Dec 11, 2017
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

Under Smartgears/state there are 17 GB worth of files:

-rw-r--r-- 1 gcube gcube     0 Dec 11 19:34 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log
-rw-r--r-- 1 gcube gcube  2.5G Dec 11 15:37 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003048956
-rw-r--r-- 1 gcube gcube  4.7G Dec 11 15:44 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003442540
-rw-r--r-- 1 gcube gcube  587M Dec 11 15:44 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003648955
-rw-r--r-- 1 gcube gcube  1.1G Dec 11 15:55 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004248955
-rw-r--r-- 1 gcube gcube 1005M Dec 11 16:06 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004848955
-rw-r--r-- 1 gcube gcube   91M Dec 11 18:15 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012543784
-rw-r--r-- 1 gcube gcube   71M Dec 11 18:15 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012550397
-rw-r--r-- 1 gcube gcube  132M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012560579
-rw-r--r-- 1 gcube gcube  129M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012571159
-rw-r--r-- 1 gcube gcube   64M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012576254
-rw-r--r-- 1 gcube gcube  606M Dec 11 18:17 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012624666
-rw-r--r-- 1 gcube gcube  213M Dec 11 18:17 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012648955
-rw-r--r-- 1 gcube gcube   50K Dec 11 18:26 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013248955
-rw-r--r-- 1 gcube gcube   31K Dec 11 18:36 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013848955
-rw-r--r-- 1 gcube gcube  1.4M Dec 11 18:56 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513015049010

(The space they actually occupy on disk is larger than the sizes listed above:)

 du -hs *
4.0G    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003048956
8.0G    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003442540
1.0G    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003648955
1.3G    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004248955
1.0G    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004848955
128M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012543784
128M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012550397
256M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012560579
256M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012571159
64M     _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012576254
609M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012624666
215M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012648955
64K     _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013248955
64K     _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013848955
1.4M    _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513015049010

I removed 10 GB of log files in the evening, but the disk filled up again within a couple of hours.
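
The `ls -l` and `du` listings above report two different quantities: the apparent file size and the space actually allocated on disk. A minimal Python sketch for comparing the two per file is shown here. The function names `size_report` and `report_directory` are illustrative and not part of any gCube or SmartGears tooling, and the sketch assumes a POSIX system where `st_blocks` is available. When the allocated figure is larger than the apparent one, the usual causes are block rounding or filesystem preallocation, but that is an assumption that would need to be checked on the host.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # On POSIX systems st_blocks is expressed in 512-byte units,
    # independently of the filesystem block size.
    return st.st_size, st.st_blocks * 512


def report_directory(directory):
    """Return a sorted list of (name, apparent_bytes, allocated_bytes)."""
    rows = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file(follow_symlinks=False):
            apparent, allocated = size_report(entry.path)
            rows.append((entry.name, apparent, allocated))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: run from inside the state directory.
    for name, apparent, allocated in report_directory("."):
        print(f"{apparent:>14} {allocated:>14} {name}")
```

Sorting the result by the allocated column gives a quick view of which fallback files are responsible for most of the disk usage.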


Related issues

Related to D4Science Infrastructure - Incident #10910: The accounting fallback logs are killing a lot of services (Closed, Andrea Dell'Amico, Jan 13, 2018)

Is duplicate of D4Science Infrastructure - Task #10654: Fallback files not managed correctly on dataminer3 (Rejected, Luca Frosini, Dec 12, 2017)

Is duplicate of D4Science Infrastructure - Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org (Closed, Andrea Dell'Amico, Dec 14, 2017)

Actions #1

Updated by Roberto Cirillo over 7 years ago

  • Related to Task #10654: Fallback files not managed correctly on dataminer3 added
Actions #2

Updated by Roberto Cirillo over 7 years ago

  • Assignee changed from Roberto Cirillo to Luca Frosini

I'm going to assign this incident to @luca.frosini@isti.cnr.it. I had already opened the same issue as #10654.
I noticed that the problem affected only the PARTHENOS_LAB scope. The cluster has now been removed from that scope, so I think the situation is going to be under control.
We should understand why there are so many files in the ELABORATION state, and why those files are present only on dataminer3 and not on dataminer4 and dataminer5.

Actions #3

Updated by Luca Frosini over 7 years ago

  • Related to deleted (Task #10654: Fallback files not managed correctly on dataminer3)
Actions #4

Updated by Luca Frosini over 7 years ago

  • Is duplicate of Task #10654: Fallback files not managed correctly on dataminer3 added
Actions #5

Updated by Luca Frosini over 7 years ago

  • Status changed from New to In Progress

I ran some random checks on the records in those files. The records I checked had been accounted successfully in the Couchbase DB. IMHO you can delete the files.

Actions #6

Updated by Roberto Cirillo over 7 years ago

I think it's better to understand why so many elaboration files were stored in a single day.
Yesterday there were "only" 152 calls on dataminer3.
I checked an id extracted from a random fallback elaboration file. This id is correctly stored in Couchbase, but it is still present in the elaboration files. In addition, this id has 1716944 entries in the elaboration files stored yesterday on dataminer3.
So I think this id (and maybe many others) is stuck in a loop. @luca.frosini@isti.cnr.it, is that possible?
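
The kind of check described above, counting how often each record id occurs across the elaboration files, could be sketched roughly as follows. This is only an illustration: the on-disk format of the fallback files is not shown in this ticket, so the sketch assumes one JSON object per line with an `id` field, and the names `count_record_ids` and `suspicious_ids` are hypothetical.

```python
import json
from collections import Counter


def count_record_ids(paths, id_field="id"):
    """Count occurrences of each record id across the given files.

    Assumption: each line holds one JSON object carrying `id_field`.
    Lines that do not parse, or that lack the field, are skipped.
    """
    counts = Counter()
    for path in paths:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if isinstance(record, dict) and id_field in record:
                    counts[str(record[id_field])] += 1
    return counts


def suspicious_ids(counts, threshold=1000):
    """Return (id, count) pairs seen at least `threshold` times, most frequent first."""
    return [(rid, n) for rid, n in counts.most_common() if n >= threshold]
```

With only 152 calls in a day, any id that appears thousands of times would be a strong hint that records are being rewritten to the fallback files repeatedly rather than discarded after being accounted.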

Actions #7

Updated by Luca Frosini over 7 years ago

  • Status changed from In Progress to Rejected
Actions #8

Updated by Luca Frosini over 7 years ago

  • Status changed from Rejected to In Progress
Actions #9

Updated by Luca Frosini over 7 years ago

I have copied all the fallback files and accounting.log files locally to investigate the issue.
You can remove the files from the server if you want.

Actions #10

Updated by Luca Frosini over 7 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from Luca Frosini to _InfraScience Systems Engineer
Actions #11

Updated by Roberto Cirillo over 7 years ago

I've removed all fallback files and accounting logs from dataminer3-p-d4s.

Actions #12

Updated by Pasquale Pagano over 7 years ago

  • Status changed from Feedback to Closed
Actions #14

Updated by Luca Frosini over 7 years ago

  • Is duplicate of Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org added
Actions #15

Updated by Andrea Dell'Amico over 7 years ago

  • Related to Incident #10910: The accounting fallback logs are killing a lot of services added