Incident #10651
closed
dataminer3-p-d4s.d4science.org filled the disk
100%
Description
Under Smartgears/state there are 17GB worth of files:
-rw-r--r-- 1 gcube gcube 0 Dec 11 19:34 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log
-rw-r--r-- 1 gcube gcube 2.5G Dec 11 15:37 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003048956
-rw-r--r-- 1 gcube gcube 4.7G Dec 11 15:44 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003442540
-rw-r--r-- 1 gcube gcube 587M Dec 11 15:44 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003648955
-rw-r--r-- 1 gcube gcube 1.1G Dec 11 15:55 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004248955
-rw-r--r-- 1 gcube gcube 1005M Dec 11 16:06 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004848955
-rw-r--r-- 1 gcube gcube 91M Dec 11 18:15 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012543784
-rw-r--r-- 1 gcube gcube 71M Dec 11 18:15 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012550397
-rw-r--r-- 1 gcube gcube 132M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012560579
-rw-r--r-- 1 gcube gcube 129M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012571159
-rw-r--r-- 1 gcube gcube 64M Dec 11 18:16 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012576254
-rw-r--r-- 1 gcube gcube 606M Dec 11 18:17 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012624666
-rw-r--r-- 1 gcube gcube 213M Dec 11 18:17 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012648955
-rw-r--r-- 1 gcube gcube 50K Dec 11 18:26 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013248955
-rw-r--r-- 1 gcube gcube 31K Dec 11 18:36 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013848955
-rw-r--r-- 1 gcube gcube 1.4M Dec 11 18:56 _d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513015049010
The space actually allocated on disk is even larger than these apparent sizes, as reported by du:
du -hs *
4.0G	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003048956
8.0G	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003442540
1.0G	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513003648955
1.3G	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004248955
1.0G	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513004848955
128M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012543784
128M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012550397
256M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012560579
256M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012571159
64M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012576254
609M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012624666
215M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513012648955
64K	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013248955
64K	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513013848955
1.4M	_d4science.research-infrastructures.eu_ParthenosVO_PARTHENOS_LAB.fallback.log.ELABORATION.1513015049010
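The gap between the two listings is the difference between a file's apparent size (st_size, what ls -l reports) and the blocks actually allocated to it (what du reports). A minimal Python sketch to compare the two for every file in a directory, assuming a Linux/Unix host where os.stat exposes st_blocks in 512-byte units; the directory and the optional file-name prefix are command-line arguments chosen for illustration:

```python
import os
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Usage:
    name: str
    apparent: int   # st_size: logical length, what `ls -l` reports
    allocated: int  # st_blocks * 512: bytes allocated on disk, what `du` reports


def disk_usage(directory: str, prefix: str = "") -> List[Usage]:
    """Apparent vs allocated size of each regular file whose name starts with prefix."""
    result = []
    with os.scandir(directory) as it:
        for entry in sorted(it, key=lambda e: e.name):
            if entry.is_file(follow_symlinks=False) and entry.name.startswith(prefix):
                st = entry.stat(follow_symlinks=False)
                result.append(Usage(entry.name, st.st_size, st.st_blocks * 512))
    return result


def classify(u: Usage) -> str:
    """A sparse file has fewer bytes allocated than its apparent length."""
    if u.allocated < u.apparent:
        return "sparse (has holes)"
    if u.allocated > u.apparent:
        return "over-allocated"
    return "exact"


def human(n: float) -> str:
    """Format a byte count with binary units, roughly like `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}T"


if __name__ == "__main__":
    # Usage: python3 usage.py DIRECTORY [PREFIX]
    rows = disk_usage(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "")
    for u in rows:
        print(f"{human(u.apparent):>8} {human(u.allocated):>8} {classify(u):<18} {u.name}")
    print(f"total apparent {human(sum(u.apparent for u in rows))}, "
          f"allocated {human(sum(u.allocated for u in rows))}")
```

Files with holes show allocated < apparent; the files listed above show the opposite, which can come from space being preallocated beyond the end of file (and, for small files, from rounding up to whole blocks).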
I removed 10GB of log files in the evening, but the disk filled up again within a couple of hours.
Related issues
Updated by Roberto Cirillo over 7 years ago
- Related to Task #10654: Fallback files not managed correctly on dataminer3 added
Updated by Roberto Cirillo over 7 years ago
- Assignee changed from Roberto Cirillo to Luca Frosini
I'm going to assign this incident to @luca.frosini@isti.cnr.it. I had already opened the same issue here: #10654.
I've noticed that the problem was related only to the PARTHENOS_LAB scope. The cluster has now been removed from that scope, so I think the situation should be under control.
We should understand why there are so many files in ELABORATION state, and why those files are present only on dataminer3 and not on dataminer4 or dataminer5.
Updated by Luca Frosini over 7 years ago
- Related to deleted (Task #10654: Fallback files not managed correctly on dataminer3)
Updated by Luca Frosini over 7 years ago
- Is duplicate of Task #10654: Fallback files not managed correctly on dataminer3 added
Updated by Luca Frosini over 7 years ago
- Status changed from New to In Progress
I made some random checks on the records in these files. The checked records were successfully accounted in the Couchbase DB. IMHO you can delete the files.
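Random spot checks like these can be made reproducible by drawing a uniform sample of records with reservoir sampling (Algorithm R), which reads each multi-gigabyte file once without loading it into memory. A minimal Python sketch, assuming one record per line; looking each sampled record up in Couchbase is left out, since it depends on the client setup:

```python
import random
import sys
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream (Algorithm R)."""
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = item  # item i is kept with probability k / (i + 1)
    return sample


def iter_records(paths: Iterable[str]) -> Iterator[bytes]:
    """Stream non-empty lines from each file (assumes one record per line)."""
    for path in paths:
        with open(path, "rb") as fh:
            for line in fh:
                line = line.rstrip(b"\r\n")
                if line:
                    yield line


if __name__ == "__main__":
    # Usage: python3 sample_records.py K FILE [FILE ...]
    k = int(sys.argv[1])
    rng = random.Random(42)  # fixed seed so the same sample can be re-checked later
    for record in reservoir_sample(iter_records(sys.argv[2:]), k, rng):
        print(record.decode("utf-8", errors="replace"))
```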
Updated by Roberto Cirillo over 7 years ago
I think it's better to first understand why so many elaboration files were stored in a single day.
Yesterday there were "only" 152 calls on dataminer3.
I've checked the following id, extracted from a random fallback elaboration file. This id is correctly stored in Couchbase, but it is still present in the elaboration files. In addition, this id has 1716944 entries in the elaboration files stored yesterday on dataminer3.
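A per-file count of a single id can be reproduced with a short Python sketch. It assumes one record per line and that the id appears verbatim in its record. It also assumes that the numeric suffix of each ELABORATION file name is a Unix timestamp in milliseconds; this is consistent with the listing in the description, where 1513003048956 decodes to 2017-12-11 14:37:28 UTC (15:37 CET), the time ls shows for that file.

```python
import os
import re
import sys
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: names end in ".fallback.log.ELABORATION.<13-digit epoch millis>".
ELABORATION = re.compile(r"\.fallback\.log\.ELABORATION\.(\d{13})$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def suffix_time(name: str) -> Optional[datetime]:
    """Decode the millisecond timestamp at the end of an ELABORATION file name."""
    m = ELABORATION.search(name)
    if m is None:
        return None
    return EPOCH + timedelta(milliseconds=int(m.group(1)))


def count_id(directory: str, record_id: str) -> Counter:
    """Count, per ELABORATION file, the lines (records) that contain record_id."""
    needle = record_id.encode("utf-8")
    counts: Counter = Counter()
    with os.scandir(directory) as it:
        for entry in it:
            if not entry.is_file() or ELABORATION.search(entry.name) is None:
                continue
            with open(entry.path, "rb") as fh:
                for line in fh:  # streamed, so multi-GB files are never read whole
                    if needle in line:
                        counts[entry.name] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 count_id.py DIRECTORY RECORD_ID
    counts = count_id(sys.argv[1], sys.argv[2])
    # Within one scope prefix, the fixed-width suffix makes name order equal time order.
    for name in sorted(counts):
        print(f"{suffix_time(name):%Y-%m-%d %H:%M:%S} UTC {counts[name]:>10} {name}")
    print(f"total: {sum(counts.values())}")
```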
So I think this id (and maybe many others) is stuck in a loop. @luca.frosini@isti.cnr.it, is that possible?
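Whether many ids, and not just this one, are being re-queued can be estimated from the ratio between total records and distinct record ids across all ELABORATION files: a ratio far above 1 would support the loop hypothesis. A minimal Python sketch, assuming (hypothetically) that each line is a JSON-like record carrying an "id" field; ID_PATTERN is an illustrative name and must be adapted to the real record format:

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical record layout: one record per line with an "id" field, e.g. {"id":"...", ...}.
ID_PATTERN = re.compile(rb'"id"\s*:\s*"([^"]+)"')


def id_counts(paths: Iterable[Path]) -> Counter:
    """Count how many times each record id occurs across the given files."""
    counts: Counter = Counter()
    for path in paths:
        with open(path, "rb") as fh:
            for line in fh:
                m = ID_PATTERN.search(line)
                if m is not None:
                    counts[m.group(1).decode("utf-8", errors="replace")] += 1
    return counts


def duplication_summary(counts: Counter, top: int = 5) -> Tuple[int, int, float, List[Tuple[str, int]]]:
    """Return (total records, distinct ids, total/distinct ratio, most repeated ids)."""
    total = sum(counts.values())
    distinct = len(counts)
    ratio = total / distinct if distinct else 0.0
    return total, distinct, ratio, counts.most_common(top)


if __name__ == "__main__":
    # Usage: python3 dup_ratio.py DIRECTORY
    files = sorted(Path(sys.argv[1]).glob("*.fallback.log.ELABORATION.*"))
    total, distinct, ratio, worst = duplication_summary(id_counts(files))
    print(f"{total} records, {distinct} distinct ids, ratio {ratio:.1f}")
    for record_id, n in worst:
        print(f"{n:>10} {record_id}")
```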
Updated by Luca Frosini over 7 years ago
- Status changed from In Progress to Rejected
Updated by Luca Frosini over 7 years ago
- Status changed from Rejected to In Progress
Updated by Luca Frosini over 7 years ago
I have copied all fallback files and accounting.log files locally to investigate the issue.
You can remove the files if you want.
Updated by Luca Frosini over 7 years ago
- Status changed from In Progress to Feedback
- Assignee changed from Luca Frosini to _InfraScience Systems Engineer
Updated by Roberto Cirillo over 7 years ago
I've removed all fallback files and accounting logs from dataminer3-p-d4s
Updated by Pasquale Pagano over 7 years ago
- Status changed from Feedback to Closed
Updated by Luca Frosini over 7 years ago
- Is duplicate of Incident #10701: accounting bloat on workspace-repository-prod1.d4science.org added
Updated by Andrea Dell'Amico over 7 years ago
- Related to Incident #10910: The accounting fallback logs are killing a lot of services added