Incident #11147

closed

The proto dataminers are running out of space

Added by Andrea Dell'Amico over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: High
Category: Application
Target version:
Start date: Feb 09, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

There is data under /home/gcube/tomcat/webapps/wps/ecocfg that seems to have been left behind by past computations: it dates from February 6th and 7th, while the most recent process I can see started on February 8th.
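To quantify how much leftover data is involved, something like the following could be used. This is only a minimal sketch: the function names `find_stale_files` and `summarize_stale` are hypothetical, and it assumes that file modification time is a good enough proxy for "left by a past computation".

```python
import os
import stat
from pathlib import Path
from typing import Iterator, Tuple


def find_stale_files(root: str, cutoff_epoch: float) -> Iterator[Tuple[Path, int]]:
    """Yield (path, size_in_bytes) for regular files under root whose
    modification time is older than cutoff_epoch (seconds since the epoch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff_epoch:
                yield path, info.st_size


def summarize_stale(root: str, cutoff_epoch: float) -> Tuple[int, int]:
    """Return (number_of_stale_files, total_size_in_bytes)."""
    count = 0
    total = 0
    for _path, size in find_stale_files(root, cutoff_epoch):
        count += 1
        total += size
    return count, total
```

For the case described here, the cutoff would be the start time of the oldest process still running (for example `datetime(2018, 2, 8).timestamp()`), and the root would be the ecocfg directory. The sketch only reports files; it deliberately deletes nothing.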

Actions #1

Updated by Andrea Dell'Amico over 7 years ago

  • Project changed from 8 to D4Science Infrastructure
  • Category changed from Default to Application
  • Priority changed from Normal to High
  • Target version changed from Unsprintable to UnSprintable
Actions #2

Updated by Gianpaolo Coro over 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

The problem was caused by long-running, memory-demanding processes run by @scarponi@isti.cnr.it. It seems that many of them were OOM-killed, so the DataMiner never deleted their input files, which were huge. @roberto.cirillo@isti.cnr.it has now deleted these files to stem the problem, but we still need to understand how to tackle it properly, and whether this can be done from within the service. I will open a separate ticket for this. Overall, the current situation arises because Paolo's processes are at the limits of our resources and should perhaps be redesigned.
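One possible way to handle this from the supervising side is sketched below. It is not how the DataMiner actually works: it assumes the computation runs as a child process, that the parent itself survives the OOM kill, and that the inputs live in a dedicated directory. The name `run_with_cleanup` is hypothetical.

```python
import shutil
import signal
import subprocess
from typing import Sequence, Tuple


def run_with_cleanup(cmd: Sequence[str], input_dir: str) -> Tuple[int, bool]:
    """Run cmd as a child process and remove input_dir afterwards,
    whether the child exits normally, fails, or is killed.

    Returns (returncode, killed_by_sigkill). On POSIX a negative return
    code -N means the child was terminated by signal N; the kernel OOM
    killer uses SIGKILL, although SIGKILL can also come from elsewhere.
    """
    try:
        proc = subprocess.run(cmd)
        killed = proc.returncode == -signal.SIGKILL
        return proc.returncode, killed
    finally:
        # Runs even if subprocess.run raises, so inputs never accumulate.
        shutil.rmtree(input_dir, ignore_errors=True)
```

If the supervising process itself is killed, the `finally` clause never runs, so a periodic sweep of old files would still be needed as a safety net.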

For the sake of completeness, here are the OOM-kill messages we found in the system logs:

Feb  6 09:17:11 dataminer0-proto logger: algorithms-updater: the add command string is ./addAlgorithm.sh ARCHNER_IT NER_ALGORITHMS org.gcube.dataanalysis.executor.rscripts.ArchnerIt /d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab transducerers Y http://data.d4science.org/c1ZaT0ZOSi8wQlI4bDRQdDFHSmdFZCtMb3YvcDA5STdHbWJQNStIS0N6Yz0 "Archaeological NER web-service for Italian text. This named entity recognition web-service will find Artefact, Colour, Material, Period, Place, Person, Site, Technique and Timespan entities" proto/software
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258536] R invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258540] R cpuset=/ mems_allowed=0
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258543] CPU: 0 PID: 23779 Comm: R Tainted: GF            3.13.0-32-generic #57-Ubuntu
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258545]  0000000000000000 ffff8803aebdfa80 ffffffff8171bcb4 ffff880004a32fe0
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258548]  ffff8803aebdfb08 ffffffff817165ef ffffffff81067886 ffff8803aebdfae0
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258551]  ffffffff810c75dc 0000000000000001 ffff8803e60ebe38 0000000000000000
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258553] Call Trace:
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258561]  [<ffffffff8171bcb4>] dump_stack+0x45/0x56
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258564]  [<ffffffff817165ef>] dump_header+0x7f/0x1f1
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258568]  [<ffffffff81067886>] ? put_online_cpus+0x56/0x80
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258570]  [<ffffffff810c75dc>] ? rcu_oom_notify+0xcc/0xf0
Feb  6 09:17:46 dataminer0-proto kernel: [32448367.258574]  [<ffffffff81151bfe>] oom_kill_process+0x1ce/0x330



Feb  7 07:58:18 dataminer0-proto logger: algorithms-updater: the add command string is ./addAlgorithm.sh ARCHNER_IT NER_ALGORITHMS org.gcube.dataanalysis.executor.rscripts.ArchnerIt /d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab transducerers Y http://data.d4science.org/c1ZaT0ZOSi8wQlI4bDRQdDFHSmdFZCtMb3YvcDA5STdHbWJQNStIS0N6Yz0 "Archaeological NER web-service for Italian text. This named entity recognition web-service will find Artefact, Colour, Material, Period, Place, Person, Site, Technique and Timespan entities" proto/software
Feb  7 07:59:02 dataminer0-proto CRON[17445]: (gcube) CMD (/usr/local/bin/algorithms-updater > /home/gcube/wps_algorithms_install_log/algorithms_updater_cron.log 2>&1)
Feb  6 00:14:11 dataminer0-proto logger: algorithms-updater: algorithm ARCHNER_IT  is already present but a newer version exists
Feb  7 07:59:03 dataminer0-proto logger: algorithms-updater: another job still running, exiting.
Feb  7 07:59:06 dataminer0-proto logger: algorithms-updater: the adding of algorithm ARCHNER_IT  succeeded
Feb  7 07:59:06 dataminer0-proto logger: algorithms-updater: Exiting
Feb  7 07:59:14 dataminer0-proto kernel: [32530054.819562] R invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Feb  7 07:59:14 dataminer0-proto kernel: [32530054.819566] R cpuset=/ mems_allowed=0
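
Excerpts like the ones above can be pulled out of syslog automatically. The following is a rough sketch that assumes the classic syslog line layout shown in this ticket; the names `OOM_RE`, `oom_events` and `count_by_day` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Matches kernel lines such as:
# "Feb  6 09:17:46 host kernel: [32448367.258536] R invoked oom-killer: ..."
OOM_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+\[\s*[\d.]+\]\s+(?P<comm>\S+) invoked oom-killer"
)


def oom_events(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (month, day, time, host, command) for each oom-killer invocation."""
    for line in lines:
        match = OOM_RE.match(line)
        if match:
            yield (
                match.group("month"),
                int(match.group("day")),
                match.group("time"),
                match.group("host"),
                match.group("comm"),
            )


def count_by_day(lines: Iterable[str]) -> Counter:
    """Count oom-killer invocations per (month, day)."""
    return Counter((month, day) for month, day, *_rest in oom_events(lines))
```

Note that the command reported on the "invoked oom-killer" line is the process whose allocation triggered the OOM killer, which is not necessarily the process that ended up being killed.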
Actions #3

Updated by Andrea Dell'Amico over 7 years ago

Also keep in mind that the proto dataminers have half the RAM of the other production ones.
