Incident #7472
smartgears huge logs - disk full
Status: closed, 100% done
Description
We encountered full disks on 3 VMs in preproduction. The cause was the smartgears logs, which grew to about 15-20 GB per day (yesterday and today). The logs were full of connection timeouts to Couchbase, but even so, smartgears should roll its logs and they should not get this big.
Updated by Alessandro Pieve over 8 years ago
- Status changed from New to In Progress
On which VMs is the problem?
Updated by Alessandro Pieve over 8 years ago
- Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer
@tommaso.piccioli@isti.cnr.it Can you open the ports to communicate with Couchbase? Thanks
Updated by Kostas Kakaletris over 8 years ago
VMs : dl20.di.uoa.gr, dl27.di.uoa.gr, dl28.di.uoa.gr
Whoever checks this, please send me your key so that I can add it to one of these VMs and you can check there.
Thank you
Updated by Tommaso Piccioli over 8 years ago
- Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve
I just opened the firewall, now I can see connections from:
88.197.53.12
88.197.53.15
88.197.53.20
16265 connections from 88.197.53.12 to the first node of our Couchbase dev/preprod cluster. That's not normal: there are only very few connections from the other hosts, and from the same host to the other Couchbase nodes.
Updated by Alessandro Pieve over 8 years ago
You can take my key from https://manage.research-infrastructures.eu/
Updated by Kostas Kakaletris over 8 years ago
- Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer
Added key on gcube@dl28.di.uoa.gr
dl27.di.uoa.gr and dl28.di.uoa.gr, the ones that crashed completely, have IPs 88.197.53.27 and 88.197.53.28. I do not know if they were blocked too. The problems started yesterday.
In any case we should avoid huge logs, so I guess these are 2 different issues.
88.197.53.12 is a development VM.
Thank you
Updated by Alessandro Pieve over 8 years ago
- Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve
I cannot connect to dl28.di.uoa.gr
Updated by Alessandro Pieve over 8 years ago
- Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer
Updated by Kostas Kakaletris over 8 years ago
@alessandro.pieve@isti.cnr.it can you please add me on Skype (kkakas@yahoo.gr) so we can check your problem connecting to dl28? I added your key earlier, but maybe it is very old/small?
Thank you
Updated by Tommaso Piccioli over 8 years ago
Since 15:10 today the entire network 88.197.53.0/24 is allowed to access the Couchbase dev cluster.
After that time I can see only these data connections (count, source IP):
to couchbase01-d-d4s
21638 88.197.53.12
5 88.197.53.15
1 88.197.53.20
1 88.197.53.27
1 88.197.53.28
to couchbase02-d-d4s
4 88.197.53.12
5 88.197.53.15
1 88.197.53.20
1 88.197.53.27
1 88.197.53.28
Something strange is coming from dl12.di.uoa.gr, I think (more than 20000 connections established).
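Per-source counts like the ones above can be produced by tallying established connections by remote IP. The following is only a minimal sketch, assuming Linux `netstat -tn`-style lines as input; the ticket does not say which tool was actually used. The local address `10.0.0.1:11210`, the sample lines, and the function name `count_connections_by_source` are hypothetical and exist only for illustration.

```python
from collections import Counter


def count_connections_by_source(lines):
    """Count ESTABLISHED TCP connections per remote IP.

    Assumes the Linux `netstat -tn` column layout:
    proto, recv-q, send-q, local address, foreign address, state.
    Lines that do not match (headers, other states) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The foreign address is "ip:port"; keep only the IP part.
        remote_ip = fields[4].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


# Hypothetical sample input, not taken from the real cluster.
sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.1:11210 88.197.53.12:50112 ESTABLISHED
tcp 0 0 10.0.0.1:11210 88.197.53.12:50113 ESTABLISHED
tcp 0 0 10.0.0.1:11210 88.197.53.12:50114 ESTABLISHED
tcp 0 0 10.0.0.1:11210 88.197.53.15:41000 ESTABLISHED
tcp 0 0 10.0.0.1:11210 88.197.53.20:41001 TIME_WAIT
"""

counts = count_connections_by_source(sample.splitlines())
for ip, n in counts.most_common():
    print(n, ip)
```

Sorting by count, as `most_common()` does, makes an outlier such as 88.197.53.12 stand out immediately.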
Updated by Kostas Kakaletris over 8 years ago
About 88.197.53.12: I cannot understand the reason, but this should probably be a separate ticket for the development infrastructure? I will check whether I can figure out what may be causing so many connections from that specific development VM.
dl28, which I was checking, is now connected to Couchbase, so that was fixed by your firewall changes.
Still, the main issue of this ticket is that the systems crashed because of huge smartgears logs. Alessandro is checking it on dl28. This is something we should avoid in production. Maybe I should manually change some logging parameter in smartgears?
Updated by Alessandro Pieve over 8 years ago
- Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve
I checked, and there are no more connection errors on dl28.
Updated by Roberto Cirillo over 8 years ago
@k.kakaletris@cite.gr I think you could set the root level to error to avoid this issue.
Anyway, in the next release we should better manage the logs produced by the "com.couchbase" package. I'm going to open a dedicated ticket for this issue.
If the problem is solved now, please close the ticket.
Updated by Kostas Kakaletris over 8 years ago
- Status changed from In Progress to Closed
Tommaso solved the firewall issue, and Alessandro told me which file and parameter to change in order to set the maximum log file size manually, so the ticket is resolved.
@tommaso.piccioli@isti.cnr.it about the thousands of connections from dl12, I will have to investigate further, because the many open connections crashed it. It may be related to the index and smartgears, but I will create a new ticket for that after testing and gathering more information.
Thank you
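The two mitigations mentioned in this ticket (raising the root level to error, and capping the log file size so that logs roll) can be illustrated with a small sketch. Note that smartgears is a Java application and the ticket does not name its logging file or parameter, so this is not the smartgears configuration: it is an analogous example using Python's standard `logging` module, and the logger name, file name, and size limits are all hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_capped_logger(path, max_bytes, backup_count):
    """Return a logger that keeps only ERROR and above, and rolls its file
    once it approaches max_bytes, keeping at most backup_count old files."""
    logger = logging.getLogger("capped-demo")  # hypothetical logger name
    logger.setLevel(logging.ERROR)  # analogous to "set the root level to error"
    logger.propagate = False
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Illustrative limits only: 1 KiB per file, 2 rolled files kept.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "demo.log")
logger = build_capped_logger(log_path, max_bytes=1024, backup_count=2)

for i in range(1000):
    logger.warning("connection timeout %d", i)  # dropped: below ERROR
    logger.error("connection timeout %d", i)    # written, subject to rollover

# Disk usage stays bounded by roughly max_bytes * (backup_count + 1).
total_bytes = sum(
    os.path.getsize(os.path.join(log_dir, name)) for name in os.listdir(log_dir)
)
print(sorted(os.listdir(log_dir)), total_bytes)
```

With a size cap and a fixed number of retained files, a flood of repeated timeout messages overwrites old log data instead of filling the disk.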
Updated by Roberto Cirillo over 8 years ago
Thanks Kostas. About the dev node (dl12): have you checked whether it has the latest smartgears distribution? If not, you should try upgrading it. Anyway, if the problem persists, feel free to open another ticket.