Incident #7472

closed

smartgears huge logs - disk full

Added by Kostas Kakaletris over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
High
Assignee:
Alessandro Pieve
Category:
Other
Target version:
Start date:
Mar 10, 2017
Due date:
% Done:

100%

Estimated time:
Infrastructure:
Pre-Production

Description

We encountered full disks on 3 VMs in pre-production. The cause was the smartgears logs, which grew to about 15-20 GB per day (yesterday and today). The logs were full of connection timeouts to Couchbase. Even so, smartgears should roll its logs, and they should not get this big.

Actions #1

Updated by Alessandro Pieve over 8 years ago

  • Status changed from New to In Progress

On which VMs is the problem?

Actions #2

Updated by Alessandro Pieve over 8 years ago

  • Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer

@tommaso.piccioli@isti.cnr.it Can you open the ports to communicate with Couchbase? Thanks

Actions #3

Updated by Kostas Kakaletris over 8 years ago

VMs : dl20.di.uoa.gr, dl27.di.uoa.gr, dl28.di.uoa.gr

Whoever is going to check this, please send me your key so I can add it to one of these VMs and you can investigate there.

Thank you

Actions #4

Updated by Tommaso Piccioli over 8 years ago

  • Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve

I just opened the firewall, now I can see connections from:

88.197.53.12
88.197.53.15
88.197.53.20

There are 16265 connections from 88.197.53.12 to the first node of our Couchbase dev/preprod cluster, which is not normal. There are only very few connections from the other hosts, and from that same host to the other Couchbase nodes.

Actions #5

Updated by Alessandro Pieve over 8 years ago

Actions #6

Updated by Kostas Kakaletris over 8 years ago

  • Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer

Added key on gcube@dl28.di.uoa.gr

dl27.di.uoa.gr and dl28.di.uoa.gr, the ones that crashed completely, have IPs 88.197.53.27 and 88.197.53.28. I do not know if they were blocked too. The problems started yesterday.

In any case we should avoid huge logs, so I guess these are 2 different issues.

About 88.197.53.12: it is a development VM.

Thank you

Actions #7

Updated by Alessandro Pieve over 8 years ago

  • Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve

I cannot connect to dl28.di.uoa.gr

Actions #8

Updated by Alessandro Pieve over 8 years ago

  • Assignee changed from Alessandro Pieve to _InfraScience Systems Engineer

Actions #9

Updated by Kostas Kakaletris over 8 years ago

@alessandro.pieve@isti.cnr.it can you please add me on Skype (kkakas@yahoo.gr) so we can check why you cannot connect to dl28? I added your key earlier, but maybe it is very old/small?

Thank you

Actions #10

Updated by Tommaso Piccioli over 8 years ago

Since 15:10 today the entire network 88.197.53.0/24 is allowed to access the Couchbase dev cluster.

Since then I can see only these data connections:

to couchbase01-d-d4s
21638 88.197.53.12
5 88.197.53.15
1 88.197.53.20
1 88.197.53.27
1 88.197.53.28

to couchbase02-d-d4s
4 88.197.53.12
5 88.197.53.15
1 88.197.53.20
1 88.197.53.27
1 88.197.53.28

Something strange is coming from dl12.di.uoa.gr, I think (more than 20000 connections established).
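Per-host tallies like the ones above can be reproduced with a short script. The following is a minimal sketch, not the method actually used on the Couchbase nodes: it assumes input in the column layout of `ss -tn` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and IPv4 peers. The function name `count_by_remote_ip`, the local address 10.0.0.1, and the sample lines are hypothetical.

```python
from collections import Counter


def count_by_remote_ip(lines):
    """Tally established TCP connections per remote IPv4 address.

    Assumes `ss -tn`-style columns:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header line, blank lines, and non-established sockets.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # Drop the port from "ip:port" to keep only the peer address.
        peer_ip = fields[4].rsplit(":", 1)[0]
        counts[peer_ip] += 1
    return counts


# Hypothetical sample input; the local address and ports are illustrative only.
sample = [
    "State Recv-Q Send-Q Local Address:Port Peer Address:Port",
    "ESTAB 0 0 10.0.0.1:11210 88.197.53.12:50001",
    "ESTAB 0 0 10.0.0.1:11210 88.197.53.12:50002",
    "ESTAB 0 0 10.0.0.1:11210 88.197.53.12:50003",
    "ESTAB 0 0 10.0.0.1:11210 88.197.53.15:41000",
    "TIME-WAIT 0 0 10.0.0.1:11210 88.197.53.20:42000",
]

counts = count_by_remote_ip(sample)
for ip, n in counts.most_common():
    print(n, ip)
```

A host whose count is orders of magnitude above the others, as 88.197.53.12 is here, stands out immediately in the sorted output.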

Actions #11

Updated by Kostas Kakaletris over 8 years ago

About 88.197.53.12: I cannot understand the reason, but this should probably be a different ticket, for the development infrastructure. I will check whether I can figure out what may be causing such connections on that specific development VM.

dl28, which I was checking, is now connected to Couchbase, so that was fixed by your firewall changes.

Still, the main issue of this ticket is that the systems crashed because of huge smartgears logs. Alessandro is checking it on dl28. This is something we should avoid in production. Maybe I should manually change some logging parameter in smartgears?
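Independently of the logging configuration itself, a periodic check can flag log files that grow past a size budget before the disk fills up. The following is a minimal sketch under stated assumptions: the function name `oversized_logs`, the `*.log` file pattern, and the threshold are hypothetical, and the actual smartgears log location is not given in this ticket.

```python
from pathlib import Path


def oversized_logs(log_dir, max_bytes):
    """Return (path, size) pairs for *.log files larger than max_bytes.

    Searches log_dir recursively; results are sorted largest first.
    """
    hits = []
    for path in Path(log_dir).rglob("*.log"):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling it from a scheduled job with the container's log directory and a budget such as `1 * 1024**3` (1 GiB) would list the offending files. It only reports them; it does not rotate or delete anything.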

Actions #12

Updated by Alessandro Pieve over 8 years ago

  • Assignee changed from _InfraScience Systems Engineer to Alessandro Pieve

I checked, and there are no more connection errors on dl28.

Actions #13

Updated by Roberto Cirillo over 8 years ago

@k.kakaletris@cite.gr I think you could set the root level to error to avoid this issue.
Anyway, in the next release we should better manage the logs produced by the "com.couchbase" package. I'm going to open a dedicated ticket for this issue.
Please close the ticket if the problem is solved now.

Actions #15

Updated by Kostas Kakaletris over 8 years ago

  • Status changed from In Progress to Closed

Tommaso solved the firewall issue, and Alessandro told me which file and parameter to use to manually configure the logging with a max file size, so the ticket is solved successfully.

@tommaso.piccioli@isti.cnr.it about the thousands of connections from dl12, I will have to investigate further, because the many open connections crashed it. It may be related to the index and smartgears, but I will create a new ticket for that after testing and gathering more information.

Thank you

Actions #16

Updated by Roberto Cirillo over 8 years ago

Thanks Kostas. About the dev node (dl12): have you checked whether it has the latest smartgears distribution? If not, you should try to upgrade it. Anyway, if the problem persists, feel free to open another ticket.
