Incident #11632
closed
orientdb01-d4s receives a huge number of connections
Added by Luca Frosini about 7 years ago.
Updated about 7 years ago.
Assignee:
_InfraScience Systems Engineer
Infrastructure:
Production
Description
From logs I found:
2018-04-12 18:29:28:484 WARNI Reached maximum number of concurrent connections (max=1000, current=5488), reject incoming connection from /146.48.123.23:55173 [OServerNetworkListener]
2018-04-12 18:30:00:024 WARNI Reached maximum number of concurrent connections (max=1000, current=5528), reject incoming connection from /146.48.122.33:47036 [OServerNetworkListener]
146.48.122.33 : social-indexer.d4science.org
146.48.123.23 : monitoring.research-infrastructures.eu
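To see which clients are being rejected most often, the warnings above can be tallied by source address. The following is only a minimal sketch: the regular expression is derived from the two log lines quoted in this ticket, and the names `REJECT_RE`, `count_rejections` and `SAMPLE` are hypothetical helpers, not part of OrientDB.

```python
import re
from collections import Counter

# Pattern inferred from the two warnings quoted in this ticket (an assumption:
# other OrientDB versions may format the message differently).
REJECT_RE = re.compile(
    r"Reached maximum number of concurrent connections "
    r"\(max=(?P<max>\d+), current=(?P<current>\d+)\), "
    r"reject incoming connection from /(?P<ip>[\d.]+):(?P<port>\d+)"
)


def count_rejections(lines):
    """Return a Counter mapping source IP -> number of rejected connections."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# The two log lines from the description, used as sample input.
SAMPLE = [
    "2018-04-12 18:29:28:484 WARNI Reached maximum number of concurrent "
    "connections (max=1000, current=5488), reject incoming connection from "
    "/146.48.123.23:55173 [OServerNetworkListener]",
    "2018-04-12 18:30:00:024 WARNI Reached maximum number of concurrent "
    "connections (max=1000, current=5528), reject incoming connection from "
    "/146.48.122.33:47036 [OServerNetworkListener]",
]

if __name__ == "__main__":
    # Print one line per source IP, most frequently rejected first.
    for ip, rejected in count_rejections(SAMPLE).most_common():
        print(ip, rejected)
```

Run against the full server log instead of `SAMPLE`, this would show whether the rejected connections come mainly from social-indexer, from the monitoring host, or from other clients.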
I'll investigate the first to check if the problem is on smart-executor, but I don't understand the second
It's the Nagios check. I don't know why the connections aren't closed; it has been the same HTTP call for months: orientdb01-d4s.d4science.org:2480/studio/index.html
The check is currently failing with connection reset by peer; it has been behaving this way for a couple of days.
Before restarting the instance, I also deleted some structures, so the tokens the applications have already obtained are no longer valid and OrientDB resets the connection.
Can we stop the Nagios check and restart it tomorrow? I'll restart social-indexer too.
I stopped the nagios check. The nagios check did not use any token btw, that URL should be public.
@andrea.dellamico@isti.cnr.it or @roberto.cirillo@isti.cnr.it can you provide me access to social-indexer.d4science.org?
Luca Frosini wrote:
@andrea.dellamico@isti.cnr.it or @roberto.cirillo@isti.cnr.it can you provide me access to social-indexer.d4science.org
Done. You can access as gcube
social-indexer.d4science.org restarted. @andrea.dellamico@isti.cnr.it can you restart nagios? Thanks a lot
- Status changed from New to Closed
- % Done changed from 0 to 100
Done.
Can you share what the problem was?
@andrea.dellamico@isti.cnr.it I suspect a bug in Resource Registry. I already investigated the port type used to interact with instances and excluded it. It could instead be the schema port type (and so also the API used by Nagios and HAProxy to monitor the instance), but I still have to investigate it. I'll share the problem as soon as I find it.
I found a potential bug in the resource registry but it should occur in the actual situation. Anyway, I added an additional line of code to solve it. Please note that it cannot be tested in the dev instance because there are too few contexts to make it happen.