Incident #11632
closed
orientdb01-d4s receives a huge number of connections
100%
Description
From logs I found:
2018-04-12 18:29:28:484 WARNI Reached maximum number of concurrent connections (max=1000, current=5488), reject incoming connection from /146.48.123.23:55173 [OServerNetworkListener]
2018-04-12 18:30:00:024 WARNI Reached maximum number of concurrent connections (max=1000, current=5528), reject incoming connection from /146.48.122.33:47036 [OServerNetworkListener]
146.48.122.33 : social-indexer.d4science.org
146.48.123.23 : monitoring.research-infrastructures.eu
I'll investigate the first to check whether the problem is in smart-executor, but I don't understand the second.
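To see which clients are being rejected most often, the warnings can be tallied per source address. The following is a minimal sketch: the regular expression is inferred only from the two log lines quoted above, and the names `REJECT_RE`, `count_rejections` and `SAMPLE` are hypothetical helpers, not part of OrientDB.

```python
import re
from collections import Counter

# Pattern inferred from the two warnings quoted in this ticket (assumption:
# other OrientDB versions may format the message differently).
REJECT_RE = re.compile(
    r"Reached maximum number of concurrent connections "
    r"\(max=(?P<max>\d+), current=(?P<current>\d+)\), "
    r"reject incoming connection from /(?P<ip>[\d.]+):(?P<port>\d+)"
)


def count_rejections(log_text):
    """Return a Counter mapping source IP -> number of rejected connections."""
    return Counter(m.group("ip") for m in REJECT_RE.finditer(log_text))


# The two log lines from the description, used as sample input.
SAMPLE = (
    "2018-04-12 18:29:28:484 WARNI Reached maximum number of concurrent "
    "connections (max=1000, current=5488), reject incoming connection from "
    "/146.48.123.23:55173 [OServerNetworkListener]\n"
    "2018-04-12 18:30:00:024 WARNI Reached maximum number of concurrent "
    "connections (max=1000, current=5528), reject incoming connection from "
    "/146.48.122.33:47036 [OServerNetworkListener]\n"
)

if __name__ == "__main__":
    # Print the source addresses, most frequently rejected first.
    for ip, rejected in count_rejections(SAMPLE).most_common():
        print(ip, rejected)
```

Run against the full server log instead of `SAMPLE`, this would show whether the rejected connections come mainly from the two hosts listed above or from other clients as well.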
Updated by Andrea Dell'Amico about 7 years ago
It's the nagios check. I don't know why the connections aren't being closed; it has been the same HTTP call for months: orientdb01-d4s.d4science.org:2480/studio/index.html
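For comparison, this is a sketch of an HTTP probe that always releases its connection on the client side. It is not the actual nagios plugin, whose implementation is not shown in this ticket. The function name `check_studio` and the timeout value are assumptions; the port and path are the ones from the URL above.

```python
import http.client


def check_studio(host, port=2480, path="/studio/index.html", timeout=10.0):
    """GET the given path and return the HTTP status code.

    The socket is closed explicitly in all cases, so the probe itself
    never leaves a connection open on the server.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Ask the server not to keep the connection alive after the reply.
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        response.read()  # drain the body before closing
        return response.status
    finally:
        conn.close()
```

Calling `check_studio("orientdb01-d4s.d4science.org")` returns the status code, or raises an `OSError` subclass such as `ConnectionResetError` if the server resets the connection.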
Updated by Andrea Dell'Amico about 7 years ago
The check is currently failing with "connection reset by peer"; it has been behaving this way for a couple of days.
Updated by Luca Frosini about 7 years ago
Before restarting the instance, I also deleted some structures, so the tokens the applications had already obtained are no longer valid and OrientDB resets the connection.
Can we stop the Nagios check and restart it tomorrow? I'll restart social-indexer too.
Updated by Andrea Dell'Amico about 7 years ago
I stopped the nagios check. The nagios check did not use any token btw, that URL should be public.
Updated by Luca Frosini about 7 years ago
@andrea.dellamico@isti.cnr.it or @roberto.cirillo@isti.cnr.it can you give me access to social-indexer.d4science.org?
Updated by Roberto Cirillo about 7 years ago
Luca Frosini wrote:
@andrea.dellamico@isti.cnr.it or @roberto.cirillo@isti.cnr.it can you give me access to social-indexer.d4science.org?
Done. You can access as gcube
Updated by Luca Frosini about 7 years ago
social-indexer.d4science.org restarted. @andrea.dellamico@isti.cnr.it can you restart nagios? Thanks a lot
Updated by Andrea Dell'Amico about 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
Done.
Can you share what the problem was?
Updated by Luca Frosini about 7 years ago
@andrea.dellamico@isti.cnr.it I suspect a bug in Resource Registry. I have already investigated the port type used to interact with instances and ruled it out. It could instead be the schema port type (and so also the API used by nagios and haproxy to monitor the instance), but I still have to investigate it. I'll share the problem as soon as I find it.
Updated by Luca Frosini about 7 years ago
I found a potential bug in the resource registry but it should occur in the actual situation. Anyway, I added an additional line of code to solve it. Please note that it cannot be tested in the dev instance because there are too few contexts there to make it happen.