Task #1504
closed
Task #1502: Automate the dataminer installation as a loadbalanced service
Install a haproxy instance in front of the dev dataminer services
100%
Description
The haproxy instance can use the round robin balancer. It seems that neither cookies nor sessions need to be handled.
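A minimal sketch of how such a balanced service could be declared in haproxy, assuming the two development nodes mentioned later in this ticket as backends; the frontend/backend names and ports are illustrative, not the deployed configuration:

    # Hypothetical haproxy sketch, not the actual deployed configuration
    frontend dataminer-d
        bind *:80
        default_backend dataminer-d-nodes

    backend dataminer-d-nodes
        balance roundrobin      # plain round robin, no cookie/session stickiness needed
        option forwardfor       # pass the original client IP in X-Forwarded-For
        server dataminer1 dataminer1-d-d4s.d4science.org:80 check
        server dataminer2 dataminer2-d-d4s.d4science.org:80 check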
Files
Updated by Andrea Dell'Amico over 9 years ago
- Target version changed from 197 to Computational Infrastructure upgrade to smartgears
Updated by Andrea Dell'Amico over 9 years ago
- Status changed from New to In Progress
Hostname and IP will be: dataminer-d-d4s.d4science.org 146.48.123.63
Updated by Andrea Dell'Amico over 9 years ago
- Subject changed from Install a haproxy instance in front of the dataminer services to Install a haproxy instance in front of the dev dataminer services
Updated by Andrea Dell'Amico over 9 years ago
- % Done changed from 0 to 40
Created the VM, starting the configuration.
Updated by Andrea Dell'Amico over 9 years ago
- Status changed from In Progress to Feedback
- % Done changed from 40 to 100
The haproxy balancer is ready.
The nginx logging configuration on the target host (dataminer2-d-d4s.d4science.org only) has been changed to log both the haproxy IP and the original one. The access_log line is now, for example:
146.48.123.63 forwarded for 146.48.123.149 - - [02/Dec/2015:14:46:54 +0100] "GET /wps/ HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
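For reference, a hedged sketch of an nginx log_format that would produce a line in that shape, assuming haproxy passes the original client address in X-Forwarded-For (option forwardfor); the format name and exact layout are assumptions, not the actual configuration:

    http {
        # Hypothetical format: proxy address, then the original client address
        # taken from the X-Forwarded-For header added by haproxy
        log_format haproxy_fwd '$remote_addr forwarded for $http_x_forwarded_for - $remote_user '
                               '[$time_local] "$request" $status $body_bytes_sent '
                               '"$http_referer" "$http_user_agent"';

        access_log /var/log/nginx/access.log haproxy_fwd;
    }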
Updated by Andrea Dell'Amico over 9 years ago
The dataminer[1:2]-d-d4s.d4science.org nodes are still directly reachable on port 80, as requested.
Updated by Gianpaolo Coro over 9 years ago
Calls to dataminer-d-d4s.d4science.org are always sent to dataminer1. @andrea.dellamico@isti.cnr.it could you please give it a look?
Updated by Andrea Dell'Amico over 9 years ago
Can you try from a different IP? Even if it's a round robin configuration, haproxy tends to deliver the requests to the same backend when the source does not change.
And a request: what URL should the load balancer check to make sure that the service is working? I'm now testing /wps, but it returns a 302.
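If the 302 is the expected answer on /wps, one possible sketch (assuming haproxy 1.5 or later, where http-check expect is available) is to declare the redirect itself as the healthy status; backend and server names are illustrative:

    backend dataminer-d-nodes
        option httpchk GET /wps
        http-check expect status 302   # treat the redirect as a healthy response
        server dataminer1 dataminer1-d-d4s.d4science.org:80 check
        server dataminer2 dataminer2-d-d4s.d4science.org:80 check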
Updated by Gianpaolo Coro over 9 years ago
I attach all the http links to test the balancer and the algorithms on Dataminer.
Updated by Andrea Dell'Amico over 9 years ago
Well, you need to choose one for the load balancer :). A light one, possibly, since the checks are executed once per second. We could maybe use some of the others to build a detailed nagios check.
Updated by Andrea Dell'Amico over 9 years ago
- Status changed from Feedback to In Progress
- % Done changed from 100 to 80
I've investigated the timeout occurrences a bit, and it seems to be a matter of tuning the client and server timeouts in the haproxy configuration.
I've raised them both, and now the query seems to always succeed. We should test with longer-running ones, if possible: the client sends keepalive signals to the server, and we need to set the timeout higher than that value.
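The directives involved are the haproxy client and server inactivity timeouts; the values below only illustrate raising them and are not the ones actually applied:

    defaults
        timeout connect 10s
        timeout client  30m    # inactivity timeout on the client side
        timeout server  30m    # inactivity timeout on the server side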
Updated by Gianpaolo Coro over 9 years ago
According to the WPS specification, the timeout should be decided by the user's client. Processing can also last for days, and theoretically it could be executed in "synchronous" mode by the client. Is it possible to either set the timeout to infinite or to 10/20 days?
Updated by Andrea Dell'Amico over 9 years ago
There is no 'infinite timeout' in TCP or HTTP.
We only need to ensure that the proxy keepalive timeout is longer than the client's (or the standard TCP) one.
Updated by Gianpaolo Coro over 9 years ago
Not according to Apache: https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html
Anyway, if an infinite timeout is not supported by haproxy, is it possible to set it to at least 20 days?
Updated by Andrea Dell'Amico over 9 years ago
Gianpaolo Coro wrote:
Not according to Apache: https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html
Anyway, if an infinite timeout is not supported by haproxy, is it possible to set it to at least 20 days?
That only says that the client will wait indefinitely. Servers have timeouts too, at both the HTTP and the TCP level. What we are interested in here are HTTP persistent connections, and the way for them to work reliably is to have keepalive doing its job correctly.
Yesterday I raised the haproxy keepalive timeouts so that they are much longer than the keepalive interval, and your tests that weren't reliable before now always complete.
We now need tests that last longer, and only if they fail because of timeouts will we try raising them to an unreasonably high level.
Updated by Gianpaolo Coro over 9 years ago
Since we are talking about a computational service, the longest run I have lasts 7 days in the development environment. I don't know the computational time of all the possible algorithms we will integrate in the future.
Is it possible to quantify the current timeout, or shall we wait 7 days to see whether further tuning is needed?
Updated by Andrea Dell'Amico over 9 years ago
Gianpaolo Coro wrote:
Since we are talking about a computational service, the longest run I have lasts 7 days in the development environment. I don't know the computational time of all the possible algorithms we will integrate in the future.
Is it possible to quantify the current timeout, or shall we wait 7 days to see whether further tuning is needed?
I only have to check the client behaviour. A test lasting some ten minutes is sufficient. If it fails, I have one last haproxy configuration option to try before falling back to increasing the timeouts.
Updated by Andrea Dell'Amico over 9 years ago
- Status changed from In Progress to Feedback
- % Done changed from 80 to 100
It turned out that the HTTP library used by the R jobs does not behave correctly with keep-alive, so we are going to use a huge keepalive timeout (60 days) for the time being.
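A sketch of what that workaround could look like; which haproxy directives the deployed change actually touched is an assumption, and the client/server inactivity timeouts are shown here as one plausible reading (haproxy accepts the d unit for days):

    defaults
        # Work around the R jobs' HTTP library not handling keep-alive correctly:
        # very long inactivity timeouts (60 days) for the time being
        timeout client 60d
        timeout server 60d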
Updated by Gianpaolo Coro over 9 years ago
- Status changed from Feedback to Closed