Incident #9757

closed

low upload speed to thredds-pre-d4s.d4science.org

Added by Paolo Fabriani almost 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
_InfraScience Systems Engineer
Category:
Other
Target version:
Start date:
Sep 22, 2017
Due date:
% Done:

100%

Estimated time:
Infrastructure:
Pre-Production

Description

In short, we're experiencing very low network upload speed when transferring data from oscar-import.pre.d4science.org to thredds-pre-d4s.d4science.org. By contrast, the same usage scenario between oscar-import.pre.d4science.org and thredds-d-d4s.d4science.org works fine. Only the thredds server endpoint changes; in principle, both hosts should be configured the same way.

Here are the details, for a working usage session (i.e. with the thredds server in dev):

From the client host (oscar-import.pre.d4science.org) I'm uploading a big file (~10GB) to the thredds server using data transfer facilities. I'm using the following call:

curl --verbose -F uploadedFile=@/tmp/oscar-merger/test.nc --header gcube-token:[application-token] "http://thredds-pre-d4s.d4science.org:80/data-transfer-service/gcube/service/REST/FileUpload/thredds/public/netcdf/Oscar?on-existing-file=REWRITE&on-existing-dir=APPEND&create-dirs=true"

Here is what happens, as seen from the client side:

  1. curl starts transferring data at ~100MB/second and keeps going for 1~2 minutes. We observed this with the "dstat" command.
  2. for about a further 5', nothing noticeable happens on the client. We guess nginx is putting the file somewhere (/tmp?).
  3. then, we guess the data transfer service receives the request and starts writing the file to its final destination. In fact, the file appears in the thredds catalogue (http://thredds-pre-d4s.d4science.org/thredds/catalog/public/netcdf/Oscar/catalog.html), and its size increases steadily until completion. A 'Success' message is returned to the client. The response includes a submission time, which corresponds to the start of step 3, i.e. the exact time when the data transfer service receives the request.

The whole upload process takes about 12' in dev.

Now, moving to preprod (thredds-pre-d4s.d4science.org), we see a different behaviour for step 1:

  1. the client starts transferring data at ~100MB/second. After 15~20", the speed drops to ~100KB/second, i.e. about 1/1000 of the initial rate. netstat shows Send-Q values always around 2100000, whereas in the working scenario Send-Q regularly drops to zero.
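To track the Send-Q observation over time, something like the following sketch could be used. It parses `netstat -tn`-style text and extracts the Send-Q value of connections to a given remote address. The function name `send_q_for_peer`, the sample text, and the addresses in it are hypothetical (made up for illustration, not captured from the real hosts); the column layout assumed is the usual Linux one (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
def send_q_for_peer(netstat_output, peer_prefix):
    """Return the Send-Q values of TCP connections whose foreign
    address starts with peer_prefix (e.g. "192.0.2.9:")."""
    values = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like:
        # tcp 0 2100000 192.0.2.5:51234 192.0.2.9:80 ESTABLISHED
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip banner and header lines
        send_q, foreign = fields[2], fields[4]
        if foreign.startswith(peer_prefix) and send_q.isdigit():
            values.append(int(send_q))
    return values


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0 2100000 192.0.2.5:51234         192.0.2.9:80            ESTABLISHED
tcp        0      0 192.0.2.5:22            192.0.2.7:40022         ESTABLISHED
"""

if __name__ == "__main__":
    print(send_q_for_peer(SAMPLE, "192.0.2.9:"))
```

A Send-Q that stays high across repeated samples suggests the sent data is not being acknowledged (or read) by the receiving side as fast as the client can produce it, which matches the behaviour described for preprod.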

I don't think we've ever reached steps 2 and 3 in preprod; I waited for 30' and saw no evidence of them (curl was still uploading data at ~100KB/second), nor any error on the client.
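To put these numbers in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes), a file of exactly 10 GB, and constant rates; these are simplifications of the approximate figures above, and the helper names are only illustrative.

```python
KB, MB, GB = 10**3, 10**6, 10**9  # decimal units (assumption)


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time needed to send size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


def bytes_sent(seconds, rate_bytes_per_s):
    """Amount of data sent in the given time at a constant rate."""
    return seconds * rate_bytes_per_s


if __name__ == "__main__":
    size = 10 * GB
    fast = transfer_seconds(size, 100 * MB)    # dev: 100 s
    slow = transfer_seconds(size, 100 * KB)    # preprod: 100000 s
    waited = bytes_sent(30 * 60, 100 * KB)     # data sent in 30 minutes
    print(f"at 100 MB/s: {fast:.0f} s")
    print(f"at 100 KB/s: {slow:.0f} s (~{slow / 3600:.1f} h)")
    print(f"sent in 30 min at 100 KB/s: {waited / MB:.0f} MB")
```

Under these assumptions the step-1 upload takes about 100 seconds at full speed (consistent with the 1~2 minutes seen in dev) but about 28 hours at 100KB/second; after 30 minutes at the reduced rate only about 180 MB of the file would have been sent, so it is not surprising that steps 2 and 3 were never observed.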

Is there anything wrong/misconfigured on the preprod thredds server?
Your help is much appreciated.

Thank you.

Actions #1

Updated by Andrea Dell'Amico almost 8 years ago

@fabio.sinibaldi@isti.cnr.it can you see anything weird on the thredds side? I just verified the nginx and tomcat configurations, and they are identical on all the installations.

Actions #2

Updated by Fabio Sinibaldi almost 8 years ago

Paolo and I tried to find the difference together, but found no clues about the cause.
The only possible cause that comes to mind is that, as I remember, we created thredds-d-d4s from scratch, while thredds-pre-d4s comes from the old installation (which once served both the preproduction and development infrastructures). So maybe a slight difference between the 2 VMs (disks, net drivers...) might affect this behavior?

Actions #3

Updated by Andrea Dell'Amico almost 8 years ago

  • Status changed from New to In Progress

I also verified the VMs allocation and the storage placement. The servers run on identical hardware, while the -pre data filesystem is using a faster storage server.

The only difference I see is the kernel version: it's newer on the dev instance. I can update it on thredds-pre, but a reboot will be needed. Let me know.
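For reference, a kernel comparison like this can be scripted from the output of `uname -r` on each host. The sketch below only parses and compares release strings; the function name `kernel_tuple` and the two version strings are hypothetical examples, not the actual versions running on these servers.

```python
import re

# Matches the leading numeric part of a release string such as
# "4.4.0-96-generic": major.minor[.patch][-build]
_RELEASE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?")


def kernel_tuple(release):
    """Turn a `uname -r` string into a comparable tuple of integers.
    Missing components are treated as 0."""
    match = _RELEASE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


if __name__ == "__main__":
    # Hypothetical values, as if collected with `uname -r` on each host.
    dev_release = "4.4.0-96-generic"
    pre_release = "4.4.0-31-generic"
    if kernel_tuple(dev_release) > kernel_tuple(pre_release):
        print("dev runs a newer kernel than pre")
    elif kernel_tuple(dev_release) == kernel_tuple(pre_release):
        print("both hosts run the same kernel")
    else:
        print("pre runs a newer kernel than dev")
```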

Actions #4

Updated by Paolo Fabriani almost 8 years ago

Of course, I have no objections to the kernel update and reboot.

Actions #5

Updated by Andrea Dell'Amico almost 8 years ago

Done. Now the servers run the same kernel.

Actions #6

Updated by Paolo Fabriani almost 8 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

After some troubleshooting with Andrea and Fabio last Friday, the upload now works fine on thredds-pre-d4s.d4science.org.

Thank you.
