We often need to transfer files between two hosts, for example for backups. It is also a simple task: scp or rsync handles it well. But when the file is very big, the transfer can take a long time. How can we transfer a big file at high speed? Here is one solution.
Copying a file
To copy an uncompressed file efficiently, we should follow these steps:
- Compress data
- Send it to another host
- Uncompress the data
- Verify the data integrity
This is efficient, and compressing before sending also saves bandwidth.
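The commands in the next sections cover the first three steps. For the last step, a minimal sketch (assuming sha256sum is available on both hosts; the paths match the examples below) is to compare checksums on both ends after the transfer:

sha256sum /home/yankay/data
ssh yankay01 "sha256sum /home/yankay/data"

If the two digests match, the file arrived intact.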
With GZIP+SCP
We can combine ZIP and SCP to achieve this.
gzip -c /home/yankay/data | ssh yankay01 "gunzip -c - > /home/yankay/data"
This command uses gzip to compress /home/yankay/data, streams the compressed data to host yankay01 over SSH, and decompresses it there into the same path.
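As a side note, if we want to watch the throughput while the pipe runs, we can insert pv into the pipeline (assuming pv is installed locally; this is an optional addition, not part of the original command):

gzip -c /home/yankay/data | pv | ssh yankay01 "gunzip -c - > /home/yankay/data"

pv reports the rate of the compressed stream, which makes it easy to see when the pipe stalls.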
The data file is 1.1 GB, and it shrinks to 183 MB after gzip compression. The command above takes 45.6 s, an average throughput of 24.7 MB/s. Actually, scp has compression capability as well, so we can write an equivalent command as:
scp -C -c blowfish /home/yankay/data yankay01:/home/yankay/data
The end result of both commands is the same. The difference here is that -C enables compression, while -c blowfish selects the Blowfish cipher for encryption, which is faster than the default cipher.
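Note that recent OpenSSH releases have removed the blowfish cipher, so this exact command may fail on a modern system; a comparable experiment today could substitute a fast supported cipher such as aes128-ctr (a hedged substitution, not from the original article):

scp -C -c aes128-ctr /home/yankay/data yankay01:/home/yankay/data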
The above command takes about 45 s again, an average throughput of 24 MB/s, which is not much of an improvement. It seems the bottleneck is not on the network side.
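One quick way to confirm where the bottleneck is (a diagnostic sketch, not from the original article) is to time the compression alone, with the network taken out of the pipeline:

time gzip -c /home/yankay/data > /dev/null

If this step alone takes on the order of 45 s, the compressor, not the network, is the limiting component.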
Then what is the bottleneck?
Performance analysis
We need to define some variables:
- The compression ratio of the compression tool is CompressRatio (compressed size divided by original size)
- The compression throughput is CompressSpeed MB/s
- The throughput of the network is NetSpeed MB/s
Because we use a pipe, its overall performance is determined by the slowest component, so the overall speed is:

speed = min(NetSpeed / CompressRatio, CompressSpeed)

When the compression throughput (CompressSpeed) is less than NetSpeed / CompressRatio, the bottleneck is the compression; otherwise, it is the network. We have our test data below (the last three columns give the resulting speed, in MB/s, at three network speeds):

| Algorithm | Compression ratio | CompressSpeed (MB/s) | Net = 100 MB/s | Net = 62 MB/s | Net = 10 MB/s |
| --- | --- | --- | --- | --- | --- |
| ZLIB | 35.8% | 9.6 | 9.6 | 9.6 | 9.6 |
| LZO | 54.4% | 101.7 | 101.7 | 101.7 | 18.4 |
| LIBLZF | 54.6% | 134.3 | 134.3 | 113.6 | 18.3 |
| QUICKLZ | 54.9% | 183.4 | 182.1 | 112.9 | 18.2 |
| FASTLZ | 56.2% | 134.4 | 134.4 | 110.3 | 17.8 |
| SNAPPY | 59.8% | 189.0 | 167.2 | 103.7 | 16.7 |
| NONE | 100% | 300.0 | 100.0 | 62.0 | 10.0 |
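To make the formula concrete, here is a small sketch (not from the original article) that recomputes one cell of the table, QuickLZ at NetSpeed = 100 MB/s, using the values above:

awk 'BEGIN { net=100; ratio=0.549; comp=183.4; eff=net/ratio; print ((eff < comp) ? eff : comp) }'

This prints about 182.1: the network can deliver 100 / 0.549 ≈ 182.1 MB/s worth of original data, which is just below QuickLZ's 183.4 MB/s compression speed, so the network is the (marginal) bottleneck for that cell.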
From the table we can see that when the network speed is 100 MB/s, QuickLZ gives the best overall performance. If we use SSH as the data transfer protocol, we will not achieve that best performance, because SSH's encryption overhead caps the effective network speed. At 10 MB/s, all algorithms perform almost the same, with QuickLZ still slightly ahead.
The best algorithm differs for different data and hosts, but one thing is certain: we should pick a compressor fast enough that the bottleneck ends up on the network side.
Conclusion
According to the analysis above, we should not use SSH as the network transfer protocol; we can use nc (netcat) instead to improve performance, and qpress (a QuickLZ-based tool) for compression.
scp /usr/bin/qpress yankay01:/usr/bin/qpress
ssh yankay01 "nc -l 12345 | qpress -dio > /home/yankay/data" &
qpress -o /home/yankay/data | nc yankay01 12345
The first line installs qpress on the remote machine, the second uses nc to listen on port 12345 on the remote host and decompress whatever arrives, and the third compresses the data and sends it to that port.
These commands take 2.8 s to run, an average throughput of 402 MB/s, about 16 times faster than GZIP+SCP.
Source : http://www.yankay.com/linux%E5%A4%A7%E6%96%87%E4%BB%B6%E4%BC%A0%E8%BE%93/