Linux

How to efficiently copy files from machineB and machineC to machineA using rsync or scp?

  • March 22, 2014

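I am running my shell script on machineA, and it copies files from machineB and machineC to machineA.

If a file is not on machineB, then it should definitely be on machineC. So I try to copy it from machineB first, and if it is not there, I go to machineC to copy the same file.

On machineB and machineC there will be a folder named in YYYYMMDD format inside this folder -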

/data/pe_t1_snapshot

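So whichever date is the latest in YYYYMMDD format inside the above folder, I will pick that folder as the full path from which I need to start copying files -

Suppose 20140317 is the latest date folder inside /data/pe_t1_snapshot; then this will be my full path -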

/data/pe_t1_snapshot/20140317

That is the folder from which I need to start copying files to machineA from machineB and machineC. I need to copy around 400 files to machineA, and each file is 1.5 GB in size.
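The "latest YYYYMMDD folder" rule described above can be sketched locally as follows. This is a minimal sketch under stated assumptions, not the script from this post: the function name `latest_snapshot_dir` is hypothetical, and it picks the newest folder by name (eight-digit YYYYMMDD names sort chronologically) rather than by modification time as `ls -dt` does. On the remote machines the same logic would have to be run over ssh.

```shell
# Hypothetical helper: print the newest YYYYMMDD-named directory under $1.
# Assumes folder names are exactly eight digits, so a plain sort
# orders them chronologically.
latest_snapshot_dir() {
    base=$1
    for d in "$base"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/; do
        # The trailing slash in the pattern restricts matches to directories.
        [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | sort | tail -n 1
}
```

Called as `latest_snapshot_dir /data/pe_t1_snapshot`, it would print the full path of the newest dated folder, or nothing if no such folder exists.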

Currently I have the shell script below, which works fine using scp, but somehow it takes about 5 hours to copy the 400 files to machineA, which I guess is too long for me. :(
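For scale, the figures above imply an average transfer rate that is easy to work out. The snippet below is only back-of-the-envelope arithmetic on the numbers stated in this post, with 1 GB counted as 1000 MB (an assumption made to keep the calculation in integers).

```shell
# Figures taken from the post: 400 files of 1.5 GB each, copied in about 5 hours.
# Assumption: 1 GB is counted as 1000 MB to keep the arithmetic in integers.
files=400
mb_per_file=1500
seconds=$((5 * 60 * 60))
total_mb=$((files * mb_per_file))
echo "total: $((total_mb / 1000)) GB"
echo "average: $((total_mb / seconds)) MB/s"
```

That is roughly 600 GB at about 33 MB/s sustained. Whether that rate is limited by the network, the disks, or per-file overhead cannot be told from these numbers alone.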

Below is my shell script -

#!/bin/bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9)
SECONDARY_PARTITION=(1 2 4 6 8)

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

if [ "$dir1" = "$dir2" ]
then
   # delete all the files first
   rm -rf $PRIMARY/*
   # below for-loop copies one file at a time in PRIMARY folder
   for el in "${PRIMARY_PARTITION[@]}"
   do
       scp david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
   done
   # delete all the files first
   rm -rf $SECONDARY/*
   # below for-loop copies one file at a time in SECONDARY folder
   for sl in "${SECONDARY_PARTITION[@]}"
   do
       scp david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
   done
fi

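I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.

Now my question is - how would I use rsync here instead of scp'ing the files? As I have read, rsync is much faster than scp'ing the files. I want to keep the same logic as in my shell script, but with rsync. I have never worked with rsync before, so I am running into some problems.

Can anyone provide an example?

Given my use case, will rsync be faster than scp? If not, what other options can I try to speed up the file transfer?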

Update:-

To clarify terdon's question -

In the question I am showing only 10 files, for example -

PRIMARY_PARTITION=(0 3 5 7 9)
SECONDARY_PARTITION=(1 2 4 6 8)

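Usually the PRIMARY_PARTITION array will hold around 150 file numbers, and SECONDARY_PARTITION will hold another 200 file numbers.

What I need to do is this: for every file number in PRIMARY_PARTITION, I need to look for that file in the machineB directory. If the file is there, copy it into the PRIMARY folder on machineA; if it is not on machineB, then it should be on machineC, so copy it from machineC and put it in the PRIMARY folder on machineA.

Similarly, I need to do the same for SECONDARY_PARTITION: look for those files in the machineB directory and, if they are there, copy them into the SECONDARY folder on machineA; if a file is not on machineB, then it should be on machineC, so copy it from machineC and put it in the SECONDARY folder on machineA.

So all the file numbers we have are in PRIMARY_PARTITION and SECONDARY_PARTITION.

In general, my PRIMARY_PARTITION and SECONDARY_PARTITION will look like this -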

PRIMARY_PARTITION=(0 548 272 4 544 276 8 556 280 12 552 284 16 256 564 20 260 560 24 264 572 28 268 568 516 304 32 512 308 36 524 312 40 520 316 44 288 532 48 292 528 52 296 540 56 300 536 60 68 608 340 64 336 76 348 72 344 84 324 80 320 92 332 88 328 576 372 100 580 368 96 584 380 108 588 376 104 356 592 116 352 596 112 364 600 124 360 604 120 136 408 140 412 128 400 132 404 152 392 156 396 144 384 148 388 440 168 444 172 432 160 436 164 424 184 428 188 416 176 420 180 204 476 200 472 196 468 192 464 220 460 216 456 212 452 208 448 508 236 504 232 500 228 496 224 492 252 488 248 484 244 480 240)

SECONDARY_PARTITION=(1101 1374 1641 1371 1647 1098 1635 1365 1095 1638 1089 1362 1659 1359 1119 1113 1662 1353 1350 1650 1110 1347 1653 1107 1134 1407 1611 1401 1131 1614 1602 1125 1398 1122 1605 1395 1389 1149 1626 1629 1146 1386 1617 1143 1383 1377 1623 1137 1305 1581 1578 1311 1299 1575 1302 1569 1599 1290 1593 1293 1590 1281 1587 1287 1551 1338 1341 1545 1071 1329 1542 1335 1539 1083 1566 1323 1086 1563 1326 1557 1074 1314 1317 1077 1554 1221 1494 1491 1218 1503 1230 1227 1497 1479 1239 1233 1473 1245 1485 1482 1242 1254 1527 1251 1521 1263 1533 1530 1257 1509 1269 1266 1506 1278 1518 1275 1515 1155 1425 1431 1158 1434 1161 1167 1437 1410 1170 1173 1413 1419 1179 1422 1182 1671 1458 1185 1665 1191 1461 1677 1194 1467 1470 1197 1674 1203 1443 1206 1446 1449 1209 1215 1455)

Another update:-

After removing 2>/dev/null, I ran the script again, but I got the following errors -

ssh: Could not resolve hostname machineB : Name or service not known
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(605) [Receiver=3.0.9]
ssh: Could not resolve hostname machineC : Name or service not known
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(605) [Receiver=3.0.9]
ssh: Could not resolve hostname machineB : Name or service not known
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(605) [Receiver=3.0.9]
ssh: Could not resolve hostname machineC : Name or service not known
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(605) [Receiver=3.0.9]

Any ideas? I replaced machineB and machineC with the actual machine names before running the shell script, and my system is -

root@machineA:/home/david# uname -a
Linux machineA 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Below is the shell script I am running -

#!/usr/bin/env bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9)
SECONDARY_PARTITION=(1 2 4 6 8)

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

## Build your list of filenames before the loop. 
for n in "${PRIMARY_PARTITION[@]}"
do
   primary_files="$primary_files :$dir1"/t1_weekly_1680_"$n"_200003_5.data
done

## Repeat for $SECONDARY_PARTITION
for n in "${SECONDARY_PARTITION[@]}"
do
   secondary_files="$secondary_files :$dir2"/t1_weekly_1680_"$n"_200003_5.data
done

echo $primary_files
echo $secondary_files


if [ "$dir1" = "$dir2" ]
then
   find "$PRIMARY" -mindepth 1 -delete
   find "$SECONDARY" -mindepth 1 -delete

   rsync -avz david@${FILERS_LOCATION[0]}"${primary_files}" $PRIMARY/
   rsync -avz david@${FILERS_LOCATION[1]}"${primary_files}" $PRIMARY/

   ## Do the same for $secondary_partition files
   rsync -avz david@${FILERS_LOCATION[0]}"${secondary_files}" $SECONDARY/
   rsync -avz david@${FILERS_LOCATION[1]}"${secondary_files}" $SECONDARY/
fi

I suspect the rsync syntax may be incorrect, because if I run a single command like this, it works fine -

rsync -avz david@machineB":/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data" /export/home/david/dist/primary
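One way to test that suspicion without touching the network is to print exactly what rsync would receive as its source argument. The sketch below rebuilds the string the same way the script's loop does, using a made-up two-file list; `show_args` is a hypothetical helper, and the directory is the example one from this post.

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    echo "$# argument(s)"
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

dir1=/data/pe_t1_snapshot/20140317
primary_files=""
for n in 0 3; do
    # Same construction as in the script: a space, then ":path".
    primary_files="$primary_files :$dir1"/t1_weekly_1680_"$n"_200003_5.data
done

# Quoted as in the script, the whole list is a single argument, and the
# text before the first ":" is "david@machineB " with a trailing space.
show_args david@machineB"${primary_files}"
```

The output shows one argument in which a space separates `machineB` from the first `:`. That matches the `Could not resolve hostname machineB :` message above, which suggests the host name is being read with a trailing space and that the file list reaches rsync as one argument rather than one per file.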

Another small update:-

If I run it like this -

root@machineA:/export/home/david# rsync -avz david@machineB':/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data :/data/pe_t1_snapshot/20140317/t1_weekly_1680_1_200003_5.data' /data01/primary
receiving incremental file list
rsync: change_dir "/home/david/:/data/pe_t1_snapshot/20140317" failed: No such file or directory (2)
t1_weekly_1680_0_200003_5.data

sent 30 bytes  received 504982813 bytes  6196108.50 bytes/sec
total size is 1761988281  speedup is 3.49
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [generator=3.0.9]

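The above command should have copied both files to the /data01/primary directory, but it copied only one file and not the second.

But this works fine, and the one file gets copied -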

root@machineA:/export/home/david# rsync -avz david@machineB':/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data' /data01/primary
receiving incremental file list
t1_weekly_1680_0_200003_5.data

sent 30 bytes  received 504982698 bytes  6351984.00 bytes/sec
total size is 1761988281  speedup is 3.49

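The main problem with your script is that you open a separate scp connection for each file, which adds a lot of unnecessary overhead. You could try something like this: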

#!/usr/bin/env bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot

PRIMARY_PARTITION=(0 548 272 4 544 276 8 556 280 12 552 284 16 256 564 20 260 560 24 264 572)
SECONDARY_PARTITION=(1101 1374 1641 1371 1647 1098 1635 1365 1095 1638 1089 1362 1659 1359)

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

## Build your list of filenames before the loop. Each entry is kept as a
## separate array element so that rsync receives one argument per file.
primary_files=()
for n in "${PRIMARY_PARTITION[@]}"
do
   primary_files+=( ":$dir1/t1_weekly_1680_${n}_200003_5.data" )
done

## Repeat for $SECONDARY_PARTITION
secondary_files=()
for n in "${SECONDARY_PARTITION[@]}"
do
   secondary_files+=( ":$dir2/t1_weekly_1680_${n}_200003_5.data" )
done

if [ "$dir1" = "$dir2" ]
then
   ## I am using find largely because the * 
   ## in rm -rf "$PRIMARY"/* screws up the syntax 
   ## highlighting on the site and it is a good habit to
   ## get into anyway. Feel free to use rm -rf in your script.
   find "$PRIMARY" -mindepth 1 -delete
   find "$SECONDARY" -mindepth 1 -delete

   ## rsync can be run with this format:
   ##   rsync user@host:/source/path1 :/source/path2 :/source/pathN /dest/path
   #
   ## which is why I added the : in the loop above. The host is attached to the
   ## first entry only and the rest are passed as separate arguments. So, these
   ## commands will open only 2 connections per file list. First you will try to
   ## copy all $PRIMARY_PARTITION files from machineB, then the same list from
   ## machineC. rsync will complain about files not found (which is why I'm
   ## redirecting standard error to /dev/null) but will continue.
   rsync -avz "david@${FILERS_LOCATION[0]}${primary_files[0]}" "${primary_files[@]:1}" "$PRIMARY/" 2>/dev/null
   rsync -avz "david@${FILERS_LOCATION[1]}${primary_files[0]}" "${primary_files[@]:1}" "$PRIMARY/" 2>/dev/null

   ## Do the same for the $SECONDARY_PARTITION files
   rsync -avz "david@${FILERS_LOCATION[0]}${secondary_files[0]}" "${secondary_files[@]:1}" "$SECONDARY/" 2>/dev/null
   rsync -avz "david@${FILERS_LOCATION[1]}${secondary_files[0]}" "${secondary_files[@]:1}" "$SECONDARY/" 2>/dev/null
fi
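The "try machineB first, then machineC" rule can also be made explicit by asking the second machine only for files that are still absent after the first pass. The sketch below shows just that bookkeeping step on the local side; `missing_files` is a hypothetical helper, not part of rsync or of the script above, and the file-name pattern is the one used in this post.

```shell
# Hypothetical helper: given a destination directory and a list of
# partition numbers, print the numbers whose data file is not there yet.
missing_files() {
    dest=$1
    shift
    for n in "$@"; do
        [ -e "$dest/t1_weekly_1680_${n}_200003_5.data" ] || printf '%s\n' "$n"
    done
}
```

After the first rsync, something like `missing_files "$PRIMARY" "${PRIMARY_PARTITION[@]}"` would list the partitions still to be requested from the second machine. It only checks that a file exists, not that it was copied completely.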

Quoted from: https://unix.stackexchange.com/questions/120536