Directory structure for a huge number of files
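I have millions of files whose names are the SHA256 hash of their contents. For performance reasons, I don't want to store them all in a single directory. My idea is to build a directory structure based on the SHA256 hash. For example, the directory
./A/A/A/A/A
would contain all files whose names start with AAAAA. For example, one hash is
AAAAAFF02F52AA70E57EA3FD67019A7A919D373915AA30393936D9CC
Is there a way to create such a directory structure automatically, with a defined number of subdirectory levels (say 5 or 10)?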
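Here is a framework for solving your problem, which you will probably need to tailor to your own needs. It contains quite a lot of debugging and test-file creation. The test log comes first, then the script. Questions are welcome.
I assumed you have about 16 million files (actually 16 * 1024 * 1024). So I chose a tree with 256 directories at the first level and 256 directories under each of those at the second level (65,536 directories at that level), which gives an average of 256 files per directory. If you prefer a 4- or 5-level scheme, the changes are small (ask if they are not obvious).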
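As a rough illustration of how the scheme generalizes to more levels, here is a minimal sketch of a function that maps a name to its directory. The function hashDir and its LEVELS and WIDTH parameters are hypothetical names introduced for this example; they are not part of the script below.

```bash
#!/bin/bash

# hashDir NAME [LEVELS] [WIDTH]
# Print the relative directory for NAME, built from LEVELS path components
# of WIDTH hex characters each. The defaults (2 and 2) correspond to the
# 256 x 256 layout described above.
# hashDir is a hypothetical helper, not part of the original script.
hashDir () {
    local name=$1 levels=${2:-2} width=${3:-2}
    local dir="" i
    for (( i = 0; i < levels; i++ )); do
        # Take the i-th group of WIDTH characters from the name.
        dir+="${name:i*width:width}/"
    done
    printf '%s\n' "${dir%/}"
}

name=4E8A34A5010C507ADF81E3D9EEC6330A9E866D3B70857111D3A9DF5C5008BA9D
hashDir "${name}"        # 4E/8A
hashDir "${name}" 3      # 4E/8A/34
hashDir "${name}" 5 1    # 4/E/8/A/3
```

Each additional 2-character level multiplies the number of leaf directories by 256. With 16 * 1024 * 1024 files, a third level would already give 16,777,216 leaf directories, or about one file per directory, so deeper trees only pay off for much larger collections.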
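I use awk to turn the sha256 names into mv commands and pipe those to bash (about 2 lines from the end). I suggest you remove the | bash until you are sure the commands are what you expect. There are also some cleanup lines (rm -rf) and some debugging (ls and find) that you will want to remove or comment out.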
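To make the dry-run idea concrete, here is a small sketch of the awk stage on its own, printing the mv commands rather than executing them. The wrapper name genMoves is hypothetical, and the hex test is spelled out as [0-9A-Fa-f] here (an assumption made for portability, since some older awk implementations lack POSIX character classes); the script below uses [[:xdigit:]].

```bash
#!/bin/bash

Tree="sha256"

# genMoves: read path names on stdin, print one mv command per name that
# looks like a sha256 (64 hex characters). Nothing is moved.
# genMoves is a hypothetical wrapper around the same awk idea as the script.
genMoves () {
    awk -v Tree="${Tree}" '
    BEGIN { FS = "/"; cmd = "mv -t \047%s/%s/%s\047 \047%s\047\n"; }
    length ($NF) == 64 && $NF ~ /^[0-9A-Fa-f]*$/ {
        printf (cmd, Tree, substr ($NF, 1, 2), substr ($NF, 3, 2), $NF);
    }
    '
}

# Dry run on two sample names: the second is not a hash and is skipped.
printf './%s\n' \
    4E8A34A5010C507ADF81E3D9EEC6330A9E866D3B70857111D3A9DF5C5008BA9D \
    myTree |
    genMoves
```

For the sample input this prints a single line, mv -t 'sha256/4E/8A' followed by the quoted file name. Only once the printed commands look right would the output be piped to bash.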
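The script is run in the directory that holds all the millions of files, and the "Tree" variable is where the whole tree is located. This needs to be on the same filesystem, so that mv only has to change the directory entry for each file: if source and destination are on different filesystems, mv behaves like cp instead, and the program will run "forever". Here is the complete test run: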
    /home/paul/SandBox/Toys/dirTree
    total 8
    -rwxr-xr-x 1 paul paul 1841 Dec 12 23:59 myTree
    -rw-r--r-- 1 paul paul 32 Dec 12 23:59 myTree.log
    Making test files ...
    total 544
    -rw-r--r-- 1 paul paul 17545 Dec 12 23:59 4E8A34A5010C507ADF81E3D9EEC6330A9E866D3B70857111D3A9DF5C5008BA9D
    -rw-r--r-- 1 paul paul 90655 Dec 12 23:59 590A853F3C97C05BB55BBBDFBA988210066807C188E54B78F340F01DD48C0AF5
    -rw-r--r-- 1 paul paul 2685 Dec 12 23:59 5B02D3A74A2E4B433D0C7DEE57446460CCE6661E3CE59918B5243B903E3358A6
    -rw-r--r-- 1 paul paul 2051 Dec 12 23:59 6F48C65B219CA78B8C7FF03F12A8E27E1A298A541481E9B3C86645B622DCB5B1
    -rw-r--r-- 1 paul paul 13545 Dec 12 23:59 8FED5B546352BF30E3B98E7EB8EB916DA5E2814B4227461F01173547263BB257
    -rw-r--r-- 1 paul paul 311346 Dec 12 23:59 A8BCA57679FE42C2902D2AE70804C4C93088079B167BC99A7827295FFB34D32E
    -rw-r--r-- 1 paul paul 2092 Dec 12 23:59 B31470EC1AF3204CF2327F12A48296F8161B51E3C30679EFA71E65AA882DCED4
    -rw-r--r-- 1 paul paul 4602 Dec 12 23:59 C2BFA9351040ABA8F36990D0C3E3E32F70F2E94EF8D05AEDC5EE3B32270953D3
    -rw-r--r-- 1 paul paul 7687 Dec 12 23:59 C9FA54EEF557DE7B67B66A145CCC0D65037117F0EDFF8EFCE694B56C4A6F7FEB
    -rw-r--r-- 1 paul paul 71752 Dec 12 23:59 D3040C6348CB758498988DA5FAF086666553B27CFB7E591B5E7616C3E8373068
    -rwxr-xr-x 1 paul paul 1841 Dec 12 23:59 myTree
    -rw-r--r-- 1 paul paul 204 Dec 12 23:59 myTree.log
    real 0m1.992s
    user 0m2.196s
    sys 0m0.296s
    Making directory tree ...
    /home/paul/SandBox/Toys/dirTree/sha256
    256
    65536
    262748 4 drwxr-xr-x 2 paul paul 4096 Dec 12 23:59 ./01/B4
    262811 4 drwxr-xr-x 2 paul paul 4096 Dec 12 23:59 ./01/F3
    262752 4 drwxr-xr-x 2 paul paul 4096 Dec 12 23:59 ./01/B8
    262644 4 drwxr-xr-x 2 paul paul 4096 Dec 12 23:59 ./01/4C
    real 3m47.220s
    user 0m42.700s
    sys 0m32.900s
    135626 72 -rw-r--r-- 1 paul paul 71752 Dec 12 23:59 sha256/D3/04/D3040C6348CB758498988DA5FAF086666553B27CFB7E591B5E7616C3E8373068
    135639 308 -rw-r--r-- 1 paul paul 311346 Dec 12 23:59 sha256/A8/BC/A8BCA57679FE42C2902D2AE70804C4C93088079B167BC99A7827295FFB34D32E
    135641 20 -rw-r--r-- 1 paul paul 17545 Dec 12 23:59 sha256/4E/8A/4E8A34A5010C507ADF81E3D9EEC6330A9E866D3B70857111D3A9DF5C5008BA9D
    133634 4 -rw-r--r-- 1 paul paul 2685 Dec 12 23:59 sha256/5B/02/5B02D3A74A2E4B433D0C7DEE57446460CCE6661E3CE59918B5243B903E3358A6
    135640 92 -rw-r--r-- 1 paul paul 90655 Dec 12 23:59 sha256/59/0A/590A853F3C97C05BB55BBBDFBA988210066807C188E54B78F340F01DD48C0AF5
    135625 8 -rw-r--r-- 1 paul paul 7687 Dec 12 23:59 sha256/C9/FA/C9FA54EEF557DE7B67B66A145CCC0D65037117F0EDFF8EFCE694B56C4A6F7FEB
    135642 16 -rw-r--r-- 1 paul paul 13545 Dec 12 23:59 sha256/8F/ED/8FED5B546352BF30E3B98E7EB8EB916DA5E2814B4227461F01173547263BB257
    135624 4 -rw-r--r-- 1 paul paul 2051 Dec 12 23:59 sha256/6F/48/6F48C65B219CA78B8C7FF03F12A8E27E1A298A541481E9B3C86645B622DCB5B1
    135628 4 -rw-r--r-- 1 paul paul 2092 Dec 12 23:59 sha256/B3/14/B31470EC1AF3204CF2327F12A48296F8161B51E3C30679EFA71E65AA882DCED4
    135630 8 -rw-r--r-- 1 paul paul 4602 Dec 12 23:59 sha256/C2/BF/C2BFA9351040ABA8F36990D0C3E3E32F70F2E94EF8D05AEDC5EE3B32270953D3
    real 0m2.118s
    user 0m0.528s
    sys 0m1.520s
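Because the quick move depends on the files and the tree living on the same filesystem, it may be worth checking that before starting. The following is only a rough sketch: sameFs is a hypothetical helper, and it assumes GNU stat (the -c '%d' option prints the device number), which may not be available on non-Linux systems.

```bash
#!/bin/bash

# sameFs PATH1 PATH2
# Succeed (status 0) if both paths report the same device number,
# fail with 1 if they differ, or with 2 if stat cannot read a path.
# Assumes GNU stat; sameFs is a hypothetical helper, not part of the script.
sameFs () {
    local a b
    a=$( stat -c '%d' -- "$1" ) || return 2
    b=$( stat -c '%d' -- "$2" ) || return 2
    [ "${a}" = "${b}" ]
}

Tree="sha256"

# Warn only if the tree already exists and sits on another filesystem.
if [ -d "${Tree}" ] && ! sameFs . "${Tree}"; then
    echo "Warning: ${Tree} is on a different filesystem; mv will copy." >&2
fi
```

Comparing device numbers is a heuristic rather than a guarantee, so a trial move of a handful of files remains a sensible first step.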
Here is the script:
    #! /bin/bash

    Tree="sha256"

    #.. Fake some test files from man pages, renamed with their own sha256.

    mkFile () {

        local Fn
        local Awk='{ printf ("%s\n", toupper ($(NF))); }'

        man -s 1 "${1}" > Man
        Fn=$( openssl dgst -sha256 Man | awk "${Awk}" )
        mv Man "${Fn}"
    }

    #.. Make a directory tree for the first 4 hex characters of any name,
    #.. such that files 7BC12A13... go into ./7B/C1.

    mkDirs () {

        local a b c d

        for a in {0..9} {A..F}; do
            for b in {0..9} {A..F}; do
                for c in {0..9} {A..F}; do
                    for d in {0..9} {A..F}; do
                        mkdir -p ./${a}${b}/${c}${d}
                    done
                done
            done
        done
    }

    #.. Move all files in the current directory that have sha256-type names
    #.. into their appropriate directory.

    mvFiles () {

        local Awk='
    BEGIN { FS = "/"; cmd = "mv -t \047%s/%s/%s\047 \047%s\047\n"; }
    length ($NF) == 64 && $NF ~ /^[[:xdigit:]]*$/ {
        printf (cmd, Tree, substr ($NF, 1, 2), substr ($NF, 3, 2), $NF);
    }
    '
        awk -v Tree="${Tree}" -f <( printf '%s' "${Awk}" ) -
    }

    #.. Tests.

    #.. Nothing up my sleeves.
    pwd
    ls -l

    #.. Make some test files.
    [ x ] && time (
        echo "Making test files ..."
        for tx in cut cat ls find wc dd bash awk vi dc; do
            mkFile "${tx}"
        done
        ls -l
    )

    #.. Make a directory tree.
    [ x ] && time (
        echo "Making directory tree ..."
        rm -rf ${Tree}
        mkdir -p ${Tree}
        cd ${Tree} || exit
        pwd
        mkDirs
        ls -d * | wc -l
        ls -d */* | wc -l
        find . -ls | tail -n +1299 | head -n 4
    )

    #.. Move all the local sha256 files into the tree.
    [ x ] && time (
        find . -maxdepth 1 -type f | mvFiles "${Tree}" | bash
        find "${Tree}" -type f -ls
    )
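In the test run, the mkDirs step took almost four minutes, most likely because it starts one mkdir process for each of the 65,536 directories. As a possible alternative, the sketch below generates the same names with bash brace expansion and hands them to mkdir in batches through xargs. The names dirList and mkDirsFast are hypothetical and not part of the script above, and no timing has been measured for this variant.

```bash
#!/bin/bash

# dirList: print the 65,536 two-level names 00/00 .. FF/FF, one per line.
# {{0..9},{A..F}} expands to the 16 hex digits; printf is a shell builtin,
# so the long argument list is not limited by the kernel's exec limit.
# dirList is a hypothetical helper, not part of the original script.
dirList () {
    printf '%s\n' {{0..9},{A..F}}{{0..9},{A..F}}/{{0..9},{A..F}}{{0..9},{A..F}}
}

# mkDirsFast: create the tree under the current directory.
# xargs passes many names to each mkdir call instead of one per process.
# Defined here but not called; it is meant to be run inside the tree root.
mkDirsFast () {
    dirList | xargs mkdir -p
}
```

Under these assumptions, mkDirsFast could be called in place of mkDirs after the cd into the tree directory. The names contain no spaces or quotes, so plain xargs splitting is safe here.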