Shell
How can I move files into different directories based on their content?
I have a lot of files that contain a string like this:
/databis/defontis/Dossier_fasta_chrm_avec_piler/SRR6237661_chrm.fasta: N putative CRISPR arrays found
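where `N` is a number that can be 0 or greater. I need to move all files where `N` is 0 into the directory `Sans_crispr`, and all files where `N` is greater than 0 into the directory `Avec_crispr`.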
I can also see with `ls` that all the files where no CRISPR was found (those where `N` is 0) are smaller than 3355 bytes, so maybe that could be used. I tried this:
find . -name "*.out" -type 'f' -size -5k -exec mv {} /databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans_Crispr/ \;
But for every one of my files, I get this:
mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory
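I tried a few `for f in ...; do ...; done` loops and `if ...; then ...; fi` tests. I also tried to `grep` for the pattern ' 0 putative CRISPR arrays found', but none of it worked: I always got an error or did not get what I wanted. Here is an example of my files: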
This is the content. With CRISPR:
Help on reading this report =========================== This report has three sections: Detailed, Summary by Similarity and Summary by Position. The detailed section shows each repeat in each putative CRISPR array. The summary sections give one line for each array. An 'array' is a contiguous sequence of CRISPR repeats looking like this: REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT Within one array, repeats have high similarity and spacers are, roughly speaking, unique within a window around the array. In a given array, each repeat has a similar length, and each spacer has a similar length. With default parameters, the algorithm allows a fair amount of variability in order to maximize sensitivity. This may allow identification of inactive ("fossil") arrays, and may in rare cases also induce false positives due to other classes of repeats such as microsatellites, LTRs and arrays of RNA genes. Columns in the detailed section are: Pos Sequence position, starting at 1 for the first base. Repeat Length of the repeat. %id Identity with the consensus sequence. Spacer Length of spacer to the right of this repeat. Left flank 10 bases to the left of this repeat. Repeat Sequence of this repeat. Dots indicate positions where this repeat agrees with the consensus sequence below. Spacer Sequence of spacer to the right of this repeat, or 10 bases if this is the last repeat. The left flank sequence duplicates the end of the spacer for the preceding repeat; it is provided to facilitate visual identification of cases where the algorithm does not correctly identify repeat endpoints. At the end of each array there is a sub-heading that gives the average repeat length, average spacer length and consensus sequence. Columns in the summary sections are: Array Number 1, 2 ... referring back to the detailed report. Sequence FASTA label of the sequence. May be truncated. From Start position of array. To End position of array. # copies Number of repeats in the array. 
Repeat Average repeat length. Spacer Average spacer length. + +/-, indicating orientation relative to first array in group. Distance Distance from previous array. Consensus Consensus sequence. In the Summary by Similarity section, arrays are grouped by similarity of their consensus sequences. If consensus sequences are sufficiently similar, they are aligned to each other to indicate probable relationships between arrays. In the Summary by Position section, arrays are sorted by position within the input sequence file. The Distance column facilitates identification of cases where a single array has been reported as two adjacent arrays. In such a case, (a) the consensus sequences will be similar or identical, and (b) the distance will be approximately a small multiple of the repeat length + spacer length. Use the -noinfo option to turn off this help. Use the -help option to get a list of command line options. pilercr v1.06 By Robert C. Edgar /databis/defontis/Dossier_fasta_chrm_avec_piler/SRR2177954_chrm.fasta: 1 putative CRISPR arrays found. DETAIL REPORT Array 1 >SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453 Pos Repeat %id Spacer Left flank Repeat Spacer ========== ====== ====== ====== ========== ==================================== ====== 66 36 100.0 25 CAGAAGTATT .................................... CTCACACACGCTGATGCAGACAACA 127 36 100.0 26 GCAGACAACA .................................... GCGAGAGCAGGGATTTGGAACGTAAT 189 36 100.0 26 GGAACGTAAT .................................... ATGTTGATGGAAAAACTCCCACAGAC 251 36 100.0 TCCCACAGAC .................................... 
ACTGAATGTG ========== ====== ====== ====== ========== ==================================== 4 36 25 ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC SUMMARY BY SIMILARITY Array Sequence Position Length # Copies Repeat Spacer + Consensus ===== ================ ========== ========== ======== ====== ====== = ========= 1 SRR2177954.k141_ 66 221 4 36 25 + ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC SUMMARY BY POSITION >SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453 Array Sequence Position Length # Copies Repeat Spacer Distance Consensus ===== ================ ========== ========== ======== ====== ====== ========== ========= 1 SRR2177954.k141_ 66 221 4 36 25 ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC
Without CRISPR:
Help on reading this report =========================== This report has three sections: Detailed, Summary by Similarity and Summary by Position. The detailed section shows each repeat in each putative CRISPR array. The summary sections give one line for each array. An 'array' is a contiguous sequence of CRISPR repeats looking like this: REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT Within one array, repeats have high similarity and spacers are, roughly speaking, unique within a window around the array. In a given array, each repeat has a similar length, and each spacer has a similar length. With default parameters, the algorithm allows a fair amount of variability in order to maximize sensitivity. This may allow identification of inactive ("fossil") arrays, and may in rare cases also induce false positives due to other classes of repeats such as microsatellites, LTRs and arrays of RNA genes. Columns in the detailed section are: Pos Sequence position, starting at 1 for the first base. Repeat Length of the repeat. %id Identity with the consensus sequence. Spacer Length of spacer to the right of this repeat. Left flank 10 bases to the left of this repeat. Repeat Sequence of this repeat. Dots indicate positions where this repeat agrees with the consensus sequence below. Spacer Sequence of spacer to the right of this repeat, or 10 bases if this is the last repeat. The left flank sequence duplicates the end of the spacer for the preceding repeat; it is provided to facilitate visual identification of cases where the algorithm does not correctly identify repeat endpoints. At the end of each array there is a sub-heading that gives the average repeat length, average spacer length and consensus sequence. Columns in the summary sections are: Array Number 1, 2 ... referring back to the detailed report. Sequence FASTA label of the sequence. May be truncated. From Start position of array. To End position of array. # copies Number of repeats in the array. 
Repeat Average repeat length. Spacer Average spacer length. + +/-, indicating orientation relative to first array in group. Distance Distance from previous array. Consensus Consensus sequence. In the Summary by Similarity section, arrays are grouped by similarity of their consensus sequences. If consensus sequences are sufficiently similar, they are aligned to each other to indicate probable relationships between arrays. In the Summary by Position section, arrays are sorted by position within the input sequence file. The Distance column facilitates identification of cases where a single array has been reported as two adjacent arrays. In such a case, (a) the consensus sequences will be similar or identical, and (b) the distance will be approximately a small multiple of the repeat length + spacer length. Use the -noinfo option to turn off this help. Use the -help option to get a list of command line options. pilercr v1.06 By Robert C. Edgar /databis/defontis/Dossier_fasta_chrm_avec_piler/ERR1544006_chrm.fasta: 0 putative CRISPR arrays found.
Thank you for your time.
Just iterate over the files and `grep` for `: 0 putative CRISPR arrays`. If `grep` finds a match, move the file to `Sans_crispr`; otherwise move it to `Avec_crispr`:
mkdir -p Sans_crispr Avec_crispr
for file in *pilercr.out; do
    if grep -q ': 0 putative CRISPR arrays' "$file"; then
        mv "$file" Sans_crispr
    else
        mv "$file" Avec_crispr
    fi
done
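The `if` in the loop tests nothing but the exit status of `grep`. Here is a minimal sketch of that behaviour, using throwaway sample files whose names and contents are made up for illustration (they are not real pilercr reports):

```shell
# Create three tiny stand-in reports in a temporary directory (hypothetical names).
tmp=$(mktemp -d)
printf '%s\n' 'a.fasta: 0 putative CRISPR arrays found.'  > "$tmp/none.out"
printf '%s\n' 'b.fasta: 1 putative CRISPR arrays found.'  > "$tmp/one.out"
printf '%s\n' 'c.fasta: 10 putative CRISPR arrays found.' > "$tmp/ten.out"

# grep -q prints nothing; the exit status alone says whether the pattern matched.
status_none=0; grep -q ': 0 putative CRISPR arrays' "$tmp/none.out" || status_none=$?
status_one=0;  grep -q ': 0 putative CRISPR arrays' "$tmp/one.out"  || status_one=$?
status_ten=0;  grep -q ': 0 putative CRISPR arrays' "$tmp/ten.out"  || status_ten=$?

echo "none.out: $status_none"   # 0: the pattern was found
echo "one.out:  $status_one"    # 1: no match
echo "ten.out:  $status_ten"    # 1: ': 10 ' does not contain ': 0 '

rm -r "$tmp"
```

Keeping the leading `: ` in the pattern is what stops counts such as 10 or 20 from being mistaken for 0.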
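The `-q` flag tells `grep` not to print any output, but it still exits with a failure status if no match is found and with a success status if a match is found. So here we use that exit status to move each file to the appropriate folder. The reason you got this error: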
mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory
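is that the directory
/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/
does not exist. That is why the first command in the little script above is `mkdir -p Sans_crispr Avec_crispr`, which means "create the directories Sans_crispr and Avec_crispr unless they already exist".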
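If it helps, the same idea can be wrapped in a function that also copes with a file that has no summary line at all. This is only a sketch under some assumptions: `sort_reports` is a hypothetical name, the reports are assumed to end in `pilercr.out`, and each report is assumed to contain at most one "N putative CRISPR arrays found" line.

```shell
# sort_reports DIR
# Hypothetical helper: sorts DIR/*pilercr.out into DIR/Sans_crispr (N = 0)
# and DIR/Avec_crispr (N > 0). Files without a summary line are left in place.
sort_reports() {
    dir=$1
    # -p creates the directories if needed and is silent if they already exist,
    # so mv never sees a missing destination.
    mkdir -p "$dir/Sans_crispr" "$dir/Avec_crispr" || return 1

    for file in "$dir"/*pilercr.out; do
        [ -f "$file" ] || continue   # the glob matched nothing

        # Extract N from "<path>: N putative CRISPR arrays found".
        n=$(sed -n 's/.*: \([0-9][0-9]*\) putative CRISPR arrays found.*/\1/p' "$file" | head -n 1)

        case $n in
            '') echo "skipping $file: no summary line found" >&2 ;;
            0)  mv "$file" "$dir/Sans_crispr/" ;;
            *)  mv "$file" "$dir/Avec_crispr/" ;;
        esac
    done
}

# Example call, run from the directory that holds the reports:
# sort_reports .
```

Compared with the plain loop, a report that was cut short or is otherwise missing its summary line stays where it is instead of being counted as "with CRISPR".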