Text-Processing
Arithmetic between 2 files using awk to generate a series of new files
I have a tab-delimited model input file that I want to vary for an ensemble analysis. It has this format:
    cat input.txt
    #############################################
    ### Parameter file for the program ###
    #############################################
    ### GENERAL PARAMETERS
    4 /* nbout # Number of outputs */
    46 /* numesp # Number of species */
    0.05 /* p # light incidence param (diff through turbid medium) */
    0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
    1 /* vox_la_max. The max voxel leaf area. */
    0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
    0.1 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
    0.05 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
    ### Species description
    **** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
    Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
    Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
    Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
    ### Climate (input environment)
    25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
I have another tab-delimited file of multipliers, drawn from a distribution, in the following format:
    cat multipliers.txt
    2 3 4
    3 2 2
    4 3 3
I am trying to multiply 3 specific input fields by the multipliers, generating as many new input files as there are rows of multipliers (3 in this case) while leaving the rest of the input file unchanged. Here I want to multiply `vox_la_max`, `knockout_max` and `shed_prob` by 2, 3 and 4 respectively for the first file, by 3, 2 and 2 for the second file, and by 4, 3 and 3 for the third file. That would give me 3 new files, like this:

    cat input1.txt
    #############################################
    ### Parameter file for the program ###
    #############################################
    ### GENERAL PARAMETERS
    4 /* nbout # Number of outputs */
    46 /* numesp # Number of species */
    0.05 /* p # light incidence param (diff through turbid medium) */
    0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
    2 /* vox_la_max. The max voxel leaf area. */
    0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
    0.3 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
    0.2 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
    ### Species description
    **** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
    Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
    Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
    Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
    ### Climate (input environment)
    25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
    cat input2.txt
    #############################################
    ### Parameter file for the program ###
    #############################################
    ### GENERAL PARAMETERS
    4 /* nbout # Number of outputs */
    46 /* numesp # Number of species */
    0.05 /* p # light incidence param (diff through turbid medium) */
    0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
    3 /* vox_la_max. The max voxel leaf area. */
    0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
    0.2 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
    0.1 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
    ### Species description
    **** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
    Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
    Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
    Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
    ### Climate (input environment)
    25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
    cat input3.txt
    #############################################
    ### Parameter file for the program ###
    #############################################
    ### GENERAL PARAMETERS
    4 /* nbout # Number of outputs */
    46 /* numesp # Number of species */
    0.05 /* p # light incidence param (diff through turbid medium) */
    0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
    4 /* vox_la_max. The max voxel leaf area. */
    0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
    0.3 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
    0.15 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
    ### Species description
    **** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
    Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
    Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
    Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
    ### Climate (input environment)
    25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
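I think I should be using awk, but so far I have only managed to change one parameter at a time, using a single column of the multipliers file, and I need to change all 3 parameters at once. What kind of script could I set up to generate these outputs?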
**TL;DR:** a compact `awk` script hard-coded for your example:

    NR != FNR {
        out = "out" FNR ".txt"
        printf "" > out
        for (l=m=1; l <= nl; l++)
            printf tmpl[l] ORS, l in vals ? $(m++)*vals[l] : 0 >> out
        close(out)
        next
    }
    {
        gsub(/%/, "%%")
        # here is the regex that selects the fields by their name
        if ($3 ~ /^(vox_la_max|knockout_max|shed_prob)[^[:alnum:]_]*$/) {
            vals[NR] = $1
            sub(/^[0-9]+(\.[0-9]+)?/, OFMT)
        }
        tmpl[NR] = $0; nl++
    }
Use it as:

    LC_NUMERIC=C awk -f script input.txt multipliers.txt
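It generates files named `outX.txt`, where X is the row number in the multipliers file. The `LC_NUMERIC=C` bit is needed if your locale would otherwise use a comma instead of a dot as the decimal separator for floating-point values. For simplicity I made a few reasonable-looking assumptions:
- the wanted input fields are always stand-alone values, each followed by a comment whose first word is the field's name; that name must be separated by whitespace (at least one space) from the `/*`
- there are no fields with the same name
- floating-point values are written with digits and (possibly) one dot only, i.e. no exponent or other scientific notation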
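The last assumption is easy to check in isolation. A minimal sketch, using made-up sample values, that applies the same `sub()` pattern as the script and replaces the recognized number with a visible `<NUM>` marker (the marker is only for illustration; the script substitutes `OFMT` instead):

```shell
# Apply the script's numeric pattern to three sample values (made up for illustration).
# Anything left after the <NUM> marker was NOT recognized as part of the number.
result=$(printf '%s\n' '0.05' '12' '1.5e-3' |
  awk '{ sub(/^[0-9]+(\.[0-9]+)?/, "<NUM>"); print }')
printf '%s\n' "$result"
```

The third value keeps a trailing `e-3`, so a field written in exponent notation would be multiplied using only its mantissa and then printed with the exponent text still attached.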
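The same script as above, but verbose, commented and extended to allow:
- arbitrary specification of the wanted fields by line number
- arbitrary specification of the wanted fields by name, as given in the comment on each field's input line
- output files automatically named after the input file name, which may have one extension (e.g. .txt) and whose path, if any, must not contain dots; in other words, it is best to run the script from the directory that contains the input file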
    # some preparations
    BEGIN {
        # output files named as the input file name
        split(ARGV[1], f, ".")
        outpfx = f[1]
        # remember wanted fields specified on command line as comma-separated line numbers
        if (nums) {
            # split variable "nums" on comma into helper array "r"
            n = split(nums, r, ",")
            # loop over helper array to build final array, thus indexed by wanted line numbers
            while (n) rows[r[n--]]
        }
    }
    # here we operate on multipliers file
    NR != FNR {
        # output file name for this set of multipliers
        out = outpfx FNR ".txt"
        # create/overwrite this output file
        printf "" > out
        # loop over template lines scanned from input file
        for (linenum = multnum = 1; linenum <= numlines; linenum++)
            # use the template line as printf format string to consume values to be multiplied (if any)
            printf tmpl[linenum] ORS, linenum in wanted_values ? $(multnum++)*wanted_values[linenum] : 0 >> out
        close(out)
        next
    }
    # here we scan the input file to build a template for printf
    {
        # escape existing % chars as we are going to leverage printf's own format string which is %-based
        gsub(/%/, "%%")
        # on specified line numbers or named fields:
        if (NR in rows || names && match($3, "^("names")[^[:alnum:]_]*$")) {
            # remember this value
            wanted_values[NR] = $1
            # replace the original value with the printf's conversion specification for floating-point values
            # it will be used by printf later on while processing the multipliers file
            sub(/^[0-9]+(\.[0-9]+)?/, OFMT)
        }
        # remember this whole line as a template
        tmpl[NR] = $0; numlines++
    }
Use it like this:

    # specify fields by their line numbers, each separated by a comma
    LC_NUMERIC=C awk -f script -v nums=36,38,39 input.txt multipliers.txt
    # or specify fields by their names, each separated by the | character (NOTE it's a regexp)
    LC_NUMERIC=C awk -f script -v names='vox_la_max|knockout_max|shed_prob' input.txt multipliers.txt
    # or also use both ways of specifying fields
    LC_NUMERIC=C awk -f script -v nums=15,112,234,71,5 -v names='vox_la_max|numesp' input.txt multipliers.txt
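On the output naming: the verbose script takes everything before the first dot of the input file name as the output prefix. A small sketch of that `split()` step on its own, with hypothetical file names, shows why dots elsewhere in the name or path get in the way:

```shell
# Derive the output prefix the way the BEGIN block does:
# split the name on "." and keep the first piece.
# The file names below are hypothetical examples.
prefixes=$(for name in input.txt model.v2.txt ./input.txt; do
  awk -v name="$name" 'BEGIN { split(name, f, "."); printf "%s -> [%s]\n", name, f[1] }'
done)
printf '%s\n' "$prefixes"
```

For `./input.txt` the prefix comes out empty, so the output files would be named just `1.txt`, `2.txt` and so on. Running the script from the input file's own directory with a plain file name avoids this.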
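If you specify more fields than there are multipliers, the excess fields become `0` (they get multiplied by 0). If you specify fewer fields than multipliers, the excess multipliers are simply ignored.
In any case, the fields always consume the multipliers in the order of the line numbers on which they appear, i.e. the first field encountered in the input file consumes the first multiplier, no matter how you specified that field.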
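Both behaviours can be seen on a toy case. The sketch below is a cut-down variant of the script, not the script itself: it prints to standard output instead of files and uses a simplified name match. The file names `params.txt` and `mult.txt` and the field names `alpha`, `beta`, `gamma` are made up for illustration. Three fields are requested in reverse order, but only two multipliers are supplied:

```shell
# Work in a scratch directory so nothing is left behind in the current one.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Tiny made-up parameter file with three named fields.
printf '%s\n' '1 /* alpha */' '10 /* beta */' '100 /* gamma */' > params.txt
# A single row holding only two multipliers.
printf '%s\n' '2 3' > mult.txt

# Same template-and-printf mechanism as the answer, reduced to the essentials.
demo=$(LC_NUMERIC=C awk -v names='gamma|beta|alpha' '
  NR != FNR {                      # second file: one row of multipliers
    for (l = m = 1; l <= nl; l++)
      printf tmpl[l] ORS, (l in vals ? $(m++) * vals[l] : 0)
    next
  }
  {                                # first file: build the printf templates
    gsub(/%/, "%%")
    if ($3 ~ ("^(" names ")$")) {  # simplified match, assumes clean names
      vals[NR] = $1
      sub(/^[0-9]+(\.[0-9]+)?/, OFMT)
    }
    tmpl[NR] = $0; nl++
  }' params.txt mult.txt)
printf '%s\n' "$demo"
```

Here `alpha` (line 1) takes the first multiplier and `beta` (line 2) the second, even though `names` lists them last; `gamma` finds no multiplier left and ends up as 0.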