Extracting values from a file keyed by multiple keys
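Consider a file with `key=value` pairs, where each `key` is optionally a concatenation of multiple `key`s. In other words, many `key`s can map to one `value`. The reason behind this is that each `key` is a relatively short word compared to the length of its `value`, so the data is "compressed" into fewer lines. Illustration (i.e. not real values):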

    $ cat testfile
    AA,BB,CC=a-lengthy-value
    A,B,C=a-very-long-value
    D,E,F=another-very-long-value
    K1,K2,K3=many-many-more
    Z=more-long-value

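It is valid to assume that all `key`s are unique and do not contain the following characters:

- `key` delimiter: `,`
- key-value delimiter: `=`
- whitespace characters
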
While `key`s may come in any form in the *future* (with the above constraints), they *currently* adhere to the following regex coincidentally: `[[:upper:]]{2}[[:upper:]0-9]`. Likewise, `value`s will not contain `=`, so `=` can be safely used to split each line. There are no multi-line `key`s or `value`s, so it is also safe to process line-by-line.

To facilitate data extraction from this file, a function `getval()` is defined as such:

    getval() {
        sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
    }

With this definition, calling `getval A` will return the value `a-very-long-value`, not `a-lengthy-value`. It should also return nothing for a non-existent `key`.

Questions:

- Is the current definition of `getval()` robust enough?
- Are there alternative ways of performing the data extraction that are possibly shorter/more expressive/more restrictive?

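The expected behaviour described above can be checked with a short script. This is only a sketch: it recreates the illustrative `testfile` in the current directory, repeats the `getval()` definition from the question unchanged, and assumes GNU `sed` (as shipped with cygwin).

```shell
# Recreate the illustrative file from the question (not real values).
cat > testfile <<'EOF'
AA,BB,CC=a-lengthy-value
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

# Same definition as in the question, written over several lines.
getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}

getval A     # a-very-long-value (not a-lengthy-value)
getval K2    # many-many-more
getval XX    # prints nothing: XX is not a key
```
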
For what it's worth, this script will run with cygwin's `bash` and the `coreutils` that come with it. Portability is therefore not required here (i.e. only brownie points will be given). Thanks!

edit: Corrected function, added clarification about the keys.

edit 2: Added clarification about the format (no multi-lines) and portability (not a requirement).

You can write it in a much more readable form using `awk`:

    getval() {
        awk -F'=' '$1~/\<'"$1"'\>/{print $2}' testfile
    }
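
Both versions above splice `$1` into a regex, so a key containing regex metacharacters (for example `.`) would be treated as a pattern rather than as literal text. If keys really can take any form in the future, one possible variation is to split the key list on `,` and compare each entry as a plain string. The sketch below is an assumption about what "more restrictive" could mean here, not the function from the question or the answer; the variable names `key`, `keys`, `n` and `i` are made up for illustration.

```shell
getval() {
    # Pass the wanted key as an awk variable instead of pasting it into
    # the program text. Caveat: -v interprets backslash escapes in the value.
    awk -F'=' -v key="$1" '
        {
            # $1 is the comma-separated key list, $2 the value
            # (values are assumed not to contain "=").
            n = split($1, keys, ",")
            for (i = 1; i <= n; i++)
                # Appending "" forces a string comparison, so keys that
                # look numeric ("1" vs "01") are not compared as numbers.
                if ((keys[i] "") == (key "")) { print $2; exit }
        }
    ' testfile
}
```

With the illustrative `testfile`, `getval A` should still print `a-very-long-value`, while `getval .` prints nothing because no key is literally `.`.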