
Grep command that has the side effect of adding a trailing newline to the last line of a file

  • January 21, 2020

I have been looking into how to correctly read lines from a file whose last line may not end with a newline, and I found the answer in "Read a line-oriented file which may not end with a newline".
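The linked answer is not reproduced here, but a commonly used idiom for this problem is a read loop that also accepts a final partial line. The following is a minimal POSIX sh sketch; the sample data and the variable names tmp, count and last are illustrative assumptions, not taken from the question.

```shell
#!/bin/sh
# Sample file whose last line has no terminating newline (illustrative data).
tmp=$(mktemp)
printf 'aaaaaa\nbbbbbb\ncccccc' > "$tmp"

count=0
last=
# read returns non-zero at end of file even when it has stored a final
# partial line in $line, so the body also runs when $line is non-empty.
while IFS= read -r line || [ -n "$line" ]; do
    count=$((count + 1))
    last=$line
    printf 'line %d: %s\n' "$count" "$line"
done < "$tmp"

rm -f "$tmp"
```

Without the || [ -n "$line" ] part, the loop would stop after bbbbbb and silently drop cccccc.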

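However, I have a second goal: excluding comment lines (lines that start with #, possibly preceded by spaces). I found a grep command that does this: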

$ grep -v '^ *#' file
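The pattern '^ *#' only matches a # preceded by spaces, so a comment indented with a tab is kept. If tab-indented comments should also be dropped (an assumption about the intended input), the POSIX bracket expression [[:blank:]] covers both spaces and tabs. A small sketch comparing the two patterns on made-up sample data:

```shell
#!/bin/sh
# Illustrative input: comments indented with nothing, spaces, and a tab.
tmp=$(mktemp)
printf '# comment\naaaaaa\n   # indented comment\n\t# tab-indented comment\nbbbbbb\n' > "$tmp"

# Original pattern: optional spaces, then '#'.
spaces_only=$(grep -v '^ *#' "$tmp")
# Variant: optional spaces or tabs, then '#'.
blank_aware=$(grep -v '^[[:blank:]]*#' "$tmp")

# The first result still contains the tab-indented comment; the second does not.
printf '%s\n--\n%s\n' "$spaces_only" "$blank_aware"
rm -f "$tmp"
```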

But I noticed that this command has a side effect I did not expect: if the last line has no trailing newline, it adds one.

$ cat file
# This is a commentary
aaaaaa
# This is another commentary
bbbbbb
cccccc

$ od -c file
0000000   #       T   h   i   s       i   s       a       c   o   m   m
0000020   e   n   t   a   r   y  \n   a   a   a   a   a   a  \n   #
0000040   T   h   i   s       i   s       a   n   o   t   h   e   r
0000060   c   o   m   m   e   n   t   a   r   y  \n   b   b   b   b   b
0000100   b  \n   c   c   c   c   c   c  \n
0000111

$ truncate -s -1 file

$ od -c file
0000000   #       T   h   i   s       i   s       a       c   o   m   m
0000020   e   n   t   a   r   y  \n   a   a   a   a   a   a  \n   #
0000040   T   h   i   s       i   s       a   n   o   t   h   e   r
0000060   c   o   m   m   e   n   t   a   r   y  \n   b   b   b   b   b
0000100   b  \n   c   c   c   c   c   c
0000110

$ od -c <(grep -v '^ *#' file)
0000000   a   a   a   a   a   a  \n   b   b   b   b   b   b  \n   c   c
0000020   c   c   c   c  \n
0000025

Note that, besides removing the comment lines, it also added a trailing newline to the last line.

How is that possible?

The POSIX specification defines a line as follows:

A line is a sequence of zero or more non-<newline> characters plus a terminating
<newline> character.

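Under that definition, trailing characters without a terminating newline form an incomplete line, so grep's behavior is expected: when it prints the incomplete line, it adds the missing trailing newline.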

For example:

$ cat file
# This is a commentary
aaaaaa
# This is another commentary
bbbbbb
cccccc

$ od -c file
0000000   #       T   h   i   s       i   s       a       c   o   m   m
0000020   e   n   t   a   r   y  \n   a   a   a   a   a   a  \n   #
0000040   T   h   i   s       i   s       a   n   o   t   h   e   r
0000060   c   o   m   m   e   n   t   a   r   y  \n   b   b   b   b   b
0000100   b  \n   c   c   c   c   c   c  \n
0000111

$ truncate -s -1 file

$ od -c file
0000000   #       T   h   i   s       i   s       a       c   o   m   m
0000020   e   n   t   a   r   y  \n   a   a   a   a   a   a  \n   #
0000040   T   h   i   s       i   s       a   n   o   t   h   e   r
0000060   c   o   m   m   e   n   t   a   r   y  \n   b   b   b   b   b
0000100   b  \n   c   c   c   c   c   c
0000110

$ od -c <(grep '.' file)
0000000   #       T   h   i   s       i   s       a       c   o   m   m
0000020   e   n   t   a   r   y  \n   a   a   a   a   a   a  \n   #    
0000040   T   h   i   s       i   s       a   n   o   t   h   e   r    
0000060   c   o   m   m   e   n   t   a   r   y  \n   b   b   b   b   b
0000100   b  \n   c   c   c   c   c   c  \n
0000111
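Instead of reading od output, the last byte can be checked directly. In this sketch (the helper name ends_with_newline and the sample data are hypothetical), tail -c 1 prints the final byte and command substitution strips a trailing newline, so an empty result means the file ends with a newline (or is empty):

```shell
#!/bin/sh
# ends_with_newline FILE: succeed if FILE is empty or its last byte is '\n'.
# $(...) strips trailing newlines, so a final '\n' becomes the empty string.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

tmp=$(mktemp)
out=$(mktemp)
printf 'aaaaaa\nbbbbbb\ncccccc' > "$tmp"   # incomplete last line
grep '.' "$tmp" > "$out"                   # grep terminates every line it prints

if ends_with_newline "$tmp"; then in_nl=yes; else in_nl=no; fi
if ends_with_newline "$out"; then out_nl=yes; else out_nl=no; fi
echo "input ends with newline: $in_nl"
echo "grep output ends with newline: $out_nl"
rm -f "$tmp" "$out"
```

The process substitution <(...) used in the sessions above is a bash/ksh/zsh feature; this sketch writes grep's output to a temporary file so it also runs under a plain POSIX sh.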

Source: https://unix.stackexchange.com/questions/562665