Text-Processing
Adding and subtracting multiple columns into a new column
I have copied part of my CSV file below.
publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count
20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6
20030219,act fire witnesses must be aware of defamation,137,362,67,0,0,0,0,0,0
20030219,a g calls for infrastructure protection summit,357,119,212,0,0,0,0,0,0
20030219,air nz staff in aust strike for pay rise,826,254,105,105,21,45,7,0,90
20030219,air nz strike to affect australian travellers,693,123,153,17,113,4,103,0,7
20030219,ambitious olsson wins triple jump,488,57,161,0,0,0,0,0,0
20030219,antic delighted with record breaking barca,386,60,80,3,4,0,93,0,68
20030219,aussie qualifier stosur wastes four memphis match,751,45,297,0,0,0,0,0,0
20030219,aust addresses un security council over iraq,3847,622,141,1,0,0,0,0,0
20030219,australia is locked into war timetable opp,1330,205,874,0,0,0,0,0,0
20030219,australia to contribute 10 million in aid to iraq,3530,130,0,23,16,4,1,0,0
20030219,barca take record as robson celebrates birthday in,13875,331,484,0,0,0,0,0,0
20030219,bathhouse plans move ahead,11202,450,2576,433,51,20,4,0,34
20030219,big hopes for launceston cycling championship,3988,445,955,0,0,0,0,0,0
20030219,big plan to boost paroo water supplies,460,101,92,0,0,0,0,0,0
20030219,blizzard buries united states in bills,303,223,193,0,0,0,0,0,0
I am looking for a shell command that creates a new column named emotion_polarity, holding (likes_count+love_count+thankful_count)-(angry_count+sad_count) for each entry.
I tried
awk -F , '{$12=$3+$6+$10-$11-$9;}{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' file
but for some reason it does not work: the columns run together. I think this may be because I lose the commas when I do this.
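Set OFS (the output field separator) as well, so that you don't lose the commas. The commas are lost when you do this:
$12=$3+$6+$10-$11-$9
that is, when you set or update the value of any field. awk then rebuilds the current record by joining its fields with the OFS internal variable (a single space by default), so setting OFS to a comma keeps the commas in the printed output.
awk 'BEGIN{ FS=OFS="," } { $(NF+1)=(NR==1? "emotion_polarity" : $3+$6+$10-$11-$9); print }' infile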
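To make the effect of OFS visible, here is a minimal sketch that runs the same field assignment twice on the first data row from the question, once without OFS and once with it. The variable names sample, without_ofs and with_ofs are only illustrative.

```shell
# Illustrative two-line sample: the header plus the first data row from the question.
sample='publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count
20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6'

# Only FS is set: assigning $12 rebuilds the record with the default OFS (one space).
without_ofs=$(printf '%s\n' "$sample" | awk -F, 'NR==2 { $12 = $3+$6+$10-$11-$9; print }')

# FS and OFS are both commas: the rebuilt record keeps its commas.
with_ofs=$(printf '%s\n' "$sample" | awk 'BEGIN { FS=OFS="," } NR==2 { $12 = $3+$6+$10-$11-$9; print }')

printf '%s\n' "$without_ofs" "$with_ofs"
```

The first line printed has its fields separated by spaces, the second by commas; both end with the computed value 1253.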
Or simply append the new value to the current input line:
awk -F, '{ $0=$0 FS (NR==1? "emotion_polarity" : $3+$6+$10-$11-$9); print }' infile
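As a quick sanity check, the sketch below runs both approaches on a small sample and compares the results. It uses the column name from the question (emotion_polarity) and one data row from the question; the variable names sample, via_field and via_append are only illustrative.

```shell
# Illustrative sample: the header plus one data row from the question.
sample='publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count
20030219,air nz staff in aust strike for pay rise,826,254,105,105,21,45,7,0,90'

# Approach 1: add a new last field; OFS="," keeps commas when the record is rebuilt.
via_field=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS=OFS="," } { $(NF+1) = (NR==1 ? "emotion_polarity" : $3+$6+$10-$11-$9); print }')

# Approach 2: append a comma and the value to the unmodified line; OFS is never used.
via_append=$(printf '%s\n' "$sample" |
  awk -F, '{ $0 = $0 FS (NR==1 ? "emotion_polarity" : $3+$6+$10-$11-$9); print }')

if [ "$via_field" = "$via_append" ]; then
  printf 'both approaches agree:\n%s\n' "$via_field"
else
  printf 'outputs differ\n'
fi
```

For this sample the two outputs are identical: the header gains emotion_polarity and the data row gains 834 (826+105+0-90-7).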
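From the awk manual:
FS
The input field separator (see the section Specifying How Fields Are Separated). The value is a single-character string or a multicharacter regular expression that matches the separations between fields in an input record.
OFS
The output field separator (see the section Output Separators). It is output between the fields printed by a print statement. Its default value is " ", a string consisting of a single space.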