Text-Processing

Reformatting a text file into CSV format

  • May 3, 2020

Sample input

0bef-82-46-8a-9a0b.xml "Fruits/Mango Apple /Plum cherry date">1446815.ABC
0bef-82-46-8a-9a0b 5da-0-ba-c1-1a9 "Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b ac-94-4ab-91-23 "Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b 5z-94-ab-92-2f3 "Fruits/Pear Banana/Plum orange mango"

952f-82-46-8a-9a0b.xml "Fruits/Mango"1244115.ABC
3cff-82-46-8a-9a0b.xml "Fruits/Big Mango/Not Sweet ">905499.ABC
6m0k-82-46-8a-9a0b.xml "Fruits/Big Pear/Very Sweet">855499.ABC

17a-42-df-c24.xml "Fruits Market/Big Apple/Sweet "1483415.ABC
17a-42-df-c24 54-ba-4411-9-3d8 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 2da5-0-4a-b1-e89 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 b7-94-4db-92-2f3 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 4d-67c-446-b5-ac "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 2-8b-4det-87-769 "Veg/Radish /Radish Carrot Celery Onion"

Expected output:

0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5da-0-ba-c1-1a9,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,ac-94-4ab-91-23,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5z-94-ab-92-2f3,"Fruits/Pear Banana/Plum orange mango"

952f-82-46-8a-9a0b.xml,"Fruits/Mango",,
3cff-82-46-8a-9a0b.xml,"Fruits/Big Mango/Not Sweet ",,
6m0k-82-46-8a-9a0b.xml,"Fruits/Big Pear/Very Sweet",,


17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,54-ba-4411-9-3d8,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2da5-0-4a-b1-e89,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,b7-94-4db-92-2f3,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,4d-67c-446-b5-ac,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2-8b-4det-87-769,"Veg/Radish /Radish Carrot Celery Onion"

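In the raw input data:

  1. There are no leading or trailing spaces on any line.
  2. There are no blank lines between the lines. The blank lines shown here are only there to make the sample easier to read, and they are not wanted in the final output either.
  3. The ">" symbol is missing on a few lines. This is not a typo.

Could you guide me on how to reformat this with a bash/shell script (sed, awk, etc.)? I am lost.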

Using awk

awk '{
 if (sub(/\.xml /, ".xml,")){      # replace `.xml ` with `.xml,`
   if (NR>1 && is_processed != 1){ # xml line was not printed?
      print xml","                 # print previous xml line + `,`
   }
   sub(/>?[0-9]+\.ABC$/, ",") # replace strings `>1446815.ABC` or `1244115.ABC` with `,`
   xml=$0                     # save line in variable `xml`
   is_processed=0             # clear flag
 }
 else {
   if (!NF) next  # skip empty line
   sub(/ /, ",")  # replace 1st ` ` with `,`
   sub(/ /, ",")  # replace 2nd ` ` with `,`
   print xml$0    # print xml line + current line
   is_processed=1 # set flag
 }
}
END {
 # print possible remaining line
 if (is_processed != 1) print xml","
}' filein > fileout

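The if-block handles the lines that contain .xml followed by a space and saves each one, already partly converted, in the variable xml. The else-block handles the "children" that follow an xml line: it replaces the first two space characters with commas and prints the saved xml line followed by the modified line. Empty lines are skipped.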
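
To see the two branches in isolation, here is a minimal sketch that applies the same substitutions to one parent line and one child line taken from the sample input. The shell variable names parent and child are introduced only for this illustration and are not part of the answer.

```shell
# Parent line: turn ".xml " into ".xml," and replace the trailing
# ">1446815.ABC" (the ">" is optional) with a single comma.
parent=$(printf '%s\n' '0bef-82-46-8a-9a0b.xml "Fruits/Mango Apple /Plum cherry date">1446815.ABC' |
  awk '{ sub(/\.xml /, ".xml,"); sub(/>?[0-9]+\.ABC$/, ","); print }')

# Child line: only the first two spaces become commas, so the spaces
# inside the quoted field are left alone.
child=$(printf '%s\n' '0bef-82-46-8a-9a0b 5da-0-ba-c1-1a9 "Fruits/Pear Banana/Plum orange mango"' |
  awk '{ sub(/ /, ","); sub(/ /, ","); print }')

# Concatenating the two pieces gives one output row.
printf '%s%s\n' "$parent" "$child"
```

Because sub() replaces only the first match, calling it twice is enough to convert exactly the two separators in front of the quoted field.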

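If an xml line has no "children", it is printed with one additional comma, either in the if-block when the next xml line arrives (provided the line number is greater than 1) or in the END block when it is the last line of the input.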
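
The childless case can be checked on its own with a small sketch. The function name reformat is a hypothetical wrapper added here for convenience; its body is the awk program from above with the comments removed, reading from standard input instead of filein.

```shell
# reformat: hypothetical wrapper around the awk program from the answer.
# It reads the raw text on stdin and writes the CSV rows to stdout.
reformat() {
  awk '{
    if (sub(/\.xml /, ".xml,")) {
      if (NR>1 && is_processed != 1) print xml","
      sub(/>?[0-9]+\.ABC$/, ",")
      xml=$0
      is_processed=0
    } else {
      if (!NF) next
      sub(/ /, ",")
      sub(/ /, ",")
      print xml$0
      is_processed=1
    }
  }
  END { if (is_processed != 1) print xml"," }'
}

# Two parents without children: the first one is flushed when the
# second parent arrives, the second one is flushed by the END block.
result=$(printf '%s\n' \
  '952f-82-46-8a-9a0b.xml "Fruits/Mango"1244115.ABC' \
  '3cff-82-46-8a-9a0b.xml "Fruits/Big Mango/Not Sweet ">905499.ABC' |
  reformat)
printf '%s\n' "$result"
```

Both rows come out with two trailing commas, one from the saved xml line and one added at print time.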

Output (fileout):

0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5da-0-ba-c1-1a9,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,ac-94-4ab-91-23,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5z-94-ab-92-2f3,"Fruits/Pear Banana/Plum orange mango"
952f-82-46-8a-9a0b.xml,"Fruits/Mango",,
3cff-82-46-8a-9a0b.xml,"Fruits/Big Mango/Not Sweet ",,
6m0k-82-46-8a-9a0b.xml,"Fruits/Big Pear/Very Sweet",,
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,54-ba-4411-9-3d8,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2da5-0-4a-b1-e89,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,b7-94-4db-92-2f3,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,4d-67c-446-b5-ac,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2-8b-4det-87-769,"Veg/Radish /Radish Carrot Celery Onion"
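
As a quick sanity check on the result, the following sketch counts the comma-separated fields per row. It assumes that no quoted field contains a comma, which holds for the sample data but is not guaranteed in general; count_fields is a hypothetical helper name, and the two rows are copied from the output above.

```shell
# count_fields: hypothetical helper that prints the number of
# comma-separated fields on each input line. It assumes that no
# quoted field contains a comma.
count_fields() {
  awk -F, '{ print NF }'
}

# One row without children and one row with a child.
counts=$(printf '%s\n' \
  '952f-82-46-8a-9a0b.xml,"Fruits/Mango",,' \
  '17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,54-ba-4411-9-3d8,"Veg/Radish /Radish Carrot Celery Onion"' |
  count_fields)
printf '%s\n' "$counts"
```

Rows without children have four fields and rows with children have five, which matches the expected output given in the question.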

Source: https://unix.stackexchange.com/questions/555643