How do I map data based on substrings?

May 11, 2022

I have data in the following form:
staging_uw_pc_account_contact_role_hive_tb
staging_uw_pc_account_hive_tb
staging_uw_pc_account_location_hive_tb
uw_pc_account_contact_hive_tb
uw_pc_account_contact_hive_tb_backup
uw_pc_account_contact_role_hive_tb
uw_pc_account_contact_role_hive_tb_backup
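How can I create a map based on the following rules?
Remove _backup from the end
Remove staging_ from the beginning
Then check the mapping.
The result should look like this. Not every table will have a staging and a backup counterpart; in that case those fields should be empty.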
uw_pc_account_contact_role_hive_tb, uw_pc_account_contact_role_hive_tb_backup, staging_uw_pc_account_contact_role_hive_tb

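The following awk script detects whether each input line has a staging_ prefix and/or a _backup suffix.
If a prefix or suffix is present, it is removed, and the remaining string is used as a key in an associative array named map.
The original line is stored in the map array as part of a comma-separated string associated with the generated key.
At the end, the entire contents of map are printed.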
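BEGIN {
       OFS = ", "           # separator between names in an output line
       prefix = "staging_"
       suffix = "_backup"
}

{
       # Derive the key by stripping the prefix and suffix, if present.
       key = $0
       sub("^" prefix, "", key)
       sub(suffix "$", "", key)

       # Append the original line to the list stored under this key.
       map[key] = (map[key] == "" ? $0 : map[key] OFS $0)
}

END {
       # Iteration order of "for (key in map)" is unspecified in awk.
       for (key in map) print map[key]
}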
Test run using the data from the question, stored in a file named file:
$ awk -f script file
staging_uw_pc_account_location_hive_tb
staging_uw_pc_account_hive_tb
uw_pc_account_contact_hive_tb, uw_pc_account_contact_hive_tb_backup
staging_uw_pc_account_contact_role_hive_tb, uw_pc_account_contact_role_hive_tb, uw_pc_account_contact_role_hive_tb_backup
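The order of the fields within each output line is determined by the order of the lines in the original file. The order of the output lines themselves is not guaranteed, because awk does not specify the iteration order of for (key in map).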
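
The question asks for a fixed column order (plain name, backup, staging) with empty fields where a counterpart is missing. A minimal sketch of one way to get that, as a variation on the script above rather than part of the original answer: store each line in a per-role array and print the three columns explicitly. The function name map_tables, the file name tables.txt, and the array names base, backup, staging, seen and order are illustrative assumptions. A line carrying both the prefix and the suffix is treated as a staging entry here, which is also an assumption.

```shell
# Sample input from the question; the file name tables.txt is an assumption.
cat > tables.txt <<'EOF'
staging_uw_pc_account_contact_role_hive_tb
staging_uw_pc_account_hive_tb
staging_uw_pc_account_location_hive_tb
uw_pc_account_contact_hive_tb
uw_pc_account_contact_hive_tb_backup
uw_pc_account_contact_role_hive_tb
uw_pc_account_contact_role_hive_tb_backup
EOF

# Hypothetical helper: prints "name, backup, staging" per key,
# leaving a field empty when that variant is absent.
map_tables() {
    awk '
    BEGIN { OFS = ", "; prefix = "staging_"; suffix = "_backup" }
    {
        key = $0
        # sub() returns the number of replacements made (0 or 1).
        is_staging = sub("^" prefix, "", key)
        is_backup  = sub(suffix "$", "", key)

        # Remember the order in which keys first appear.
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }

        if (is_staging)     staging[key] = $0
        else if (is_backup) backup[key]  = $0
        else                base[key]    = $0
    }
    END {
        # Missing array entries evaluate to the empty string.
        for (i = 1; i <= n; i++) {
            key = order[i]
            print base[key], backup[key], staging[key]
        }
    }' "$@"
}

map_tables tables.txt
```

Because the keys are printed in order of first appearance, the output line order is deterministic, unlike the for (key in map) loop.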

Source: https://unix.stackexchange.com/questions/702178
