Sorting a list numerically

March 25, 2021

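I have a text list with the following structure (every line of each entry begins with a tab, there are no blank lines between those lines, and entries are separated by one blank line):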

 292G.- La Ilíada (tomo I) ; Collection one (volume 3) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - I have to download more ancient greek texts.
 - Another note line.

 293G.- El Ingenioso Hidalgo "Don Quijote" De La Mancha ; Collection one (volume 1) ; Miguel de Cervantes ; http://www.daemcopiapo.cl/Biblioteca/Archivos/7_6253.pdf
 - Masterpiece.

 294G.- Crimen y castigo ; Collection one (volume 4) ; Fiódor Dostoyevski ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Fedor%20Dostoiewski/Crimen%20y%20castigo.pdf
 - Russian masterpiece.

 295G.- La isla del tesoro ; Collection one (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - I read this one as a kid.

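Starting at position 292G, the list continues with the more than 100 volumes of Collection one. I want those 100 volumes sorted by volume number (found in the second field). The expected output is: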

 292G.- El Ingenioso Hidalgo "Don Quijote" De La Mancha ; Collection one (volume 1) ; Miguel de Cervantes ; http://www.daemcopiapo.cl/Biblioteca/Archivos/7_6253.pdf
 - Masterpiece.

 293G.- La isla del tesoro ; Collection one (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - I read this one as a kid.

 294G.- La Ilíada (tomo I) ; Collection one (volume 3) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - I have to download more ancient greek texts.
 - Another note line.

 295G.- Crimen y castigo ; Collection one (volume 4) ; Fiódor Dostoyevski ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Fedor%20Dostoiewski/Crimen%20y%20castigo.pdf
 - Russian masterpiece.

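Note that titles can contain characters and strings such as ", ( and ), but never ; (which is used only as the separator). I suspect sort is the answer here, but it is beyond my novice skills.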

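This script (using GNU awk for the third argument to match(), sorted_in, and FPAT) sorts only the part you want (that is, the entries of collection "one" with sequence number 292 or greater), can handle titles containing any characters or strings, including ;, (, ), or (volume <N>), and outputs the sorted part in its original position among the surrounding unsorted entries: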

$ cat tst.awk
BEGIN {
   RS = ""
   ORS = "\n\n"
   FPAT = "[^;]*(\"[^\"]*\")*[^;]*"
   tgtColl = "one"
   begSeqNr = 292
   maxSeqs = 100
}
match($2,/Collection (.*) \(volume ([0-9]+)\)/,a) {
   coll  = a[1]
   volNr = a[2]
   seqNr = $1+0
}
(coll == tgtColl) && (seqNr >= begSeqNr) && (++seqCnt <= maxSeqs) {
   vols[volNr] = $0
   next
}
{
   prtVols()
   print
}
END { prtVols() }

function prtVols(       volNr, seqNr, vol) {
   PROCINFO["sorted_in"] = "@ind_num_asc"
   seqNr = begSeqNr
   for (volNr in vols) {
       vol = vols[volNr]
       sub(/[0-9]+/,seqNr++,vol)
       print vol
   }
   delete vols
}
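For readers without GNU awk, the same idea (read blank-line-separated records, collect the matching ones keyed by volume number, then emit them in volume order with fresh sequence numbers) can be sketched in Python. This is only an illustration, not a port of the script above: the function name `sort_volumes` and its parameter names are made up for this sketch, and it assumes that no title contains a `;`, so it does not reproduce the quote handling that FPAT provides.

```python
import re

# Same pattern as in the awk script: collection name and volume number.
VOL_RE = re.compile(r"Collection (.*) \(volume ([0-9]+)\)")

def sort_volumes(text, tgt_coll="one", beg_seq_nr=292, max_seqs=100):
    """Sort the matching records by volume number and renumber them in place."""
    # Records are separated by blank lines (like RS = "" in awk).
    records = [r for r in re.split(r"\n\s*\n", text.strip("\n")) if r.strip()]
    out, pending, seq_cnt = [], {}, 0

    def flush():
        # Emit the collected records in volume order with fresh sequence numbers.
        seq_nr = beg_seq_nr
        for vol_nr in sorted(pending):
            out.append(re.sub(r"[0-9]+", str(seq_nr), pending[vol_nr], count=1))
            seq_nr += 1
        pending.clear()

    for rec in records:
        fields = rec.split(";")  # assumption: no ';' inside titles
        m = VOL_RE.search(fields[1]) if len(fields) > 1 else None
        seq = re.match(r"\s*([0-9]+)", rec)
        if (m and seq and m.group(1) == tgt_coll
                and int(seq.group(1)) >= beg_seq_nr and seq_cnt < max_seqs):
            seq_cnt += 1
            pending[int(m.group(2))] = rec
        else:
            flush()          # print any sorted block collected so far
            out.append(rec)  # then the unsorted record itself
    flush()
    return "\n\n".join(out) + "\n"
```

Calling `sort_volumes(open("file").read())` would return the rewritten text as a single string; the file name here is only a placeholder.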

For example, given this input, modified from the sunny-day case in the question to add a few useful test cases:

$ cat file
 100G.- some earlier collection ; Collection zero (volume 1) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST earlier collection ID

 200G.- right collection, too early sequence number; Collection one (volume 6) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - TEST earlier sequence number

 292G.- La Ilíada ; Collection one (volume 3) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - I have to download more ancient greek texts.
 - Another note line.

 293G.- El Quijote ; Collection one (volume 1) ; Miguel de Cervantes ; http://www.daemcopiapo.cl/Biblioteca/Archivos/7_6253.pdf
 - Masterpiece.

 294G.- Crimen y castigo ; Collection one (volume 4) ; Fiódor Dostoyevski ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Fedor%20Dostoiewski/Crimen%20y%20castigo.pdf
 - Russian masterpiece.

 295G.- "Kill Bill; Bury Him (volume 2)" ; Collection one (volume 5) ; Tarantino ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST quoted title with sparator chars and target string

 296G.- La isla del tesoro ; Collection one (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - I read this one as a kid.

 300G.- some later collection ; Collection twenty-three (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST later collecion ID

it outputs:

$ awk -f tst.awk file
 100G.- some earlier collection ; Collection zero (volume 1) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST earlier collection ID

 200G.- right collection, too early sequence number; Collection one (volume 6) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - TEST earlier sequence number

 292G.- El Quijote ; Collection one (volume 1) ; Miguel de Cervantes ; http://www.daemcopiapo.cl/Biblioteca/Archivos/7_6253.pdf
 - Masterpiece.

 293G.- La isla del tesoro ; Collection one (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - I read this one as a kid.

 294G.- La Ilíada ; Collection one (volume 3) ; Homer ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Homero/Iliada.pdf
 - I have to download more ancient greek texts.
 - Another note line.

 295G.- Crimen y castigo ; Collection one (volume 4) ; Fiódor Dostoyevski ; http://www.ataun.eus/BIBLIOTECAGRATUITA/Cl%C3%A1sicos%20en%20Espa%C3%B1ol/Fedor%20Dostoiewski/Crimen%20y%20castigo.pdf
 - Russian masterpiece.

 296G.- "Kill Bill; Bury Him (volume 2)" ; Collection one (volume 5) ; Tarantino ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST quoted title with sparator chars and target string

 300G.- some later collection ; Collection twenty-three (volume 2) ; Robert Louis Stevenson ; https://www.biblioteca.org.ar/libros/130864.pdf
 - TEST later collecion ID

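Because ; is the field separator, any ; that appears in a title must be inside double quotes, whether it is quoted on its own as in Kill Bill";" Bury Him or as part of a fully quoted title as in the example above. No other characters or strings in a title need any special handling.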
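The quoting rule can be illustrated with a small Python sketch that splits a line on `;` while ignoring any `;` between double quotes. It is a simple state machine rather than a translation of the FPAT regular expression, and the function name `split_fields` is invented for this example.

```python
def split_fields(line, sep=";", quote='"'):
    """Split line on sep, ignoring any sep that appears inside double quotes."""
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == quote:
            # Toggle the quoted state and keep the quote character in the field.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # An unquoted separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

With the quoted test title from the sample input, the second element of the result is still the `Collection one (volume 5)` field, because the `;` inside the quotes does not start a new field.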

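If you actually want all of collection one rather than only the entries from a given sequence number onward, or vice versa, the tweak is trivial: just drop one test or the other. Likewise, if you want every entry of the collection from begSeqNr onward sorted rather than only 100 of them, remove the seqCnt test, and if you do not want the surrounding collections/sequences printed, remove the standalone print statement.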
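One way to picture these variations is as a selection predicate in which each test can be switched off. The Python sketch below is illustrative only: `is_selected` is a hypothetical helper, and passing `None` to disable a test is a convention chosen for this example, not something the awk script does.

```python
def is_selected(coll, seq_nr, seq_cnt,
                tgt_coll="one", beg_seq_nr=292, max_seqs=100):
    """Return True if a record belongs to the block that gets sorted.

    Passing None for tgt_coll, beg_seq_nr, or max_seqs disables that test.
    seq_cnt is the number of records already selected.
    """
    if tgt_coll is not None and coll != tgt_coll:
        return False  # wrong collection
    if beg_seq_nr is not None and seq_nr < beg_seq_nr:
        return False  # sequence number too early
    if max_seqs is not None and seq_cnt >= max_seqs:
        return False  # already collected the maximum number of records
    return True
```

For instance, `is_selected("one", 200, 0, beg_seq_nr=None)` accepts an entry of collection one regardless of its sequence number, which corresponds to dropping the begSeqNr test.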

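You did not specify a language requirement, so here is a quick-and-dirty solution in Python 3.8. I am sure others can come up with a better approach, but this should be enough.

The code assumes the text is in a file named list.txt in the current directory, and it creates a new file named new-list.txt.

It also does not handle the missing space in "-La isla del tesoro".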

import re

booklist = []
bookcount = 0
entry = ''
line_numbers = []

# Find and return the volume number for a book
def get_volnum(book):
    volstring = re.search(r'\(volume (\d+)\)', book)
    return volstring.group(1)

# Read the file into a list of lines
with open('list.txt', 'r') as infile:
    doc = infile.readlines()

# Group each book into a single string and append it to booklist
for line in doc:
    # If the line begins with three digits followed by 'G.', start a new entry.
    # Leading whitespace is allowed because the question says every line starts with a tab.
    if re.match(r"\s*\d\d\dG\.", line):
        # Read the line number and append it to a list
        line_numbers.append(line.split('G.')[0])
        # Add the previous entry to booklist (without the three digits and 'G.')
        if bookcount > 0:
            booklist.append(entry.split('G.', 1)[1])

        entry = line
        bookcount += 1
    # If the line begins with '- ', concatenate it onto the current entry.
    if line.lstrip().startswith('- '):
        entry += line

# Append the last entry
booklist.append(entry.split('G.', 1)[1])

# Make a list (booktable) that contains [volnum, book]
booktable = [[get_volnum(book), book] for book in booklist]

# Sort that list by volnum (index 0 of each list item of booktable)
booktable.sort(key=lambda x: int(x[0]))

# Sort the line numbers numerically
line_numbers.sort(key=int)

# Write the result to a file
with open("new-list.txt", "w") as f:
    for b in booktable:
        f.write(line_numbers.pop(0) + 'G.' + b[1])
        f.write('\n')
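If the caveat about "-La isla del tesoro" refers to a header line with no space after `G.-`, such as `295G.-La isla del tesoro` (an assumption, since the sample shown here has the space), the header could be normalized before further processing. The sketch below uses a made-up helper name, `normalize_header`, and simply forces exactly one space after the `NNNG.-` prefix.

```python
import re

# Matches optional leading whitespace, the sequence number, 'G.-',
# and any spaces or tabs that follow it.
HEADER_RE = re.compile(r"^(\s*\d+G\.-)[ \t]*")

def normalize_header(line):
    """Ensure exactly one space follows the 'NNNG.-' prefix.

    Lines that are not entry headers (notes, blank lines) pass through unchanged.
    """
    return HEADER_RE.sub(r"\1 ", line, count=1)
```

Applying this to each line as it is read would leave note lines and already well-formed headers untouched.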

Source: https://unix.stackexchange.com/questions/639814
