Script to extract selected entries from a bibtex file

January 15, 2022

I have a large bibtex file with many entries, where each entry has the general structure
@ARTICLE{AuthorYear,
item = {...},
item = {...},
item = {...},
etc
}
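(in some cases ARTICLE may be a different word, e.g. BOOK)
What I would like to do is write a simple script (preferably just a shell script) that extracts the entries with given AuthorYear keys and puts them into a new .bib file.
I imagine I could recognize the first line of an entry by its AuthorYear and the last line by the single closing }, and perhaps use sed to extract the entry, but I don't really know how to do this exactly. Can someone tell me how to achieve this?
It should probably be something like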
sed -n "/AuthorYear/,/\}/p" file.bib
But this stops at the } that closes the first item of the entry, and so gives the following output:
@ARTICLE{AuthorYear,
item = {...},
So I need to recognize whether } is the only character on a line, and stop reading only in that case.

The following Python script does the desired filtering.

#!/usr/bin/env python3
import re

# Bibliography entries to retrieve
# Multiple pattern compilation from: http://stackoverflow.com/a/11693340/147021
pattern_strings = ['Author2010', 'Author2012',]
pattern_string = '|'.join(pattern_strings)
patterns = re.compile(pattern_string)


with open('bibliography.bib', 'r') as bib_file:
    keep_printing = False
    for line in bib_file:
        if patterns.search(line):
            # Beginning of an entry
            keep_printing = True

        if line.strip() == '}':
            if keep_printing:
                # The line keeps its own newline, so print() adds a blank
                # separator line after the entry
                print(line)
                # End of an entry -- should be the one which began earlier
                keep_printing = False

        if keep_printing:
            # The intermediate lines (already newline-terminated)
            print(line, end='')

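Personally, I prefer moving to a scripting language when the filtering logic becomes complex. That arguably has an advantage at least in readability.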
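
The script above relies on the closing } sitting alone on its own line, and it starts printing at any line that contains one of the keys. As a hedged variation, the sketch below matches the key only in the entry header and tracks brace depth to find the end of the entry. The names ENTRY_START and extract_entries are hypothetical, and the brace counting assumes field values contain no unbalanced or escaped braces.

```python
import re

# Matches an entry header such as "@ARTICLE{AuthorYear," and captures the key.
ENTRY_START = re.compile(r'^\s*@\w+\s*\{\s*([^,\s]+)\s*,')


def extract_entries(lines, wanted_keys):
    """Return the lines of every entry whose citation key is in wanted_keys.

    Assumption: braces inside an entry are balanced and not escaped.
    """
    wanted = set(wanted_keys)
    kept = []
    depth = 0  # brace nesting depth inside the entry currently being kept
    for line in lines:
        if depth == 0:
            match = ENTRY_START.match(line)
            if not match or match.group(1) not in wanted:
                continue  # not inside a wanted entry
        kept.append(line)
        # Update the depth; the entry ends when it returns to zero.
        depth = max(depth + line.count('{') - line.count('}'), 0)
    return kept
```

Passing an open file object as lines and writing the result with writelines() to a new .bib file would reproduce what the script above does, with the output going to a file rather than standard output.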

I would recommend using a language with a battle-tested BibTeX library instead of reinventing that wheel. For example

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature 'say';   # needed for say() below
use BibTeX::Parser;

open my $fh, '<', $ARGV[0];
my $parser = BibTeX::Parser->new($fh);
my @authoryear;
while (my $entry = $parser->next) {
    if ($entry->parse_ok) {
        if ($entry->key eq "AuthorYear") {
            push @authoryear, $entry;
        }
    }
    else {
        warn "Error parsing file: " . $entry->error;
    }
}

# I'm not familiar with bibtex files, so this may be insufficient
open my $out, '>', "authoryear.bib";
foreach my $entry (@authoryear) {
    say $out $entry->raw_bibtex;
}

You may have to install the module: cpan BibTeX::Parser

Source: https://unix.stackexchange.com/questions/105893
