Bash

Extract only GPS positions from a huge log file into new files named by date stamp

  • March 28, 2022

So, I have huge log files (over 100,000 records) and need to extract all GPS positions according to their date stamp.

./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924

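So, basically, for those 3 records I need to find the date 10th February 2022, then cut the two values after "GPS:" and paste them into a new file named 2022-02-10.txt, or preferably into a proper .KML file.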

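Every event is on a separate line, so you can read the file line by line and use a regex to find the text after `TS:` and after `GPS:` - then you can use the `TS` date as the filename and write to it in `append mode`.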
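As a quick illustration of those two patterns on a single line from the question, here is a minimal sketch. The helper name `extract` is made up for this example, and it assumes the date always follows `TS: ` and the coordinate pair always follows `GPS: ` as in the sample lines.

```python
import re

# One line copied from the question's sample log.
line = ("./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190]  INFO -- : "
        "#<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 "
        "TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343")

def extract(line):
    """Return (date, gps) for a log line, or None if either field is missing."""
    ts = re.search(r'TS: (\S+) ', line)      # date part right after "TS: "
    gps = re.search(r'GPS: (\S+), ', line)   # coordinate pair right after "GPS: "
    if ts is None or gps is None:
        return None
    return ts.group(1), gps.group(1)

print(extract(line))
```

Returning `None` for lines without both fields lets the caller skip unrelated log entries instead of crashing on them.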


Minimal working example.

I only use `io` with `text` to simulate a file in memory, but you should use `open()`.

text = '''./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924
'''

import io
import re

# open file for reading
#file_in = open("filename.log")
file_in = io.StringIO(text)

# read line by line
for line in file_in:

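   # find values; skip lines that carry no TS/GPS fields
   ts  = re.search(r'TS: ([^ ]*) ', line)
   gps = re.search(r'GPS: ([^ ]*), ', line)
   if not ts or not gps:
       continue
   ts  = ts.group(1)                # date only, e.g. 2022-02-10
   val = gps.group(1).split(',')    # the two values in logged order
   gps = f'{val[1]},{val[0]}'       # swapped order (KML expects lon,lat)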
   
   print('TS:', ts, '| GPS:', gps)
   
   # open file for writing in `append mode`
   with open(f'{ts}.txt', 'a') as file_out:
       # write in new line
       file_out.write(gps + '\n')

Result:

TS: 2022-02-10 | GPS: 20.8162,52.1773033
TS: 2022-02-10 | GPS: 20.8162,52.1773033
TS: 2022-02-10 | GPS: 20.8162,52.1773033
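The `./production.log.109.gz:` prefix in the sample suggests the real logs may be gzip-compressed. If that is the case, a sketch like the following could stream them directly instead of decompressing them first. The function name `split_gps_by_date` and the `out_dir` parameter are assumptions made for this example; it writes the same swapped value order as the code above and keeps one output file open per date rather than reopening it for every line.

```python
import gzip
import re
from pathlib import Path

TS_RE = re.compile(r'TS: (\S+) ')
GPS_RE = re.compile(r'GPS: ([^, ]+),([^, ]+), ')

def split_gps_by_date(log_path, out_dir='.'):
    """Append swapped GPS pairs to <out_dir>/<date>.txt; return the count written."""
    handles = {}   # date -> open output file, so each file is opened only once
    written = 0
    try:
        # 'rt' decompresses on the fly and yields text lines
        with gzip.open(log_path, 'rt', encoding='utf-8', errors='replace') as log:
            for line in log:
                ts = TS_RE.search(line)
                gps = GPS_RE.search(line)
                if ts is None or gps is None:
                    continue   # not a GPS event line
                date = ts.group(1)
                first, second = gps.groups()
                if date not in handles:
                    handles[date] = open(Path(out_dir) / f'{date}.txt', 'a')
                handles[date].write(f'{second},{first}\n')
                written += 1
    finally:
        for fh in handles.values():
            fh.close()
    return written
```

It could be called as `split_gps_by_date('production.log.109.gz')`. For plain-text logs, replacing `gzip.open` with the built-in `open` should be the only change needed.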

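KML is a more complex format (it uses an XML structure), and I will not try to write it by hand.

But there are Python modules that can write KML, e.g. simplekml.

It may not have a function for appending to a file, so first you may need to collect all the GPS values, group them by date, and then create a KML for each group and save all its points at once.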


EDIT:

text = '''./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190]  INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924
'''

import io
import re
import simplekml

#f = open("filename.log")
f = io.StringIO(text)

# -----------------------

groups = {}

for line in f:
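   # find values; skip lines that carry no TS/GPS fields
   ts  = re.search(r'TS: ([^ ]*) ', line)
   gps = re.search(r'GPS: ([^ ]*), ', line)
   if not ts or not gps:
       continue
   ts  = ts.group(1)                # date only, used as the group key
   val = gps.group(1).split(',')    # the two values in logged order
   gps = [val[1], val[0]]           # swapped order (KML expects lon,lat)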
   
   print('TS:', ts, '| GPS:', gps)
   
   if ts not in groups:
       groups[ts] = []
       
   groups[ts].append(gps)
   
#----------------------------------------
   
for name, values in groups.items():
   print('name:', name)
   
   kml = simplekml.Kml()
   
   for gps in values:
       kml.newpoint(coords=[gps])
       
   # --- after loop ---
   kml.save(f"{name}.kml")
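If installing simplekml is not an option, a minimal KML file can probably also be produced with the standard library, since KML is plain XML. This is only a sketch under assumptions: the helper name `write_kml` is made up, each point becomes an unnamed Placemark, and coordinates are written in KML's longitude,latitude order.

```python
import xml.etree.ElementTree as ET

KML_NS = 'http://www.opengis.net/kml/2.2'

def write_kml(path, points):
    """Write (lon, lat) pairs to `path`, one Placemark/Point per pair."""
    ET.register_namespace('', KML_NS)          # emit KML as the default namespace
    kml = ET.Element(f'{{{KML_NS}}}kml')
    doc = ET.SubElement(kml, f'{{{KML_NS}}}Document')
    for lon, lat in points:
        placemark = ET.SubElement(doc, f'{{{KML_NS}}}Placemark')
        point = ET.SubElement(placemark, f'{{{KML_NS}}}Point')
        coords = ET.SubElement(point, f'{{{KML_NS}}}coordinates')
        coords.text = f'{lon},{lat}'           # KML order: longitude,latitude
    ET.ElementTree(kml).write(path, encoding='utf-8', xml_declaration=True)
```

With the `groups` dictionary built above, it could be used as `write_kml(f"{name}.kml", values)` inside the final loop in place of the simplekml calls.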

Quoted from: https://unix.stackexchange.com/questions/696392