Shell-Script
The right tool for reversing the sort order of thousands of elements in an HTML file
I have an HTML file containing thousands of
<div class='date'></div><ul>...</ul>
blocks, like this:

    <!DOCTYPE html>
    <html>
    <head>
    </head>
    <body>
    <div class="date">Wed May 23 2018</div>
    <ul>
      <li>
        Do laundry
        <ul>
          <li>
            Get coins
          </li>
        </ul>
      </li>
      <li>
        Wash the dishes
      </li>
    </ul>
    <div class='date'>Thu May 24 2018</div>
    <ul>
      <li>
        Solve the world's hunger problem
        <ul>
          <li>
            Don't tell anyone
          </li>
        </ul>
      </li>
      <li>
        Get something to wear
      </li>
    </ul>
    <div class='date'>Fri May 25 2018</div>
    <ul>
      <li>
        Modify the website according to GDPR
      </li>
      <li>
        Watch YouTube
      </li>
    </ul>
    </body>
    </html>
Each <div> and its corresponding <ul> element belong to a specific date. The <div class='date'></div><ul>...</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to put them in descending order so that newer dates are at the top of the file, like this:

    <!DOCTYPE html>
    <html>
    <head>
    </head>
    <body>
    <div class='date'>Fri May 25 2018</div>
    <ul>
      <li>
        Modify the website according to GDPR
      </li>
      <li>
        Watch YouTube
      </li>
    </ul>
    <div class='date'>Thu May 24 2018</div>
    <ul>
      <li>
        Solve the world's hunger problem
        <ul>
          <li>
            Don't tell anyone
          </li>
        </ul>
      </li>
      <li>
        Get something to wear
      </li>
    </ul>
    <div class="date">Wed May 23 2018</div>
    <ul>
      <li>
        Do laundry
        <ul>
          <li>
            Get coins
          </li>
        </ul>
      </li>
      <li>
        Wash the dishes
      </li>
    </ul>
    </body>
    </html>
I'm not sure what the right tool is. Is it a shell script? Is it awk? Is it Python? Is there something faster and more convenient?
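**Extended Python solution:**

**sort_html_by_date.py script:**

    from datetime import datetime

    from bs4 import BeautifulSoup

    with open('input.html') as html_doc:  # replace with your actual HTML file name
        soup = BeautifulSoup(html_doc, 'lxml')

    # Map each parsed date to the markup of its <div> plus the <ul> that follows it.
    divs = {}
    for div in soup.find_all('div', 'date'):
        # %b matches abbreviated month names ("May", "Jun", ...), as in "Wed May 23 2018"
        key = datetime.strptime(div.string.strip(), '%a %b %d %Y')
        divs[key] = str(div) + '\n' + div.find_next_sibling('ul').prettify()

    # Empty <body>, then re-append the blocks from newest to oldest.
    soup.body.clear()
    for el in sorted(divs, reverse=True):
        soup.body.append(divs[el])

    # formatter=None keeps the re-appended markup strings from being HTML-escaped.
    print(soup.prettify(formatter=None))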
Usage:
python sort_html_by_date.py
Output:

    <!DOCTYPE html>
    <html>
     <head>
     </head>
     <body>
      <div class="date">Fri May 25 2018</div>
      <ul>
       <li>
        Modify the website according to GDPR
       </li>
       <li>
        Watch YouTube
       </li>
      </ul>
      <div class="date">Thu May 24 2018</div>
      <ul>
       <li>
        Solve the world's hunger problem
        <ul>
         <li>
          Don't tell anyone
         </li>
        </ul>
       </li>
       <li>
        Get something to wear
       </li>
      </ul>
      <div class="date">Wed May 23 2018</div>
      <ul>
       <li>
        Do laundry
        <ul>
         <li>
          Get coins
         </li>
        </ul>
       </li>
       <li>
        Wash the dishes
       </li>
      </ul>
     </body>
    </html>
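The script sorts on parsed dates rather than on the heading text, and the short sketch below illustrates why that matters. It uses only the standard library. The helper name `parse_date` and the `Fri Jun 01 2018` heading are made up for illustration, and the format string assumes abbreviated English day and month names (`%a`, `%b`) under an English or C locale.

```python
from datetime import datetime


def parse_date(text):
    """Parse a heading such as 'Wed May 23 2018' into a datetime (hypothetical helper)."""
    return datetime.strptime(text.strip(), '%a %b %d %Y')


headings = ['Wed May 23 2018', 'Fri Jun 01 2018', 'Thu May 24 2018']

# Plain string sorting compares the weekday names first, which is not chronological.
print(sorted(headings, reverse=True))
# ['Wed May 23 2018', 'Thu May 24 2018', 'Fri Jun 01 2018']

# Sorting on the parsed value puts the newest date first.
print(sorted(headings, key=parse_date, reverse=True))
# ['Fri Jun 01 2018', 'Thu May 24 2018', 'Wed May 23 2018']
```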
Modules used:
beautifulsoup - https://www.crummy.com/software/BeautifulSoup/bs4/doc/
datetime - https://docs.python.org/3.3/library/datetime.html#module-datetime
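Since the question says the blocks are already in ascending order, simply reversing them should give the same result without parsing any dates. The following is only a rough standard-library sketch of that idea, not part of the solution above. The function name `reverse_date_blocks` is hypothetical, and it assumes a bare `<body>` tag and that `<div class="date">` headings appear only at the top level of the body. Matching HTML with a regular expression is fragile, so a real parser is the safer choice for irregular markup.

```python
import re

# Opening tag of a date heading, written with either quote style.
DATE_DIV = re.compile(r'''<div\s+class=["']date["']\s*>''')


def reverse_date_blocks(html):
    """Return html with its date-heading blocks in reverse order (hypothetical helper)."""
    body_start = html.index('<body>') + len('<body>')
    body_end = html.rindex('</body>')
    body = html[body_start:body_end]

    starts = [m.start() for m in DATE_DIV.finditer(body)]
    if not starts:
        return html  # nothing to reorder

    prefix = body[:starts[0]]
    # Each block runs from one date heading up to the next one (or the end of the body).
    bounds = starts + [len(body)]
    blocks = [body[a:b].rstrip() + '\n' for a, b in zip(bounds, bounds[1:])]
    return html[:body_start] + prefix + ''.join(reversed(blocks)) + html[body_end:]


sample = (
    '<html><body>\n'
    '<div class="date">Wed May 23 2018</div>\n<ul><li>Do laundry</li></ul>\n'
    "<div class='date'>Thu May 24 2018</div>\n<ul><li>Get something to wear</li></ul>\n"
    '</body></html>'
)
print(reverse_date_blocks(sample))
```

This keeps every block byte-for-byte apart from trailing whitespace, so the original indentation and quote style survive, which is not the case with `prettify()`.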