Shell-Script

The right tool for reversing the sort order of thousands of elements in an HTML file

  • June 6, 2018

I have an HTML file containing thousands of <div class='date'></div><ul>...</ul> blocks, like this:

<!DOCTYPE html>
<html>

   <head>
   </head>

   <body>

       <div class="date">Wed May 23 2018</div>
       <ul>
           <li>
               Do laundry
               <ul>
                   <li>
                       Get coins
                   </li>
               </ul>
           </li>
           <li>
               Wash the dishes
           </li>
       </ul>

       <div class='date'>Thu May 24 2018</div>
       <ul>
           <li>
               Solve the world's hunger problem
               <ul>
                   <li>
                       Don't tell anyone
                   </li>
               </ul>
           </li>
           <li>
               Get something to wear
           </li>
       </ul>

       <div class='date'>Fri May 25 2018</div>
       <ul>
           <li>
               Modify the website according to GDPR
           </li>
           <li>
               Watch YouTube
           </li>
       </ul>

   </body>

</html>

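Each <div> and its corresponding <ul> element belong to a specific date. The <div class='date'></div><ul>...</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to sort them in descending order, so that newer dates are at the top of the file, like this: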

<!DOCTYPE html>
<html>

   <head>
   </head>

   <body>

       <div class='date'>Fri May 25 2018</div>
       <ul>
           <li>
               Modify the website according to GDPR
           </li>
           <li>
               Watch YouTube
           </li>
       </ul>

       <div class='date'>Thu May 24 2018</div>
       <ul>
           <li>
               Solve the world's hunger problem
               <ul>
                   <li>
                       Don't tell anyone
                   </li>
               </ul>
           </li>
           <li>
               Get something to wear
           </li>
       </ul>

       <div class="date">Wed May 23 2018</div>
       <ul>
           <li>
               Do laundry
               <ul>
                   <li>
                       Get coins
                   </li>
               </ul>
           </li>
           <li>
               Wash the dishes
           </li>
       </ul>

   </body>

</html> 

I'm not sure what the right tool is. Is it a shell script? awk? Python? Or is there something faster and more convenient?
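Because the blocks are already in ascending order, reversing them is enough; no date parsing is strictly needed. Below is a minimal standard-library sketch of that idea. It assumes, as in the sample file, that the blocks are contiguous inside <body>, that each one starts with a literal <div class="date"> or <div class='date'> tag, and that only whitespace follows the last block before </body>. The function name reverse_date_blocks is hypothetical, and the regex only locates block boundaries rather than parsing HTML, so a file laid out differently would break it.

```python
import re

# Opening tag of a date header, accepting either quote style
# (the sample file mixes class="date" and class='date').
DATE_DIV = re.compile(r"""<div class=(["'])date\1>""")


def reverse_date_blocks(html: str) -> str:
    """Reverse the order of the <div class="date">...</div><ul>...</ul> blocks.

    Assumes the blocks are contiguous, each starts with a DATE_DIV tag,
    and only whitespace follows the last block before </body>.
    """
    starts = [m.start() for m in DATE_DIV.finditer(html)]
    if len(starts) < 2:
        return html  # nothing to reorder
    body_end = html.index("</body>", starts[-1])
    bounds = starts + [body_end]
    # Each block runs from one date header to the next, minus trailing whitespace.
    blocks = [html[a:b].rstrip() for a, b in zip(bounds, bounds[1:])]
    # Reuse the whitespace that originally separated the first two blocks.
    sep = html[starts[0] + len(blocks[0]):starts[1]]
    head = html[:starts[0]]
    tail = html[starts[-1] + len(blocks[-1]):]
    return head + sep.join(reversed(blocks)) + tail
```

For example, print(reverse_date_blocks(open('input.html', encoding='utf-8').read())) prints the reordered document. This is a single pass over the text, so it should be quick even for thousands of blocks, but unlike a real HTML parser it depends on the file keeping exactly this layout.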

Extended **Python** solution:

**sort_html_by_date.py** script:

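from bs4 import BeautifulSoup
from datetime import datetime

with open('input.html') as html_doc:    # replace with your actual html file name
    soup = BeautifulSoup(html_doc, 'lxml')

# Collect (date, markup) pairs: each date header plus the <ul> that follows it.
# A list (rather than a dict keyed by date) keeps blocks that share a date.
blocks = []
for div in soup.find_all('div', 'date'):
    # '%b' matches abbreviated month names (May, Jun, ...); '%B' expects full names
    date = datetime.strptime(div.get_text(strip=True), '%a %b %d %Y')
    blocks.append((date, str(div) + '\n' + div.find_next_sibling('ul').prettify()))

# Empty <body>, then re-append the blocks newest first
soup.body.clear()
for date, markup in sorted(blocks, key=lambda block: block[0], reverse=True):
    soup.body.append(markup)

# formatter=None writes the appended markup strings without escaping '<' and '>'
print(soup.prettify(formatter=None))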
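Parsing the header text into a datetime before sorting matters: sorting the raw strings would order them by weekday name. The standard-library sketch below illustrates this with the question's date format. The extra "Thu Jun 07 2018" entry is hypothetical, added only to show a month other than May, and parse_header is a hypothetical helper name.

```python
from datetime import datetime

# Header texts in the same format as the question's <div class="date"> elements;
# "Thu Jun 07 2018" is a hypothetical extra entry to show a non-May month.
headers = ["Wed May 23 2018", "Thu May 24 2018", "Fri May 25 2018", "Thu Jun 07 2018"]


def parse_header(text: str) -> datetime:
    # %a = abbreviated weekday, %b = abbreviated month name, %d = day, %Y = year
    return datetime.strptime(text.strip(), "%a %b %d %Y")


newest_first = sorted(headers, key=parse_header, reverse=True)
print(newest_first)
# ['Thu Jun 07 2018', 'Fri May 25 2018', 'Thu May 24 2018', 'Wed May 23 2018']

# Sorting the raw strings compares weekday names first, which is wrong:
print(sorted(headers, reverse=True))
# ['Wed May 23 2018', 'Thu May 24 2018', 'Thu Jun 07 2018', 'Fri May 25 2018']

# '%B' is documented as the full month name, so "Jun" is rejected:
try:
    datetime.strptime("Thu Jun 07 2018", "%a %B %d %Y")
except ValueError as err:
    print("'%B' failed:", err)
```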

Usage:

python sort_html_by_date.py

Output:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
 <div class="date">Fri May 25 2018</div>
<ul>
<li>
 Modify the website according to GDPR
</li>
<li>
 Watch YouTube
</li>
</ul>
 <div class="date">Thu May 24 2018</div>
<ul>
<li>
 Solve the world's hunger problem
 <ul>
  <li>
   Don't tell anyone
  </li>
 </ul>
</li>
<li>
 Get something to wear
</li>
</ul>
 <div class="date">Wed May 23 2018</div>
<ul>
<li>
 Do laundry
 <ul>
  <li>
   Get coins
  </li>
 </ul>
</li>
<li>
 Wash the dishes
</li>
</ul>
</body>
</html>
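With thousands of blocks, checking the result by eye is impractical. A small standard-library check like the following can confirm that the date headers in the generated file run from newest to oldest. It is a sketch: dates_descending is a hypothetical helper, and it assumes the header text keeps the "Wed May 23 2018" format.

```python
import re
from datetime import datetime

# Text inside a date header, either quote style, surrounding whitespace ignored.
HEADER_TEXT = re.compile(r"""<div class=["']date["']>\s*(.*?)\s*</div>""")


def dates_descending(html: str) -> bool:
    """Return True if the date headers in `html` run from newest to oldest."""
    dates = [datetime.strptime(t, "%a %b %d %Y") for t in HEADER_TEXT.findall(html)]
    # Each header must be on or before the one above it.
    return all(newer >= older for newer, older in zip(dates, dates[1:]))
```

For example, after saving the script's output with python sort_html_by_date.py > output.html, call dates_descending(open('output.html').read()).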

Modules used:

beautifulsoup - https://www.crummy.com/software/BeautifulSoup/bs4/doc/

datetime - https://docs.python.org/3.3/library/datetime.html#module-datetime

Source: https://unix.stackexchange.com/questions/448016