Symlink

是否有算法來確定符號連結是否循環?

  • July 28, 2021

Unix 系統通常只會在遇到包含符號連結循環或符號連結過多的路徑時出錯,因為它們對在一次路徑查找中遍歷的符號連結數量有限制。但是有沒有辦法真正決定給定的路徑是否解析為某些東西或包含一個循環,即使它包含的連結比 unix 願意遵循的更多?或者這是一個形式上無法確定的問題?如果可以決定,是否可以在合理的時間/記憶體量內決定(例如,無需訪問文件系統上的所有文件)?

一些例子:

a/b/c/d
where a/b is a symlink to ../e
and e is a symlink to f
and f is a symlink to a/b

a/b/c/d
where a/b/c is a symlink to ../c

a/b/c/d
where a/b/c is a symlink to ../c/d

a/b/c/d
where a/b/c is a symlink to /a/b/e
where a/b/e is a symlink to /a/b/f
where a/b/f is a symlink to /a/b/g

編輯

澄清一下,我不是在詢問在文件系統中查找循環,而是在詢問一種決策算法,該算法決定給定路徑是解析為確定的文件/目錄還是根本不解析。例如在以下系統中,有一個循環,但給定的路徑仍然可以正常解析:

/ -- a -- b
where b is a symlink to /a

這個目錄樹顯然有一個循環,但路徑a/b/b/b/b/b仍然可以很好地解析為/a.

好的,經過更多思考,我想我有一個明確的解決方案。

關鍵的見解是,如果作為路徑一部分的每個連結都解析為某個東西,那麼整個路徑都會解析。或者相反,如果路徑無法解析,則必須有一個特定的符號連結需要遍歷但無法解析。

在之前考慮這個問題時,我使用了一種算法,該算法從根開始遍歷路徑元素,當它遇到符號連結時,它用符號連結的內容替換了該路徑元素,然後繼續遍歷。由於這種方法不記得它目前正在解析哪個符號連結,因此它無法檢測到它何時處於非解析循環中。

如果算法跟踪它目前正在解析哪個符號連結(或者在遞歸連結的情況下是哪些符號連結),它可以檢測它是否正在嘗試再次遞歸解析它仍在忙於解析的連結。

算法:

initialize `location` to the current working directory
initialize `link_contents` to the path we want to resolve
initialize `active_symlinks` to the empty set

def resolve_symlink(location, link_contents, active_symlinks) :
   loop forever:
       next_location = location / [first element of link_contents]
       see if next_location is a symlink.
       if so:
           if next_location in active_symlinks: abort, we have a loop
           location = resolve_symlink(location, readlink(next_location), active_symlinks ∪ {next_location})
       else:
           location = next_location
       strip first element of link_contents
       if link_contents is empty: 
           return location

編輯

我在https://bitbucket.org/JanKanis/python-inotify/src/853ed903e870cbfa283e6ce7a5e41aeffe16d4e7/inotify/pathresolver.py?at=pathwatcher在 python 中有一個工作實現。

編輯 2:Bitbucket 停止託管 mercurial repos。這是完整的文件:

# pathresolver.py - This module contains an iterator that iterates over all
# elements of a path including any symlinks. 

# Copyright 2012-2013 Jan Kanis

# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.


"""
This module contains a few functions that help traverse paths with
symlinks.

`resolve_symlinks` is a generator that yields pairs of paths, with one
yield for each path element that is traversed to reach the final
target of a path. This includes path elements and paths from symlinks.

`resolve_path` is a wrapper around `resolve_symlinks` that takes a
single path as an argument and sets up the other arguments to
`resolve_symlinks`.

`get_symlinkmax` is a function that determines the maximum number of
symlinks the system will traverse in a path.

Note: this module forms part of python-inotify, but is considered an
internal module. As such there are no stability guarantees regarding
the api's of these functions.
"""


__author__ = "Jan Kanis"


import sys, os, errno
import tempfile, shutil
from pathlib import PosixPath


_curdir = PosixPath('.')
_root = PosixPath('/')
_parentdir = PosixPath('..')


def resolve_path(path):
   '''Resolve the symlinks in path, yielding all filesystem locations that are traversed.

   The yielded value is a tuple, of which the first element is a symlink-free
   path, and the second element is a path relative to the first element that
   has not yet been traversed. This second element may contain more symlinks.
   
   The resolution implementation will follow an unbounded number of symlinks
   but will still detect symlink loops if they prevent a path from resolving.

   path can be given as a string or as a pathlib object. The yielded values
   are pathlib.PosixPath objects.

   '''
   linkcache = {}
   linkcounter = [0]
   for p in resolve_symlink(_curdir, PosixPath(path), set(),
                                 linkcache, linkcounter):
       yield p


def resolve_symlink(location, link_contents, active_links, known_links, linkcounter):
   '''Recursively resolve a symlink (or another path) to the file or
   directory it ultimately points to. This function handles an
   unlimited number of symlinks, and correctly detects symlink
   loops. All path parameters should be given as pathlib.PosixPath
   instances.

   location: The directory in which the currently to be resolved link resides.

   link_contents: The path stored in the symlink as returned by readlink().

   active_links: a set of symlinks that is currently being resolved.

   linkcache: a dictionary of link location -> resolved target paths. This
   cache prevents this function from having to resolve the same symlink
   twice. (Note that having to traverse the same symlink multiple times
   does not necessarily mean that the path does not resolve to anything.)

   linkcounter: A list containing a single number. (We use a list so that the
   value can be passed by reference.) This number is updated to indicate the
   total number of symlinks that has been traversed.

   '''

   while True:
       if link_contents.is_absolute():
           location = _root
           link_contents = link_contents.relative()

       yield location, link_contents
       if link_contents == _curdir:
           return

       if link_contents.parts[0] == '..':
           # We need to choose here if we allow traversing of a path above
           # the root or above the current directory. Going above CWD
           # should be allowed as long as we don't go above / by doing
           # so. The OS allows going to /.. (which just ends up at /
           # again), so for consistency with that we also allow it,
           # although a path that requires us to do this is probably a bug
           # somewhere.
           if all(p in ('/', '..') for p in location.parts):
               location = location['..']
           else:
               location = location.parent()
           # Strip the first part of link_contents off
           link_contents = link_contents.parts[1:]
           continue

       try:
           nextpath = location[link_contents.parts[0]]
           newlink = PosixPath(os.readlink(str(nextpath)))
       except OSError as e:
           if e.errno == errno.EINVAL:
               # The entry is not a symbolic link, assume it is a normal file
               # or directory. If it is a file and we need it to be a
               # directory that will be detected the next time through the
               # loop in the os.errno.ENOTDIR check. Checking it here would be
               # possible, but keeping the number of system calls at one per
               # loop makes reasoning about race conditions easier.
               location = nextpath
               link_contents = link_contents.parts[1:]
               continue
           if e.errno == errno.ENOENT:
               # The entry does not exist
               raise FileNotFoundError(nextpath)
           if e.errno == errno.ENOTDIR:
               # At this point we know the path is not valid, but we can not
               # determine with certainty what is wrong. If there were no
               # concurrent modifications we can safely make an is_dir()
               # call. If concurrent modifications did happen the is_dir()
               # check may succeed erroneously but we can't detect all
               # concurrent modifications anyway. If the check fails
               # (erroneously or not) that indicates a concurrent modification
               # so we fall through.
               if not location.is_dir():
                   raise NotADirectoryError(location)
               # We should not be able to get here, unless there is a bug
               # or some relevant part of the file system was changed
               # concurrently while we were resolving this link.
               raise ConcurrentFilesystemModificationError(nextpath)
           if e.errno == errno.ELOOP:
               # Can only happen if a path component was changed concurrently
               raise ConcurrentFilesystemModificationError(nextpath)
           # For other OSErrors (such as in case of EACCESS) we propagate to
           # the caller. 
           raise

       # It is a symlink!
       if nextpath in active_links:
           raise SymlinkLoopError(nextpath)

       link_contents = link_contents.parts[1:]
       # We have not yet attempted traversing this symlink during the
       # current call or any of its parents.
       if nextpath in known_links:
           # known_links stores fully resolved paths, so we don't need to
           # traverse the cached path and can just continue our traversal from
           # there.
           location = known_links[nextpath]
           continue
       
       # An unknown link, resolve it recursively
       linkcounter[0] += 1
       # Don't yield the very last result of this recursive call immediately,
       # we still want to process that further. 
       lastloc, lastlink = None, None
       for loc, link in resolve_symlink(location, newlink,
                         active_links.union((nextpath,)), known_links, linkcounter):
           if lastloc:
               yield lastloc, lastlink[link_contents]
           lastloc, lastlink = loc, link
       # The last yielded location is the final resolution of the symlink. The
       # last yielded link_contents is always '.' so we can ignore that.
       known_links[nextpath] = loc
       location = loc
       continue


_symlinkmax = None
def get_symlinkmax():
 '''Returns the maximum number of symlinks that this system will traverse in
 resolving a file path.

 '''
 global _symlinkmax
 if _symlinkmax is not None:
   return _symlinkmax

 try:
   tempdir = tempfile.mkdtemp(prefix='inotify-symlinkmax-tmpdir-')
   open(tempdir+'/testfile', 'w').close()

   target = 'testfile'
   for i in range(1, 60):
     name = 'l'+str(i)
     os.symlink(target, tempdir+'/'+name)
     target = name

     try:
       open(tempdir+'/'+name).close()
     except IOError as e:
       if e.errno == errno.ELOOP:
         _symlinkmax = i - 1
         break
       raise
   
 finally:
   if tempdir:
     shutil.rmtree(tempdir)
 return _symlinkmax



class InvalidPathError (OSError):
   def __init__(self, msg, path, errno=None, *args):
       self.filename = path
       self.errno = errno
       if errno:
           self.strerror = os.strerror(errno)
       OSError.__init__(self, msg, *args)

class SymlinkLoopError (InvalidPathError):
   def __init__(self, path, *args):
       msg = "Path not valid: The symlink at '{}' forms a symlink loop".format(path)
       InvalidPathError.__init__(self, msg, path, errno=errno.ELOOP, *args)

class ConcurrentFilesystemModificationError (InvalidPathError):
   def __init__(self, path, *args):
       msg = "Path not valid: A concurrent change was detected while traversing '{}'".format(path)
       InvalidPathError.__init__(self, msg, path, errno=None, *args)


# To be Python 2 and 3 compatible and also inherit from the right exception
# types in python 3, we need some boilerplate.

fnf_msg = "Path not valid: '{}' does not exist"
nad_msg = "Path not valid: '{}' is not a directory"

if sys.version_info >= (3, 3):
   class FileNotFoundError (InvalidPathError, FileNotFoundError):
       def __init__(self, path, *args):
           InvalidPathError.__init__(self, fnf_msg.format(path), path,
                                         errno=ENOENT, *args)

   class NotADirectoryError (InvalidPathError, NotADirectoryError):
       def __init__(self, path, *args):
           InvalidPathError.__init__(self, nad_msg.format(path), path,
                                         errno=errno.ENOTDIR, *args)

else:
   class FileNotFoundError (InvalidPathError):
       def __init__(self, path, *args):
           InvalidPathError.__init__(self, fnf_msg.format(path), path,
                                         errno=errno.ENOENT, *args)

   class NotADirectoryError (InvalidPathError):
       def __init__(self, path, *args):
           InvalidPathError.__init__(self, nad_msg.format(path), path,
                                         errno=errno.ENOTDIR, *args)

我不完全明白你在問什麼。如果我不知道更好,我想您是在問是否有辦法在處理文件時檢測到這一點。我不相信這是可能的。

我能想到的唯一方法是查找您專門開始查看目錄樹中特定分支的位置。

例子

$ tree 
.
`-- a
   `-- b
       |-- c
       |   `-- d
       |       `-- e -> ../../../../a/b
       `-- e -> e

5 directories, 1 file

find命令會檢測到這個循環,但不會真正告訴你很多關於它的資訊。

$ find -L . -mindepth 15
find: File system loop detected; `./a/b/c/d/e' is part of the same file system loop as `./a/b'.
find: `./a/b/e': Too many levels of symbolic links

我隨意選擇了 15 個級別,以阻止find. -mindepth但是,如果您不關心正在顯示的目錄樹,則可以刪除該開關 ( )。該find命令仍然檢測到循環並停止:

$ find -L . 
.
./a
./a/b
./a/b/c
./a/b/c/d
find: File system loop detected; `./a/b/c/d/e' is part of the same file system loop as `./a/b'.
find: `./a/b/e': Too many levels of symbolic links

順便說一句,如果您想覆蓋MAXSYMLINKSLinux 上顯然是 40 的預設值(較新的 3.x 核心版本),您可以查看標題為:How do you increase MAXSYMLINKS的 U&L Q&A 。

使用符號連結命令

FTP 站點維護人員可以使用一個名為的工具,該工具symlinks將有助於揭示由符號連結引起的樹過長或懸空的問題。

在某些情況下,該symlinks工具也可用於刪除違規連結。

例子

$ symlinks -srv a
lengthy:  /home/saml/tst/99159/a/b/c/d/e -> ../../../../a/b
dangling: /home/saml/tst/99159/a/b/e -> e

glibc 庫

glibc 庫看起來提供了一些圍繞此的 C 函式,但我不完全了解它們的作用或如何實際使用它們。所以我只能給你指出來。

手冊頁man symlink顯示了一個名為symlink(). 描述是這樣的:

symlink() 創建一個名為 newpath 的符號連結,其中包含字元串 oldpath。

該函式返回的錯誤之一是:

ELOOP 在解析新路徑時遇到了太多符號連結。

我還將引導您查看手冊頁,man path_resolution其中討論了 Unix 如何確定磁碟上項目的路徑。特別是這一段。

If  the component is found and is a symbolic link (symlink), we first 
resolve this symbolic link (with the current lookup directory as starting 
lookup directory).  Upon error, that error is returned.  If the result is 
not a directory, an ENOTDIR error is returned.  If the resolution of the 
symlink is successful and returns a directory, we set the current lookup
directory to that directory, and go to the next component.  Note that the 
resolution process here involves recursion.  In order  to  protect  the 
kernel against stack overflow, and also to protect against denial of 
service, there are limits on the maximum recursion depth, and on the maximum 
number of symbolic links followed.  An ELOOP error is returned  when  the
maximum is exceeded ("Too many levels of symbolic links").

引用自:https://unix.stackexchange.com/questions/99159