是否有算法來確定符號連結是否循環?
Unix 系統通常只會在遇到包含符號連結循環或符號連結過多的路徑時出錯,因為它們對在一次路徑查找中遍歷的符號連結數量有限制。但是有沒有辦法真正決定給定的路徑是否解析為某些東西或包含一個循環,即使它包含的連結比 unix 願意遵循的更多?或者這是一個形式上無法確定的問題?如果可以決定,是否可以在合理的時間/記憶體量內決定(例如,無需訪問文件系統上的所有文件)?
一些例子:
a/b/c/d where a/b is a symlink to ../e and e is a symlink to f and f is a symlink to a/b a/b/c/d where a/b/c is a symlink to ../c a/b/c/d where a/b/c is a symlink to ../c/d a/b/c/d where a/b/c is a symlink to /a/b/e where a/b/e is a symlink to /a/b/f where a/b/f is a symlink to /a/b/g
編輯:
澄清一下,我不是在詢問在文件系統中查找循環,而是在詢問一種決策算法,該算法決定給定路徑是解析為確定的文件/目錄還是根本不解析。例如在以下系統中,有一個循環,但給定的路徑仍然可以正常解析:
/ -- a -- b where b is a symlink to /a
這個目錄樹顯然有一個循環,但路徑
a/b/b/b/b/b
仍然可以很好地解析為/a
.
好的,經過更多思考,我想我有一個明確的解決方案。
關鍵的見解是,如果作為路徑一部分的每個連結都解析為某個東西,那麼整個路徑都會解析。或者相反,如果路徑無法解析,則必須有一個特定的符號連結需要遍歷但無法解析。
在之前考慮這個問題時,我使用了一種算法,該算法從根開始遍歷路徑元素,當它遇到符號連結時,它用符號連結的內容替換了該路徑元素,然後繼續遍歷。由於這種方法不記得它目前正在解析哪個符號連結,因此它無法檢測到它何時處於非解析循環中。
如果算法跟踪它目前正在解析哪個符號連結(或者在遞歸連結的情況下是哪些符號連結),它可以檢測它是否正在嘗試再次遞歸解析它仍在忙於解析的連結。
算法:
initialize `location` to the current working directory initialize `link_contents` to the path we want to resolve initialize `active_symlinks` to the empty set def resolve_symlink(location, link_contents, active_symlinks) : loop forever: next_location = location / [first element of link_contents] see if next_location is a symlink. if so: if next_location in active_symlinks: abort, we have a loop location = resolve_symlink(location, readlink(next_location), active_symlinks ∪ {next_location}) else: location = next_location strip first element of link_contents if link_contents is empty: return location
編輯:
我在https://bitbucket.org/JanKanis/python-inotify/src/853ed903e870cbfa283e6ce7a5e41aeffe16d4e7/inotify/pathresolver.py?at=pathwatcher在 python 中有一個工作實現。
編輯 2:Bitbucket 停止託管 mercurial repos。這是完整的文件:
# pathresolver.py - This module contains an iterator that iterates over all # elements of a path including any symlinks. # Copyright 2012-2013 Jan Kanis # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. """ This module contains a few functions that help traverse paths with symlinks. `resolve_symlinks` is a generator that yields pairs of paths, with one yield for each path element that is traversed to reach the final target of a path. This includes path elements and paths from symlinks. `resolve_path` is a wrapper around `resolve_symlinks` that takes a single path as an argument and sets up the other arguments to `resolve_symlinks`. `get_symlinkmax` is a function that determines the maximum number of symlinks the system will traverse in a path. Note: this module forms part of python-inotify, but is considered an internal module. As such there are no stability guarantees regarding the api's of these functions. """ __author__ = "Jan Kanis" import sys, os, errno import tempfile, shutil from pathlib import PosixPath _curdir = PosixPath('.') _root = PosixPath('/') _parentdir = PosixPath('..') def resolve_path(path): '''Resolve the symlinks in path, yielding all filesystem locations that are traversed. The yielded value is a tuple, of which the first element is a symlink-free path, and the second element is a path relative to the first element that has not yet been traversed. This second element may contain more symlinks. The resolution implementation will follow an unbounded number of symlinks but will still detect symlink loops if they prevent a path from resolving. path can be given as a string or as a pathlib object. The yielded values are pathlib.PosixPath objects. ''' linkcache = {} linkcounter = [0] for p in resolve_symlink(_curdir, PosixPath(path), set(), linkcache, linkcounter): yield p def resolve_symlink(location, link_contents, active_links, known_links, linkcounter): '''Recursively resolve a symlink (or another path) to the file or directory it ultimately points to. This function handles an unlimited number of symlinks, and correctly detects symlink loops. All path parameters should be given as pathlib.PosixPath instances. location: The directory in which the currently to be resolved link resides. link_contents: The path stored in the symlink as returned by readlink(). active_links: a set of symlinks that is currently being resolved. linkcache: a dictionary of link location -> resolved target paths. This cache prevents this function from having to resolve the same symlink twice. (Note that having to traverse the same symlink multiple times does not necessarily mean that the path does not resolve to anything.) linkcounter: A list containing a single number. (We use a list so that the value can be passed by reference.) This number is updated to indicate the total number of symlinks that has been traversed. ''' while True: if link_contents.is_absolute(): location = _root link_contents = link_contents.relative() yield location, link_contents if link_contents == _curdir: return if link_contents.parts[0] == '..': # We need to choose here if we allow traversing of a path above # the root or above the current directory. Going above CWD # should be allowed as long as we don't go above / by doing # so. The OS allows going to /.. (which just ends up at / # again), so for consistency with that we also allow it, # although a path that requires us to do this is probably a bug # somewhere. if all(p in ('/', '..') for p in location.parts): location = location['..'] else: location = location.parent() # Strip the first part of link_contents off link_contents = link_contents.parts[1:] continue try: nextpath = location[link_contents.parts[0]] newlink = PosixPath(os.readlink(str(nextpath))) except OSError as e: if e.errno == errno.EINVAL: # The entry is not a symbolic link, assume it is a normal file # or directory. If it is a file and we need it to be a # directory that will be detected the next time through the # loop in the os.errno.ENOTDIR check. Checking it here would be # possible, but keeping the number of system calls at one per # loop makes reasoning about race conditions easier. location = nextpath link_contents = link_contents.parts[1:] continue if e.errno == errno.ENOENT: # The entry does not exist raise FileNotFoundError(nextpath) if e.errno == errno.ENOTDIR: # At this point we know the path is not valid, but we can not # determine with certainty what is wrong. If there were no # concurrent modifications we can safely make an is_dir() # call. If concurrent modifications did happen the is_dir() # check may succeed erroneously but we can't detect all # concurrent modifications anyway. If the check fails # (erroneously or not) that indicates a concurrent modification # so we fall through. if not location.is_dir(): raise NotADirectoryError(location) # We should not be able to get here, unless there is a bug # or some relevant part of the file system was changed # concurrently while we were resolving this link. raise ConcurrentFilesystemModificationError(nextpath) if e.errno == errno.ELOOP: # Can only happen if a path component was changed concurrently raise ConcurrentFilesystemModificationError(nextpath) # For other OSErrors (such as in case of EACCESS) we propagate to # the caller. raise # It is a symlink! if nextpath in active_links: raise SymlinkLoopError(nextpath) link_contents = link_contents.parts[1:] # We have not yet attempted traversing this symlink during the # current call or any of its parents. if nextpath in known_links: # known_links stores fully resolved paths, so we don't need to # traverse the cached path and can just continue our traversal from # there. location = known_links[nextpath] continue # An unknown link, resolve it recursively linkcounter[0] += 1 # Don't yield the very last result of this recursive call immediately, # we still want to process that further. lastloc, lastlink = None, None for loc, link in resolve_symlink(location, newlink, active_links.union((nextpath,)), known_links, linkcounter): if lastloc: yield lastloc, lastlink[link_contents] lastloc, lastlink = loc, link # The last yielded location is the final resolution of the symlink. The # last yielded link_contents is always '.' so we can ignore that. known_links[nextpath] = loc location = loc continue _symlinkmax = None def get_symlinkmax(): '''Returns the maximum number of symlinks that this system will traverse in resolving a file path. ''' global _symlinkmax if _symlinkmax is not None: return _symlinkmax try: tempdir = tempfile.mkdtemp(prefix='inotify-symlinkmax-tmpdir-') open(tempdir+'/testfile', 'w').close() target = 'testfile' for i in range(1, 60): name = 'l'+str(i) os.symlink(target, tempdir+'/'+name) target = name try: open(tempdir+'/'+name).close() except IOError as e: if e.errno == errno.ELOOP: _symlinkmax = i - 1 break raise finally: if tempdir: shutil.rmtree(tempdir) return _symlinkmax class InvalidPathError (OSError): def __init__(self, msg, path, errno=None, *args): self.filename = path self.errno = errno if errno: self.strerror = os.strerror(errno) OSError.__init__(self, msg, *args) class SymlinkLoopError (InvalidPathError): def __init__(self, path, *args): msg = "Path not valid: The symlink at '{}' forms a symlink loop".format(path) InvalidPathError.__init__(self, msg, path, errno=errno.ELOOP, *args) class ConcurrentFilesystemModificationError (InvalidPathError): def __init__(self, path, *args): msg = "Path not valid: A concurrent change was detected while traversing '{}'".format(path) InvalidPathError.__init__(self, msg, path, errno=None, *args) # To be Python 2 and 3 compatible and also inherit from the right exception # types in python 3, we need some boilerplate. fnf_msg = "Path not valid: '{}' does not exist" nad_msg = "Path not valid: '{}' is not a directory" if sys.version_info >= (3, 3): class FileNotFoundError (InvalidPathError, FileNotFoundError): def __init__(self, path, *args): InvalidPathError.__init__(self, fnf_msg.format(path), path, errno=ENOENT, *args) class NotADirectoryError (InvalidPathError, NotADirectoryError): def __init__(self, path, *args): InvalidPathError.__init__(self, nad_msg.format(path), path, errno=errno.ENOTDIR, *args) else: class FileNotFoundError (InvalidPathError): def __init__(self, path, *args): InvalidPathError.__init__(self, fnf_msg.format(path), path, errno=errno.ENOENT, *args) class NotADirectoryError (InvalidPathError): def __init__(self, path, *args): InvalidPathError.__init__(self, nad_msg.format(path), path, errno=errno.ENOTDIR, *args)
我不完全明白你在問什麼。如果我不知道更好,我想您是在問是否有辦法在處理文件時檢測到這一點。我不相信這是可能的。
我能想到的唯一方法是查找您專門開始查看目錄樹中特定分支的位置。
例子
$ tree . `-- a `-- b |-- c | `-- d | `-- e -> ../../../../a/b `-- e -> e 5 directories, 1 file
該
find
命令會檢測到這個循環,但不會真正告訴你很多關於它的資訊。$ find -L . -mindepth 15 find: File system loop detected; `./a/b/c/d/e' is part of the same file system loop as `./a/b'. find: `./a/b/e': Too many levels of symbolic links
我隨意選擇了 15 個級別,以阻止
find
.-mindepth
但是,如果您不關心正在顯示的目錄樹,則可以刪除該開關 ( )。該find
命令仍然檢測到循環並停止:$ find -L . . ./a ./a/b ./a/b/c ./a/b/c/d find: File system loop detected; `./a/b/c/d/e' is part of the same file system loop as `./a/b'. find: `./a/b/e': Too many levels of symbolic links
順便說一句,如果您想覆蓋
MAXSYMLINKS
Linux 上顯然是 40 的預設值(較新的 3.x 核心版本),您可以查看標題為:How do you increase MAXSYMLINKS的 U&L Q&A 。使用符號連結命令
FTP 站點維護人員可以使用一個名為的工具,該工具
symlinks
將有助於揭示由符號連結引起的樹過長或懸空的問題。在某些情況下,該
symlinks
工具也可用於刪除違規連結。例子
$ symlinks -srv a lengthy: /home/saml/tst/99159/a/b/c/d/e -> ../../../../a/b dangling: /home/saml/tst/99159/a/b/e -> e
glibc 庫
glibc 庫看起來提供了一些圍繞此的 C 函式,但我不完全了解它們的作用或如何實際使用它們。所以我只能給你指出來。
手冊頁
man symlink
顯示了一個名為symlink()
. 描述是這樣的:symlink() 創建一個名為 newpath 的符號連結,其中包含字元串 oldpath。
該函式返回的錯誤之一是:
ELOOP 在解析新路徑時遇到了太多符號連結。
我還將引導您查看手冊頁,
man path_resolution
其中討論了 Unix 如何確定磁碟上項目的路徑。特別是這一段。If the component is found and is a symbolic link (symlink), we first resolve this symbolic link (with the current lookup directory as starting lookup directory). Upon error, that error is returned. If the result is not a directory, an ENOTDIR error is returned. If the resolution of the symlink is successful and returns a directory, we set the current lookup directory to that directory, and go to the next component. Note that the resolution process here involves recursion. In order to protect the kernel against stack overflow, and also to protect against denial of service, there are limits on the maximum recursion depth, and on the maximum number of symbolic links followed. An ELOOP error is returned when the maximum is exceeded ("Too many levels of symbolic links").