Filesystems

All filesystem clients are expected to be subclasses of FileSystemClient, and so will share a common API and inherit a suite of IPython magics. Protocol implementations are also free to add extra methods, which are documented in the “Subclass Reference” section below.

Common API

class omniduct.filesystems.base.FileSystemClient(cwd=None, home=None, read_only=False, global_writes=False, **kwargs)

Bases: omniduct.duct.Duct, omniduct.utils.magics.MagicsProvider

An abstract class providing the common API for all filesystem clients.

Class Attributes:
 
  • DUCT_TYPE (Duct.Type) – The type of Duct protocol implemented by this class.
  • DEFAULT_PORT (int) – The default port for the filesystem service (defined by subclasses).
Attributes inherited from Duct:
  • protocol (str) – The name of the protocol for which this instance was created (especially useful if a Duct subclass supports multiple protocols).
  • name (str) – The name given to this Duct instance (defaults to class name).
  • host (str) – The host name providing the service (will be ‘127.0.0.1’, if service is port forwarded from remote; use ._host to see remote host).
  • port (int) – The port number of the service (will be the port-forwarded local port, if relevant; for remote port use ._port).
  • username (str, bool) – The username to use for the service.
  • password (str, bool) – The password to use for the service.
  • registry (None, omniduct.registry.DuctRegistry) – A reference to a DuctRegistry instance for runtime lookup of other services.
  • remote (None, omniduct.remotes.base.RemoteClient) – A reference to a RemoteClient instance to manage connections to remote services.
  • cache (None, omniduct.caches.base.Cache) – A reference to a Cache instance to add support for caching, if applicable.
  • connection_fields (tuple<str>, list<str>) – A list of instance attributes to monitor for changes, whereupon the Duct instance should automatically disconnect. By default, the following attributes are monitored: ‘host’, ‘port’, ‘remote’, ‘username’, and ‘password’.
  • prepared_fields (tuple<str>, list<str>) – A list of instance attributes to be populated (if their values are callable) when the instance first connects to a service. Refer to Duct.prepare and Duct._prepare for more details. By default, the following attributes are prepared: ‘_host’, ‘_port’, ‘_username’, and ‘_password’.

Additional attributes including host, port, username and password are documented inline.

Class Attributes:
  • AUTO_LOGGING_SCOPE (bool) – Whether this class should be used by omniduct logging code as a “scope”. Should be overridden by subclasses as appropriate.
  • DUCT_TYPE (Duct.Type) – The type of Duct service that is provided by this Duct instance. Should be overridden by subclasses as appropriate.
  • PROTOCOLS (list<str>) – The name(s) of any protocols that should be associated with this class. Should be overridden by subclasses as appropriate.
__init__(cwd=None, home=None, read_only=False, global_writes=False, **kwargs)

Parameters:
  • cwd (None, str) – The path prefix to use as the current working directory (if None, the user’s home directory is used where that makes sense).
  • home (None, str) – The path prefix to use as the current user’s home directory. If not specified, it will default to an implementation-specific value (often ‘/’).
  • read_only (bool) – Whether the filesystem should only be able to perform read operations.
  • global_writes (bool) – Whether to allow writes outside of the user’s home folder.
  • **kwargs (dict) – Additional keyword arguments to be passed on to subclasses.

path_home

The path prefix to use as the current user’s home directory. Unless cwd is set, this will be the prefix to use for all non-absolute path references on this filesystem. This is assumed not to change between connections, and so will not be updated on client reconnections. Unless global_writes is set to True, this will be the only folder into which this client is permitted to write.

Type: str
path_cwd

The path prefix associated with the current working directory. If not otherwise set, it will be the user’s home directory, and will be the prefix used by all non-absolute path references on this filesystem.

Type: str
path_separator

The character(s) to use in separating path components. Typically this will be ‘/’.

Type: str
path_join(path, *components)

Generate a new path by joining together multiple paths.

If any component starts with self.path_separator or ‘~’, then all previous path components are discarded, and the effective base path becomes that component (with ‘~’ expanding to self.path_home). Note that this method does not simplify path components like ‘..’. Use self.path_normpath for this purpose.

Parameters:
  • path (str) – The base path to which components should be joined.
  • *components (str) – Any additional components to join to the base path.
Returns:

The path resulting from joining all of the components nominated, in order, to the base path.

Return type:

str
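The joining rules described above can be illustrated with a small standalone sketch. This is not omniduct's implementation: the function below, its `separator` and `home` keyword arguments, and the example paths are all assumptions made for illustration, and only the documented rules are modelled (absolute or ‘~’ components reset the base path, ‘~’ expands to the home prefix, and ‘..’ is left untouched).

```python
def path_join(path, *components, separator="/", home="/home/user"):
    """Join path components following the documented rules (illustrative sketch)."""
    result = path
    for component in components:
        if component.startswith(separator) or component.startswith("~"):
            # An absolute or home-relative component discards everything before it.
            result = component
        elif not result or result.endswith(separator):
            result = result + component
        else:
            result = result + separator + component
    # Expand a leading '~' to the (hypothetical) home prefix.
    if result == "~":
        result = home
    elif result.startswith("~" + separator):
        result = home.rstrip(separator) + result[1:]
    return result


joined = path_join("/data", "raw", "2020")
reset = path_join("/data", "/tmp", "x")
expanded = path_join("/data", "~", "notes.txt")
unsimplified = path_join("/data", "..", "x")  # '..' is not collapsed here
```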

path_basename(path)

Extract the last component of a given path.

Components are determined by splitting by self.path_separator. Note that if a path ends with a path separator, the basename will be the empty string.

Parameters: path (str) – The path from which the basename should be extracted.
Returns: The extracted basename.
Return type: str
path_dirname(path)

Extract the parent directory of the provided path.

This method returns the entire path except for the basename (the last component), where components are determined by splitting by self.path_separator.

Parameters: path (str) – The path from which the directory path should be extracted.
Returns: The extracted directory path.
Return type: str
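For filesystems whose separator is ‘/’, the basename and dirname rules described above resemble those of Python's standard posixpath module. The snippet below uses posixpath purely as a stand-in to show the expected shape of the results; it does not call FileSystemClient, and the example paths are made up.

```python
import posixpath

path = "/data/reports/summary.csv"

# The last component after splitting on the separator.
base = posixpath.basename(path)

# Everything except the last component.
parent = posixpath.dirname(path)

# A path ending in a separator has an empty basename.
trailing = posixpath.basename("/data/reports/")
```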
path_normpath(path)

Normalise a pathname.

This method returns the normalised (absolute) path corresponding to path on this filesystem.

Parameters: path (str) – The path to normalise (make absolute).
Returns: The normalised path.
Return type: str
read_only

Whether this filesystem client should be permitted to attempt any write operations.

Type: bool
global_writes

Whether writes should be permitted outside of the home directory. This write-lock is designed to prevent inadvertent scripted writing in potentially dangerous places.

Type: bool
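How read_only and global_writes might combine can be sketched as a single check. The helper below is hypothetical (it is not part of omniduct's API) and assumes that the path has already been normalised to an absolute path.

```python
def is_write_permitted(path, home, read_only=False, global_writes=False, separator="/"):
    """Return True if a write to `path` would be allowed (illustrative sketch)."""
    if read_only:
        # A read-only client never writes.
        return False
    if global_writes:
        # Writes are allowed anywhere on the filesystem.
        return True
    # Otherwise, writes are confined to the home directory and its descendants.
    home = home.rstrip(separator)
    return path == home or path.startswith(home + separator)


inside_home = is_write_permitted("/home/user/out.csv", "/home/user")
outside_home = is_write_permitted("/etc/hosts", "/home/user")
outside_allowed = is_write_permitted("/etc/hosts", "/home/user", global_writes=True)
locked = is_write_permitted("/home/user/out.csv", "/home/user", read_only=True)
sibling = is_write_permitted("/home/username/out.csv", "/home/user")
```

Comparing against `home + separator` rather than `home` alone keeps a sibling folder such as ‘/home/username’ from being mistaken for a descendant of ‘/home/user’.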
exists(path)

Check whether the nominated path exists on this filesystem.

Parameters: path (str) – The path for which to check existence.
Returns: True if a file/folder exists at the nominated path, and False otherwise.
Return type: bool
isdir(path)

Check whether a nominated path is a directory.

Parameters: path (str) – The path for which to check directory nature.
Returns: True if folder exists at nominated path, and False otherwise.
Return type: bool
isfile(path)

Check whether a nominated path is a file.

Parameters: path (str) – The path for which to check file nature.
Returns: True if a file exists at nominated path, and False otherwise.
Return type: bool
dir(path=None)

Retrieve information about the children of a nominated directory.

This method returns a generator over FileSystemFileDesc objects that represent the files/directories that are present as children of the nominated path. If path is not a directory, an exception is raised. The path is interpreted as being relative to the current working directory (on remote filesystems, this will typically be the home folder).

Parameters: path (str) – The path to examine for children.
Returns: The children of path represented as FileSystemFileDesc objects.
Return type: generator<FileSystemFileDesc>

This method should return a generator over FileSystemFileDesc objects.

listdir(path=None)

Retrieve the names of the children of a nominated directory.

This method inspects the contents of a directory using .dir(path), and returns the names of child members as strings. path is interpreted relative to the current working directory (on remote filesystems, this will typically be the home folder).

Parameters: path (str) – The path of the directory from which to enumerate filenames.
Returns: The names of all children of the nominated directory.
Return type: list<str>
showdir(path=None)

Return a dataframe representation of a directory.

This method returns a pandas.DataFrame representation of the contents of a path, which are retrieved using .dir(path). The exact columns will vary from filesystem to filesystem, depending on the fields returned by .dir(), but the returned DataFrame is guaranteed to at least have the columns: ‘name’ and ‘type’.

Parameters: path (str) – The path of the directory from which to show contents.
Returns: A DataFrame representation of the contents of the nominated directory.
Return type: pandas.DataFrame
walk(path=None)

Explore the filesystem tree starting at a nominated path.

This method returns a generator which recursively walks over all paths that are children of path, yielding one result for each directory, of the form: (<path name>, [<directory 1>, …], [<file 1>, …])

Parameters: path (str) – The path of the directory from which to enumerate contents.
Returns: A generator of tuples, each tuple being associated with one directory that is either path or one of its descendants.
Return type: generator<tuple>
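The shape of the tuples yielded by walk can be illustrated with a standalone sketch over an in-memory tree. Everything here is hypothetical: the nested dict stands in for a filesystem (dict values are directories, None values are files), the function is not omniduct's implementation, and the order in which a real client visits directories is not specified by the description above.

```python
def walk(tree, path="/"):
    """Yield (path, [directories], [files]) for `path` and each descendant directory."""
    dirs = sorted(name for name, child in tree.items() if isinstance(child, dict))
    files = sorted(name for name, child in tree.items() if not isinstance(child, dict))
    yield (path, dirs, files)
    for name in dirs:
        # Recurse into each child directory, extending the path prefix.
        child_path = path.rstrip("/") + "/" + name
        yield from walk(tree[name], child_path)


# Hypothetical directory tree used only for this example.
tree = {
    "README": None,
    "data": {"b.csv": None, "raw": {"a.csv": None}},
    "logs": {"app.log": None},
}

results = list(walk(tree))
```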
find(path_prefix=None, **attrs)

Find a file or directory based on certain attributes.

This method searches for files or folders which satisfy certain constraints on the attributes of the file (as encoded into FileSystemFileDesc). Note that without attribute constraints, this method will function identically to self.dir.

Parameters:
  • path_prefix (str) – The path under which files/directories should be found.
  • **attrs (dict) – Constraints on the fields of the FileSystemFileDesc objects associated with this filesystem, as constant values or callable objects (in which case the object will be called and should return True if the attribute value is a match, and False otherwise).
Returns:

A generator over FileSystemFileDesc objects that are descendants of path_prefix and which satisfy the provided constraints.

Return type:

generator<FileSystemFileDesc>
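The way attribute constraints are described above (constant values compared for equality, callables used as predicates) can be sketched as follows. The FileDesc namedtuple and its ‘bytes’ field are stand-ins invented for this example, since the actual fields of FileSystemFileDesc vary by filesystem; the matches helper is likewise hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for FileSystemFileDesc.
FileDesc = namedtuple("FileDesc", ["path", "name", "type", "bytes"])


def matches(desc, **attrs):
    """Return True if `desc` satisfies every attribute constraint."""
    for field, constraint in attrs.items():
        value = getattr(desc, field)
        if callable(constraint):
            # Callables act as predicates on the attribute value.
            if not constraint(value):
                return False
        elif value != constraint:
            # Anything else must match exactly.
            return False
    return True


entries = [
    FileDesc("/data/small.txt", "small.txt", "file", 120),
    FileDesc("/data/big.bin", "big.bin", "file", 50000),
    FileDesc("/data/raw", "raw", "directory", 0),
]

large_files = [d.name for d in entries if matches(d, type="file", bytes=lambda b: b > 1000)]
unconstrained = [d.name for d in entries if matches(d)]
```

With no constraints every entry matches, which is consistent with the note that find without attribute constraints simply enumerates what it visits.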

mkdir(path, recursive=True, exist_ok=False)

Create a directory at the given path.

Parameters:
  • path (str) – The path of the directory to create.
  • recursive (bool) – Whether to recursively create any parents of this path if they do not already exist.

Note: exist_ok is passed on to subclass implementations of _mkdir rather than implementing the existence check using .exists, so that they can avoid the overhead associated with multiple operations, which can be costly in some cases.

remove(path, recursive=False)

Remove file(s) at a nominated path.

Directories (and their contents) will not be removed unless recursive is set to True.

Parameters:
  • path (str) – The path of the file/directory to be removed.
  • recursive (bool) – Whether to remove directories and all of their contents.
open(path, mode='rt')

Open a file for reading and/or writing.

This method opens the file at the given path for reading and/or writing operations. The object returned is programmatically interchangeable with any other Python file-like object, including specification of file modes. If the file is opened in write mode, changes will only be flushed to the source filesystem when the file is closed.

Parameters:
  • path (str) – The path of the file to open.
  • mode (str) – All standard Python file modes.
Returns:

An opened file-like object.

Return type:

FileSystemFile or file-like
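The statement that writes are only flushed to the source filesystem when the file is closed can be pictured with a small buffer-then-commit sketch. The class below is hypothetical and is not how FileSystemFile is implemented; a plain dict stands in for the remote filesystem.

```python
import io


class BufferedWriteFile(io.StringIO):
    """Hypothetical file-like object that commits its contents only on close."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            # Push the buffered contents to the backing store exactly once.
            self._store[self._path] = self.getvalue()
        super().close()


store = {}  # stand-in for the remote filesystem
with BufferedWriteFile(store, "/tmp/notes.txt") as f:
    f.write("hello")
    visible_before_close = "/tmp/notes.txt" in store

visible_after_close = store.get("/tmp/notes.txt")
```

In practice this means that a file opened for writing is best used in a with block (or closed explicitly), since other readers should not be expected to see partial writes.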

download(source, dest=None, overwrite=False, fs=None)

Download files to another filesystem.

This method (recursively) downloads a file/folder from path source on this filesystem to the path dest on filesystem fs, overwriting any existing file if overwrite is True.

Parameters:
  • source (str) – The path on this filesystem of the file to download to the nominated filesystem (fs). If source ends with ‘/’ then the contents of the source directory will be copied into the destination folder, and an error will be thrown if the path does not resolve to a directory.
  • dest (str) – The destination path on filesystem (fs). If not specified, the file/folder is downloaded into the default path, usually one’s home folder. If dest ends with ‘/’, and corresponds to a directory, the contents of source will be copied instead of copying the entire folder. If dest is otherwise a directory, an exception will be raised.
  • overwrite (bool) – True if the contents of any existing file by the same name should be overwritten, False otherwise.
  • fs (FileSystemClient) – The FileSystemClient into which the nominated file/folder source should be downloaded. If not specified, defaults to the local filesystem.
upload(source, dest=None, overwrite=False, fs=None)

Upload files from another filesystem.

This method (recursively) uploads a file/folder from path source on filesystem fs to the path dest on this filesystem, overwriting any existing file if overwrite is True. This is equivalent to fs.download(…, fs=self).

Parameters:
  • source (str) – The path on the specified filesystem (fs) of the file to upload to this filesystem. If source ends with ‘/’, and corresponds to a directory, the contents of source will be copied instead of copying the entire folder.
  • dest (str) – The destination path on this filesystem. If not specified, the file/folder is uploaded into the default path, usually one’s home folder, on this filesystem. If dest ends with ‘/’ then file will be copied into destination folder, and will throw an error if path does not resolve to a directory.
  • overwrite (bool) – True if the contents of any existing file by the same name should be overwritten, False otherwise.
  • fs (FileSystemClient) – The FileSystemClient from which to load the file/folder at source. If not specified, defaults to the local filesystem.
connect()

Connect to the service backing this client.

It is not normally necessary for a user to manually call this function, since when a connection is required, it is automatically created.

Returns: A reference to the current object.
Return type: Duct instance
disconnect()

Disconnect this client from backing service.

This method is automatically called during reconnections and/or at Python interpreter shutdown. It first calls Duct._disconnect (which should be implemented by subclasses) and then notifies the RemoteClient subclass, if present, to stop port-forwarding the remote service.

Returns: A reference to this object.
Return type: Duct instance
is_connected()

Check whether this Duct instance is currently connected.

This method checks to see whether a Duct instance is currently connected. This is performed by verifying that the remote host and port are still accessible, and then by calling Duct._is_connected, which should be implemented by subclasses.

Returns: Whether this Duct instance is currently connected.
Return type: bool
prepare()

Prepare a Duct subclass for use (if not already prepared).

This method is called before the value of any of the fields referenced in self.connection_fields is retrieved. The fields include, by default: ‘host’, ‘port’, ‘remote’, ‘cache’, ‘username’, and ‘password’. Subclasses may add or subtract from these special fields.

When called, it first checks whether the instance has already been prepared, and if not calls _prepare and then records that the instance has been successfully prepared.

FileSystemClient Quirks:

This method may be overridden by subclasses, but provides the following default behaviour:

  • Ensures self.registry, self.remote and self.cache values are instances of the right types.
  • It replaces string values of self.remote and self.cache with remotes and caches looked up using self.registry.lookup.
  • It looks through each of the fields nominated in self.prepared_fields and, if the corresponding value is callable, sets the value of that field to the result of calling that value with a reference to self. By default, prepared_fields contains ‘_host’, ‘_port’, ‘_username’, and ‘_password’.
  • Ensures value of self.port is an integer (or None).
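The handling of prepared_fields and the port coercion listed above can be sketched with a toy class. This is an illustration only: PreparedExample is not a Duct subclass, the registry, remote and cache steps are omitted, and the host name and port value are made up.

```python
class PreparedExample:
    """Toy class mimicking the lazy resolution of prepared fields."""

    prepared_fields = ("_host", "_port")

    def __init__(self, host, port):
        self._host = host
        self._port = port
        self._prepared = False

    def prepare(self):
        if self._prepared:
            # Preparation happens at most once.
            return
        for field in self.prepared_fields:
            value = getattr(self, field)
            if callable(value):
                # Callable values are replaced by the result of calling them with self.
                setattr(self, field, value(self))
        if self._port is not None:
            # The port is coerced to an integer (or left as None).
            self._port = int(self._port)
        self._prepared = True


client = PreparedExample(host=lambda duct: "fs.example.com", port="8020")
client.prepare()
resolved_host = client._host
resolved_port = client._port
```

Deferring the callables until preparation means that values such as a host name can be computed at connection time rather than when the client object is constructed.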
class omniduct.filesystems.base.FileSystemFile(fs, path, mode='r')

Bases: object

A file-like implementation that is interchangeable with native Python file objects, allowing remote files to be treated identically to local files by omniduct, the user, and other libraries.

class omniduct.filesystems.base.FileSystemFileDesc

Bases: omniduct.filesystems.base.Node

A representation of a file/directory stored within an Omniduct FileSystemClient.

Subclass Reference

For comprehensive documentation on any particular subclass, please refer to one of the below documents.