sdss-access Reference¶
Path¶
- class sdss_access.path.path.BasePath(release=None, public=False, mirror=False, verbose=False, force_modules=None, preserve_envvars=None)[source]¶
Bases:
object
Class for construction of paths in general.
- Parameters:
release (str) – The release name, e.g. ‘DR15’, ‘MPL-9’.
public (bool) – If True, uses public urls. Only needed for public data releases. Automatically set to True when release contains “DR”.
mirror (bool) – If True, uses the mirror data domain url. Default is False.
verbose (bool) – If True, turns on verbosity. Default is False.
force_modules (bool) – If True, forces svn or github software products to use any existing local Module environment paths, e.g. PLATEDESIGN_DIR
preserve_envvars (bool | list) – Flag(s) to indicate some or all original environment variables to preserve
- Variables:
templates (dict) – The set of templates read from the configuration file.
- add_temp_path(name: str, path: str, envvar_path: str | None = None)[source]¶
Add a temporary path template in sdss_access
Add a path template temporarily into the local os environment for use in sdss_access. Define a template name and path. The path must start with an environment variable definition.
This is useful for development of new paths before adding them to the tree and tagging a new version. This allows sdss_access to still be used in the interim. This is an alternative to checking out the tree git repo and modifying paths there. The recommended way of adding new paths is through a PR on the tree product.
- Parameters:
- Raises:
ValueError – when the name does not match the correct syntax
ValueError – when the path does not start with an environment variable
ValueError – when the environment variable is not defined
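The two path-related checks listed under Raises can be sketched in plain Python. This is only an illustration, not the sdss_access implementation: the helper name `check_template_path` and its `environ` argument are hypothetical, and the name-syntax check is left out because its rule is not stated here.

```python
import os
import re


def check_template_path(path, environ=None):
    """Return the leading envvar name of a template path, or raise ValueError.

    Hypothetical helper that mirrors the two path checks described above.
    """
    environ = os.environ if environ is None else environ
    # The path must begin with an environment variable such as $MY_DATA_DIR.
    match = re.match(r"\$(\w+)", path)
    if match is None:
        raise ValueError(f"path {path!r} does not start with an environment variable")
    envvar = match.group(1)
    # The environment variable must already be defined.
    if envvar not in environ:
        raise ValueError(f"environment variable {envvar} is not defined")
    return envvar


env = {"MY_DATA_DIR": "/tmp/data"}  # hypothetical environment for the demo
print(check_template_path("$MY_DATA_DIR/{ver}/file-{id}.fits", env))
```

Passing the environment as a mapping keeps the sketch testable without touching the real os.environ.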
- any(filetype, **kwargs)[source]¶
Checks whether the local directory contains any files of the given type
- Parameters:
filetype (str) – File type parameter.
- Returns:
any (bool) – Boolean indicating if any files exist in the expanded path on disk.
- static check_modules(template, permanent=None)[source]¶
Check for any existing Module path environment
For software product paths, overrides the tree environment paths with existing original envvars from os.environ that may be set from shell bash or module environments. Checks the original os.environ for any environment variables and replaces the template envvar with the original os version. Ignores all SAS data paths. If permanent is True, then permanently replaces the envvar in existing os.environ with the original. Assumes the original environment variables point to definitions created by module files or bash profiles.
- dir(filetype, **kwargs)[source]¶
Return the directory containing a file of a given type.
- Parameters:
filetype (str) – File type parameter.
- Returns:
dir (str) – Directory containing the file.
- extract(name, example)[source]¶
Extract keywords from an example path
Attempts to extract the defined keyword values from an example filepath for a given path name. The filepath must be a full SDSS SAS filepath.
- Parameters:
- Returns:
A dictionary of path keyword values
Example
>>> from sdss_access.path import Path
>>> path = Path()
>>> filepath = '/Users/Brian/Work/sdss/sas/mangawork/manga/spectro/redux/v2_5_3/8485/stack/manga-8485-1901-LOGCUBE.fits'
>>> path.extract('mangacube', filepath)
{'drpver': 'v2_5_3', 'plate': '8485', 'ifu': '1901', 'wave': 'LOG'}
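One way to picture keyword extraction is to turn a brace-style template into a regular expression with named groups. The sketch below is a standalone illustration under that assumption; the function `extract_keywords` and the template string are hypothetical and are not taken from sdss_access or the tree product.

```python
import re


def extract_keywords(template, filepath):
    """Extract {keyword} values from filepath using a brace-style template."""
    seen = set()
    parts = []
    # Split the template into literal text and {keyword} fields.
    for token in re.split(r"(\{\w+\})", template):
        if re.fullmatch(r"\{\w+\}", token):
            key = token[1:-1]
            if key in seen:
                # A repeated keyword must match the value captured earlier.
                parts.append(f"(?P={key})")
            else:
                seen.add(key)
                parts.append(f"(?P<{key}>[^/]+?)")
        else:
            parts.append(re.escape(token))
    match = re.search("".join(parts) + "$", filepath)
    return match.groupdict() if match else {}


# Hypothetical template resembling the example path above.
template = ("mangawork/manga/spectro/redux/{drpver}/{plate}/stack/"
            "manga-{plate}-{ifu}-{wave}CUBE.fits")
filepath = ("/Users/Brian/Work/sdss/sas/mangawork/manga/spectro/redux/"
            "v2_5_3/8485/stack/manga-8485-1901-LOGCUBE.fits")
print(extract_keywords(template, filepath))
```

The backreference for repeated keywords ensures that a keyword appearing twice in the template, such as the plate here, resolves to a single consistent value.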
- find_location(filetype, **kwargs)[source]¶
Finds a relative location of a product path
Attempts to find a relative path location for a software product path. Loops over all product_roots defined in the tree and tests if a relative location can be extracted, i.e. if the path starts with a given root path. The root environment paths searched are the following, in order of precedence: PRODUCT_ROOT, SDSS_SVN_ROOT, SDSS_INSTALL_PRODUCT_ROOT, SDSS_PRODUCT_ROOT, SDSS4_PRODUCT_ROOT. If no root is found, uses one directory up from SAS_BASE_DIR.
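The root search described above can be sketched as a loop over the listed environment variables. This is an interpretation rather than the library code: the function `find_relative_location` and its `environ` argument are hypothetical, and the fallback is read here as treating the parent of SAS_BASE_DIR as a last-resort root.

```python
import posixpath

# Root environment variables in the order of precedence given above.
PRODUCT_ROOTS = [
    "PRODUCT_ROOT",
    "SDSS_SVN_ROOT",
    "SDSS_INSTALL_PRODUCT_ROOT",
    "SDSS_PRODUCT_ROOT",
    "SDSS4_PRODUCT_ROOT",
]


def find_relative_location(path, environ):
    """Return path relative to the first matching root, or None."""
    for name in PRODUCT_ROOTS:
        root = environ.get(name)
        if root and path.startswith(root.rstrip("/") + "/"):
            return posixpath.relpath(path, root)
    # Assumed fallback: one directory up from SAS_BASE_DIR.
    sas_base = environ.get("SAS_BASE_DIR")
    if sas_base:
        fallback = posixpath.dirname(sas_base.rstrip("/"))
        if path.startswith(fallback + "/"):
            return posixpath.relpath(path, fallback)
    return None


env = {"SDSS_SVN_ROOT": "/home/user/svn", "SAS_BASE_DIR": "/data/sas"}
print(find_relative_location("/home/user/svn/platedesign/trunk", env))
```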
- static get_available_releases(public=None)[source]¶
Get the available releases
- Parameters:
public (bool) – If True, only return public data releases
- get_netloc(netloc=None, sdss=None, sdss5=None, dtn=None, svn=None, mirror=None)[source]¶
Get a net url domain
Returns an SDSS url domain location. Options are the SDSS SAS domain, the rsync download server, the svn server, or the mirror data domain. The mirror data domain is retrieved either by the mirror input keyword argument or by the path.mirror attribute.
- Parameters:
netloc (str) – An exact net location to return directly
sdss (bool) – If True, returns SDSS data domain: data.sdss.org
sdss5 (bool) – If True, returns the SDSS-V data domain: data.sdss5.org
dtn (bool) – If True, returns SDSS rsync server domain: dtn.sdss.org
svn (bool) – If True, returns SDSS svn domain: svn.sdss.org
mirror (bool) – If True, return SDSS mirror domain: data.mirror.sdss.org.
- Returns:
An http domain name
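The flag-to-domain mapping listed in the parameters can be summarized in a few lines. This sketch is not the library method: `pick_netloc` is a hypothetical name, the order in which flags are checked is an assumption, and the value returned when no flag is set is left as None because the real default is not stated here.

```python
# Domains as listed in the parameter descriptions above.
DOMAINS = {
    "sdss": "data.sdss.org",
    "sdss5": "data.sdss5.org",
    "dtn": "dtn.sdss.org",
    "svn": "svn.sdss.org",
    "mirror": "data.mirror.sdss.org",
}


def pick_netloc(netloc=None, **flags):
    """Return an exact netloc if given, else the domain of the first true flag."""
    if netloc:
        return netloc
    for name, domain in DOMAINS.items():  # assumed precedence: dict order
        if flags.get(name):
            return domain
    return None  # the library's actual default is not documented here


print(pick_netloc(dtn=True))
print(pick_netloc(netloc="example.org", sdss=True))
```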
- has_name(name)[source]¶
Check if a given path name exists in the set of templates
- Parameters:
name (str) – The path name to lookup
- location(filetype, base_dir=None, **kwargs)[source]¶
Return the relative SAS path location of a file of a given type.
- lookup_keys(name)[source]¶
Lookup the keyword arguments needed for a given path name
- Parameters:
name (str) – The name of the path
- Returns:
A list of keywords needed for filepath generation
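To illustrate what "keywords needed for filepath generation" means, the sketch below pulls the field names out of a brace-style template with the standard library. The function `template_keys` and the template string are hypothetical; real sdss_access templates may contain additional syntax that this does not handle.

```python
from string import Formatter


def template_keys(template):
    """Return the unique {keyword} names in a template, in order of appearance."""
    keys = []
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field and field not in keys:
            keys.append(field)
    return keys


# Hypothetical template for demonstration only.
template = "$MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits"
print(template_keys(template))
```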
- lookup_names()[source]¶
Lookup what path names are available
Returns a list of the available path names in sdss_access. Use with lookup_keys to find the required keyword arguments for a given path name.
- Returns:
A list of the available path names.
- name(filetype, **kwargs)[source]¶
Return the name of a file of a given type.
- Parameters:
filetype (str) – File type parameter.
- Returns:
name (str) – Name of a file with no directory information.
- random(filetype, **kwargs)[source]¶
Returns a random selection of files of the given type
- Parameters:
- Returns:
random (list) – Random file selected from the expanded list of full paths on disk.
- refine(filelist, regex, filterdir='out', **kwargs)[source]¶
Returns a list of files filtered by a regular expression
- Parameters:
filelist (list) – A list of files to filter on.
regex (str) – The regular expression string to filter your list
filterdir ({'in', 'out'}) – Indicates whether the filter is inclusive or exclusive: ‘out’ removes the items satisfying the regular expression; ‘in’ keeps the items satisfying the regular expression.
- Returns:
refine (list) – A file list refined by an input regular expression.
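The 'in' and 'out' behaviour can be sketched with the standard re module. This is a standalone illustration with a hypothetical function name, `refine_files`; whether the library uses re.search or another matching mode is an assumption here.

```python
import re


def refine_files(filelist, regex, filterdir="out"):
    """Keep ('in') or remove ('out') the entries that match regex."""
    if filterdir not in ("in", "out"):
        raise ValueError("filterdir must be 'in' or 'out'")
    pattern = re.compile(regex)
    keep_matches = filterdir == "in"
    # An entry survives when its match status agrees with the filter direction.
    return [f for f in filelist if bool(pattern.search(f)) == keep_matches]


files = ["manga-8485-1901-LOGCUBE.fits", "manga-8485-1901-LINCUBE.fits", "notes.txt"]
print(refine_files(files, r"LOGCUBE", filterdir="in"))
print(refine_files(files, r"\.txt$", filterdir="out"))
```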
- replant_tree(release=None)[source]¶
Replants the tree based on release
Resets the path definitions given a specified release
- Parameters:
release (str) – A release to use when replanting the tree
- set_base_dir(base_dir=None)[source]¶
Sets the base directory
Sets the base_dir attribute. Defaults to $SAS_BASE_DIR. Can be overridden by passing in the base_dir keyword argument. The base_dir sets the beginning part of all local paths.
- Parameters:
base_dir (str) – A directory path to use as the base
- set_netloc(netloc=None, sdss=None, sdss5=None, dtn=None, svn=None, mirror=None)[source]¶
Set a url domain location
Sets an SDSS url domain location. Options are the SDSS SAS domain, the rsync download server, the svn server, or the mirror data domain. The mirror data domain is set either by the mirror input keyword argument or by the path.mirror attribute.
- Parameters:
netloc (str) – An exact net location to use directly
sdss (bool) – If True, sets the SDSS-IV data domain: data.sdss.org
sdss5 (bool) – If True, sets the SDSS-V data domain: data.sdss5.org
dtn (bool) – If True, sets the SDSS rsync server domain: dtn.sdss.org
svn (bool) – If True, sets the SDSS svn domain: svn.sdss.org
mirror (bool) – If True, sets the SDSS mirror domain: data.mirror.sdss.org.
- class sdss_access.path.path.Path(release=None, public=False, mirror=False, verbose=False, force_modules=None, preserve_envvars=None)[source]¶
Bases:
BasePath
Class for construction of paths in general. Sets a particular template file.
- Parameters:
release (str) – The release name, e.g. ‘DR15’, ‘MPL-9’.
public (bool) – If True, uses public urls. Only needed for public data releases. Automatically set to True when release contains “DR”.
mirror (bool) – If True, uses the mirror data domain url. Default is False.
verbose (bool) – If True, turns on verbosity. Default is False.
force_modules (bool) – If True, forces svn or github software products to use any existing local Module environment paths, e.g. PLATEDESIGN_DIR
preserve_envvars (bool | list) – Flag(s) to indicate some or all original environment variables to preserve
- Variables:
templates (dict) – The set of templates read from the configuration file.
- cat_id_groups(filetype, **kwargs)[source]¶
Return a folder structure to group data together based on their catalog identifier so that we don’t have too many files in any one folder.
- component_default(filetype, **kwargs)[source]¶
Return the component name, if given.
The component designates a stellar or planetary body following the Washington Multiplicity Catalog, which was adopted by the XXIV meeting of the International Astronomical Union. When no component is given, the star is assumed to be without a discernible companion. When a component is given it follows the system (Hessman et al., arXiv:1012.0707):
- the brightest component is called “A”, whether it is initially resolved into sub-components or not;
- subsequent distinct components not contained within “A” are labeled “B”, “C”, etc.;
- sub-components are designated by the concatenation of one or more suffixes with the primary label, starting with lowercase letters for the 2nd hierarchical level and then with numbers for the 3rd.
- Parameters:
- Returns:
component (str) – The component name if given, otherwise a blank string.
- configsubmodule(filetype, **kwargs)[source]¶
Returns configuration summary submodule group subdirectory
- definitiondir(filetype, **kwargs)[source]¶
Returns definition subdirectory in PLATELIST_DIR of the form: NNNNXX.
- fieldgrp(filetype, **kwargs)[source]¶
Returns the fieldid group for the BOSS idlspec2d run2d version
- isplate(filetype, **kwargs)[source]¶
Returns the plate flag for BOSS idlspec2d run2d versions that utilize it
- pad_fieldid(filetype, **kwargs)[source]¶
Returns the fieldid zero padded to its proper length for the BOSS idlspec2d run2d version
- platedir(filetype, **kwargs)[source]¶
Returns plate subdirectory in PLATELIST_DIR of the form: NNNNXX/NNNNNN.
- sdss_id_groups(filetype, **kwargs)[source]¶
Return a folder structure to group data together based on their SDSS identifier so that we don’t have too many files in any one folder.
- spcoaddfolder(filetype, **kwargs)[source]¶
Returns the reorganized subfolder structure for the BOSS idlspec2d run2d version
- spcoaddgrp(filetype, **kwargs)[source]¶
Returns the coadd group (field group analog) subfolder structure for the BOSS idlspec2d run2d version
- spcoaddobs(filetype, **kwargs)[source]¶
Returns the formatted observatory flag for custom coadds for the BOSS idlspec2d
- spectrodir(filetype, **kwargs)[source]¶
Returns SPECTRO_REDUX or BOSS_SPECTRO_REDUX depending on the value of run2d.
- sdss_access.path.path.check_public_release(release: str | None = None, public: bool = False) bool [source]¶
Check if a release is public
Checks a given release to see if it is public. A release is public if it contains “DR” in the release name and today's date is on or after the release_date specified in the Tree.
- Parameters:
- Returns:
bool – Whether the release is public
- Raises:
AttributeError – when tree does not have a valid release date for a DR tree config
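The public-release rule can be sketched with a date comparison, reading it as: a data release becomes public once its release date has been reached. The function `is_public_release` and its arguments are hypothetical; the real function obtains the release date from the Tree rather than taking it as a parameter.

```python
from datetime import date


def is_public_release(release, release_date, today=None):
    """True if release is a 'DR' release whose release date has been reached."""
    today = date.today() if today is None else today
    return "DR" in release and release_date <= today


# A data release dated in the past is public; a non-DR release is not.
print(is_public_release("DR15", date(2018, 12, 10), today=date(2020, 1, 1)))
print(is_public_release("MPL-9", date(2018, 12, 10), today=date(2020, 1, 1)))
```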
Sync¶
Access¶
- class sdss_access.sync.access.Access(label='sdss_rsync', stream_count=5, mirror=False, public=False, release=None, verbose=False)[source]¶
Bases:
RsyncAccess
Class for providing Rsync or Curl access, depending on whether the operating system is POSIX
- access_mode = 'rsync'¶
Auth¶
- class sdss_access.sync.auth.Auth(netloc=None, public=False, verbose=False)[source]¶
Bases:
object
Class for setting up SAS authentication for SDSS users
- set_netrc()[source]¶
Add the following username and password to your ~/.netrc file, and remember to chmod 600 ~/.netrc
For SDSS-IV access: machine data.sdss.org login sdss password *-****
For SDSS-V access: machine data.sdss5.org login sdss5 password ***-*
On Windows, use a _netrc file instead, following https://stackoverflow.com/questions/6031214/git-how-to-use-netrc-file-on-windows-to-save-user-and-password
BaseAccess¶
- class sdss_access.sync.baseaccess.BaseAccess(label=None, stream_count=5, mirror=False, public=False, release=None, verbose=False, force_modules=None, preserve_envvars=None)[source]¶
Class for providing Rsync or Curl access to SDSS SAS Paths
- add_file(path, input_type=None)[source]¶
Adds a file into the list of tasks to download
Adds a full filepath, url, or location to the list of tasks to download. This takes advantage of sdss_access parallel streams to download a list of files.
This is similar to the .add method, except this takes full filepaths or urls as input, whereas the .add method is better when inputting a product name and path template kwargs.
- abstract generate_stream_task(task=None, out=None)[source]¶
Creates the task to put in the download stream
- access_mode = 'rsync'¶
- remote_scheme = None¶
Client¶
- class sdss_access.sync.cli.Cli(label=None, data_dir=None, verbose=False)[source]¶
Bases:
object
Class for providing command line interface (cli) sync scripts, and logs to local disk
- foreground_run(command, test=False, logger=None, logall=False, message=None, outname=None, errname=None)[source]¶
A convenient wrapper to log and perform system calls.
- Parameters:
command (str) – The system command to run. It will be split internally by shlex.split().
test (bool, optional) – Set this to true to not actually run the commands.
logger (logging.logging, optional) – If passed a logging object, diagnostic output will be sent to this object.
logall (bool, optional) – Set this to true to always log stdout and stderr (at level DEBUG). Otherwise stdout and stderr will only be logged for nonzero status
message (str, optional) – Call logger.critical(message) with this message and then call sys.exit(status).
outname (str, optional) – If set, use outname as the name of a file to contain stdout. It is the responsibility of the caller of this function to clean up the resulting file. Otherwise a temporary file will be used.
errname (str, optional) – Same as outname, but will contain stderr.
- Returns:
(status,out,err) (tuple) – The exit status, stdout and stderr of the process
Examples
>>> status,out,err = transfer.common.system_call('date')
- tmp_dir = '/tmp'¶
- tmp_exists = True¶
Curl¶
- class sdss_access.sync.curl.CurlAccess(label='sdss_curl', stream_count=5, mirror=False, public=False, release=None, verbose=False)[source]¶
Bases:
BaseAccess
Class for providing Curl access to SDSS SAS Paths
- check_file_exists_locally(destination=None, url_file_size=None, url_file_time=None)[source]¶
Checks if file already exists (note that time check is only accurate to the minute)
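A minute-resolution comparison of this kind might look like the sketch below. It is an assumption-laden illustration, not the CurlAccess code: the function `exists_locally` is hypothetical, and it assumes the remote size is a byte count and the remote time is a naive local datetime.

```python
import os
from datetime import datetime


def exists_locally(destination, remote_size, remote_time):
    """True if destination exists with the same size and the same mtime to the minute."""
    if not os.path.isfile(destination):
        return False
    stat = os.stat(destination)
    # Truncate both timestamps to the minute before comparing.
    local_minute = datetime.fromtimestamp(stat.st_mtime).replace(second=0, microsecond=0)
    remote_minute = remote_time.replace(second=0, microsecond=0)
    return stat.st_size == remote_size and local_minute == remote_minute


print(exists_locally("/no/such/file.fits", 10, datetime(2020, 1, 1, 12, 30)))
```

Truncating to the minute means two files modified within the same minute compare as equal, which is the accuracy limit noted above.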
- get_query_list(url_query)[source]¶
Search through user specified “*” options and return all possible and valid url paths
- set_url_list(query_path=None)[source]¶
Gets url paths from get_query_list and returns file properties and path
- access_mode = 'curl'¶
- remote_scheme = 'https'¶
Http¶
- class sdss_access.sync.http.HttpAccess(verbose=None, public=None, release=None, label='sdss_http')[source]¶
Class for providing HTTP access via urllib.request (python3) or urllib2 (python2) to SDSS SAS Paths
- download_url_to_path(url, path, force=False)[source]¶
Download a file from url via http, and put it at path
Rsync¶
- class sdss_access.sync.rsync.RsyncAccess(label='sdss_rsync', stream_count=5, mirror=False, public=False, release=None, verbose=False)[source]¶
Bases:
BaseAccess
Class for providing Rsync access to SDSS SAS Paths
- access_mode = 'rsync'¶
- remote_scheme = 'rsync'¶
Stream¶
System Call¶
- sdss_access.sync.system_call.system_call(command, test=False, logger=None, logall=False, message=None, outname=None, errname=None)[source]¶
A convenient wrapper to log and perform system calls.
- Parameters:
command (str) – The system command to run. It will be split internally by shlex.split().
test (bool, optional) – Set this to true to not actually run the commands.
logger (logging.logging, optional) – If passed a logging object, diagnostic output will be sent to this object.
logall (bool, optional) – Set this to true to always log stdout and stderr (at level DEBUG). Otherwise stdout and stderr will only be logged for nonzero status
message (str, optional) – Call logger.critical(message) with this message and then call sys.exit(status).
outname (str, optional) – If set, use outname as the name of a file to contain stdout. It is the responsibility of the caller of this function to clean up the resulting file. Otherwise a temporary file will be used.
errname (str, optional) – Same as outname, but will contain stderr.
- Returns:
(status,out,err) (tuple) – The exit status, stdout and stderr of the process
Examples
>>> status,out,err = transfer.common.system_call('date')
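The core of such a wrapper (split the command with shlex, run it, return the status and captured output) can be sketched with the standard library. This is a simplified stand-in, not the sdss_access function: `run_command` is a hypothetical name, logging and the outname/errname files are omitted, and the values returned in test mode are an assumption.

```python
import shlex
import subprocess
import sys


def run_command(command, test=False):
    """Run a command string and return (status, out, err)."""
    args = shlex.split(command)
    if test:
        # Assumed dry-run behaviour: do not execute anything.
        return 0, "", ""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Use the current Python interpreter so the demo does not depend on shell tools.
command = f"{shlex.quote(sys.executable)} -c 'print(42)'"
status, out, err = run_command(command)
print(status, out.strip())
```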
Changelog Utilities¶
- sdss_access.path.changelog.compute_changelog(new, old, pprint=None, to_list=None)[source]¶
Compute the difference between two Tree PATH sections
Compares two tree PATH ini sections from the given environment configurations and returns a dictionary with keys new and updated, indicating newly added paths and any paths that have been modified since the last release. Accepts either string names of config files, e.g. “dr16” and “dr15”, or the preloaded Tree configs, e.g. Tree(config='dr16').
- Parameters:
- Returns:
A dictionary of relevant changes between the two releases
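The shape of the result can be illustrated by diffing two plain dictionaries of path templates. This is a sketch of the idea only: `diff_paths` is a hypothetical function, the sample path names and templates are made up, and the exact structure of the values returned by the library is an assumption.

```python
def diff_paths(new, old):
    """Return paths added in new and paths whose template changed since old."""
    added = {name: tmpl for name, tmpl in new.items() if name not in old}
    updated = {name: tmpl for name, tmpl in new.items()
               if name in old and old[name] != tmpl}
    return {"new": added, "updated": updated}


# Made-up PATH sections for two releases.
old_paths = {"cube": "$REDUX/{ver}/cube-{id}.fits", "map": "$REDUX/{ver}/map-{id}.fits"}
new_paths = {"cube": "$REDUX/{ver}/{id}/cube-{id}.fits", "map": "$REDUX/{ver}/map-{id}.fits",
             "image": "$REDUX/{ver}/image-{id}.png"}
print(diff_paths(new_paths, old_paths))
```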
- sdss_access.path.changelog.get_available_releases(public=None)[source]¶
Get the available releases
- Parameters:
public (bool) – If True, only return public data releases