biothings.utils¶
biothings.utils.aws¶
- biothings.utils.aws.create_bucket(name, region=None, aws_key=None, aws_secret=None, acl=None, ignore_already_exists=False)[source]¶
Create an S3 bucket "name" in optional "region". If aws_key and aws_secret are set, the S3 client will use them; otherwise it will use the default system-wide settings. "acl" defines permissions on the bucket: "private" (default), "public-read", "public-read-write" and "authenticated-read"
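A minimal usage sketch (the bucket name and region below are hypothetical placeholders):
from biothings.utils.aws import create_bucket
# create a private bucket, relying on system-wide AWS credentials
# since aws_key/aws_secret are left unset ("my-hub-bucket" is a placeholder)
create_bucket("my-hub-bucket", region="us-west-2", acl="private")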
- biothings.utils.aws.download_s3_file(s3key, localfile=None, aws_key=None, aws_secret=None, s3_bucket=None, overwrite=False)[source]¶
- biothings.utils.aws.get_s3_file(s3key, localfile=None, return_what=False, aws_key=None, aws_secret=None, s3_bucket=None)[source]¶
- biothings.utils.aws.get_s3_file_contents(s3key, aws_key=None, aws_secret=None, s3_bucket=None) bytes [source]¶
- biothings.utils.aws.get_s3_folder(s3folder, basedir=None, aws_key=None, aws_secret=None, s3_bucket=None)[source]¶
- biothings.utils.aws.get_s3_static_website_url(s3key, aws_key=None, aws_secret=None, s3_bucket=None)[source]¶
- biothings.utils.aws.send_s3_big_file(localfile, s3key, overwrite=False, acl=None, aws_key=None, aws_secret=None, s3_bucket=None, storage_class=None)[source]¶
Multipart upload for files bigger than 5GiB
- biothings.utils.aws.send_s3_file(localfile, s3key, overwrite=False, permissions=None, metadata=None, content=None, content_type=None, aws_key=None, aws_secret=None, s3_bucket=None, redirect=None)[source]¶
Save a localfile to the S3 bucket with the given key. The bucket is set via S3_BUCKET. It also saves the localfile's last-modified time in the S3 file's metadata. A usage sketch follows the parameter list below.
- Parameters
redirect (str) – if not None, set the redirect property of the object so it produces a 301 when accessed
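For illustration, a hedged sketch of a typical call (the paths, key and bucket name are hypothetical placeholders):
from biothings.utils.aws import send_s3_file
# upload a local HTML report under the given key, into an explicit bucket
send_s3_file(
    "/tmp/report.html",       # localfile
    "reports/report.html",    # s3key
    overwrite=True,
    content_type="text/html",
    s3_bucket="my-bucket",
)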
biothings.utils.backend¶
biothings.utils.common¶
This module contains util functions that may be shared by both BioThings data-hub and web components. In general, do not include utils depending on any third-party modules.
- class biothings.utils.common.BiothingsJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶
Bases:
JSONEncoder
A JSON encoder class that can dump Python datetime objects; usage: json.dumps(data, cls=BiothingsJSONEncoder, indent=indent)
Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
- default(o)[source]¶
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
- class biothings.utils.common.LogPrint(log_f, log=1, timestamp=0)[source]¶
Bases:
object
If this class is set to sys.stdout, it will output to both log_f and __stdout__. log_f is a file handle.
- biothings.utils.common.SubStr(input_string, start_string='', end_string='', include=0)[source]¶
Return the substring between start_string and end_string. If start_string is '', cut the string from the beginning of input_string. If end_string is '', cut the string to the end of input_string. If either start_string or end_string cannot be found in input_string, return ''. end_pos is the first position of end_string after start_string; in case of multiple occurrences, cut at the first position. include=0 (default): do not include start/end_string; include=1: include start/end_string.
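A short illustration of the documented behavior (the input values are made up):
from biothings.utils.common import SubStr
SubStr("id:12345;name", "id:", ";")             # -> '12345'
SubStr("id:12345;name", "id:", ";", include=1)  # -> 'id:12345;'
SubStr("id:12345;name", "xx:", ";")             # -> '' (start_string not found)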
- biothings.utils.common.addsuffix(filename, suffix, noext=False)[source]¶
Add a suffix in front of ".extension", keeping the same extension. If noext is True, remove the extension from the filename.
- async biothings.utils.common.aiogunzipall(folder, pattern, job_manager, pinfo)[source]¶
Gunzip all files in folder matching pattern. job_manager is used for parallelisation, and pinfo is a pre-filled dict used by job_manager to report jobs in the hub (see bt.utils.manager.JobManager)
- biothings.utils.common.anyfile(infile, mode='r')[source]¶
Return a file handler with support for gzip/zip compressed files. If infile is a two-value tuple, the first item is the compressed file and the second is the actual filename inside the compressed file, e.g. ('a.zip', 'aa.txt')
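For example (filenames are hypothetical):
from biothings.utils.common import anyfile
in_f = anyfile("data.txt.gz")              # gzip handled transparently
in_f = anyfile(("archive.zip", "aa.txt"))  # read aa.txt inside archive.zip
for line in in_f:
    pass  # process each line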
- biothings.utils.common.ask(prompt, options='YN')[source]¶
Prompt Yes or No; return the uppercase 'Y' or 'N'.
- biothings.utils.common.dump(obj, filename, protocol=4, compress='gzip')[source]¶
Save a compressed object to disk. Protocol version 4 is the default for py3.8 and has been supported since py3.4.
- biothings.utils.common.dump2gridfs(obj, filename, db, protocol=2)[source]¶
Save a compressed (support gzip only) object to MongoDB gridfs.
- biothings.utils.common.file_newer(source, target)[source]¶
return True if source file is newer than target file.
- biothings.utils.common.filter_dict(d, keys)[source]¶
Remove keys from dict "d". "keys" is a list of strings; dotfield notation can be used to express nested keys. If a key to remove doesn't exist, it is silently ignored.
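A sketch of the documented behavior (assuming the filtered dict is returned):
from biothings.utils.common import filter_dict
d = {"a": {"b": 1, "c": 2}, "d": 3}
filter_dict(d, ["a.b", "x"])  # removes nested key "a.b"; missing "x" is ignored
# expected result: {"a": {"c": 2}, "d": 3}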
- biothings.utils.common.find_classes_subclassing(mods, baseclass)[source]¶
Given a module or a list of modules, inspect and find all classes which are a subclass of the given baseclass, inside those modules
- biothings.utils.common.find_doc(k, keys)[source]¶
Used by jsonld insertion in www.api.es._insert_jsonld
- biothings.utils.common.get_compressed_outfile(filename, compress='gzip')[source]¶
Get an output file handler with the given compress method. Currently supports gzip/bz2/lzma; lzma is only available in py3.
- biothings.utils.common.get_dotfield_value(dotfield, d)[source]¶
Explore dictionary d using dotfield notation and return value. Example:
d = {"a":{"b":1}}. get_dotfield_value("a.b",d) => 1
- biothings.utils.common.is_str(s)[source]¶
Return True if the input is a string, False otherwise. Python 3 compatible.
- biothings.utils.common.iter_n(iterable, n, with_cnt=False)[source]¶
Iterate an iterator by chunks (of n). If with_cnt is True, return (chunk, cnt) each time. Ref: http://stackoverflow.com/questions/8991506/iterate-an-iterator-by-chunks-of-n-in-python
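For example (the exact chunk container type, list vs. tuple, is an implementation detail):
from biothings.utils.common import iter_n
for chunk in iter_n(range(7), 3):
    print(chunk)       # chunks of up to 3 items: (0, 1, 2), (3, 4, 5), (6,)
for chunk, cnt in iter_n(range(7), 3, with_cnt=True):
    print(cnt, chunk)  # cnt is the running count of items seen so far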
- biothings.utils.common.json_encode(obj)[source]¶
Tornado-aimed json encoder, it does the same job as tornado.escape.json_encode but also deals with datetime encoding
- biothings.utils.common.json_serial(obj)[source]¶
JSON serializer for objects not serializable by default json code
- biothings.utils.common.list2dict(a_list, keyitem, alwayslist=False)[source]¶
Return a dictionary with specified keyitem as key, others as values. keyitem can be an index or a sequence of indexes. For example:
li = [['A','a',1], ['B','a',2], ['A','b',3]]
list2dict(li, 0) ---> {'A':[('a',1),('b',3)], 'B':('a',2)}
If alwayslist is True, values are always a list even if there is only one item in it.
list2dict(li, 0, True)---> {'A':[('a',1),('b',3)], 'B':[('a',2),]}
- biothings.utils.common.loadobj(filename, mode='file')[source]¶
Loads a compressed object from disk file (or file-like handler) or MongoDB gridfs file (mode=’gridfs’)
obj = loadobj('data.pyobj')
obj = loadobj(('data.pyobj', mongo_db), mode='gridfs')
- biothings.utils.common.merge(x, dx)[source]¶
Merge dictionary dx (Δx) into dictionary x. If the __REPLACE__ key is present at any level z in dx, then z in x is replaced, instead of merged, with z in dx.
- biothings.utils.common.newer(t0, t1, fmt='%Y%m%d')[source]¶
t0 and t1 are timestamp strings matching the "fmt" pattern. Return True if t1 is newer than t0.
- biothings.utils.common.open_anyfile(infile, mode='r')[source]¶
A context manager that can be used in a "with" statement. Accepts a filehandle or anything accepted by the anyfile function.
with open_anyfile('test.txt') as in_f:
    do_something()
- biothings.utils.common.open_compressed_file(filename)[source]¶
Get a read-only file-handler for compressed file, currently support gzip/bz2/lzma, lzma only available in py3
- biothings.utils.common.rmdashfr(top)[source]¶
Recursively delete dirs and files from “top” directory, then delete “top” dir
- biothings.utils.common.run_once()[source]¶
should_run_task_1 = run_once()
print(should_run_task_1())  # -> True
print(should_run_task_1())  # -> False
print(should_run_task_1())  # -> False
print(should_run_task_1())  # -> False

should_run_task_2 = run_once()
print(should_run_task_2('2a'))  # -> True
print(should_run_task_2('2b'))  # -> True
print(should_run_task_2('2a'))  # -> False
print(should_run_task_2('2b'))  # -> False
...
- biothings.utils.common.safewfile(filename, prompt=True, default='C', mode='w')[source]¶
Return a file handle in 'w' mode; use an alternative name if a file with the same name exists. If prompt is true, ask whether to overwrite, append, or change the name; otherwise, switch to an available name automatically.
- biothings.utils.common.split_ids(q)[source]¶
Split the input query string into a list of ids, using any of ``" |,+"`` as separators, but preserving a phrase if quoted (either single or double quoted). For more detailed rules, see: http://docs.python.org/2/library/shlex.html#parsing-rules
e.g.:
>>> split_ids('CDK2 CDK3')
['CDK2', 'CDK3']
>>> split_ids('"CDK2 CDK3"\nCDK4')
['CDK2 CDK3', 'CDK4']
- class biothings.utils.common.splitstr[source]¶
Bases:
str
Type representing strings with spaces in them
- biothings.utils.common.timesofar(t0, clock=0, t1=None)[source]¶
Return a string (e.g. '3m3.42s') for the elapsed real/CPU time since the given t0 (obtained from t0=time.time() for real time, or t0=time.clock() for CPU time).
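For example:
import time
from biothings.utils.common import timesofar
t0 = time.time()
# ... some long-running work ...
print(timesofar(t0))  # e.g. '3m3.42s'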
- biothings.utils.common.traverse(obj, leaf_node=False)[source]¶
Output path-dictionary pairs. For example, input:
{
    'exac_nontcga': {'af': 0.00001883},
    'gnomad_exome': {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}},
    'snpeff': {'ann': [{'effect': 'intron_variant', 'feature_id': 'NM_014672.3'},
                       {'effect': 'intron_variant', 'feature_id': 'NM_001256678.1'}]}
}
will be translated to a generator:
(
    ("exac_nontcga", {"af": 0.00001883}),
    ("gnomad_exome.af", {"af": 0.0000119429, "af_afr": 0.000123077}),
    ("gnomad_exome", {"af": {"af": 0.0000119429, "af_afr": 0.000123077}}),
    ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_014672.3"}),
    ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_001256678.1"}),
    ("snpeff.ann", [{ ... }, { ... }]),
    ("snpeff", {"ann": [{ ... }, { ... }]}),
    ('', {'exac_nontcga': {...}, 'gnomad_exome': {...}, 'snpeff': {...}})
)
or when traversing leaf nodes (leaf_node=True):
(
    ('exac_nontcga.af', 0.00001883),
    ('gnomad_exome.af.af', 0.0000119429),
    ('gnomad_exome.af.af_afr', 0.000123077),
    ('snpeff.ann.effect', 'intron_variant'),
    ('snpeff.ann.feature_id', 'NM_014672.3'),
    ('snpeff.ann.effect', 'intron_variant'),
    ('snpeff.ann.feature_id', 'NM_001256678.1')
)
- biothings.utils.common.uncompressall(folder)[source]¶
Try to uncompress any known archive files in folder
- biothings.utils.common.untargzall(folder, pattern='*.tar.gz')[source]¶
gunzip and untar all *.tar.gz files in "folder"
biothings.utils.configuration¶
- class biothings.utils.configuration.ConfigAttrMeta(confmod: biothings.utils.configuration.MetaField = <factory>, section: biothings.utils.configuration.Text = <factory>, description: biothings.utils.configuration.Paragraph = <factory>, readonly: biothings.utils.configuration.Flag = <factory>, hidden: biothings.utils.configuration.Flag = <factory>, invisible: biothings.utils.configuration.Flag = <factory>)[source]¶
Bases:
object
- class biothings.utils.configuration.ConfigLine(seq)[source]¶
Bases:
UserString
- PATTERNS = (('hidden', re.compile('^#-\\s*hide\\s*-#\\s*$'), <function ConfigLine.<lambda>>), ('invisible', re.compile('^#-\\s*invisible\\s*-#\\s*$'), <function ConfigLine.<lambda>>), ('readonly', re.compile('^#-\\s*readonly\\s*-#\\s*$'), <function ConfigLine.<lambda>>), ('section', re.compile('^#\\*\\s*(.*)\\s*\\*#\\s*$'), <function ConfigLine.<lambda>>), ('description', re.compile('.*\\s*#\\s+(.*)$'), <function ConfigLine.<lambda>>))¶
- class biothings.utils.configuration.ConfigurationValue(code)[source]¶
Bases:
object
Type to wrap a default value when it is code that needs to be interpreted later. The code is passed to eval() in the context of the whole "config" dict (so, for instance, paths declared earlier in the configuration file can be used in the code passed to eval). The code will also be executed through exec() if eval() raises a syntax error; this happens when the code contains statements, not just an expression. In that case, a variable should be created in these statements (named the same as the original config variable) so the proper value can be retrieved by ConfigurationManager.
- class biothings.utils.configuration.ConfigurationWrapper(default_config, conf)[source]¶
Bases:
object
Wraps and manages configuration access and edit. A singleton instance is available throughout all hub apps using biothings.config or biothings.hub.config after calling import biothings.hub. In addition to providing config value access, either from config files or database, config manager can supersede attributes of a class with values coming from the database, allowing dynamic configuration of hub’s elements.
When constructing a ConfigurationWrapper instance, variables will be defined with default values coming from default_config, then they can be overridden by conf's values, or new variables will be added if not defined in default_config. Only metadata coming from default_config will be used.
- property modified¶
- property readonly¶
- class biothings.utils.configuration.Flag(value=None)[source]¶
Bases:
MetaField
- default¶
alias of
bool
- class biothings.utils.configuration.MetaField(value=None)[source]¶
Bases:
object
- default¶
alias of
None
- property value¶
biothings.utils.dataload¶
Utility functions for parsing flatfiles, mapping to JSON, cleaning.
- biothings.utils.dataload.alwayslist(value)[source]¶
If the input value is not a list/tuple type, return it as a single-value list.
- biothings.utils.dataload.boolean_convert(d, convert_keys=None, level=0)[source]¶
Explore document d and convert the specified keys to boolean. Use dotfield notation for inner keys.
- biothings.utils.dataload.dict_apply(d, key, value, sort=True)[source]¶
Add value to d[key]; append it if the key already exists.
>>> d = {'a': 1}
>>> dict_apply(d, 'a', 2)
{'a': [1, 2]}
>>> dict_apply(d, 'a', 3)
{'a': [1, 2, 3]}
>>> dict_apply(d, 'b', 2)
{'a': 1, 'b': 2}
- biothings.utils.dataload.dict_attrmerge(dict_li, removedup=True, sort=True, special_fns=None)[source]¶
dict_attrmerge([{'a': 1, 'b': [2,3]},
                {'a': [1,2], 'b': [3,5], 'c': 4}])
should return
{'a': [1,2], 'b': [2,3,5], 'c': 4}
special_fns is a dictionary of {attr: merge_fn}, used for special attrs which need a special merge_fn, e.g. {'uniprot': _merge_uniprot}
- biothings.utils.dataload.dict_convert(_dict, keyfn=None, valuefn=None)[source]¶
Return a new dict with each key converted by keyfn (if not None), and each value converted by valuefn (if not None).
- biothings.utils.dataload.dict_sweep(d, vals=None, remove_invalid_list=False)[source]¶
Remove keys whose values are ".", "-", "", "NA", "none", " ", and remove empty dictionaries (a usage sketch follows the parameter list)
- Parameters
d (dict) – a dictionary
vals (str or list) – a string or list of strings to sweep, or None to use the default values
remove_invalid_list (boolean) –
when True, will remove a key whose list contains only values from "vals". Ex:
test_dict = {'gene': [None, None], 'site': ["Intron", None], 'snp_build' : 136}
with remove_invalid_list == False:
{'gene': [None], 'site': ['Intron'], 'snp_build': 136}
with remove_invalid_list == True:
{'site': ['Intron'], 'snp_build': 136}
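The promised sketch of a default sweep (the document below is hypothetical):
from biothings.utils.dataload import dict_sweep
doc = {"gene": "NA", "af": {"af_afr": ""}, "site": "Intron"}
dict_sweep(doc)
# expected: {'site': 'Intron'} -- invalid values and the emptied "af" dict removed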
- biothings.utils.dataload.dict_to_list(gene_d)[source]¶
Return a list of genedocs from a genedoc dictionary, making sure the "_id" field exists.
- biothings.utils.dataload.dict_traverse(d, func, traverse_list=False)[source]¶
Recursively traverse dictionary d, calling func(k, v) for each key/value found. func must return a tuple (new_key, new_value).
- biothings.utils.dataload.dict_walk(dictionary, key_func)[source]¶
Recursively apply key_func to dict’s keys
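For example (assuming the converted dict is returned):
from biothings.utils.dataload import dict_walk
dict_walk({"Gene": {"Symbol": "CDK2"}}, str.lower)
# expected: {'gene': {'symbol': 'CDK2'}} -- keys converted, values untouched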
- biothings.utils.dataload.dupline_seperator(dupline, dup_sep, dup_idx=None, strip=False)[source]¶
for a line like this:
a b1,b2 c1,c2
return a generator of this list (breaking out of the duplicates in each field):
[(a,b1,c1), (a,b2,c1), (a,b1,c2), (a,b2,c2)]
Example:
dupline_seperator(dupline=['a', 'b1,b2', 'c1,c2'], dup_idx=[1,2], dup_sep=',')
If dup_idx is None, try to split on every field. If strip is True, also strip out extra spaces.
- biothings.utils.dataload.file_merge(infiles, outfile=None, header=1, verbose=1)[source]¶
Merge a list of input files with the same format. If header is set, the header rows will be removed from the 2nd file onwards in the list.
- biothings.utils.dataload.float_convert(d, include_keys=None, exclude_keys=None)[source]¶
Convert elements in a document to floats.
By default, traverse all keys. If include_keys is specified, only convert the keys listed in include_keys (dotfield notation, e.g. a.b, a.b.c). If exclude_keys is specified, convert all keys except those listed in exclude_keys.
- Parameters
d – a dictionary to traverse keys on
include_keys – only convert these keys (optional)
exclude_keys – exclude all other keys except these keys (optional)
- Returns
generate key, value pairs
- biothings.utils.dataload.int_convert(d, include_keys=None, exclude_keys=None)[source]¶
Convert elements in a document to integers.
By default, traverse all keys. If include_keys is specified, only convert the keys listed in include_keys (dotfield notation, e.g. a.b, a.b.c). If exclude_keys is specified, convert all keys except those listed in exclude_keys.
- Parameters
d – a dictionary to traverse keys on
include_keys – only convert these keys (optional)
exclude_keys – exclude all other keys except these keys (optional)
- Returns
generate key, value pairs
- biothings.utils.dataload.list2dict(a_list, keyitem, alwayslist=False)[source]¶
Return a dictionary with specified keyitem as key, others as values. keyitem can be an index or a sequence of indexes. For example:
li = [['A','a',1], ['B','a',2], ['A','b',3]]
list2dict(li, 0) ---> {'A':[('a',1),('b',3)], 'B':('a',2)}
If alwayslist is True, values are always a list even if there is only one item in it:
list2dict(li,0,True)---> {'A':[('a',1),('b',3)], 'B':[('a',2),]}
- biothings.utils.dataload.list_itemcnt(a_list)[source]¶
Return the number of occurrences for each distinct item in the list.
- biothings.utils.dataload.list_split(d, sep)[source]¶
Split fields by sep into comma separated lists, strip.
- biothings.utils.dataload.listitems(a_list, *idx)[source]¶
Return multiple items from list by given indexes.
- biothings.utils.dataload.listsort(a_list, by, reverse=False, cmp=None, key=None)[source]¶
The given list is a list of sublists (or tuples). Return a new list sorted by the i-th item of each sublist (the index i is given by "by").
- biothings.utils.dataload.merge_dict(dict_li, attr_li, missingvalue=None)[source]¶
Merging multiple dictionaries into a new one. Example:
In [136]: d1 = {'id1': 100, 'id2': 200}
In [137]: d2 = {'id1': 'aaa', 'id2': 'bbb', 'id3': 'ccc'}
In [138]: merge_dict([d1,d2], ['number', 'string'])
Out[138]:
{'id1': {'number': 100, 'string': 'aaa'},
 'id2': {'number': 200, 'string': 'bbb'},
 'id3': {'string': 'ccc'}}
In [139]: merge_dict([d1,d2], ['number', 'string'], missingvalue='NA')
Out[139]:
{'id1': {'number': 100, 'string': 'aaa'},
 'id2': {'number': 200, 'string': 'bbb'},
 'id3': {'number': 'NA', 'string': 'ccc'}}
- biothings.utils.dataload.merge_duplicate_rows(rows, db)[source]¶
- Parameters
rows – rows to be grouped by
db – database name (string)
- biothings.utils.dataload.merge_root_keys(doc1, doc2, exclude=None)[source]¶
Ex: d1 = {"_id": 1, "a": "a", "b": {"k": "b"}}
    d2 = {"_id": 1, "a": "A", "b": {"k": "B"}, "c": 123}
Both documents have the same _id and two common root keys, "a" and "b". Using this merge, the resulting document will be:
{'_id': 1, 'a': ['A', 'a'], 'b': [{'k': 'B'}, {'k': 'b'}], 'c': 123}
- biothings.utils.dataload.normalized_value(value, sort=True)[source]¶
Return a "normalized" value:
1. if a list, remove duplicates and sort it
2. if a list with one item, convert to that single item only
3. if a list, remove empty values
4. otherwise, return the value as it is
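A short illustration of these rules:
from biothings.utils.dataload import normalized_value
normalized_value(["b", "a", "b"])  # -> ['a', 'b']  (dedup + sort)
normalized_value(["x"])            # -> 'x'         (single-item list collapsed)
normalized_value(42)               # -> 42          (non-list returned as-is)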
- biothings.utils.dataload.rec_handler(infile, block_end='\n', skip=0, include_block_end=False, as_list=False)[source]¶
A generator that returns one record (block of text) at a time from the infile. Records are separated by one or more empty lines by default. skip can be used to skip the top n lines. If include_block_end is True, the line matching block_end will also be returned. If as_list is True, return a list of lines for each record.
- biothings.utils.dataload.safe_type(f, val)[source]¶
Convert an input string to int/float/… using the passed function. If the conversion fails, None is returned. If the value is of a type other than a string, the original value is returned.
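For example:
from biothings.utils.dataload import safe_type
safe_type(int, "42")     # -> 42
safe_type(float, "n/a")  # -> None  (conversion failed)
safe_type(int, 3.14)     # -> 3.14  (non-string value returned unchanged)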
- biothings.utils.dataload.tabfile_feeder(datafile, header=1, sep='\t', includefn=None, coerce_unicode=True, assert_column_no=None)[source]¶
A generator yielding each row in the file.
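A typical loop (the file name and column layout are hypothetical):
from biothings.utils.dataload import tabfile_feeder
for row in tabfile_feeder("variants.tsv", header=1, sep="\t"):
    chrom, pos, ref = row[0], row[1], row[2]  # each row is a list of column values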
- biothings.utils.dataload.to_boolean(val, true_str=None, false_str=None)[source]¶
Normalize a str value to a boolean value.
- biothings.utils.dataload.traverse_keys(d, include_keys=None, exclude_keys=None)[source]¶
Return all key, value pairs for a document.
By default, traverse all keys. If include_keys is specified, only traverse the keys listed in include_keys (dotfield notation, e.g. a.b, a.b.c). If exclude_keys is specified, traverse all keys except those listed in exclude_keys. A usage sketch follows the parameter list.
If a key in include_keys/exclude_keys is not found in d, it is quietly skipped.
- Parameters
d – a dictionary to traverse keys on
include_keys – only traverse these keys (optional)
exclude_keys – exclude all other keys except these keys (optional)
- Returns
generate key, value pairs
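The promised sketch (the document is hypothetical):
from biothings.utils.dataload import traverse_keys
d = {"a": {"b": 1, "c": 2}, "d": 3}
list(traverse_keys(d, include_keys=["a.b", "d"]))
# expected: [('a.b', 1), ('d', 3)]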
- biothings.utils.dataload.unlist_incexcl(d, include_keys=None, exclude_keys=None)[source]¶
Unlist elements in a document.
If there is 1 value in the list, set the element to that value. Otherwise, leave the list unchanged.
By default, traverse all keys. If include_keys is specified, only unlist the keys listed in include_keys (dotfield notation, e.g. a.b, a.b.c). If exclude_keys is specified, unlist all keys except those listed in exclude_keys.
- Parameters
d – a dictionary to unlist
include_keys – only unlist these keys (optional)
exclude_keys – exclude all other keys except these keys (optional)
- Returns
generate key, value pairs
- biothings.utils.dataload.update_dict_recur(d, u)[source]¶
Update dict d with dict u’s values, recursively (so existing values in d but not in u are kept even if nested)
- biothings.utils.dataload.updated_dict(_dict, attrs)[source]¶
Same as dict.update, but return the updated dictionary.
- biothings.utils.dataload.value_convert(_dict, fn, traverse_list=True)[source]¶
For each value in _dict, apply fn and then update _dict with the returned value. If traverse_list is True and a value is a list, apply fn to each item of the list.
- biothings.utils.dataload.value_convert_incexcl(d, fn, include_keys=None, exclude_keys=None)[source]¶
Convert elements in a document using a function fn.
By default, traverse all keys. If include_keys is specified, only convert the keys listed in include_keys (dotfield notation, e.g. a.b, a.b.c). If exclude_keys is specified, convert all keys except those listed in exclude_keys.
- Parameters
d – a dictionary to traverse keys on
fn – function to convert elements with
include_keys – only convert these keys (optional)
exclude_keys – exclude all other keys except these keys (optional)
- Returns
generate key, value pairs
biothings.utils.diff¶
biothings.utils.doc_traversal¶
Some utility functions that do document traversal
- biothings.utils.doc_traversal.breadth_first_recursive_traversal(doc, path=None)[source]¶
Doesn't exactly implement breadth-first ordering, it seems; not sure why…
biothings.utils.docs¶
- biothings.utils.docs.flatten_doc(doc, outfield_sep='.', sort=True)[source]¶
This function will flatten an Elasticsearch document (really, any JSON object). outfield_sep is the separator between the fields in the returned object. sort specifies whether the output object should be sorted alphabetically before returning (otherwise output will remain in traversal order).
biothings.utils.dotfield¶
- biothings.utils.dotfield.compose_dot_fields_by_fields(genedoc, fields)[source]¶
Reverse function of parse_dot_fields
biothings.utils.dotstring¶
- biothings.utils.dotstring.key_value(dictionary, key)[source]¶
- Return a generator for all values in a dictionary specified by a dotstring (key).
If the key is not found in the dictionary, None is returned (see the sketch below).
- Parameters
dictionary – a dictionary to return values from
key – key that specifies a value in the dictionary
- Returns
generator for values that match the given key
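The sketch mentioned above (the document is hypothetical):
from biothings.utils.dotstring import key_value
d = {"a": {"b": 1}}
list(key_value(d, "a.b"))  # expected: [1]
list(key_value(d, "a.x"))  # expected: [None] -- missing key yields None, per above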
- biothings.utils.dotstring.last_element(d, key_list)[source]¶
Return the last element and key for a document d given a dotstring.
A document d is passed with a list of keys key_list. A generator is then returned for all elements that match all keys. Note that there may be a 1-to-many relationship between keys and elements, due to lists in the document.
- Parameters
d – document d to return elements from
key_list – list of keys that specify elements in the document d
- Returns
generator for elements that match all keys
- biothings.utils.dotstring.list_length(d, field)[source]¶
Return the length of a list specified by field.
If field represents a list in the document, then return its length. Otherwise return 0.
- Parameters
d – a dictionary
field – the dotstring field specifying a list
- biothings.utils.dotstring.remove_key(dictionary, key)[source]¶
Remove the field specified by the dotstring key
- Parameters
dictionary – a dictionary to remove the value from
key – key that specifies an element in the dictionary
- Returns
dictionary after changes have been made
- biothings.utils.dotstring.set_key_value(dictionary, key, value)[source]¶
- Set all values in the dictionary matching a dotstring key to the specified value.
If the key is not found in the dictionary, it is quietly skipped.
- Parameters
dictionary – a dictionary to set values in
key – key that specifies an element in the dictionary
- Returns
dictionary after changes have been made
biothings.utils.es¶
- class biothings.utils.es.Database[source]¶
Bases:
IDatabase
- CONFIG = None¶
- property address¶
Returns sufficient information so a connection to a database can be created. Information can be a dictionary, object, etc… and depends on the actual backend
- class biothings.utils.es.ESIndex(client, index_name)[source]¶
Bases:
object
An Elasticsearch index wrapping a client. Counterpart of pymongo.collection.Collection
- property doc_type¶
- class biothings.utils.es.ESIndexer(index, doc_type='_doc', es_host='localhost:9200', step=500, step_size=10, number_of_shards=1, number_of_replicas=0, check_index=True, **kwargs)[source]¶
Bases:
object
- check_index()[source]¶
Check if the index is an alias, and update self._index to point to the actual index
TODO: the overall design of ESIndexer is not great. If we are exposing ES implementation details (such as the ability to create and delete indices, create and update aliases, etc.) to the user of this class, then this method doesn't seem that out of place.
- clean_field(field, dryrun=True, step=5000)[source]¶
Remove a top-level field from the ES index; if the field is the only field of a doc, remove the doc as well. step is the size of the bulk update on ES. Try first with dryrun turned on, then perform the actual updates with dryrun off.
- find_biggest_doc(fields_li, min=5, return_doc=False)[source]¶
return the doc with the max number of fields from fields_li.
- get_alias(index: Optional[str] = None, alias_name: Optional[str] = None) List[str] [source]¶
Get indices with alias associated with given index name or alias name
- Parameters
index – name of index
alias_name – name of alias
- Returns
Mapping of index names with their aliases
- get_docs(**kwargs)[source]¶
Return matching docs for the given iterable of ids; if none are found, return None. A generator over the matched docs is returned. If only_source is False, the entire document is returned; otherwise only the source is returned.
- get_indice_names_by_settings(index: Optional[str] = None, sort_by_creation_date=False, reverse=False) List[str] [source]¶
Get a list of index names associated with the given index name, using the indices' settings
- Parameters
index – name of index
sort_by_creation_date – sort the result by the index's creation_date
reverse – control the direction of the sorting
- Returns
list of index names (str)
- get_settings(index: Optional[str] = None) Mapping[str, Mapping] [source]¶
Get indices with settings associated with given index name
- Parameters
index – name of index
- Returns
Mapping of index names with their settings
- index(doc, id=None, action='index')[source]¶
Add a doc to the index. If id is not None, the existing doc will be updated.
- update(id, extra_doc, upsert=True)[source]¶
Update an existing doc with extra_doc. Setting upsert=True allows inserting new docs.
- update_alias(alias_name: str, index: Optional[str] = None)[source]¶
Create or update an ES alias pointing to an index
Creates or updates an alias in Elasticsearch, associated with the given index name or the underlying index of the ESIndexer instance.
When the alias name does not exist, it will be created. If an existing alias already exists, it will be updated to only associate with the index.
When an index already exists under the alias name, an exception will be raised, UNLESS the alias name is the same as the index name that the ESIndexer is initialized with. In that case, the existing index with the name collision will be deleted, and the alias will be created in its place. This feature is intended for seamless migration from an index to an alias associated with an index, for zero-downtime installs.
- Parameters
alias_name – name of the alias
index – name of the index to associate with alias. If None, the index of the ESIndexer instance is used.
- Raises
- biothings.utils.es.generate_es_mapping(inspect_doc, init=True, level=0)[source]¶
Generate an ES mapping according to “inspect_doc”, which is produced by biothings.utils.inspect module
biothings.utils.exclude_ids¶
- class biothings.utils.exclude_ids.ExcludeFieldsById(exclusion_ids, field_lst, min_list_size=1000)[source]¶
Bases:
object
This class provides a framework to exclude fields for certain identifiers. Up to three arguments are passed to this class, an identifier list, a list of fields to remove, and minimum list size. The identifier list is a list of document identifiers to act on. The list of fields are fields that will be removed; they are specified using a dotstring notation. The minimum list size is the minimum number of elements that should be in a list in order for it to be removed. The ‘drugbank’, ‘chebi’, and ‘ndc’ data sources were manually tested with this class.
Fields to truncate are specified by field_lst. The dot-notation is accepted.
biothings.utils.hub¶
biothings.utils.hub_db¶
The hub_db module is a place-holder for internal hub database functions. Hub DB contains information about sources, configuration variables, etc… It's for internal usage. When biothings.config_for_app() is called, this module will be "filled" with the actual implementations from the specified backend (specified in config.py, or defaulting to MongoDB).
Hub DB can be implemented over different backends; it was originally done using MongoDB, so the dialect is very much inspired by pymongo. Any hub db backend implementation must implement the functions and classes below. See biothings.utils.mongo and biothings.utils.sqlite3 for some examples.
- class biothings.utils.hub_db.ChangeWatcher[source]¶
Bases:
object
- col_entity = {'cmd': 'command', 'hub_config': 'config', 'src_build': 'build', 'src_build_config': 'build_config', 'src_dump': 'source', 'src_master': 'master'}¶
- do_publish = False¶
- event_queue = <Queue at 0x7f842af26740 maxsize=0>¶
- listeners = {}¶
- class biothings.utils.hub_db.Collection(colname, db)[source]¶
Bases:
object
Defines a minimal subset of MongoDB collection behavior. Note: Collection instances must be pickleable (if not, __getstate__ can be implemented to deal with those attributes for instance)
Init args can differ depending on the backend requirements. colname is the only one required.
- find(*args, **kwargs)[source]¶
Return an iterable of documents matching the criteria defined in *args[0] (which will be a dict). The query dialect is a minimal one, inspired by MongoDB: the dict can contain the name of a key and the value being searched for. Ex: {"field1": "value1"} will return all documents where field1 == "value1". Nested keys (field1.subfield1) aren't supported (no need to implement). Only exact matches are required.
If no query is passed, or if query is an empty dict, return all documents.
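For illustration, a hedged sketch of querying the src_dump collection ("clinvar" is a hypothetical source name):
from biothings.utils import hub_db
# assumes biothings.config_for_app() has been called, so hub_db is wired to a backend
src_dump = hub_db.get_src_dump()
doc = src_dump.find_one({"_id": "clinvar"})  # exact-match, MongoDB-like query
all_docs = src_dump.find()                   # no/empty query returns all documents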
- find_one(*args, **kwargs)[source]¶
Return one document from the collection. *args will contain a dict with the query parameters. See also find()
- property name¶
Return the collection/table name
- replace_one(query, doc)[source]¶
Replace a document matching 'query' (or the first one found) with the passed doc
- save(doc)[source]¶
Shortcut to update_one() or insert_one(). Save the document, either inserting it if it doesn't exist, or updating the existing one
- update_one(query, what, upsert=False)[source]¶
Update one document (or the first one matching the query). See find() for the query parameter. "what" tells how to update the document; the $set/$unset/$push operators must be implemented (refer to the MongoDB documentation for more). Nested key operations aren't necessary.
- class biothings.utils.hub_db.IDatabase[source]¶
Bases:
object
This class declares an interface and partially implements some of it, mimicking the mongokit.Connection class. It's used to keep the same document model. Any internal backend should implement (derive from) this interface.
- property address¶
Returns sufficient information so a connection to a database can be created. Information can be a dictionary, object, etc… and depends on the actual backend
- biothings.utils.hub_db.backup(folder='.', archive=None)[source]¶
Dump the whole hub_db database into the given folder. "archive" can be passed to specify the target filename; otherwise, it's randomly generated
Note
this doesn’t backup source/merge data, just the internal data used by the hub
- biothings.utils.hub_db.get_cmd()[source]¶
Return a Collection instance for commands collection/table
- biothings.utils.hub_db.get_data_plugin()[source]¶
Return a Collection instance for data_plugin collection/table
- biothings.utils.hub_db.get_event()[source]¶
Return a Collection instance for events collection/table
- biothings.utils.hub_db.get_hub_config()[source]¶
Return a Collection instance storing configuration values
- biothings.utils.hub_db.get_last_command()[source]¶
Return the latest cmd document (according to _id)
- biothings.utils.hub_db.get_source_fullname(col_name)[source]¶
Assuming col_name is a collection created from an upload process, find the main source & sub_source associated.
- biothings.utils.hub_db.get_src_build()[source]¶
Return a Collection instance for src_build collection/table
- biothings.utils.hub_db.get_src_build_config()[source]¶
Return a Collection instance for src_build_config collection/table
- biothings.utils.hub_db.get_src_dump()[source]¶
Return a Collection instance for src_dump collection/table
- biothings.utils.hub_db.get_src_master()[source]¶
Return a Collection instance for src_master collection/table
biothings.utils.info¶
biothings.utils.inspect¶
This module contains util functions that may be shared by both BioThings data-hub and web components. In general, do not include utils depending on any third-party modules. Note: unittests are available in biothings.tests.hub
- class biothings.utils.inspect.BaseMode[source]¶
Bases:
object
- key = None¶
- report(struct, drep, orig_struct=None)[source]¶
Given a data structure "struct" being inspected, report (fill) the "drep" dictionary with useful values for this mode, under the drep[self.key] key. Sometimes "struct" has already been converted to its analytical value at this point (inspect may count the number of dicts and would then pass struct as "1" instead of the whole dict, whose number of keys could then be reported); in that case, "orig_struct" contains the original structure that was to be reported, whatever the pre-conversion step did.
- template = {}¶
- class biothings.utils.inspect.DeepStatsMode[source]¶
Bases:
StatsMode
- key = '_stats'¶
- merge(target_stats, tomerge_stats)[source]¶
Merge two different maps together (from tomerge into target)
- report(val, drep, orig_struct=None)[source]¶
Given a data structure "struct" being inspected, report (fill) the "drep" dictionary with useful values for this mode, under the drep[self.key] key. Sometimes "struct" has already been converted to its analytical value at this point (inspect may count the number of dicts and would then pass struct as "1" instead of the whole dict, whose number of keys could then be reported); in that case, "orig_struct" contains the original structure that was to be reported, whatever the pre-conversion step did.
- template = {'_stats': {'__vals': [], '_count': 0, '_max': -inf, '_min': inf}}¶
- class biothings.utils.inspect.IdentifiersMode[source]¶
Bases:
RegexMode
- ids = None¶
- key = '_ident'¶
- matchers = None¶
- class biothings.utils.inspect.RegexMode[source]¶
Bases:
BaseMode
- matchers = []¶
- report(val, drep, orig_struct=None)[source]¶
Given a data structure "struct" being inspected, report (fill) the "drep" dictionary with useful values for this mode, under the drep[self.key] key. Sometimes "struct" has already been converted to its analytical value at this point (inspect may count the number of dicts and would then pass struct as "1" instead of the whole dict, whose number of keys could then be reported); in that case, "orig_struct" contains the original structure that was to be reported, whatever the pre-conversion step did.
- class biothings.utils.inspect.StatsMode[source]¶
Bases:
BaseMode
- key = '_stats'¶
- merge(target_stats, tomerge_stats)[source]¶
Merge two different maps together (from tomerge into target)
- report(struct, drep, orig_struct=None)[source]¶
Given a data structure "struct" being inspected, report (fill) the "drep" dictionary with useful values for this mode, under the drep[self.key] key. Sometimes "struct" has already been converted to its analytical value at this point (inspect may count the number of dicts and would then pass struct as "1" instead of the whole dict, whose number of keys could then be reported); in that case, "orig_struct" contains the original structure that was to be reported, whatever the pre-conversion step did.
- template = {'_stats': {'_count': 0, '_max': -inf, '_min': inf, '_none': 0}}¶
- biothings.utils.inspect.get_converters(modes, logger=<module 'logging' from '/home/docs/.asdf/installs/python/3.10.4/lib/python3.10/logging/__init__.py'>)[source]¶
- biothings.utils.inspect.inspect(struct, key=None, mapt=None, mode='type', level=0, logger=<module 'logging' from '/home/docs/.asdf/installs/python/3.10.4/lib/python3.10/logging/__init__.py'>)[source]¶
Explore struct and report types contained in it.
- Parameters
struct – is the data structure to explore
mapt – if not None, will complete that type map with the passed struct. This is useful when iterating over a dataset of similar data, trying to find a good type summary for the whole dataset.
level – is for internal purposes, mostly debugging
mode – see inspect_docs() documentation
- biothings.utils.inspect.inspect_docs(docs, mode='type', clean=True, merge=False, logger=<module 'logging' from '/home/docs/.asdf/installs/python/3.10.4/lib/python3.10/logging/__init__.py'>, pre_mapping=False, limit=None, sample=None, metadata=True, auto_convert=True)[source]¶
Inspect docs and return a summary of their structure (a usage sketch follows the parameter list):
- Parameters
mode –
possible values are:
"type": (default) explore documents and report strict data structure
"mapping": same as "type" but also performs tests on data to guess the best mapping (eg. check if a string is splittable, etc…). Implies merge=True
"stats": explore documents and compute basic stats (count, min, max, sum)
"deepstats": same as "stats" but records values and also computes mean, stdev, median (memory intensive…)
"jsonschema": same as "type" but returns a json-schema formatted result
mode can also be a list of modes, eg. [“type”,”mapping”]. There’s little overhead computing multiple types as most time is spent on actually getting the data.
clean – don't delete recorded values or temporary results
merge – merge scalars into lists when both exist (eg. {"val": …} and [{"val": …}])
limit – can limit the inspection to the x first docs (None = no limit, inspects all)
sample – in combination with limit, randomly extract a sample of ‘limit’ docs (so not necessarily the x first ones defined by limit). If random.random() is greater than sample, doc is inspected, otherwise it’s skipped
metadata – compute metadata on the result
auto_convert – run converters automatically (converters are used to convert one mode’s output to another mode’s output, eg. type to jsonschema)
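The promised sketch (documents are made up; the exact structure of the returned summary depends on the selected modes):
from biothings.utils.inspect import inspect_docs
docs = [{"af": 0.12, "ids": ["a"]}, {"af": 0.5, "ids": "b"}]
summary = inspect_docs(docs, mode=["type", "stats"], merge=True)
print(summary)  # per-field type/stats summary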
biothings.utils.jsondiff¶
The MIT License (MIT)
Copyright (c) 2014 Ilya Volkov
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
biothings.utils.jsonpatch¶
Apply JSON-Patches (RFC 6902)
- class biothings.utils.jsonpatch.AddOperation(operation)[source]¶
Bases:
PatchOperation
Adds an object property or an array element.
- class biothings.utils.jsonpatch.CopyOperation(operation)[source]¶
Bases:
PatchOperation
Copies an object property or an array element to a new location
- exception biothings.utils.jsonpatch.InvalidJsonPatch[source]¶
Bases:
JsonPatchException
Raised if an invalid JSON Patch is created
- class biothings.utils.jsonpatch.JsonPatch(patch)[source]¶
Bases:
object
A JSON Patch is a list of Patch Operations.
>>> patch = JsonPatch([
...     {'op': 'add', 'path': '/foo', 'value': 'bar'},
...     {'op': 'add', 'path': '/baz', 'value': [1, 2, 3]},
...     {'op': 'remove', 'path': '/baz/1'},
...     {'op': 'test', 'path': '/baz', 'value': [1, 3]},
...     {'op': 'replace', 'path': '/baz/0', 'value': 42},
...     {'op': 'remove', 'path': '/baz/1'},
... ])
>>> doc = {}
>>> result = patch.apply(doc)
>>> expected = {'foo': 'bar', 'baz': [42]}
>>> result == expected
True
A JsonPatch object is iterable, so you can easily access each patch statement in a loop:
>>> lpatch = list(patch)
>>> expected = {'op': 'add', 'path': '/foo', 'value': 'bar'}
>>> lpatch[0] == expected
True
>>> lpatch == patch.patch
True
Also, a JsonPatch can be converted directly to bool if it contains any operation statements:
>>> bool(patch)
True
>>> bool(JsonPatch([]))
False
This behavior is very handy with make_patch() to write more readable code:
>>> old = {'foo': 'bar', 'numbers': [1, 3, 4, 8]}
>>> new = {'baz': 'qux', 'numbers': [1, 4, 7]}
>>> patch = make_patch(old, new)
>>> if patch:
...     # document has changed, do something useful
...     patch.apply(old)
{...}
- apply(orig_obj, in_place=False, ignore_conflicts=False, verify=False)[source]¶
Applies the patch to given object.
- Parameters
obj (dict) – Document object.
in_place (bool) – Tweaks the way the patch is applied: directly to the specified obj, or to a copy of it.
- Returns
Modified obj.
- classmethod from_diff(src, dst)[source]¶
Creates JsonPatch instance based on comparing of two document objects. Json patch would be created for src argument against dst one.
- Parameters
src (dict) – Data source document object.
dst (dict) – Data source document object.
- Returns
JsonPatch
instance.
>>> src = {'foo': 'bar', 'numbers': [1, 3, 4, 8]}
>>> dst = {'baz': 'qux', 'numbers': [1, 4, 7]}
>>> patch = JsonPatch.from_diff(src, dst)
>>> new = patch.apply(src)
>>> new == dst
True
- exception biothings.utils.jsonpatch.JsonPatchConflict[source]¶
Bases:
JsonPatchException
Raised if a patch could not be applied due to a conflict situation, such as:
- attempt to add an object key that already exists;
- attempt to operate on a nonexistent object key;
- attempt to insert a value into an array at a position beyond its size;
- etc.
- exception biothings.utils.jsonpatch.JsonPatchException[source]¶
Bases:
Exception
Base Json Patch exception
- exception biothings.utils.jsonpatch.JsonPatchTestFailed[source]¶
Bases:
JsonPatchException
,AssertionError
A Test operation failed
- class biothings.utils.jsonpatch.MoveOperation(operation)[source]¶
Bases:
PatchOperation
Moves an object property or an array element to new location.
- class biothings.utils.jsonpatch.PatchOperation(operation)[source]¶
Bases:
object
A single operation inside a JSON Patch.
- class biothings.utils.jsonpatch.RemoveOperation(operation)[source]¶
Bases:
PatchOperation
Removes an object property or an array element.
- class biothings.utils.jsonpatch.ReplaceOperation(operation)[source]¶
Bases:
PatchOperation
Replaces an object property or an array element by new value.
- class biothings.utils.jsonpatch.TestOperation(operation)[source]¶
Bases:
PatchOperation
Test value by specified location.
- biothings.utils.jsonpatch.apply_patch(doc, patch, in_place=False, ignore_conflicts=False, verify=False)[source]¶
Apply list of patches to specified json document.
- Parameters
doc (dict) – Document object.
patch (list or str) – JSON patch as list of dicts or raw JSON-encoded string.
in_place (bool) – While
True
patch will modify target document. By default patch will be applied to document copy.ignore_conflicts (bool) – Ignore JsonConflicts errors
verify (bool) – works with ignore_conflicts=True; if there are errors and verify is True (recommended), make sure the resulting object is the same as the original one. ignore_conflicts and verify are used to run patches multiple times and get rid of errors when operations can't be performed multiple times because the object has already been patched. This will force in_place to False so that the comparison can occur.
- Returns
Patched document object.
- Return type
dict
>>> doc = {'foo': 'bar'}
>>> patch = [{'op': 'add', 'path': '/baz', 'value': 'qux'}]
>>> other = apply_patch(doc, patch)
>>> doc is not other
True
>>> other == {'foo': 'bar', 'baz': 'qux'}
True
>>> patch = [{'op': 'add', 'path': '/baz', 'value': 'qux'}]
>>> apply_patch(doc, patch, in_place=True) == {'foo': 'bar', 'baz': 'qux'}
True
>>> doc == other
True
- biothings.utils.jsonpatch.get_loadjson()[source]¶
adds the object_pairs_hook parameter to json.load when possible
The "object_pairs_hook" parameter is used to handle duplicate keys when loading a JSON object. This parameter does not exist in Python 2.6. This method returns an unmodified json.load for Python 2.6, and a partial function with object_pairs_hook set to multidict for Python versions that support the parameter.
- biothings.utils.jsonpatch.make_patch(src, dst)[source]¶
Generates a patch by comparing two document objects. Actually a proxy to the JsonPatch.from_diff() method.
- Parameters
src (dict) – Data source document object.
dst (dict) – Data source document object.
>>> src = {'foo': 'bar', 'numbers': [1, 3, 4, 8]}
>>> dst = {'baz': 'qux', 'numbers': [1, 4, 7]}
>>> patch = make_patch(src, dst)
>>> new = patch.apply(src)
>>> new == dst
True
biothings.utils.jsonschema¶
biothings.utils.loggers¶
- class biothings.utils.loggers.Colors(value)[source]¶
Bases:
Enum
An enumeration.
- CRITICAL = '#7b0099'¶
- DEBUG = '#a1a1a1'¶
- ERROR = 'danger'¶
- INFO = 'good'¶
- NOTSET = '#d6d2d2'¶
- WARNING = 'warning'¶
- class biothings.utils.loggers.EventRecorder(*args, **kwargs)[source]¶
Bases:
StreamHandler
Initialize the handler.
If stream is not specified, sys.stderr is used.
- emit(record)[source]¶
Emit a record.
If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an ‘encoding’ attribute, it is used to determine how to do the output to the stream.
- class biothings.utils.loggers.Range(start: Union[int, float] = 0, end: Union[int, float] = inf)[source]¶
Bases:
object
- end: Union[int, float] = inf¶
- start: Union[int, float] = 0¶
- class biothings.utils.loggers.Record(range, value)[source]¶
Bases:
NamedTuple
Create new instance of Record(range, value)
- value: Enum¶
Alias for field number 1
- class biothings.utils.loggers.ShellLogger(*args, **kwargs)[source]¶
Bases:
Logger
Custom “levels” for input going to the shell and output coming from it (just for naming)
Initialize the logger with a name and an optional level.
- INPUT = 1001¶
- OUTPUT = 1000¶
- class biothings.utils.loggers.SlackHandler(webhook, mentions)[source]¶
Bases:
StreamHandler
Initialize the handler.
If stream is not specified, sys.stderr is used.
- emit(record)[source]¶
Emit a record.
If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an ‘encoding’ attribute, it is used to determine how to do the output to the stream.
- class biothings.utils.loggers.Squares(value)[source]¶
Bases:
Enum
An enumeration.
- CRITICAL = ':large_purple_square:'¶
- DEBUG = ':white_large_square:'¶
- ERROR = ':large_red_square:'¶
- INFO = ':large_blue_square:'¶
- NOTSET = ''¶
- WARNING = ':large_orange_square:'¶
- class biothings.utils.loggers.WSLogHandler(listener)[source]¶
Bases:
StreamHandler
when listener is a bt.hub.api.handlers.ws.LogListener instance, log statements are propagated through existing websocket
Initialize the handler.
If stream is not specified, sys.stderr is used.
- emit(record)[source]¶
Emit a record.
If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an ‘encoding’ attribute, it is used to determine how to do the output to the stream.
- class biothings.utils.loggers.WSShellHandler(listener)[source]¶
Bases:
WSLogHandler
when listener is a bt.hub.api.handlers.ws.LogListener instance, log statements are propagated through existing websocket
Initialize the handler.
If stream is not specified, sys.stderr is used.
- biothings.utils.loggers.configurate_file_handler(logger, logfile, formater=None, force=False)[source]¶
biothings.utils.manager¶
biothings.utils.mongo¶
biothings.utils.parallel¶
biothings.utils.parallel_mp¶
biothings.utils.parsers¶
- biothings.utils.parsers.json_array_parser(patterns: Optional[Iterable[str]] = None) Callable[[str], Generator[dict, None, None]] [source]¶
Create JSON Array Parser given filename patterns
For use with manifest.json based plugins. The data comes as a JSON array containing multiple documents.
- Parameters
patterns – glob-compatible patterns for filenames, like *.json, data*.json
- Returns
parser_func
- biothings.utils.parsers.ndjson_parser(patterns: Optional[Iterable[str]] = None) Callable[[str], Generator[dict, None, None]] [source]¶
Create NDJSON Parser given filename patterns
For use with manifest.json based plugins. Caveat: Only handles valid NDJSON (no extra newlines, UTF8, etc.)
- Parameters
patterns – glob-compatible patterns for filenames, like *.ndjson, data*.ndjson
- Returns
- Generator that takes in a data_folder and returns documents from NDJSON files that match the filename patterns
- Return type
parser_func
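As an illustration, a hedged sketch of wiring such a parser (the patterns and folder are hypothetical placeholders):
from biothings.utils.parsers import ndjson_parser
parse = ndjson_parser(patterns=["*.ndjson"])
for doc in parse("/path/to/data_folder"):
    print(doc)  # each doc is one parsed NDJSON record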
biothings.utils.redis¶
- class biothings.utils.redis.RedisClient(connection_params)[source]¶
Bases:
object
- client = None¶
- get_db(db_name=None)[source]¶
Return a redis client instance from a database name or database number (if db_name is an integer)
- initialize(deep=False)[source]¶
Careful: this may delete data. Prepare the Redis instance to work with the biothings hub:
- database 0: this db is used to store a mapping between database indices and database names (so a database can be accessed by name). This method will flush this db and prepare it.
- any other database will be flushed if deep is True, making the Redis server fully dedicated to the hub.
- property mapdb¶
biothings.utils.serializer¶
biothings.utils.shelve¶
biothings.utils.sqlite3¶
- class biothings.utils.sqlite3.Collection(colname, db)[source]¶
Bases:
object
- property database¶
- property name¶
- class biothings.utils.sqlite3.Database[source]¶
Bases:
IDatabase
- property address¶
Returns sufficient information so a connection to a database can be created. Information can be a dictionary, object, etc… and depends on the actual backend
biothings.utils.version¶
Functions to return versions of things.
- biothings.utils.version.check_new_version(folder, max_commits=10)[source]¶
Given a folder pointing to a Git repo, return a dict containing info about remote commits not yet applied to the repo, or an empty dict if there's nothing new.
- biothings.utils.version.get_python_version()[source]¶
Get a list of python packages installed and their versions.
- biothings.utils.version.get_repository_information(app_dir=None)[source]¶
Get the repository information for the local repository, if it exists.
- biothings.utils.version.get_source_code_info(src_file)[source]¶
Given a path to source code, try to find information about the repository, revision, URL pointing to that file, etc… Return None if nothing can be determined. Tricky cases:
src_file could refer to another repo within the current repo (namely a remote data plugin, cloned within the api's plugins folder)
src_file could point to a folder, for instance when a data plugin is analyzed. This is because we can't point to an uploader file, since it's dynamically generated