biothings.web.query

biothings.web.query.builder

Biothings Query Builder

Translate the BioThings query language into the query language of the underlying database. The interface consists of a query term (q) and query options.

Depending on the underlying database, the data types of the query term and query options vary. At a minimum, a query builder should support:

q: str, a query term.

When not provided, always perform a match-all query. When provided as an empty string, always match nothing.

options: dotdict, optional query options.

scopes: list[str], the fields to look for the query term.

What an empty scopes list, or a scopes value that is None or not provided, means is controlled by the specific class implementation, or may be left undefined.

_source: list[str], fields to return in the result.

size: int, maximum number of hits to return.

from_: int, starting index of the results to return.

sort: str, customized sort keys for the result list.

aggs: str, customized aggregation string.

post_filter: str, when provided, the search hits are filtered after the aggregations are calculated.

facet_size: int, maximum number of aggregation results.
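To make the interface above concrete, here is a minimal sketch that emits raw Elasticsearch request bodies as Python dicts. It is not ESQueryBuilder (which builds elasticsearch-dsl objects); the function name build_es_body is hypothetical, only a subset of the options is handled, and treating an empty or missing scopes value as a query string query is an assumption that mirrors the choice documented for ESQueryBuilder.build.

```python
def build_es_body(q=None, **options):
    """Hypothetical sketch of the builder interface: return a plain
    Elasticsearch request-body dict for a query term and options."""
    if q is None:
        query = {"match_all": {}}   # q not provided: match everything
    elif q == "":
        query = {"match_none": {}}  # empty string: match nothing
    elif options.get("scopes"):
        # Non-empty scopes: look for the term in those fields.
        query = {"multi_match": {"query": q, "fields": list(options["scopes"])}}
    else:
        # scopes is None or []: fall back to a query string query.
        query = {"query_string": {"query": q}}

    body = {"query": query}
    # Copy generic options onto their Elasticsearch request-body keys.
    # "from" is a Python keyword, which is why the option is spelled from_.
    for option, es_key in (("_source", "_source"), ("size", "size"), ("from_", "from")):
        if options.get(option) is not None:
            body[es_key] = options[option]
    return body
```

For example, build_es_body("cdk2", scopes=["symbol"], size=5, from_=10) yields a multi_match query over the symbol field with size 5 and from 10.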

class biothings.web.query.builder.ESQueryBuilder(user_query=None, scopes_regexs=(), scopes_default=('_id',), allow_random_query=True, allow_nested_query=False, metadata=None)[source]

Bases: object

Build an Elasticsearch query with elasticsearch-dsl.

apply_extras(search, options)[source]

Process non-query options and customize their behaviors. Customized aggregation syntax string is translated here.

build(q=None, **options)[source]

Build a query according to q and options. This is the public method called by API handlers.

Regarding scopes:

scopes: [str], nonempty: match query.

scopes: NoneType or []: no scope, so a query string query.

Additionally supports these options:

explain: include Elasticsearch scoring information.

userquery: customized function to interpret q.

  • additional keywords are passed through as Elasticsearch keywords, for example: 'explain', 'version' …

  • multi-search is supported when q is a list. All queries are built individually and then sent in one request.
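At the wire level, an Elasticsearch multi-search (_msearch) request is newline-delimited JSON: a header line followed by a body line for each query, with a trailing newline. The sketch below shows how individually built query bodies could be packed into one such request. The helper name build_msearch_payload and the index name "mygene_current" are hypothetical, and how biothings actually serializes its multi-search is not shown here.

```python
import json


def build_msearch_payload(bodies, index):
    """Serialize individually built query bodies into one _msearch payload.

    Each query contributes a header line (here only the target index)
    and a body line; the payload must end with a newline.
    """
    lines = []
    for body in bodies:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


terms = ["cdk2", "1017"]
# Build each query on its own, as the docstring describes for a list q.
bodies = [{"query": {"multi_match": {"query": t, "fields": ["_id", "symbol"]}}} for t in terms]
payload = build_msearch_payload(bodies, index="mygene_current")  # hypothetical index
```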

default_match_query(q, scopes, options)[source]

Override this to customize the default match query. By default, it implements a multi_match query.

default_string_query(q, options)[source]

Override this to customize the default string query. By default, it implements a query string query.
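The override hooks follow the usual template-method pattern: build decides which hook to call, and a subclass replaces only the hook it cares about. The sketch below uses a hypothetical stand-in base class that returns plain dicts (the real ESQueryBuilder works with elasticsearch-dsl objects and has a richer build), so the class names and the simplified build signature are assumptions for illustration only.

```python
class BaseBuilder:
    """Hypothetical stand-in for a builder with overridable query hooks."""

    def build(self, q, scopes=None):
        # Non-empty scopes -> match query; otherwise -> query string query.
        if scopes:
            return self.default_match_query(q, scopes, {})
        return self.default_string_query(q, {})

    def default_match_query(self, q, scopes, options):
        return {"multi_match": {"query": q, "fields": list(scopes)}}

    def default_string_query(self, q, options):
        return {"query_string": {"query": q}}


class StrictBuilder(BaseBuilder):
    # Override only the scoped case: with best_fields and operator "and",
    # all terms must appear in a single field. The string query is untouched.
    def default_match_query(self, q, scopes, options):
        query = super().default_match_query(q, scopes, options)
        query["multi_match"].update({"type": "best_fields", "operator": "and"})
        return query
```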

class biothings.web.query.builder.ESScrollID(seq: object)[source]

Bases: UserString
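Because ESScrollID subclasses UserString, a scroll ID behaves like an ordinary string while its type still marks it as a scroll ID. The sketch below shows how code could branch on such a wrapper; the names ScrollID and classify are hypothetical, and whether biothings uses ESScrollID in exactly this way is an assumption.

```python
from collections import UserString


class ScrollID(UserString):
    """Hypothetical stand-in for ESScrollID: a string tagged as a scroll ID."""


def classify(q):
    # isinstance distinguishes the wrapper even though the wrapped
    # value still compares equal to the plain string.
    if isinstance(q, ScrollID):
        return "continue scroll"
    return "new search"


sid = ScrollID("FGluY2x1ZGVfY29udGV4dF91dWlk")
```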

class biothings.web.query.builder.ESUserQuery(path)[source]

Bases: object

get_filter(named_query)[source]
get_query(named_query, **kwargs)[source]
has_filter(named_query)[source]
has_query(named_query)[source]
property logger
class biothings.web.query.builder.Group(term, scopes)

Bases: tuple

Create new instance of Group(term, scopes)

scopes

Alias for field number 1

term

Alias for field number 0
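The "Alias for field number N" entries are the docstrings that collections.namedtuple generates for its fields, which suggests Group (and Query, below) are namedtuples. A namedtuple with the same field order behaves as follows; this redefinition is for illustration and is not imported from biothings.

```python
from collections import namedtuple

# Same field order as the documented Group(term, scopes).
Group = namedtuple("Group", ["term", "scopes"])

g = Group("cdk2", ["symbol", "alias"])
term, scopes = g  # unpacks like a plain tuple
```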

class biothings.web.query.builder.MongoQueryBuilder(default_scopes=('_id',))[source]

Bases: object

build(q, **options)[source]
class biothings.web.query.builder.QStringParser(default_scopes=('_id',), patterns=(('(?P<scope>\\w+):(?P<term>[^:]+)', ()),), gpnames=('term', 'scope'))[source]

Bases: object

parse(q)[source]
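The QStringParser signature shows a default pattern, (?P<scope>\w+):(?P<term>[^:]+), paired with an empty scopes tuple, plus default_scopes=('_id',) and gpnames=('term', 'scope'). The sketch below is one plausible reading of that configuration, not the library's parse: it assumes an empty scopes tuple means "take the scope from the named group scope", and that unmatched input falls back to the default scopes.

```python
import re
from collections import namedtuple

Query = namedtuple("Query", ["term", "scopes"])


def parse_qstring(q, default_scopes=("_id",),
                  patterns=((r"(?P<scope>\w+):(?P<term>[^:]+)", ()),)):
    """Hypothetical sketch: return a Query for the first matching pattern."""
    for regex, scopes in patterns:
        match = re.fullmatch(regex, q)
        if match:
            term = match.group("term")
            if scopes:
                # Assumption: explicit pattern scopes win.
                return Query(term, list(scopes))
            # Assumption: otherwise the scope comes from the regex itself.
            return Query(term, [match.group("scope")])
    # No pattern matched: search the whole string in the default scopes.
    return Query(q, list(default_scopes))
```

For example, "symbol:CDK2" parses to Query("CDK2", ["symbol"]), while "1017" falls back to Query("1017", ["_id"]).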
class biothings.web.query.builder.Query(term, scopes)

Bases: tuple

Create new instance of Query(term, scopes)

scopes

Alias for field number 1

term

Alias for field number 0

exception biothings.web.query.builder.RawQueryInterrupt(data)[source]

Bases: Exception

class biothings.web.query.builder.SQLQueryBuilder(tables, default_scopes=('id',), default_limit=10)[source]

Bases: object

build(q, **options)[source]

biothings.web.query.engine

Search Execution Engine

Take the output of the query builder and feed it to the corresponding database engine. This stage typically resolves the database destination from a biothing_type and applies presentation and/or networking parameters.

Example:

>>> from biothings.web.query import ESQueryBackend
>>> from elasticsearch import Elasticsearch
>>> from elasticsearch_dsl import Search
>>> backend = ESQueryBackend(Elasticsearch())
>>> backend.execute(Search().query("match", _id="1017"))
>>> _["hits"]["hits"][0]["_source"].keys()
dict_keys(['taxid', 'symbol', 'name', ... ])
class biothings.web.query.engine.AsyncESQueryBackend(client, indices=None, scroll_time='1m', scroll_size=1000, multisearch_concurrency=5, total_hits_as_int=True)[source]

Bases: ESQueryBackend

Execute an Elasticsearch query

async execute(query, **options)[source]

Execute the corresponding query. Must return an awaitable. May be overridden to add more functionality and to handle uncaught exceptions.

Options:

fetch_all: also return a scroll_id for this query (default: false).

biothing_type: which type’s corresponding indices to query (default in config.py).
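The AsyncESQueryBackend constructor accepts multisearch_concurrency=5. One common way to cap the number of in-flight requests in asyncio is a semaphore, sketched below with a fake search coroutine standing in for the Elasticsearch client. The names run_multisearch and fake_search are hypothetical, and this is not necessarily how the backend enforces its limit.

```python
import asyncio

stats = {"in_flight": 0, "peak": 0}


async def run_multisearch(queries, execute_one, concurrency=5):
    """Run many queries concurrently, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited(query):
        async with semaphore:
            return await execute_one(query)

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(limited(q) for q in queries))


async def fake_search(query):
    # Hypothetical stand-in for one search request; tracks concurrency.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate network I/O
    stats["in_flight"] -= 1
    return {"query": query, "hits": []}


results = asyncio.run(run_multisearch(["cdk2", "1017", "tp53"], fake_search, concurrency=2))
```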

class biothings.web.query.engine.ESQueryBackend(client, indices=None)[source]

Bases: object

adjust_index(original_index, query, **options)[source]

Override this to select a specific Elasticsearch index.

execute(query, **options)[source]
exception biothings.web.query.engine.EndScrollInterrupt[source]

Bases: ResultInterrupt

class biothings.web.query.engine.MongoQueryBackend(client, collections)[source]

Bases: object

execute(query, **options)[source]
exception biothings.web.query.engine.RawResultInterrupt(data)[source]

Bases: ResultInterrupt

exception biothings.web.query.engine.ResultInterrupt(data)[source]

Bases: Exception

class biothings.web.query.engine.SQLQueryBackend(client)[source]

Bases: object

execute(query, **options)[source]

biothings.web.query.formatter

Search Result Formatter

Transform the raw query result into consumption-friendly structures by possibly removing from, adding to, and/or flattening the raw response from the database engine for one or more individual queries.

class biothings.web.query.formatter.Doc(dict=None, /, **kwargs)[source]

Bases: FormatterDict

    {
        "_id": …,
        "_score": …,
        …
    }

class biothings.web.query.formatter.ESResultFormatter(licenses=None, license_transform=None, field_notes=None, excluded_keys=())[source]

Bases: ResultFormatter

Class to transform the results of the Elasticsearch query generated earlier in the pipeline. It contains the functions that extract the final document from the Elasticsearch query result, as well as the code to flatten a document, among other transformations.

transform(response, **options)[source]

Transform the query response into a user-friendly structure. This mainly deconstructs the Elasticsearch response structure and hands it over to transform_doc, which applies the options below.

Options:

Generic transformations for dictionaries:

dotfield: flatten a dictionary using dotfield notation.
_sorted: sort keys alphabetically in ascending order.
always_list: ensure the fields specified are lists or wrapped in a list.
allow_null: ensure the fields specified are present in the result; the fields may be provided as type None or [].

Additional multisearch result transformations:

template: base dict for every result, for example: {"success": true}.
templates: a different base for every result; replaces the setting above.
template_hit: a dict to update every positive hit result, default: {"found": true}.
template_miss: a dict to update every query with no hit, default: {"found": false}.

Document format and content management:

biothing_type: result document type to apply customized transformations to, for example, adding a license field based on the document type’s metadata.
one: return the individual document if there is only one hit. Ignore this setting if there are multiple hits. Return None if there is no hit. This option is not effective when aggregation results are also returned in the same query.
native: bool, whether the returned result is in Python primitive types.
version: bool, whether the _version field is kept.
score: bool, whether the _score field is kept.
with_total: bool, if True, the response will include a max_total count and a message telling how many query terms return more than max_size hits. The default is False. An example when with_total is True:

    {
        "max_total": 100,
        "msg": "12 query terms return > 1000 hits, using from=1000 to retrieve the remaining hits",
        "hits": [ … ]
    }
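To illustrate what the dotfield and always_list options mean, here is a minimal sketch of both transformations on plain dicts. The helper names are hypothetical and the code is not the formatter's implementation; in particular, this flattening does not descend into lists, and how the real formatter flattens lists of objects is not reproduced.

```python
def flatten_dotfield(doc, prefix=""):
    """Flatten nested dicts into dotfield keys: {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are kept as-is (an assumption of this sketch).
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dotfield(value, path))
        else:
            flat[path] = value
    return flat


def ensure_list(doc, fields):
    """Wrap the named top-level fields in a list unless they already are one.

    Absent fields are skipped here; making them present is what allow_null is for.
    """
    out = dict(doc)
    for field in fields:
        if field in out and not isinstance(out[field], list):
            out[field] = [out[field]]
    return out
```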

transform_aggs(res)[source]

Transform the aggregations field and make it more presentable. For example, these are the fields of a two-level nested aggregation:

    aggregations.<term>.doc_count_error_upper_bound
    aggregations.<term>.sum_other_doc_count
    aggregations.<term>.buckets.key
    aggregations.<term>.buckets.key_as_string
    aggregations.<term>.buckets.doc_count
    aggregations.<term>.buckets.<nested_term>.* (recursive)

After the transformation, we’ll have:

    facets.<term>._type
    facets.<term>.total
    facets.<term>.missing
    facets.<term>.other
    facets.<term>.terms.count
    facets.<term>.terms.term
    facets.<term>.terms.<nested_term>.* (recursive)

Note that the first-level key change, from aggregations to facets, does not happen here.
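The reshaping above can be sketched as a recursive function over Elasticsearch terms aggregations. The key mapping is an assumption for illustration: buckets becomes terms, key (or key_as_string) becomes term, doc_count becomes count, sum_other_doc_count becomes other, doc_count_error_upper_bound is paired with missing only because the two lists above are listed in that order, and total is taken as the sum of bucket counts. The function name is hypothetical.

```python
def transform_terms_agg(aggs):
    """Hypothetical sketch: reshape terms aggregations into the facets layout,
    recursing into nested aggregations found inside each bucket."""
    facets = {}
    for name, agg in aggs.items():
        terms = []
        for bucket in agg.get("buckets", []):
            entry = {
                "term": bucket.get("key_as_string", bucket["key"]),
                "count": bucket["doc_count"],
            }
            # Dict-valued bucket fields are treated as nested aggregations.
            nested = {k: v for k, v in bucket.items() if isinstance(v, dict)}
            entry.update(transform_terms_agg(nested))
            terms.append(entry)
        facets[name] = {
            "_type": "terms",
            "terms": terms,
            "total": sum(t["count"] for t in terms),
            "other": agg.get("sum_other_doc_count", 0),
            "missing": agg.get("doc_count_error_upper_bound", 0),  # assumed pairing
        }
    return facets
```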

transform_hit(path, doc, options)[source]

Transform an individual search hit result. By default, add licenses for the configured fields.

If a source has a license URL in its metadata, add a "_license" key to the corresponding fields. Dot-field notation is supported for field aliases.

If we have the following settings in web_config.py:

    LICENSE_TRANSFORM = {
        "exac_nontcga": "exac",
        "snpeff.ann": "snpeff"
    }

Then GET /v1/variant/chr6:g.38906659G>A should look like:

    {
        "exac": {
            "_license": "http://bit.ly/2H9c4hg",
            "af": 0.00002471
        },
        "exac_nontcga": {
            "_license": "http://bit.ly/2H9c4hg",  <---
            "af": 0.00001883
        },
        …
    }

And GET /v1/variant/chr14:g.35731936G>C could look like:

    {
        "snpeff": {
            "_license": "http://bit.ly/2suyRKt",
            "ann": [
                {
                    "_license": "http://bit.ly/2suyRKt",  <---
                    "effect": "intron_variant",
                    "feature_id": "NM_014672.3",
                    …
                },
                {
                    "_license": "http://bit.ly/2suyRKt",  <---
                    "effect": "intron_variant",
                    "feature_id": "NM_001256678.1",
                    …
                },
                …
            ]
        },
        …
    }

The arrow-marked fields would not exist without the settings above.
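The license behavior in the examples above can be sketched as follows: fields named after a source get that source's license, and each LICENSE_TRANSFORM entry maps a dotted field path to the source whose license it borrows, fanning out over lists such as snpeff.ann. The function name add_licenses and the shape of the licenses dict (source name to URL) are assumptions for illustration, not the transform_hit implementation.

```python
def add_licenses(doc, licenses, license_transform):
    """Hypothetical sketch: add "_license" keys as in the examples above.

    licenses:          source name -> license URL (assumed to come from metadata)
    license_transform: dotted field path -> source name whose license applies
    """
    def resolve(node, parts):
        # Walk a dotted path, fanning out over lists along the way.
        if not parts:
            return [node]
        if isinstance(node, list):
            return [hit for item in node for hit in resolve(item, parts)]
        if isinstance(node, dict) and parts[0] in node:
            return resolve(node[parts[0]], parts[1:])
        return []

    def tag(node, url):
        # A target field may be a single object or a list of objects.
        for item in node if isinstance(node, list) else [node]:
            if isinstance(item, dict):
                item["_license"] = url

    fields = {source: source for source in licenses}  # the source's own field
    fields.update(license_transform)                   # plus aliased fields
    for path, source in fields.items():
        url = licenses.get(source)
        if url:
            for target in resolve(doc, path.split(".")):
                tag(target, url)
    return doc  # modified in place and returned for convenience
```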

transform_mapping(mapping, prefix=None, search=None)[source]

Transform an Elasticsearch mapping definition into user-friendly field-definition metadata.

trasform_jmespath(path: str, doc, options) None[source]

Transform any target field in doc using JMESPath query syntax. The jmespath query parameter value should follow the pattern "<target_list_fieldname>|<jmespath_query_expression>", where <target_list_fieldname> can be any sub-field of the input doc in dot notation, e.g. "aaa.bbb".

If it is empty or ".", the target is the root field.

The flexible JMESPath syntax allows filtering or transforming any nested objects in the input doc on the fly. The output of the JMESPath transformation then replaces the original target field value.

Examples:

  • filtering an array sub-field

    jmespath=tags|[?name==`Metadata`]  # filter the tags array by its name field
    jmespath=aaa.bbb|[?(sub_a==`val_a`||sub_a==`val_aa`)%26%26sub_b==`val_b`]  # use %26%26 for &&
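The parameter handling described above can be sketched in two steps: split the value at the first "|" into a target path and an expression, then replace the value at that path with the result of evaluating the expression. %26 is the URL encoding of &, so after decoding the server sees &&. To stay dependency-free, the sketch takes a Python callable in place of a real JMESPath evaluation (with the jmespath package, that would be jmespath.search(expression, value)); the helper names are hypothetical, and the sketch assumes the target field name never contains "|".

```python
from urllib.parse import unquote


def split_jmespath_param(value):
    """Split "<target_list_fieldname>|<jmespath_query_expression>" at the first "|".

    The expression itself may contain "|" (pipes) or "||" (logical or).
    """
    target, _, expression = value.partition("|")
    return target, expression


def apply_to_target(doc, target, func):
    """Replace the value at a dotted target path with func(value).

    An empty target or "." means the root document.
    """
    if target in ("", "."):
        return func(doc)
    *parents, last = target.split(".")
    node = doc
    for key in parents:
        node = node[key]
    node[last] = func(node[last])
    return doc


raw = "aaa.bbb|[?(sub_a==`val_a`||sub_a==`val_aa`)%26%26sub_b==`val_b`]"
target, expression = split_jmespath_param(unquote(raw))
```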

static traverse(obj, leaf_node=False)

Output path-dictionary pairs. For example, the input:

    {
        'exac_nontcga': {'af': 0.00001883},
        'gnomad_exome': {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}},
        'snpeff': {'ann': [{'effect': 'intron_variant', 'feature_id': 'NM_014672.3'},
                           {'effect': 'intron_variant', 'feature_id': 'NM_001256678.1'}]}
    }

will be translated to a generator:

    (
        ('exac_nontcga', {'af': 0.00001883}),
        ('gnomad_exome.af', {'af': 0.0000119429, 'af_afr': 0.000123077}),
        ('gnomad_exome', {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}}),
        ('snpeff.ann', {'effect': 'intron_variant', 'feature_id': 'NM_014672.3'}),
        ('snpeff.ann', {'effect': 'intron_variant', 'feature_id': 'NM_001256678.1'}),
        ('snpeff.ann', [{ … }, { … }]),
        ('snpeff', {'ann': [{ … }, { … }]}),
        ('', {'exac_nontcga': {…}, 'gnomad_exome': {…}, 'snpeff': {…}})
    )

or, when traversing leaf nodes:

    (
        ('exac_nontcga.af', 0.00001883),
        ('gnomad_exome.af.af', 0.0000119429),
        ('gnomad_exome.af.af_afr', 0.000123077),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_014672.3'),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_001256678.1')
    )
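The two outputs above correspond to a post-order walk (children before their parent, list elements before the list) in the default mode, and a walk that yields only scalar leaves when leaf_node is set; list elements share their parent's path. The following is a minimal sketch that reproduces the documented sequences. The extra _path parameter is an addition of this sketch, and the code is not the library's implementation.

```python
def traverse(obj, leaf_node=False, _path=""):
    """Yield (dotted_path, value) pairs in the order shown above."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{_path}.{key}" if _path else key
            yield from traverse(value, leaf_node, child)
        if not leaf_node:
            yield _path, obj  # parent after its children
    elif isinstance(obj, list):
        for item in obj:
            yield from traverse(item, leaf_node, _path)  # elements keep the list's path
        if not leaf_node:
            yield _path, obj
    elif leaf_node:
        yield _path, obj  # scalars are reported only in leaf mode


example = {
    "exac_nontcga": {"af": 0.00001883},
    "gnomad_exome": {"af": {"af": 0.0000119429, "af_afr": 0.000123077}},
    "snpeff": {"ann": [{"effect": "intron_variant", "feature_id": "NM_014672.3"},
                       {"effect": "intron_variant", "feature_id": "NM_001256678.1"}]},
}
```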

class biothings.web.query.formatter.FormatterDict(dict=None, /, **kwargs)[source]

Bases: UserDict

collapse(key)[source]
exclude(keys)[source]
include(keys)[source]
wrap(key, kls)[source]
class biothings.web.query.formatter.Hits(dict=None, /, **kwargs)[source]

Bases: FormatterDict

    {
        "total": …,
        "hits": [
            { … },
            { … },
            …
        ]
    }

class biothings.web.query.formatter.MongoResultFormatter[source]

Bases: ResultFormatter

transform(result, **options)[source]
class biothings.web.query.formatter.ResultFormatter[source]

Bases: object

transform(response)[source]
transform_mapping(mapping, prefix=None, search=None)[source]
exception biothings.web.query.formatter.ResultFormatterException[source]

Bases: Exception

class biothings.web.query.formatter.SQLResultFormatter[source]

Bases: ResultFormatter

transform(result, **options)[source]

biothings.web.query.pipeline

class biothings.web.query.pipeline.AsyncESQueryPipeline(builder, backend, formatter, **settings)[source]

Bases: QueryPipeline

async fetch(**kwargs)[source]
async search(**kwargs)[source]
class biothings.web.query.pipeline.ESQueryPipeline(builder=None, backend=None, formatter=None, *args, **kwargs)[source]

Bases: QueryPipeline

fetch(id, **options)[source]
search(q, **options)[source]
class biothings.web.query.pipeline.MongoQueryPipeline(builder, backend, formatter, **settings)[source]

Bases: QueryPipeline

class biothings.web.query.pipeline.QueryPipeline(builder, backend, formatter, **settings)[source]

Bases: object

fetch(id, **options)[source]
search(q, **options)[source]
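The pipeline classes take a builder, a backend, and a formatter, mirroring the builder, engine, and formatter modules above. The toy sketch below shows one plausible way search could chain the three stages. All class names are hypothetical, an in-memory list stands in for a database, and this is not the biothings implementation.

```python
class ToyBuilder:
    def build(self, q, **options):
        # Default to searching _id when no scopes are given.
        return {"term": q, "scopes": options.get("scopes") or ["_id"]}


class ToyBackend:
    def __init__(self, docs):
        self.docs = docs

    def execute(self, query, **options):
        return [d for d in self.docs
                if any(d.get(s) == query["term"] for s in query["scopes"])]


class ToyFormatter:
    def transform(self, result, **options):
        return {"total": len(result), "hits": result}


class ToyPipeline:
    """Hypothetical sketch of the builder -> backend -> formatter chain."""

    def __init__(self, builder, backend, formatter):
        self.builder, self.backend, self.formatter = builder, backend, formatter

    def search(self, q, **options):
        query = self.builder.build(q, **options)            # 1. build
        result = self.backend.execute(query, **options)     # 2. execute
        return self.formatter.transform(result, **options)  # 3. format


docs = [{"_id": "1017", "symbol": "CDK2"}, {"_id": "1018", "symbol": "CDK3"}]
pipeline = ToyPipeline(ToyBuilder(), ToyBackend(docs), ToyFormatter())
```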
exception biothings.web.query.pipeline.QueryPipelineException(code: int = 500, summary: str = '', details: object = None)[source]

Bases: Exception

code: int = 500
details: object = None
summary: str = ''
exception biothings.web.query.pipeline.QueryPipelineInterrupt(data)[source]

Bases: QueryPipelineException

class biothings.web.query.pipeline.SQLQueryPipeline(builder, backend, formatter, **settings)[source]

Bases: QueryPipeline

biothings.web.query.pipeline.capturesESExceptions(func)[source]
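capturesESExceptions is listed without a docstring. Judging only by its name, it presumably wraps pipeline methods so that Elasticsearch errors surface as QueryPipelineException. The sketch below shows that decorator pattern with hypothetical stand-in exception classes; PipelineError mirrors the documented code/summary/details fields, the 4xx-versus-500 mapping is an assumption, and only synchronous functions are handled.

```python
import functools


class BackendError(Exception):
    """Hypothetical stand-in for a database client exception."""

    def __init__(self, status, info):
        super().__init__(status, info)
        self.status = status
        self.info = info


class PipelineError(Exception):
    """Mirrors the documented fields of QueryPipelineException."""

    def __init__(self, code=500, summary="", details=None):
        super().__init__(code, summary, details)
        self.code = code
        self.summary = summary
        self.details = details


def captures_backend_exceptions(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except BackendError as exc:
            # Keep client errors (4xx); report everything else as 500.
            code = exc.status if 400 <= exc.status < 500 else 500
            raise PipelineError(code, "backend error", exc.info) from exc
    return wrapper


@captures_backend_exceptions
def fetch(doc_id):
    # Hypothetical backend call that reports the document as missing.
    raise BackendError(404, {"_id": doc_id, "found": False})


try:
    fetch("0000")
except PipelineError as err:
    caught = err
```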