biothings.web.query

biothings.web.query.builder

Biothings Query Builder

Translate the biothings query language into that of the underlying database. The interface consists of a query term (q) and query options.

Depending on the underlying database choice, the data type of the query term and query options vary. At a minimum, a query builder should support:

q: str, a query term. When not provided, always perform a match-all query. When provided as an empty string, always match none.

options: dotdict, optional query options.

scopes: list[str], the fields in which to look for the query term. The meaning of scopes being an empty list, None, or not provided is controlled by the specific class implementation, or left undefined.

_source: list[str], fields to return in the result.
size: int, maximum number of hits to return.
from_: int, starting index of result to return.
sort: str, customized sort keys for result list.

aggs: str, customized aggregation string.
post_filter: str, when provided, the search hits are filtered after the aggregations are calculated.
facet_size: int, maximum number of agg results.
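As a rough illustration of the contract above, the sketch below maps q and a few of the options onto an Elasticsearch-style request body. The function name build_request is hypothetical and the body layout is an assumption made for illustration; it is not the library's implementation.

```python
def build_request(q=None, **options):
    """Translate a query term and a few options into an Elasticsearch-style body.

    Hypothetical helper for illustration only.
    """
    if q is None:
        query = {"match_all": {}}       # q not provided: match everything
    elif q == "":
        query = {"match_none": {}}      # q is an empty string: match nothing
    else:
        query = {"query_string": {"query": q}}

    body = {"query": query}
    if "size" in options:
        body["size"] = options["size"]              # maximum number of hits
    if "from_" in options:
        body["from"] = options["from_"]             # starting index of the result
    if options.get("_source"):
        body["_source"] = list(options["_source"])  # fields to return
    return body


print(build_request())
print(build_request("", size=5))
print(build_request("cdk2", size=10, from_=20, _source=["symbol"]))
```

Only the two special cases of q and three paging/projection options are covered; sorting, aggregations and post filters are left out to keep the sketch short.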

class biothings.web.query.builder.ESQueryBuilder(user_query=None, scopes_regexs=(), scopes_default=('_id',), allow_random_query=True, allow_nested_query=False, metadata=None)[source]

Bases: object

Build an Elasticsearch query with elasticsearch-dsl.

apply_extras(search, options)[source]: Process non-query options and customize their behaviors. Customized aggregation syntax string is translated here.

build(q=None, **options)[source]

Build a query according to q and options. This is the public method called by API handlers.

Regarding scopes:
    scopes: [str], non-empty: match query.
    scopes: NoneType or []: no scope, so query string query.

Additionally, these options are supported:
    explain: include Elasticsearch scoring information.
    userquery: customized function to interpret q.

Additional keywords are passed through as Elasticsearch keywords, for example: 'explain', 'version' …

Multi-search is supported when q is a list. All queries are built individually and then sent in one request.

default_match_query(q, scopes, options)[source]: Override this to customize default match query. By default it implements a multi_match query.

default_string_query(q, options)[source]: Override this to customize default string query. By default it implements a query string query.
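The scope handling described for build can be sketched with plain dictionaries. The functions below are simplified stand-ins written for this example (the real methods take an options argument and return elasticsearch-dsl objects), so treat the exact query bodies as assumptions.

```python
def match_query(q, scopes):
    """Stand-in for default_match_query: a multi_match over the given scopes."""
    return {"multi_match": {"query": q, "fields": list(scopes)}}


def string_query(q):
    """Stand-in for default_string_query: a query string query."""
    return {"query_string": {"query": q}}


def build(q, scopes=None):
    """Dispatch on scopes; build one query per term when q is a list."""
    if isinstance(q, list):
        # multi-search: every term is built individually
        return [build(term, scopes) for term in q]
    if scopes:
        # non-empty list of scopes: match query
        return match_query(q, scopes)
    # None or []: no scope, so query string query
    return string_query(q)


print(build("1017", scopes=["entrezgene"]))
print(build("symbol:cdk2"))
print(build(["1017", "1018"], scopes=["_id"]))
```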

class biothings.web.query.builder.ESScrollID(seq: object)[source]: Bases: UserString

class biothings.web.query.builder.ESUserQuery(path)[source]

Bases: object

get_filter(named_query)[source]

get_query(named_query, **kwargs)[source]

has_filter(named_query)[source]

has_query(named_query)[source]

property logger

class biothings.web.query.builder.Group(term, scopes)

Bases: tuple

Create new instance of Group(term, scopes)

scopes: Alias for field number 1

term: Alias for field number 0

class biothings.web.query.builder.MongoQueryBuilder(default_scopes=('_id',))[source]

Bases: object

build(q, **options)[source]

class biothings.web.query.builder.QStringParser(default_scopes=('_id',), patterns=(('(?P<scope>\\w+):(?P<term>[^:]+)', ()),), gpnames=('term', 'scope'))[source]

Bases: object

parse(q)[source]
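The default pattern in the signature above suggests that a query of the form scope:term is split into its two parts. A minimal sketch of that idea follows; the fallback to the default scopes and the use of a full match are assumptions, and the local Query tuple only mirrors the (term, scopes) fields documented for this module.

```python
import re
from collections import namedtuple

# Local stand-in with the same field names as the documented Query tuple.
Query = namedtuple("Query", ("term", "scopes"))

# The default pattern from the QStringParser signature.
PATTERN = re.compile(r"(?P<scope>\w+):(?P<term>[^:]+)")


def parse(q, default_scopes=("_id",)):
    """Split "scope:term"; fall back to the default scopes otherwise (assumed)."""
    match = PATTERN.fullmatch(q)
    if match:
        return Query(match.group("term"), [match.group("scope")])
    return Query(q, list(default_scopes))


print(parse("symbol:cdk2"))
print(parse("1017"))
```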

class biothings.web.query.builder.Query(term, scopes)

Bases: tuple

Create new instance of Query(term, scopes)

scopes: Alias for field number 1

term: Alias for field number 0

exception biothings.web.query.builder.RawQueryInterrupt(data)[source]: Bases: Exception

class biothings.web.query.builder.SQLQueryBuilder(tables, default_scopes=('id',), default_limit=10)[source]

Bases: object

build(q, **options)[source]

biothings.web.query.engine

Search Execution Engine

Take the output of the query builder and feed it to the corresponding database engine. This stage typically resolves the db destination from a biothing_type and applies presentation and/or networking parameters.

Example:

>>> from biothings.web.query import ESQueryBackend
>>> from elasticsearch import Elasticsearch
>>> from elasticsearch_dsl import Search

>>> backend = ESQueryBackend(Elasticsearch())
>>> backend.execute(Search().query("match", _id="1017"))

>>> _["hits"]["hits"][0]["_source"].keys()
dict_keys(['taxid', 'symbol', 'name', ... ])

class biothings.web.query.engine.AsyncESQueryBackend(client, indices=None, scroll_time='1m', scroll_size=1000, multisearch_concurrency=5, total_hits_as_int=True)[source]

Bases: ESQueryBackend

Execute an Elasticsearch query

async execute(query, **options)[source]

Execute the corresponding query. Must return an awaitable. May override to add more. Handle uncaught exceptions.

Options:
    fetch_all: also return a scroll_id for this query (default: false)
    biothing_type: which type's corresponding indices to query (default in config.py)

class biothings.web.query.engine.ESQueryBackend(client, indices=None)[source]

Bases: object

adjust_index(original_index, query, **options)[source]: Override to get specific ES index.

execute(query, **options)[source]

exception biothings.web.query.engine.EndScrollInterrupt[source]: Bases: ResultInterrupt

class biothings.web.query.engine.MongoQueryBackend(client, collections)[source]

Bases: object

execute(query, **options)[source]

exception biothings.web.query.engine.RawResultInterrupt(data)[source]: Bases: ResultInterrupt

exception biothings.web.query.engine.ResultInterrupt(data)[source]: Bases: Exception

class biothings.web.query.engine.SQLQueryBackend(client)[source]

Bases: object

execute(query, **options)[source]

biothings.web.query.formatter

Search Result Formatter

Transform the raw query result into consumption-friendly structures by possibly removing from, adding to, and/or flattening the raw response from the database engine for one or more individual queries.

class biothings.web.query.formatter.Doc(dict=None, /, **kwargs)[source]

Bases: FormatterDict

{
    "_id": … ,
    "_score": … ,
    …
}

class biothings.web.query.formatter.ESResultFormatter(licenses=None, license_transform=None, field_notes=None, excluded_keys=())[source]

Bases: ResultFormatter

Class to transform the results of the Elasticsearch query generated earlier in the pipeline. It contains the functions that extract the final document from the Elasticsearch query result, as well as the code that flattens a document.

transform(response, **options)[source]

Transform the query response to a user-friendly structure. Mainly deconstruct the elasticsearch response structure and hand over to transform_doc to apply the options below.

Options:

# generic transformations for dictionaries
# ------------------------------------------
dotfield: flatten a dictionary using dotfield notation
_sorted: sort keys alphabetically in ascending order
always_list: ensure the fields specified are lists or wrapped in a list
allow_null: ensure the fields specified are present in the result,
    the fields may be provided as type None or [].

# additional multisearch result transformations
# ------------------------------------------------
template: base dict for every result, for example: {"success": true}
templates: a different base for every result, replaces the setting above
template_hit: a dict to update every positive hit result, default: {"found": true}
template_miss: a dict to update every query with no hit, default: {"found": false}

# document format and content management
# ---------------------------------------
biothing_type: result document type to apply customized transformation,
    for example, add a license field based on the document type's metadata.
one: return the individual document if there's only one hit. Ignore this setting
    if there are multiple hits. Return None if there is no hit. This option is
    not effective when aggregation results are also returned in the same query.
native: bool, if the returned result is in python primitive types.
version: bool, if _version field is kept.
score: bool, if _score field is kept.
with_total: bool, if True, the response will include max_total documents,
    and a message to tell how many query terms return greater than the max_size of hits.
    The default is False.
    An example when with_total is True:
    {
        'max_total': 100,
        'msg': '12 query terms return > 1000 hits, using from=1000 to retrieve the remaining hits',
        'hits': […]
    }
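To make the generic dictionary options more concrete, here is a small sketch of what dotfield, always_list, allow_null and _sorted could do to a single document. The helper names are hypothetical and the behavior is an approximation built from the option descriptions above; in particular, lists are left untouched here, which may differ from the library.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one level with dotted keys (lists are kept as-is)."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def apply_options(doc, always_list=(), allow_null=(), _sorted=False):
    """Apply dotfield flattening, then the list/null/sort options (hypothetical helper)."""
    out = flatten(doc)
    for field in always_list:
        # wrap scalar values so the field is always a list
        if field in out and not isinstance(out[field], list):
            out[field] = [out[field]]
    for field in allow_null:
        # make sure the field is present, even when the document lacks it
        out.setdefault(field, None)
    if _sorted:
        out = dict(sorted(out.items()))
    return out


result = apply_options(
    {"b": {"y": 1, "x": 2}, "a": 3},
    always_list=["a"],
    allow_null=["c"],
    _sorted=True,
)
print(result)
```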

transform_aggs(res)[source]

Transform the aggregations field and make it more presentable. For example, these are the fields of a two level nested aggregations:

    aggregations.<term>.doc_count_error_upper_bound
    aggregations.<term>.sum_other_doc_count
    aggregations.<term>.buckets.key
    aggregations.<term>.buckets.key_as_string
    aggregations.<term>.buckets.doc_count
    aggregations.<term>.buckets.<nested_term>.* (recursive)

After the transformation, we’ll have:

    facets.<term>._type
    facets.<term>.total
    facets.<term>.missing
    facets.<term>.other
    facets.<term>.terms.count
    facets.<term>.terms.term
    facets.<term>.terms.<nested_term>.* (recursive)

Note the first level key change doesn’t happen here.
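The field renaming listed above can be sketched as a recursive function over a terms aggregation result. Which source field feeds which target field is partly an assumption here (other from sum_other_doc_count, missing from doc_count_error_upper_bound, total as the sum of bucket counts, _type fixed to "terms"); the function name to_facets is hypothetical.

```python
def to_facets(aggs):
    """Rename Elasticsearch terms-aggregation fields into a facet-style layout."""
    facets = {}
    for name, agg in aggs.items():
        terms = []
        for bucket in agg["buckets"]:
            term = {"term": bucket["key"], "count": bucket["doc_count"]}
            # any dict value that has its own buckets is a nested aggregation
            nested = {
                key: value
                for key, value in bucket.items()
                if isinstance(value, dict) and "buckets" in value
            }
            term.update(to_facets(nested))
            terms.append(term)
        facets[name] = {
            "_type": "terms",
            "terms": terms,
            "other": agg.get("sum_other_doc_count", 0),
            "missing": agg.get("doc_count_error_upper_bound", 0),
            "total": sum(t["count"] for t in terms),
        }
    return facets


sample = {
    "type_of_gene": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 3,
        "buckets": [
            {
                "key": "protein-coding",
                "doc_count": 5,
                "taxid": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{"key": 9606, "doc_count": 5}],
                },
            },
            {"key": "ncRNA", "doc_count": 2},
        ],
    }
}
facets = to_facets(sample)
print(facets["type_of_gene"]["total"])
```

As in the note above, the top-level key change from aggregations to facets is not part of this step.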

transform_hit(path, doc, options)[source]

Transform an individual search hit result. By default add licenses for the configured fields.

If a source has a license URL in its metadata, add a "_license" key to the corresponding fields. Dot field representation and field aliases are supported.

If we have the following settings in web_config.py

    LICENSE_TRANSFORM = {
        "exac_nontcga": "exac",
        "snpeff.ann": "snpeff"
    }

Then GET /v1/variant/chr6:g.38906659G>A should look like:

    {
        "exac": {
            "_license": "http://bit.ly/2H9c4hg",
            "af": 0.00002471},
        "exac_nontcga": {
            "_license": "http://bit.ly/2H9c4hg",         <---
            "af": 0.00001883},
        …
    }

And GET /v1/variant/chr14:g.35731936G>C could look like:

    {
        "snpeff": {
            "_license": "http://bit.ly/2suyRKt",
            "ann": [{"_license": "http://bit.ly/2suyRKt",    <---
                     "effect": "intron_variant",
                     "feature_id": "NM_014672.3", …},
                    {"_license": "http://bit.ly/2suyRKt",    <---
                     "effect": "intron_variant",
                     "feature_id": "NM_001256678.1", …},
                    …]
        },
        …
    }

The arrow marked fields would not exist without the setting lines.
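The aliasing behavior in this example can be approximated in a few lines. In the sketch below, walk_dicts and add_licenses are hypothetical helpers, and the licenses mapping (source name to URL) is an assumed shape for the per-source license metadata; the real formatter is configured differently.

```python
def walk_dicts(obj, path=""):
    """Yield (dotted_path, dict) for every nested dict, descending through lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk_dicts(value, f"{path}.{key}" if path else key)
        yield path, obj  # children first, so adding keys later is safe
    elif isinstance(obj, list):
        for item in obj:
            yield from walk_dicts(item, path)


def add_licenses(doc, licenses, license_transform):
    """Add "_license" to every field whose (aliased) path names a licensed source."""
    for path, node in walk_dicts(doc):
        source = license_transform.get(path, path)  # resolve field alias
        if source in licenses:
            node["_license"] = licenses[source]
    return doc


licenses = {"exac": "http://bit.ly/2H9c4hg", "snpeff": "http://bit.ly/2suyRKt"}
license_transform = {"exac_nontcga": "exac", "snpeff.ann": "snpeff"}

doc = add_licenses(
    {
        "exac": {"af": 0.00002471},
        "exac_nontcga": {"af": 0.00001883},
        "snpeff": {"ann": [{"effect": "intron_variant"}, {"effect": "intron_variant"}]},
    },
    licenses,
    license_transform,
)
print(doc["exac_nontcga"])
```

Without the two alias entries, only the exac and snpeff top-level fields would receive a license in this sketch.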

transform_mapping(mapping, prefix=None, search=None)[source]: Transform an Elasticsearch mapping definition into user-friendly field definition metadata.

trasform_jmespath(path: str, doc, options) → None[source]

Transform any target field in doc using jmespath query syntax. The jmespath query parameter value should follow the pattern "<target_list_fieldname>|<jmespath_query_expression>". <target_list_fieldname> can be any sub-field of the input doc using dot notation, e.g. "aaa.bbb". If it is empty or ".", the root field is used.

The flexible jmespath syntax allows filtering or transforming any nested objects in the input doc on the fly. The output of the jmespath transformation then replaces the original target field value.

Examples

filtering an array sub-field
    jmespath=tags|[?name==`Metadata`]  # filter tags array by name field
    jmespath=aaa.bbb|[?(sub_a==`val_a`||sub_a==`val_aa`)%26%26sub_b==`val_b`]  # use %26%26 for &&
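The parameter format can be illustrated without evaluating any jmespath expression. The sketch below only splits the value into its target and expression parts and resolves the target in a document; both helpers are hypothetical, and the explicit unquote call is an assumption that stands in for the URL decoding a web framework would presumably perform.

```python
from urllib.parse import unquote


def split_jmespath_param(value):
    """Split "<target>|<expression>" into (target_path, expression).

    An empty target or "." is returned as "" and stands for the root of the doc.
    """
    target, sep, expression = unquote(value).partition("|")
    if not sep:
        raise ValueError("expected '<target_list_fieldname>|<jmespath_query_expression>'")
    return ("" if target in ("", ".") else target), expression


def get_target(doc, target_path):
    """Resolve a dotted path such as "aaa.bbb" against a nested dict."""
    node = doc
    for key in filter(None, target_path.split(".")):
        node = node[key]
    return node


print(split_jmespath_param("tags|[?name==`Metadata`]"))
print(split_jmespath_param("aaa.bbb|[?(sub_a==`val_a`||sub_a==`val_aa`)%26%26sub_b==`val_b`]"))
print(get_target({"aaa": {"bbb": [1, 2]}}, "aaa.bbb"))
```

Splitting on the first "|" only keeps any later pipes inside the expression part.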

static traverse(obj, leaf_node=False)

Output path-dictionary pairs. For example, input:

    {
        'exac_nontcga': {'af': 0.00001883},
        'gnomad_exome': {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}},
        'snpeff': {'ann': [{'effect': 'intron_variant',
                            'feature_id': 'NM_014672.3'},
                           {'effect': 'intron_variant',
                            'feature_id': 'NM_001256678.1'}]}
    }

will be translated to a generator:

    (
        ("exac_nontcga", {"af": 0.00001883}),
        ("gnomad_exome.af", {"af": 0.0000119429, "af_afr": 0.000123077}),
        ("gnomad_exome", {"af": {"af": 0.0000119429, "af_afr": 0.000123077}}),
        ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_014672.3"}),
        ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_001256678.1"}),
        ("snpeff.ann", [{ … },{ … }]),
        ("snpeff", {"ann": [{ … },{ … }]}),
        ('', {'exac_nontcga': {…}, 'gnomad_exome': {…}, 'snpeff': {…}})
    )

or when traversing leaf nodes:

    (
        ('exac_nontcga.af', 0.00001883),
        ('gnomad_exome.af.af', 0.0000119429),
        ('gnomad_exome.af.af_afr', 0.000123077),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_014672.3'),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_001256678.1')
    )
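The leaf-node mode lends itself to a short recursive generator. The following is a sketch written for this page, not the library's traverse; the name traverse_leaves is hypothetical, and list items are assumed to share the path of the list that contains them, as in the example output.

```python
def traverse_leaves(obj, path=""):
    """Yield (dotted_path, value) for every non-container value in obj."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from traverse_leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from traverse_leaves(item, path)  # list items share the list's path
    else:
        yield path, obj


doc = {
    "exac_nontcga": {"af": 0.00001883},
    "gnomad_exome": {"af": {"af": 0.0000119429, "af_afr": 0.000123077}},
    "snpeff": {"ann": [
        {"effect": "intron_variant", "feature_id": "NM_014672.3"},
        {"effect": "intron_variant", "feature_id": "NM_001256678.1"},
    ]},
}
leaves = list(traverse_leaves(doc))
for pair in leaves:
    print(pair)
```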

class biothings.web.query.formatter.FormatterDict(dict=None, /, **kwargs)[source]

Bases: UserDict

collapse(key)[source]

exclude(keys)[source]

include(keys)[source]

wrap(key, kls)[source]

class biothings.web.query.formatter.Hits(dict=None, /, **kwargs)[source]

Bases: FormatterDict

{
    "total": … ,
    "hits": [
        { … },
        { … },
        …
    ]
}

class biothings.web.query.formatter.MongoResultFormatter[source]

Bases: ResultFormatter

transform(result, **options)[source]

class biothings.web.query.formatter.ResultFormatter[source]

Bases: object

transform(response)[source]

transform_mapping(mapping, prefix=None, search=None)[source]

exception biothings.web.query.formatter.ResultFormatterException[source]: Bases: Exception

class biothings.web.query.formatter.SQLResultFormatter[source]

Bases: ResultFormatter

transform(result, **options)[source]

biothings.web.query.pipeline

class biothings.web.query.pipeline.AsyncESQueryPipeline(builder, backend, formatter, **settings)[source]

Bases: QueryPipeline

async fetch(**kwargs)[source]

async search(**kwargs)[source]

class biothings.web.query.pipeline.ESQueryPipeline(builder=None, backend=None, formatter=None, *args, **kwargs)[source]

Bases: QueryPipeline

fetch(id, **options)[source]

search(q, **options)[source]

class biothings.web.query.pipeline.MongoQueryPipeline(builder, backend, formatter, **settings)[source]: Bases: QueryPipeline

class biothings.web.query.pipeline.QueryPipeline(builder, backend, formatter, **settings)[source]

Bases: object

fetch(id, **options)[source]

search(q, **options)[source]
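A pipeline is constructed from a builder, a backend and a formatter, which suggests that search chains the three stages documented above. The toy classes below sketch that composition with an in-memory list in place of a database; every Toy* class is hypothetical, and only the method names build, execute, transform, search and fetch are taken from this page.

```python
class ToyBuilder:
    def build(self, q, **options):
        # Describe the lookup as a term plus scopes; default to the _id field.
        return {"term": q, "scopes": options.get("scopes") or ["_id"]}


class ToyBackend:
    def __init__(self, docs):
        self.docs = docs  # an in-memory list of dicts stands in for a database

    def execute(self, query, **options):
        return [
            doc for doc in self.docs
            if any(doc.get(scope) == query["term"] for scope in query["scopes"])
        ]


class ToyFormatter:
    def transform(self, response, **options):
        return {"total": len(response), "hits": response}


class ToyPipeline:
    def __init__(self, builder, backend, formatter):
        self.builder, self.backend, self.formatter = builder, backend, formatter

    def search(self, q, **options):
        query = self.builder.build(q, **options)              # stage 1: build
        response = self.backend.execute(query, **options)     # stage 2: execute
        return self.formatter.transform(response, **options)  # stage 3: format

    def fetch(self, id, **options):
        options["scopes"] = ["_id"]  # a fetch is a search restricted to _id
        hits = self.search(id, **options)["hits"]
        return hits[0] if hits else None


pipeline = ToyPipeline(
    ToyBuilder(),
    ToyBackend([{"_id": "1017", "symbol": "CDK2"}, {"_id": "1018", "symbol": "CDK3"}]),
    ToyFormatter(),
)
print(pipeline.search("CDK3", scopes=["symbol"]))
print(pipeline.fetch("1017"))
```

Error handling, such as translating backend failures into QueryPipelineException, is deliberately left out of this sketch.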

exception biothings.web.query.pipeline.QueryPipelineException(code: int = 500, summary: str = '', details: object = None)[source]

Bases: Exception

code: int = 500

details: object = None

summary: str = ''

exception biothings.web.query.pipeline.QueryPipelineInterrupt(data)[source]: Bases: QueryPipelineException

class biothings.web.query.pipeline.SQLQueryPipeline(builder, backend, formatter, **settings)[source]: Bases: QueryPipeline

biothings.web.query.pipeline.capturesESExceptions(func)[source]