Web component

The BioThings SDK web component contains tools used to generate and customize an API, given an Elasticsearch index with data. The web component uses the Tornado Web Server to respond to incoming API requests.

Server boot script

A simple Biothings API implementation.

  • Process command line arguments to set up the API.

  • Add additional application settings like handlers.

  • port: the port to start the API on, default 8000

  • debug: start the API in debug mode, default False

  • address: the address to start the API on, default 0.0.0.0

  • autoreload: restart the server when a file changes, default False

  • conf: choose an alternative setting, default config

  • dir: path to the app directory, default: current working directory
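The options listed above could be declared as in the following minimal sketch. It uses argparse from the standard library purely for illustration; this is an assumption and not necessarily how the boot script parses its arguments, and build_parser is a hypothetical helper name.

```python
import argparse
import os


def build_parser():
    """Declare the documented boot options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Start a BioThings API server")
    parser.add_argument("--port", type=int, default=8000, help="port to listen on")
    parser.add_argument("--debug", action="store_true", help="run in debug mode")
    parser.add_argument("--address", default="0.0.0.0", help="address to bind to")
    parser.add_argument("--autoreload", action="store_true", help="restart on file changes")
    parser.add_argument("--conf", default="config", help="name of the config module")
    parser.add_argument("--dir", default=os.getcwd(), help="path to the app directory")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--port", "9000", "--debug"])
print(args.port, args.debug, args.address, args.conf)
```

Options that are not passed keep the defaults listed above, so an empty argument list yields port 8000 on address 0.0.0.0.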

index_base.main(app_settings=None, use_curl=False)

Start a Biothings API Server

Parameters
  • app_handlers – additional web handlers to add to the app

  • app_settings – Tornado application settings dictionary <http://www.tornadoweb.org/en/stable/web.html#tornado.web.Application.settings>

  • use_curl – override the default simple_httpclient with curl_httpclient <https://www.tornadoweb.org/en/stable/httpclient.html>

Settings

Config module

BiothingWebSettings

class biothings.web.settings.BiothingWebSettings(config=None, parent=None, **kwargs)[source]

A container for the settings that configure the web API.

  • Environment variables can override settings of the same names.

  • Default values are defined in biothings.web.settings.default.

Parameters

config – a module that configures this biothing or its fully qualified name, or its module file path.
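The lookup order described above (environment variable, then config, then defaults) can be pictured with a small sketch. It is not the class's implementation: resolve_setting is a hypothetical helper, the two modules are stand-ins built in memory, and the setting names and values are illustrative only.

```python
import os
import types

# Stand-ins for a defaults module and a user config module (both hypothetical).
defaults = types.ModuleType("defaults")
defaults.ES_HOST = "localhost:9200"
defaults.ES_INDEX = "_all"

config = types.ModuleType("config")
config.ES_INDEX = "my_index"


def resolve_setting(name, config, defaults, environ=os.environ):
    """Look up a setting: environment first, then config, then defaults."""
    if name in environ:
        return environ[name]
    if hasattr(config, name):
        return getattr(config, name)
    return getattr(defaults, name)


# A plain dict replaces os.environ so the example has no side effects.
fake_env = {"ES_HOST": "es.example.org:9200"}
print(resolve_setting("ES_HOST", config, defaults, fake_env))   # taken from the environment
print(resolve_setting("ES_INDEX", config, defaults, fake_env))  # taken from config
```

Values read from the environment are always strings, so a real implementation would probably need to convert them to the type of the default; the sketch leaves that out.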

configure_logger(logger)[source]

Configure a logger’s formatter to use the format defined in this web setting.

get_app(settings=False, handlers=None)[source]

Return the tornado.web.Application defined by these settings. This is primarily how an HTTP server interacts with this class. Additional settings and handlers are accepted as parameters.

static load_class(kls)[source]

Ensure config is a module. If config cannot be resolved, return default if it is provided.

static load_module(config, default=None)[source]

Ensure config is a module. If config cannot be resolved, return default if it is provided.
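One way to accept a module, a fully qualified name, or a module file path is sketched below with importlib. This is an illustration under assumptions, not the library's code: load_module_sketch is a hypothetical name, and raising ValueError when nothing can be loaded is a choice made for this example.

```python
import importlib
import importlib.util
import os
import types


def load_module_sketch(config, default=None):
    """Return config as a module, or default when it cannot be resolved."""
    if isinstance(config, types.ModuleType):
        return config
    if isinstance(config, str) and config.endswith(".py") and os.path.isfile(config):
        # Treat the string as a path to a module file.
        name = os.path.splitext(os.path.basename(config))[0]
        spec = importlib.util.spec_from_file_location(name, config)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    if isinstance(config, str):
        try:
            # Treat the string as a fully qualified module name.
            return importlib.import_module(config)
        except (ImportError, ValueError):
            pass
    if default is not None:
        return default
    raise ValueError(f"cannot load a module from {config!r}")


import json

print(load_module_sketch("json") is json)
print(load_module_sketch("no_such_module_xyz", default=json) is json)
```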

validate()[source]

Validate the settings defined for this web server.

BiothingESWebSettings

class biothings.web.settings.BiothingESWebSettings(config=None, parent=None, **kwargs)[source]

With additional settings specific to an Elasticsearch backend.

Parameters

config – a module that configures this biothing, or its fully qualified name, or its module file path.

validate()[source]

Additional ES settings to validate.

Handlers

BaseHandler

class biothings.web.handlers.BaseHandler(application, request, **kwargs)[source]

Parent class of all handlers and the only direct descendant of tornado.web.RequestHandler.

data_received(chunk)[source]

Implement this method to handle streamed request data.

get_sentry_client()[source]

Override default and retrieve from tornado setting instead.

get_template_path()[source]

Override to customize template path for each handler.

By default, we use the template_path application setting. Return None to load templates relative to the calling file.

log_exception(*args, **kwargs)[source]

Only attempt to report to Sentry when the client is set up. Discard when the API key is not set or raven is not installed.

FrontPageHandler

class biothings.web.handlers.FrontPageHandler(application, request, **kwargs)[source]

StatusHandler

class biothings.web.handlers.StatusHandler(application, request, **kwargs)[source]

Handles requests to check the status of the server. Uses set_status instead of raising an exception so that no error will be propagated to Sentry monitoring.

BaseESRequestHandler

class biothings.web.handlers.BaseESRequestHandler(application, request, **kwargs)[source]

initialize(biothing_type=None)[source]

Hook for subclass initialization. Called for each request.

A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().

Example:

class ProfileHandler(RequestHandler):
    def initialize(self, database):
        self.database = database

    def get(self, username):
        ...

app = Application([
    (r'/user/(.*)', ProfileHandler, dict(database=database)),
    ])

parse_exception(exception)[source]

Return a customized error message based on the exception type.

prepare()[source]

Extract body and url query parameters into functional groups. Typify predefined user input patterns here. Rules:

  • Inputs are combined and then separated into functional categories.

  • Duplicated query or body arguments will overwrite the previous value.

Extend to add more customizations.
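The two rules above can be illustrated with a standalone sketch. Everything in it is hypothetical: the category names, the parameter names in CATEGORIES, and the assumption that body arguments are applied after URL arguments. Values are left as strings, so the typifying step is not shown.

```python
from urllib.parse import parse_qsl

# Hypothetical grouping of parameter names into functional categories.
CATEGORIES = {
    "es": {"size", "from", "sort"},
    "transform": {"dotfield", "always_list", "allow_null"},
}


def group_arguments(query_string, body):
    """Merge URL and body arguments, then split them by category."""
    merged = {}
    # Later duplicates overwrite earlier ones because assignment replaces the value.
    for key, value in parse_qsl(query_string):
        merged[key] = value
    merged.update(body)

    groups = {name: {} for name in CATEGORIES}
    groups["other"] = {}
    for key, value in merged.items():
        for name, keys in CATEGORIES.items():
            if key in keys:
                groups[name][key] = value
                break
        else:
            # Parameters that belong to no category are kept separately.
            groups["other"][key] = value
    return groups


groups = group_arguments("size=5&size=10&dotfield=true&q=cdk2", {"from": "20"})
print(groups)
```

Here the second size value replaces the first, and each parameter ends up in exactly one group.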

BiothingHandler

class biothings.web.handlers.BiothingHandler(application, request, **kwargs)[source]

Biothings Annotation Endpoint

URL pattern examples:

/{pre}/{ver}/{typ}/?
/{pre}/{ver}/{typ}/([^/]+)/?

Queries a term against a pre-determined field that represents the id of a document, like _id and dbsnp.rsid.

GET -> {…} or [{…}, …]
POST -> [{…}, …]

pre_finish_hook(options, res)[source]

Empty result in GET triggers 404. Keep _version, discard _score field.

pre_query_builder_hook(options)[source]

Annotation queries have default scopes and include the _version field.

QueryHandler

class biothings.web.handlers.QueryHandler(application, request, **kwargs)[source]

Biothings Query Endpoint

URL pattern examples:

/{pre}/{ver}/{typ}/query/?
/{pre}/{ver}//query/?

GET -> {…}
POST -> [{…}, …]

pre_finish_hook(options, res)[source]

Override this in subclasses. Could implement additional result translation.

pre_query_builder_hook(options)[source]

Override this in subclasses. At this stage, we have the cleaned user input available. Might be a good place to implement input based tracking.

MetadataFieldHandler

class biothings.web.handlers.MetadataFieldHandler(application, request, **kwargs)[source]

GET /metadata/fields

MetadataSourceHandler

class biothings.web.handlers.MetadataSourceHandler(application, request, **kwargs)[source]

GET /metadata

extras(_meta)[source]

Override to add app specific metadata

ESRequestHandler

class biothings.web.handlers.ESRequestHandler(application, request, **kwargs)[source]

Default Implementation of ES Query Pipelines

pre_finish_hook(options, res)[source]

Override this in subclasses. Could implement additional result translation.

pre_query_builder_hook(options)[source]

Override this in subclasses. At this stage, we have the cleaned user input available. Might be a good place to implement input based tracking.

pre_query_hook(options, query)[source]

Override this in subclasses. By default, return raw query, if requested. Might want to persist this behavior by calling super().

pre_transform_hook(options, res)[source]

Override this in subclasses. By default, return query response, if requested. Might want to persist this behavior by calling super().
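How the four hooks might fit around the build, execute and transform stages is sketched below. The call order is inferred from the hook names and is an assumption; PipelineSketch, TrackedPipeline and the stub stages are hypothetical and do not talk to Elasticsearch.

```python
class PipelineSketch:
    """Hypothetical stand-in for a handler that runs build, execute, transform."""

    def pre_query_builder_hook(self, options):
        return options

    def pre_query_hook(self, options, query):
        return query

    def pre_transform_hook(self, options, res):
        return res

    def pre_finish_hook(self, options, res):
        return res

    # Stub stages; real stages would build and run an Elasticsearch query.
    def build(self, options):
        return {"query": {"query_string": {"query": options["q"]}}}

    def execute(self, query):
        return {"hits": [{"_id": "1", "_score": 1.0}]}

    def transform(self, res):
        return {"total": len(res["hits"]), "hits": res["hits"]}

    def run(self, options):
        options = self.pre_query_builder_hook(options)
        query = self.pre_query_hook(options, self.build(options))
        res = self.pre_transform_hook(options, self.execute(query))
        return self.pre_finish_hook(options, self.transform(res))


class TrackedPipeline(PipelineSketch):
    def __init__(self):
        self.seen_queries = []

    def pre_query_builder_hook(self, options):
        # Cleaned user input is available here, so record it for tracking.
        self.seen_queries.append(options["q"])
        return super().pre_query_builder_hook(options)

    def pre_finish_hook(self, options, res):
        # Example of an additional result translation.
        res = super().pre_finish_hook(options, res)
        res["source"] = "sketch"
        return res


pipeline = TrackedPipeline()
result = pipeline.run({"q": "cdk2"})
print(pipeline.seen_queries, result["total"], result["source"])
```

Calling super() in each override keeps whatever default behavior the parent hook provides.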

BaseAPIHandler

class biothings.web.handlers.BaseAPIHandler(application, request, **kwargs)[source]

initialize()[source]

Hook for subclass initialization. Called for each request.

A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().

Example:

class ProfileHandler(RequestHandler):
    def initialize(self, database):
        self.database = database

    def get(self, username):
        ...

app = Application([
    (r'/user/(.*)', ProfileHandler, dict(database=database)),
    ])

on_finish()[source]

This is a tornado lifecycle hook. Override to provide tracking features.

parse_exception(exception)[source]

Return a customized error message based on the exception type.

prepare()[source]

Extract body and url query parameters into functional groups. Typify predefined user input patterns here. Rules:

  • Inputs are combined and then separated into functional categories.

  • Duplicated query or body arguments will overwrite the previous value.

Extend to add more customizations.

set_default_headers()[source]

Override this to set HTTP headers at the beginning of the request.

For example, this is the place to set a custom Server header. Note that setting such headers in the normal flow of request processing may not do what you want, since headers may be reset during error handling.

write(chunk)[source]

Override to write output based on the specified format.

write_error(status_code, **kwargs)[source]

Override to implement custom error pages.

write_error may call write, render, set_header, etc to produce output as usual.

If this error was caused by an uncaught exception (including HTTPError), an exc_info triple will be available as kwargs["exc_info"]. Note that this exception may not be the “current” exception for purposes of methods like sys.exc_info() or traceback.format_exc.

APISpecificationHandler

class biothings.web.handlers.APISpecificationHandler(application, request, **kwargs)[source]

Pipeline

Elasticsearch Query Builder

class biothings.web.pipeline.ESQueryBuilder(web_settings)[source]

Build an Elasticsearch query with elasticsearch-dsl

build(q, options)[source]

Build a query according to q and options. This is the public method called by API handlers.

Options:

  • q: string query or queries

  • scopes: fields to query q(s)

  • _source: fields to return

  • size: maximum number of hits to return

  • from: starting index of result list to return

  • sort: customized sort keys for result list

  • explain: include es scoring information

  • userquery: customized function to interpret q

  • regexs: substitution groups to infer scopes

  • aggs: customized aggregation string

  • facet_size: maximum number of agg results

  • additional es keywords are passed through, for example: ‘explain’, ‘version’ …

default_match_query(q, scopes, options)[source]

Override this to customize default match query. By default it implements a multi_match query.

default_string_query(q, options)[source]

Override this to customize default string query. By default it implements a query string query.
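For orientation, the two default query types correspond to the following Elasticsearch request bodies, written here as plain dictionaries rather than elasticsearch-dsl objects. The helper names and their defaults are hypothetical, and the library's actual defaults may set further parameters.

```python
def match_query_body(q, scopes, size=10, from_=0, source=None):
    """Build a multi_match request body searching q across the given scopes."""
    body = {
        "query": {"multi_match": {"query": q, "fields": list(scopes)}},
        "size": size,
        "from": from_,
    }
    if source:
        # Restrict the fields returned for each hit.
        body["_source"] = list(source)
    return body


def string_query_body(q, size=10, from_=0):
    """Build a query_string request body that parses q with query string syntax."""
    return {"query": {"query_string": {"query": q}}, "size": size, "from": from_}


print(match_query_body("cdk2", ["symbol", "name"], size=5, source=["symbol"]))
print(string_query_body("symbol:cdk2"))
```

A multi_match query suits lookups where scopes are given, while a query_string query lets the user name fields inside q itself.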

Elasticsearch Query Execution

class biothings.web.pipeline.ESQueryBackend(web_settings)[source]

Execute an Elasticsearch query

async execute(query, options)[source]

Execute the corresponding query. Must return an awaitable. May override to add more. Handle uncaught exceptions.

Options:

Required: either an es-dsl query object or scroll_id

Optional:

  • fetch_all: also return a scroll_id for this query (default: false)

  • biothing_type: which type’s corresponding indices to query (default in config.py)
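The choice between running a new query and continuing a scroll could look like the following asyncio sketch. FakeClient is a hypothetical stand-in for an asynchronous Elasticsearch client, its method signatures are invented for this example, and the "1m" keep-alive value is illustrative.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async Elasticsearch client."""

    async def search(self, body, scroll=None):
        response = {"hits": {"total": 0, "hits": []}}
        if scroll:
            # A scrolling search also returns an id for fetching the next page.
            response["_scroll_id"] = "fake-scroll-id"
        return response

    async def scroll(self, scroll_id):
        return {"hits": {"total": 0, "hits": []}, "_scroll_id": scroll_id}


async def execute(client, query, options):
    """Run a new search or continue a scroll, depending on the options."""
    if options.get("scroll_id"):
        return await client.scroll(options["scroll_id"])
    scroll = "1m" if options.get("fetch_all") else None
    return await client.search(query, scroll=scroll)


first = asyncio.run(execute(FakeClient(), {"query": {"match_all": {}}}, {"fetch_all": True}))
second = asyncio.run(execute(FakeClient(), None, {"scroll_id": first["_scroll_id"]}))
print(first["_scroll_id"], second["_scroll_id"])
```

Because execute is a coroutine function, calling it returns an awaitable, which is what the handler is described as expecting.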

Elasticsearch Result Transformer

class biothings.web.pipeline.ESResultTransform(web_settings)[source]

Transforms the results of the Elasticsearch query generated earlier in the pipeline. This class contains the functions that extract the final document from the Elasticsearch query result, as well as the code to flatten a document.

static option_allow_null(path, obj, fields)[source]

The specified fields should be set to None if they do not exist. When flattened, a field could be converted to an empty list.

static option_always_list(path, obj, fields)[source]

The specified fields, if they exist, should be set to a list type. None converts to an empty list [] instead of [None].
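The effect of these two options on a single dictionary can be shown with a simplified sketch. It works on top-level keys only and leaves out the path argument of the real methods; always_list and allow_null here are hypothetical standalone functions, not the static methods themselves.

```python
def always_list(obj, fields):
    """Wrap existing values of the named keys in a list; None becomes []."""
    for key in fields:
        if key in obj:
            value = obj[key]
            if value is None:
                obj[key] = []
            elif not isinstance(value, list):
                obj[key] = [value]


def allow_null(obj, fields):
    """Add the named keys with value None when they are missing."""
    for key in fields:
        obj.setdefault(key, None)


doc = {"symbol": "CDK2", "alias": None}
allow_null(doc, ["summary"])
always_list(doc, ["symbol", "alias", "missing_key"])
print(doc)
```

Keys that are absent are left alone by always_list, which is why missing_key does not appear in the result.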

classmethod option_dotfield(dic, options)[source]

Flatten a dictionary. #TODO examples
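As a rough illustration of dotfield notation, the sketch below flattens nested dictionaries into dotted keys. It is a simplified stand-in with a hypothetical name, and it leaves lists untouched, which may differ from how the library handles them.

```python
def flatten(dic, prefix=""):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in dic.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse and merge the child's dotted keys into the result.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


print(flatten({"exac": {"af": 0.1, "ac": {"ac": 3}}, "_id": "x"}))
```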

static option_sorted(_, obj)[source]

Sort a container in-place.

transform(response, options)[source]

Transform the query response to a user-friendly structure.

Options:

  • dotfield: flatten a dictionary using dotfield notation

  • _sorted: sort keys alphabetically in ascending order

  • always_list: ensure the fields specified are lists or wrapped in a list

  • allow_null: ensure the fields specified are present in the result; the fields may be provided as type None or [].

  • biothing_type: result document type to apply customized transformation, for example, add a license field based on the document type’s metadata.

Options only related to multiqueries:

  • template: base dict for every result, for example: {"success": true}

  • templates: a different base for every result, replaces the setting above

  • template_hit: a dict to update every positive hit result, default: {"found": true}

  • template_miss: a dict to update every query with no hit, default: {"found": false}

transform_aggs(res)[source]

Transform the aggregations field and make it more presentable. For example, these are the fields of a two level nested aggregations:

aggregations.<term>.doc_count_error_upper_bound
aggregations.<term>.sum_other_doc_count
aggregations.<term>.buckets.key
aggregations.<term>.buckets.key_as_string
aggregations.<term>.buckets.doc_count
aggregations.<term>.buckets.<nested_term>.* (recursive)

After the transformation, we’ll have:

facets.<term>._type
facets.<term>.total
facets.<term>.missing
facets.<term>.other
facets.<term>.terms.count
facets.<term>.terms.term
facets.<term>.terms.<nested_term>.* (recursive)

Note the first level key change doesn’t happen here.
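A recursive renaming along these lines is sketched below. The pairing of other with sum_other_doc_count and of missing with doc_count_error_upper_bound, the value "terms" for _type, and total being the sum of the bucket counts are all assumptions made for this example; aggs_to_facets is a hypothetical name. The function receives the content of the aggregations field, since the first level key is renamed elsewhere.

```python
def aggs_to_facets(res):
    """Rename aggregation fields in place and return the same dictionary."""
    for facet in res.values():
        facet["_type"] = "terms"
        facet["terms"] = facet.pop("buckets")
        facet["other"] = facet.pop("sum_other_doc_count", 0)
        facet["missing"] = facet.pop("doc_count_error_upper_bound", 0)
        total = 0
        for term in facet["terms"]:
            term["count"] = term.pop("doc_count")
            term["term"] = term.pop("key")
            total += term["count"]
            # Nested aggregations are the remaining values that carry buckets.
            nested = {k: v for k, v in term.items()
                      if isinstance(v, dict) and "buckets" in v}
            aggs_to_facets(nested)
        facet["total"] = total
    return res


aggs = {
    "type_of_gene": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 5,
        "buckets": [
            {"key": "protein-coding", "doc_count": 7,
             "taxid": {"doc_count_error_upper_bound": 0,
                       "sum_other_doc_count": 0,
                       "buckets": [{"key": 9606, "doc_count": 4}]}},
            {"key": "ncRNA", "doc_count": 3},
        ],
    }
}
facets = aggs_to_facets(aggs)
print(facets["type_of_gene"]["total"], facets["type_of_gene"]["terms"][0]["taxid"]["total"])
```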

transform_hit(path, doc, options)[source]

By default, add licenses.

If a source has a license url in its metadata, add a "_license" key to the corresponding fields. Supports dot field representation for field aliases.

If we have the following settings in web_config.py:

LICENSE_TRANSFORM = {
    "exac_nontcga": "exac",
    "snpeff.ann": "snpeff"
}

Then GET /v1/variant/chr6:g.38906659G>A should look like:

{
    "exac": {
        "_license": "http://bit.ly/2H9c4hg",
        "af": 0.00002471},
    "exac_nontcga": {
        "_license": "http://bit.ly/2H9c4hg",  <---
        "af": 0.00001883},
    …
}

And GET /v1/variant/chr14:g.35731936G>C could look like:

{
    "snpeff": {
        "_license": "http://bit.ly/2suyRKt",
        "ann": [{"_license": "http://bit.ly/2suyRKt",  <---
                 "effect": "intron_variant",
                 "feature_id": "NM_014672.3", …},
                {"_license": "http://bit.ly/2suyRKt",  <---
                 "effect": "intron_variant",
                 "feature_id": "NM_001256678.1", …}, …]
    }, …
}

The arrow-marked fields would not exist without the LICENSE_TRANSFORM setting.
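The aliasing behavior can be approximated with a short recursive sketch. It assumes a hypothetical LICENSES lookup from source name to URL, which in practice would come from the source metadata, and add_licenses is an illustrative name rather than the method itself.

```python
LICENSE_TRANSFORM = {"exac_nontcga": "exac", "snpeff.ann": "snpeff"}
# Hypothetical lookup of license URLs by source name.
LICENSES = {"exac": "http://bit.ly/2H9c4hg", "snpeff": "http://bit.ly/2suyRKt"}


def add_licenses(obj, path=""):
    """Add a "_license" key to every dict whose path maps to a licensed source."""
    if isinstance(obj, list):
        # List items share the path of the list that contains them.
        for item in obj:
            add_licenses(item, path)
    elif isinstance(obj, dict):
        for key, value in list(obj.items()):
            add_licenses(value, f"{path}.{key}" if path else key)
        # An alias redirects this path to the source that holds the license.
        source = LICENSE_TRANSFORM.get(path, path)
        if source in LICENSES:
            obj["_license"] = LICENSES[source]
    return obj


doc = add_licenses({
    "exac": {"af": 0.00002471},
    "exac_nontcga": {"af": 0.00001883},
    "snpeff": {"ann": [{"effect": "intron_variant"}]},
})
print(doc["exac_nontcga"]["_license"], doc["snpeff"]["ann"][0]["_license"])
```

Without the two alias entries, only the exac and snpeff dictionaries would receive a license in this sketch.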

transform_mapping(mapping, prefix, search)[source]

Transform an Elasticsearch mapping definition into user-friendly field definitions for the metadata result.

static traverse(obj, leaf_node=False)

Output path-dictionary pairs. For example, input:

{
    'exac_nontcga': {'af': 0.00001883},
    'gnomad_exome': {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}},
    'snpeff': {'ann': [{'effect': 'intron_variant',
                        'feature_id': 'NM_014672.3'},
                       {'effect': 'intron_variant',
                        'feature_id': 'NM_001256678.1'}]}
}

will be translated to a generator:

(
    ("exac_nontcga", {"af": 0.00001883}),
    ("gnomad_exome.af", {"af": 0.0000119429, "af_afr": 0.000123077}),
    ("gnomad_exome", {"af": {"af": 0.0000119429, "af_afr": 0.000123077}}),
    ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_014672.3"}),
    ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_001256678.1"}),
    ("snpeff.ann", [{ … },{ … }]),
    ("snpeff", {"ann": [{ … },{ … }]}),
    ('', {'exac_nontcga': {…}, 'gnomad_exome': {…}, 'snpeff': {…}})
)

or when traversing leaf nodes:

(
    ('exac_nontcga.af', 0.00001883),
    ('gnomad_exome.af.af', 0.0000119429),
    ('gnomad_exome.af.af_afr', 0.000123077),
    ('snpeff.ann.effect', 'intron_variant'),
    ('snpeff.ann.feature_id', 'NM_014672.3'),
    ('snpeff.ann.effect', 'intron_variant'),
    ('snpeff.ann.feature_id', 'NM_001256678.1')
)
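A generator that yields pairs in the order shown in this example can be sketched as a post-order walk. This is an illustration written to match the documented output, not the library's code; traverse_sketch and its path argument are hypothetical.

```python
def traverse_sketch(obj, leaf_node=False, path=""):
    """Yield (path, value) pairs, children before their parent container."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            yield from traverse_sketch(value, leaf_node, child)
        if not leaf_node:
            yield path, obj
    elif isinstance(obj, list):
        # Items of a list are reported under the path of the list itself.
        for item in obj:
            yield from traverse_sketch(item, leaf_node, path)
        if not leaf_node:
            yield path, obj
    elif leaf_node:
        yield path, obj


doc = {
    "exac_nontcga": {"af": 0.00001883},
    "gnomad_exome": {"af": {"af": 0.0000119429, "af_afr": 0.000123077}},
    "snpeff": {"ann": [{"effect": "intron_variant", "feature_id": "NM_014672.3"},
                       {"effect": "intron_variant", "feature_id": "NM_001256678.1"}]},
}
print([path for path, _ in traverse_sketch(doc)])
print([path for path, _ in traverse_sketch(doc, leaf_node=True)])
```

Yielding containers after their children means a caller can modify a dictionary in place without disturbing the walk over its contents.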