Web component¶
The BioThings SDK web component contains tools used to generate and customize an API, given an Elasticsearch index with data. The web component uses the Tornado Web Server to respond to incoming API requests.
Server boot script¶
A simple Biothings API implementation.
Process command line arguments to set up the API.
Add additional application settings, such as handlers.
port: the port to start the API on, default 8000
debug: start the API in debug mode, default False
address: the address to start the API on, default 0.0.0.0
autoreload: restart the server when files change, default False
conf: choose an alternative setting, default config
dir: path to app directory, default: current working directory
-
index_base.
main
(app_settings=None, use_curl=False)¶ Start a Biothings API Server
- Parameters
app_handlers – additional web handlers to add to the app
app_settings – Tornado application settings dictionary (http://www.tornadoweb.org/en/stable/web.html#tornado.web.Application.settings)
use_curl – override the default simple_httpclient with curl_httpclient (https://www.tornadoweb.org/en/stable/httpclient.html)
Settings¶
Config module¶
BiothingWebSettings¶
-
class
biothings.web.settings.
BiothingWebSettings
(config=None, parent=None, **kwargs)[source]¶ A container for the settings that configure the web API.
Environment variables can override settings of the same names.
Default values are defined in biothings.web.settings.default.
- Parameters
config – a module that configures this biothing or its fully qualified name, or its module file path.
-
configure_logger
(logger)[source]¶ Configure a logger’s formatter to use the format defined in this web setting.
-
get_app
(settings=False, handlers=None)[source]¶ Return the tornado.web.Application defined by these settings. This is primarily how an HTTP server interacts with this class. Additional settings and handlers are accepted as parameters.
-
static
load_class
(kls)[source]¶ Ensure config is a module. If config does not evaluate, return default if it is provided.
Handlers¶
BaseHandler¶
-
class
biothings.web.handlers.
BaseHandler
(application, request, **kwargs)[source]¶ Parent class of all handlers and the only direct descendant of tornado.web.RequestHandler.
FrontPageHandler¶
BaseESRequestHandler¶
-
class
biothings.web.handlers.
BaseESRequestHandler
(application, request, **kwargs)[source]¶ -
initialize
(biothing_type=None)[source]¶ Hook for subclass initialization. Called for each request.
A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().
Example:
    class ProfileHandler(RequestHandler):
        def initialize(self, database):
            self.database = database

        def get(self, username):
            ...

    app = Application([
        (r'/user/(.*)', ProfileHandler, dict(database=database)),
    ])
-
prepare
()[source]¶ Extract body and URL query parameters into functional groups. Typify predefined user input patterns here. Rules:
Inputs are combined and then separated into functional categories.
Duplicated query or body arguments will overwrite the previous value.
Extend to add more customizations.
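The two rules above can be illustrated with a minimal sketch in plain Python. This is not the SDK's implementation: the function names, the group names, and the assumption that body arguments are applied after query arguments are all hypothetical and chosen only for illustration.

```python
def combine_inputs(query_args, body_args):
    """Merge query and body arguments; a later duplicate overwrites an earlier one.

    Assumption: query arguments are applied first, then body arguments.
    """
    combined = {}
    for source in (query_args, body_args):
        for key, value in source.items():
            combined[key] = value
    return combined


def group_inputs(combined, groups):
    """Split combined inputs into named functional groups.

    `groups` maps a hypothetical group name to the argument names it owns.
    Arguments that belong to no group are collected under "other".
    """
    grouped = {name: {} for name in groups}
    grouped["other"] = {}
    for key, value in combined.items():
        for name, keys in groups.items():
            if key in keys:
                grouped[name][key] = value
                break
        else:
            grouped["other"][key] = value
    return grouped


# Hypothetical grouping, for illustration only.
GROUPS = {"query": {"q", "scopes"}, "transform": {"dotfield", "always_list"}}

merged = combine_inputs({"q": "cdk2", "size": "10"}, {"q": "cdk4", "dotfield": "true"})
grouped = group_inputs(merged, GROUPS)
```

Here the body value of `q` replaces the query-string value, and `size` ends up in the catch-all group because the hypothetical `GROUPS` table does not claim it.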
-
BiothingHandler¶
-
class
biothings.web.handlers.
BiothingHandler
(application, request, **kwargs)[source]¶ Biothings Annotation Endpoint
URL pattern examples:
/{pre}/{ver}/{typ}/?
/{pre}/{ver}/{typ}/([^/]+)/?
Queries a term against a pre-determined field that represents the id of a document, like _id and dbsnp.rsid.
GET -> {…} or [{…}, …]
POST -> [{…}, …]
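As a rough illustration of how the URL patterns above behave once the placeholders are filled in, the sketch below expands them with the standard `re` module. The prefix, version, and type values are assumptions made up for this example, and `re.fullmatch` only approximates how a web framework would match a request path.

```python
import re

# Hypothetical placeholder values; a real deployment defines its own.
PREFIX, VERSION, TYPE = "api", "v1", "variant"

# The two annotation endpoint patterns, with placeholders substituted.
LIST_PATTERN = r"/{pre}/{ver}/{typ}/?".format(pre=PREFIX, ver=VERSION, typ=TYPE)
ITEM_PATTERN = r"/{pre}/{ver}/{typ}/([^/]+)/?".format(pre=PREFIX, ver=VERSION, typ=TYPE)


def extract_id(path):
    """Return the document id captured from the path, or None if there is none."""
    match = re.fullmatch(ITEM_PATTERN, path)
    return match.group(1) if match else None


doc_id = extract_id("/api/v1/variant/chr6:g.38906659G>A")
no_id = extract_id("/api/v1/variant/")
is_list_endpoint = re.fullmatch(LIST_PATTERN, "/api/v1/variant/") is not None
```

The capture group `([^/]+)` accepts any id that contains no slash, which is why an HGVS-style id such as `chr6:g.38906659G>A` is captured whole.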
QueryHandler¶
-
class
biothings.web.handlers.
QueryHandler
(application, request, **kwargs)[source]¶ Biothings Query Endpoint
URL pattern examples:
/{pre}/{ver}/{typ}/query/?
/{pre}/{ver}//query/?
GET -> {…}
POST -> [{…}, …]
MetadataFieldHandler¶
MetadataSourceHandler¶
ESRequestHandler¶
-
class
biothings.web.handlers.
ESRequestHandler
(application, request, **kwargs)[source]¶ Default Implementation of ES Query Pipelines
-
pre_finish_hook
(options, res)[source]¶ Override this in subclasses. Could implement additional result translation.
-
pre_query_builder_hook
(options)[source]¶ Override this in subclasses. At this stage, we have the cleaned user input available. Might be a good place to implement input based tracking.
-
BaseAPIHandler¶
-
class
biothings.web.handlers.
BaseAPIHandler
(application, request, **kwargs)[source]¶ -
initialize
()[source]¶ Hook for subclass initialization. Called for each request.
A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().
Example:
    class ProfileHandler(RequestHandler):
        def initialize(self, database):
            self.database = database

        def get(self, username):
            ...

    app = Application([
        (r'/user/(.*)', ProfileHandler, dict(database=database)),
    ])
-
prepare
()[source]¶ Extract body and URL query parameters into functional groups. Typify predefined user input patterns here. Rules:
Inputs are combined and then separated into functional categories.
Duplicated query or body arguments will overwrite the previous value.
Extend to add more customizations.
-
set_default_headers
()[source]¶ Override this to set HTTP headers at the beginning of the request.
For example, this is the place to set a custom Server header. Note that setting such headers in the normal flow of request processing may not do what you want, since headers may be reset during error handling.
-
write_error
(status_code, **kwargs)[source]¶ Override to implement custom error pages.
write_error may call write, render, set_header, etc. to produce output as usual.
If this error was caused by an uncaught exception (including HTTPError), an exc_info triple will be available as kwargs["exc_info"]. Note that this exception may not be the “current” exception for purposes of methods like sys.exc_info() or traceback.format_exc.
-
Pipeline¶
Elasticsearch Query Builder¶
-
class
biothings.web.pipeline.
ESQueryBuilder
(web_settings)[source]¶ Build an Elasticsearch query with elasticsearch-dsl
-
build
(q, options)[source]¶ Build a query according to q and options. This is the public method called by API handlers.
Options:
q: string query or queries
scopes: fields to query q(s)
_source: fields to return
size: maximum number of hits to return
from: starting index of result list to return
sort: customized sort keys for result list
explain: include es scoring information
userquery: customized function to interpret q
regexs: substitution groups to infer scopes
aggs: customized aggregation string
facet_size: maximum number of agg results
Additional es keywords are passed through, for example: ‘explain’, ‘version’ …
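To make the role of these options more concrete, here is a minimal sketch that assembles a plain Elasticsearch search body from `q` and a subset of the options. It is only an illustration under stated assumptions: the function name is hypothetical, it uses a plain dictionary rather than elasticsearch-dsl, and the choice of a `query_string` query with `scopes` mapped to its `fields` parameter is an assumption, not a description of what ESQueryBuilder produces.

```python
# Search body keys that this sketch copies through unchanged when present.
PASS_THROUGH_KEYS = ("_source", "size", "from", "sort", "explain")


def build_search_body(q, options):
    """Assemble an Elasticsearch search request body as a plain dict.

    Assumption: q is sent as a query_string query, restricted to the
    fields listed in options["scopes"] when scopes are given.
    """
    query_string = {"query": q}
    scopes = options.get("scopes")
    if scopes:
        query_string["fields"] = list(scopes)

    body = {"query": {"query_string": query_string}}
    for key in PASS_THROUGH_KEYS:
        if key in options:
            body[key] = options[key]
    return body


body = build_search_body("cdk2", {"scopes": ["symbol"], "size": 5, "from": 10})
```

Options that are absent are simply left out of the body, so Elasticsearch's own defaults apply to them.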
-
Elasticsearch Query Execution¶
-
class
biothings.web.pipeline.
ESQueryBackend
(web_settings)[source]¶ Execute an Elasticsearch query
-
async
execute
(query, options)[source]¶ Execute the corresponding query. Must return an awaitable. May override to add more. Handle uncaught exceptions.
- Options:
Required: either an es-dsl query object or scroll_id
Optional:
fetch_all: also return a scroll_id for this query (default: false)
biothing_type: which type’s corresponding indices to query (default in config.py)
-
async
Elasticsearch Result Transformer¶
-
class
biothings.web.pipeline.
ESResultTransform
(web_settings)[source]¶ Class to transform the results of the Elasticsearch query generated earlier in the pipeline. It contains the functions that extract the final document from the Elasticsearch query result, as well as the code to flatten a document.
-
static
option_allow_null
(path, obj, fields)[source]¶ The specified fields should be set to None if they do not exist. When flattened, the field could be converted to an empty list.
-
static
option_always_list
(path, obj, fields)[source]¶ The specified fields, if they exist, should be set to a list type. None converts to an empty list [] instead of [None].
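The behaviour described for these two options can be sketched on a flat dictionary. The helpers below are hypothetical simplifications: they take only a document and a list of top-level field names, whereas the documented methods also receive a `path` argument and are not reproduced here.

```python
def allow_null_sketch(doc, fields):
    """Ensure every listed field is present; missing fields are set to None."""
    for field in fields:
        doc.setdefault(field, None)
    return doc


def always_list_sketch(doc, fields):
    """Ensure every listed field that exists holds a list.

    None becomes an empty list rather than [None]; other scalars are wrapped.
    """
    for field in fields:
        if field not in doc:
            continue
        value = doc[field]
        if value is None:
            doc[field] = []
        elif not isinstance(value, list):
            doc[field] = [value]
    return doc


doc = {"symbol": "CDK2", "alias": None}
allow_null_sketch(doc, ["summary"])
always_list_sketch(doc, ["symbol", "alias", "pathway"])
```

After both calls, `summary` exists with the value None, `symbol` is wrapped in a list, `alias` is an empty list, and `pathway` is still absent because it was never in the document.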
-
transform
(response, options)[source]¶ Transform the query response to a user-friendly structure.
- Options:
dotfield: flatten a dictionary using dotfield notation
_sorted: sort keys alphabetically in ascending order
always_list: ensure the fields specified are lists or wrapped in a list
allow_null: ensure the fields specified are present in the result; the fields may be provided as type None or [].
biothing_type: result document type to apply customized transformation, for example, add a license field based on the document type’s metadata.
Options only related to multiqueries:
template: base dict for every result, for example: {“success”: true}
templates: a different base for every result, replaces the setting above
template_hit: a dict to update every positive hit result, default: {“found”: true}
template_miss: a dict to update every query with no hit, default: {“found”: false}
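For readers unfamiliar with dotfield notation, the following minimal sketch flattens nested dictionaries into dotted keys and then sorts the keys. It is an illustration only: the function name is hypothetical, and it leaves lists untouched, which may differ from how the SDK treats lists of objects.

```python
def flatten_dotfield(doc, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths.

    Assumption: only dicts are descended into; lists are kept as values.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dotfield(value, path))
        else:
            flat[path] = value
    return flat


nested = {"exac": {"af": 0.1, "ac": {"ac_adj": 3}}, "_id": "x"}
flat = flatten_dotfield(nested)

# Sorting keys in ascending order, in the spirit of the _sorted option.
sorted_keys = list(dict(sorted(flat.items())))
```

A recursive helper keeps the sketch short; an iterative version with an explicit stack would avoid recursion limits on very deep documents.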
-
transform_aggs
(res)[source]¶ Transform the aggregations field and make it more presentable. For example, these are the fields of a two-level nested aggregation:
    aggregations.<term>.doc_count_error_upper_bound
    aggregations.<term>.sum_other_doc_count
    aggregations.<term>.buckets.key
    aggregations.<term>.buckets.key_as_string
    aggregations.<term>.buckets.doc_count
    aggregations.<term>.buckets.<nested_term>.* (recursive)
After the transformation, we’ll have:
    facets.<term>._type
    facets.<term>.total
    facets.<term>.missing
    facets.<term>.other
    facets.<term>.terms.count
    facets.<term>.terms.term
    facets.<term>.terms.<nested_term>.* (recursive)
Note that the first-level key change doesn’t happen here.
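The renaming above can be sketched in plain Python. The field correspondence used here is inferred from the two field listings and is an assumption: `buckets` becomes `terms`, `key` becomes `term`, `doc_count` becomes `count`, `sum_other_doc_count` becomes `other`, `doc_count_error_upper_bound` becomes `missing`, and `total` is taken to be the sum of the bucket counts. The function name is hypothetical and `key_as_string` is ignored.

```python
def transform_aggs_sketch(aggs):
    """Rename terms-aggregation fields into a facet-style structure.

    `aggs` is the content of the "aggregations" key; renaming that
    first-level key itself is out of scope here.
    """
    facets = {}
    for name, agg in aggs.items():
        terms = []
        total = 0
        for bucket in agg.get("buckets", []):
            term = {"term": bucket["key"], "count": bucket["doc_count"]}
            # Recurse into nested aggregations attached to this bucket.
            for key, value in bucket.items():
                if isinstance(value, dict) and "buckets" in value:
                    term.update(transform_aggs_sketch({key: value}))
            total += bucket["doc_count"]
            terms.append(term)
        facets[name] = {
            "_type": "terms",
            "terms": terms,
            "total": total,
            "other": agg.get("sum_other_doc_count", 0),
            "missing": agg.get("doc_count_error_upper_bound", 0),
        }
    return facets


aggs = {
    "taxid": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 5,
        "buckets": [
            {"key": 9606, "doc_count": 3},
            {"key": 10090, "doc_count": 2},
        ],
    }
}
facets = transform_aggs_sketch(aggs)
```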
-
transform_hit
(path, doc, options)[source]¶ By default, add licenses.
If a source has a license URL in its metadata, add a “_license” key to the corresponding fields. Field aliases in dot field representation are supported.
If we have the following settings in web_config.py:
    LICENSE_TRANSFORM = {
        "exac_nontcga": "exac",
        "snpeff.ann": "snpeff"
    }
Then GET /v1/variant/chr6:g.38906659G>A should look like:
    {
        "exac": {
            "_license": "http://bit.ly/2H9c4hg",
            "af": 0.00002471},
        "exac_nontcga": {
            "_license": "http://bit.ly/2H9c4hg",    <---
            "af": 0.00001883},
        ...
    }
And GET /v1/variant/chr14:g.35731936G>C could look like:
    {
        "snpeff": {
            "_license": "http://bit.ly/2suyRKt",
            "ann": [{"_license": "http://bit.ly/2suyRKt",    <---
                     "effect": "intron_variant",
                     "feature_id": "NM_014672.3", ...},
                    {"_license": "http://bit.ly/2suyRKt",    <---
                     "effect": "intron_variant",
                     "feature_id": "NM_001256678.1", ...},
                    ...]
        },
        ...
    }
The fields marked with arrows would not exist without the setting lines.
-
transform_mapping
(mapping, prefix, search)[source]¶ Transform an Elasticsearch mapping definition into user-friendly field definition metadata.
-
static
traverse
(obj, leaf_node=False)¶ Output path-dictionary pairs. For example, input:
    {
        'exac_nontcga': {'af': 0.00001883},
        'gnomad_exome': {'af': {'af': 0.0000119429, 'af_afr': 0.000123077}},
        'snpeff': {'ann': [{'effect': 'intron_variant', 'feature_id': 'NM_014672.3'},
                           {'effect': 'intron_variant', 'feature_id': 'NM_001256678.1'}]}
    }
will be translated to a generator:
    (
        ("exac_nontcga", {"af": 0.00001883}),
        ("gnomad_exome.af", {"af": 0.0000119429, "af_afr": 0.000123077}),
        ("gnomad_exome", {"af": {"af": 0.0000119429, "af_afr": 0.000123077}}),
        ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_014672.3"}),
        ("snpeff.ann", {"effect": "intron_variant", "feature_id": "NM_001256678.1"}),
        ("snpeff.ann", [{ … },{ … }]),
        ("snpeff", {"ann": [{ … },{ … }]}),
        ('', {'exac_nontcga': {…}, 'gnomad_exome': {…}, 'snpeff': {…}})
    )
or when traversing leaf nodes:
    (
        ('exac_nontcga.af', 0.00001883),
        ('gnomad_exome.af.af', 0.0000119429),
        ('gnomad_exome.af.af_afr', 0.000123077),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_014672.3'),
        ('snpeff.ann.effect', 'intron_variant'),
        ('snpeff.ann.feature_id', 'NM_001256678.1')
    )
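A traversal that yields pairs in the order shown above can be sketched as a recursive generator. This is a reconstruction from the documented example, not the SDK's code: the name `traverse_sketch` and the extra `path` parameter are hypothetical.

```python
def traverse_sketch(obj, leaf_node=False, path=""):
    """Yield (dotted_path, value) pairs in post-order.

    With leaf_node=False, every dict and list is yielded after its children.
    With leaf_node=True, only scalar leaves are yielded.
    List items share the path of the list that contains them.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            child_path = f"{path}.{key}" if path else key
            yield from traverse_sketch(value, leaf_node, child_path)
        if not leaf_node:
            yield path, obj
    elif isinstance(obj, list):
        for item in obj:
            yield from traverse_sketch(item, leaf_node, path)
        if not leaf_node:
            yield path, obj
    elif leaf_node:
        yield path, obj


doc = {
    "exac_nontcga": {"af": 0.00001883},
    "gnomad_exome": {"af": {"af": 0.0000119429, "af_afr": 0.000123077}},
    "snpeff": {"ann": [
        {"effect": "intron_variant", "feature_id": "NM_014672.3"},
        {"effect": "intron_variant", "feature_id": "NM_001256678.1"},
    ]},
}

container_paths = [path for path, _ in traverse_sketch(doc)]
leaf_pairs = list(traverse_sketch(doc, leaf_node=True))
```

Yielding children before their parent means that a caller who modifies a nested object in place has finished with it before the enclosing object is visited.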
-
static