biothings.cli
Entrypoint for the biothings-cli tool
- biothings.cli.check_module_import_status(module: str) -> bool
Verify that a module can be imported before creating the command-line tooling that depends on it
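A minimal sketch of how such an import check might look, assuming it is built on importlib. The name can_import is hypothetical, and this is not necessarily how check_module_import_status is implemented:

```python
import importlib


def can_import(module: str) -> bool:
    """Return True if `module` can be imported, False otherwise.

    Hypothetical stand-in for an import check; not the biothings code.
    """
    try:
        importlib.import_module(module)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


if __name__ == "__main__":
    print(can_import("json"))
    print(can_import("module_that_does_not_exist_12345"))
```

A check like this lets a tool skip registering commands whose optional dependencies are not installed, instead of failing at startup.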
- biothings.cli.main()
The entrypoint for running the BioThings CLI to test your local data plugin
biothings.cli.dataplugin
Module for creating the CLI for the dataplugin interface
- biothings.cli.dataplugin.clean_data(plugin_name: str | None = None, dump: bool | None = False, upload: bool | None = False, clean_all: bool | None = False)
Delete all dumped files and/or drop uploaded sources tables
- biothings.cli.dataplugin.create_data_plugin(name: str, multi_uploaders: bool = False, parallelizer: bool = False)
Create a new data plugin from a pre-defined template
- biothings.cli.dataplugin.dump_and_upload(plugin_name: str | None = None)
Sequentially execute the dump and upload commands
Operation Order: 1) downloads source data files to local file system 2) converts them into JSON documents 3) uploads those JSON documents to the source database.
- biothings.cli.dataplugin.dump_source(plugin_name: str | None = None, show_dump: bool | None = True)
Download the source data files to the local file system
- biothings.cli.dataplugin.index_plugin(plugin_name: str | None = None, sub_source_name: str | None = None)
(experimental) Create an Elasticsearch index from a data source database
A quick-index function for rapidly creating an Elasticsearch index from a source backend
Currently, only MongoDB -> Elasticsearch conversion is supported for indexing
NOTE: Only works correctly if the upload command has been run
- biothings.cli.dataplugin.inspect_source(plugin_name: str | None = None, sub_source_name: str | None = '', mode: str | None = 'type, stats', limit: int | None = None, merge: bool | None = False, output: str | None = None)
Derive detailed information about the document data structure from the parsed documents
NOTE: Only works correctly if the upload command has been run
- biothings.cli.dataplugin.listing(plugin_name: str | None = None, dump: bool | None = True, upload: bool | None = True, hubdb: bool | None = False)
List dumped files, uploaded sources, or internal hubdb contents
- biothings.cli.dataplugin.serve(plugin_name: str | None = None, host: str | None = 'localhost', port: int | None = 9999)
Run a simple API server for serving documents from the source database
For example, given source_name = "test" with the following document structure:
doc = {
    "_id": "123",
    "key": {
        "a": {"b": "1"},
        "x": [
            {"y": "3", "z": "4"},
            "5"
        ]
    }
}
An API server will run at http://host:port/<your source name>/ (e.g. http://localhost:9999/test/)
See all available sources on the index page: http://localhost:9999/
List all docs: http://localhost:9999/test/ (default is to return the first 10 docs)
Paginate doc list: http://localhost:9999/test/?start=10&limit=10
Retrieve a doc by id: http://localhost:9999/test/123
- Filter docs by one or more fielded terms:
http://localhost:9999/test/?q=key.a.b:1 (query by any field with dot notation like key.a.b=1)
http://localhost:9999/test/?q=key.a.b:1%20AND%20key.x.y=3 (find all docs that match two fields)
http://localhost:9999/test/?q=key.x.z:4* (field value can contain wildcard * or ?)
http://localhost:9999/test/?q=key.x:5&start=10&limit=10 (pagination also works)
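The URL patterns above can be assembled programmatically. The following is a minimal sketch using only the standard library; build_query_url and resolve_dotted are hypothetical helpers written for illustration and are not part of biothings. The resolver only illustrates which values a dotted field name refers to in the example document; it is an assumption about the semantics, not the server's actual matching code.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9999"  # default host and port of the serve command


def build_query_url(source, terms=None, start=None, limit=None):
    """Build a list/query URL for a served source.

    `terms` maps dotted field names to values; terms are joined with AND.
    """
    params = {}
    if terms:
        params["q"] = " AND ".join(f"{f}:{v}" for f, v in terms.items())
    if start is not None:
        params["start"] = start
    if limit is not None:
        params["limit"] = limit
    url = f"{BASE_URL}/{source}/"
    if params:
        # quote_via=quote encodes spaces as %20; ':' and '*' are left readable
        url += "?" + urlencode(params, quote_via=quote, safe=":*")
    return url


def resolve_dotted(doc, path):
    """Collect every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for part in path.split("."):
        found = []
        for node in current:
            # a list contributes each of its items as a candidate
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and part in item:
                    found.append(item[part])
        current = found
    # flatten one level so a path ending at a list yields the list items
    values = []
    for node in current:
        values.extend(node if isinstance(node, list) else [node])
    return values


doc = {
    "_id": "123",
    "key": {"a": {"b": "1"}, "x": [{"y": "3", "z": "4"}, "5"]},
}

if __name__ == "__main__":
    print(build_query_url("test", {"key.a.b": "1"}))
    print(build_query_url("test", {"key.x": "5"}, start=10, limit=10))
    print(resolve_dotted(doc, "key.x.y"))
```

Building the query string with urlencode avoids hand-escaping spaces and other reserved characters in multi-term queries.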
- biothings.cli.dataplugin.upload_source(plugin_name: str | None = None, batch_limit: int | None = 10000, parallel: bool | None = False, show_upload: bool | None = True)
Parse the downloaded data files from the dump operation and upload to the source database
The default database is sqlite3, but MongoDB is supported if configured and an instance is set up
NOTE: Only works correctly if the dump command has been run
- biothings.cli.dataplugin.validate_manifest(plugin_name: str | None = None, show_schema: bool | None = None) -> None
(experimental) Validate a provided manifest file via JSON Schema
Performs jsonschema validation against the manifest file. It does not validate whether the modules referenced in the manifest can be loaded
If the --show-schema argument is supplied, the biothings manifest schema is displayed
The schema is located within the biothings repository at the following path relative to root: <biothings/hub/dataplugin/loaders/schema/manifest.json>
For a reference about jsonschema itself, see the following: https://json-schema.org/
biothings.cli.dataplugin_hub
biothings.cli.utils
Utility functions for the biothings-cli tool
These are semantically separated from the operations in that these functions help the operations perform a task. Usually anything related to plugin metadata, job handling, and data manipulation should logically exist here
- biothings.cli.utils.clean_dumped_files(data_folder: str | Path, plugin_name: str)
Remove all files dumped by a data plugin in the data folder.
- biothings.cli.utils.clean_uploaded_sources(working_dir, plugin_name)
Remove all sources uploaded by a data plugin in the working directory.
- biothings.cli.utils.display_inspection_table(source_name: str, mode: str, inspection_mapping: dict, validate: bool = True)
- biothings.cli.utils.get_manifest_content(working_dir: str | Path) -> dict
Return the manifest content of the data plugin in the working directory
- biothings.cli.utils.get_plugin_name(plugin_name=None, with_working_dir=True)
Return a valid plugin name (the name of the folder that contains a data plugin). When plugin_name is None, the current working folder is used. When with_working_dir is True, a (plugin_name, working_dir) tuple is returned
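The return contract described above can be sketched as follows. This is a hypothetical illustration, not the biothings implementation: the name resolve_plugin_name is invented, the sketch assumes that a named plugin lives in a sub-folder of the current folder, and it omits any check that the folder actually contains a data plugin.

```python
from pathlib import Path


def resolve_plugin_name(plugin_name=None, with_working_dir=True):
    """Hypothetical sketch of the described return contract."""
    if plugin_name is None:
        # fall back to the current working folder and use its name
        working_dir = Path.cwd()
        plugin_name = working_dir.name
    else:
        # assumption: a named plugin is a sub-folder of the current folder
        working_dir = Path.cwd() / plugin_name
    if with_working_dir:
        return plugin_name, working_dir
    return plugin_name


if __name__ == "__main__":
    print(resolve_plugin_name())
    print(resolve_plugin_name("my_plugin", with_working_dir=False))
```

Callers that only need the name can pass with_working_dir=False and avoid unpacking a tuple.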
- biothings.cli.utils.get_uploaded_collections(src_db, uploaders)
A helper function to get the uploaded collections in the source database
- biothings.cli.utils.get_uploaders(working_dir: Path) -> List[str]
A helper function to get the uploaders from the manifest file in the working directory. Used by the show_uploaded_sources function
- biothings.cli.utils.process_inspect(source_name, mode, limit, merge) -> dict
Perform an inspection of the given source. Used by the do_inspect function
- biothings.cli.utils.show_dumped_files(data_folder: str | Path, plugin_name: str) -> None
A helper function to show the dumped files in the data folder
- biothings.cli.utils.show_source_build(build_instance: DataBuilder, build_configuration_name: str)
A helper function to show the build information for the plugin source
- async biothings.cli.utils.show_source_index(index_name: str, index_manager: IndexManager, elasticsearch_mapping: dict)
A helper function to show the Elasticsearch index for the plugin source
biothings.cli.web_app
- class biothings.cli.web_app.BaseHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)
Bases: RequestHandler
- set_default_headers()
Override this to set HTTP headers at the beginning of the request.
For example, this is the place to set a custom Server header. Note that setting such headers in the normal flow of request processing may not do what you want, since headers may be reset during error handling.
- class biothings.cli.web_app.CLIApplication(db, table_space: List[str], **settings)
Bases: Application
The main application class, which defines the routes and handlers.
- class biothings.cli.web_app.DocHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)
Bases: BaseHandler
The handler for the detail view of a document, e.g. /<source>/<doc_id>/
- class biothings.cli.web_app.HomeHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)
Bases: BaseHandler
The handler for the landing page, which lists all available routes
- class biothings.cli.web_app.QueryHandler(application: Application, request: HTTPServerRequest, **kwargs: Any)
Bases: BaseHandler
The handler for returning a list of docs matching the query terms passed to the "q" parameter, e.g. /<source>/?q=<query>
- async biothings.cli.web_app.get_available_routes(db, table_space) -> Tuple[list, list]
Return a list of available URLs/routes based on the table_space and the actual collections in the database