biothings.hub.datarelease

biothings.hub.datarelease.set_pending_to_publish(col_name)[source]
biothings.hub.datarelease.set_pending_to_release_note(col_name)[source]

biothings.hub.datarelease.publisher

class biothings.hub.datarelease.publisher.BasePublisher(envconf, log_folder, es_backups_folder, *args, **kwargs)[source]

Bases: BaseManager, BaseStatusRegisterer

property category
clean_stale_status()[source]

During startup, search for actions in progress that would have been interrupted and change their state to “canceled”. For example, some downloading processes could have been interrupted; at startup, a “downloading” status should be changed to “canceled” to reflect the actual state of these datasources. This must be overridden in a subclass.
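The behavior described above can be illustrated with a minimal sketch. It is not the hub's implementation: the in-memory list of documents, the “status” key, and the mapping table are assumptions made for illustration only.

```python
# Hypothetical stand-in for the status collection; the real hub keeps
# these documents in its database, and the field names may differ.
TRANSIENT_TO_FINAL = {"downloading": "canceled"}  # the one case named in the docstring


def clean_stale_status(docs):
    """Replace interrupted transient statuses; return how many docs changed."""
    changed = 0
    for doc in docs:
        status = doc.get("status")
        if status in TRANSIENT_TO_FINAL:
            doc["status"] = TRANSIENT_TO_FINAL[status]
            changed += 1
    return changed


docs = [
    {"_id": "src_a", "status": "downloading"},  # interrupted before shutdown
    {"_id": "src_b", "status": "success"},      # finished normally, left alone
]
n_changed = clean_stale_status(docs)
```

A subclass would apply the same idea to whichever transient statuses it registers.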

property collection

Return collection object used to fetch doc in which we store status

create_bucket(bucket_conf, credentials)[source]
get_pinfo()[source]

Return dict containing information about the current process (used to report in the hub)

get_pre_post_previous_result(build_doc, key_value)[source]

To start a pre- or post- pipeline, an initial “previous” result must be defined. It is passed along the pipeline from each step to the next, and its value depends on the type of publisher.

get_predicates()[source]
get_release_note_filename(build_version)[source]
load_build(key_name, stage=None)[source]
publish_release_notes(release_folder, build_version, s3_release_folder, s3_release_bucket, aws_key, aws_secret, prefix='release_')[source]
register_status(bdoc, status, transient=False, init=False, **extra)[source]
run_pre_post(key, stage, key_value, repo_conf, build_doc)[source]

Run pre- and post- publish steps (stage) for the given key (e.g. “snapshot”, “diff”). key_value is the value of the key inside the “key” dict (such as a snapshot name or a build name). These steps are defined in the config file.
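As a rough sketch of how such a pipeline chains its steps, the example below feeds each step's return value to the next one as “previous”. The step signature (step_conf, build_doc, previous) follows the step_* methods listed here; the run_steps helper, the config keys, and the returned strings are hypothetical.

```python
def run_steps(steps, build_doc, first_previous):
    """Run steps in order, handing each one the prior step's result."""
    previous = first_previous
    for step_conf, step_func in steps:
        previous = step_func(step_conf, build_doc, previous)
    return previous


def step_archive(step_conf, build_doc, previous):
    # Hypothetical: pretend to archive `previous` and return the archive name.
    return f"{previous}.{step_conf['format']}"


def step_upload(step_conf, build_doc, previous):
    # Hypothetical: pretend to upload the archive and return its location.
    return f"{step_conf['base_url']}/{previous}"


steps = [
    ({"format": "tar.xz"}, step_archive),
    ({"base_url": "s3://example-bucket"}, step_upload),
]
# The initial "previous" value plays the role of get_pre_post_previous_result().
result = run_steps(steps, {"_id": "mybuild"}, "snapshot_1")
```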

setup()[source]
setup_log(build_name=None)[source]
step_archive(step_conf, build_doc, previous)[source]
step_upload(step_conf, build_doc, previous)[source]
step_upload_s3(step_conf, build_doc, previous)[source]
template_out_conf(build_doc)[source]
trigger_release_note(doc, **kwargs)[source]

Launch release note generation for a given src_build document. The get_previous_collection() method is used to determine the first collection to compare with. Any **kwargs are passed to the release_note() method as additional optional parameters.

class biothings.hub.datarelease.publisher.DiffPublisher(diff_manager, *args, **kwargs)[source]

Bases: BasePublisher

get_pre_post_previous_result(build_doc, key_value)[source]

To start a pre- or post- pipeline, an initial “previous” result must be defined. It is passed along the pipeline from each step to the next, and its value depends on the type of publisher.

get_release_note_filename(build_version)[source]
post_publish(build_name, repo_conf, build_doc)[source]

Post-publish hook, running steps declared in config, but also whatever would be defined in a sub-class

pre_publish(previous_build_name, repo_conf, build_doc)[source]

Pre-publish hook, running steps declared in config, but also whatever would be defined in a sub-class

publish(build_name, previous_build=None, steps=('pre', 'reset', 'upload', 'meta', 'post'))[source]

Publish diff files and metadata about the diff files, release note, etc. to S3. Using build_name, a src_build document is fetched and a diff release is searched for. If more than one diff release is found, “previous_build” must be specified to pick the correct one.

steps:

  • pre/post: optional steps processed as first and last steps.

  • reset: highly recommended, reset synced flag in diff files so they won’t get skipped when used…

  • upload: upload diff_folder content to S3

  • meta: publish/register the version as available for auto-updating hubs

reset_synced(diff_folder, backend=None)[source]

Remove “synced” flag from any pyobj file in diff_folder
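A minimal sketch of the idea follows. It assumes, for illustration only, that each diff file is a pickled dict with a “.pyobj” extension and a top-level “synced” key; the real file layout and the “backend” parameter are not covered here.

```python
import pickle
import tempfile
from pathlib import Path


def reset_synced(diff_folder):
    """Drop the "synced" key from every *.pyobj pickle; return changed file names."""
    changed = []
    for path in sorted(Path(diff_folder).glob("*.pyobj")):
        with path.open("rb") as fh:
            obj = pickle.load(fh)
        if isinstance(obj, dict) and "synced" in obj:
            del obj["synced"]
            with path.open("wb") as fh:
                pickle.dump(obj, fh)
            changed.append(path.name)
    return changed


# Usage on a throwaway folder holding one fake diff file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "diff_0001.pyobj"
    sample.write_bytes(pickle.dumps({"update": [], "synced": True}))
    changed = reset_synced(folder)
    reloaded = pickle.loads(sample.read_bytes())
```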

run_post_publish_diff(build_name, repo_conf, build_doc)[source]
run_pre_publish_diff(previous_build_name, repo_conf, build_doc)[source]
exception biothings.hub.datarelease.publisher.PublisherException[source]

Bases: Exception

class biothings.hub.datarelease.publisher.ReleaseManager(diff_manager, snapshot_manager, poll_schedule=None, *args, **kwargs)[source]

Bases: BaseManager, BaseStatusRegisterer

DEFAULT_DIFF_PUBLISHER_CLASS

alias of DiffPublisher

DEFAULT_SNAPSHOT_PUBLISHER_CLASS

alias of SnapshotPublisher

build_release_note(old_colname, new_colname, note=None) → ReleaseNoteSource[source]

Build a release note containing most significant changes between build names “old_colname” and “new_colname”. An optional end note can be added to bring more specific information about the release.

Return a dictionary containing significant changes.

clean_stale_status()[source]

During startup, search for actions in progress that would have been interrupted and change their state to “canceled”. For example, some downloading processes could have been interrupted; at startup, a “downloading” status should be changed to “canceled” to reflect the actual state of these datasources. This must be overridden in a subclass.

property collection

Return collection object used to fetch doc in which we store status

configure(release_confdict)[source]

Configure manager with release “confdict”. See config_hub.py in API for the format.

create_release_note(old, new, filename=None, note=None, format='txt')[source]

Generate release note files, in TXT and JSON format, containing significant changes summary between target collections old and new. Output files are stored in a diff folder using generate_folder(old,new).

‘filename’ can optionally be specified, though this is not recommended because the publishing pipeline, which uses these files, expects a file naming convention.

‘note’ is an optional free text that can be added to the release note, at the end.

txt ‘format’ is the only one supported for now.

create_release_note_from_build(build_doc)[source]
get_pinfo()[source]

Return dict containing information about the current process (used to report in the hub)

get_predicates()[source]
get_release_note(old, new, format='txt', prefix='release_*')[source]
load_build(key_name, stage=None)[source]
poll(state, func)[source]

Search for sources in collection ‘col’ with a pending flag list containing ‘state’ and call ‘func’ for each document found (with doc as the only param)
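The polling logic can be sketched as below. The in-memory list stands in for the collection, and the “pending” field name is an assumption for illustration; scheduling and database access are left out.

```python
def poll(docs, state, func):
    """Call func(doc) for every doc whose pending list contains state."""
    results = []
    for doc in docs:
        if state in doc.get("pending", []):
            results.append(func(doc))
    return results


docs = [
    {"_id": "build_a", "pending": ["publish"]},
    {"_id": "build_b", "pending": ["release_note"]},
    {"_id": "build_c"},  # no pending flags at all
]
# Collect the ids of the builds flagged for publishing.
published = poll(docs, "publish", lambda doc: doc["_id"])
```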

publish(publisher_env, snapshot_or_build_name, *args, **kwargs)[source]
publish_build(build_doc)[source]
publish_diff(publisher_env, build_name, previous_build=None, steps=('pre', 'reset', 'upload', 'meta', 'post'))[source]
publish_snapshot(publisher_env, snapshot, build_name=None, previous_build=None, steps=('pre', 'meta', 'post'))[source]
register_status(bdoc, stage, status, transient=False, init=False, **extra)[source]
release_info(env=None, remote=False)[source]
reset_synced(old, new)[source]

Reset sync flags for diff files produced between the “old” and “new” builds. Once a diff has been applied, diff files are flagged as synced so that subsequent diffs won’t be applied twice (for optimization reasons, not to avoid data corruption, since diff files can be safely applied multiple times). If the diff needs to be applied another time, the diff files need to be reset.

setup()[source]
setup_log(build_name=None)[source]
class biothings.hub.datarelease.publisher.SnapshotPublisher(snapshot_manager, *args, **kwargs)[source]

Bases: BasePublisher

get_pre_post_previous_result(build_doc, key_value)[source]

To start a pre- or post- pipeline, an initial “previous” result must be defined. It is passed along the pipeline from each step to the next, and its value depends on the type of publisher.

post_publish(snapshot_name, repo_conf, build_doc)[source]

Post-publish hook, running steps declared in config, but also whatever would be defined in a sub-class

pre_publish(snapshot_name, repo_conf, build_doc)[source]

Pre-publish hook, running steps declared in config, but also whatever would be defined in a sub-class

publish(snapshot, build_name=None, previous_build=None, steps=('pre', 'meta', 'post'))[source]

Publish snapshot metadata to S3. If the snapshot repository is of type “s3”, data isn’t actually uploaded/published since it’s already there on S3. If it is of type “fs”, some “pre” steps can be added to the RELEASE_CONFIG parameter to archive and upload it to S3. Metadata about the snapshot, release note, etc. is then uploaded to the correct buckets as defined in the config, and “post” steps can be run afterward.

Though snapshots don’t need any previous version to be applied on, a release note with significant changes between current snapshot and a previous version could have been generated. By default, snapshot name is used to pick one single build document and from the document, get the release note information.

run_post_publish_snapshot(snapshot_name, repo_conf, build_doc)[source]
run_pre_publish_snapshot(snapshot_name, repo_conf, build_doc)[source]

biothings.hub.datarelease.releasenote

class biothings.hub.datarelease.releasenote.ReleaseNoteSource(old_src_build_reader: ReleaseNoteSrcBuildReader, new_src_build_reader: ReleaseNoteSrcBuildReader, diff_stats_from_metadata_file: dict, addon_note: str)[source]

Bases: object

diff_build_stats() → dict[source]
diff_datasource_info() → dict[source]
diff_datasource_mapping() → dict[source]
to_dict() → dict[source]
class biothings.hub.datarelease.releasenote.ReleaseNoteSrcBuildReader(src_build_doc: dict)[source]

Bases: object

attach_cold_src_build_reader(other: ReleaseNoteSrcBuildReader)[source]

Attach a cold src_build reader.

It’s required that self is a hot src_build reader and other is a cold one.

property build_id: str
property build_stats: dict
property build_version: str
property cold_collection_name: str
property datasource_mapping: dict
property datasource_stats: dict
property datasource_versions: dict
has_cold_collection() → bool[source]
class biothings.hub.datarelease.releasenote.ReleaseNoteSrcBuildReaderAdapter(src_build_reader: ReleaseNoteSrcBuildReader)[source]

Bases: object

property build_stats
property datasource_info
class biothings.hub.datarelease.releasenote.ReleaseNoteTxt(source: ReleaseNoteSource)[source]

Bases: object

save(filepath)[source]