gbif_registrar._utilities

Utility functions for internal use only.

Module Contents

Functions

_check_completeness(registrations)

Checks registrations for completeness.

_check_group_registrations(registrations)

Checks uniqueness of dataset group registrations.

_check_synchronized(registrations)

Checks if registrations have been synchronized.

_check_local_dataset_group_id_format(registrations)

Checks the format of the local_dataset_group_id.

_check_local_dataset_id(registrations)

Checks registrations for unique local_dataset_id.

_check_local_dataset_id_format(registrations)

Checks the format of the local_dataset_id.

_check_local_endpoints(registrations)

Checks uniqueness of local dataset endpoints.

_check_one_to_one_cardinality(data, col1, col2)

Checks for one-to-one cardinality between two columns of a dataframe.

_delete_local_dataset_endpoints(gbif_dataset_uuid)

Deletes all local dataset endpoints from a GBIF dataset.

_expected_cols()

Returns expected columns of the registrations file.

_get_gbif_dataset_uuid(local_dataset_group_id, ...)

Returns the gbif_dataset_uuid value.

_get_local_dataset_endpoint(local_dataset_id)

Returns the local_dataset_endpoint value.

_get_local_dataset_group_id(local_dataset_id)

Returns the local_dataset_group_id value.

_is_synchronized(local_dataset_id, registrations_file)

Checks if a local dataset is synchronized with the GBIF registry.

_post_local_dataset_endpoint(local_dataset_endpoint, ...)

Posts a local dataset endpoint to GBIF.

_post_new_metadata_document(local_dataset_id, ...)

Posts a new metadata document to GBIF.

_read_gbif_dataset_metadata(gbif_dataset_uuid)

Reads the metadata of a GBIF dataset.

_read_local_dataset_metadata(local_dataset_id)

Reads the metadata document for a local dataset.

_read_registrations_file(registrations_file)

Returns the registrations file as a Pandas dataframe.

_request_gbif_dataset_uuid()

Requests a GBIF dataset UUID value from GBIF.

gbif_registrar._utilities._check_completeness(registrations)

Checks registrations for completeness.

A complete registration has values for all fields except (perhaps) synchronized, which is not essential for uploading to GBIF.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

UserWarning – If any registrations are incomplete.
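
The completeness rule above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `check_completeness`, the example column values, and the wording of the warning message are all assumptions.

```python
import warnings

import pandas as pd


def check_completeness(registrations: pd.DataFrame) -> None:
    """Warn if any row lacks a value in a required column (hypothetical sketch)."""
    # Every column except "synchronized" is treated as required.
    required = registrations.drop(columns=["synchronized"], errors="ignore")
    incomplete = required[required.isna().any(axis=1)]
    if not incomplete.empty:
        rows = ", ".join(str(i) for i in incomplete.index)
        warnings.warn(f"Incomplete registrations in rows: {rows}", UserWarning)


# Illustrative data only; the second row has no local_dataset_group_id.
registrations = pd.DataFrame(
    {
        "local_dataset_id": ["edi.1.1", "edi.2.1"],
        "local_dataset_group_id": ["edi.1", None],
        "local_dataset_endpoint": ["https://example.org/a", "https://example.org/b"],
        "gbif_dataset_uuid": ["uuid-a", "uuid-b"],
        "synchronized": [True, None],
    }
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_completeness(registrations)
print([str(w.message) for w in caught])
```

A missing `synchronized` value alone does not trigger the warning, which matches the exception described above.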

gbif_registrar._utilities._check_group_registrations(registrations)

Checks uniqueness of dataset group registrations.

Registrations can be part of a group, the most recent of which is considered to be the authoritative version of the series.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

If `local_dataset_group_id` and `gbif_dataset_uuid` don’t have one-to-one cardinality.

gbif_registrar._utilities._check_synchronized(registrations)

Checks if registrations have been synchronized.

Registrations contain all the information needed for GBIF to successfully crawl the corresponding dataset and post to the GBIF data portal. Boolean True/False values in the synchronized field indicate whether the dataset has been synchronized.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

If a registration has not yet been crawled.

gbif_registrar._utilities._check_local_dataset_group_id_format(registrations)

Checks the format of the local_dataset_group_id.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

If local_dataset_group_id does not have the truncated data package ID format used by the Environmental Data Initiative (EDI), i.e. `scope.identifier`.

gbif_registrar._utilities._check_local_dataset_id(registrations)

Checks registrations for unique local_dataset_id.

Each registration is represented by a unique primary key, i.e. the local_dataset_id.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

UserWarning – If values in the local_dataset_id column are not unique.
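
A primary-key uniqueness check of this kind might look like the sketch below. The function name `check_unique_ids` and the message text are hypothetical; only the `local_dataset_id` column name comes from the documentation above.

```python
import warnings

import pandas as pd


def check_unique_ids(registrations: pd.DataFrame) -> None:
    """Warn if local_dataset_id values repeat (hypothetical sketch)."""
    ids = registrations["local_dataset_id"]
    # keep=False flags every occurrence of a repeated value, not only the later ones.
    repeated = ids[ids.duplicated(keep=False)].unique().tolist()
    if repeated:
        warnings.warn(f"Non-unique local_dataset_id values: {repeated}", UserWarning)


# Illustrative data only; "edi.1.1" appears twice.
example = pd.DataFrame({"local_dataset_id": ["edi.1.1", "edi.1.1", "edi.2.1"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_unique_ids(example)
print(len(caught))
```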

gbif_registrar._utilities._check_local_dataset_id_format(registrations)

Checks the format of the local_dataset_id.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

If local_dataset_id does not have the data package ID format used by the Environmental Data Initiative (EDI), i.e. `scope.identifier.revision`.

Examples

>>> registrations = _read_registrations_file('tests/registrations.csv')
>>> _check_local_dataset_id_format(registrations)
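
For illustration, a `scope.identifier.revision` format check could be written with a regular expression. The pattern below is an assumption (alphanumeric or hyphenated scope, integer identifier and revision), as are the name `check_id_format` and the warning text; the package's own rule may differ.

```python
import re
import warnings

import pandas as pd

# Assumed pattern: scope is letters, digits, or hyphens; identifier and
# revision are non-negative integers.
PACKAGE_ID = re.compile(r"[A-Za-z0-9-]+\.\d+\.\d+")


def check_id_format(registrations: pd.DataFrame) -> None:
    """Warn about local_dataset_id values that do not match PACKAGE_ID."""
    ids = registrations["local_dataset_id"].dropna().astype(str)
    bad = [value for value in ids if not PACKAGE_ID.fullmatch(value)]
    if bad:
        warnings.warn(f"Unexpected local_dataset_id format: {bad}", UserWarning)


# Illustrative data only; the second value lacks a revision.
example = pd.DataFrame({"local_dataset_id": ["knb-lter-nin.9.1", "edi.929"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_id_format(example)
print(len(caught))
```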

gbif_registrar._utilities._check_local_endpoints(registrations)

Checks uniqueness of local dataset endpoints.

Each registration has a unique endpoint, which is crawled by GBIF and referenced from the associated GBIF dataset page.

Parameters:

registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.

Return type:

None

Warns:

If `local_dataset_id` and `local_dataset_endpoint` don’t have one-to-one cardinality.

gbif_registrar._utilities._check_one_to_one_cardinality(data, col1, col2)

Checks for one-to-one cardinality between two columns of a dataframe.

This is a helper function used by several of the registration checks.

Parameters:
  • data (pandas.DataFrame) – The dataframe containing `col1` and `col2`.

  • col1 (str) – Column name

  • col2 (str) – Column name

Return type:

None

Warns:

If `col1` and `col2` don’t have one-to-one cardinality.
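
One way to test one-to-one cardinality with pandas is sketched below. The name `check_one_to_one`, the choice to ignore rows with missing values, and the use of `UserWarning` are assumptions for this example rather than documented behavior.

```python
import warnings

import pandas as pd


def check_one_to_one(data: pd.DataFrame, col1: str, col2: str) -> None:
    """Warn unless each col1 value pairs with exactly one col2 value and vice versa."""
    # Assumption: rows with a missing value in either column are ignored.
    pairs = data[[col1, col2]].dropna().drop_duplicates()
    # Once duplicate pairs are removed, a repeated value in either column
    # means that value is paired with more than one partner.
    if pairs[col1].duplicated().any() or pairs[col2].duplicated().any():
        warnings.warn(
            f"{col1} and {col2} do not have one-to-one cardinality.", UserWarning
        )


# Illustrative data only; two groups share one UUID.
example = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.1", "edi.1", "edi.2"],
        "gbif_dataset_uuid": ["uuid-a", "uuid-a", "uuid-a"],
    }
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_one_to_one(example, "local_dataset_group_id", "gbif_dataset_uuid")
print(len(caught))
```

Repeated identical pairs are allowed here, since several revisions in one group legitimately share one group ID and one UUID.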

gbif_registrar._utilities._delete_local_dataset_endpoints(gbif_dataset_uuid)

Deletes all local dataset endpoints from a GBIF dataset.

Parameters:

gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset.

Returns:

None. Raises an exception if the DELETE request fails.

Return type:

None

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._expected_cols()

Returns expected columns of the registrations file.

Returns:

The expected columns of the registrations file.

Return type:

list

gbif_registrar._utilities._get_gbif_dataset_uuid(local_dataset_group_id, registrations)

Returns the gbif_dataset_uuid value.

Parameters:
  • local_dataset_group_id (str) – The dataset group identifier in the EDI repository.

  • registrations (pandas.DataFrame) – The registrations file as a dataframe. Use the _read_registrations_file function to create this.

Returns:

The gbif_dataset_uuid value. This is the UUID assigned by GBIF to the local dataset group identifier. A new value will be returned if a gbif_dataset_uuid value doesn’t already exist for a local_dataset_group_id in the registrations file.

Return type:

str
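
The look-up-or-request behavior described above can be sketched as follows. The function name `lookup_gbif_dataset_uuid` and the `request_uuid` callable are hypothetical: the callable stands in for whatever actually requests a UUID from GBIF, and here it is replaced by a local placeholder so the example runs offline.

```python
import uuid
from typing import Callable

import pandas as pd


def lookup_gbif_dataset_uuid(
    local_dataset_group_id: str,
    registrations: pd.DataFrame,
    request_uuid: Callable[[], str],
) -> str:
    """Return the group's existing UUID, or obtain a new one (hypothetical sketch)."""
    in_group = registrations["local_dataset_group_id"] == local_dataset_group_id
    existing = registrations.loc[in_group, "gbif_dataset_uuid"].dropna()
    if not existing.empty:
        return str(existing.iloc[0])
    # No UUID is recorded for this group, so ask for a new one.
    return request_uuid()


# Illustrative data only.
registrations = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.1", "edi.2"],
        "gbif_dataset_uuid": ["uuid-a", None],
    }
)


def placeholder_request() -> str:
    # Stand-in for a real GBIF request; generates a random UUID locally.
    return str(uuid.uuid4())


print(lookup_gbif_dataset_uuid("edi.1", registrations, placeholder_request))
```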

gbif_registrar._utilities._get_local_dataset_endpoint(local_dataset_id)

Returns the local_dataset_endpoint value.

Parameters:

local_dataset_id (str) – The dataset identifier in the EDI repository.

Returns:

The local_dataset_endpoint URL value. This is the URL GBIF will crawl to access the local dataset.

Return type:

str

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._get_local_dataset_group_id(local_dataset_id)

Returns the local_dataset_group_id value.

Parameters:

local_dataset_id (str) – The dataset identifier in the EDI repository.

Returns:

The local_dataset_group_id value.

Return type:

str
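
Given the `scope.identifier.revision` and `scope.identifier` formats described elsewhere on this page, the group ID can plausibly be derived by dropping the revision. The sketch below assumes exactly that rule; the name `derive_group_id` is hypothetical.

```python
def derive_group_id(local_dataset_id: str) -> str:
    """Drop the trailing revision from a scope.identifier.revision ID (assumed rule)."""
    group_id, _revision = local_dataset_id.rsplit(".", 1)
    return group_id


print(derive_group_id("edi.929.2"))
```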

gbif_registrar._utilities._is_synchronized(local_dataset_id, registrations_file)

Checks if a local dataset is synchronized with the GBIF registry.

Parameters:
  • local_dataset_id (str) – The identifier of the dataset in the EDI repository.

  • registrations_file (str) – Path of the registrations file.

Returns:

True if the dataset is synchronized, False otherwise.

Return type:

bool

Notes

The local dataset is synchronized if the local dataset publication date (listed in the EML) and the local dataset endpoint match those of the GBIF instance.
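
The comparison in the note above reduces to two equality tests. The sketch below assumes the publication date and endpoint have already been retrieved for both sides; the `DatasetState` class and `states_match` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetState:
    """Publication date and endpoint URL as seen by one side (hypothetical)."""

    pub_date: str
    endpoint: str


def states_match(local: DatasetState, gbif: DatasetState) -> bool:
    """Return True only if both the publication date and the endpoint agree."""
    return local.pub_date == gbif.pub_date and local.endpoint == gbif.endpoint


# Illustrative values only.
local = DatasetState("2023-01-15", "https://example.org/edi.1.2.zip")
gbif = DatasetState("2022-06-01", "https://example.org/edi.1.1.zip")
print(states_match(local, gbif))
```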

gbif_registrar._utilities._post_local_dataset_endpoint(local_dataset_endpoint, gbif_dataset_uuid)

Posts a local dataset endpoint to GBIF.

Parameters:
  • local_dataset_endpoint (str) – This is the URL for downloading the dataset (.zip archive) at the EDI repository. Use the _get_local_dataset_endpoint function in the utilities module to obtain this value.

  • gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset group.

Returns:

None. Raises an exception if the POST request fails.

Return type:

None

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._post_new_metadata_document(local_dataset_id, gbif_dataset_uuid)

Posts a new metadata document to GBIF.

Parameters:
  • local_dataset_id (str) – The identifier of the dataset in the EDI repository.

  • gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset group.

Returns:

None. Raises an exception if the POST request fails.

Return type:

None

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._read_gbif_dataset_metadata(gbif_dataset_uuid)

Reads the metadata of a GBIF dataset.

Parameters:

gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset.

Returns:

A dictionary containing the metadata of the GBIF dataset.

Return type:

dict

Notes

This is high-level metadata, not the full EML document.

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._read_local_dataset_metadata(local_dataset_id)

Reads the metadata document for a local dataset.

Parameters:

local_dataset_id (str) – The identifier of the dataset in the EDI repository.

Returns:

The metadata document for the local dataset in XML format.

Return type:

str

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.

gbif_registrar._utilities._read_registrations_file(registrations_file)

Returns the registrations file as a Pandas dataframe.

Parameters:

registrations_file (str) – Path of the registrations file.

Returns:

The registrations file as a Pandas dataframe.

Return type:

DataFrame
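
As a rough sketch, reading a registrations file with pandas and validating its columns might look like this. The column list is assembled from the field names mentioned on this page and its order is an assumption; the name `read_registrations` and the error message are hypothetical. An in-memory buffer replaces a file path so the example runs on its own.

```python
import io

import pandas as pd

# Assumed column set, taken from the field names mentioned on this page.
EXPECTED_COLS = [
    "local_dataset_id",
    "local_dataset_group_id",
    "local_dataset_endpoint",
    "gbif_dataset_uuid",
    "synchronized",
]


def read_registrations(source) -> pd.DataFrame:
    """Read a comma-separated registrations file and validate its columns."""
    registrations = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLS if col not in registrations.columns]
    if missing:
        raise ValueError(f"Registrations file is missing columns: {missing}")
    return registrations


# Illustrative content only.
csv_text = (
    "local_dataset_id,local_dataset_group_id,local_dataset_endpoint,"
    "gbif_dataset_uuid,synchronized\n"
    "edi.1.1,edi.1,https://example.org/edi.1.1.zip,uuid-a,True\n"
    "edi.2.1,edi.2,https://example.org/edi.2.1.zip,uuid-b,False\n"
)
registrations = read_registrations(io.StringIO(csv_text))
print(registrations.shape)
```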

gbif_registrar._utilities._request_gbif_dataset_uuid()

Requests a GBIF dataset UUID value from GBIF.

Returns:

The GBIF dataset UUID value. This is the UUID assigned by GBIF to the local dataset group.

Return type:

str

Notes

This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.