gbif_registrar._utilities
Utility functions for internal use only.
Module Contents
Functions
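| _check_completeness(registrations) | Checks registrations for completeness. |
| _check_group_registrations(registrations) | Checks uniqueness of dataset group registrations. |
| _check_synchronized(registrations) | Checks if registrations have been synchronized. |
| _check_local_dataset_group_id_format(registrations) | Checks the format of the local_dataset_group_id. |
| _check_local_dataset_id(registrations) | Checks registrations for unique local_dataset_id. |
| _check_local_dataset_id_format(registrations) | Checks the format of the local_dataset_id. |
| _check_local_endpoints(registrations) | Checks uniqueness of local dataset endpoints. |
| _check_one_to_one_cardinality(data, col1, col2) | Checks for one-to-one cardinality between two columns of a dataframe. |
| _delete_local_dataset_endpoints(gbif_dataset_uuid) | Deletes all local dataset endpoints from a GBIF dataset. |
| _expected_cols() | Returns expected columns of the registrations file. |
| _get_gbif_dataset_uuid(local_dataset_group_id, registrations) | Returns the gbif_dataset_uuid value. |
| _get_local_dataset_endpoint(local_dataset_id) | Returns the local_dataset_endpoint value. |
| _get_local_dataset_group_id(local_dataset_id) | Returns the local_dataset_group_id value. |
| _is_synchronized(local_dataset_id, registrations_file) | Checks if a local dataset is synchronized with the GBIF registry. |
| _post_local_dataset_endpoint(local_dataset_endpoint, gbif_dataset_uuid) | Posts a local dataset endpoint to GBIF. |
| _post_new_metadata_document(local_dataset_id, gbif_dataset_uuid) | Posts a new metadata document to GBIF. |
| _read_gbif_dataset_metadata(gbif_dataset_uuid) | Reads the metadata of a GBIF dataset. |
| _read_local_dataset_metadata(local_dataset_id) | Reads the metadata document for a local dataset. |
| _read_registrations_file(registrations_file) | Returns the registrations file as a Pandas dataframe. |
| _request_gbif_dataset_uuid() | Requests a GBIF dataset UUID value from GBIF. |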
- gbif_registrar._utilities._check_completeness(registrations)
Checks registrations for completeness.
A complete registration has values for all fields except (perhaps) synchronized, which is not essential for uploading to GBIF.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
UserWarning – If any registrations are incomplete.
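The completeness rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `find_incomplete` is hypothetical, and the column names are assumed from the fields mentioned elsewhere on this page.

```python
import warnings

import pandas as pd


def find_incomplete(registrations: pd.DataFrame) -> list:
    """Return index labels of rows missing any value outside 'synchronized'."""
    # 'synchronized' is ignored because it is not essential for uploading.
    required = registrations.drop(columns=["synchronized"], errors="ignore")
    return required[required.isna().any(axis=1)].index.tolist()


# Hypothetical registrations; the second row lacks an endpoint.
registrations = pd.DataFrame(
    {
        "local_dataset_id": ["edi.1.1", "edi.2.1"],
        "local_dataset_group_id": ["edi.1", "edi.2"],
        "local_dataset_endpoint": ["https://example.org/edi.1.1.zip", None],
        "gbif_dataset_uuid": ["uuid-a", "uuid-b"],
        "synchronized": [True, None],
    }
)

incomplete_rows = find_incomplete(registrations)
if incomplete_rows:
    warnings.warn(f"Incomplete registrations in rows: {incomplete_rows}", UserWarning)
```

A missing `synchronized` value alone does not flag a row in this sketch, which mirrors the exception stated in the description.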
- gbif_registrar._utilities._check_group_registrations(registrations)
Checks uniqueness of dataset group registrations.
Registrations can be part of a group, the most recent of which is considered to be the authoritative version of the series.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
If `local_dataset_group_id` and `gbif_dataset_uuid` don’t have one-to-one cardinality.
- gbif_registrar._utilities._check_synchronized(registrations)
Checks if registrations have been synchronized.
Registrations contain all the information needed for GBIF to successfully crawl the corresponding dataset and post to the GBIF data portal. Boolean True/False values in the synchronized field indicate whether the dataset has been synchronized.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
If a registration has not yet been crawled.
- gbif_registrar._utilities._check_local_dataset_group_id_format(registrations)
Checks the format of the local_dataset_group_id.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
If local_dataset_group_id does not have the truncated data package ID
format used by the Environmental Data Initiative (EDI), i.e.
`scope.identifier`.
- gbif_registrar._utilities._check_local_dataset_id(registrations)
Checks registrations for unique local_dataset_id.
Each registration is represented by a unique primary key, i.e. the local_dataset_id.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
UserWarning – If values in the local_dataset_id column are not unique.
- gbif_registrar._utilities._check_local_dataset_id_format(registrations)
Checks the format of the local_dataset_id.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
If local_dataset_id does not have the data package ID format used by the
Environmental Data Initiative (EDI), i.e. `scope.identifier.revision`.
Examples
>>> registrations = _read_registrations_file('tests/registrations.csv')
>>> _check_local_dataset_id_format(registrations)
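The `scope.identifier.revision` format can be illustrated with a regular expression. This is a sketch only: the pattern and the helper name `has_package_id_format` are assumptions, since the exact rule applied by the package is not stated here.

```python
import re

# Assumption: the scope consists of letters, digits, or hyphens, and both
# the identifier and the revision are non-negative integers.
PACKAGE_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+\.\d+\.\d+")


def has_package_id_format(local_dataset_id: str) -> bool:
    """Return True if the value looks like `scope.identifier.revision`."""
    return PACKAGE_ID_PATTERN.fullmatch(local_dataset_id) is not None


print(has_package_id_format("edi.356.5"))  # full data package ID
print(has_package_id_format("edi.356"))    # revision missing
```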
- gbif_registrar._utilities._check_local_endpoints(registrations)
Checks uniqueness of local dataset endpoints.
Registrations each have a unique endpoint, which is crawled by GBIF and referenced from the associated GBIF dataset page.
- Parameters:
registrations (pandas.DataFrame) – A dataframe of the registrations file. Use `_read_registrations_file` to create this.
- Return type:
None
- Warns:
If `local_dataset_id` and `local_dataset_endpoint` don’t have one-to-one cardinality.
- gbif_registrar._utilities._check_one_to_one_cardinality(data, col1, col2)
Checks for one-to-one cardinality between two columns of a dataframe.
This is a helper function used in a couple of registration checks.
- Parameters:
data (pandas.DataFrame) –
col1 (str) – Column name
col2 (str) – Column name
- Return type:
None
- Warns:
If `col1` and `col2` don’t have one-to-one cardinality.
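One way to test one-to-one cardinality with pandas is sketched below. The helper name `is_one_to_one` is hypothetical, and dropping rows with missing values before comparing is an assumption made for this illustration, not documented behavior.

```python
import warnings

import pandas as pd


def is_one_to_one(data: pd.DataFrame, col1: str, col2: str) -> bool:
    """Return True if each col1 value maps to one col2 value and vice versa."""
    # Assumption: rows with a missing value in either column are ignored.
    pairs = data[[col1, col2]].dropna().drop_duplicates()
    # Among distinct pairs, a repeated value on either side breaks one-to-one.
    return not (pairs[col1].duplicated().any() or pairs[col2].duplicated().any())


# Hypothetical data: group "edi.1" is linked to two different UUIDs.
data = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.1", "edi.1", "edi.2"],
        "gbif_dataset_uuid": ["uuid-a", "uuid-c", "uuid-b"],
    }
)

if not is_one_to_one(data, "local_dataset_group_id", "gbif_dataset_uuid"):
    warnings.warn("Columns do not have one-to-one cardinality.", UserWarning)
```

Repeated identical pairs are collapsed first, so several revisions of one group that share a single UUID still count as one-to-one.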
- gbif_registrar._utilities._delete_local_dataset_endpoints(gbif_dataset_uuid)
Deletes all local dataset endpoints from a GBIF dataset.
- Parameters:
gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset.
- Returns:
Will raise an exception if the DELETE fails.
- Return type:
None
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._expected_cols()
Returns expected columns of the registrations file.
- Returns:
The expected columns of the registrations file.
- Return type:
list
- gbif_registrar._utilities._get_gbif_dataset_uuid(local_dataset_group_id, registrations)
Returns the gbif_dataset_uuid value.
- Parameters:
local_dataset_group_id (str) – The dataset group identifier in the EDI repository.
registrations (pandas.DataFrame) – The registrations file as a dataframe. Use the _read_registrations_file function to create this.
- Returns:
The gbif_dataset_uuid value. This is the UUID assigned by GBIF to the local dataset group identifier. A new value will be returned if a gbif_dataset_uuid value doesn’t already exist for a local_dataset_group_id in the registrations file.
- Return type:
str
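The lookup-or-request behavior described in the return value can be sketched as below. The helper `lookup_or_request_uuid` and its `request_new_uuid` callback are hypothetical stand-ins; the callback replaces the authenticated request to GBIF so that the example runs offline.

```python
from typing import Callable

import pandas as pd


def lookup_or_request_uuid(
    local_dataset_group_id: str,
    registrations: pd.DataFrame,
    request_new_uuid: Callable[[], str],
) -> str:
    """Reuse the group's existing UUID, or obtain a new one if none exists."""
    in_group = registrations["local_dataset_group_id"] == local_dataset_group_id
    existing = registrations.loc[in_group, "gbif_dataset_uuid"].dropna()
    if not existing.empty:
        return str(existing.iloc[0])
    # No UUID recorded for this group yet, so ask for a new one.
    return request_new_uuid()


# Hypothetical registrations with one registered group.
registrations = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.1", "edi.1"],
        "gbif_dataset_uuid": ["uuid-a", "uuid-a"],
    }
)

print(lookup_or_request_uuid("edi.1", registrations, lambda: "uuid-new"))
print(lookup_or_request_uuid("edi.2", registrations, lambda: "uuid-new"))
```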
- gbif_registrar._utilities._get_local_dataset_endpoint(local_dataset_id)
Returns the local_dataset_endpoint value.
- Parameters:
local_dataset_id (str) – The dataset identifier in the EDI repository.
- Returns:
The local_dataset_endpoint URL value. This is the URL GBIF will crawl to access the local dataset.
- Return type:
str
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._get_local_dataset_group_id(local_dataset_id)
Returns the local_dataset_group_id value.
- Parameters:
local_dataset_id (str) – The dataset identifier in the EDI repository.
- Returns:
The local_dataset_group_id value.
- Return type:
str
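Because the group identifier uses the truncated `scope.identifier` form of the `scope.identifier.revision` data package ID, one plausible derivation is to drop the trailing revision. The sketch below assumes this; the helper name `derive_group_id` is hypothetical, and the package may derive the value differently.

```python
def derive_group_id(local_dataset_id: str) -> str:
    """Drop the trailing revision from a `scope.identifier.revision` ID."""
    group_id, _revision = local_dataset_id.rsplit(".", 1)
    return group_id


print(derive_group_id("edi.356.5"))
```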
- gbif_registrar._utilities._is_synchronized(local_dataset_id, registrations_file)
Checks if a local dataset is synchronized with the GBIF registry.
- Parameters:
local_dataset_id (str) – The identifier of the dataset in the EDI repository.
registrations_file (str) – Path of the registrations file.
- Returns:
True if the dataset is synchronized, False otherwise.
- Return type:
bool
Notes
The local dataset is synchronized if the local dataset publication date (listed in the EML) and the local dataset endpoint match those of the GBIF instance.
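The comparison in the note above can be sketched as a pure function. The `DatasetState` class, the helper `is_in_sync`, and the example values are hypothetical. The sketch also assumes that both publication dates have already been normalized to the same string format, and it leaves out how the values are fetched from EDI and GBIF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetState:
    """Publication date and endpoint as seen by one party (local or GBIF)."""

    pub_date: str
    endpoint: str


def is_in_sync(local: DatasetState, gbif: DatasetState) -> bool:
    """Return True only if both the publication date and the endpoint match."""
    return local.pub_date == gbif.pub_date and local.endpoint == gbif.endpoint


local = DatasetState("2023-01-15", "https://example.org/edi.1.2.zip")
stale = DatasetState("2022-06-01", "https://example.org/edi.1.1.zip")

print(is_in_sync(local, local))
print(is_in_sync(local, stale))
```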
- gbif_registrar._utilities._post_local_dataset_endpoint(local_dataset_endpoint, gbif_dataset_uuid)
Posts a local dataset endpoint to GBIF.
- Parameters:
local_dataset_endpoint (str) – This is the URL for downloading the dataset (.zip archive) at the EDI repository. Use the _get_local_dataset_endpoint function in the utilities module to obtain this value.
gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset group.
- Returns:
Will raise an exception if the POST fails.
- Return type:
None
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._post_new_metadata_document(local_dataset_id, gbif_dataset_uuid)
Posts a new metadata document to GBIF.
- Parameters:
local_dataset_id (str) – The identifier of the dataset in the EDI repository.
gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset group.
- Returns:
Will raise an exception if the POST fails.
- Return type:
None
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._read_gbif_dataset_metadata(gbif_dataset_uuid)
Reads the metadata of a GBIF dataset.
- Parameters:
gbif_dataset_uuid (str) – The registration identifier assigned by GBIF to the local dataset.
- Returns:
A dictionary containing the metadata of the GBIF dataset.
- Return type:
dict
Notes
This is high-level metadata, not the full EML document.
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._read_local_dataset_metadata(local_dataset_id)
Reads the metadata document for a local dataset.
- Parameters:
local_dataset_id (str) – The identifier of the dataset in the EDI repository.
- Returns:
The metadata document for the local dataset in XML format.
- Return type:
str
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.
- gbif_registrar._utilities._read_registrations_file(registrations_file)
Returns the registrations file as a Pandas dataframe.
- Parameters:
registrations_file (str) – Path of the registrations file.
- Returns:
The registrations file as a Pandas dataframe.
- Return type:
DataFrame
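Reading a CSV registrations file into a dataframe might look like the sketch below. The helper `read_registrations`, the `EXPECTED_COLS` list, and the column check are assumptions for illustration. The column names are taken from the fields mentioned on this page, and an in-memory string stands in for a file path.

```python
import io

import pandas as pd

# Assumed column names, based on the fields referenced in this module.
EXPECTED_COLS = [
    "local_dataset_id",
    "local_dataset_group_id",
    "local_dataset_endpoint",
    "gbif_dataset_uuid",
    "synchronized",
]


def read_registrations(source) -> pd.DataFrame:
    """Read registrations from a path or file-like object and check columns."""
    registrations = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLS if col not in registrations.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return registrations


# Hypothetical file content; the second registration is not yet uploaded.
csv_text = (
    "local_dataset_id,local_dataset_group_id,local_dataset_endpoint,"
    "gbif_dataset_uuid,synchronized\n"
    "edi.1.1,edi.1,https://example.org/edi.1.1.zip,uuid-a,True\n"
    "edi.2.1,edi.2,https://example.org/edi.2.1.zip,,\n"
)
registrations = read_registrations(io.StringIO(csv_text))
print(registrations)
```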
- gbif_registrar._utilities._request_gbif_dataset_uuid()
Requests a GBIF dataset UUID value from GBIF.
- Returns:
The GBIF dataset UUID value. This is the UUID assigned by GBIF to the local dataset group.
- Return type:
str
Notes
This function requires authentication with GBIF. Use the load_configuration function from the authenticate module to do this.