:py:mod:`gbif_registrar._utilities`
===================================

.. py:module:: gbif_registrar._utilities

.. autoapi-nested-parse::

   Utility functions for internal use only.



Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   gbif_registrar._utilities._check_completeness
   gbif_registrar._utilities._check_group_registrations
   gbif_registrar._utilities._check_synchronized
   gbif_registrar._utilities._check_local_dataset_group_id_format
   gbif_registrar._utilities._check_local_dataset_id
   gbif_registrar._utilities._check_local_dataset_id_format
   gbif_registrar._utilities._check_local_endpoints
   gbif_registrar._utilities._check_one_to_one_cardinality
   gbif_registrar._utilities._delete_local_dataset_endpoints
   gbif_registrar._utilities._expected_cols
   gbif_registrar._utilities._get_gbif_dataset_uuid
   gbif_registrar._utilities._get_local_dataset_endpoint
   gbif_registrar._utilities._get_local_dataset_group_id
   gbif_registrar._utilities._is_synchronized
   gbif_registrar._utilities._post_local_dataset_endpoint
   gbif_registrar._utilities._post_new_metadata_document
   gbif_registrar._utilities._read_gbif_dataset_metadata
   gbif_registrar._utilities._read_local_dataset_metadata
   gbif_registrar._utilities._read_registrations_file
   gbif_registrar._utilities._request_gbif_dataset_uuid



.. py:function:: _check_completeness(registrations)

   Checks registrations for completeness.

   A complete registration has values for all fields except (perhaps)
   `synchronized`, which is not essential for uploading to GBIF.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: **UserWarning** -- If any registrations are incomplete.


.. py:function:: _check_group_registrations(registrations)

   Checks uniqueness of dataset group registrations.

   Registrations can be part of a group, the most recent of which is
   considered to be the authoritative version of the series.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: If `local_dataset_group_id` and `gbif_dataset_uuid` don't have
           one-to-one cardinality.


.. py:function:: _check_synchronized(registrations)

   Checks if registrations have been synchronized.

   Registrations contain all the information needed for GBIF to successfully
   crawl the corresponding dataset and post to the GBIF data portal. Boolean
   True/False values in the `synchronized` field indicate whether the dataset
   has been synchronized.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: If a registration has not yet been crawled.


.. py:function:: _check_local_dataset_group_id_format(registrations)

   Checks the format of the local_dataset_group_id.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: If local_dataset_group_id does not have the truncated data package
           ID format used by the Environmental Data Initiative (EDI), i.e.
           `scope.identifier`.


.. py:function:: _check_local_dataset_id(registrations)

   Checks registrations for unique local_dataset_id.

   Each registration is represented by a unique primary key, i.e. the
   `local_dataset_id`.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: **UserWarning** -- If values in the `local_dataset_id` column are
           not unique.


.. py:function:: _check_local_dataset_id_format(registrations)

   Checks the format of the local_dataset_id.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: If local_dataset_id does not have the data package ID format used
           by the Environmental Data Initiative (EDI), i.e.
           `scope.identifier.revision`.

   .. rubric:: Examples

   >>> registrations = _read_registrations_file('tests/registrations.csv')
   >>> _check_local_dataset_id_format(registrations)


.. py:function:: _check_local_endpoints(registrations)

   Checks uniqueness of local dataset endpoints.

   Registrations each have a unique endpoint, which is crawled by GBIF and
   referenced from the associated GBIF dataset page.

   :param registrations: A dataframe of the registrations file. Use
                         `_read_registrations_file` to create this.
   :type registrations: pandas.DataFrame

   :rtype: None

   :Warns: If `local_dataset_id` and `local_dataset_endpoint` don't have
           one-to-one cardinality.


.. py:function:: _check_one_to_one_cardinality(data, col1, col2)

   Checks for one-to-one cardinality between two columns of a dataframe.

   This is a helper function used in a couple of registration checks.

   :param data: The dataframe containing `col1` and `col2`.
   :type data: pandas.DataFrame
   :param col1: Column name
   :type col1: str
   :param col2: Column name
   :type col2: str

   :rtype: None

   :Warns: If `col1` and `col2` don't have one-to-one cardinality.


.. py:function:: _delete_local_dataset_endpoints(gbif_dataset_uuid)

   Deletes all local dataset endpoints from a GBIF dataset.

   :param gbif_dataset_uuid: The registration identifier assigned by GBIF to
                             the local dataset.
   :type gbif_dataset_uuid: str

   :returns: Will raise an exception if the DELETE fails.
   :rtype: None

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _expected_cols()

   Returns expected columns of the registrations file.

   :returns: The expected columns of the registrations file.
   :rtype: list


.. py:function:: _get_gbif_dataset_uuid(local_dataset_group_id, registrations)

   Returns the gbif_dataset_uuid value.

   :param local_dataset_group_id: The dataset group identifier in the EDI
                                  repository.
   :type local_dataset_group_id: str
   :param registrations: The registrations file as a dataframe. Use the
                         _read_registrations_file function to create this.
   :type registrations: pandas.DataFrame

   :returns: The gbif_dataset_uuid value. This is the UUID assigned by GBIF
             to the local dataset group identifier. A new value will be
             returned if a gbif_dataset_uuid value doesn't already exist for
             a local_dataset_group_id in the registrations file.
   :rtype: str


.. py:function:: _get_local_dataset_endpoint(local_dataset_id)

   Returns the local_dataset_endpoint value.

   :param local_dataset_id: The dataset identifier in the EDI repository.
   :type local_dataset_id: str

   :returns: The local_dataset_endpoint URL value. This is the URL GBIF will
             crawl to access the local dataset.
   :rtype: str

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _get_local_dataset_group_id(local_dataset_id)

   Returns the local_dataset_group_id value.

   :param local_dataset_id: The dataset identifier in the EDI repository.
   :type local_dataset_id: str

   :returns: The local_dataset_group_id value.
   :rtype: str


.. py:function:: _is_synchronized(local_dataset_id, registrations_file)

   Checks if a local dataset is synchronized with the GBIF registry.

   :param local_dataset_id: The identifier of the dataset in the EDI
                            repository.
   :type local_dataset_id: str
   :param registrations_file: Path of the registrations file.
   :type registrations_file: str

   :returns: True if the dataset is synchronized, False otherwise.
   :rtype: bool

   .. rubric:: Notes

   The local dataset is synchronized if the local dataset publication date
   (listed in the EML) and the local dataset endpoint match those of the GBIF
   instance.


.. py:function:: _post_local_dataset_endpoint(local_dataset_endpoint, gbif_dataset_uuid)

   Posts a local dataset endpoint to GBIF.

   :param local_dataset_endpoint: This is the URL for downloading the dataset
                                  (.zip archive) at the EDI repository. Use
                                  the _get_local_dataset_endpoint function in
                                  the utilities module to obtain this value.
   :type local_dataset_endpoint: str
   :param gbif_dataset_uuid: The registration identifier assigned by GBIF to
                             the local dataset group.
   :type gbif_dataset_uuid: str

   :returns: Will raise an exception if the POST fails.
   :rtype: None

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _post_new_metadata_document(local_dataset_id, gbif_dataset_uuid)

   Posts a new metadata document to GBIF.

   :param local_dataset_id: The identifier of the dataset in the EDI
                            repository.
   :type local_dataset_id: str
   :param gbif_dataset_uuid: The registration identifier assigned by GBIF to
                             the local dataset group.
   :type gbif_dataset_uuid: str

   :returns: Will raise an exception if the POST fails.
   :rtype: None

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _read_gbif_dataset_metadata(gbif_dataset_uuid)

   Reads the metadata of a GBIF dataset.

   :param gbif_dataset_uuid: The registration identifier assigned by GBIF to
                             the local dataset.
   :type gbif_dataset_uuid: str

   :returns: A dictionary containing the metadata of the GBIF dataset.
   :rtype: dict

   .. rubric:: Notes

   This is high-level metadata, not the full EML document.

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _read_local_dataset_metadata(local_dataset_id)

   Reads the metadata document for a local dataset.

   :param local_dataset_id: The identifier of the dataset in the EDI
                            repository.
   :type local_dataset_id: str

   :returns: The metadata document for the local dataset in XML format.
   :rtype: str

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.


.. py:function:: _read_registrations_file(registrations_file)

   Returns the registrations file as a Pandas dataframe.

   :param registrations_file: Path of the registrations file.
   :type registrations_file: str

   :returns: The registrations file as a Pandas dataframe.
   :rtype: pandas.DataFrame


.. py:function:: _request_gbif_dataset_uuid()

   Requests a GBIF dataset UUID value from GBIF.

   :returns: The GBIF dataset UUID value. This is the UUID assigned by GBIF
             to the local dataset group.
   :rtype: str

   .. rubric:: Notes

   This function requires authentication with GBIF. Use the load_configuration
   function from the authenticate module to do this.
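The identifier conventions described for `_check_local_dataset_id_format` and `_get_local_dataset_group_id` can be illustrated with a minimal sketch. It assumes that a `local_dataset_id` has the form `scope.identifier.revision` and that the group ID is the same value without the revision. The function names, the regular expression, and the sample ID `edi.1.1` are hypothetical and are not taken from the package's implementation.

```python
import re
import warnings

# Assumed pattern: the scope is letters, digits, or hyphens; the identifier
# and revision are integers. The package's actual rule may differ.
ID_PATTERN = re.compile(r"[A-Za-z0-9-]+\.\d+\.\d+")


def has_local_dataset_id_format(local_dataset_id: str) -> bool:
    """Return True if the value looks like scope.identifier.revision."""
    return ID_PATTERN.fullmatch(local_dataset_id) is not None


def local_dataset_group_id(local_dataset_id: str) -> str:
    """Drop the trailing revision, leaving scope.identifier."""
    scope, identifier, _revision = local_dataset_id.split(".")
    return f"{scope}.{identifier}"


def warn_on_bad_ids(local_dataset_ids) -> None:
    """Emit a single UserWarning that lists any malformed identifiers."""
    bad = [i for i in local_dataset_ids if not has_local_dataset_id_format(i)]
    if bad:
        warnings.warn(f"Invalid local_dataset_id values: {bad}", UserWarning)


if __name__ == "__main__":
    print(has_local_dataset_id_format("edi.1.1"))
    print(local_dataset_group_id("edi.1.1"))
    warn_on_bad_ids(["edi.1.1", "edi.1"])
```

Keeping the format test separate from the warning mirrors the documented behavior, in which the checks warn rather than raise, so a caller can still inspect the whole registrations file after a problem is reported.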