Opendata
Rucio has native support for Opendata which was introduced in v38.0.0
.
It is an evolving feature and allows to tag already registered Rucio DIDs as Opendata and to add additional metadata (json-compatible).
Rucio is able to expose these Opendata DIDs in a dedicated Opendata endpont, returning useful information such as a list with all Opendata DIDs or the Opendata details of a given DID.
Opendata CLI
Adding a DID to the Opendata catalog
Any Rucio DID can be added to the Opendata catalog.
This command will create a Rucio DID to use in this tutorial (skip if you already have some DIDs to work with).
rucio scope add demo --account root
DID=demo:demo-1
rucio did add --type dataset $DID
To add a DID to the Opendata catalog:
rucio opendata did add $DID
A list of all Opendata DIDs may be feched by the following command. It accepts additional option such as filters.
Since the initial state of an Opendata DID is draft
, we can filter for that.
rucio opendata did list --state draft
We can also show details for a given Opendata DID
rucio opendata did show $DID --files --meta
this command supports multiple flags, such as files
used to list all the files for this DID or meta
to show Opendata metadata.
The --public
flag can also be used to perform the request against the Opendata public endpoint. This will only work if the Opendata DID is marked as public in the Opendata catalog.
Updating an Opendata DID
The Opendata DIDs can be enriched with some additional Opendata-specific data.
A DOI can be added to the DID via
rucio opendata did update $DID --doi 10.1234/abcd.56789
The DOI must be a valid DOI string and globally unique in the Rucio Opendata catalog.
A JSON object may be added as Opendata metadata
rucio opendata did update $DID --meta '{"key":"value"}'
This Opendata metadata will be available in the show
command and in the
Public Opendata
An Opendata DID can be marked as public
.
Public Opendata DIDs will be exposed publicly in the Rucio server without the need of any kind of authentication.
In order to become public, the DID must be closed if not a file.
A Rucio DID may be closed via the following command
rucio did update --close $DID
After the DID is closed, it can be set to public via
rucio opendata did update $DID --state public
An Opendata DID can be reverted to a non-public state by transferring it to the suspended
state.
rucio opendata did update $DID --state suspended
There is an exclusive Rucio endpoint for public Opendata called opendata_public
.
For production deployments we recommend a dedicated Rucio server with only the opendata_public
enabled, as this server instace is able to process unauthenticated requests.
If this server is accessible to other services related to Opendata such as the Opendata Portal, it can provide updated information related to the Opendata DIDs registered in Rucio.
REST API
The REST API for Opendata is available as part of the Rucio REST API documentation.
The most important feature is that users are able to send requests to the public opendata endpoint without any kind of autentication. This can be used to establish synchronization between Rucio and a third party app such as an Opendata Portal.