Setting up a Rucio demo environment

Prerequisites

We provide a containerised version of the Rucio development environment for a quick start. Our containers are ready-made for Docker, so you need a working Docker installation. To install Docker for your platform, please refer to the Docker installation guide; for CentOS, for example, follow the instructions for the Docker Community Edition. Please make sure that you install a recent Docker version, especially on CentOS, whose default version is ancient and does not support some features we rely on.

Start the Docker daemon with sudo systemctl start docker. You can confirm that Docker is running properly by executing (might need sudo):

docker run hello-world

If successful, this will print an informational message telling you that you are ready to go. Now, also install the docker-compose helper tool with sudo yum install docker-compose (might need EPEL enabled). You are now ready to install the Rucio development environment.
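
Before continuing, it can be handy to confirm that both tools are actually on the PATH. The following is a minimal sketch; the function name missing_tools is made up for illustration and is not part of Rucio or Docker.

```shell
# Print the name of every required tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not shipped with Rucio.)
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Calling missing_tools docker docker-compose prints nothing when both are installed, and otherwise lists whatever still needs to be installed.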

Preparing the environment

The first step is to check whether SELinux is running. SELinux will block access to the directories mounted inside the container, so, depending on your node, it might need to be put in permissive mode with setenforce permissive.
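
The current SELinux mode is reported by getenforce as Enforcing, Permissive or Disabled. As a rough sketch, the decision can be wrapped in a small helper; the name needs_permissive is an assumption made for this example.

```shell
# Decide from the output of `getenforce` whether permissive mode is needed.
# Only "Enforcing" actually denies access; "Permissive" and "Disabled" do not.
# (needs_permissive is a hypothetical helper name.)
needs_permissive() {
    [ "$1" = "Enforcing" ]
}

# Example use on the host (commented out: needs the SELinux tools and root):
# if needs_permissive "$(getenforce)"; then sudo setenforce permissive; fi
```

Note that setenforce only changes the mode until the next reboot.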

The second step is to fork the main Rucio repository on GitHub by clicking the Fork button, and then clone your personal fork of the Rucio repository to ~/dev/rucio. Afterwards, add the main upstream repository as an additional remote to be able to submit pull requests later on:

cd ~/dev
git clone git@github.com:<your_username>/rucio.git
cd rucio
git remote add upstream git@github.com:rucio/rucio.git
git fetch --all

Now, ensure that .git/config is in order, i.e., that it mentions your full name and email address, and create the .githubtoken file that contains a full-access token from your GitHub account settings.

Next, start up the Rucio development environment with docker-compose. There are three different types: a standard one to run the unit tests and do basic development, which includes just Rucio without any transfer capabilities; a slightly larger one, which includes the File Transfer Service (FTS) and three XrootD storage servers to develop upload/download and transfer capabilities; and a third, large one, which adds the full monitoring stack with Logstash, Elasticsearch, Kibana and Grafana.

Using the standard environment

Run the containers using docker-compose (again might need sudo):

docker-compose --file etc/docker/dev/docker-compose.yml up -d

And verify that it is running properly:

docker ps

This should show you a few running containers: the Rucio server, the PostgreSQL database and the Graphite monitoring.
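
The commands in this guide assume container names such as dev_rucio_1, which is the naming scheme of docker-compose v1. Docker Compose v2 joins the parts with hyphens instead (dev-rucio-1). The sketch below picks the right name from the output of docker ps; the helper name find_container is hypothetical, and the dev project prefix is assumed from the names used in this guide.

```shell
# Read container names on stdin (one per line) and print the first one that
# belongs to the given compose service, accepting both naming schemes:
# dev_rucio_1 (docker-compose v1) and dev-rucio-1 (Compose v2).
# (find_container is a hypothetical helper; "dev" is the assumed project name.)
find_container() {
    grep -E "^dev[_-]$1[_-]1\$" | head -n 1
}
```

For example, docker ps --format '{{.Names}}' | find_container rucio prints the name to pass to docker exec.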

Finally, you can jump into the container with:

docker exec -it dev_rucio_1 /bin/bash

To verify that everything is in order, you can now either run the full unit tests or only set up the database. Running the full testing suite takes ~10 minutes:

tools/run_tests.sh

Alternatively, you can bootstrap the test environment once with the -i option and then selectively or repeatedly run test case modules, test case groups, or even single test cases, for example:

tools/run_tests.sh -i
pytest -v --full-trace lib/rucio/tests/test_replica.py
pytest -v --full-trace lib/rucio/tests/test_replica.py::TestReplicaCore
pytest -v --full-trace lib/rucio/tests/test_replica.py::TestReplicaCore::test_delete_replicas_from_datasets

Using the environment including storage

Again run the containers using docker-compose:

docker-compose --file etc/docker/dev/docker-compose.yml --profile storage up -d

Running docker ps should now show you a few more running containers: the Rucio server, the PostgreSQL database, FTS and its associated MySQL database, the Graphite monitoring, and three XrootD storage servers.

With this environment you can upload and download data to/from the storage and submit data transfers. To set this up, add the -r option to the setup:

tools/run_tests.sh -ir

This creates a few random files and uploads them, creates a few datasets and containers, and requests a replication rule for the container, which starts in state REPLICATING. To demonstrate the transfer capability, the daemons can be run in single-execution mode in order:

rucio rule-info <rule-id>

rucio-conveyor-submitter --run-once
rucio-conveyor-poller --run-once --older-than 0
rucio-conveyor-finisher --run-once

rucio rule-info <rule-id>

On the second display of the rule, its state has cleared to OK.

Using the environment including monitoring

Again run the containers using docker-compose:

docker-compose --file etc/docker/dev/docker-compose.yml --profile storage --profile monitoring up -d

Now you will have the same containers as before plus a full monitoring stack with Logstash, Elasticsearch, Kibana and Grafana.
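
The three environments differ only in the profiles passed to docker-compose. As a small illustration, the mapping can be written down in one place; compose_args is a hypothetical helper name, while the file path and profile names are the ones used in this guide.

```shell
# Print the docker-compose arguments for one of the three environments.
# (compose_args is a hypothetical helper; paths and profiles are from this guide.)
compose_args() {
    file="etc/docker/dev/docker-compose.yml"
    case "$1" in
        standard)   echo "--file $file up -d" ;;
        storage)    echo "--file $file --profile storage up -d" ;;
        monitoring) echo "--file $file --profile storage --profile monitoring up -d" ;;
        *)          echo "unknown environment: $1" >&2; return 1 ;;
    esac
}
```

It would be used as docker-compose $(compose_args storage), deliberately unquoted so that the shell splits the output into separate arguments.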

To create some events and write them to Elasticsearch, first run the tests again as before:

tools/run_tests.sh -ir

Then you will have to run the transfer daemons (conveyor-*) and the messaging daemon (hermes) to send the events to ActiveMQ. There is a script for that, which runs the daemons from the storage section above in single-execution mode in a loop:

run_daemons

When all the daemons have run, you will be able to find the events in Kibana. If you run the Docker environment on your local machine, you can access Kibana at <http://localhost:5601>. The necessary index pattern will be added automatically. There is also one dashboard available in Kibana. If it is running on a remote machine, you can forward the port over SSH:

ssh -L 5601:127.0.0.1:5601 <hostname>

Additionally, there is a Grafana server running with one simple dashboard. You can access it at <http://localhost:3000>. The default credentials are "admin/admin". The ActiveMQ web console can also be accessed at <http://localhost:8161>.
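
When the environment runs on a remote machine, all three web interfaces can be forwarded in a single SSH session by repeating the -L option. The sketch below only prints the command; forward_cmd is a hypothetical helper name, and the ports are the ones listed in this section.

```shell
# Print an ssh command that forwards Kibana (5601), Grafana (3000) and the
# ActiveMQ web console (8161) from the remote host to the same local ports.
# (forward_cmd is a hypothetical helper name.)
forward_cmd() {
    cmd="ssh"
    for port in 5601 3000 8161; do
        cmd="$cmd -L $port:127.0.0.1:$port"
    done
    echo "$cmd $1"
}
```

Running forward_cmd <hostname> prints the full command line, which can then be reviewed and executed.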

If you would like to continuously create some transfers and events, there are scripts available for that. Open two different shells and in one run:

create_monit_data

And in the other run:

run_daemons

Development

The idea for containerised development is that you use your host machine to edit the files, and test the changes within the container environment. On your host machine, you should be able to:

cd ~/dev/rucio
emacs <file>

To see your changes in action, the recommended way is to jump into the container twice in parallel: one terminal to follow the output of the Rucio server with a shortcut to tail the logfiles (logshow), and one terminal to actually run interactive commands.

From your host, get a separate Terminal 1 (the Rucio "server log show"):

docker exec -it dev_rucio_1 /bin/bash
logshow

Terminal 1 can now be left open, and then from your host go into a new Terminal 2 (the "interactive" terminal):

docker exec -it dev_rucio_1 /bin/bash
rucio whoami

The command will output in Terminal 2, and at the same time the server debug output will be shown in Terminal 1.

The same logshow is also available in the FTS container:

docker exec -it dev_fts_1 /bin/bash
logshow

Development tricks

Server changes

If you edit server-side files, e.g. in lib/rucio/web, and your changes are not showing up then it is usually helpful to flush the memcache and force the webserver to restart without having to restart the container. Inside the container execute:

echo 'flush_all' | nc localhost 11211 && httpd -k graceful

Database access

The default database is PostgreSQL, and docker-compose is configured to open its port to the host machine. Using your favourite SQL navigator, e.g., DBeaver, you can connect to the database using the default access on localhost:5432 to database name rucio, schema name dev, with username rucio and password secret.
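
For a quick look from the command line, the same settings can be expressed as a libpq connection URI. This is only a sketch: it assumes the psql client is installed on the host, and rucio_db_uri is a hypothetical helper name.

```shell
# Build a libpq connection URI for the development database.
# Host and port default to the values in this guide and can be overridden,
# e.g. when the database port is reached through an SSH tunnel.
# (rucio_db_uri is a hypothetical helper name.)
rucio_db_uri() {
    host="${1:-localhost}"
    port="${2:-5432}"
    echo "postgresql://rucio:secret@$host:$port/rucio"
}
```

A session on the dev schema could then be opened with PGOPTIONS='-c search_path=dev' psql "$(rucio_db_uri)".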

Docker is eating my disk space

You can reclaim this with:

docker system prune -f --volumes

Where do I find the Dockerfile

This container can be found on Docker Hub as rucio/rucio-dev, and the corresponding Dockerfile is also available. It provides a Rucio environment which allows you to mount your local code into the container's bin, lib, and tools directories. The container is set up to run against a PostgreSQL database with fsync and most durability features for the WAL disabled to improve testing IO throughput. Tests and checks can be run against the development code without having to rebuild the container.

I need a Docker based on another branch! (not rucio/master)

In such cases, you can download the Rucio container files and e.g. choose to modify the dev container before build:

cd /opt
sudo git clone https://github.com/rucio/containers
cd containers/dev

Change anything you need, e.g. the code branch cloned to your docker container:

# from
RUN git clone https://github.com/rucio/rucio.git /tmp/rucio
# to e.g.:
RUN git clone --single-branch --branch next https://github.com/rucio/rucio.git /tmp/rucio
# build your Docker image (the trailing dot is the build context)
sudo docker build -t rucio/rucio-dev .

Compose as usual using docker-compose:

cd /opt/rucio
sudo docker-compose --file etc/docker/dev/docker-compose.yml up -d

Start the daemons

Daemons are not running in the docker environment, but all daemons support single-execution mode with the --run-once argument. Reset the system first with:

tools/run_tests.sh -ir

Some files are created. Let's add them to a new dataset:

rucio add-dataset test:mynewdataset
rucio attach test:mynewdataset test:file1 test:file2 test:file3 test:file4

If you run the command below, you will see that the files are not in the RSE XRD3, but only in XRD1 and XRD2:

$ rucio list-file-replicas test:mynewdataset
+-------+-------+-----------+----------+-----------------------------------+
| SCOPE | NAME | FILESIZE | ADLER32 | RSE: REPLICA |
|-------|-------|-----------|----------|-----------------------------------|
| test | file1 | 10.486 MB | 141a641e | XRD1: root://xrd1:1094//rucio/... |
| test | file2 | 10.486 MB | fdfa7eea | XRD1: root://xrd1:1094//rucio/... |
| test | file3 | 10.486 MB | c669167d | XRD2: root://xrd2:1095//rucio/... |
| test | file4 | 10.486 MB | 65786e49 | XRD2: root://xrd2:1095//rucio/... |
+-------+-------+-----------+----------+-----------------------------------+

So let's add a new rule on our new dataset to make Rucio create replicas on XRD3 as well:

$ rucio add-rule test:mynewdataset 1 XRD3
1aadd685d891400dba050ad43e71fea9

Now we can check the status of the rule. We will see there are 4 files in REPLICATING state:

$ rucio rule-info 1aadd685d891400dba050ad43e71fea9 | grep Locks
Locks OK/REPLICATING/STUCK: 0/4/0

Now we can run the daemons. First the rule evaluation daemon (judge-evaluator) will pick up our rule. Then the transfer submitter daemon (conveyor-submitter) will send the newly created transfer requests to the FTS server. After that, the transfer state check daemon (conveyor-poller) will retrieve the transfer state information from FTS. Finally, the transfer sign-off daemon (conveyor-finisher) updates the internal state of the Rucio catalogue to reflect the changes:

rucio-judge-evaluator --run-once
rucio-conveyor-submitter --run-once
rucio-conveyor-poller --run-once
rucio-conveyor-finisher --run-once
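
Since the order matters, the four calls can be wrapped in a loop that stops at the first failure. This is a minimal sketch and not a Rucio tool; the function name run_transfer_cycle and the DRY_RUN variable are assumptions made for this example.

```shell
# Run one pass of each daemon in the order described above and stop at the
# first failure. Set DRY_RUN=1 to print the commands instead of running them.
# (run_transfer_cycle and DRY_RUN are hypothetical names.)
run_transfer_cycle() {
    for daemon in judge-evaluator conveyor-submitter conveyor-poller conveyor-finisher; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rucio-$daemon --run-once"
        else
            "rucio-$daemon" --run-once || return 1
        fi
    done
}
```

Calling DRY_RUN=1 run_transfer_cycle lists the four commands without touching the system, which is a convenient way to check the order first.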

If we look at the state of the rule now, we see that the locks are OK:

$ rucio rule-info 1aadd685d891400dba050ad43e71fea9 | grep Locks
Locks OK/REPLICATING/STUCK: 4/0/0
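
The Locks line is easy to check from a script as well. The sketch below assumes the line looks exactly like the sample output in this guide; lock_counts and rule_is_done are hypothetical helper names.

```shell
# Extract the OK, REPLICATING and STUCK counts from the "Locks" line of
# `rucio rule-info`, e.g. "Locks OK/REPLICATING/STUCK: 4/0/0" -> "4 0 0".
# (lock_counts and rule_is_done are hypothetical helper names.)
lock_counts() {
    sed -n 's|^Locks OK/REPLICATING/STUCK: *\([0-9]*\)/\([0-9]*\)/\([0-9]*\).*$|\1 \2 \3|p'
}

# Succeed only when a Locks line was found and no lock is still
# replicating or stuck.
rule_is_done() {
    counts=$(lock_counts)
    [ -n "$counts" ] || return 1
    set -- $counts          # $1 = OK, $2 = REPLICATING, $3 = STUCK
    [ "$2" = "0" ] && [ "$3" = "0" ]
}
```

For example, rucio rule-info <rule-id> | rule_is_done && echo "rule satisfied" prints the message only once all locks are OK.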

And if we look at the replicas of the dataset, we see that there are now replicas of the files in XRD3 as well:

$ rucio list-file-replicas test:mynewdataset
+-------+-------+-----------+----------+-----------------------------------+
| SCOPE | NAME | FILESIZE | ADLER32 | RSE: REPLICA |
|-------|-------|-----------|----------|-----------------------------------|
| test | file1 | 10.486 MB | 141a641e | XRD3: root://xrd3:1096//rucio/... |
| test | file1 | 10.486 MB | 141a641e | XRD1: root://xrd1:1094//rucio/... |
| test | file2 | 10.486 MB | fdfa7eea | XRD3: root://xrd3:1096//rucio/... |
| test | file2 | 10.486 MB | fdfa7eea | XRD1: root://xrd1:1094//rucio/... |
| test | file3 | 10.486 MB | c669167d | XRD2: root://xrd2:1095//rucio/... |
| test | file3 | 10.486 MB | c669167d | XRD3: root://xrd3:1096//rucio/... |
| test | file4 | 10.486 MB | 65786e49 | XRD2: root://xrd2:1095//rucio/... |
| test | file4 | 10.486 MB | 65786e49 | XRD3: root://xrd3:1096//rucio/... |
+-------+-------+-----------+----------+-----------------------------------+
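
To check the result without reading the table by eye, the output can be filtered by RSE. This is a rough sketch that assumes the table layout shown above (columns separated by |, with the replica cell starting with the RSE name followed by a colon); files_on_rse is a hypothetical helper name.

```shell
# Read the table printed by `rucio list-file-replicas` on stdin and print
# the names of the files that have a replica on the given RSE.
# (files_on_rse is a hypothetical helper; the layout is assumed from the
# sample output in this guide.)
files_on_rse() {
    awk -F'|' -v rse="$1" '
        NF >= 6 {
            name = $3; replica = $6
            gsub(/^ +| +$/, "", name)     # trim padding around the NAME cell
            gsub(/^ +/, "", replica)      # trim padding before "RSE: url"
            if (index(replica, rse ":") == 1) print name
        }'
}
```

With the output above, rucio list-file-replicas test:mynewdataset | files_on_rse XRD3 would list all four files.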