Uploadclient

Classes

UploadClient

UploadClient(_client=None, logger=None, tracing=True)

Initialize the UploadClient with the necessary configuration to manage file uploads.

Creates a new UploadClient instance. The instance can reuse an existing Rucio Client, use a custom logger, and enable tracing to record debug information during the upload process.

PARAMETER DESCRIPTION
_client

An existing Rucio Client instance to reuse. If not provided, a new one is created.

TYPE: Optional[Client] DEFAULT: None

logger

A logger function. If not provided, the default Python logger is used.

TYPE: Optional[LoggerFunction] DEFAULT: None

tracing

Indicates whether to enable tracing to capture upload activity details.

TYPE: bool DEFAULT: True

RAISES DESCRIPTION
InputValidationError

If the client account is not found or is invalid, preventing upload setup.
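As a rough illustration of the logger parameter, the sketch below builds a callable that can stand in for a logger function. It assumes the logger is invoked in the same way as `logging.log`, that is as `logger(level, msg, *args)`; the helper name `make_prefixed_logger` and the logger name `upload-example` are hypothetical and not part of Rucio.

```python
import logging


def make_prefixed_logger(prefix, target=None):
    """Return a callable with the assumed shape logger(level, msg, *args)."""
    if target is None:
        target = logging.getLogger("upload-example")  # hypothetical logger name

    def log(level, msg, *args, **kwargs):
        # Forward to the standard library logger, tagging every message.
        target.log(level, f"[{prefix}] {msg}", *args, **kwargs)

    return log


log = make_prefixed_logger("upload")
log(logging.INFO, "starting upload of %s", "file1.txt")

# With Rucio installed and configured, the callable could then be passed as:
#   UploadClient(logger=log)
```

Wrapping a standard logger this way keeps the prefixing logic in one place while leaving handler and level configuration to the usual `logging` setup.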

Functions

upload
upload(
    items,
    summary_file_path=None,
    traces_copy_out=None,
    ignore_availability=False,
    activity=None,
)

Uploads one or more files to an RSE (Rucio Storage Element) and optionally registers them.

An overview of this method's performed actions:

  1. Collects and validates file info from the passed items (directories may also be included), ensuring valid paths exist on the local filesystem. If an RSE expression is provided, a single RSE is picked at random from it.

  2. Checks the RSE's availability for writing (unless ignore_availability is True).

  3. Optionally registers each file in the Rucio Catalog, handling the DID creation, dataset creation/attachment, and replication rules as needed.

  4. Uploads the files using the underlying protocol handlers and verifies checksums if desired/possible. Partial or failed uploads raise exceptions.

  5. (Optional) Produces a JSON summary file at summary_file_path, listing the final PFNs, checksums, and other info for all successfully uploaded files.

PARAMETER DESCRIPTION
items

A sequence of dictionaries, each describing a file to upload (or a directory to be scanned). For each item, the supported keys are:

  • path (PathTypeAlias, required): The local path to the file or directory. If this is a directory and recursive is True, the directory (and its subdirectories) are traversed.

  • rse (str, required): The target RSE or an RSE expression where the upload should be placed. If an expression is provided (e.g., "tier=1"), one RSE from that expression is chosen randomly.

  • did_scope (str, not required): The Rucio scope in which to register the file DID. Defaults to user.<account>.

  • did_name (str, not required): The logical filename in Rucio. Defaults to the local basename if not provided.

  • lifetime (int, not required): The lifetime (in seconds) to apply when creating a new replication rule. For file uploads without a dataset, a new rule with that lifetime is created if the file DID does not already exist in Rucio. For a new dataset, the dataset is created with a rule using this lifetime, but if the dataset already exists and you specify a lifetime, an error is raised.

    Note: lifetime is not automatically applied to nested containers or datasets in recursive mode.

  • impl (str, not required): Name of the protocol implementation to be used for uploading this item. For example, "rucio.rse.protocols.gfal.Default".

  • pfn (str, not required): Allows you to explicitly set the Physical File Name (PFN) for the upload, determining exactly where the file is placed on the storage. However, for deterministic RSEs, specifying a PFN causes the client to skip registering the file under the usual deterministic scheme. For non-deterministic RSEs, you can still force the file to be registered in the Rucio catalog after being uploaded, using no_register=False along with register_after_upload=True (or by manually handling the registration later).

  • force_scheme (str, not required): Enforces the use of a specific protocol scheme (e.g., davs, https) during file uploads. If the selected protocol is not compatible, the upload will stop and raise an error instead of falling back to any other scheme.

  • transfer_timeout (int, not required): A maximum duration (in seconds) to wait for each individual file transfer to complete. If the file transfer does not finish before this timeout elapses, the operation will be aborted and retried one last time. When transfer_timeout is None, no specific timeout is enforced, and the transfer may continue until it completes or fails for another reason.

  • guid (str, not required): If provided, Rucio will use this GUID. If not provided and the file is “pool.root” with no_register unset, Rucio tries to extract the GUID via pool_extractFileIdentifier, raising an error if that fails. Otherwise, a random GUID will be generated.

  • no_register (bool, not required, default=False): If set to True, the file is not registered in the Rucio Catalog, i.e., there is no DID creation, no replica entry, and no rules. This is appropriate if you plan to register the replica or create rules separately.

    Note: If recursive=True, the method still creates datasets and/or containers for the directories when needed.

  • register_after_upload (bool, not required, default=False): If set to True, the file is uploaded first, and only then is the DID created or updated in the Catalog. This can be useful when you want the actual data on storage before finalizing the registration. By default (False), the file is registered in Rucio before the physical upload if no_register is False.

  • recursive (bool, not required, default=False): If set to True, the method treats the specified path as a directory and (depending on the combination with other parameters) recursively traverses its subdirectories, mapping them into container/dataset hierarchies. Single top-level file paths are ignored, but individual files found in subdirectories are processed. Empty directories and non-existent paths produce a warning. If False, only top-level file paths and the files directly inside a given top-level directory are processed; subdirectories are ignored, and no container structure is created.

  • dataset_scope / dataset_name (str, not required): To register uploaded files into a dataset DID, you need to specify both dataset_name and dataset_scope. With no_register=False, the client ensures {dataset_scope}:{dataset_name} exists, creating it with a replication rule if it doesn't and simply attaching the new files if it does. If the dataset already exists and you specify a new lifetime, or if a checksum mismatch is detected, registration fails. In non-recursive mode, only files in the top-level directory are attached to the dataset, and subdirectories are skipped with a warning. In recursive mode, the client aims to create containers for directories containing only subdirectories and datasets for directories containing only files (raising an error if the top-level folder mixes files and directories). If the top-level directory has subdirectories, the user-supplied dataset_name is effectively ignored at that level (each subdirectory becomes its own dataset or container); if there are no subdirectories, the entire folder is registered as a single dataset.

  • dataset_meta (dict, not required): Additional metadata (e.g., {'project': 'myProject'}) to attach to the newly created dataset when: the dataset does not already exist, recursive=False, no_register=False and both dataset_scope and dataset_name are provided.

    Note: If multiple items share the same dataset_scope and dataset_name and the dataset has to be created, only the first item’s dataset_meta is used.

TYPE: Iterable[FileToUploadDict]
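The key rules above can be checked before calling upload. The following is a minimal sketch of such a pre-check, covering only two of the documented constraints (path and rse are required; dataset_scope and dataset_name must be given together). The function `check_upload_items` is a hypothetical helper, not part of Rucio, and it does not replace the validation that upload performs itself.

```python
def check_upload_items(items):
    """Return a list of human-readable problems found in the item dicts."""
    problems = []
    for index, item in enumerate(items):
        # 'path' and 'rse' are documented as required keys.
        for key in ("path", "rse"):
            if not item.get(key):
                problems.append(f"item {index}: missing required key '{key}'")
        # A dataset DID needs both its scope and its name.
        has_scope = "dataset_scope" in item
        has_name = "dataset_name" in item
        if has_scope != has_name:
            problems.append(
                f"item {index}: dataset_scope and dataset_name must be given together"
            )
    return problems


items = [
    {"path": "/data/file1.txt", "rse": "CERN-PROD"},
    {"path": "/data/file2.txt", "dataset_name": "mydataset"},  # two problems
]
for problem in check_upload_items(items):
    print(problem)
```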

summary_file_path

If specified, a JSON file is created with a summary of each successfully uploaded file, including checksum, PFN, scope, and name entries.

TYPE: Optional[Union[str, PathLike[str]]] DEFAULT: None
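A summary file can be read back with the standard json module. The exact layout of the file is not specified above, so this sketch assumes a top-level JSON object with one dictionary per uploaded file, each carrying scope, name, and pfn entries. The helper `list_uploaded` and the sample content are hypothetical and exist only to make the sketch self-contained.

```python
import json
import tempfile
from pathlib import Path


def list_uploaded(summary_path):
    """Return (scope, name, pfn) tuples from a summary file (assumed layout)."""
    with open(summary_path, encoding="utf-8") as handle:
        summary = json.load(handle)
    # Assumption: top-level object, one dictionary per uploaded file.
    return [
        (entry.get("scope"), entry.get("name"), entry.get("pfn"))
        for entry in summary.values()
    ]


# Hypothetical summary content, written here only so the sketch runs on its own.
sample = {
    "user.alice:file1.txt": {
        "scope": "user.alice",
        "name": "file1.txt",
        "pfn": "root://example.org//data/file1.txt",
    }
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "upload_summary.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    uploaded = list_uploaded(path)

print(uploaded)
```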

traces_copy_out

A list reference for collecting the trace dictionaries that Rucio generates while iterating over each file. A new trace dictionary is appended to this list for each file considered (even those ultimately skipped or already on the RSE).

TYPE: Optional[list[TraceBaseDict]] DEFAULT: None

ignore_availability

If set to True, the RSE's "write availability" is not enforced. By default, this is False, and uploading to an RSE marked as unavailable for writing raises RSEWriteBlocked.

TYPE: bool DEFAULT: False

activity

If you are uploading files without a parent dataset, this string sets the “activity” on the replication rule that Rucio creates for each file (e.g., "Analysis"), which can affect RSE queue priorities.

Note: If your files are uploaded into a dataset, the dataset’s replication rule does not use this activity parameter.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
int

Status code (0 if all files were uploaded successfully).

RAISES DESCRIPTION
NoFilesUploaded

Raised if none of the requested files could be uploaded.

NotAllFilesUploaded

Raised if some files were successfully uploaded, but others failed.

RSEWriteBlocked

Raised if ignore_availability=False but the chosen RSE does not allow writing.

InputValidationError

Raised if mandatory fields are missing, if conflicting DIDs are found, or if no valid files remain after input parsing.
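One way to handle these exceptions is to map each to a short status, as in the sketch below. To stay runnable without a Rucio installation, it defines local stand-in classes with the documented names; real code would import them from rucio.common.exception instead. The helpers `try_upload` and `fake_upload` are hypothetical.

```python
# Stand-ins so the sketch runs without Rucio; real code would import these
# names from rucio.common.exception instead.
class NoFilesUploaded(Exception):
    pass


class NotAllFilesUploaded(Exception):
    pass


class RSEWriteBlocked(Exception):
    pass


class InputValidationError(Exception):
    pass


def try_upload(upload, items):
    """Call upload(items) and map the documented exceptions to a short status."""
    try:
        upload(items)
    except InputValidationError as error:
        return f"bad input: {error}"
    except RSEWriteBlocked as error:
        return f"rse not writable: {error}"
    except NotAllFilesUploaded:
        return "partial"
    except NoFilesUploaded:
        return "failed"
    return "ok"


def fake_upload(items):
    """Hypothetical stand-in for UploadClient.upload, used only for this sketch."""
    if not items:
        raise NoFilesUploaded()
    return 0


print(try_upload(fake_upload, []))
print(try_upload(fake_upload, [{"path": "/data/file1.txt", "rse": "CERN-PROD"}]))
```

Distinguishing the partial case from the total failure lets a caller decide whether to retry only the files that did not make it.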

Examples:

Upload a single local file to the CERN-PROD RSE and write a JSON summary to upload_summary.json:

from rucio.client.uploadclient import UploadClient
upload_client = UploadClient()
items = [
    {"path": "/data/file1.txt",
     "rse": "CERN-PROD",            # target RSE
     "did_scope": "user.alice",     # optional; defaults to user.<account>
     "did_name": "file1.txt"}       # optional; defaults to basename
]
upload_client.upload(items, summary_file_path="upload_summary.json")

Recursively upload every file found under /data/dataset into a new dataset user.alice:mydataset on a random RSE that matches the expression tier=1; collect per-file trace dictionaries for later inspection:

from rucio.common.types import TraceBaseDict  # needed only for the annotation

traces: list[TraceBaseDict] = []
dir_item = {
    "path": "/data/dataset",
    "rse": "tier=1",                # RSE expression; one will be chosen
    "recursive": True,
    "dataset_scope": "user.alice",
    "dataset_name": "mydataset",
    "dataset_meta": {"project": "demo"},  # per the dataset_meta notes, applied only when recursive=False
}
upload_client.upload([dir_item], traces_copy_out=traces)

preferred_impl
preferred_impl(rse_settings, domain)

Select a suitable protocol implementation for read, write, and delete operations on the given RSE and domain.

This method checks the local client configuration (under the [upload] preferred_impl setting) and compares it against the list of protocols declared in rse_settings. It attempts to find a protocol that supports the required I/O operations (read, write, delete) in the specified domain. If multiple preferred protocols are listed in the config, it iterates in order and returns the first viable match.

PARAMETER DESCRIPTION
rse_settings

A dictionary describing RSE details, including available protocols and their domains.

TYPE: RSESettingsDict

domain

The network domain (e.g., 'lan' or 'wan') in which the protocol must support all operations.

TYPE: str

RETURNS DESCRIPTION
Optional[str]

The name of a protocol implementation that can handle read/write/delete for the specified domain, or None if no suitable protocol was found.
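The selection described above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: it takes the preferred list as an argument instead of reading the client configuration, and it assumes each protocol entry in rse_settings has an "impl" name and a "domains" mapping from domain to per-operation values, where a missing or zero value means the operation is unsupported. The function name `pick_impl` is hypothetical.

```python
def pick_impl(rse_settings, domain, preferred_impls):
    """Return the first preferred impl supporting read, write and delete in `domain`."""
    needed = ("read", "write", "delete")
    for impl in preferred_impls:  # honour the configured order
        for protocol in rse_settings.get("protocols", []):
            if protocol.get("impl") != impl:
                continue
            operations = protocol.get("domains", {}).get(domain, {})
            # Assumption: a missing or zero entry means the operation is unsupported.
            if all(operations.get(op) for op in needed):
                return impl
    return None


# Hypothetical RSE settings with two declared protocols.
rse_settings = {
    "protocols": [
        {
            "impl": "rucio.rse.protocols.gfal.Default",
            "domains": {
                "wan": {"read": 1, "write": 1, "delete": 1},
                "lan": {"read": 1, "write": 0, "delete": 0},
            },
        },
        {
            "impl": "rucio.rse.protocols.webdav.Default",
            "domains": {"wan": {"read": 1, "write": 0, "delete": 0}},
        },
    ]
}
preferred = [
    "rucio.rse.protocols.webdav.Default",
    "rucio.rse.protocols.gfal.Default",
]
print(pick_impl(rse_settings, "wan", preferred))
print(pick_impl(rse_settings, "lan", preferred))
```

In this example the first preferred entry is skipped for "wan" because it cannot write or delete there, so the second entry is returned; for "lan" no entry qualifies and the result is None.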

Functions