Medical Bioinformatics and e-Bioscience

Data Management on the Grid

Grid jobs need to access files located remotely. These files can be sent from the workstation to the job and back every time using a mechanism called "Sandbox", which unfortunately only works for small files. It can also be inconvenient to send the files back and forth all the time if the workstation's network connection is not very fast. Normally the files needed for a job are stored on "grid storage resources".

At the e-BioInfra we encourage users to access data exclusively through the Logical File Catalog (LFC), which provides a virtual interface to other distributed storage systems. We advise accessing files only via the LFC using the VBrowser. In this manner the files receive intuitive names (/grid/vlemed/mydirectory/myfile), they can be stored at several grid sites ("replicated") and still be manipulated easily from a graphical interface. Sometimes, however, it is necessary to perform actions that are not (yet) well supported by the VBrowser. For example, careful setting of access privileges can be more efficient with command-line utilities. Also, files can become corrupted after long periods of time because of unplanned or planned changes in the storage servers (e.g., change of domain name, disk crash, software upgrade, or bad luck). The resulting clean-up actions normally need to be performed using command-line utilities installed on gLite grid user interface systems (grid UI).

How it works

The story is a bit more complex than this, but in a nutshell this is what matters.

The LFC is actually a database that stores links from Logical File Names (LFN) to Grid Unique Identifiers (GUID), which correspond to files on the grid. Note that directories exist only as LFNs. Many LFNs can point to the same GUID, similarly to "aliases" or "links" in regular operating systems (see figure 1).

Figure 1 - Mapping between Logical File Name (LFN), Grid Unique Identifier (GUID) and Physical Unique Resource Locator (coined Site URL or SURL).

Physical files are stored on some data server of the grid. Such a server is called a Storage Element (SE, when only one disk is associated) or a Storage Resource Manager (SRM, when more disks are available at the site) - see figure 2. The link to a physical file is called a "SURL" (Site Unique Resource Location). Several physical copies of a file can be associated with a single GUID; each of them is called a "replica". The LFC manages all associations between LFN, GUID, SURL and replicas. It is important to select the location of the physical files among the sites that support the VO, taking into account, for example, sufficient space, security agreements, and maintenance.
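
The chain LFN -> GUID -> SURL can be pictured with a toy lookup table. The sketch below is only an illustration in plain shell: the file names, GUIDs, storage hosts and the helper replicas_of are all made up, and a real catalog is queried with the gLite tools rather than with text tables.

```shell
# Toy stand-in for the catalog. Every name, GUID and SURL here is hypothetical.
# Two LFNs share guid-0001, which models an alias.
lfn_to_guid='/grid/vlemed/demo/a.txt guid-0001
/grid/vlemed/demo/alias-of-a.txt guid-0001
/grid/vlemed/demo/b.txt guid-0002'

# guid-0001 has two physical copies, which models two replicas.
guid_to_surl='guid-0001 srm://se1.example.org/data/a_1
guid-0001 srm://se2.example.org/data/a_2
guid-0002 srm://se1.example.org/data/b_1'

# Resolve an LFN to the SURLs of all its replicas (LFN -> GUID -> SURLs).
# Returns non-zero when the LFN is not in the toy catalog.
replicas_of() {
    guid=$(printf '%s\n' "$lfn_to_guid" | awk -v lfn="$1" '$1 == lfn { print $2 }')
    [ -n "$guid" ] || return 1
    printf '%s\n' "$guid_to_surl" | awk -v g="$guid" '$1 == g { print $2 }'
}

# The alias resolves to the same two replicas as the original name.
replicas_of /grid/vlemed/demo/alias-of-a.txt
```

Because the alias and the original name share one GUID, both resolve to the same list of replicas, which is why users can work with LFNs without knowing any SURL.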

In the VBrowser, replica information can be seen by right-clicking on the file, selecting the submenu lfc and then selecting the option View Replica's. It is also possible to indicate the preferred locations for writing new files on the grid: set replicaCreationMode to "Preferred" and list the SEs or SRMs in listPreferredSE. See more information on VBrowser.

Figure 2 - Grid file access using the LFC from a VBrowser or gLite command-line interface. The physical files are stored in Sites a, b, c, and the LFC is located in Site X. The translation between names of logical and physical files is performed by the LFC, so the users do not need to know the SURLs.

Security

As in regular operating systems, it is possible to define file permissions (sometimes called "access control lists"):

  • only the owner can read/write the file/directory
  • only members of the same VO can read/write the file/directory
  • Note: although it is possible to have subgroups in a VO, we skip this here because at the moment we do not have any groups at the vlemed VO.

Controlling access to data stored on the grid is not trivial because several systems and protocols are involved (see figure 2). Access control needs to be defined consistently in all of them to guarantee the desired security level. For example, it is in principle possible to access the same file in various ways, using one or more LFNs or one or more SURLs. Moreover, it is possible to obtain the SURL for a given LFN and directly access the physical file (see figure 3). Although this is not recommended because it can lead to inconsistencies (dead links on the LFC), it is a valid operation. Note that if the file permissions are different at the LFC and the SE, the access control will be inconsistent.

Figure 3 - Physical files can be directly manipulated with SURLs using lcg-* command-line utilities.

To restrict access it is necessary to make sure that all the involved systems support the desired access control mode and that file permissions are correctly set in all of them. This may vary from site to site, because it depends on the middleware adopted and the management policy.

Two valuable sources of information are:

  • Wiki @ SARA.nl: detailed information about how to protect files at the LFC level.
  • Wiki @ NIKHEF: detailed information about all levels of access control, including testing of the various functionalities.

IMPORTANT: It is necessary to set permissions explicitly at both the LFC and the storage element levels! Setting up the ACL at the LFC alone is not enough to guarantee security; the underlying storage elements must also be configured.

Below is a summary of the various mechanisms for controlling grid file permissions, where DPM, dCache and StoRM are different programs that implement the SRM protocol.

Type   | Granularity          | Access control         | Command                   | Further info | Sites
LFC    | only for directories | user, VO               | lfc-*                     | link         |
DPM    | file/dir             | user, VO, group levels | dpns-{set,get}acl         | link         | nikhef, lsgrid?
dCache | file/dir?            | only VO                | srm-{get,set}-permissions | link         | sara
StoRM  | none                 | none                   | -                         | link         | groningen

Virtual Organisations and Groups

Users on the Grid are organized in so-called Virtual Organisations (VOs). This organisational model also translates to data management. Therefore, as explained above, access to files can be restricted to single users or to VOs.

A feature available to VO Administrators is the creation of additional (sub)groups or roles. When these attributes are given to VO members by the VO Admin, an additional level of permission organisation can be achieved. If you want to share files with just a subset of the people in a VO, you can ask your VO admin to create a group for you and the others; permission to access the data in question can then be given to only those people.

If a group is created, and you are a member of it, you need to enable your proxy to carry this permission. The following command creates a proxy for a subgroup "secret" in the vlemed VO:

$ voms-proxy-init --voms vlemed:/vlemed/secret

Now all new files on SRM and LFC will be created with the group set to "vlemed/secret".
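
The argument passed to --voms follows the pattern VO:/VO/group. As a small sketch, the hypothetical helper voms_arg below assembles that string from a VO name and an optional subgroup, and only prints the resulting command so that it can be tried without a grid UI.

```shell
# Build the argument for "voms-proxy-init --voms" from a VO name and an
# optional subgroup. The helper name voms_arg is hypothetical.
voms_arg() {
    vo=$1
    group=${2:-}
    if [ -n "$group" ]; then
        printf '%s:/%s/%s\n' "$vo" "$vo" "$group"
    else
        # Without a subgroup the plain VO name is used.
        printf '%s\n' "$vo"
    fi
}

# Print the command instead of running it.
echo "voms-proxy-init --voms $(voms_arg vlemed secret)"
```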

If you want to change the permissions on existing files or directories there are various tools you can use.

The following command shows the permissions of a directory on the LFC:

$ lfc-getacl /grid/vlemed/mark
# file: /grid/vlemed/mark
# owner: /O=dutchgrid/O=users/O=uva/OU=wins/CN=Mark Alexander Santcroos
# group: vlemed
user::rwx
group::rwx        #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x

Note the "default" entries. These are here because the permissions are for a directory, and the default permissions are applied to new files created in this directory.

The following command gets the permissions of a file on the LFC:

$ lfc-getacl /grid/vlemed/mark/newfile1.txt
# file: /grid/vlemed/mark/newfile1.txt
# owner: /O=dutchgrid/O=users/O=uva/OU=wins/CN=Mark Alexander Santcroos
# group: vlemed
user::rw-
group::rw-        #effective:rw-
other::r--
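
When many files have to be checked, it can help to read this output with a small filter instead of by eye. The sketch below is one possible way to do that; the helper names acl_other_perms and is_world_readable are made up for this example, and the sample text is copied from the file listing above rather than produced by a live lfc-getacl call.

```shell
# Print the permission bits of the "other" entry found in lfc-getacl output
# read from stdin. Lines such as "default:other::r-x" are skipped, because
# they only describe what new files in a directory will receive.
acl_other_perms() {
    awk -F'::' '$1 == "other" { sub(/[ \t]*#.*$/, "", $2); print $2 }'
}

# Succeeds when the "other" entry grants read permission.
is_world_readable() {
    case "$(acl_other_perms)" in
        r*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Sample input; on a grid UI one would pipe lfc-getacl into the function.
sample='# file: /grid/vlemed/mark/newfile1.txt
# group: vlemed
user::rw-
group::rw-        #effective:rw-
other::r--'

if printf '%s\n' "$sample" | is_world_readable; then
    echo "readable by others"
else
    echo "not readable by others"
fi
```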

To change the permissions on the LFC, you can use the following command (in this case to remove read access for "others"):

$ lfc-setacl -m o:: /grid/vlemed/mark/newfile1.txt

This entry on the LFC is only the catalog entry; changing it does not actually protect the file on the storage element. To get the URL(s) of the actual stored files, use the following command:

$ lcg-lr lfn:/grid/vlemed/mark/newfile1.txt
srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/vlemed/vletgenerated/2010-03-02/newfile1_txt_7d7a6523-6d14-4387-afe6-a45d83fcbee1
srm://tbn18.nikhef.nl/dpm/nikhef.nl/home/vlemed/vletgenerated/2010-03-02/newfile1_txt_cd4032d4-003d-4313-92fe-4bb59c46daa7

Now we know that one replica of the file is stored on a DPM storage element called "tbn18.nikhef.nl". To act on the permissions there, we need the commands "dpns-getacl" and "dpns-setacl".

$ export DPNS_HOST=tbn18.nikhef.nl
$ dpns-getacl /dpm/nikhef.nl/home/vlemed/vletgenerated/2010-03-02/newfile1_txt_cd4032d4-003d-4313-92fe-4bb59c46daa7
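
The host name and the path used above are simply the two halves of the SURL. As a sketch, the hypothetical helpers surl_host and surl_path below split a SURL of the simple form srm://host[:port]/path with plain shell parameter expansion. SURLs that carry a query part (for example "?SFN=") are not handled. The commands are printed rather than executed.

```shell
# Host part of a SURL such as srm://host[:port]/path.
surl_host() {
    rest=${1#srm://}                  # strip the scheme
    hostport=${rest%%/*}              # everything before the first "/"
    printf '%s\n' "${hostport%%:*}"   # drop an optional ":port"
}

# Path part of the same SURL, including the leading "/".
surl_path() {
    rest=${1#srm://}
    printf '/%s\n' "${rest#*/}"
}

# SURL copied from the lcg-lr output above.
surl='srm://tbn18.nikhef.nl/dpm/nikhef.nl/home/vlemed/vletgenerated/2010-03-02/newfile1_txt_cd4032d4-003d-4313-92fe-4bb59c46daa7'

# Print the commands rather than run them, so the sketch works off the grid.
echo "export DPNS_HOST=$(surl_host "$surl")"
echo "dpns-getacl $(surl_path "$surl")"
```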

What can go wrong

When you list a file with the VBrowser, only the information in the LFC database is used and the content of the file is not accessed. Only when you open the file, for example with the text editor, is the SURL retrieved and the file actually accessed. It is therefore possible that a file looks fine when browsing, while its actual content is temporarily or permanently unavailable.

This can happen because:

  • an SE is removed from the grid
  • an SE changes name (this normally means that the old data is lost)
  • an LFN no longer has a SURL
  • a file on the SE is corrupted/lost

You'll notice that these files cannot be easily deleted with the VBrowser, so you'll need to log in to a gLite UI system and fix it by hand.

Below you will find scripts that can help you fix things more easily. See also SARA's wikipage on data management.

IMPORTANT: This remains a tricky thing to do, so please contact us in case of any doubt.

Useful Scripts

These scripts write many messages to stderr (2>). The most useful (and readable) output is written to stdout (>). If you want to separate the two types of messages, invoke the script like this:

script.sh parameters 2> file_with_errors > file_with_output
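
The effect of the two redirections can be tried without any grid tools. In the sketch below, demo_script is a hypothetical stand-in that behaves like the scripts on this page in one respect only: it writes a message to stderr and a result line to stdout.

```shell
# Stand-in for one of the scripts (hypothetical): the progress message goes
# to stderr, the one-line result goes to stdout.
demo_script() {
    echo "testing replica of $1 ..." >&2
    echo "OK $1"
}

workdir=$(mktemp -d)

# "2>" redirects stderr, ">" redirects stdout.
demo_script /grid/vlemed/silvia/myfile 2> "$workdir/file_with_errors" > "$workdir/file_with_output"

echo "stdout file contains: $(cat "$workdir/file_with_output")"
echo "stderr file contains: $(cat "$workdir/file_with_errors")"

rm -rf "$workdir"
```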

Checking if all files in a directory are ok

checkAndCleanLFCDir-rec.sh recursively searches for files in all subdirectories, retrieves the SURLs of all replicas and tests whether they are still available. Optionally the program can force a delete of all the files with corrupted SURLs (argument clean). Note: this script calls the other scripts: checkLFCFile.sh, forceDeleteLFCFile.sh and listFilesInLFCDir.sh.

Download

Run:

checkAndCleanLFCDir-rec.sh <directory> [clean]

Examples:

checkAndCleanLFCDir-rec.sh /grid/vlemed/silvia

(will test all files)

checkAndCleanLFCDir-rec.sh /grid/vlemed/silvia clean

(will test all files and delete the files with a corrupted SURL)

Checking if given files are ok

checkLFCFile.sh retrieves the SURLs for all replicas of a given LFN and tests if they are still available. The LFNs are read from the stdin, and the messages are written to the stderr.

On the stdout the program writes one line for each tested file, containing the following fields separated by spaces:

  • OK or ERROR
  • LFN
  • SURL

Download

Run:

checkLFCFile.sh <op>
op = 2: use lcg-ls to test (only tries to list the file)
op = 3: use lcg-cp to test (tries to download the file)


Examples:

echo "/grid/vlemed/silvia/myfile" | checkLFCFile.sh 2 > list
(will check one file, write messages to the terminal and write the status + SURL to the file "list")

checkLFCFile.sh 2 2> errors
(will read file names from the terminal, write messages to the file "errors" and write the status + SURL to the terminal)
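
Once the status lines have been collected, a short filter can summarize them. The sketch below assumes the line format described above (status first, fields separated by spaces). The helper name summarize_status and the sample LFNs and SURLs are placeholders, not real files.

```shell
# Count OK and ERROR lines in checkLFCFile.sh-style output read from stdin.
# Assumes the status is the first space-separated field.
summarize_status() {
    awk '$1 == "OK" { ok++ } $1 == "ERROR" { err++ } END { printf "OK=%d ERROR=%d\n", ok, err }'
}

# Hypothetical sample lines; the LFNs and SURLs are placeholders.
printf '%s\n' \
    'OK /grid/vlemed/silvia/file1 srm://se.example.org/data/file1' \
    'ERROR /grid/vlemed/silvia/file2 srm://se.example.org/data/file2' \
    'OK /grid/vlemed/silvia/file3 srm://se.example.org/data/file3' \
    | summarize_status
```

With a real run, the same function could read the file written in the first example, for instance with summarize_status < list.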

Deleting given corrupted files from the LFC

forceDeleteLFCFile.sh removes a given file from the LFC even if one or more of the replicas are corrupted. The LFNs are read from the stdin, and the messages are written to the stderr.

Download

Run:

forceDeleteLFCFile.sh

Examples:

echo "/grid/vlemed/silvia/myfile" | forceDeleteLFCFile.sh
(will delete one file)

Listing complete LFN paths for files in a given directory

listFilesInLFCDir.sh lists all files in a given directory (read from stdin). It writes complete file paths to the stdout and messages to the stderr. It does not recurse into subdirectories.

Download

Run:

listFilesInLFCDir.sh

Examples:

echo "/grid/vlemed/silvia/" | listFilesInLFCDir.sh > list
(will write messages to the stderr and write a list of all files in this directory in the file "list")

echo "/grid/vlemed/silvia/" | listFilesInLFCDir.sh | checkLFCFile.sh 2 \
| grep ERROR | cut -d" " -f2 | forceDeleteLFCFile.sh
(will delete all files that do not have valid SURLs)
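
Since the last step of this pipeline deletes catalog entries, it may be worth making the selection step stricter. "grep ERROR" matches any line that contains the string ERROR anywhere, including an OK line whose file name happens to contain it. The sketch below is one possible replacement for the "grep ERROR | cut" pair: it compares the first field exactly and prints each LFN once. The helper name failed_lfns and the sample lines are hypothetical. Like cut, it assumes that LFNs contain no spaces.

```shell
# Print the LFN of every line whose status field is exactly ERROR,
# once per LFN even if several replicas of the same file failed.
failed_lfns() {
    awk '$1 == "ERROR" && !seen[$2]++ { print $2 }'
}

# Hypothetical sample: the first line is OK but has ERROR in its file name,
# the other two lines are failed replicas of one and the same file.
printf '%s\n' \
    'OK /grid/vlemed/silvia/ERROR_report.txt srm://se.example.org/data/r1' \
    'ERROR /grid/vlemed/silvia/broken.dat srm://se1.example.org/data/b1' \
    'ERROR /grid/vlemed/silvia/broken.dat srm://se2.example.org/data/b2' \
    | failed_lfns
```

Inspecting the printed list before piping it into forceDeleteLFCFile.sh gives a chance to catch surprises before anything is removed.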

Download

File                                           | Date
checkAndCleanLFCDir-rec (includes all scripts) | 2009-06-10
checkLFCFile.sh                                | 2009-06-10
listFilesInLFCDir.sh                           | 2009-06-10
forceDeleteLFCFile.sh                          | 2009-06-10

For simplicity, we will always publish here only the latest version. Please contact us for feedback, describing in detail the problem / question / suggestion / feature request / any-other-thing.

Contact Us!

Things we need to look at

  • glite secure storage service
  • is there a umask-like mechanism?
  • is the protection at the directory level inherited by subdirs and files?
  • what happens when the middleware is upgraded? can we know for sure that the ACLs will remain valid?

Topic revision: r23 - 2011-10-07 - ShayanShahand
 