DBFS

Oracle customers purchasing Exadata are not permitted to install an HBA in their new hardware in order to connect to the legacy SAN. As ASM storage is provided internally, this limitation does not affect most production systems as external data can still be accessed via the Ethernet network. However, using a network protocol such as NFS may not be the preferred option during initial migration and data warehouse ETL processes. In these cases Oracle recommends staging data in a Database File System (DBFS). This post examines some basic concepts behind this new feature. The installation was tested on Oracle 11.2.0.1.0 running on 32-bit Enterprise Linux 5.

DBFS was introduced in Oracle 11.2. It enables the database to be used as a POSIX-compatible file system. It is based on the SecureFile technology which was introduced in Oracle 11.1 to support LOB functionality. Consequently DBFS is documented in the Oracle Database SecureFiles and Large Objects Developer's Guide. At the time of writing DBFS only appears to be supported on Linux (both 32-bit and 64-bit). As a Solaris option is now available with Exadata, presumably DBFS will be extended to support this operating system at some point in the future.

Prerequisites

DBFS is based on the FUSE library infrastructure. FUSE (Filesystem in USErspace) is an open-source project that supports the implementation of a fully functional file-system in a user-space program. More information can be found about the FUSE library on its project page (http://fuse.sourceforge.net). FUSE is delivered as a set of source files which must be compiled and linked on the target libraries. FUSE runs as a dynamic kernel module; it is not necessary to recompile the kernel during installation.

Prior to compiling the FUSE library, the Linux kernel development RPM must be installed. This package and its dependencies may already be installed. To verify use:

$ rpm -q kernel-devel
kernel-devel-2.6.18-92.el5

If this package does not exist it can be installed from the distribution media or using yum.

FUSE can be downloaded from http://fuse.sourceforge.net and delivered as a zipped archive called fuse-2.7.3.tar.gz. I copied this to a new directory owned by the root user called /root/fuse. Make sure you download version 2.7.3; subsequent versions will not work with the Oracle client.

As the root user unzip the archive using:

[root@server1]# tar -xzvf fuse-2.7.3.tar.gz

This will create a subdirectory, in this case called /root/fuse/fuse-2.7.3.

Before building the FUSE library determine the kernel directory for the current kernel. On Red Hat and Oracle Enterprise Linux this directory will have the format:

/usr/src/kernels/`uname -r`-`uname -p`

For example:

/usr/src/kernels/2.6.18-92.el5-i686

Change to the FUSE source directory and run the configure script specifying the kernel directory for the with-kernel parameter:

[root@server1]# cd /root/fuse/fuse-2.7.3
[root@server1]# ./configure --prefix=/usr --with-kernel=/usr/src/kernels/2.6.18-92.el5-i686

Build and install the library:

[root@server1]# make
[root@server1]# make install

The documentation recommends performing the following steps to complete the installation. I am not entirely convinced that any of these steps are necessary when the previous steps have been performed by the root user. However they all appear to be harmless.

[root@server1]# /sbin/depmod
[root@server1]# /sbin/modprobe fuse
[root@server1]# chmod 666 /dev/fuse
[root@server1]# echo "/sbin/modprobe fuse" >> /etc/rc.modules

Creating a DBFS file system

It is necessary to create a tablespace for the DBFS file system. For a simple test the new tablespace does not need to be very large. For example:

CREATE TABLESPACE dbfs1
DATAFILE '/u01/oradata/TEST/dbfs01.dbf' SIZE 100M;

I initially attempted to create a DBFS file system using the SYS (AS SYSDBA) user. However, the create script failed when SYS attempted to GRANT privileges to itself so I switched to a non-default user called US01. In order to manage DBFS the user must be assigned the DBFS_ROLE. For example:

[oracle@server1]$ sqlplus / as sysdba
SQL> GRANT dbfs_role TO us01

To create a file system use the dbms_create_filesystem.sql script which is located in $ORACLE_HOME/rdbms/admin. This script calls dbms_create_filesystem_advanced.sql which is located in the same directory. In my version, SQL*Plus was unable to locate the latter script unless it was in the current working directory. Therefore I had to change directory to $ORACLE_HOME/rdbms/admin before executing SQL*Plus. For example:

[oracle@server1]$ cd $ORACLE_HOME/rdbms/admin
[oracle@server1]$ sqlplus us01/us01
SQL> @dbfs_create_filesystem.sql DBFS1 dbfs1

The dbfs_create_filesystem.sql script takes two parameters; the tablespace name (DBFS1) and the file system name (dbfs1). In retrospect I should have assigned different names to the tablespace and file system.

Mounting a DBFS file system

Now the new file system exists within the database. However, we cannot access it externally yet. A new utility called dbms_client which is shipped in $ORACLE_HOME/bin in Oracle 11.2.0.1 and above.allows the DBFS file system to be mounted and managed from the command line. The 11.2.0.1 version of this utility is quite basic and it is easy to imagine customer feedback will drive improvements in functionality in subsequent releases.

As the root user create a mount point. In this case the mount point will be called /mnt/dbfs1 and will be owned by the oracle user:

[root@server1]# mkdir /mnt/dbfs1
[root@server1]# chown oracle:dba /mnt/dbfs1

In Oracle 11.2.0.1, unlike almost all other tools and utilities in $ORACLE_HOME/bin the dbms_client utility does not have a wrapper script. On my server invoking dbfs_client failed with the following error:

[oracle@server1]$ cd $ORACLE_HOME/bin
[oracle@server1]$ ./dbfs_client
./dbfs_client: error while loading shared libraries: libclntsh.so.11.1: cannot open shared object file: No such file or directory

This error occurs because the LD_LIBRARY_PATH environment variable has not been set correctly. This is an oversight, but one that is likely to be increasingly common as this environment variable is normally set by the other wrapper scripts. To fix the error set LD_LIBRARY_PATH, preferably in .bash_profile. For example:

export LD_LIBRARY_PATH=$ORACLE_HOME/lib

The client should now start and display the following help message:

[oracle@server1]$ cd $ORACLE_HOME/bin
[oracle@server1]$ ./dbfs_client
usage: ./dbfs_client @ [options]
  db_user:              Name of Database user that owns DBFS content repository filesystem(s)
  db_server:            A valid connect string for Oracle database server
                        (for example, hrdb_host:1521/hrservice)
  mountpoint:           Path to mount Database File System(s)
                        All the file systems owned by the database user will be seen at the mountpoint.
DBFS options:
  -o direct_io          Bypass the Linux page cache. Gives much better performance for large files.
                        Programs in the file system cannot be executed with this option.
                        This option is recommended when DBFS is used as an ETL staging area.
  -o wallet             Run ./dbfs_client in background.
                        Wallet must be configured to get credentials.
  -o failover           ./dbfs_client fails over to surviving database instance with no data loss.
                        Some performance cost on writes, especially for small files.
  -o allow_root         Allows root access to the filesystem.
                        This option requires setting 'user_allow_other' parameter in '/etc/fuse.conf'.
  -o allow_other        Allows other users access to the file system.
                        This option requires setting 'user_allow_other' parameter in '/etc/fuse.conf'.
  -o rw                 Mount the filesystem read-write. [Default]
  -o ro                 Mount the filesystem read-only. Files cannot be modified.
  -o trace_file=STR     Tracing  | 'syslog'
  -o trace_level=N      Trace Level: 1->DEBUG, 2->INFO, 3->WARNING, 4->ERROR, 5->CRITICAL [Default: 4]
  -h                    help
  -V                    version

Note that in Oracle 11.2.0.1 the above usage message omits the --command option.

This version of the dbfs_client utility does not support the bequeath protocol, so on the server it is not possible to identify the target instance using the $ORACLE_SID environment variable. It is therefore necessary to ensure that a TNS address is specified in $TNS_ADMIN/tnsnames.ora.

TEST=
(DESCRIPTION=
  (ADDRESS=
    (PROTOCOL=TCP)(HOST=vm7)(PORT=1521)
  )
  (CONNECT_DATA=(SERVICE_NAME=TEST))
)

It is also possible to specify a database service and DBFS does support instance failover though I have not yet tested it in a RAC environment.

To mount a DBFS file system use the following syntax:

dbfs_client @ [options]

For example:

[oracle@server1]$ cd $ORACLE_HOME/bin
[oracle@server1]$ ./dbfs_client us01@TEST /mnt/dbfs1

A password will be requested. In this case it is the Oracle database password for the US01 user.

By default dbfs_client will block until the file system is un-mounted.

In this version it is possible to workaround the blocking of dbfs_client using the nohup command to run the utility in the background. For example:

[oracle@server1]$ cd $ORACLE_HOME/bin
[oracle@server1]$ nohup ./dbfs_client us01@TEST /mnt/dbfs1 &

In another shell session verify that the file system has been mounted using the mount command. For example:

[oracle@server1]$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb1 on /u01/oradata type ext3 (rw)
/dev/sdc1 on /home/oracle/stage type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
dbfs on /mnt/dbfs1 type fuse (rw,nosuid,nodev,max_read=1048576,default_permissions,user=oracle)

At this point it is possible to navigate into the new file system from the Linux shell using commands such as pwd, cd, mkdir, cp, rm etc.

For example:

[oracle@server1]$ cd /mnt/dbfs1/dbfs1
[oracle@server1]$ pwd
/mnt/dbfs1/dbfs1
[oracle@server1]$ mkdir dir1
[oracle@server1]$ cd dir1
[oracle@server1]$ pwd
/mnt/dbfs1/dbfs1/dir1
[oracle@server1]$ cp /tmp/gp/*.csv /mnt/dbfs1/dbfs1/dir1
[oracle@server1]$ ls -l
total 42
-rw-r--r-- 1 oracle oinstall  1129 Jul  7 19:33 circuit.csv
-rw-r--r-- 1 oracle oinstall   161 Jul  7 12:02 classification.csv
-rw-r--r-- 1 oracle oinstall   535 Jul  7 12:02 country.csv
-rw-r--r-- 1 oracle oinstall 12762 Jul  7 12:02 driver.csv
-rw-r--r-- 1 oracle oinstall   708 Jul  7 12:02 engine.csv
-rw-r--r-- 1 oracle oinstall   506 Jul  7 12:02 grandprix.csv
-rw-r--r-- 1 oracle oinstall 22385 Jul  7 12:02 race.csv
-rw-r--r-- 1 oracle oinstall   550 Jul  7 12:02 season.csv
-rw-r--r-- 1 oracle oinstall  1487 Jul  7 12:02 team.csv

It is possible to copy files in and out of the DBFS using standard operating system commands.

Unmounting a DBFS filesystem

To unmount the DBFS file system use the fusemount command. For example:

[oracle@server1]$ fusermount -u /mnt/dbfs1

Creating an Oracle Wallet

It is also possible to include the filesystem in /etc/fstab so that it can be mounted using the mount command. In order to ensure that the mount command is executed with the correct Oracle credentials, an Oracle wallet must be created. Note that use of Oracle Wallets may require licensing of the Oracle Advanced Security option. There are several steps to this process:

As the oracle user create a subdirectory for the wallet e.g.:

[oracle@server1]$ mkdir /home/oracle/wallet

As the oracle user create the wallet.

[oracle@server1]$ mkstore -wrl /home/oracle/wallet -create
Oracle Secret Store Tool : Version 11.2.0.1.0 - Production
Copyright (c) 2004, 2009, Oracle and/or its affiliates. All rights reserved.

Enter password:
Enter password again:

This creates the following files in /home/oracle/wallet:

[oracle@server1]$ ls -l /home/oracle/wallet
-rw------- 1 oracle oinstall 3589 Jul  7 11:11 cwallet.sso
-rw------- 1 oracle oinstall 3512 Jul  7 11:11 ewallet.p12

Edit $TNS_ADMIN/sqlnet.ora and add the following entries:

WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/oracle/wallet)))

SQLNET.WALLET_OVERRIDE=TRUE

Create the DBConnectString credential in the wallet specifying the Oracle username and password that should be associated with this credential. In this case both the username and password are "us01".

[oracle@server1]$ mkstore -wrl /home/oracle/wallet -createCredential DBConnectString us01 us01
Oracle Secret Store Tool : Version 11.2.0.1.0 - Production
Copyright (c) 2004, 2009, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:  << Oracle11gR2 >> 

Create credential oracle.security.client.connect_string1

Following creation of this credential it should be possible to mount the new DBFS file system using the wallet. For example:

[oracle@server1]$ dbfs_client -o wallet /@DBConnectString /mnt/dbfs1

Note that the "/@" syntax specifies that the connection should obtain its credentials from the Oracle wallet.

In Oracle 11.2.0.1, the above command mounts the DBFS file system and returns to the command line without blocking.

Configuring the mount command

A few more steps are required to enable the mount command to use an entry in /etc/fstab.

The 11.2 documentation recommends that a symbolic link should be created between $ORACLE_HOME/bin/dbfs_client and /sbin/mount.dbfs. In my opinion the recommended syntax has been reversed. I believe that the following is correct:

[root@server1]# ln -s $ORACLE_HOME/bin/dbfs_client /sbin/mount.dbfs

However, I would not recommend using a symbolic link. Although I modified the bash profile for my root user to include $LD_LIBRARY_PATH environment variable, the mount command appear to unset this variable when it called the mount.dbfs helper. Therefore it may be more appropriate to create /sbin/mount.dbfs as a script containing commands similar to the following:

#!/bin/bash
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
nohup $ORACLE_HOME/bin/dbfs_client $* 2> /dev/null &

A new operating system group is required for fuse. For example:

[root@server1]# /usr/sbin/groupadd fuse

The user that will be mounting the DBFS file system should be added to the fuse group. I used the oracle user which on my server already belonged to primary group oinstall (specified in /etc/passwd) and secondary group dba (specified in /etc/group). I used the following command:

[root@server1]# usermod -G dba,fuse oracle

The above command adds the oracle user to both the dba and fuse groups. Take care when specifying this command - it requires a list of all secondary groups so in this example if only the fuse group is specified (usermod -G fuse oracle) then the oracle user will be removed from the dba group. The changes should be written to /etc/group.

Finally the following line should be added to /etc/fstab:

/sbin/mount.dbfs#/@DBConnectString /mnt/dbfs1 fuse rw,user,noauto 0 0

Note the use of the Oracle wallet credential.

If the above configuration has been completed successfully it should be possible to mount the new DBFS file system using the mount command. For example:

[root@server1]# mount /mnt/dbfs1

The command will mount the file system and then block until the file system is unmounted. Again the nohup command can be used to ensure the mount command runs as a background process:

[root@server1]# nohup mount /mnt/dbfs1 &

I remain unconvinced that enabling use of the mount command for the DBFS file system justifies the amount of configuration required. It is not possible to configure DBFS to be auto-mounted during a system boot as this is not supported by the current version of FUSE. There is also a timing issue - the database must already be started before the DBFS file system can be mounted. In view of potential uses for DBFS (migration and data warehouse loading) it is probably sufficient to mount the DBFS file systems manually when required.

Executing Commands

In addition to the shell interface it is also possible to execute commands using dbfs_client. Whilst this is unnecessary when connected directly to the database server, it could be useful when the dbfs_client utility is being executed on another client.

Unfortunately in Oracle 11.2.0.1 dbfs_client does not appear to be sufficiently mature to accept options and to execute commands in the same invocation. This means that it is not possible to execute an operating system whilst using a wallet. Consequently it is necessary to manually enter a password for each execution of dbfs_client which restricts scripting options.

The following are examples of dbfs_client operating system commands:

$ dbfs_client us01@TEST --command mkdir dbfs:/dbfs1/dir1
$ dbfs_client us01@TEST --command cp /home/oracle/gp dbfs:/dbfs1/dir1
$ dbfs_client us01@TEST --command ls -l dbfs:/dbfs1/dir1
-rw-r--r--            oracle        oinstall          725859    Jul 07 19:33    dbfs:/dbfs1/dir1/car.csv
-rw-r--r--            oracle        oinstall            1129    Jul 07 19:33    dbfs:/dbfs1/dir1/circuit.csv
-rw-r--r--            oracle        oinstall             708    Jul 07 12:02    dbfs:/dbfs1/dir1/engine.csv
-rw-r--r--            oracle        oinstall             550    Jul 07 12:02    dbfs:/dbfs1/dir1/season.csv
-rw-r--r--            oracle        oinstall             161    Jul 07 12:02    dbfs:/dbfs1/dir1/classification.csv
-rw-r--r--            oracle        oinstall             506    Jul 07 12:02    dbfs:/dbfs1/dir1/grandprix.csv
-rw-r--r--            oracle        oinstall           22385    Jul 07 12:02    dbfs:/dbfs1/dir1/race.csv
-rw-r--r--            oracle        oinstall           12762    Jul 07 12:02    dbfs:/dbfs1/dir1/driver.csv
-rw-r--r--            oracle        oinstall             535    Jul 07 12:02    dbfs:/dbfs1/dir1/country.csv
-rw-r--r--            oracle        oinstall            1487    Jul 07 12:02    dbfs:/dbfs1/dir1/team.csv
$ dbfs_client us01@TEST --command rm dbfs:/dbfs1/dir1/car.csv

When copying large files to the DBFS file system, -o directio may be an attractive dbfs_client option as it bypasses the operating system page cache.

Conclusion

DBFS is relatively easy to set up and manage. In Oracle 11.2.0.1 there are still plenty of niggling design and usability issues. However I am confident that these issues will be rapidly resolved as DBFS appears to be a fundamental component of the Exadata strategy and should therefore experience accelerated development for a few releases. It is important to understand the differences between DBFS and ACFS. Note that in Oracle 11.2.0.2 a separate licence is required for ACFS on some platforms.