db_intro
NAME
db - the DB library overview and introduction
DESCRIPTION
The DB library is a family of groups of functions that
provides a modular programming interface to transactions
and record-oriented file access. The library includes
support for transactions, locking, logging and file page
caching, as well as various indexed access methods. Many
of the functional groups (e.g., the file page caching
functions) are useful independent of the other DB func-
tions, although some functional groups are explicitly
based on other functional groups (e.g., transactions and
logging). For a general description of the DB package,
see db_intro(3).
The DB library does not provide user interfaces, data
entry GUIs, SQL support or any of the other standard
user-level database interfaces. What it does provide are
the programmatic building blocks that allow you to easily
embed database-style functionality and support into other
objects or interfaces.
ARCHITECTURE
The DB library supports two different models of applica-
tions: client-server and embedded.
In the client-server model, a database server is created
by writing an application that accepts requests via some
form of IPC and issues calls to the DB functions based on
those queries. In this model, applications are client
programs that attach to the server and issue queries. The
client-server model trades performance for protection, as
it does not require that the applications share a protec-
tion domain with the server, but IPC/RPC is generally
slower than a function call. In addition, this model sim-
plifies the creation of network client-server applica-
tions.
In the embedded model, an application links the DB library
directly into its address space. This provides for faster
access to database functionality, but means that the
applications sharing log files, lock manager, transaction
manager or memory pool manager have the ability to read,
write, and corrupt each other's data.
It is the application designer's responsibility to select
the appropriate model for their application.
Applications require a single include file, <db.h>, which
must be installed in an appropriate location on the sys-
tem.
The DB library is made up of five major subsystems, as
follows:
Access methods
The access methods subsystem is made up of general-purpose
support for creating and accessing files formatted as
B+trees, hashed files, and fixed- and variable-length
records. These modules are useful in the
absence of transactions for processes that want fast,
formatted file support. See db_open(3) and db_cur-
sor(3).
Locking
The locking subsystem is a general-purpose lock man-
ager used by DB. This module is useful in the
absence of the rest of the DB package for processes
that want a fast, configurable lock manager. See
db_lock(3) for more information.
Logging
The logging subsystem is the logging support used to
support the DB transaction model. It is largely spe-
cific to the DB package, and unlikely to be used
elsewhere. See db_log(3) for more information.
Memory Pool
The memory pool subsystem is the general-purpose
shared memory buffer pool used by DB. This module is
useful outside of the DB package for processes that
want page-oriented, cached, shared file access. See
db_mpool(3) for more information.
Transactions
The transaction subsystem implements the DB transac-
tion model. It is largely specific to the DB pack-
age. See db_txn(3) for more information.
There are several stand-alone utilities that support the
DB environment. They are as follows:
db_archive
The db_archive utility supports database backup,
archival and log file administration. See
db_archive(1) for more information.
db_recover
The db_recover utility runs after an unexpected DB or
system failure to restore the database to a consis-
tent state. See db_recover(1) for more information.
db_checkpoint
The db_checkpoint utility runs as a daemon process,
monitoring the database log and periodically issuing
checkpoints. See db_checkpoint(1) for more informa-
tion.
db_deadlock
The db_deadlock utility runs as a daemon process,
periodically traversing the database lock structures
and aborting transactions when it detects a deadlock.
See db_deadlock(1) for more information.
db_dump
The db_dump utility writes a copy of the database to
a flat-text file in a portable format. See
db_dump(1) for more information.
db_load
The db_load utility reads the flat-text file produced
by db_dump, and loads it into a database file. See
db_load(1) for more information.
db_stat
The db_stat utility displays statistics for databases
and database environments. See db_stat(1) for more
information.
NAMING AND THE DB ENVIRONMENT
The DB application environment is described by the
db_appinit(3) manual page. The db_appinit function is
used to create a consistent naming scheme for all of the
subsystems sharing a DB environment. If db_appinit is not
called by a DB application, naming is performed as speci-
fied by the manual page for the specific subsystem.
DB applications that run with additional privilege should
always call the db_appinit function to initialize DB nam-
ing for their application. This ensures that the environ-
ment variables DB_HOME and TMPDIR will only be used if the
application explicitly specifies that they are safe.
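The policy described above can be sketched outside of DB. The
helper below is hypothetical (ex_choose_home is not a DB
function) and the precedence it uses, explicit argument first,
is an assumption made for illustration; db_appinit(3) documents
the actual rules. It only shows the idea that DB_HOME is
consulted solely when the caller has said the environment can
be trusted.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of DB: pick a home directory.
 * The environment is consulted only if trust_environment is
 * non-zero, which a privileged program would normally leave 0.
 * Returns NULL if no home directory could be determined.
 */
const char *
ex_choose_home(const char *explicit_home, int trust_environment)
{
	const char *env;

	/* Assumed precedence: an explicit argument always wins. */
	if (explicit_home != NULL)
		return (explicit_home);

	/* Only look at DB_HOME when the caller opted in. */
	if (trust_environment &&
	    (env = getenv("DB_HOME")) != NULL && env[0] != '\0')
		return (env);

	return (NULL);
}
```

A privileged program would call ex_choose_home(path, 0), so
that a DB_HOME value planted by the invoking user is never
consulted.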
ADMINISTERING THE DB ENVIRONMENT
A DB environment consists of a database home directory and
all the long-running daemons necessary to ensure continued
functioning of DB and its applications. In the presence
of transactions, the checkpoint daemon, db_checkpoint,
must be run as long as there are applications present (see
db_checkpoint(1) for details). When locking is being
used, the deadlock detection daemon, db_deadlock, must be
run as long as there are applications present (see
db_deadlock(1) for details). The db_archive utility
provides information to facilitate log reclamation and
creation of database snapshots (see db_archive(1) for
details). After application or system failure, the
db_recover utility must be run before any applications are
restarted to return the database to a consistent state
(see db_recover(1) for details).
The simplest way to administer a DB application environ-
ment is to create a single ``home'' directory which houses
all the files for the applications that are sharing the DB
environment. In this model, the shared memory regions
(i.e., the locking, logging, memory pool, and transaction
regions) and log files will be stored in the specified
directory hierarchy. In addition, all data files speci-
fied using relative pathnames will be named relative to
this home directory. When recovery needs to be run (e.g.,
after system or application failure), this directory is
specified as the home directory to db_recover(1), and the
system is restored to a consistent state, ready for the
applications to be restarted.
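As an illustration of naming relative to the home directory,
the following is a minimal sketch. ex_resolve_name is a
hypothetical helper, not a DB function, and it ignores any
customization from a configuration file; it shows only the
basic rule that absolute pathnames are used as given and
relative pathnames are placed under the home directory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of DB: build the pathname for
 * a file in a DB-style home directory.  Returns 0 on success
 * and -1 if the result does not fit in buf.
 */
int
ex_resolve_name(const char *home, const char *name,
    char *buf, size_t len)
{
	int n;

	if (name[0] == '/' || home == NULL)
		/* Absolute name, or no home: use the name as is. */
		n = snprintf(buf, len, "%s", name);
	else
		/* Relative name: place it under the home. */
		n = snprintf(buf, len, "%s/%s", home, name);

	return (n < 0 || (size_t)n >= len ? -1 : 0);
}
```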
In situations where further customization is desired, such
as placing the log files on a separate device, it is rec-
ommended that the application installation process create
a configuration file named ``DB_CONFIG'' in the database
home directory, specifying the customization. See
db_appinit(3) for details on this procedure.
The DB architecture does not support placing the shared
memory regions on remote filesystems, e.g., the Network
File System (NFS) and the Andrew File System (AFS). For
this reason, the database home directory must reside on a
local filesystem. Databases, log files and temporary
files may be placed on remote filesystems, although the
application may incur a performance penalty for so doing.
It is important to realize that all applications sharing a
single home directory implicitly trust each other. They
have access to each other's data as it resides in the
shared memory buffer pool and will share resources such as
buffer space and locks. At the same time, any applica-
tions that access the same files must share an environment
if consistency is to be maintained across the different
applications.
SIGNALS
When applications using DB receive signals, it is impor-
tant that they exit gracefully, discarding any DB locks
that they may hold. This is normally done by setting a
flag when a signal arrives, and then checking for that
flag periodically within the application. Specifically,
the signal handler should not attempt to release locks
and/or close the database handles itself. This is not
guaranteed to work correctly and the results are unde-
fined.
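The flag-based approach described above might look like the
following sketch. It uses only standard C signal handling and
makes no DB calls. The names ex_shutdown_requested,
ex_install_handlers and ex_run_until_shutdown are hypothetical,
and do_one_unit stands in for whatever DB work the application
performs between checks of the flag.

```c
#include <signal.h>
#include <stddef.h>

/* Set by the handler, polled by the work loop. */
static volatile sig_atomic_t ex_shutdown_requested = 0;

/* The handler only records the signal; it makes no DB calls. */
static void
ex_on_signal(int signo)
{
	(void)signo;
	ex_shutdown_requested = 1;
}

/* Install the handler for SIGINT and SIGTERM; 0 on success. */
int
ex_install_handlers(void)
{
	if (signal(SIGINT, ex_on_signal) == SIG_ERR)
		return (-1);
	if (signal(SIGTERM, ex_on_signal) == SIG_ERR)
		return (-1);
	return (0);
}

/*
 * Hypothetical work loop.  Returns the number of units of work
 * completed before a shutdown was requested or max_units was
 * reached.  On return, the caller releases locks and closes
 * handles in ordinary (non-handler) context.
 */
int
ex_run_until_shutdown(int max_units, void (*do_one_unit)(void))
{
	int done;

	for (done = 0;
	    done < max_units && !ex_shutdown_requested; ++done)
		if (do_one_unit != NULL)
			do_one_unit();
	return (done);
}
```

The point of the design is that the cleanup runs after the loop
returns, in normal program context, rather than inside the
signal handler.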
If an application exits holding a lock, the situation is
no different than if the application crashed, and all
applications participating in the database environment
must be shut down, and then recovery must be performed. If
this is not done, the locks that the application held can
cause unresolvable deadlocks inside the database, and
applications may then hang.
MULTI-THREADING
The DB library is not itself multi-threaded. The library
was deliberately architected to not use threads internally
because of the portability problems that using threads
within the library would introduce.
DB supports multi-threaded applications with the caveat
that it loads and calls functions that are commonly avail-
able in C language environments and which may not them-
selves be thread-safe. Other than this usage, DB has no
static data and maintains no local context between calls
to DB functions. To ensure that applications can safely
use threads in the context of DB, porters to new operating
systems and/or C libraries must confirm that the system
and C library functions used by the DB library are thread-
safe.
Object handles returned from DB library functions can be
made free-threaded, i.e., usable by multiple threads
concurrently, by specifying the DB_THREAD flag to
db_appinit(3) and the other subsystem open functions.
There are a few additional caveats concerning using
threads to access the DB library:
1 Spinlocks must have been implemented for the com-
piler/architecture combination. Attempting to
specify the DB_THREAD flag will fail if spinlocks
are not available.
2 The DB_THREAD flag must be specified for all sub-
systems either explicitly or via the db_appinit
function. Setting the DB_THREAD flag inconsis-
tently may result in database corruption.
3 Only a single thread may call the close function
for a returned database or subsystem handle. See
db_open(3) and the appropriate subsystem manual
pages for more information.
4 Either the DB_DBT_MALLOC or the DB_DBT_USERMEM flag
must be set in a DBT used for key or data
retrieval. See db_dbt(3) for more information.
5 The DB_CURRENT, DB_NEXT and DB_PREV flags to the
log_get function may not be used by a free-threaded
handle. If such calls are necessary, a thread
should explicitly create a unique DB_LOG handle by
calling log_open(3). See db_log(3) for more infor-
mation.
6 Each database operation (i.e., any call to a func-
tion underlying the handles returned by db_open(3)
and db_cursor(3)) is normally performed on behalf
of a unique locker. If, within a single thread of
control, multiple calls on behalf of the same
locker are desired, then transactions must be used.
For example, consider the case where a cursor scan
locates a record, and then based on that record,
accesses some other item in the database. If these
are done using the default lockers for the handle,
there is no guarantee that these two operations
will not conflict. If the application wishes to
guarantee that the operations do not conflict,
locks must be obtained on behalf of a transaction,
instead of the default locker id, and a transaction
must be specified to the cursor creation and the
subsequent db call.
7 Transactions may not span threads, i.e., each
transaction must begin and end in the same thread,
and each transaction may only be used by a single
thread.
ERROR RETURNS
Except for the historic dbm and hsearch interfaces (see
db_dbm(3) and db_hsearch(3)), DB does not use the global
variable errno to return error values to the calling pro-
cess or thread. The return values for all DB functions
can be grouped into three categories:
0 A return value of 0 indicates that the operation was
successful.
>0 A return value that is greater than 0 indicates that
there was a system error. The errno value returned
by the system is returned by the function, e.g., when
a DB function is unable to allocate memory, the
return value from the function will be ENOMEM.
<0 A return value that is less than 0 indicates a condi-
tion that was not a system failure, but was not an
unqualified success, either. For example, a routine
to retrieve a key/data pair from the database may
return DB_NOTFOUND when the key/data pair does not
appear in the database, as opposed to the value of 0
which would be returned if the key/data pair were
found in the database. All such special values
returned by DB functions are less than 0 in order to
avoid conflict with possible values of errno.
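A caller can separate the three categories by sign alone. The
sketch below is self-contained and does not include <db.h>;
EX_NOTFOUND is a hypothetical stand-in for a negative
DB-specific value such as DB_NOTFOUND, whose real value is
defined by the library, and the function names are likewise
illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a negative DB-specific value. */
#define	EX_NOTFOUND	(-1)

enum ex_ret_class { EX_RET_OK, EX_RET_SYSTEM, EX_RET_SPECIAL };

/* Classify a DB-style return value by its sign. */
enum ex_ret_class
ex_classify_ret(int ret)
{
	if (ret == 0)
		return (EX_RET_OK);
	return (ret > 0 ? EX_RET_SYSTEM : EX_RET_SPECIAL);
}

/* Format a short description of ret into buf; returns buf. */
const char *
ex_describe_ret(int ret, char *buf, size_t len)
{
	switch (ex_classify_ret(ret)) {
	case EX_RET_OK:
		snprintf(buf, len, "success");
		break;
	case EX_RET_SYSTEM:
		/* Positive values are errno values, e.g. ENOMEM. */
		snprintf(buf, len, "system error: %s",
		    strerror(ret));
		break;
	case EX_RET_SPECIAL:
		snprintf(buf, len, "DB-specific condition %d", ret);
		break;
	}
	return (buf);
}
```

Because positive values are errno values, they can be passed
directly to strerror(3); negative values must be compared
against the constants that the library defines.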
There are two special return values that are somewhat sim-
ilar in meaning, are returned in similar situations, and
therefore might be confused: DB_NOTFOUND and DB_KEYEMPTY.
The DB_NOTFOUND error return indicates that the requested
key/data pair did not exist in the database or that start-
or end-of-file has been reached. The DB_KEYEMPTY error
return indicates that the requested key/data pair
logically exists but was never explicitly created by the
application (the recno access method will automatically
create key/data pairs under some circumstances, see
db_open(3) for more information), or that the requested
key/data pair was deleted and is currently in a deleted
state.
DATABASE AND PAGE SIZES
DB stores database file page numbers as unsigned 32-bit
numbers and database file page sizes as unsigned 16-bit
numbers. This results in a maximum database size of 2^48
bytes. The minimum database page size is 512 bytes, so
the maximum database size at that page size is 2^41 bytes.
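The arithmetic behind these limits can be checked with a short
sketch. ex_max_db_bytes is a hypothetical helper, not a DB
function. It assumes that a file may hold 2^32 pages, and, as
the 2^48 figure above does, it treats the largest page size as
2^16 bytes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the maximum database size in bytes for
 * a given page size, assuming 2^32 addressable pages.
 */
uint64_t
ex_max_db_bytes(uint32_t page_size)
{
	/* page_size * 2^32, computed in 64 bits. */
	return ((uint64_t)page_size << 32);
}
```

With a page size of 512 (2^9) bytes the result is 2^41 bytes,
and with a page size of 2^16 bytes it is 2^48 bytes.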
DB is potentially further limited if the host system does
not have filesystem support for files larger than 2^32 bytes,
including seeking to absolute offsets within such files.
The maximum btree depth is 255.
BYTE ORDERING
The database files created by DB can be created in either
little or big-endian formats. By default, the native for-
mat of the machine on which the database is created will
be used. A database in either format can be used on a machine
with a different native format, although it is possible
that the application will incur a performance penalty for
the run-time conversion.
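To give a sense of what run-time conversion involves, the
sketch below swaps a 32-bit field when the file's byte order
differs from the host's. It is an illustration only and does
not describe how DB performs the conversion internally; all of
the names are hypothetical, as is the idea of passing the
file's byte order as a flag.

```c
#include <stdint.h>

/* Return 1 on a little-endian host, 0 on a big-endian host. */
int
ex_host_is_little_endian(void)
{
	const uint16_t probe = 1;

	/* Examine the first byte of a known 16-bit value. */
	return (*(const unsigned char *)&probe == 1);
}

/* Reverse the byte order of a 32-bit value. */
uint32_t
ex_swap32(uint32_t v)
{
	return ((v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24));
}

/*
 * Convert a 32-bit field read from a file into host order.
 * file_is_little_endian is a hypothetical flag describing the
 * format in which the file was created.
 */
uint32_t
ex_file_to_host32(uint32_t v, int file_is_little_endian)
{
	if (!!file_is_little_endian == ex_host_is_little_endian())
		return (v);	/* Native format: no conversion. */
	return (ex_swap32(v));
}
```

When the formats match no work is done, which is why the
performance penalty applies only to databases used on a machine
with a different native format.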
EXTENDING DB
DB includes tools to simplify the development of applica-
tion-specific logging and recovery. Specifically, given a
description of the information to be logged, these tools
will automatically create logging functions (functions
that take the values as parameters and construct a single
record that is written to the log), read functions (func-
tions that read a log record and unmarshall the values
into a structure that maps onto the values you chose to
log), a print function (for debugging), templates for the
recovery functions, and automatic dispatching to your
recovery functions.
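The generated logging and read functions come in pairs. The
hand-written sketch below shows the general shape of such a
pair for a hypothetical application record. It is not output
from the DB tools and does not use the DB log interfaces; the
structure, its fields and the function names are all invented
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application log record; not a DB structure. */
struct ex_update_rec {
	uint32_t type;		/* Record type, for dispatch. */
	uint32_t account;	/* Application-defined key. */
	int32_t	 delta;		/* Change that was applied. */
};

#define	EX_UPDATE_REC_SIZE					\
	(2 * sizeof(uint32_t) + sizeof(int32_t))

/*
 * "Logging" side: pack the values into a single contiguous
 * record.  Returns the bytes written, or 0 if buf is too small.
 */
size_t
ex_update_marshal(const struct ex_update_rec *rec,
    unsigned char *buf, size_t len)
{
	unsigned char *p = buf;

	if (len < EX_UPDATE_REC_SIZE)
		return (0);
	memcpy(p, &rec->type, sizeof(rec->type));
	p += sizeof(rec->type);
	memcpy(p, &rec->account, sizeof(rec->account));
	p += sizeof(rec->account);
	memcpy(p, &rec->delta, sizeof(rec->delta));
	p += sizeof(rec->delta);
	return ((size_t)(p - buf));
}

/*
 * "Read" side: unpack a record into the structure.  Returns 0
 * on success, or -1 if the buffer is too short.
 */
int
ex_update_unmarshal(const unsigned char *buf, size_t len,
    struct ex_update_rec *rec)
{
	const unsigned char *p = buf;

	if (len < EX_UPDATE_REC_SIZE)
		return (-1);
	memcpy(&rec->type, p, sizeof(rec->type));
	p += sizeof(rec->type);
	memcpy(&rec->account, p, sizeof(rec->account));
	p += sizeof(rec->account);
	memcpy(&rec->delta, p, sizeof(rec->delta));
	return (0);
}
```

A recovery function would call the read side, examine the type
field, and redo or undo the change described by the remaining
fields.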
EXAMPLES
There are several different ways that the DB library is
used:
1 Applications that want to use formatted files to
store data, and are unconcerned with concurrent
access and loss of data due to catastrophic fail-
ure. Generally, these applications create short-
lived databases that are discarded or recreated
when the system fails. Such applications will only
be concerned with the DB access methods. The DB
access methods will use the memory pool subsystem,
but the application is unlikely to be aware of
this. See the files examples/ex_access.c and exam-
ples/ex_btrec.c in the DB source distribution for C
language code examples of how such applications
might use the DB library.
2 Applications similar to #1, but that also wish to
use db_appinit(3) for environment initialization.
See the file examples/ex_appinit.c in the DB source
distribution for a C language code example of how
such an application might use the DB library.
3 Applications that wish to transaction-protect
structures other than the DB access methods. See
the file examples/ex_trans.c in the DB source dis-
tribution for a C language code example of how such
an application might use the DB library.
4 Applications that use the DB access methods, but
are concerned about catastrophic failure, and
therefore want to transaction-protect the underlying
DB files. See the file examples/ex_tpcb.c in
the DB source distribution for a C language code
example of how such an application might use the DB
library.
5 Applications that want to buffer input files other
than the DB access method files. See the file
examples/ex_mpool.c in the DB source distribution
for a C language code example of how such an appli-
cation might use the DB library.
6 Applications that want a general-purpose lock manager
separate from locking support for the DB
access methods. See the file examples/ex_lock.c in
the DB source distribution for a C language code
example of how such an application might use the DB
library.
COMPATIBILITY
The DB 2.0 library provides backward compatible interfaces
for the historic UNIX dbm(3), ndbm(3) and hsearch(3)
interfaces. See db_dbm(3) and db_hsearch(3) for further
information on these interfaces. It also provides a back-
ward compatible interface for the historic DB 1.85
release. DB 2.0 does not provide database compatibility
for any of the above interfaces, and existing databases
must be converted manually. To convert existing databases
from the DB 1.85 format to the DB 2.0 format, review the
db_dump185(1) and db_load(1) manual pages.
The name space in DB 2.0 has been changed from that of
previous DB versions, notably version 1.85, for portabil-
ity and consistency reasons. The only name collisions in
the two libraries are the names used by the dbm(3),
ndbm(3), hsearch(3) and the DB 1.85 compatibility inter-
faces. To include both DB 1.85 and DB 2.0 in a single
library, remove the dbm(3), ndbm(3) and hsearch(3) inter-
faces from either of the two libraries, and the DB 1.85
compatibility interface from the DB 2.0 library. This can
be done by editing the library Makefiles and reconfiguring
and rebuilding the DB 2.0 library. Obviously, if you use
the historic interfaces, you will get the version in the
library from which you did not remove it. Similarly, you
will not be able to access DB 2.0 files using the DB 1.85
compatibility interface, since you have removed that from
the library as well.
It is possible to simply relink applications written to
the DB 1.85 interface against the DB 2.0 library. Recom-
pilation of such applications is slightly more complex.
When the DB 2.0 library is installed, it installs two
include files, db.h and db_185.h. The former file is
likely to replace the DB 1.85 version's include file which
had the same name. If this did not happen, recompiling DB
1.85 applications to use the DB 2.0 library is simple:
recompile as done historically, and load against the DB
2.0 library instead of the DB 1.85 library. If, however,
the DB 2.0 installation process has replaced the system's
db.h include file, replace the application's include of
db.h with inclusion of db_185.h, recompile as done histor-
ically, and then load against the DB 2.0 library.
Applications written using the historic interfaces of the
DB library should not require significant effort to port
to the DB 2.0 interfaces. While the functionality has
been greatly enhanced in DB 2.0, the historic interface
and functionality are largely unchanged. Reviewing the
application's calls into the DB library and updating those
calls to the new names, flags and return values should be
sufficient.
While loading applications that use the DB 1.85 interfaces
against the DB 2.0 library or converting DB 1.85 function
calls to DB 2.0 function calls will work, reconsidering
your application's interface to the DB database library in
light of the additional functionality in DB 2.0 is recom-
mended, as it is likely to result in enhanced application
performance.
SEE ALSO
db_archive(1), db_checkpoint(1), db_deadlock(1), db_dump(1),
db_intro(3), db_load(1), db_recover(1), db_stat(1),
db_appinit(3), db_cursor(3), db_dbm(3), db_lock(3), db_log(3),
db_mpool(3), db_open(3), db_txn(3)
LIBTP: Portable, Modular Transactions for UNIX, Margo Seltzer,
Michael Olson, USENIX proceedings, Winter 1992.