db_txn
NAME
db_txn - DB transaction management
SYNOPSIS
#include <db.h>
int
txn_open(const char *dir,
int flags, int mode, DB_ENV *dbenv, DB_TXNMGR **regionp);
int
txn_begin(DB_TXNMGR *txnp, DB_TXN *pid, DB_TXN **tid);
int
txn_prepare(DB_TXN *tid);
int
txn_commit(DB_TXN *tid);
int
txn_abort(DB_TXN *tid);
u_int32_t
txn_id(DB_TXN *tid);
int
txn_checkpoint(const DB_TXNMGR *txnp, long kbyte, long min);
int
txn_close(DB_TXNMGR *txnp);
int
txn_unlink(const char *dir, int force, DB_ENV *dbenv);
int
txn_stat(DB_TXNMGR *txnp,
DB_TXN_STAT **statp, void *(*db_malloc)(size_t));
DESCRIPTION
The DB library is a family of groups of functions that
provides a modular programming interface to transactions
and record-oriented file access. The library includes
support for transactions, locking, logging and file page
caching, as well as various indexed access methods. Many
of the functional groups (e.g., the file page caching
functions) are useful independent of the other DB func-
tions, although some functional groups are explicitly
based on other functional groups (e.g., transactions and
logging). For a general description of the DB package,
see db_intro(3).
This manual page describes the specific details of the DB
transaction support.
The db_txn functions are the library interface that pro-
vides transaction semantics. Full transaction support is
provided by a collection of modules that provide inter-
faces to the services required for transaction processing.
These services are recovery (see db_log(3)), concurrency
control (see db_lock(3)), and the management of shared
data (see db_mpool(3)). Transaction semantics can be
applied to the access methods described in db_open(3)
through function call parameters.
The model intended for transactional use (and the one used
by the access methods) is write-ahead logging, provided by
db_log(3), which records both before- and after-images.
Locking follows a two-phase protocol, with all locks
released at transaction commit.
txn_open
The txn_open function copies a pointer to the transaction
region identified by the directory dir into the memory
location referenced by regionp.
If the dbenv argument to txn_open was initialized using
db_appinit, dir is interpreted as described by
db_appinit(3).
Otherwise, if dir is not NULL, it is interpreted relative
to the current working directory of the process. If dir
is NULL, the following environment variables are checked
in order: ``TMPDIR'', ``TEMP'', and ``TMP''. If one of
them is set, transaction region files are created relative
to the directory it specifies. If none of them are set,
the first possible one of the following directories is
used: /var/tmp, /usr/tmp, /temp, /tmp, C:/temp and C:/tmp.
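The lookup order described above can be sketched in C as follows. This is an illustration only, not the library's code: the names region_dir_sketch and is_directory are hypothetical, and "first possible" is assumed here to mean the first listed directory that exists.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if path names an existing directory, 0 otherwise. */
static int is_directory(const char *path)
{
    struct stat sb;

    return path != NULL && stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/*
 * Hypothetical helper (not part of DB): choose the directory for the
 * region files when dbenv was not initialized using db_appinit.
 * Returns NULL if no candidate directory can be found.
 */
static const char *region_dir_sketch(const char *dir)
{
    static const char *const env_names[] = { "TMPDIR", "TEMP", "TMP" };
    static const char *const fallbacks[] = {
        "/var/tmp", "/usr/tmp", "/temp", "/tmp", "C:/temp", "C:/tmp"
    };
    size_t i;

    /* A non-NULL dir is used as given, relative to the working directory. */
    if (dir != NULL)
        return dir;

    /* Otherwise the first environment variable that is set wins. */
    for (i = 0; i < NELEM(env_names); i++) {
        const char *value = getenv(env_names[i]);
        if (value != NULL)
            return value;
    }

    /* Assumption: "first possible" means the first that exists. */
    for (i = 0; i < NELEM(fallbacks); i++)
        if (is_directory(fallbacks[i]))
            return fallbacks[i];

    return NULL;
}
```

Note that the sketch returns the environment variable's value without checking that the directory exists, since the text only requires that the variable be set.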
All files associated with the transaction region are cre-
ated in this directory. This directory must already exist
when txn_open is called. If the transaction region
already exists, the process must have permission to read
and write the existing files. If the transaction region
does not already exist, it is optionally created and ini-
tialized.
The flags and mode arguments specify how files will be
opened and/or created when they don't already exist. The
flags value is specified by or'ing together one or more of
the following values:
DB_CREATE
Create any underlying files, as necessary. If the
files do not already exist and the DB_CREATE flag is
not specified, the call will fail.
DB_THREAD
Cause the DB_TXNMGR handle returned by the txn_open
function to be useable by multiple threads within a
single address space, i.e., to be ``free-threaded''.
DB_TXN_NOSYNC
On transaction commit, do not synchronously flush the
log. This means that transactions exhibit the ACI
(atomicity, consistency and isolation) properties,
but not D (durability), i.e., database integrity will
be maintained but it is possible that some number of
the most recently committed transactions may be
undone during recovery instead of being redone.
The number of transactions that are potentially at
risk is governed by how often the log is checkpointed
(see db_checkpoint(1)) and how many log updates can
fit on a single log page.
All files created by the transaction subsystem are created
with mode mode (as described in chmod(2)) and modified by
the process' umask value at the time of creation (see
umask(2)). The group ownership of created files is based
on the system and directory defaults, and is not further
specified by DB.
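As a small worked example of the paragraph above, the permissions a created file ends up with follow the usual POSIX rule: the requested mode with the umask bits cleared. The function name effective_mode is hypothetical and is used only for illustration.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: the permission bits a newly created file
 * receives when created with `mode` under process umask `mask`.
 */
static mode_t effective_mode(mode_t mode, mode_t mask)
{
    return (mode_t)(mode & ~mask & 0777);
}
```

For instance, a mode argument of 0660 under a umask of 022 yields files with mode 0640.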
The transaction subsystem is configured based on the dbenv
argument to txn_open, which is a pointer to a structure of
type DB_ENV (typedef'd in <db.h>). It is expected that
applications will use a single DB_ENV structure as the
argument to all of the subsystems in the DB package. In
order to ensure compatibility with future releases of DB,
all fields of the DB_ENV structure that are not explicitly
set should be initialized to 0 before the first time the
structure is used. Do this by declaring the structure
external or static, or by calling the C library routine
bzero(3) or memset(3).
The fields of DB_ENV used by txn_open are described below.
As references to the DB_ENV structure may be maintained by
txn_open, it is necessary that the DB_ENV structure and
memory it references be valid until after the txn_close
function is called. If dbenv is NULL or any of its fields
are set to 0, defaults appropriate for the system are used
where possible.
The following DB_ENV fields may be initialized before
calling txn_open:
void *(*db_errcall)(char *db_errpfx, char *buffer);
FILE *db_errfile;
const char *db_errpfx;
int db_verbose;
The error fields of the DB_ENV structure behave as
described for db_appinit(3).
int (*db_yield)(void);
The db_yield field of the DB_ENV structure behaves as
described for db_appinit(3).
DB_LOG *lg_info;
The logging region that is being used for this trans-
action environment. The lg_info field contains a
return value from the function log_open. Logging is
required for transaction environments, and it is an
error to not specify a logging region.
DB_LOCKTAB *lk_info;
The locking region that is being used for this trans-
action environment. The lk_info field contains a
return value from the function lock_open. If lk_info
is NULL, no locking is done in this transaction envi-
ronment.
unsigned int tx_max;
The maximum number of simultaneous transactions that
are supported. This bounds the size of backing files
and is used to derive limits for the size of the lock
region and log files. When there are more than tx_max
concurrent transactions, calls to txn_begin may cause
backing files to grow. If tx_max is 0, a default
value is used.
int (*tx_recover)(DB_LOG *logp, DBT *log_rec,
DB_LSN *lsnp, int redo, void *info);
A function that is called by txn_abort during trans-
action abort. This function takes five arguments:
logp A pointer to the transaction log (DB_LOG *).
log_rec
A log record.
lsnp A pointer to a log sequence number (DB_LSN *).
redo An integer value that is set to one of the fol-
lowing values:
DB_TXN_BACKWARD_ROLL
The log is being read backward to determine
which transactions have been committed and
which transactions were not (and should
therefore be aborted during recovery).
DB_TXN_FORWARD_ROLL
The log is being played forward; any transaction
IDs encountered that have not been entered into
the list referenced by info should be ignored.
DB_TXN_OPENFILES
The log is being read to open all the files
required to perform recovery.
DB_TXN_REDO
Redo the operation described by the log
record.
DB_TXN_UNDO
Undo the operation described by the log
record.
info An opaque pointer used to reference the list of
transaction IDs encountered during recovery.
If tx_recover is NULL, the default is that only DB
access method operations are transaction protected,
and the default recover function will be used.
The txn_open function returns the value of errno on fail-
ure and 0 on success.
txn_begin
The txn_begin function creates a new transaction in the
designated transaction manager, copying a pointer to a
DB_TXN that uniquely identifies it into the memory refer-
enced by tid. If the pid argument is non-NULL, the new
transaction is a nested transaction with the transaction
indicated by pid as its parent.
Transactions may not span threads, i.e., each transaction
must begin and end in the same thread, and each transac-
tion may only be used by a single thread.
The txn_begin function returns the value of errno on fail-
ure and 0 on success.
txn_prepare
The txn_prepare function begins a two-phase commit.
In a distributed transaction environment,
DB can be used as a local transaction manager. In this
case, the distributed transaction manager must send pre-
pare messages to each local manager. The local manager
must then issue a txn_prepare and await its successful
return before responding to the distributed transaction
manager. Only after the distributed transaction manager
receives successful responses from all of its prepare mes-
sages should it issue any commit messages.
The txn_prepare function returns the value of errno on
failure and 0 on success.
txn_commit
The txn_commit function ends the transaction specified by
the tid argument. A commit log record is written and
flushed to disk, as are all previously written log records.
If the transaction is nested, its locks are acquired by
the parent transaction; otherwise, its locks are released. Any
applications that require strict two-phase locking must
not release any locks explicitly, leaving them all to be
released by txn_commit.
The txn_commit function returns the value of errno on
failure and 0 on success.
txn_abort
The txn_abort function causes an abnormal termination of
the transaction. The log is played backwards and any nec-
essary recovery operations are initiated through the
recover function specified to txn_open. After recovery is
completed, all locks held by the transaction are acquired
by the parent transaction in the case of a nested transac-
tion or released in the case of a non-nested transaction.
As is the case for txn_commit, applications that require
strict two-phase locking should not explicitly release any
locks.
The txn_abort function returns the value of errno on fail-
ure and 0 on success.
txn_id
The txn_id function returns the unique transaction id
associated with the specified transaction. Locking calls
made on behalf of this transaction should use the value
returned from txn_id as the locker parameter to the
lock_get or lock_vec calls.
txn_close
The txn_close function detaches a process from the trans-
action environment specified by the DB_TXNMGR pointer.
All mapped regions are unmapped and any allocated
resources are freed. Any uncommitted transactions are
aborted.
In addition, if the dir argument to txn_open was NULL and
dbenv was not initialized using db_appinit, all files cre-
ated for this shared region will be removed, as if
txn_unlink were called.
When multiple threads are using the DB_TXNMGR handle con-
currently, only a single thread may call the txn_close
function.
The txn_close function returns the value of errno on fail-
ure and 0 on success.
txn_unlink
The txn_unlink function destroys the transaction region
identified by the directory dir, removing all files used
to implement the transaction region. (The directory dir
is not removed.) If there are processes that have called
txn_open without calling txn_close (i.e., there are pro-
cesses currently using the transaction region), txn_unlink
will fail without further action, unless the force flag is
set, in which case txn_unlink will attempt to remove the
transaction region files regardless of any processes still
using the transaction region.
The result of attempting to forcibly destroy the region
when a process has the region open is unspecified. Pro-
cesses using a shared memory region maintain an open file
descriptor for it. On UNIX systems, the region removal
should succeed and processes that have already joined the
region should continue to run in the region without
change; however, processes attempting to join the
transaction region will either fail or attempt to create a new
region. On other systems (e.g., Windows NT), where the
unlink(2) system call will fail if any process has an open
file descriptor for the file, the region removal will fail.
In the case of catastrophic or system failure, database
recovery must be performed (see db_recover(1) or the
DB_RECOVER flags to db_appinit(3)). Alternatively, if
recovery is not required because no database state is
maintained across failures, it is possible to clean up a
transaction region by removing all of the files in the
directory specified to the txn_open function, as transac-
tion region files are never created in any directory other
than the one specified to txn_open. Note, however, that
this has the potential to remove files created by the
other DB subsystems in this database environment.
The txn_unlink function returns the value of errno on
failure and 0 on success.
txn_checkpoint
The txn_checkpoint function syncs the underlying memory
pool, writes a checkpoint record to the log and then
flushes the log.
If either kbyte or min is non-zero, the checkpoint is only
done if more than min minutes have passed since the last
checkpoint, or if more than kbyte kilobytes of log data
have been written since the last checkpoint.
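The threshold test described above can be sketched as a small predicate. This is not the library's implementation: the name checkpoint_due and its last two parameters are hypothetical, and the sketch assumes that a zero kbyte or min disables that particular criterion.

```c
/*
 * Hypothetical predicate: return 1 if a checkpoint would be taken.
 * kb_since_last and min_since_last describe the log activity and
 * elapsed time since the last checkpoint.
 */
static int checkpoint_due(long kbyte, long min,
    long kb_since_last, long min_since_last)
{
    /* Neither threshold given: the checkpoint is unconditional. */
    if (kbyte == 0 && min == 0)
        return 1;

    /* Assumption: a zero threshold disables that criterion. */
    if (min != 0 && min_since_last > min)
        return 1;
    if (kbyte != 0 && kb_since_last > kbyte)
        return 1;

    return 0;
}
```

Either criterion alone is sufficient; with kbyte set to 100 and min set to 5, a checkpoint is taken after 6 minutes even if only 50 kilobytes of log have been written.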
The txn_checkpoint function returns the value of errno on
failure, 0 on success, and DB_INCOMPLETE if there were
pages that needed to be written but that memp_sync(3) was
unable to write immediately. In this case, the txn_check-
point call should be retried.
The txn_checkpoint function is the underlying function
used by the db_checkpoint(1) utility. See the source code
for the db_checkpoint utility for an example of using
txn_checkpoint in a UNIX environment.
txn_stat
The txn_stat function creates a statistical structure and
copies a pointer to it into the memory location referenced
by statp.
Statistical structures are created in allocated memory.
If db_malloc is non-NULL, it is called to allocate the
memory, otherwise, the library function malloc(3) is used.
The function db_malloc must match the calling conventions
of the malloc(3) library routine. Regardless, the caller
is responsible for deallocating the returned memory. To
deallocate the returned memory, free each returned memory
pointer; pointers inside the memory do not need to be
individually freed.
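A function suitable for the db_malloc argument only needs the same shape as malloc(3). The following is a minimal sketch of one such function that also tracks how much memory was requested; the names counting_malloc and bytes_requested are hypothetical.

```c
#include <stdlib.h>

/* Running total of bytes handed out by counting_malloc. */
static size_t bytes_requested;

/*
 * Hypothetical allocator with the same calling convention as
 * malloc(3): takes a size, returns a pointer or NULL on failure.
 */
static void *counting_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        bytes_requested += size;
    return p;
}
```

Because counting_malloc obtains its memory from malloc(3), memory it returns is released with free(3), as described above.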
The transaction region statistics are stored in a struc-
ture of type DB_TXN_STAT (typedef'd in <db.h>). The fol-
lowing DB_TXN_STAT fields will be filled in:
DB_LSN st_last_ckp;
The LSN of the last checkpoint.
DB_LSN st_pending_ckp;
The LSN of any checkpoint that is currently in
progress. If st_pending_ckp is the same as
st_last_ckp there is no checkpoint in progress.
time_t st_time_ckp;
The time the last completed checkpoint finished (as
returned by time(2)).
u_int32_t st_last_txnid;
The last transaction ID allocated.
u_int32_t st_maxtxns;
The maximum number of active transactions supported
by the region.
u_int32_t st_naborts;
The number of transactions that have aborted.
u_int32_t st_nactive;
The number of transactions that are currently active.
u_int32_t st_nbegins;
The number of transactions that have begun.
u_int32_t st_ncommits;
The number of transactions that have committed.
DB_TXN_ACTIVE *st_txnarray;
A pointer to an array of st_nactive DB_TXN_ACTIVE
structures, describing the currently active transac-
tions. The following fields of the DB_TXN_ACTIVE
structure (typedef'd in <db.h>) will be filled in:
u_int32_t txnid;
The transaction ID as returned by txn_begin(3).
DB_LSN lsn;
The LSN of the transaction-begin record.
TRANSACTIONS
Creating transaction protected applications using the DB
access methods requires little system customization. In
most cases, the default parameters to the locking, log-
ging, memory pool, and transaction subsystems will suf-
fice. Applications can use db_appinit(3) to perform this
initialization, or they may do it explicitly.
Each database operation (i.e., any call to a function
underlying the handles returned by db_open(3) and db_cur-
sor(3)) is normally performed on behalf of a unique
locker. If, within a single thread of control, multiple
calls on behalf of the same locker are desired, then
transactions must be used.
Once the application has initialized the DB subsystems
that it is using, it may open the DB access method
databases. For applications performing transactions, the
databases must be opened after subsystem initialization,
and cannot be opened as part of a transaction. Once the
databases are opened, the application can group sets of
operations into transactions, by surrounding the opera-
tions with the appropriate txn_begin, txn_commit and
txn_abort calls. The DB access methods will make the
appropriate calls into the lock, log and memory pool sub-
systems in order to guarantee that transaction semantics
are applied. When the application is ready to exit, all
outstanding transactions should have been committed or
aborted. At this point, all open DB files should be
closed. Once the DB database files are closed, the DB
subsystems should be closed, either explicitly or by call-
ing db_appexit(3).
It is also possible to use the locking, logging and trans-
action subsystems of DB to provide transaction semantics
to objects other than those described by the DB access
methods. In these cases, the application will need more
explicit customization of the subsystems as well as the
development of appropriate data-structure-specific recov-
ery functions.
For example, consider an application that provides trans-
action semantics to data stored in plain UNIX files
accessed using the read(2) and write(2) system calls. The
operations for which transaction protection is desired are
bracketed by calls to txn_begin and txn_commit.
Before data are referenced, the application must make a
call to the lock manager, db_lock, for a lock of the
appropriate type (e.g., read) on the object being locked.
The object might be a page in the file, a byte, a range of
bytes, or some key. It is up to the application to ensure
that appropriate locks are acquired. Before a write is
performed, the application should acquire a write lock on
the object, by making an appropriate call to the lock man-
ager, db_lock. Then, the application should make a call
to the log manager, db_log, to record enough information
to redo the operation in case of failure after commit and
to undo the operation in case of abort. As discussed in
the db_log(3) manual page, the application is responsible
for providing any necessary structure to the log record.
For example, the application must understand what part of
the log record is an operation code, what part identifies
the file being modified, what part is redo information,
and what part is undo information.
After the log message is written, the application may
issue the write system call. After all requests are
issued, the application may call txn_commit. When
txn_commit returns, the caller is guaranteed that all
necessary log records have been written to disk.
At any time, the application may call txn_abort, which
will result in the appropriate calls to the recover func-
tion to restore the ``database'' to a consistent pre-
transaction state. (The recover function must be able to
either re-apply or undo the update depending on the con-
text, for each different type of log record.)
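To make the plain-file example more concrete, the sketch below shows one possible application-defined log record and a routine that re-applies or undoes it. Everything here is an assumption made for illustration: the record layout, the names write_rec, sketch_op and sketch_recover, and the use of an in-memory buffer in place of the file. A real tx_recover function has the signature given in the txn_open section and receives the DB_TXN_* values defined in <db.h>.

```c
#include <string.h>

#define IMG_MAX 16      /* largest update this sketch can log */

/* Stand-ins for DB_TXN_REDO and DB_TXN_UNDO (hypothetical). */
enum sketch_op { SKETCH_REDO, SKETCH_UNDO };

/* Hypothetical log record for overwriting a byte range. */
struct write_rec {
    int    opcode;              /* operation code */
    int    file_id;             /* identifies the file modified */
    size_t offset;              /* where the update starts */
    size_t len;                 /* number of bytes updated */
    char   before[IMG_MAX];     /* undo information */
    char   after[IMG_MAX];      /* redo information */
};

/*
 * Re-apply or undo one logged update against an in-memory copy of
 * the file.  Returns 0 on success, -1 if the record does not fit.
 */
static int sketch_recover(char *file, size_t file_len,
    const struct write_rec *rec, enum sketch_op op)
{
    if (rec->len > IMG_MAX || rec->offset > file_len ||
        rec->len > file_len - rec->offset)
        return -1;

    /* Redo copies the after-image; undo restores the before-image. */
    memcpy(file + rec->offset,
        op == SKETCH_REDO ? rec->after : rec->before, rec->len);
    return 0;
}
```

Storing both images in each record is what lets the same routine serve abort (undo) and post-crash recovery (redo).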
If the application should crash, the recovery process uses
the db_log interface to read the log and call the recover
function to restore the database to a consistent state.
The txn_prepare function provides the core functionality
to implement distributed transactions, but it does not
manage the notification of distributed transaction man-
agers. The caller is responsible for issuing txn_prepare
calls to all sites participating in the transaction. If
all responses are positive, the caller can issue a
txn_commit. If any of the responses are negative, the
caller should issue a txn_abort. In general, the txn_pre-
pare call requires that the transaction log be flushed to
disk.
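The decision rule in the paragraph above can be sketched as follows. The names outcome and decide_outcome are hypothetical; the sketch assumes the caller has already collected one return value per participating site, with 0 meaning that site's txn_prepare succeeded.

```c
#include <stddef.h>

enum outcome { OUTCOME_COMMIT, OUTCOME_ABORT };

/*
 * Hypothetical coordinator step: commit only if every site's
 * prepare call returned 0; any failure forces an abort.
 */
static enum outcome decide_outcome(const int *prepare_results,
    size_t nsites)
{
    size_t i;

    for (i = 0; i < nsites; i++)
        if (prepare_results[i] != 0)
            return OUTCOME_ABORT;
    return OUTCOME_COMMIT;
}
```

Delivering the resulting txn_commit or txn_abort to each site remains the caller's responsibility, as noted above.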
ENVIRONMENT VARIABLES
The following environment variables affect the execution
of db_txn:
DB_HOME
If the dbenv argument to txn_open was initialized
using db_appinit, the environment variable DB_HOME
may be used as the path of the database home for the
interpretation of the dir argument to txn_open, as
described in db_appinit(3).
TMPDIR
If the dbenv argument to txn_open was NULL or not
initialized using db_appinit, the environment vari-
able TMPDIR may be used as the directory in which to
create the transaction region, as described in the
txn_open section above.
COMPILING
On IRIX, if you are compiling a threaded application, you
must compile with the -D_SGI_MP_SOURCE flag:
cc -D_SGI_MP_SOURCE ...
On OSF/1, if you are compiling a threaded application, you
must compile with the -D_REENTRANT flag:
cc -D_REENTRANT ...
On Solaris, if you are compiling a threaded application,
you must compile with the -D_REENTRANT flag and link with
the -lthread library:
cc -D_REENTRANT ... -lthread
ERRORS
The txn_open function may fail and return errno for any of
the errors specified for the following DB and library
functions: close(2), fcntl(2), fstat(2), getpid(2),
mmap(2), munmap(2), open(2), unlink(2), abort(3),
fflush(3), fprintf(3), free(3), getenv(3), isdigit(3),
malloc(3), memcpy(3), memset(3), sigfillset(3), sigproc-
mask(3), stat(3), strcpy(3), strdup(3), strerror(3),
strlen(3), txn_create(3), txn_unlink(3) and vsnprintf(3).
In addition, the txn_open function may fail and return
errno for the following conditions:
[EINVAL]
An invalid flag value or parameter was specified.
The dbenv parameter was NULL.
[EAGAIN]
The shared memory region was locked and (repeatedly)
unavailable.
The txn_begin function may fail and return errno for any
of the errors specified for the following DB and library
functions: fcntl(2), getpid(2), lseek(2), mmap(2), mun-
map(2), write(2), fflush(3), fprintf(3), free(3),
llseek(3), log_put(3), malloc(3), memcpy(3), memset(3),
strerror(3) and vsnprintf(3).
In addition, the txn_begin function may fail and return
errno for the following conditions:
[ENOSPC]
The maximum number of concurrent transactions has
been reached.
The txn_prepare function may fail and return errno for any
of the errors specified for the following DB and library
functions: fflush(3), fprintf(3), log_flush(3), str-
error(3) and vsnprintf(3).
The txn_commit function may fail and return errno for any
of the errors specified for the following DB and library
functions: fcntl(2), getpid(2), fflush(3), fprintf(3),
free(3), lock_vec(3), log_put(3), malloc(3), memcpy(3),
strerror(3) and vsnprintf(3).
In addition, the txn_commit function may fail and return
errno for the following conditions:
[EINVAL]
The transaction was aborted.
The txn_abort function may fail and return errno for any
of the errors specified for the following DB and library
functions: fcntl(2), getpid(2), fflush(3), fprintf(3),
lock_vec(3), log_get(3), strerror(3) and vsnprintf(3).
In addition, the txn_abort function may fail and return
errno for the following conditions:
[EINVAL]
The transaction was already aborted.
The txn_checkpoint function may fail and return errno for
any of the errors specified for the following DB and
library functions: fcntl(2), getpid(2), fflush(3),
fprintf(3), free(3), log_compare(3), log_put(3), mal-
loc(3), memcpy(3), memp_sync(3), memset(3), strerror(3),
time(3) and vsnprintf(3).
In addition, the txn_checkpoint function may fail and
return errno for the following conditions:
[EINVAL]
An invalid flag value or parameter was specified.
The txn_close function may fail and return errno for any
of the errors specified for the following DB and library
functions: close(2), fcntl(2), getpid(2), munmap(2),
fflush(3), fprintf(3), strerror(3), txn_abort(3) and
vsnprintf(3).
The txn_unlink function may fail and return errno for any
of the errors specified for the following DB and library
functions: close(2), fcntl(2), fstat(2), getpid(2),
mmap(2), munmap(2), open(2), unlink(2), abort(3),
fflush(3), fprintf(3), getenv(3), isdigit(3), malloc(3),
memcpy(3), memset(3), sigfillset(3), sigprocmask(3),
stat(3), strcpy(3), strdup(3), strerror(3), strlen(3) and
vsnprintf(3).
In addition, the txn_unlink function may fail and return
errno for the following conditions:
[EBUSY]
The shared memory region was in use and the force
flag was not set.
BUGS
Nested transactions are not yet implemented.
SEE ALSO
LIBTP: Portable, Modular Transactions for UNIX, Margo
Seltzer, Michael Olson, USENIX proceedings, Winter 1992.
db_archive(1), db_checkpoint(1), db_deadlock(1), db_dump(1),
db_intro(3), db_load(1), db_recover(1), db_stat(1),
db_appinit(3), db_cursor(3), db_dbm(3), db_lock(3), db_log(3),
db_mpool(3), db_open(3), db_txn(3)