db_intro



NAME

       db - the DB library overview and introduction


DESCRIPTION

       The DB library is a family of function groups that together
       provide a modular programming interface to transactions
       and  record-oriented  file  access.   The library includes
       support for transactions, locking, logging and  file  page
       caching,  as well as various indexed access methods.  Many
       of the functional groups  (e.g.,  the  file  page  caching
       functions)  are  useful  independent of the other DB func-
       tions, although  some  functional  groups  are  explicitly
       based  on  other functional groups (e.g., transactions and
       logging).  For a general description of  the  DB  package,
       see db_intro(3).

       The  DB  library  does  not  provide user interfaces, data
       entry GUIs, SQL support or  any  of  the  other  standard
       user-level  database interfaces.  What it does provide are
       the programmatic building blocks that allow you to  easily
       embed  database-style functionality and support into other
       objects or interfaces.


ARCHITECTURE

       The DB library supports two different models  of  applica-
       tions: client-server and embedded.

       In  the  client-server model, a database server is created
       by writing an application that accepts requests  via  some
       form  of IPC and issues calls to the DB functions based on
       those queries.  In this  model,  applications  are  client
       programs that attach to the server and issue queries.  The
       client-server model trades performance for protection:  the
       applications need not share a protection domain with the
       server, but IPC/RPC is generally slower than a function
       call.  In addition, this model simplifies the creation of
       network client-server applications.

       In the embedded model, an application links the DB library
       directly into its address space.  This provides for faster
       access  to  database  functionality,  but  means  that the
       applications sharing log files, lock manager,  transaction
       manager  or  memory pool manager have the ability to read,
       write, and corrupt each other's data.

       It is the application designer's responsibility to  select
       the appropriate model for their application.

       Applications  require a single include file, <db.h>, which
       must be installed in an appropriate location on  the  sys-
       tem.

       The  DB  library  is  made up of five major subsystems, as
       follows:

       Access methods
            The access methods subsystem is made up of general-purpose
            support for creating and accessing files formatted as
            B+trees, hashed files, and fixed- and variable-length
            records.  These modules are useful in the absence of
            transactions for processes that want fast, formatted file
            support.  See db_open(3) and db_cursor(3).

       Locking
            The locking subsystem is a general-purpose lock  man-
            ager  used  by  DB.   This  module  is  useful in the
            absence of the rest of the DB package  for  processes
            that  want  a  fast,  configurable lock manager.  See
            db_lock(3) for more information.

       Logging
            The logging subsystem is the logging support used  to
            support the DB transaction model.  It is largely spe-
            cific to the DB package,  and  unlikely  to  be  used
            elsewhere.  See db_log(3) for more information.

       Memory Pool
            The  memory  pool  subsystem  is  the general-purpose
            shared memory buffer pool used by DB.  This module is
            useful  outside  of the DB package for processes that
            want page-oriented, cached, shared file access.   See
            db_mpool(3) for more information.

       Transactions
            The  transaction subsystem implements the DB transac-
            tion model.  It is largely specific to the  DB  pack-
            age.  See db_txn(3) for more information.

       There  are  several stand-alone utilities that support the
       DB environment.  They are as follows:

       db_archive
            The  db_archive  utility  supports  database  backup,
            archival    and   log   file   administration.    See
            db_archive(1) for more information.

       db_recover
            The db_recover utility runs after an unexpected DB or
            system  failure  to restore the database to a consis-
            tent state.  See db_recover(1) for more  information.

       db_checkpoint
            The  db_checkpoint  utility runs as a daemon process,
            monitoring the database log and periodically  issuing
            checkpoints.   See db_checkpoint(1) for more informa-
            tion.

       db_deadlock
            The db_deadlock utility runs  as  a  daemon  process,
            periodically  traversing the database lock structures
            and aborting transactions when it detects a deadlock.
            See db_deadlock(1) for more information.

       db_dump
            The  db_dump utility writes a copy of the database to
            a  flat-text  file  in  a   portable   format.    See
            db_dump(1) for more information.

       db_load
            The db_load utility reads the flat-text file produced
            by db_dump, and loads it into a database  file.   See
            db_load(1) for more information.

       db_stat
            The db_stat utility displays statistics for databases
            and database environments.  See db_stat(1)  for  more
            information.


NAMING AND THE DB ENVIRONMENT

       The   DB  application  environment  is  described  by  the
       db_appinit(3) manual page.   The  db_appinit  function  is
       used  to  create a consistent naming scheme for all of the
       subsystems sharing a DB environment.  If db_appinit is not
       called  by a DB application, naming is performed as speci-
       fied by the manual page for the specific subsystem.

       DB applications that run with additional privilege  should
       always  call the db_appinit function to initialize DB nam-
       ing for their application.  This ensures that the environ-
       ment variables DB_HOME and TMPDIR will only be used if the
       application explicitly specifies that they are safe.
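       The rule above can be sketched in C as follows.  This is a
       minimal illustration under stated assumptions, not how
       db_appinit is implemented: choose_home and its
       environ_is_safe parameter are hypothetical names, and the
       same pattern would apply to TMPDIR.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the DB API): choose the DB home
 * directory for a possibly privileged application.  An explicitly
 * configured home always wins; the DB_HOME environment variable is
 * consulted only when the caller has decided the environment is
 * trustworthy.
 */
const char *choose_home(const char *explicit_home, bool environ_is_safe)
{
    const char *env;

    if (explicit_home != NULL)
        return explicit_home;
    if (environ_is_safe && (env = getenv("DB_HOME")) != NULL)
        return env;
    /* No safe home available: the caller falls back to its own default. */
    return NULL;
}
```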


ADMINISTERING THE DB ENVIRONMENT

       A DB environment consists of a database home directory and
       all the long-running daemons necessary to ensure continued
       functioning of DB and its applications.  In  the  presence
       of  transactions,  the  checkpoint  daemon, db_checkpoint,
       must be run as long as there are applications present (see
       db_checkpoint(1)  for  details).   When  locking  is being
       used, the deadlock detection daemon, db_deadlock, must  be
       run  as  long  as  there  are  applications  present  (see
       db_deadlock(1) for details).  The db_archive utility provides
       information to facilitate log reclamation and creation of
       database snapshots (see db_archive(1) for details).  After
       application or system failure, the db_recover utility must be
       run before any applications are restarted to return the
       database to a consistent state (see db_recover(1) for details).

       The simplest way to administer a DB  application  environ-
       ment is to create a single ``home'' directory which houses
       all the files for the applications that are sharing the DB
       environment.   In  this  model,  the shared memory regions
       (i.e., the locking, logging, memory pool, and  transaction
       regions)  and  log  files  will be stored in the specified
       directory hierarchy.  In addition, all data  files  speci-
       fied  using  relative  pathnames will be named relative to
       this home directory.  When recovery needs to be run (e.g.,
       after  system  or  application failure), this directory is
       specified as the home directory to db_recover(1), and  the
       system  is  restored  to a consistent state, ready for the
       applications to be restarted.
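       The naming rule for data files can be illustrated with a
       small helper.  This is a sketch that assumes an absolute
       pathname (one beginning with '/') is used unchanged and any
       other name is joined to the home directory.  resolve_in_home
       is a hypothetical name, not a DB function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a DB function): build the pathname used
 * for a data file.  Absolute names are used unchanged; relative names
 * are taken relative to the environment's home directory.
 * Returns 0 on success, -1 if buf is too small.
 */
int resolve_in_home(const char *home, const char *name,
                    char *buf, size_t len)
{
    int n;

    if (home == NULL || name[0] == '/')
        n = snprintf(buf, len, "%s", name);
    else
        n = snprintf(buf, len, "%s/%s", home, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```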

       In situations where further customization is desired, such
       as  placing the log files on a separate device, it is rec-
       ommended that the application installation process  create
       a  configuration  file named ``DB_CONFIG'' in the database
       home  directory,  specifying   the   customization.    See
       db_appinit(3) for details on this procedure.

       The  DB  architecture  does not support placing the shared
       memory regions on remote filesystems,  e.g.,  the  Network
       File  System  (NFS) and the Andrew File System (AFS).  For
       this reason, the database home directory must reside on  a
       local  filesystem.   Databases,  log  files  and temporary
       files may be placed on remote  filesystems,  although  the
       application  may incur a performance penalty for so doing.

       It is important to realize that all applications sharing a
       single  home  directory implicitly trust each other.  They
       have access to each other's data  as  it  resides  in  the
       shared memory buffer pool and will share resources such as
       buffer space and locks.  At the same  time,  any  applica-
       tions that access the same files must share an environment
       if consistency is to be maintained  across  the  different
       applications.


SIGNALS

       When  applications  using DB receive signals, it is impor-
       tant that they exit gracefully, discarding  any  DB  locks
       that  they  may  hold.  This is normally done by setting a
       flag when a signal arrives, and  then  checking  for  that
       flag  periodically  within the application.  Specifically,
       the signal handler should not  attempt  to  release  locks
       and/or  close  the  database  handles itself.  This is not
       guaranteed to work correctly and  the  results  are  unde-
       fined.

       If  an  application exits holding a lock, the situation is
       no different than if  the  application  crashed,  and  all
       applications  participating  in  the  database environment
       must be shut down, and then recovery must be performed.  If
       this  is not done, the locks that the application held can
       cause unresolvable  deadlocks  inside  the  database,  and
       applications may then hang.
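       Below is a minimal ISO C sketch of the flag-setting pattern
       described above.  The handler does nothing except record that
       a signal arrived.  The function names are hypothetical.

```c
#include <signal.h>

/* Set asynchronously by the handler and read by the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

static void request_shutdown(int signo)
{
    (void)signo;
    /*
     * Only record the signal.  Releasing DB locks or closing DB
     * handles from inside a signal handler is not safe.
     */
    shutdown_requested = 1;
}

/* Hypothetical setup routine: returns 0 on success, -1 on failure. */
int install_shutdown_handlers(void)
{
    if (signal(SIGINT, request_shutdown) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, request_shutdown) == SIG_ERR)
        return -1;
    return 0;
}

/* Polled by the application between units of work. */
int shutdown_pending(void)
{
    return shutdown_requested != 0;
}
```

       The application's main loop would run while
       shutdown_pending() returns 0.  Once it returns nonzero, the
       loop exits, and the application releases its locks and
       closes its DB handles in ordinary code, outside the handler.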


MULTI-THREADING

       The  DB library is not itself multi-threaded.  The library
       was deliberately architected to not use threads internally
       because  of  the  portability  problems that using threads
       within the library would introduce.

       DB supports multi-threaded applications  with  the  caveat
       that it loads and calls functions that are commonly avail-
       able in C language environments and which  may  not  them-
       selves  be  thread-safe.  Other than this usage, DB has no
       static data and maintains no local context  between  calls
       to  DB  functions.  To ensure that applications can safely
       use threads in the context of DB, porters to new operating
       systems  and/or  C  libraries must confirm that the system
       and C library functions used by the DB library are thread-
       safe.

       Object handles returned from DB library functions can be made
       free-threaded, i.e., usable concurrently by multiple threads,
       by specifying the DB_THREAD flag to db_appinit(3) and the
       other subsystem open functions.

       There  are  a  few  additional  caveats  concerning  using
       threads to access the DB library:

       1      Spinlocks  must  have been implemented for the com-
              piler/architecture  combination.    Attempting   to
              specify  the  DB_THREAD flag will fail if spinlocks
              are not available.

       2      The DB_THREAD flag must be specified for  all  sub-
              systems  either  explicitly  or  via the db_appinit
              function.  Setting  the  DB_THREAD  flag  inconsis-
              tently may result in database corruption.

       3      Only  a  single  thread may call the close function
              for a returned database or subsystem  handle.   See
              db_open(3)  and  the  appropriate  subsystem manual
              pages for more information.

       4      Either the DB_DBT_MALLOC or the DB_DBT_USERMEM flag
              must be set in a DBT used for key or data
              retrieval.  See db_dbt(3) for more information.

       5      The DB_CURRENT, DB_NEXT and DB_PREV  flags  to  the
              log_get function may not be used by a free-threaded
              handle.  If such  calls  are  necessary,  a  thread
              should  explicitly create a unique DB_LOG handle by
              calling log_open(3).  See db_log(3) for more infor-
              mation.

       6      Each  database operation (i.e., any call to a func-
              tion underlying the handles returned by  db_open(3)
              and  db_cursor(3))  is normally performed on behalf
              of a unique locker.  If, within a single thread  of
              control,  multiple  calls  on  behalf  of  the same
              locker are desired, then transactions must be used.
              For  example, consider the case where a cursor scan
              locates a record, and then based  on  that  record,
              accesses some other item in the database.  If these
              are done using the default lockers for the  handle,
              there  is  no  guarantee  that these two operations
              will not conflict.  If the  application  wishes  to
              guarantee  that  the  operations  do  not conflict,
              locks must be obtained on behalf of a  transaction,
              instead of the default locker id, and a transaction
              must be specified to the cursor  creation  and  the
              subsequent db call.

       7      Transactions  may  not  span  threads,  i.e.,  each
              transaction must begin and end in the same  thread,
              and  each  transaction may only be used by a single
              thread.
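       When several threads share a handle, reference counting is
       one way to satisfy caveat 3: only the last thread to finish
       performs the close.  The sketch below uses POSIX threads.
       struct shared_handle and its functions are hypothetical, and
       close_fn stands in for code that calls the handle's real close
       function.

```c
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper (not a DB type) around a handle shared by
 * several threads.  close_fn stands in for code that calls the
 * handle's real close function.
 */
struct shared_handle {
    pthread_mutex_t mutex;
    int refs;                  /* number of threads using the handle */
    void *handle;
    void (*close_fn)(void *);
};

/* The creating thread holds the first reference. */
int shared_handle_init(struct shared_handle *sh, void *handle,
                       void (*close_fn)(void *))
{
    sh->refs = 1;
    sh->handle = handle;
    sh->close_fn = close_fn;
    return pthread_mutex_init(&sh->mutex, NULL);
}

/* Call before handing the handle to another thread. */
void shared_handle_acquire(struct shared_handle *sh)
{
    pthread_mutex_lock(&sh->mutex);
    sh->refs++;
    pthread_mutex_unlock(&sh->mutex);
}

/*
 * Call when a thread is done with the handle.  Only the thread that
 * drops the last reference closes it, so close runs exactly once.
 * Returns 1 if this call closed the handle, 0 otherwise.
 */
int shared_handle_release(struct shared_handle *sh)
{
    int last;

    pthread_mutex_lock(&sh->mutex);
    last = (--sh->refs == 0);
    pthread_mutex_unlock(&sh->mutex);

    if (last) {
        sh->close_fn(sh->handle);
        pthread_mutex_destroy(&sh->mutex);
    }
    return last;
}
```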


ERROR RETURNS

       Except for the historic dbm and  hsearch  interfaces  (see
       db_dbm(3)  and  db_hsearch(3)), DB does not use the global
       variable errno to return error values to the calling  pro-
       cess  or  thread.   The return values for all DB functions
       can be grouped into three categories:

        0   A return value of 0 indicates that the operation  was
            successful.

       >0   A  return value that is greater than 0 indicates that
            there was a system error.  The errno  value  returned
            by the system is returned by the function, e.g., when
            a DB function  is  unable  to  allocate  memory,  the
            return value from the function will be ENOMEM.

       <0   A return value that is less than 0 indicates a condi-
            tion that was not a system failure, but  was  not  an
            unqualified  success, either.  For example, a routine
            to retrieve a key/data pair  from  the  database  may
            return  DB_NOTFOUND  when  the key/data pair does not
            appear in the database, as opposed to the value of  0
            which  would  be  returned  if the key/data pair were
            found in  the  database.   All  such  special  values
            returned  by DB functions are less than 0 in order to
            avoid conflict with possible values of errno.

       There are two special return values that are somewhat sim-
       ilar  in  meaning, are returned in similar situations, and
       therefore might be confused: DB_NOTFOUND and  DB_KEYEMPTY.
       The  DB_NOTFOUND error return indicates that the requested
       key/data pair did not exist in the database or that start-
       or  end-of-file  has  been reached.  The DB_KEYEMPTY error
       return  indicates  that  the   requested   key/data   pair
       logically  exists  but was never explicitly created by the
       application (the recno access  method  will  automatically
       create   key/data  pairs  under  some  circumstances,  see
       db_open(3) for more information), or  that  the  requested
       key/data  pair  was  deleted and is currently in a deleted
       state.
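       The return-value conventions above can be handled with a
       simple classification helper.  This is a sketch: the helper
       names are hypothetical.  In a real build, DB_NOTFOUND and
       DB_KEYEMPTY come from <db.h>.  The fallback values in the
       block are placeholders that exist only so the sketch compiles
       on its own.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * When compiled against the real <db.h>, DB_NOTFOUND and DB_KEYEMPTY
 * come from that header.  The fallback values below are placeholders
 * so that this sketch compiles on its own; they are NOT DB's values.
 */
#ifndef DB_NOTFOUND
#define DB_NOTFOUND (-1)
#endif
#ifndef DB_KEYEMPTY
#define DB_KEYEMPTY (-2)
#endif

enum db_ret_class {
    RET_SUCCESS,        /* 0 */
    RET_SYSTEM_ERROR,   /* > 0: the value is an errno code */
    RET_NOTFOUND,       /* no such pair, or start/end of file */
    RET_KEYEMPTY,       /* pair never created, or currently deleted */
    RET_OTHER_SPECIAL   /* some other DB-specific value < 0 */
};

/* Hypothetical helper: sort a DB return value into the categories above. */
enum db_ret_class classify_db_return(int ret)
{
    if (ret == 0)
        return RET_SUCCESS;
    if (ret > 0)
        return RET_SYSTEM_ERROR;
    if (ret == DB_NOTFOUND)
        return RET_NOTFOUND;
    if (ret == DB_KEYEMPTY)
        return RET_KEYEMPTY;
    return RET_OTHER_SPECIAL;
}

/* Hypothetical helper: print a diagnostic for a nonzero return. */
void report_db_return(const char *op, int ret)
{
    switch (classify_db_return(ret)) {
    case RET_SUCCESS:
        break;
    case RET_SYSTEM_ERROR:
        /* DB does not set errno; the errno value is the return itself. */
        fprintf(stderr, "%s: %s\n", op, strerror(ret));
        break;
    case RET_NOTFOUND:
        fprintf(stderr, "%s: key/data pair not found\n", op);
        break;
    case RET_KEYEMPTY:
        fprintf(stderr, "%s: key/data pair empty or deleted\n", op);
        break;
    case RET_OTHER_SPECIAL:
        fprintf(stderr, "%s: DB-specific return %d\n", op, ret);
        break;
    }
}
```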


DATABASE AND PAGE SIZES

       DB stores database file page numbers as unsigned 32-bit
       numbers and database file page sizes as unsigned 16-bit
       numbers.  This results in a maximum database size of 2^48
       bytes (2^32 pages of at most 64KB, i.e., 2^16 bytes, each).
       The minimum database page size is 512 bytes (2^9), so a
       database using the minimum page size is limited to 2^41 bytes.

       DB is potentially further limited if the host system  does
       not  have  filesystem  support for files larger than 2^32 bytes,
       including seeking to absolute offsets within such files.

       The maximum btree depth is 255.
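       The size limits above come from multiplying the number of
       addressable pages (2^32) by the page size.  Below is a small
       C check of that arithmetic.  It assumes 64-bit unsigned
       integers, max_db_bytes is a hypothetical helper, and 65536 is
       used as the largest (64KB) page size.

```c
#include <stdint.h>

/* Page numbers are unsigned 32-bit values, so 2^32 pages can be addressed. */
#define ADDRESSABLE_PAGES (UINT64_C(1) << 32)

/* Hypothetical helper: upper bound on database size, in bytes,
   for a given page size. */
uint64_t max_db_bytes(uint32_t page_size)
{
    return ADDRESSABLE_PAGES * (uint64_t)page_size;
}
```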


BYTE ORDERING

       The database files created by DB can be created in either
       little-endian or big-endian format.  By default, the native
       format of the machine on which the database is  created  will
       be  used.   Any  format  database can be used on a machine
       with a different native format, although  it  is  possible
       that  the application will incur a performance penalty for
       the run-time conversion.
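       At its core, run-time conversion means byte-swapping
       multi-byte fields whenever the file's byte order differs from
       the host's.  Below is a minimal sketch for 32-bit fields such
       as page numbers.  The helper names are hypothetical, and this
       is not DB's internal code.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & UINT32_C(0x000000ff)) << 24) |
           ((v & UINT32_C(0x0000ff00)) << 8)  |
           ((v & UINT32_C(0x00ff0000)) >> 8)  |
           ((v & UINT32_C(0xff000000)) >> 24);
}

/* Return 1 if the host stores integers little-endian, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/*
 * Hypothetical helper: convert a 32-bit field read from a database
 * page into host order, given the byte order the file was created in
 * (nonzero means little-endian).  When the orders match, the value is
 * returned unchanged and no conversion cost is paid.
 */
uint32_t file32_to_host(uint32_t v, int file_is_little_endian)
{
    return (!!file_is_little_endian == host_is_little_endian())
        ? v : swap32(v);
}
```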


EXTENDING DB

       DB includes tools to simplify the development of  applica-
       tion-specific logging and recovery.  Specifically, given a
       description of the information to be logged,  these  tools
       will  automatically  create  logging  functions (functions
       that take the values as parameters and construct a  single
       record  that is written to the log), read functions (func-
       tions that read a log record  and  unmarshall  the  values
       into  a  structure  that maps onto the values you chose to
       log), a print function (for debugging), templates for  the
       recovery  functions,  and  automatic  dispatching  to your
       recovery functions.
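       To make this description concrete, the sketch below
       hand-writes three kinds of functions for a hypothetical record
       that logs a counter update: a marshalling function, a read
       (unmarshalling) function, and a print function.  The record
       layout, the struct, and the function names are illustrative
       assumptions, not output from DB's tools.  The step that
       actually writes the buffer to the DB log is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: the values the application chose to log. */
struct counter_update_args {
    uint32_t counter_id;
    int32_t  old_value;
    int32_t  new_value;
};

/* Size of the marshalled record body, in bytes. */
#define COUNTER_UPDATE_SIZE (sizeof(uint32_t) + 2 * sizeof(int32_t))

/*
 * "Logging function" half: pack the values into a single record
 * buffer (in host byte order).  Writing the buffer to the DB log is
 * omitted from this sketch.  Returns the number of bytes written.
 */
size_t counter_update_marshal(unsigned char *buf, uint32_t counter_id,
                              int32_t old_value, int32_t new_value)
{
    unsigned char *p = buf;

    memcpy(p, &counter_id, sizeof counter_id);
    p += sizeof counter_id;
    memcpy(p, &old_value, sizeof old_value);
    p += sizeof old_value;
    memcpy(p, &new_value, sizeof new_value);
    p += sizeof new_value;
    return (size_t)(p - buf);
}

/* "Read function": unmarshal a record into the structure. */
void counter_update_read(const unsigned char *buf,
                         struct counter_update_args *args)
{
    const unsigned char *p = buf;

    memcpy(&args->counter_id, p, sizeof args->counter_id);
    p += sizeof args->counter_id;
    memcpy(&args->old_value, p, sizeof args->old_value);
    p += sizeof args->old_value;
    memcpy(&args->new_value, p, sizeof args->new_value);
}

/* "Print function" for debugging. */
void counter_update_print(const struct counter_update_args *args)
{
    printf("counter_update: counter %lu: %ld -> %ld\n",
           (unsigned long)args->counter_id,
           (long)args->old_value, (long)args->new_value);
}
```

       In this sketch's terms, a recovery function for the record
       would call counter_update_read and then either redo or undo
       the update.  The generated templates and dispatching are
       meant to supply that part.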


EXAMPLES

       There are several different ways that the  DB  library  is
       used:

       1      Applications  that  want  to use formatted files to
              store data, and  are  unconcerned  with  concurrent
              access  and  loss of data due to catastrophic fail-
              ure.  Generally, these applications  create  short-
              lived  databases  that  are  discarded or recreated
              when the system fails.  Such applications will only
              be  concerned  with  the DB access methods.  The DB
              access methods will use the memory pool  subsystem,
              but  the  application  is  unlikely  to be aware of
              this.  See the files examples/ex_access.c and exam-
              ples/ex_btrec.c in the DB source distribution for C
              language code examples  of  how  such  applications
              might use the DB library.

       2      Applications  similar  to #1, but that also wish to
              use db_appinit(3) for  environment  initialization.
              See the file examples/ex_appinit.c in the DB source
              distribution for a C language code example  of  how
              such an application might use the DB library.

       3      Applications   that  wish  to  transaction  protect
              structures other than the DB access  methods.   See
              the  file examples/ex_trans.c in the DB source dis-
              tribution for a C language code example of how such
              an application might use the DB library.

       4      Applications  that  use  the DB access methods, but
              are  concerned  about  catastrophic  failure,   and
              therefore  want to transaction protect the underly-
              ing DB files.  See the file  examples/ex_tpcb.c  in
              the  DB  source  distribution for a C language code
              example of how such an application might use the DB
              library.

       5      Applications  that want to buffer input files other
              than the DB access  method  files.   See  the  file
              examples/ex_mpool.c  in  the DB source distribution
              for a C language code example of how such an appli-
              cation might use the DB library.

       6      Applications  that want a general purpose lock man-
              ager separate  from  locking  support  for  the  DB
              access methods.  See the file examples/ex_lock.c in
              the DB source distribution for a  C  language  code
              example of how such an application might use the DB
              library.


COMPATIBILITY

       The DB 2.0 library provides backward compatible interfaces
       for  the  historic  UNIX  dbm(3),  ndbm(3)  and hsearch(3)
       interfaces.  See db_dbm(3) and db_hsearch(3)  for  further
       information on these interfaces.  It also provides a back-
       ward  compatible  interface  for  the  historic  DB   1.85
       release.   DB  2.0 does not provide database compatibility
       for any of the above interfaces,  and  existing  databases
       must be converted manually.  To convert existing databases
       from the DB 1.85 format to the DB 2.0 format,  review  the
       db_dump185(1) and db_load(1) manual pages.

       The  name  space  in  DB 2.0 has been changed from that of
       previous DB versions, notably version 1.85, for  portabil-
       ity  and consistency reasons.  The only name collisions in
       the two libraries  are  the  names  used  by  the  dbm(3),
       ndbm(3),  hsearch(3)  and the DB 1.85 compatibility inter-
       faces.  To include both DB 1.85 and DB  2.0  in  a  single
       library,  remove the dbm(3), ndbm(3) and hsearch(3) inter-
       faces from either of the two libraries, and  the  DB  1.85
       compatibility interface from the DB 2.0 library.  This can
       be done by editing the library Makefiles and reconfiguring
       and  rebuilding the DB 2.0 library.  Obviously, if you use
       the historic interfaces, you will get the version  in  the
       library  from which you did not remove it.  Similarly, you
       will not be able to access DB 2.0 files using the DB  1.85
       compatibility  interface, since you have removed that from
       the library as well.

       It is possible to simply relink  applications  written  to
       the  DB 1.85 interface against the DB 2.0 library.  Recom-
       pilation of such applications is  slightly  more  complex.
       When  the  DB  2.0  library  is installed, it installs two
       include files, db.h and  db_185.h.   The  former  file  is
       likely to replace the DB 1.85 version's include file which
       had the same name.  If this did not happen, recompiling DB
       1.85  applications  to  use  the DB 2.0 library is simple:
       recompile as done historically, and load  against  the  DB
       2.0  library instead of the DB 1.85 library.  If, however,
       the DB 2.0 installation process has replaced the  system's
       db.h  include  file,  replace the application's include of
       db.h with inclusion of db_185.h, recompile as done histor-
       ically, and then load against the DB 2.0 library.

       Applications  written using the historic interfaces of the
       DB library should not require significant effort  to  port
       to  the  DB  2.0  interfaces.  While the functionality has
       been greatly enhanced in DB 2.0, the historic interface
       and functionality are largely unchanged.  Reviewing the
       application's calls into the DB library and updating those
       calls  to the new names, flags and return values should be
       sufficient.

       While loading applications that use the DB 1.85 interfaces
       against the DB 2.0 library, or converting DB 1.85 function
       calls to DB 2.0 function calls  will  work,  reconsidering
       your application's interface to the DB database library in
       light of the additional functionality in DB 2.0 is  recom-
       mended,  as it is likely to result in enhanced application
       performance.



SEE ALSO

       db_archive(1), db_checkpoint(1), db_deadlock(1), db_dump(1),
       db_intro(3), db_load(1), db_recover(1), db_stat(1),
       db_appinit(3), db_cursor(3), db_dbm(3), db_lock(3), db_log(3),
       db_mpool(3), db_open(3), db_txn(3)

       LIBTP: Portable, Modular Transactions for UNIX, Margo Seltzer,
       Michael Olson, USENIX proceedings, Winter 1992.