Overview of the Core
Libwww is built on the request/response paradigm, in which an application
issues a request for a URI (URL). Libwww then tries to fulfill the request as
efficiently as possible by fetching the URL from the origin server, a proxy
server, a gateway, the local file system, or a locally cached version.
Data is delivered back to the application as soon as it is ready, which
keeps the access delay seen by the application to a minimum. Libwww is
capable of accepting many simultaneous requests and handling them in an
intelligent manner in order to save network bandwidth.
Requests and Responses
The request/response paradigm is illustrated in the control/data diagram
shown below. The diagram shows only the core modules - the other modules
are "pasted in" later. Note that the libwww code is to the right of the thick
vertical line (green), and the application to the left can be any type of
application, for example a proxy or a client. The libwww architecture
supports clients and proxies in much the same way, because the distinction
makes little difference to libwww: a client has a user interface whereas a
server has a network interface.
Another thing to note is that libwww supports large scale data flow from
the application to the network as well as from the network to the application.
This has an important impact on the functionality that can be put into
applications, for example allowing collaborative authoring possibilities
via the Web. The architecture behind this is described in the section
"Post Webs - an API for PUT and POST".
The thin lines (red) are control flow, the thick lines (blue) are data flow,
and the "lightning" (magenta) is control flow resulting from events handled
by libwww. Let's see what happens when an application issues a request. The
description assumes an event loop, which can be either the one provided by
libwww or an external event loop provided by the application.
The section on Threads and Event Loops explains
more about how this can be set up. The numbers refer to the figure above.
-
The event manager waits for an event from the application. This can
for example be a user clicking on a link or typing a number on the
keyboard. When an event arrives, the event manager calls the user event handler
provided by the application.
-
The user event handler creates a request object and uses one of the load
methods.
-
The Request object creates a new Net object.
-
The Net object calls any callback functions registered to be called
before the request is actually started. These can, for example,
map the URL to another destination, check the cache, or look for proxy
servers and gateways.
-
If the request has to access the net, the Net object passes it to the
protocol object.
-
The after callback functions are called when the request has terminated.
Typical operations at this point are logging, history updates, and so on.
If the before callback functions imply that no net access is required,
the protocol object is not used at all.
-
The event callback function is now called to actually get the document.
-
When data arrives, the Format manager is contacted to build a stream
stack.
-
The converted data is handed either from the network to the application
or from the application to the network as it becomes ready. If no data
is ready, control is given back to the event manager.
-
If an error occurs, a dialog callback function is called to notify the
user.
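The ordering of the before callbacks, the protocol object, and the after
callbacks in the steps above can be sketched as a small, self-contained C
model. This is only an illustration under stated assumptions: it does not use
the libwww API, and every name in it (Request, FilterFn, ProtocolFn,
cache_before, log_after, fake_protocol, net_run) is hypothetical. It assumes
that a before callback signals "no net access required" through its return
value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the Net object's job; these are not libwww names. */
enum { FILTER_CONTINUE = 0, FILTER_DONE = 1 };

typedef struct Request {
    const char *url;     /* URL being requested */
    int used_network;    /* set to 1 when the protocol handler ran */
    int after_calls;     /* how many after callbacks ran */
} Request;

typedef int  (*FilterFn)(Request *req);
typedef void (*ProtocolFn)(Request *req);

/* Example before callback: pretend exactly one URL is already cached. */
int cache_before(Request *req)
{
    if (strcmp(req->url, "http://example.org/cached") == 0)
        return FILTER_DONE;      /* satisfied locally, no net access */
    return FILTER_CONTINUE;
}

/* Example after callback: stands in for logging or a history update. */
int log_after(Request *req)
{
    req->after_calls++;
    return FILTER_CONTINUE;
}

/* Stand-in for a protocol module such as HTTP. */
void fake_protocol(Request *req)
{
    req->used_network = 1;
}

/* Before callbacks, then the protocol only if needed, then after callbacks. */
void net_run(Request *req,
             FilterFn *before, size_t nbefore,
             ProtocolFn protocol,
             FilterFn *after, size_t nafter)
{
    int need_network = 1;
    size_t i;

    assert(req != NULL && protocol != NULL);

    for (i = 0; i < nbefore; i++) {
        if (before[i](req) == FILTER_DONE) {
            need_network = 0;    /* a before callback handled the request */
            break;
        }
    }
    if (need_network)
        protocol(req);
    for (i = 0; i < nafter; i++)
        after[i](req);
}
```

With cache_before registered, a request for the cached URL never reaches
fake_protocol, while log_after still runs for every request.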
The steps above give the "macro" description of how the core modules
interact. The rest of this document looks in more detail at what goes on
inside the core modules and which objects are involved. Note that
by using a threaded model, libwww can handle multiple requests simultaneously.
An example of how to do this is described in the section
"Libwww Threads".
-
Request Object
-
The access manager is the main entry point for requesting a data object pointed
to by a URI. It has a set of methods that allow the application to request
different services, for example to get a URI, to post to a URI, or to search
a URI.
-
Protocol Object
-
The protocol manager is invoked by the access manager in order to access
a document not found in memory or in cache. The manager consists of a set
of protocol modules handling the access schemes HTTP, FTP, NNTP, Gopher,
WAIS, Telnet, and access to the local file system. The protocol modules are
registered dynamically (using static linking), and the User's Guide describes
how modules can be registered. Each protocol module is responsible for
establishing the connection to the remote server (or the local file system)
and extracting information using a specific access method. When data arrives
from the network, it is passed on to the format manager.
-
Format Manager
-
The stream format manager takes care of the transportation of streams of
data from the network to the application and vice versa. It also performs
any parsing and data format conversion requested, based on a set of registered
format converters and a simple algorithm for selecting the best conversion.
Like the protocol modules, data format converters can be registered
dynamically, and the current set of streams includes, among others: MIME,
SGML, HTML, and LaTeX.
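As an illustration of registering converters at run time and picking the best
one, here is a minimal C sketch. It is not the libwww implementation: the
names Converter, converter_register, and converter_best are hypothetical, and
the numeric quality score used to rank converters is an assumption of this
sketch, since the text above says only that a simple algorithm selects the
best conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical converter registry; not libwww's data structures. */
typedef struct Converter {
    const char *from;    /* input format, e.g. "text/html" */
    const char *to;      /* output format, e.g. "text/plain" */
    double quality;      /* assumed preference score, higher is better */
} Converter;

#define MAX_CONVERTERS 16

static Converter registry[MAX_CONVERTERS];
static size_t registry_len = 0;

/* Register a converter at run time; returns 0, or -1 if the table is full. */
int converter_register(const char *from, const char *to, double quality)
{
    assert(from != NULL && to != NULL);
    if (registry_len == MAX_CONVERTERS)
        return -1;
    registry[registry_len].from = from;
    registry[registry_len].to = to;
    registry[registry_len].quality = quality;
    registry_len++;
    return 0;
}

/* Return the matching converter with the highest quality, or NULL if none. */
const Converter *converter_best(const char *from, const char *to)
{
    const Converter *best = NULL;
    size_t i;

    assert(from != NULL && to != NULL);
    for (i = 0; i < registry_len; i++) {
        const Converter *c = &registry[i];
        if (strcmp(c->from, from) != 0 || strcmp(c->to, to) != 0)
            continue;            /* wrong input or output format */
        if (best == NULL || c->quality > best->quality)
            best = c;
    }
    return best;
}
```

A linear scan is enough here because the set of registered converters is
expected to be small; a real registry might also chain converters, which this
sketch does not attempt.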
-
Error Object
-
This module manages an information stack which contains information about all
errors that occurred during communication with a remote server, or simply
information about the current state. Using a stack for this kind of information
makes nested error messages possible, where each message can
be classified and filtered according to its impact on the current request,
for example "Fatal", "Non-Fatal", "Warning" etc. The
filtering can be used to decide which level of messages is passed back
to the user.
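The idea of a classified error stack with filtering can be sketched in a few
lines of C. This is a hedged illustration, not libwww code: the names
Severity, ErrorStack, error_init, error_push, and error_report are
hypothetical, and the relative ordering of the three classes (Warning below
Non-Fatal below Fatal) is an assumption made so that a single threshold can
act as the filter.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ordering: a higher value means a greater impact on the request. */
typedef enum Severity {
    SEV_WARNING = 0,
    SEV_NON_FATAL = 1,
    SEV_FATAL = 2
} Severity;

typedef struct ErrorEntry {
    Severity severity;
    const char *message;
} ErrorEntry;

#define ERROR_STACK_MAX 32

typedef struct ErrorStack {
    ErrorEntry entries[ERROR_STACK_MAX];
    size_t count;
} ErrorStack;

/* Start with an empty stack. */
void error_init(ErrorStack *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Push a message on top of the stack; returns 0, or -1 if the stack is full. */
int error_push(ErrorStack *s, Severity severity, const char *message)
{
    assert(s != NULL);
    if (s->count == ERROR_STACK_MAX)
        return -1;
    s->entries[s->count].severity = severity;
    s->entries[s->count].message = message;
    s->count++;
    return 0;
}

/*
 * Copy messages with severity >= min into out, most recent first.
 * Returns the number of messages written, at most out_cap.
 */
size_t error_report(const ErrorStack *s, Severity min,
                    const char **out, size_t out_cap)
{
    size_t written = 0;
    size_t i;

    assert(s != NULL);
    for (i = s->count; i > 0 && written < out_cap; i--) {
        const ErrorEntry *e = &s->entries[i - 1];
        if (e->severity >= min)
            out[written++] = e->message;
    }
    return written;
}
```

An application that wants to show only serious problems would call
error_report with SEV_FATAL, while a debugging tool could pass SEV_WARNING to
see the whole nested history.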
-
Net Object
-
The net manager provides an interface for handling asynchronous sockets, which
are an integral part of libwww.
-
Event Manager
-
The event manager is a "session layer" that decides which thread should be the
active thread. A thread can be either a pseudo thread or a native thread,
for example a Posix thread, and the event loop can be provided by the
application. Libwww comes with an example event loop which uses a select()
function call to decide which thread should be made the active one; however,
the event loop may use another decision model. One of the design ideas behind
the event manager is that it can be extended to a full session layer manager
handling, for example, the control of an HTTP-NG connection.
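To make the select()-based decision model concrete, the following C sketch
runs one pass of a minimal event loop. It uses the POSIX select() call but is
not the event loop shipped with libwww: Watch, ReadyFn, count_bytes, and
event_loop_once are hypothetical names, and each watched descriptor stands in
for one pseudo thread whose callback is run when its descriptor becomes
readable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

typedef void (*ReadyFn)(int fd, void *context);

/* One watched descriptor; stands in for a pseudo thread in this sketch. */
typedef struct Watch {
    int fd;
    ReadyFn on_readable;
    void *context;
} Watch;

/* Example callback: read once and add the byte count to *(long *)context. */
void count_bytes(int fd, void *context)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        *(long *)context += (long)n;
}

/*
 * One pass of the loop: wait up to timeout_ms for any descriptor to become
 * readable, then run the callback of each readable one.
 * Returns the number of callbacks run, 0 on timeout, or -1 on error.
 */
int event_loop_once(const Watch *watches, size_t n, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int maxfd = -1;
    int ready;
    int dispatched = 0;
    size_t i;

    assert(watches != NULL || n == 0);

    FD_ZERO(&readable);
    for (i = 0; i < n; i++) {
        FD_SET(watches[i].fd, &readable);
        if (watches[i].fd > maxfd)
            maxfd = watches[i].fd;
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(maxfd + 1, &readable, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;            /* 0: timed out, -1: select() failed */

    for (i = 0; i < n; i++) {
        if (FD_ISSET(watches[i].fd, &readable)) {
            watches[i].on_readable(watches[i].fd, watches[i].context);
            dispatched++;
        }
    }
    return dispatched;
}
```

An application would call event_loop_once repeatedly; replacing the select()
call with another readiness mechanism changes the decision model without
touching the callbacks.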
Henrik Frystyk, libwww@w3.org,
@(#) $Id: ControlFlow.html,v 1.17 1996/06/08 01:56:26 frystyk Exp $