ASEAWK

ASEAWK

ASE provides an embeddable processor of a dialect of the AWK programming language. The language implemented is slightly different from the version developed by Brian W. Kernighan and has been adjusted to the author's preference.

Overview

The following code fragment illustrates the basic steps of embedding the processor.

1) #include <ase/awk/awk.h>

2) ase_awk_t* awk;
3) awk = ase_awk_open (...);
4) if (ase_awk_parse (awk, ...) == -1)
   {
        /* parse error */
   }
   else
   {
5)     if (ase_awk_run (awk, ...) == -1)
       {
           /* run-time error */
       }
   }
6)  ase_awk_close (awk);
  1. Most of the functions and data types needed are defined in the header file ase/awk/awk.h.
  2. ase_awk_t represents the processor. However, the internal representation is not exposed.
  3. ase_awk_open creates the processor instance.
  4. ase_awk_parse parses an AWK script.
  5. ase_awk_run executes the script parsed.
  6. ase_awk_close destroys the processor instance.

More detailed description is available here. You may refer to other sample files such as ase/test/awk/awk.c and ase/awk/jni.c.

Primitive Functions

A set of primitive functions is needed to create an instance of the AWK processor. A primitive function is a user-defined function to help the library perform system-dependent operations such as memory allocation, character class handling.

typedef struct ase_awk_prmfns_t ase_awk_prmfns_t;

struct ase_awk_prmfns_t
{
    ase_mmgr_t mmgr;
    ase_ccls_t ccls;

    struct
    {
        ase_awk_pow_t     pow;
        ase_awk_sprintf_t sprintf;
        ase_awk_dprintf_t dprintf;
        void*             custom_data;
    } misc;
};

A caller of ase_awk_open should fill in most of the fields of a ase_awk_prmfns_t structure and pass the structure to it. The function pointers in the miscellaneous group labeled [misc] is defined as follows:

/* returns the value of x raised to the power of y */
typedef ase_real_t (*ase_awk_pow_t) (void* custom, ase_real_t x, ase_real_t y);

/* similar to snprintf of the standard C library. */
typedef int (*ase_awk_sprintf_t) (void* custom, ase_char_t* buf, ase_size_t size, const ase_char_t* fmt, ...);

/* similar to printf of the standard C library. called by a few uncommonly 
 * used output functions usually for debugging purpose */
typedef void (*ase_awk_dprintf_t) (void* custom, const ase_char_t* fmt, ...); 

The fourth field of the group is passed to its member functions as the first argument on invocation. The function pointed by the sprintf field should ensure that the resuliting string is null-terminated and %s and %c are accepted for the ase_char_t* and ase_char_t type respectively regardless the character mode.

The memory manager group labeled [mmgr] and the character class group labled [ccls] are defined as follows:

typedef void* (*ase_malloc_t)  (void* custom, ase_size_t n);
typedef void* (*ase_realloc_t) (void* custom, void* ptr, ase_size_t n);
typedef void  (*ase_free_t)    (void* custom, void* ptr);

typedef ase_bool_t (*ase_isccls_t) (void* custom, ase_cint_t c);
typedef ase_cint_t (*ase_toccls_t) (void* custom, ase_cint_t c);

struct ase_mmgr_t
{
	ase_malloc_t  malloc;
	ase_realloc_t realloc;
	ase_free_t    free;
	void*         custom_data;
};

struct ase_ccls_t
{
	ase_isccls_t is_upper;
	ase_isccls_t is_lower;
	ase_isccls_t is_alpha;
	ase_isccls_t is_digit;
	ase_isccls_t is_xdigit;
	ase_isccls_t is_alnum;
	ase_isccls_t is_space;
	ase_isccls_t is_print;
	ase_isccls_t is_graph;
	ase_isccls_t is_cntrl;
	ase_isccls_t is_punct;
	ase_toccls_t to_upper;
	ase_toccls_t to_lower;
	void*        custom_data;
};

The functions in these groups perform the memory operations and character class related operations respectively. They follow the style of the memory allocation functions and character class handling functions of the standard C library except that they accept a pointer to the user-defined data as the first argument, thus providing more flexibility. The pointer to the user-defined data is specified into the custom_data field of each group. The realloc field, however, can be ASE_NULL, in which case the functions pointed by the free and the malloc field replace the role of the function pointed by the realloc field.

Source IO Handler

The source code is handled by a source input handler provided by the user. The optional source code output handler can be provided to have the internal parse tree converted back to the source code.

The source code handler is defined as follows:

typedef ase_ssize_t (*ase_awk_io_t) (int cmd, void* custom, ase_char_t* data, ase_size_t count);

typedef struct ase_awk_srcios_t ase_awk_srcios_t;

struct ase_awk_srcios_t
{
    ase_awk_io_t in; /* source input */
    ase_awk_io_t out; /* source output */
    void*        custom_data;
};

The in field of the ase_awk_srcios_t is mandatory and should be filled in. The out field can be set to ASE_NULL or can point to a source output handling function. The custom_data field is passed to the source handlers as the second argument. The first parameter cmd of the source input handler is one of ASE_AWK_IO_OPEN, ASE_AWK_IO_CLOSE, ASE_AWK_IO_READ. The first parameter cmd of the source output handler is one of ASE_AWK_IO_OPEN, ASE_AWK_IO_CLOSE, ASE_AWK_IO_WRITE. The third parameter data and the fourth parameter count are the pointer to the buffer to read data into and its size if the first parameter cmd is ASE_AWK_IO_READ while they are the pointer to the data and its size if cmd is ASE_AWK_IO_WRITE.

The source handler should return a negative value for an error and zero or a positive value otherwise. However, there is a subtle difference in the meaning of the return value depending on the value of the first parameter cmd.

When cmd is ASE_AWK_IO_OPEN, the return value of -1 and 1 indicates the failure and the success respectively. In addition, the return value of 0 indicates that the operation is successful but has reached the end of the stream. The library calls the handler with ASE_AWK_IO_CLOSE for deinitialization if the return value is 0 or 1. When cmd is ASE_AWK_IO_CLOSE, the return value of -1 and 0 indicate the failure and the success respectively. When cmd is ASE_AWK_IO_READ or ASE_AWK_IO_WRITE, the return value of -1 indicates the failure, 0 the end of the stream, and other positive values the number of characters read or written.

The typical source handler will look as follows:

ase_ssize_t awk_srcio_in (int cmd, void* arg, ase_char_t* data, ase_size_t size)
{
    if (cmd == ASE_AWK_IO_OPEN)
    {
        /* open the stream */
        return 1;
    }
    else if (cmd == ASE_AWK_IO_CLOSE)
    {
        /* close the stream */
        return 0;
    }
    else if (cmd == ASE_AWK_IO_READ)
    {
        /* read the stream */
        return the number of characters read;
    }

    return -1;
}

ase_ssize_t awk_srcio_out (int cmd, void* arg, ase_char_t* data, ase_size_t size)
{
    if (cmd == ASE_AWK_IO_OPEN) 
    {
    /* open the stream */
        return 1;
    }
    else if (cmd == ASE_AWK_IO_CLOSE)
    {
        /* close the stream after flushing it */
        return 0;
    }
    else if (cmd == ASE_AWK_IO_WRITE)
    {
        /* write the stream */
        return the number of characters written;
    }

    return -1;
}

Once you have the source handler ready, you can fill in the fields of a ase_awk_srcios_t structure and pass it to the call of ase_awk_parse.

ase_awk_srcios_t srcios;

srcios.in = awk_srcio_in;
srcios.out = awk_srcio_out;
srcios.custom_data = point to the extra information necessary; 

if (ase_awk_parse (awk, &srcios) == -1)
{
	/* handle error */
}

External IO Handler

External IO handlers should be provided to support the AWK's built-in IO facilities.