                   TAWK Compiler for MS-DOS, Version 4.0
                        Reviewed by James K. Lawless

        Thompson Automation Software has done an incredible job of enhancing
the Unix-born AWK text-processing language. The AWK language was created as
a means to manipulate text data generated by Unix utilities. This C-like
interpreted language made its way into the MS-DOS arena in the late 1980s
when Unix tools became popular with PC software developers.
        AWK's purpose was to provide text manipulation functionality in a
form that was conducive to rapid program development. Often, the reduction
in development time gained by using AWK came at the cost of slow program
performance. Since AWK is usually implemented as an interpreter, AWK systems
are known for being somewhat sluggish in their program execution.
        Thompson Automation Software has augmented the AWK language with a
number of significant improvements, forming a new product known as TAWK.
Both a compiler and an interpreter are provided for TAWK programming.
Compiled code can be made to operate fully "stand-alone". Alternatively, the
EXE files may be compiled in a special form that requires a run-time EXE
file to execute properly. This method keeps the EXE files smaller ( leaving
the "engine" in a commonly accessible module ). A technique similar to this
was used by some BASIC compilers.
        With true compilation, TAWK provides an attractive feature known as
"separate compilation". This feature allows the developer to separately
compile multiple TAWK source files so that they may later be combined into a
single EXE. The importance of this facility becomes evident as the
programmer begins to develop function libraries that will be reused in
multiple EXEs.
        In addition to a compiler and interpreter, the TAWK system boasts an
interactive debugger! The debugger itself is a special TAWK program
( DEBUG.AWK ) that you compile with the other source files that comprise the
modules in the EXE.
        When running the program, the debug screens automatically pop up,
sporting a look similar to contemporary C debuggers such as Microsoft's
CodeView. The TAWK language contains special functionality which allows a
running TAWK program to indirectly manipulate any variable used in a
program. Additionally, a set of special functions with a prefix of "debug_"
in the function name enables the TAWK program to be placed in a controlled
state for debugging purposes. While this may sound odd, it's a very
impressive feature. If the debugger doesn't meet your needs, change it! An
accompanying text file warns that customizing the debugger is best left to
an expert, but does little else to dissuade the reader from attempting
customization. In fact, after the initial warning, the text file contains
over 100 lines detailing the theory-of-operation of the debugger.
        A very interesting feature of TAWK is that it will utilize all
available EMS and/or XMS memory ( this is subject to fine-tuning via a
special set of control variables ). Additionally, TAWK can swap to disk
files for additional virtual memory, allowing for huge data usage. Special
built-in variables can be used to limit the amount of memory that is
subject to swapping to a particular virtual-memory medium.
        A powerful feature of the traditional AWK language is a mechanism
known as an "associative array". This is an array which can use a string as
an index rather than a number. In traditional languages such as C, you may
have an array "x" which is ten elements wide ( ordered 0,1,..,9). Accessing
the first element would require a reference to x[0].
        An associative array may have elements such as x["name"],
x["address"], etc. One advantage of an associative array is that the array
grows dynamically as you add to it. A special version of the "for" keyword
borrowed from the C language allows for iterating through all indices stored
in the array. In TAWK the indices are visited in sorted order ( from the
prior example, the word "address" would be derived before the word "name" ).
Particular enhancements that TAWK has provided include the ability to
control the method used in sorting the arrays. Consider the following
program:

{ x[$0] }
END { for(i in x) print i }

        This TAWK source code yields a text sort program that is far
superior to the MS-DOS sort. It is, however, dependent on each input line
being unique ( as the documentation clearly states ). Minimal changes are
necessary to provide support for non-unique entries. Using the compiled form
of this program, I sorted a file of over one megabyte of data (approximately
22,000 lines of text from a variety of product documentation files). The
EMS/XMS usage I described earlier provided for a transparent means to store
the entire file in RAM for sorting.
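        The "minimal changes" needed for non-unique lines come down to
storing a count under each index rather than nothing at all. What follows is
a rough sketch of that idea in Python, not TAWK: a dictionary stands in for
the associative array, and the name sort_lines is my own invention for
illustration.

```python
from collections import Counter


def sort_lines(lines):
    """Return the lines in sorted order, keeping duplicate lines."""
    # Each distinct line is an index; the stored value is how many
    # times that line was seen (hypothetical helper, not a TAWK API).
    counts = Counter(lines)
    result = []
    for line in sorted(counts):          # visit the indices in sorted order
        result.extend([line] * counts[line])
    return result


sample = ["pear", "apple", "pear", "fig"]
for line in sort_lines(sample):
    print(line)
```

        The same approach should carry over to the two-line TAWK program by
incrementing x[$0] for each input line and printing each index that many
times.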
        For you bit-bangers out there, TAWK provides for low-level access to
MS-DOS resources and PC hardware. You can manipulate absolute memory
locations, perform I/O to hardware ports, and generate interrupts. Special
functions cause variables to "freeze" their locations in memory when they
are utilized in interrupt calls. This prevents TAWK's virtual memory
processor from swapping them out before the interrupt call is complete.
        In addition to the low-level functions, a full set of screen I/O and
file I/O functions is available in the TAWK system. The file I/O functions
include fseek(), provisions for file-locking, and shared file access on
local-area-networks ( LANs ). A set of directory functions ( such as
findfirst() and findnext() ) is also available.
        The screen I/O functions seem to work adequately, but they depend on
the BIOS for the actual I/O. The book states "Under DOS (sic) high speed is
obtained by using the direct video interrupt ( number 16 ), thus bypassing
the ANSI.SYS driver, if any".
        The terms "high speed" and "direct video" may be misleading. "Direct
video" access generally refers to the low-level manipulation of the video
RAM and I/O ports to achieve maximum speed. BIOS screen routines are
notoriously slower than direct access ( due to the extra layers of code
involved ). Nonetheless, I did not notice any delay in the sample program
which implements a highlight-bar menu ( on a 486/25sx computer ).
        What if TAWK doesn't have a feature that you're looking for? If you
can't find some means of performing a particular function within the TAWK
system, you have the option of linking in C code. Compilers supported are
Microsoft C 6.0, Microsoft C++ 7.0 or higher ( including the 16-bit versions
of Visual C++ ), and Borland C and Turbo C ( 3.0 or higher ). The large
memory model must be used for these external functions. The functions must
not return pointers ( they will be misinterpreted by the TAWK system ).
Consider the following two files:

TESTER.AWK
   # Declare an external C function
   extern void write_str( char*)
   {
      write_str($0)
   }

WR.C
   // Implement the external function write_str
   #include <stdio.h>

   void write_str(char *s)
   {
      printf("*** %s\n",s);
   }

        The program linked perfectly and ran as expected, displaying strings
of text prefixed by the characters "***".
        Traditionally, AWK can only manipulate text files. You may have
surmised that TAWK can manipulate binary files using the fopen() family of
functions. In addition to being able to open these files, a special field
translation mechanism is available to minimize the amount of code required
to process binary data.
        The TAWK function unpack() can be used to translate a block of
binary data into a set of text elements within an associative array. The
counterpart function pack() converts the associative array elements back to
their binary form. Supported conversion types are as follows: ASCII string
with NULL padding, ASCII string with space padding, Byte, 16-bit integer,
32-bit integer, Single-precision floating-point, Double-precision
floating-point.
        To see how easy this feature is to use, I wrote a short program
which dumps the MESSAGES.DAT file from a QWK-mail packet to the console.
Consider the following program:

BEGIN {
   # Template for unpack(): one name@type entry per field of the
   # 128-byte message header
   QWKTmp="status@c msgno@7A date@8A time@5A to@25A " \
   "from@25A subj@25A pswd@12A refmsg@8A numblks@6A " \
   "flag@c cnum@S mnum@S tag@c"
   fileptr = fopen("MESSAGES.DAT","rb");
   # Read and discard the first 128-byte block of the file
   work = fread(128,fileptr)
   while(1) {
      header = fread(128,fileptr)
      if(feof(fileptr))
         exit(0)
      unpack(QWKTmp,header,hi)
      print "To      " hi["to"]
      print "From    " hi["from"]
      print "Subject " hi["subj"]
      print  hi["date"] " " hi["time"]
      # numblks counts the header block too, so start at 1
      for(i = 1; i < hi["numblks"]; i++) {
         work=fread(128,fileptr)
         # Byte 0xE3 marks the end of a line in the message text
         gsubs("\xE3","\n",work)
         printf("%s", work )
      }
      print
   }
   close(fileptr)
}

        The QWKTmp string is a format string for the unpack() function. The
identifier on the left side of the "@" symbol is used as an index to the
specified associative array ( "hi", in this example ). After the unpack()
operation is complete for the header, the for-loop iterates through each
block containing the lines of text in the message. Not too shabby for only a
screenful of source code!
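        For readers who have not met this style of template-driven
unpacking, here is a comparable sketch in Python ( not TAWK ) using the
standard struct module. The field names and widths are taken from the
QWKTmp string above. The names QWK_FORMAT, QWK_FIELDS and unpack_header are
hypothetical, and I am assuming that the "S" fields are unsigned 16-bit
little-endian integers.

```python
import struct

# "<" selects little-endian with no padding; "Ns" is an N-byte string,
# "c" a single byte, "H" an unsigned 16-bit integer (an assumption for
# the "S" fields in the TAWK template).
QWK_FORMAT = "<c7s8s5s25s25s25s12s8s6scHHc"
QWK_FIELDS = ("status", "msgno", "date", "time", "to", "from", "subj",
              "pswd", "refmsg", "numblks", "flag", "cnum", "mnum", "tag")


def unpack_header(block):
    """Turn one 128-byte header block into a dict keyed by field name."""
    values = struct.unpack(QWK_FORMAT, block)
    header = {}
    for name, value in zip(QWK_FIELDS, values):
        if isinstance(value, bytes):
            # latin-1 decodes any byte value without error; trailing
            # space or NUL padding is trimmed from text fields.
            value = value.decode("latin-1").rstrip(" \x00")
        header[name] = value
    return header


# The template describes exactly one 128-byte block.
assert struct.calcsize(QWK_FORMAT) == 128
```

        The dictionary returned by unpack_header plays the same role as the
"hi" array in the TAWK listing.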
        The only suggestion I have for this development system is that
Pascal data types and BASIC data types should be supported by the
pack()/unpack() system. Turbo Pascal has a 6-byte floating-point data type.
Pascal, in general, uses a special kind of string known as a
packed-array-of-characters (PAC). BASIC uses a special form of
floating-point, as well. With these additions, I could easily process a
number of files that contain these language-specific data types. The
facilities DO exist within TAWK so that I could actually write this myself,
but I'd bet that the routines built into the compiler would be faster and
more efficient.
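
        To give an idea of what such a conversion involves, here is a sketch
in Python ( not TAWK ) that decodes the 6-byte Turbo Pascal real. It assumes
the commonly documented layout: the first byte is the exponent with a bias
of 129, the remaining five bytes hold a 39-bit mantissa with an implied
leading 1, and the top bit of the last byte is the sign. The function name
real48_to_float is hypothetical.

```python
def real48_to_float(raw):
    """Convert a 6-byte Turbo Pascal real to a Python float.

    Assumed layout: raw[0] is the exponent (bias 129), raw[1:6] is a
    little-endian 40-bit field whose top bit is the sign and whose low
    39 bits are the mantissa fraction.
    """
    if len(raw) != 6:
        raise ValueError("a Turbo Pascal real is exactly 6 bytes")
    exponent = raw[0]
    if exponent == 0:
        return 0.0                      # zero exponent encodes 0.0
    field = int.from_bytes(raw[1:6], "little")
    sign = -1.0 if field & (1 << 39) else 1.0
    fraction = (field & ((1 << 39) - 1)) / (1 << 39)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - 129)


print(real48_to_float(bytes([0x81, 0, 0, 0, 0, 0])))
```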

I highly recommend this product to the following classes of PC users:

PC Support Staff
        If you write batch files as part of your job, do yourself a favor
and take the time to learn this package. TAWK provides a very powerful means
of manipulating files and directories.

Electronic Bulletin Board System Operators
        If you're a BBS sysop, you probably have a plethora of text-based
information that your users access on a regular basis. TAWK is just the
language for you! Although it doesn't readily support modem I/O, this can be
accomplished via port I/O, external C routines, or an interrupt interface to
a FOSSIL driver.

The C/C++, Pascal, or BASIC Programmer
        Some facets of TAWK will be more natural to C programmers, since the
syntax of the language was derived from C. I believe that programmers using
traditional languages will be pleasantly surprised at the level of
productivity and creative freedom they can attain by using this product.

Anyone with "older" PC Equipment
        The automatic usage of EMS/XMS/Disk-swapping makes this an
attractive language for those older computers out there.

        The traditional rapid-development philosophy of AWK is maintained in
this product. However, it is enhanced with features usually found in
contemporary programming languages. The seamless cohesion of these two
sometimes-mutually-exclusive philosophies certainly makes this a product
worth looking into. The price of the single-user-licensed TAWK compiler ( at
the time of this review) is $149.00. Please contact Thompson Automation
Software for other pricing information.

                        Thompson Automation Software
                             5616 SW Jefferson
                          Portland, OR 97221-2597
                            Phone: 503-224-1639
                            FAX:   503-224-3230

     Send your name, address, city, state, and zip to 21prod@supportu.com
         for product literature to be sent to you via postal mail.
