Date: 7/11/92 15:00 EDT From: Charles Lasner (lasner@watsun.cc.columbia.edu) To: pdp8-lovers@ai.mit.edu Subj: Announcement of new Kermit-12 utilities Currently available at watsun.cc.columbia.edu via ftp in /pub/ftp/kermit/d/k12*.* are two new files: k12enc.pal and k12dec.pal. The new files have internal dates of 8-Jul-92. Several other documentation files with similar dates are also available. Anyone who last retrieved Kermit-12 files prior to Feb, 1992 should retrieve much of the current collection which has changed since then. These are the newly updated ENCODE and DECODE programs for moving around OS/8 binary files in a reliable manner. There is several new feature added to both: the ability to deal with "raw" image data sent as a single file, and also sent split into two files. Consider the following usage: .R ENCODE *BIGGY.EN80.) All files are now 80 or less. Except for the .DOC file, all it took was a little "cosmetic surgery" on a few lines. FTP'd copies are mostly unaffected. Most of the problems have to do with interpretation of the inter-page FF character being treated as the first character of the "record" in this non-stream-oriented system. At this time there is no actual doc file, as the file K12MIT.DOC is merely a truncation of the listing of K12MIT.PAL as passed through PAL8 and CREF. Anyone with a system big enough to support a 200K+ long source file can create this file themselves. In addition, due to certain quirks within PAL8 and CREF "beating" against unix line conventions, the file K12MIT.DOC at watsun.cc.columbia.edu was slightly different from the precise output of the assembly process, but again, only a cosmetic change. Since this file greatly exceeded the KERMSRV restriction, it has been withdrawn in favor of the source fragment equivalent to it taken directly from K12MIT.PAL. This source fragment is short enough that even an RX01-based OS/8 system can create the listing file from it thus recreating the original K12MIT.DOC locally. All this will disappear in the future when a "proper" doc file appears. In the meantime, K12MIT.DOC in whatever form it is available contains hardware hints and kinks, assembly options, and other info useful to users and anyone interested in the "innards" of the program, as well as an edit history of how K12MIT got to be where it is now starting from its "grandfather" K08MIT. It ends at the first line of the code in K12MIT.PAL, but includes all of the special purpose definitions particular to the various devices supported, such as DECmate I, DECmate II, etc. Any changes to customize KERMIT-12 are still accomplished using the separate patch file K12PCH.PAL which is unchanged. New files cover two areas: 1) direct loading without KERMIT-12, and 2) .BOO format support. 1) Many users have the hardware for running KERMIT-12, but don't already have it or another suitable program to acquire it yet, a real "catch-22" situation. Towards that end, a set of utilities has been provided to directly load KERMIT-12 without already having it. Most PDP-8 sites do have access to some other machine. Hopefully, the serial connection to be used is fairly "clean" and error-free, or at least some of the time. These programs depend on this fact. This could either be a connection to a remote multi-user system or something like a null-modem connection to a nearby IBM-PC. The programs assume only a few things: a) The connection is error free. b) The other end doesn't absolutely require anything be sent to it to send data to the PDP-8 end. (The -8 end will not send ^S/^Q or anything like that because this is unnecessary; all data goes only into PDP-8 memory directly.) c) The other end will send the data at a time controlled from its end, or after at most one character sent from the PDP-8 end of the link. The first situation is illustrated by the example of a PC connected to the -8. The -8 program is started, and it waits indefinitely after the -8 user presses any one key. (The corresponding character is sent to the PC where it is ignored.) The PC end is initiated with a command such as COPY K12FL0.IPL AUX: and the data goes to the -8. The second situation is illustrated by a remote system where a command such as TYPE K12FL0.IPL is available. The delimiting CR is not typed at this time, and will be finished later by the loading program. The initial connection up until the TYPE command is not covered by the loading program itself, so the user must supply a basic comm program, which is possible to accomplish in about 10 words or less if the rates are "favorable", or worst-case, a terminal can be used and the line switched over to the -8 at the appropriate time. In any case, CR or other appropriate character is hit on the -8 and the loading program echoes it down the line (and on the console) to initiate the data down-load. d) The other end is assumed to send the file verbatim without insertion of characters (octal 177) and upper-case/lower-case is preserved. If all of these assumptions are met, then the down-load accomplishs a partial acquisition of K12MIT.SV, the primary binary file of KERMIT-12. The process must be repeated several times to acquire all portions. If a local compare utility is available that can compare absolute binary files, perhaps the process can be totally repeated to assure reliable results by comparing runs. The method used is borrowed from the field-service use of a medium-speed serial port reader on the -8 for diagnostic read-in. This reader is *almost* compatible with the device 01 reader such as the PC8E. The difference is that the *real* PC8E is fully asynchronous, whereas the portable reader just spews out the characters without any protocol. The PC8E can't drop any characters in theory, although there are reports of misadjusted readers that drop characters at certain crucial data rates. (The PC8E runs at full speed if possible, and failing this falls back to a much slower speed. All operations depend on the use of the hardware handshakes of the IOTs etc., so nothing should be lost but throughput. Misadjusted readers may drop characters when switching over to the slower mode.) The reason the field reader is acceptable is that it is used only to load diagnostics directly into memory using the RIM and BIN loaders. These minimal applications can't possibly fall behind the reader running at full speed. This is the same principle used here to down-load KERMIT-12. The loading program is a 46 word long program suitable to be toggled into ODT and saved as a small core-image program. The user starts the program and then (at the appropriate time) presses one key (usually CR if it matters) and the loader waits for remote input. As the other end sends the data, it is directly loaded into memory. There is a leader/trailer convention, just like paper-tape binary, so at end-of-load the program exits to OS/8 at 07600. At this time the user issues a SAVE command. This completes the down-load of a single field of K12MIT.SV. At the current time, there are actually two fields of K12MIT.SV, namely 00000-07577 and 10000-17577, and there are two such loaders. There is no check for proper field, so the proper loader must be used with the proper data, else the fields will get cross-loaded and will certainly fail. Once the two fields are obtained as separate .SV files (named FIELD0.SV and FIELD1.SV) they can be combined using ABSLDR.SV with the /I switch (image mode) set. The resultant can be saved as K12MIT.SV. This, if all went well, is identical in every way to the distributed K12MIT.SV (which is only distributed in encoded form; see below). Actual file differences will only exist in the extraneous portions of the file representing the header block past all useful information and the artifacts of loading which represent 07600-07777 and 17600-17777 which are not used. This is the normal case for any OS/8 system when any file is saved. Merely saving an image twice will cause this to happen. At this point, K12MIT.SV can be used as intended, namely to acquire, via KERMIT protocol, the entire release. It is recommended that the provisional copy of K12MIT.SV be abandoned as soon as the encoded copy is decoded since the encoding process provides some assurances of valid data (using checksumming, etc.). This process can be accomplished on any KL-style -8 interface including PT08, etc., or on the printer port of VT-78 and all DECmates. When used on the DECmates, there may be some minor problems associated with the down-load which may have to be done as the first use of the printer port after power-on, or some other restriction. The loader includes a suggested instruction for DECmate use if problematic (and raises the program length to 47 words). Also, due to observed bugs in the operating system (OS/278 only), there are restrictions on the use of ABSLDR.SV that cause certain command forms to fail while other seemingly equivalent forms succeed! This is documented in the latest K12MIT.BWR file in the distribution. The command form stated in the K12IPL.PAL file is the only known form that works correctly on these flawed systems. The format for down-load files is known as .IPL or Initial Program Load format. It consists of a leader containing only lower-case letters (code 141-177 only) followed by "printable" data in the range 041 (!) through 140 (`). Each of the characters represents six bits of data, to be read left to right as pairs, which load into PDP-8 12-bit memory. The implied loading address is always to start at 0000 of the implied field. The leader comment contains documentation of which field of data from K12MIT.SV it is. The trailer consists of one lower-case character followed by anything at all. This is why it is crucial that DEL (177) not appear anywhere in the body of the file. Throughout the file, all codes 040 or less are ignored. This allows for spaces in the lower-case leader for better readability, and for CR/LF throughout the entire file. CR/LF is added every 32 words (64 characters) to satisfy certain other systems' requirements. The trailer contains documentation on a suggested SAVE command for the particular data just obtained. 2) PDP-8 ENCODE format is the format of choice to obtain binary OS/8 image files because of the validation techniques employed, etc. This is the standard method of distributing K12MIT.SV as well as other "critical" files such as TECO macros and other image files. In the MS-DOS world there exists another very popular format known as .BOO encoding. It would be useful to support this format on the PDP-8 as well. .BOO format files are smaller because they use six-bit encoding instead of five-bit encoding, or at least in theory. Both ENCODE and .BOO use repeat compression techniques, but ENCODE can compress 12-bit words of any value, while .BOO only compresses zeroes and that itself is based on a byte-order view of the data. PDP-8 programs often include large regions of non-zero words such as 7402 (HLT) which would not compress when looked at as bytes. Such files would show compression ratios quite different from the norm. In any case, .BOO format is useful on the PDP-8 because it allows inter-change with .BOO files created on other systems, such as PCs. This allows the exchange of unusually formatted files, such as TECO macros between PDP-8s and PCs. (Both systems support a viable version of TECO.) The new KERMIT-12 utilities include a .BOO encoder and .BOO decoder, known as K12ENB.PAL (or ENBOO.PAL) and K12DEB.PAL (or DEBOO.PAL) respectively. They use .BOO encoded files unpacked in the standard OS/8 "3 for 2" order to preserve the original byte contents when the files originate from other systems. (Technically, .BOO format doesn't require this, but the obvious advantages dictate it. Anything encoded into .BOO format must merely have a 24-bit data structure encoded into four six-bit characters, so in theory any encoding of two adjacent PDP-8 12-bit words would be acceptable. By additionally supplying the bits in OS/8 pack/unpack order guarantees the inter-system compatibility as well.) There is an inherent weakness in the original .BOO format which must be addressed. .BOO format files always end on one of two data fields: either a repeat-zero compression field, or on a 24-bit field expressed as four characters. Should the data in a 24-bit field consist of only two or even one bytes, there are one or two extraneous null bytes encoded into the field to complete it. Presumably the need to add the extra bytes is to allow validation of the format. In any case, only the encoder knows just how many (0, 1, 2) bytes are extraneous. We can presume that if the last byte is non-zero, it is significant. If the last two are both zero, then the last or possibly both are extraneous with no way to tell. On PC systems, the general trend is to ignore these one or two extra bytes because so far there haven't been any complaints of failure. I have personally discovered that a widely used PC .BOO encoding program (written in C) erroneously adds two null bytes as a short compression field beyond the data! This is not a .BOO format issue, but rather a genuine program bug. Apparently few PC users are concerned that encoding their files prevents transparent delivery to the other end. In the OS/8 world, the situation is quite different. Each OS/8 record is 256 words or 384 bytes. If even a single byte is added, this creates an additional all-zeroes record. Besides wasting space, it is conceivable that such a file could be dangerous to use under OS/8 depending on content. (Certain files, such as .HN files are partially identified by their length. File damage, such as lengthening a file from two to three records will confuse the SET utility, etc.) Many files cannot be identified as having been artifically lengthened (and may be hard to shorten!), so this must be avoided. I have invented a fix for the problem: repeat compression fields are expressed as ~ followed by a count. 2 means two null bytes and is thus the smallest "useful" field to be found. (It takes two characters to express what would take 2-2/3 characters in encoded format. One null would only take 1-1/3 characters, not two, so this case is vestigial, but must be supported for the benefit of brain-dead encoders.) The value of 0 means a count of literally zero, thus ~0 is a "NOP" to a decoder. I have successfully tested MS-DOS programs written in BASIC and C that decode .BOO files successfully even if ~0 is appended to the end with no ill effects. (They correctly ignored the appended fields.) In my encoding scheme, ~0 at the end of a data field containing trailing zeroes means to "take back" a null byte. ~0~0 means to take back two null bytes. Thus files encoded with ENBOO.PAL either end in a repeat-compression field as before, or in a data encoding field possibly followed by ~0 or ~0~0 if necessary. The corresponding DEBOO.PAL correctly decodes such files perfectly. Should files encoded with ENBOO reach "foreign" systems, they will do what they always do, i.e., make files one or two bytes too long occasionally, with no other ill effects. Files originating from such systems will certainly be lacking any trailing correction fields and will cause DEBOO to perform as foolishly as MSBPCT. Extraneous null bytes will appear at the end of the file in OS/8 just as in MS-DOS in this case. (Note that if the file length is not a multiple of 384 bytes, additional bytes are added by DEBOO as well, but this is not a design weakness of .BOO format. It is caused by the clash of fixed record size and a variable size format.) Hopefully, files originating on OS/8 will be decoded on OS/8 as well, thus preserving file lengths. Most "foreign" files will probably be ASCII, so the ^Z convention will allow removal of trailing null bytes at either end. It is hoped that MS-DOS and other systems "upgrade" their .BOO format files to be compatible with the PDP-8 version. All KERMIT-12 files are available via the normal distribution "paths" of anonymous FTP and/or KERMSRV. The user is directed to the file /ftp/pub/kermit/d/k12mit.dsk as a "roadmap" to the entire distribution. Each .PAL file includes assembly instructions. Most use non-default option switches and non-default loading and saving instructions, so each must be carefully read. The development support files (TECO macro, .IPL generator, recent copies of PAL8, CREF, etc.) are included in the total collection. Development is not possible on RX01 systems due to inadequate disk space, but RX02's are barely adequate with a lot of disk exchanges. (Future versions may require larger disks for development.) Charles Lasner (lasner@watsun.cc.columbia.edu)