THE C-INTERCAL SUPPLEMENTAL REFERENCE MANUAL Version 0.7 by Louis Howell, 12/16/91 Version 0.8 info by Eric S. Raymond, 01/18/92 1. CONTRADICTION (with apologies to A. A. Milne) In which we attempt to justify the existence of this file to you, the reader, who since you are reading the file have probably already accepted, at least provisionally, its right to exist. 1.1 JUSTIFICATION Verily it is written, "When a program is useful it must be changed, when it is useless it must be documented." The intention is that, unlike the original manual, this file should not be cast in stone, but rather should change to reflect later additions or modifications to the language. Just keep it neat and accurate, ok? 2. CONSTRUCTS FROM INTERCAL-72 In which we discuss the detailed behavior of some features that have either changed from the original version of the language, or were not completely specified in the original manual. 2.1 SELECT The original manual defines the return type of a SELECT operation to depend on the number of bits SELECTed. The present compiler takes the easier route of defining the return type to be that of the right operand, independent of its actual value. This form has the advantage that all types can be determined at compile time. Putting in run time type checking would add significant overhead and complication, to effect a very minor change in language semantics. The only time this distinction makes any difference is when a unary operator is applied to the SELECT result. This happens extremely rarely in practice, the only known instance being the 32-bit greater-than test in the standard library, where an XOR operator is applied to the result of SELECTing a number against itself. The authors first SELECT the result against #65535$#65535 to insure that XOR sees a 32-bit value. With the current compiler this extra step is unnecessary, but harmless. The cautious programmer should write code that does not depend on the compiler version being used. We therefore suggest the following guideline for determining the SELECT return type: A SELECT operation with a 16-bit right operand returns a 16-bit value. The return type of a SELECT operation with a 32-bit right operand is undefined, but is guaranteed to be an acceptable input to a MINGLE operation so long as 16 or fewer bits are actually selected. Correct code should not depend on whether the return type is 16 or 32 bits. 2.2 IGNORE Though the manual states that the value of an IGNOREd variable cannot change, it is unclear about whether or not a statement which appears to change an IGNOREd variable is executed or not. This may appear to be a "If a tree falls in the forest..." type of question, but if the statement in question has other side effects it is not. Since another mechanism already exists for ABSTAINing from a statement, we suggest that IGNORE only prevent the changing of the specific variable in question, not the execution of the entire statement. In the present version of the language this only makes a difference for the WRITE IN statement. Attempting to WRITE IN to an IGNOREd variable will cause a number to be read from the input, which will be discarded since it cannot be stored in the variable. Various proposals to make assignment an operator, unlike the '=' operator in C, would depend more heavily on this condition. It's not clear whether such a feature is worth adding to the language, but if it is it should be done right. 3. COME FROM In which we try to precisely define a statement that should never have been born, but is nevertheless one of the more useful statements in INTERCAL. 3.1 BACKGROUND The earliest known description of the COME FROM statement in the computing literature is in [R. L. Clark, "A linguistic contribution to GOTO-less programming," Commun. ACM 27 (1984), pp. 349--350], part of the famous April Fools issue of CACM. The subsequent rush by language designers to include the statement in their languages was underwhelming, one might even say nonexistent. Eric Raymond therefore decided that COME FROM would be an appropriate addition to INTERCAL, and proceeded to implement it in his C-INTERCAL compiler. That initial implementation included support not only for COME FROM itself, but also for such related forms as DO ABSTAIN FROM COMING FROM. It had some bugs, however, was not precisely documented, and was not used by any significant piece of INTERCAL source code. In the present distribution the bugs are (hopefully) fixed, the statement has been used in a couple of programs, and the following section attempts to precisely define what the whole thing means. 3.2 DESCRIPTION There are two useful ways to visualize the action of the COME FROM statement. The simpler is to see that it acts like a GOTO when the program is traced backwards in time. More precisely, the statements (1) DO . . . (2) DO COME FROM (1) should be thought of as being equivalent to (1) DO (2) DO GOTO (3) . . . (3) DO NOTHING if INTERCAL actually had a GOTO statement at all, which of course it doesn't. What this boils down to is that the statement DO COME FROM (label), anywhere in the program, places a kind of invisible trap door immediately after statement (label). Execution or abstention of that statement is immediately followed by an unconditional jump to the COME FROM, unless the (label)ed statement is an executed NEXT, in which case the jump occurs if the program attempts to RESUME back to that NEXT statement. It is an error for more than one COME FROM to refer to the same (label). Modification of the target statement by ABSTAIN or by the % qualifier affects only that statement, not the subsequent jump. Such modifications to the COME FROM itself, however, do affect the jump. Encountering the COME FROM statement itself, rather than its target, has no effect. In the current C-INTERCAL implementation, the compiler places a simple label at the location of the COME FROM in the program, while all of the machinery for checking for abstention and conditional execution and actually performing the jump is placed immediately after the code for the target statement. 4. OUTSIDE COMMUNICATION In which we try to remedy the fact that, due to I/O limitations, INTERCAL can not even in principle perform the same tasks as other languages. It is hoped that this addition will permit INTERCAL users to waste vast quantities of computer time well into the 21st century. 4.1 MOTIVATION One of the goals of INTERCAL was to provide a language which, though different from all other languages, is nevertheless theoretically capable of all the same tasks. The original version failed to accomplish this because its I/O functions could not handle arbitrary streams of bits, or even arbitrary sequences of characters. A language which can't even send its input directly to its output can hardly be considered as capable as other languages. 4.2 TURING TEXT MODEL To remedy this problem, character I/O is now provided in a form based on the "Turing Text" model, originally proposed by Jon Blow. The INTERCAL programmer can access this capability by placing a one- dimensional array in the list of items given to a WRITE IN or READ OUT statement. On execution of the statement, the elements of the array will, from first to last, be either loaded from the input or sent to the output, as appropriate, in the manner described below. There is currently no support for I/O involving higher-dimensional arrays, but some form of graphics might be a possible 2-D interpretation. The heart of the Turing Text model is the idea of a continuous loop of tape containing, in order, all the characters in the machine's character set. When a character is received by the input routine, the tape is advanced the appropriate number of spaces to bring that character under the tape head, and the number of spaces the tape was moved is the number that is actually seen by the INTERCAL program. Another way to say this is that the number placed in an INTERCAL array is the difference between the character just received and the previous character, modulo the number of characters in the machine character set. Output works in just the opposite fashion, except that the characters being output come from the other side of the tape. From this position the characters on the tape appear to be in reverse order, and are individually backwards as well. (We would show you what it looks like, but we don't have a font with backwards letters available. Use your imagination.) The effect is that a number is taken out of an INTERCAL array, subtracted from the last character output--- i.e., the result of the last subtraction---and then sent on down the output channel. The only catch is that the character as seen by the INTERCAL program is the mirror-image of the character as seen by the machine and the user. The bits of the character are therefore taken in reverse order as it is sent to the output. Note that this bit reversal affects only the character seen by the outside world; it does not affect the character stored internally by the program, from which the next output number will be subtracted. All subtractions are done modulo the number of characters in the character set. Two different tapes are used for input and for output to allow for future expansion of the language to include multiple input and output channels. Both tapes start at character 0 when a program begins execution. On input, when an end of file marker is reached the number placed in the array is one greater than the highest- numbered character on the tape. 4.3 EXAMPLE PROGRAM If all this seems terribly complicated, it should be made perfectly clear by the following example program, which simply maps its input to its output (like a simplified UN*X "cat"). It assumes that characters are 8 bits long, but that's fine since the current version of the compiler does too. Standard library routines for addition and subtraction are not included here, as they are listed in the original manual. DO ,1 <- #1 DO .4 <- #0 DO .5 <- #0 DO COME FROM (30) DO WRITE IN ,1 DO .1 <- ,1SUB#1 DO (10) NEXT PLEASE GIVE UP (20) PLEASE RESUME '?.1$#256'~'#256$#256' (10) DO (20) NEXT DO FORGET #1 DO .2 <- .4 DO (1000) NEXT DO .4 <- .3~#255 DO .3 <- !3~#15'$!3~#240' DO .3 <- !3~#15'$!3~#240' DO .2 <- !3~#15'$!3~#240' DO .1 <- .5 DO (1010) NEXT DO .5 <- .2 DO ,1SUB#1 <- .3 (30) PLEASE READ OUT ,1 For each number received in the input array, the program first tests the #256 bit to see if the end of file has been reached. If not, the previous input character is subtracted off to obtain the current input character. Then the order of the bits is reversed to find out what character should be sent to the output, and the result is subtracted from the last character sent. Finally, the difference is placed in an array and given to a READ OUT statement. See? We told you it was simple! 5. TriINTERCAL In which it is revealed that bitwise operations are too ordinary for hard-core INTERCAL programmers, and extensions to other bases are discussed. These are not, strictly speaking, extensions to INTERCAL itself, but rather new dialects sharing most of the features of the parent language. 5.1 MOTIVATION INTERCAL is really a pretty sissy language. It tries hard to be different, but when you get right down to its roots, what do you find? You find bits, that's what. Plain old ones and zeroes, in groups of 16 and 32, just like every other language you've ever heard of. And what operations can you perform on these bits? The INTERCAL operators may arrange and permute them in weird and wonderful ways, but at the bit level the operators are the same AND, OR and XOR you've seen countless times before. Once the prospective INTERCAL programmer masters the unusual syntax, she finds herself working with the familiar Boolean operators on perfectly ordinary unsigned integer words. Even the constants she uses are familiar. After all, who would not immediately recognize #65535 and #32768? It may take a just a moment more to figure out #65280, and #21845 and #43690 could be puzzles until she notices that they sum to #65535, but basically she's still on her home turf. The 16-bit limit on constants actually works in the programmer's favor by insuring that very long anonymous constants can not appear in INTERCAL programs. And this is in a language that is supposed to be different from any other! 5.2 ABANDON ALL HOPE... Standard INTERCAL is based on variables consisting of ordinary bits and familiar Boolean operations on those bits. In pursuit of uniqueness, it seems appropriate to provide a new dialect, otherwise identical to INTERCAL, which instead uses variables consisting of trits, i.e. ternary digits, and operators based on tritwise logical operations. This is intended to be a separate dialect, rather than an extension to INTERCAL itself, for a number of reasons. Doing it this way avoids word-length conflicts, does not spoil the elegance of the Spartan INTERCAL operator set, and dodges the objections of those who might feel it too great an alteration to the original language. Primarily, though, giving INTERCAL programmers the ability to switch numeric base at will amounts to excessive functionality. So much better that a programmer choose a base at the outset and then be forced to stick with it for the remainder of the program. 5.3 COMPILER OPERATION The same compiler, ick, supports both INTERCAL and TriINTERCAL. This has the advantage that future bug fixes and additions to the language not related to arithmetic immediately apply to both versions. The compiler recognizes INTERCAL source files by the extension '.i', and TriINTERCAL source files by the extension '.3i'. It's as simple as that. There is no way to mix INTERCAL and TriINTERCAL source in the same program, and it is not always possible to determine which dialect a program is written in just by looking at the source code. 5.4 DATA TYPES The two TriINTERCAL data types are 10-trit unsigned integers and 20-trit unsigned integers. All INTERCAL syntax for distinguishing data types is ported to these new types in the obvious way. Small words may contain numbers from #0 to #59048, large words may contain numbers from #0$#0 to #59048$#59048. Errors are signaled for constants greater than #59048 and for attempts to WRITE IN numbers too large for a given variable or array element to hold. Note that though TriINTERCAL considers all numbers to be unsigned, nothing prevents the programmer from implementing arithmetic operations that treat their operands as signed. Three's complement is one obvious choice, but balanced ternary notation is also a possibility. This latter is a very pretty and symmetrical system in which all 2 trits are treated as if they had the value -1. 5.5 OPERATORS The TriINTERCAL operators are designed to inherit the relevant properties of the standard INTERCAL operators, so that both can be considered as merely different aspects of the same Platonic ideal. (Not that the word "ideal" is ever particularly relevant when used in connection with INTERCAL.) 5.5.1 BINARY OPERATORS I The binary operators carry over from the original language with only minor changes. The MINGLE operator ($) creates a 20-trit word by alternating trits from its two 10-trit operands. The SELECT operator (~) is a little more complicated, since the ternary tritmask may contain 0, 1, and 2 trits. If we observe that the SELECT operation on binary operands amounts to a bitwise AND and some rearrangement of bits, it seems appropriate to base the SELECT for ternary operands on a tritwise AND in the analogous fashion. We therefore postpone the definition of SELECT until we know what a tritwise AND looks like. 5.5.2 UNARY OPERATORS The unary operators in INTERCAL are all derived from the familiar Boolean operations on single bits. To extend these operations to trits, we first ask ourselves what the important properties of these operations are that we wish to be preserved, then design the tritwise operators so that they behave in a similar fashion. 5.5.2.1 UNARY LOGICAL OPERATORS Let's start with AND and OR. To begin with, these can be considered "choice" or "preference" operators, as they always return one of their operands. AND can be described as wanting to return 0, but returning 1 if it is given no other choice, i.e., if both operands are 1. Similarly, OR wants to return 1 but returns 0 if that is its only choice. From this it is immediately apparent that each operator has an identity element that "always loses", and a dominator element that "always wins". AND and OR are commutative and associative, and each distributes over the other. They are also symmetric with each other, in the sense that AND looks like OR and OR looks like AND when the roles of 0 and 1 are interchanged (De Morgan's Laws). This symmetry property seems to be a key element to the idea that these are logical, rather than arithmetic, operators. In a three-valued logic we would similarly expect a three- way symmetry among the three values 0, 1 and 2 and the three operators AND, OR and (of course) BUT. The following tritwise operations have all the desired properties: OR returns the greater of its two operands. That is, it returns 2 if it can get it, else it tries to return 1, and it returns 0 only if both operands are 0. AND wants to return 0, will return 2 if it can't get 0, and returns 1 only if forced. BUT wants 1, will take 0, and tries to avoid 2. The equivalents to De Morgan's Laws apply to rotations of the three elements, e.g., 0 -> 1, 1 -> 2, 2 -> 0. Each operator distributes over exactly one other operator, so the property "X distributes over Y" is not transitive. The question of which way this distributivity ring goes around is left as an exercise for the student. In TriINTERCAL programs the '@' (whirlpool) symbol denotes the unary tritwise BUT operation. You can think of the whirlpool as drawing values preferentially towards the central value 1. Alternatively, you can think of it as drawing your soul and your sanity inexorably down... On the other hand, maybe it's best you NOT think of it that way. A few comments about how these operators can be used. OR acts like a tritwise maximum operation. AND can be used with tritmasks. 0's in a mask wipe out the corresponding elements in the other operand, while 1's let the corresponding elements pass through unchanged. 2's in a mask consolidate the values of nonzero elements, as both 1's and 2's in the other operand yield 2's in the output. BUT can be used to create "partial tritmasks". 0's in a mask let BUT eliminate 2's from the other operand while leaving other values unchanged. Of course, the symmetry property guarantees that the operators don't really behave differently from each other in any fundamental way; the apparent differences come from the intuitive view that a 0 trit is "not set" while a 1 or 2 trit is "set". 5.5.2.1.1 BINARY OPERATORS II At this point we can define SELECT, since we now know what the tritwise AND looks like. SELECT takes the binary tritwise AND of its two operands. It shifts all the trits of the result corresponding to 2's in the right operand over to the right (low) end of the result, then follows them with all the output trits corresponding to 1's in the right operand. Trits corresponding to 0's in the right operand, which are all 0 anyway, occupy the remaining space at the left end of the output word. Both 10-trit and 20-trit operands are accepted, and are padded with zeroes on the left if necessary. The output type is determined the same way as in standard INTERCAL. 5.5.2.2 UNARY ARITHMETIC OPERATORS Now that we've got all that settled, what about XOR? This is easily the most-useful of the three unary INTERCAL operators, because it combines in one package the operations ADD WITHOUT CARRY, SUBTRACT WITHOUT BORROW, BITWISE NOT-EQUAL, and BITWISE NOT. In TriINTERCAL we can't have all of these in the same operator, since addition and subtraction are no longer the same thing. The solution is to split the XOR concept into two operators. The ADD WITHOUT CARRY operation is represented by the new '^' (sharkfin) symbol, while the old '?' symbol represents SUBTRACT WITHOUT BORROW. The reason for this choice is so that '?' will also represent the TRITWISE NOT-EQUAL operation. Note that '?', unlike the other four unary operators, is not symmetrical. It should be thought of as rotating its operand one trit to the right (with wraparound) and then subtracting off the trits of the original number. These subtractions are done without borrowing, i.e., trit-by-trit modulo 3. 5.5.3 EXAMPLES The TriINTERCAL operators really aren't all that bad once you get used to them. Let's look at a few examples to show how they can be used in practice. In all of these examples the input value is contained in the 10-trit variable .3. In INTERCAL, single-bit values often have to be converted from {0,1} to {1,2} for use in RESUME statements. Examples of how to do this appear in the original manual. In TriINTERCAL the expression "^.3$#1"~#1 sends 0 -> 1 and 1 -> 2. If the 1-trit input value can take on any of its three possible states, however, we will also have to deal with the 2 case. The expression "V.3$#1"~#1 sends {0,1} -> 1 and 2 -> 2. To test if a trit is set, we can use "V'"&.3$#2"~#1'$#1"~#1, sending 0 -> 1 and {1,2} -> 2. To reverse the test we use "?'"&.3$#2"~#1'$#1"~#1, sending 0 -> 2 and {1,2} -> 1. Note that we have not been taking full advantage of the new SELECT operator. These last two expressions can be simplified into "V!3~#2'$#1"~#1 and "?!3~#2'$#1"~#1, which perform exactly the same mappings. Finally, if we need a 3-way test, we can use "@'"^.3$#7"~#4'$#2"~#10, which obviously sends 0 -> 1, 1 -> 2, and 2 -> 3. For an unrelated example, the expression "^.3$.3"~"#0$#29524" converts all of the 1-trits of .3 into 2's and all of the 2-trits into 1's. In balanced ternary, where 2-trits represent -1 values, this is the negation operation. 5.6 BEYOND TERNARY... While we're at it, we might as well extend this multiple bases business a little farther. The ick compiler actually recognizes filename suffixes of the form '.Ni', where N is any number from 2 to 7. 2 of course gives standard INTERCAL, while 3 gives TriINTERCAL. We cut off before 8 because octal notation is the smallest base used to facilitate human-to-machine communication, and this seems quite contrary to the basic principles behind INTERCAL. The small data types hold 16 bits, 10 trits, 8 quarts, 6 quints, 6 sexts, or 5 septs, and the large types are always twice this size. As for operators, '?' is always SUBTRACT WITHOUT BORROW, and '^' is always ADD WITHOUT CARRY. 'V' is the OR operation and always returns the max of its inputs. '&' is the AND operation, which chooses 0 if possible but otherwise returns the max of the inputs. '@' is BUT, which prefers 1, then 0, then the max of the remaining possibilities. Rather than add more special symbols forever, a numeric modifier may be placed directly before the '@' symbol to indicate the operation that prefers one of the digits not already represented. Thus in files ending in '.5i', the permitted unary operators are '?', '^', '&', '@', '2@', '3@', and 'V'. Use of such barbarisms as '0@' to represent '&' are not permitted, nor is the use of '@' or '^' in files with either of the extensions '.i' or '.2i'. Why not? You just can't, that's why. Don't ask so many questions. As a closing example, we note that in balanced quinary notation, where 3 means -2 and 4 means -1, the negation operation can be written as either DO .1 <- "^'"^.3$.3"~"#0$#3906"'$'"^.3$.3"~"#0$#3906"'"~"#0$#3906" or as DO .1 <- "^.3$.3"~"#0$#3906" DO .1 <- "^.1$.1"~"#0$#3906" These work because multiplication by -1 is the same as multiplication by 4, modulo 5. Now go beat your head up against the wall for a while. 6. UNDOCUMENTED FEATURES FROM INTERCAL-72 A feature of INTERCAL-72 not documented in the manual was that it required a certain level of politesse from the programmer. If fewer than 1/5th of the program statements included the PLEASE qualifier, the program would be rejected as insufficiently polite. If more than 1/3rd of them included PLEASE, the program would be rejected as excessively polite. This check has been implemented in C-INTERCAL. To assist programmers in coping with it, the intercal.el mode included with the distribution randomly expands "do " in entered source to "DO PLEASE" or "PLEASE DO" 1/4th of the time. 7. NEW ERROR MESSAGES IN C-INTERCAL The following error codes are new in C-INTERCAL. 111 You tried to use a C-INTERCAL extension with the `traditional' flag on. 222 Out of stash space. 333 Too many variables. 444 A COME FROM statement references a non-existent line label. 555 More than one COME FROM references the same label. 666 Too many source lines. 777 No such source file. 888 Can't open C output file 999 Can't open C skeleton file. 998 Source file name with invalid extension (use .i or .[3-7]i). 997 Illegal possession of a controlled unary operator. 7. END OF FILE Can't you read? Beat it! There's nothing left. Why don't you lie down and take a stress pill?