FURPHY - A PROJECT TO TESTBED CONCATENATIVE FUNCTIONAL LANGUAGE CONSTRUCTS WITH FORTH

This page is best viewed with a computer and a monitor

Support the Any Browser campaign and keep your freedom of choice on the Net.

Arachne WWW 
browser


Back to the home page.

Introduction

If you are not yet familiar with the computer concepts involved in Forth and functional languages, before going on you may like to find out more about Forth here or about functional languages here.

I have been working on some functional extensions of Forth, drawing on Euphoria and Joy as well as some other underlying stuff (thanks to Henry Baker et al). I would like to put up some of this material to get feedback (thanks to those who have already offered some).

Why call it "Furphy"? Because of its similar sound to Forth and Euphoria, and because it has a cultural significance in Australia. (You can find some of the background here, in the Water Carrier History part of the Furphy company site.)

What is Joy, and how is it related to Forth? It's a special kind of functional language, a "concatenative" language, and that means it engineers out many common features of functional languages, like closures and binding. It does this in a way very like the way Forth engineers out many of the complexities of imperative and procedural languages, so Joy has parallel evolved to Forth. This wasn't planned in, but people noticed it afterwards.

What is Euphoria, and how is it related to Forth? It's a simple procedural language that has borrowed heavily from work in functional languages, in particular to give it powerful data structuring features without great complexity. It turns out that the part that's in Joy that does data structuring has also parallel evolved to the similar part in Euphoria. So none of these three are really related, and there aren't any serious similarities between Forth and Euphoria, but putting some structuring into Forth based on Euphoria's tricks will give some functional building blocks for Forth. I hope to be able to use those with Forth's own power to get close to the functional stuff that Joy does.

Progress to date

For now, I am just testbedding what I want inside Forth (and I think it will be useful even within Forth). I plan on not collecting any garbage at all at first!

Latest work (20.8.06)

The language constructs are almost the same as with Forth, but with enough differences to provide "faux amis", false friends. That is, some things look the same but are enough different to mislead people who are already familiar with Forth. Here is a list of the main ones so far:-

There is a simple compile and go compiler, that is provided with a single finite string as source. In batch mode that can be an entire file that can end with code to save an executable, or in interactive mode it can end with a sentinel to obtain the next input or just return to outer code that does that. Here is a simplified outline of the logic of the compiler:-

All the work of the language is done by the keywords, even memoising if you provide that, starting off with a stock of keywords in the implementation just as Forth does. In fact, it would be possible to do everything without any immediate keywords - they are provided as syntactic sugar. Without them, any source would compile to something! That means that the work could theoretically be separated into a preprocessor and an even simpler compiler as a back end, but that would be less well integrated.

Forthers will notice that this does not provide for defining other things directly, such as constants or variables. As a functional language Furphy is less oriented to that sort of thing, but they can be provided with workarounds that put them inside wrapper code to give them names.

Apart from conditional code, most control flow is much like Forth. That is, each keyword encountered during execution is either low level code, embedded data with associated handlers, or calls on other Furphy code defined elsewhere. In each case it is evaluated (i.e. executed), using the stack to get parameters and pass results. Data can also be wrapped up in data structures, e.g. by a { } pair. This stack is distinct from the return stack which is used mainly for control flow, in particular return addresses. There are keywords like DROP and DUP to manipulate the stack, just as in Forth.

Conditional code is mostly achieved by IF and a complementary pair of things, freezing and thawing. This is a bit like using indirection in C to pass parameters by name instead of the default by value. The analogy here is that IF does eager evaluation, and freezing and thawing change the behaviour to lazy evaluation. IF has this behaviour, using Forth documenting standards: (  true/false_flag  true_option  false_option  ---  result  ).

FREEZE and THAW behave like an anonymous label (if that isn't a contradiction in terms - it really returns early and pushes a re-entry address to the stack) and a dynamic subroutine call, different from the Forth keyword EXECUTE that it resembles. Typically, you would put FREEZE at the beginning of each keyword that you might want to select with IF and then you would use THAW on whichever one got chosen, like this:-

FREEZE TRUE_STUFF TRUE_OPTION

FREEZE FALSE_STUFF FALSE_OPTION

GET_FLAG TRUE_OPTION FALSE_OPTION IF THAW DEMO

In keeping with many functional languages, TRUE and FALSE could be special selector keywords to select options. Then IF would be implemented as something that merely rolled the flag to the top and then thawed it. However for efficiency it makes more sense to stick with the Forth convention of zero for false and non-zero for true.

Remembering that >R and R> are (as in Forth) low level keywords for popping a value from the data stack and pushing it onto the return stack, and vice versa, then freezing and thawing can be implemented as secondaries by the following Furphy definitions:-

R> FREEZE

>R THAW

These work because >R and R> are low level code inside higher level code, and the compiler closes the definitions of this particular higher level code with returns. Of course, in a properly designed Furphy implementation FREEZE and THAW would be low level code themselves.

This is actually enough to achieve loops and recursion and so on, using a DUP THAW combination to pass the freeze address of a keyword that uses FREEZE as a parameter to itself. However there is syntactic sugar to make programming more convenient:-

The compiler optimising is simple in theory, but there are a lot of special cases to watch out for (for instance, what happens if you try to define both [] and FREEZE as secondaries?). The idea is to avoid unnecessary return stack build up. Basically, the compiler looks at the last thing it compiled. If that was a secondary, it patches that to be a jump into the beginning of the secondary. After that perhaps it compiles a sentinel no-op to help with the special cases. If the last thing compiled was not a secondary, the compiler compiles a return in the usual way. This works well enough if THAW is a secondary, but the compiler also has to patch THAW if that is in low level code (so you shouldn't have that during your early development - it will complicate the testing).

Earlier work (June, 2003)

As a taster, here's an early version of the extensions using Pygmy Forth, a "direct threaded code" or DTC Forth implementation which like most implementations is slightly non-standard (in particular, it uses PUSH and POP instead of >R and R>). This version of the extensions is enough for a proof of principle. I have finalised the alpha version of my main building block, which is the nestable pair {{ and }} that will do for Forth what { and } do in Euphoria (only, I don't bother with proper garbage collection). So for instance {{ 3 4 5 }} will set up a chunk of memory holding housekeeping information and then the bit pattern 3 4 5, and it will leave a pointer on the stack to the beginning of that bit pattern.

Basically {{ and }} give the stuff that provides the data structuring Euphoria gets from its nestable { } construct (but not yet its syntactic sugar for strings). However, you are even less limited than in Euphoria in what you can do between {{ and }} - for instance, stack space permitting you could have other keywords to generate a list of a million prime numbers if you wanted. {{EXPLODE}} is basically a reverse operation to these, which can be used to build up data structure manipulation (though that isn't the most efficient way to go).

I have also used those building blocks to build up [[ and ]] to give analogues of Joy's [ and ] that provide program code, using the Forth DEFER and IS to achieve the effect of Joy's naming construct ==. These will run in execution mode, just like everything else being built here so far - it's like using CREATE with ' and , to build up secondaries in Forth instead of using a compile mode integrated with the outer/text interpreter. IIF and ITERATE provide conditional and looping behaviour for program code.

The constructs [{ and }] do what {{ and }} did, as far as holding bit patterns goes, only they start with a header that makes descriptors they generate return themselves when executed - the sort of thing you would want if they were behaving functionally. Keywords handling [{ and }] will have to look inside with an offset. [{EXPLODE}] is an analogue of {{EXPLODE}}.

Source code so far (preformatted and in colour)


( This is the testbed prototype code for "Furphy", a set of functional )
( Forth extensions which should eventually produce sufficient feedback )
( for a dedicated implementation bringing together heritages from Forth, )
( Euphoria, Joy, and perhaps even a little Oberon in its implementation )
( approach. The name Furphy has been chosen because of its similar sound )
( to Forth and Euphoria, and because it has a cultural significance in )
( Australia. To help with documentation, this will work towards a )
( "literate programming" style; where convenient, keywords are being )
( given in capital letters to make them stand out. )

( This is alpha version 0.02 of 16.6.03 )

( This has been migrated to the Pygmy Forth platform. )

( First some utility keywords to build {{ and }} with )

2 CONSTANT CELLSIZE

VARIABLE {SP} 0 {SP} ! ( will become a sentinel value one day )

VARIABLE {RESULT} 0 {RESULT} ! ( will become a sentinel value one day )

( this is the poor man's analogue of malloc we are using for prototyping )
( although it doesn't yet have an analogue of free, it will set up a )
( count of the number of cells requested, i.e. excluding cells used for )
( housekeeping like this, which will allow stored data to be unpacked )
( onto the stack )

: {OLD-PROVIDE} ( n --- addr )

  HERE SWAP
  FOR 0 , NEXT ;

: {PROVIDE} ( n --- addr )

  DUP , {OLD-PROVIDE} ;

( This effectively reverses the action of the {{ }} pair )
: {{EXPLODE}} ( addr --- manyvalues )
  DUP CELLSIZE - @
  FOR
    DUP @ SWAP CELLSIZE +
  NEXT
  DROP ;

( Note that Pygmy Forth may keep the top of stack in a register. )
( SP@ still points to the topmost entry, and the stack grows downwards )
( with the usual cell size - so there is no difference there from F-PC. )

: {{ ( --- oldSP@ )

  {SP} @ SP@ {SP} ! ; ( housekeeping that allows nesting )

: }} ( manyvalues --- addr )

  SP@ {SP} @ SWAP - ( space used on stack between {{ and }} )
  CELLSIZE / ( number of items )

  DUP {PROVIDE}

  {RESULT} ! ( temporarily saving structure address for eventual result )
  ( using a variable because FOR NEXT is more limited than ?DO LOOP )

  ( at this point the TOS has the number of items to be moved into )
  ( the structure then the items themselves and the housekeeping for )
  ( {{ and }} )

  FOR I CELLSIZE * {RESULT} @ + ! NEXT

  {SP} ! ( housekeeping that allows nesting )

  {RESULT} @ ;

: {{CONCAT}} ( addr1 addr2 --- addr3 )
  PUSH PUSH ( need to have the stack tidy before doing {{ )
  {{ POP {{EXPLODE}} POP {{EXPLODE}} }} ;

( {{ 17 23 57 }} )

( DUP CR . ( test 1 OK )

( CR ? ( test 2 OK )

( {{ 17 23 {{ 99 98 97 96 95 }} 57 }} )

( DUP CR . ( test 3 OK )

( DUP CR ? ( test 4 OK )

( DUP CELLSIZE + CELLSIZE + @ ( getting at the substructure )

( DUP CELLSIZE + ( second entry within substructure )

( DUP CR . ( test 5 OK )

( CR ? ( test 6 OK )

( DUP CR . ( test 7 OK - substructure )

( CR ? ( test 8 OK - substructure )

( Now some utility keywords to build and test [[ and ]] with )

: [NO-OP] ; ( as a template, and to help with unwanted if branches )

( We will use the template and the magic number facts that headers take )
( three bytes while cells are only two )

' [NO-OP]

1 - ( pointing at a junk byte just before so as to get a whole cell )

DUP CONSTANT [NO-OP-ADDR] ( note, this is NOT executable )

DUP @ CONSTANT [HEAD1] ( will do the first part of the work of : )
CELLSIZE + @ CONSTANT [HEAD2] ( will do the second part of the work of : )

: [RETURNER] POP DROP ;
' [RETURNER] CONSTANT [TAIL] ( will do the work of ; )

( Start the new secondary off with some cells for its code )

: [[ {{ [HEAD1] [HEAD2] ; ( the second cell will need to be patched anyway )

: }] ( it is worth having this as a separate keyword for use elsewhere )

  }}

  ( this needs to patch the header for it to execute properly )

  [NO-OP-ADDR] OVER - ( work out the discrepancy - note the sign change )
  [HEAD2] + ( construct the value to patch in )
  OVER CELLSIZE + ! ( get the location to patch and patch it )

  1 + ; ( have to offset past that junk byte to get something executable )

: ]] [TAIL] }] ;

: [[TESTER]] ." Reached [[TESTER]] OK" CR ;

( [[ ' [[TESTER]] ]] CR EXECUTE ( test 9 OK )

: [LIT] POP DUP @ SWAP CELLSIZE + PUSH ;

: IIF ( flag trueresult falseresult --- result )
  ROT
  IF
    DROP
  ELSE
    SWAP DROP
  THEN ;

' IIF CONSTANT 'IIF ( syntactic sugar )

( 0 [[ ' [LIT] 15 ' [LIT] 22 'IIF ' . ' CR ]] ( test 10 OK )

( will become a sentinel value one day )
VARIABLE [ITER-SUCCESSOR] 0 [ITER-SUCCESSOR] !

VARIABLE [ITER-TEST] 0 [ITER-TEST] ! ( will become a sentinel value one day )

: ITERATE ( seed test successor --- result )

  ( local storage of higher order stuff so ITERATE can nest )
  ( local variables mean less stack thrashing )
  [ITER-SUCCESSOR] @ PUSH [ITER-SUCCESSOR] !
  [ITER-TEST] @ PUSH [ITER-TEST] !

  BEGIN
    ( CR ." Tested: " DUP . )
    DUP [ITER-TEST] @ EXECUTE
    ( ."  getting test result: " DUP . )
  WHILE
    [ITER-SUCCESSOR] @ EXECUTE
  REPEAT

  ( clear away and reset locals )
  POP [ITER-TEST] !
  POP [ITER-SUCCESSOR] ! ;

' ITERATE CONSTANT 'ITERATE ( syntactic sugar )

( Materials to test ITERATE. We have to use DEFER rather than an analogue )
( for Joy's == at the moment, though. )

DEFER [NOT-ONE-YET.]

[[ ' DUP ' . ' [LIT] 1 ' > ]]

IS [NOT-ONE-YET.] ( n --- flag )

( The inefficient test for even provides a nested test for ITERATE )

DEFER [EVEN]

[[ ' [LIT] [[ ' [LIT] 1 ' > ]]
   ' [LIT] [[ ' [LIT] 2 ' - ]]
   ' ITERATE
   ' [LIT] 0 ' = ]]

IS [EVEN] ( n --- flag )

DEFER [ITER]

[[ ' DUP 
   ' [EVEN] 
   ' [LIT] [[ ' [LIT] 2 ' / ]]
   ' [LIT] [[ ' [LIT] 3 ' * ' [LIT] 1 ' + ]]
   ' IIF
   ' EXECUTE ]]

IS [ITER] ( n --- n )

( This uses a fairly well known algorithm as a test base, )
( the one at the basis of the Collatz or Ulam conjecture )

DEFER [IT-TEST]

[[ ' [LIT] ' [NOT-ONE-YET.] ' [LIT] ' [ITER] ' ITERATE ' DROP ]]

IS [IT-TEST] ( n --- )

( [{ and }] will give a bit pattern storage construct that returns its )
( own descriptor reference when executed; it will piggy back on {{ }} )
( and might as well piggy back on [[ ]] too. )

( This will eventually be used to piggy back a string construct itself, so )
( it could work with a leading byte count and strings could be built from )
( double byte values on the stack that respect the platform's little )
( endianness. )

( As we now have enough extensions to boot [{ from, though untidily from )
( not yet having enough syntactic sugar, it is worth doing it that way to )
( get the self testing it will give. )

[[ ' POP ' CELLSIZE ' - ' [LIT] 3 ' - ]] ( --- n )

CONSTANT '[SELF-REF]

( doesn't do a regular return when executed )
( uses the same magic number facts about header sizes as [[ and ]] do )

DEFER [{

[[ ' [[ ' [LIT] '[SELF-REF] ]]

IS [{

[{ 23 51 77 }] CONSTANT [{TEST}] ( for test 11 )

( We need [{EXPLODE}] as an analogue for {{EXPLODE}} )
( unfortunately the most convenient way to piggyback the one on the other )
( is to use the magic number facts and ROLL - but Pygmy Forth doesn't )
( support ROLL so it will have to be implemented the high level recursive )
( way )

DEFER ROLL

: ROLL-ACTION
  DUP 0 >
  IF
    1 -
    SWAP PUSH
    ROLL
    POP SWAP
  ELSE
    DROP
  THEN ;

' ROLL-ACTION IS ROLL

( will become a sentinel value one day )
VARIABLE [{EXPL-SIZE}] 0 [{EXPL-SIZE}] !

: [{EXPLODE}] ( addr --- manyvalues )
  1 - ( magic number to get at underlying construct made by {{ and }} )
  DUP CELLSIZE - @ ( magic number to get at size of underlying construct )
  [{EXPL-SIZE}] !
  {{EXPLODE}}
  ( bring unwanted stuff to top of stack - so far, so inefficiently )
  [{EXPL-SIZE}] @ ROLL
  [{EXPL-SIZE}] @ ROLL
  [{EXPL-SIZE}] @ ROLL
  DROP DROP DROP ( throw away unwanted stuff ) ;

Possible future directions, as suggested by the earlier work

An analogue of Joy's naming construct == will come later, probably called FIAT. Also there will be something like "" and " as analogues of Euphoria's " and " (syntactic sugar to set up holding string text more easily). And a few other things. [{ and }] will probably be used to implement this string construct. Keywords handling [{ and }] will have to look inside with an offset, and I am considering using Oberon style type extension to implement that in an object oriented sort of way (this won't necessarily be visible to users of Furphy, though in keeping with the Forth spirit it might be).

In theory all atoms could be implemented this way, if the operations on them were reworked to look inside what they held, but I shan't do that. Instead, I shall make ordinary integers stay within ranges that can't be confused with things set up via the memory allocation or set up as keywords in the dictionary, and then I shall replace EXECUTE in these extensions with an intelligent variant that tests for regular integers and just leaves them as the result but executes the other stuff. I shall also use the test for integers in the compilation done by [[ and ]] and similar constructs, so that when an integer is spotted [LIT] will be placed in front of it in the parameters for the final compile that ]] does. Together, those changes will make straight integers work in the right way when "executed", i.e. return their values or compile stuff that returns their values when it runs. And maybe intelligent variants replacing DUP and so on can provide information for a garbage collector.

I suspect I might find that reference counting will be good enough for garbage collection here. I'm not going to rule it out, anyway. Later, I will rework the inner interpreter to have some kind of garbage collection bolted on. The cheapest way to achieve garbage collection will be "snarfing", i.e. working inside something else that already provides it. I know a couple of ways to go to get that, such as piggybacking on the Perl 6 virtual machine, Parrot.

The outer/text interpreter might change as well, so that [[ and ]] set up a sort of compile mode (maybe instead of, maybe as well as, what : and ; do now). Maybe I will use a different model of "calls" - not a call/return with a return stack, but a slower run time macro expansion onto an execution stack that consumes its contents. That will avoid stack depth problems that can come up with recursive functional stuff. This is not revolutionary, it's old hat. I mean just precisely, treat all "calls" as run time macro expansions and throw away each part of the expanded macro as it gets used. That way, each later time you hit a tail recursion the execution stack is just precisely as deep as it was any previous times - the extra depth gets cut back each time before hitting it.

It's not worth bothering with, mostly, because of inefficiency - but Henry Baker found that the inefficiency was surprisingly small when dealing with Forth style highly factored short words. So again, it will be worth exploring because of bypassing the tail recursion problem. Then again, maybe the "while" construct ITERATE will be so convenient that it won't be necessary to tail recurse very often, or maybe explicit optimisation will turn out convenient.

All that is subject to alteration, of course - it's just my current thinking. But it shows more of the way towards a Joy analogue that sits on top of a variant Forth, something functional without being a precise clone of Joy and with nearly all the optimisability of Forth.

Back to the home page.