The Point When Lists Break

It had occurred to me that at some point I was going to have to do something about complex lists. Lists work great when they hold one data type, like a list of numbers:

'(1 2 3 4 5)

Or even complex things like lists of procedures:

(list even? odd? (lambda (x) (equal? x 3)))

But at this point I needed to hold a collection of things that weren't that alike: specifically timestamps, metadata and lists of data. So it was time to go deeper into my journey with Racket and see if I could find a suitable language feature for my data. After a few minutes of reading I found structures (structs). They had pretty much exactly what I needed: I could define a custom datatype, composed of the simpler datatypes I already had, and pass it around my program as a single value.

Providing Structs

Structs are pretty easy to make:

(struct examined-row (id results))

This makes a struct called examined-row with two fields, id and results. So far so good. The detailed documentation for structs says:

A struct form with n fields defines up to 4+2n names

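So that struct form defines up to 8 names. For this immutable struct it's actually five (struct:examined-row, examined-row, examined-row?, examined-row-id and examined-row-results), because setters only appear for mutable fields and the constructor shares its name with the struct. It took me a moment to realise that when I wanted to provide the struct I was going to need to give access to these generated names. So less: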

(provide examined-row)

And more:

(provide examined-row examined-row-id examined-row-results)

Because I'm sharing the structs between my modules, I'm always providing them. And as you can see, the constructor of examined-row has the same name as the struct. So it wasn't until I couldn't get at the contents of the struct that I realised that, by the time I was providing things, the struct had morphed into its constituent pieces.
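To make those generated names concrete, here's a small sketch that uses each one for examined-row. The mutable-row struct is a hypothetical addition, only there to show the setters that #:mutable adds.

```racket
#lang racket
;; Sketch: the names an immutable two-field struct form defines.
(struct examined-row (id results))

(define row (examined-row 7 '(a b)))  ; constructor: same name as the struct
(examined-row? row)                   ; predicate => #t
(examined-row-id row)                 ; accessor => 7
(examined-row-results row)            ; accessor => '(a b)
(struct-type? struct:examined-row)    ; struct type descriptor => #t

;; Setters only appear for mutable fields. `mutable-row` is a hypothetical
;; struct added here just to show them.
(struct mutable-row (id results) #:mutable)
(define m (mutable-row 1 '()))
(set-mutable-row-id! m 2)             ; setter generated by #:mutable
(mutable-row-id m)                    ; => 2
```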

Update - 22/11/2021

After posting this on Racket Stories I got a really helpful message from soegaard2 on Slack 👋. They said I could use a form called struct-out in the provide. So I did, and it's very neat:

(provide (struct-out examined-row))

Much more like my original intent. Thanks again soegaard2.
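Here's a minimal sketch of struct-out across a module boundary, squeezed into one file. The rows submodule is a hypothetical stand-in for the module that defines the struct, and the enclosing module plays the part of a consumer.

```racket
#lang racket
;; The `rows` submodule stands in for the module that defines the struct.
(module rows racket
  ;; struct-out exports the constructor, predicate, accessors and struct type
  (provide (struct-out examined-row))
  (struct examined-row (id results)))

;; The enclosing module stands in for a module that uses the struct.
(require (submod "." rows))

;; These accessors are only visible here because struct-out exported them.
(define row (examined-row 42 '(ok ok failed)))
(examined-row-id row)       ; => 42
(examined-row-results row)  ; => '(ok ok failed)
```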

Initialising Structs

The second thing about structs that caught me out was that they require every field to have a value at the point of creation. Now, you could use the #:mutable keyword and fill fields in later. But I wanted to stay functional, and this is where I really felt the difference between a struct and, say, a hash. It mattered more as I started to build structs out of other structs.
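A small sketch of that difference, assuming the time-card struct from further down and some illustrative hash keys: the struct refuses to be built with a field missing, while the hash happily starts out partial but needs a guard (here a default of #f) when you read from it.

```racket
#lang racket
(struct time-card (start-time end-time))

;; A struct needs every field at creation: leaving one out is an arity error.
(define missing-field-rejected?
  (with-handlers ([exn:fail:contract:arity? (lambda (e) #t)])
    (time-card 100)
    #f))

;; A hash can be built up gradually, but reading it needs a guard
;; (here a default value) in case the key isn't there yet.
(define partial (hash 'start-time 100))
(define end-or-false (hash-ref partial 'end-time #f))   ; => #f
(define complete (hash-set partial 'end-time 160))      ; returns a new hash
(define known-end (hash-ref complete 'end-time #f))     ; => 160
```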

Side note: if you want to feel one of the major pain points of real programming, here it is. In any sufficiently long-lived or complex program you eventually get to the point where you are dragging large graphs of things around with you, either by reference or in memory. So when you're evaluating a book (or other resource) to learn a language or framework, it's often worth checking whether it talks specifically about complexity. It won't be there in every good book, but it is notably absent from every bad book.

A good example of where this requirement bites is timestamps. Say you had a struct like this:

(struct time-card (start-time end-time))

Your plan is to end up with something you can iterate over to do some time and date arithmetic. Because the fields have to be set at creation, you have two options. One: wait until you've done the work, then create the struct knowing both the start and end times. That works well for short-lived things; in fact it's exactly what I do with the examined-row example above. But it works less well when the work takes a long time (in computer terms). So instead you use your second option, which is to make a value that clearly expresses that nothing has happened yet. You can do that in various ways, but what I did for table-level statistics (where there may be millions if not billions of rows) was use the same time for start and end. That still lets me do arithmetic, and the arithmetic is clear: 0 seconds have passed between start and end.
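Here's a sketch of the placeholder idea. It assumes times are plain seconds from current-seconds rather than the (now) values used elsewhere in this post, and elapsed-seconds is a hypothetical helper.

```racket
#lang racket
;; Assumption: times are plain numbers of seconds from current-seconds.
(struct time-card (start-time end-time))

;; Hypothetical helper: seconds elapsed between start and end.
(define (elapsed-seconds card)
  (- (time-card-end-time card) (time-card-start-time card)))

(define started (current-seconds))
(define placeholder (time-card started started))  ; start = end: nothing happened yet
(elapsed-seconds placeholder)                      ; => 0, and the arithmetic is valid
```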

That does leave me with a conundrum. I'm saved from the hash issue of having to write guard code to ensure a particular key-value pair is present. But to get the right value once I've finished working, I'm going to need to either produce a new struct instance or mutate the existing one. The key helper here is struct-copy, which copies a struct along with its previously set values and lets you replace one or more of them with the values you now have. So you end up with:

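  (require gregor) ; assumption: (now) comes from the gregor date/time package
  (struct time-card (start-time end-time))
  (define initial-time (now))
  ;; Placeholder: start and end are equal, meaning no time has passed yet
  (define card (time-card initial-time initial-time))
  (do-something-that-takes-a-long-time) ; stands in for the real work
  (define finish-time (now))
  ;; finish-time is a value, not a procedure, so it isn't wrapped in parens
  (struct-copy time-card card [end-time finish-time])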

If this happens inside a function, you briefly have two structs when you're aiming at one. But the garbage collector can clean up the original pretty quickly if the new struct is returned not long after the copy, which in my case it is.
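A small sketch of that copy, with plain numbers standing in for timestamps (an assumption for brevity) and finish as a hypothetical helper. It shows that struct-copy hands back a new instance and leaves the original alone.

```racket
#lang racket
(struct time-card (start-time end-time))

;; Hypothetical helper: close off a card by copying it with a new end time.
(define (finish card end)
  (struct-copy time-card card [end-time end]))

(define placeholder (time-card 100 100))   ; start = end: no time has passed yet
(define finished (finish placeholder 160))

(time-card-end-time placeholder)    ; => 100, the original is unchanged
(time-card-end-time finished)       ; => 160
(time-card-start-time finished)     ; => 100, copied across from the original
```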

Update - 23/11/2021

I also posted this on Hacker News and got an intriguing comment from iamevn: there's a Racket package called Lenses, which deals with updating parts of functional structures. I haven't delved into the how or what of Lenses yet, but it's certainly an interesting package to understand.

Printing Structs

The default output of a struct is a bit underwhelming. For instance, if you reuse some of that last example in a REPL, you just get:

(time-card (now) (now))
#<time-card>

Which apart from telling you what type of struct it is doesn't give you much.

What you want is make-constructor-style-printer:

(require racket/struct)

(struct examined-row (id results)
  #:methods gen:custom-write
  [(define write-proc
     (make-constructor-style-printer
      (lambda (obj) 'examined-row)
      (lambda (obj) (list (examined-row-id obj) (examined-row-results obj)))))])

First up, note that you need to require racket/struct. Second, I got confused about what was happening with the lambdas. The first just returns the name to print, generally a symbol. The second is more interesting: it returns a list of the values you want printed, which gives you a pretty flexible approach to outputting the data from the struct. For instance, just being able to pick the fields is a bit of a win. You can also prefix the values with text if you wish (I haven't). Either way, it takes the struct from being quite opaque to really intriguing to work with.
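To show the field-picking in action, here's a sketch with a hypothetical table-stats struct whose rows field could be enormous. The printer leaves the rows out and prints a count instead, so the output stays readable.

```racket
#lang racket
(require racket/struct)

;; Hypothetical struct: `rows` may hold a very large list, so the printer
;; leaves it out and shows only the table name and a row count.
(struct table-stats (name rows)
  #:methods gen:custom-write
  [(define write-proc
     (make-constructor-style-printer
      (lambda (obj) 'table-stats)
      (lambda (obj)
        (list (table-stats-name obj)
              (length (table-stats-rows obj))))))])

(define stats (table-stats "orders" (build-list 100000 values)))
(printf "~s\n" stats)  ; prints the name and the count, not 100000 row values
```

One trade-off: in print mode the output looks like a constructor call, so printing a count instead of the real field means it can't be pasted back in to rebuild the value.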

Conclusion

I'm not 100% sold on structs, but I'm getting there. They're a bit less flexible than hashes, but they're more declarative, which is nice. They don't carry some of the formality of Typed Racket, but they can clearly be a gateway to it. All in all, I'm going to keep using them and see how it pans out.