Towards Better Specifications

2024/09/30
Miscellaneous

Discussing how low-level specification documents could be improved...


I've recently been messing around with some OS development, and I've found that some specifications documents are quite difficult to work with. I'm going to pick on the UEFI spec as that's what I've been dealing with the most recently, but these comments apply to lots of other specs too. First, I find it frustrating how many typographical mistakes are in such an important document! Headings are often misformatted (e.g. not in bold like most of the document), and regular text is often appearing in code blocks. For example, see the two images below taken from the latest version of the spec:

Fig 1: Mistake in UEFI spec, "Prototype" should be a heading and not part of the code block.
Fig 2: An example where the spec is correct and formats headings correctly.

I'm almost certain that these are mistakes rather than intentional formatting, given how most of the document is formatted. This causes a particular problem for my use-case: I had planned to scrape the specification to generate C headers. This is a reasonably common practice, but since text (which is not valid code) randomly appears in code blocks occasionally, automatically generating headers requires massive amounts of manual intervention! As someone who sometimes contributes to open source projects, my first instinct was to find the public git repository for the documents and submit a PR with the changes. Unfortunately the UEFI organisation does not seem to develop their documents publicly, and there doesn't seem to be a proper place for reporting mistakes. This is frustrating as the UEFI documents come in both HTML and PDF versions, and both contain the same mistakes, so the documents are almost definitely generated programmatically - why not publish the source too? I really strongly believe in open-sourcing code which is written by non-profit organisations.

However, this did get me thinking - we should have a more standardised format for specification documents to eliminates mistakes. In fact, I believe specifications should be primarily code (perhaps a literate code document), rather than text. By writing our specifications in a well-defined programming language, we could eliminate a large class of specification mistakes by utilising the language's parser, type system, etc. For example, if the UEFI spec was written in some kind of literate C document, then the file would not compile as the code blocks are not valid C. Ideally, I think it would be best to encode as much as possible in code, with text only as a more friendly description of the code. We often see pre-conditions such as "X parameter must not be NULL" written in text next to a function signature. I believe these properties should be encoded in code too, using some form of dependent or refinement types. This way of doing things is much more precise, and eliminates the possibility of confusing language making the spec ambiguous. It would also help write formally verified code, where you prove you are using the API directly.

To my knowledge, the closest existing project to what I am thinking of is Sail. Sail is a specification language for defining instruction sets, and uses many of the techniques I mentioned above such as dependent types. It seems like a really cool project, but does seem to be specialised for ISAs rather than APIs. I'm curious how difficult it would be to extend Sail to support firmware level APIs (e.g. BIOS and UEFI). Combined with Sail, this could pave the way for formally verified embedded and OS programming. Over the next few weeks I'll be thinking about what a language for API specicications should look like - is it easier to modify an existing project (e.g. Sail), write something from scratch, or embed the specifications in a DSL in a dependently typed language like Idris..?