Should we show how STS relations are inductive datatypes? #677

WhatisRT · 2025-02-04T16:01:46Z

After some internal discussions, it seems we need to settle the question of how to present our STS relations. This is a follow-up to this comment: #663 (comment)

The two main options are:

We say that they are relations between the environment, signal and two states that are defined by the natural deduction rules we give in the figure. We do not explain how this is implemented in Agda and hide everything that hints toward the implementation (e.g. data and where keywords). This would mirror the style of the old PDF specs. It would look something like this:

_⊢_⇀⟦_⟧*_ : C → S → List Sig → S → Type

RTC-base :
  Γ ⊢ s ⇀⟦ [] ⟧* s

RTC-ind :
  ∙ Γ ⊢ s  ⇀⟦ sig  ⟧  s'
  ∙ Γ ⊢ s' ⇀⟦ sigs ⟧* s''
  ─────────────────────────
  Γ ⊢ s ⇀⟦ sig ∷ sigs ⟧* s''

We show the definition of the relations in full and explain to a degree what they mean. Note that this means we'd be showing that STS relations are just sum types (which we use in a bunch of places as normal non-dependent sum types). To provide understanding that these are the same, we'd have to explain how to desugar the ∙_, _∙_ and _──────────────────────_ symbols. It would look something like this:

data _⊢_⇀⟦_⟧*_ : C → S → List Sig → S → Type where

RTC-base :
  Γ ⊢ s ⇀⟦ [] ⟧* s

RTC-ind :
  ∙ Γ ⊢ s  ⇀⟦ sig  ⟧  s'
  ∙ Γ ⊢ s' ⇀⟦ sigs ⟧* s''
  ─────────────────────────
  Γ ⊢ s ⇀⟦ sig ∷ sigs ⟧* s''

Compare this with a data type we already have:

data Credential : Type where
  KeyHashObj : KeyHash → Credential
  ScriptObj  : ScriptHash → Credential

My personal opinion is that most of the target audience does not care that our relations are datatypes, and showing that they are will mostly lead to confusion and distraction. For somebody consuming the PDF as a piece of documentation, knowing how to work with values of these types is completely irrelevant. There is of course some potential for confusing people familiar with Agda or dependent types in general, but these people can always just have a look at the source code.

The text was updated successfully, but these errors were encountered:

yHSJ · 2025-02-04T16:57:21Z

As someone with no formal background in Agda (or formal methods in general) who has read and actively uses the spec in day to day work, I tend to agree with you @WhatisRT. If the goal is to allow the specification to be a document accessible to blockchain developers in general, there is no reason to make the specification more esoteric. Those who have a deep understanding of Agda or dependent types will almost always have the ability to read the source, and thus there is no need to tailor the PDF to them.

In general, I think the goal should always be to reduce difficulty of reading the specs without reducing technical accuracy.

Quantumplation · 2025-02-04T19:15:01Z

If you want to focus on making it more understandable, I would devote more of a preamble to explaining the mental model around it. I'm working on a draft of the whole spec that tries to do this throughout, but something like this in the introduction:

1.1 Overview

This document describes, in a very precise and executable way, the behavior of a Ledger that can be updated in response to a series of events. Because of the precise nature of the document, it can be dense and difficult to read at times, and it can be helpful to have a high level understanding of what it is trying to describe, which we will present below. Just be aware that this section focuses on intuition that the rest of the document will make precise.

By "Ledger", we mean a record of, at a specific point in time, a data structure. This data structure contains, for example, an accounting of how funds in the system are distributed among different accounts; the ADA currently held in the reserves; a list of stake pools operating the network, and so on.

This ledger can be updated in response to certain signals, such as receiving a new transaction, time passing and crossing an "epoch boundary", enacting a governance proposal, to name a few. Thus, the ledger also defines a set of rules for what signals are valid to apply, and exactly how the state should be updated in response to those signals. A description of this state, the possible signals, and the aforementioned rules is exactly what this document seeks to make precise.

We will model this via a number of "state transition systems" (STS) that describe different subsets of the behavior, and can be composed into the full description of the protocol. Each state transition system will consist of:

An environment, which consists of data read from the ledger or outside world that should be considered constant and unchanging for the purposes of the rule
An initial state, which consists of the subset of the full ledger state that the transition can update
A signal, with associated data, that the transition system can receive
A set of preconditions that must be met in order for the transition to be valid
A new state that results from said transition

For example, the "UTXOW" transition system defined on page 32 checks that, among other things, the transaction is signed by the appropriate parties.

These transition systems can be composed by requiring another transition system to hold as part of the preconditions.

For example, the "UTXOW" transition system mentioned above also requires that the "UTXO" transition, which checks that the inputs to the transaction exist, that the transaction is balanced, and several other rules.

A brief description of each transition system is provided below, with a link to where they are defined in this document:

UTXOS: Checks that any relevant scripts needed by the transaction evaluate to true
UTXO: Checks core invariants for an individual transaction to be valid, such as the transaction being balanced, fees being paid, etc; includes the UTXOS transition system.
UTXOW: Checks that a transaction is witnessed correctly with the appropriate signatures, datums, and scripts; includes the UTXO transition system.
GOV: Handles voting and submitting governance proposals
DELEG: Handles registering stake addresses and delegating to a stake pool
POOL: Handles registering and retiring stake pools
GOVCERT: Handles registering and delegating to DReps
CERT: Combines DELEG, POOL, GOVCERT transitions systems, as well as some additional rules shared by all three
CERTS: Applies CERT repeatedly for each certificate in the transaction
LEDGER: The full state update in response to a single transaction; includes the UTXOW, GOV, and CERTS rules
LEDGERS: Applies LEDGER repeatedly as needed, for each transaction in a list of transactions
ENACT: Applies the result of a previously ratified governance action, such as triggering a hard fork or updating the protocol parameters.
RATIFY: Decides whether a pending governance action has reached the thresholds it needs to be ratified.
SNAP: Computes new stake distribution snapshots
EPOCH: Computes the new state as of the end of an epoch; includes the ENACT, RATIFY, and SNAP transition systems.
NEWEPOCH: Computes the new state as of the start of a new epoch; includes the previous EPOCH transition.
CHAIN: The top level transition in response to a new block that applies the NEWEPOCH transition when crossing an epoch boundary, and the LEDGERS transition on the list of transactions in the body.

Quantumplation · 2025-02-04T19:15:58Z

I think if I had had something like the description above the first time reading through this, it would have gone a lot smoother; I have have someone on my team go through the exercise of reading it for the first time if you'd like to validate that hypothesis.

Quantumplation · 2025-02-04T19:31:05Z

regarding the initial question, I don't think stuff like that is the major barrier to comprehension; I think there are two major barriers to comprehension, one of which there's not much you can do about:

Lack of context / frame of reference for what the syntax is supposed to mean, in human terms
Unfamiliarity with dense symbolic logic

The first is what I'm trying to solve with my writeup above; The latter isn't impacted by removing "data" and "where"; the hard part is exactly what you leave behind, and is exactly the core of the spec. I think that's just going to be intimidating and impenetrable to people no matter what you do, and will just require effort to get familiar with / used to.

Where you can help the agda code is by being more verbose: longer variable names, more intermediate let bindings, less cryptic symbols like <$> or _|ᵒ_ or ¬? etc.

sandtreader · 2025-02-05T10:53:39Z

I think this kind of human-readable explanation - in parallel with the formal definition - is exactly what's needed. If we could shift it into Markdown+Agda as well, that would be perfect ;-)

I don't think that's a reason to make the 'dense symbolic logic' even denser, though - as a complete Agda-newbie (-phobe?) I at least had some hope that

data Credential : Type where
  KeyHashObj : KeyHash → Credential
  ScriptObj  : ScriptHash → Credential

would make sense given more context.

WhatisRT · 2025-02-05T12:33:43Z

Thanks for the comments!

@Quantumplation do you mind if we copy what you wrote into the introduction? It looks quite nice! It could also be integrated with the diagrams (#639) we have.

As for the symbols, there has been an effort to remove the more obscure one as much as possible. In particular <$> has been one that I wanted to get rid of for a while but it's just so convenient that it managed to stick around. With that one I'm tempted to just give in and explain it in the Notation section. I think I'll make an issue for discussing what symbol are and aren't ok, and for keeping track of previous decisions on these.

@sandtreader we just discussed literate Markdown in our meeting yesterday, but mostly as a way to cater to Agda experts. We now have issue #680 for this.

For the definition of the Credential type, we should probably add an explanation of sum types and their notation as well. This would probably even be useful to some Haskell developers that don't use GADTs. I'll make an issue for that as well.

Note that this feeds back into the original discussion: when we explain what data means, can we get away with just explaining sum types & simply typed pattern matching (if we go with option 1) or do we touch the topic of dependent types (if we go with option 2). I guess we could also go with option 2 & just not explain it (e.g. if you know, you know) but I really dislike that.

Quantumplation · 2025-02-05T15:01:04Z

@WhatisRT Yes, please feel free, if you find it useful!

Quantumplation · 2025-02-05T19:20:27Z

@WhatisRT I don't think you need to contort the code to remove <$> if it's that convenient; A note in the notation section, and some more verbose variable names probably go a long way there.

The question I would ask in each case is:

Is the symbol machinery needed to convince the computer of something, or hiding important details about the semantics and execution of this code, such that if I had no idea what it meant, would I still come away with the correct idea.

The former should be minimized, but aren't the worst sin, whereas the latter is what is really harmful; and because the document has both types, a new reader will generally feel that that they are all the latter.

If you can get to the point where they're all the former, then you can mention that in the notation section:

"The following agda constructs might be useful to understand the workings of the code, but generally are more about the mechanistic operation of the proof systems, and can be safely ignored".

In the case of <$>, I feel like it's a case of the former; it's basically just saying "run this function on this argument", and so if you told the user to ignore it, it wouldn't actually harm the human interpretation of the code.

Whereas something like _|ᵒ_, even though it's defined in the document, is fundamental to the logic you're trying to convey.

So replacing that with an infix updatedBy or distributedOver or whatever that's trying to convey, would be super helpful.

(I'll also disclaim, part of the challenge you'll face is that comprehension sometimes just takes time, and thorough reading, and you're never going to be able to completely circumvent that; if someone skims the document and misses a clause where you say you can ignore the ? and ¿ operators, they're still going to get confused when they read that elsewhere in the document 😅)

WhatisRT added discussion documentation Improvements or additions to documentation labels Feb 4, 2025

ch1bo mentioned this issue Feb 6, 2025

Literate Agda using markdown #680

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we show how STS relations are inductive datatypes? #677

Should we show how STS relations are inductive datatypes? #677

WhatisRT commented Feb 4, 2025

yHSJ commented Feb 4, 2025

Quantumplation commented Feb 4, 2025

Quantumplation commented Feb 4, 2025

Quantumplation commented Feb 4, 2025

sandtreader commented Feb 5, 2025

WhatisRT commented Feb 5, 2025

Quantumplation commented Feb 5, 2025

Quantumplation commented Feb 5, 2025

Should we show how STS relations are inductive datatypes? #677

Should we show how STS relations are inductive datatypes? #677

Comments

WhatisRT commented Feb 4, 2025

yHSJ commented Feb 4, 2025

Quantumplation commented Feb 4, 2025

1.1 Overview

Quantumplation commented Feb 4, 2025

Quantumplation commented Feb 4, 2025

sandtreader commented Feb 5, 2025

WhatisRT commented Feb 5, 2025

Quantumplation commented Feb 5, 2025

Quantumplation commented Feb 5, 2025