Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we show how STS relations are inductive datatypes? #677

Open
WhatisRT opened this issue Feb 4, 2025 · 8 comments
Open

Should we show how STS relations are inductive datatypes? #677

WhatisRT opened this issue Feb 4, 2025 · 8 comments
Labels
discussion documentation Improvements or additions to documentation

Comments

@WhatisRT
Copy link
Collaborator

WhatisRT commented Feb 4, 2025

After some internal discussions, it seems we need to settle the question of how to present our STS relations. This is a follow-up to this comment: #663 (comment)

The two main options are:

  1. We say that they are relations between the environment, signal and two states that are defined by the natural deduction rules we give in the figure. We do not explain how this is implemented in Agda and hide everything that hints toward the implementation (e.g. data and where keywords). This would mirror the style of the old PDF specs. It would look something like this:
_⊢_⇀⟦_⟧*_ : C  S  List Sig  S  Type

RTC-base :
  Γ ⊢ s ⇀⟦ [] ⟧* s

RTC-ind :
  ∙ Γ ⊢ s  ⇀⟦ sig  ⟧  s'
  ∙ Γ ⊢ s' ⇀⟦ sigs ⟧* s''
  ─────────────────────────
  Γ ⊢ s ⇀⟦ sig ∷ sigs ⟧* s''
  1. We show the definition of the relations in full and explain to a degree what they mean. Note that this means we'd be showing that STS relations are just sum types (which we use in a bunch of places as normal non-dependent sum types). To provide understanding that these are the same, we'd have to explain how to desugar the ∙_, _∙_ and _──────────────────────_ symbols. It would look something like this:
data _⊢_⇀⟦_⟧*_ : C  S  List Sig  S  Type where

RTC-base :
  Γ ⊢ s ⇀⟦ [] ⟧* s

RTC-ind :
  ∙ Γ ⊢ s  ⇀⟦ sig  ⟧  s'
  ∙ Γ ⊢ s' ⇀⟦ sigs ⟧* s''
  ─────────────────────────
  Γ ⊢ s ⇀⟦ sig ∷ sigs ⟧* s''

Compare this with a data type we already have:

data Credential : Type where
  KeyHashObj : KeyHash  Credential
  ScriptObj  : ScriptHash  Credential

My personal opinion is that most of the target audience does not care that our relations are datatypes, and showing that they are will mostly lead to confusion and distraction. For somebody consuming the PDF as a piece of documentation, knowing how to work with values of these types is completely irrelevant. There is of course some potential for confusing people familiar with Agda or dependent types in general, but these people can always just have a look at the source code.

@WhatisRT WhatisRT added discussion documentation Improvements or additions to documentation labels Feb 4, 2025
@yHSJ
Copy link

yHSJ commented Feb 4, 2025

As someone with no formal background in Agda (or formal methods in general) who has read and actively uses the spec in day to day work, I tend to agree with you @WhatisRT. If the goal is to allow the specification to be a document accessible to blockchain developers in general, there is no reason to make the specification more esoteric. Those who have a deep understanding of Agda or dependent types will almost always have the ability to read the source, and thus there is no need to tailor the PDF to them.

In general, I think the goal should always be to reduce difficulty of reading the specs without reducing technical accuracy.

@Quantumplation
Copy link

If you want to focus on making it more understandable, I would devote more of a preamble to explaining the mental model around it. I'm working on a draft of the whole spec that tries to do this throughout, but something like this in the introduction:

1.1 Overview

This document describes, in a very precise and executable way, the behavior of a Ledger that can be updated in response to a series of events. Because of the precise nature of the document, it can be dense and difficult to read at times, and it can be helpful to have a high level understanding of what it is trying to describe, which we will present below. Just be aware that this section focuses on intuition that the rest of the document will make precise.

By "Ledger", we mean a record of, at a specific point in time, a data structure. This data structure contains, for example, an accounting of how funds in the system are distributed among different accounts; the ADA currently held in the reserves; a list of stake pools operating the network, and so on.

This ledger can be updated in response to certain signals, such as receiving a new transaction, time passing and crossing an "epoch boundary", enacting a governance proposal, to name a few. Thus, the ledger also defines a set of rules for what signals are valid to apply, and exactly how the state should be updated in response to those signals. A description of this state, the possible signals, and the aforementioned rules is exactly what this document seeks to make precise.

We will model this via a number of "state transition systems" (STS) that describe different subsets of the behavior, and can be composed into the full description of the protocol. Each state transition system will consist of:

  • An environment, which consists of data read from the ledger or outside world that should be considered constant and unchanging for the purposes of the rule
  • An initial state, which consists of the subset of the full ledger state that the transition can update
  • A signal, with associated data, that the transition system can receive
  • A set of preconditions that must be met in order for the transition to be valid
  • A new state that results from said transition

For example, the "UTXOW" transition system defined on page 32 checks that, among other things, the transaction is signed by the appropriate parties.

These transition systems can be composed by requiring another transition system to hold as part of the preconditions.

For example, the "UTXOW" transition system mentioned above also requires that the "UTXO" transition, which checks that the inputs to the transaction exist, that the transaction is balanced, and several other rules.

A brief description of each transition system is provided below, with a link to where they are defined in this document:

  • UTXOS: Checks that any relevant scripts needed by the transaction evaluate to true
  • UTXO: Checks core invariants for an individual transaction to be valid, such as the transaction being balanced, fees being paid, etc; includes the UTXOS transition system.
  • UTXOW: Checks that a transaction is witnessed correctly with the appropriate signatures, datums, and scripts; includes the UTXO transition system.
  • GOV: Handles voting and submitting governance proposals
  • DELEG: Handles registering stake addresses and delegating to a stake pool
  • POOL: Handles registering and retiring stake pools
  • GOVCERT: Handles registering and delegating to DReps
  • CERT: Combines DELEG, POOL, GOVCERT transitions systems, as well as some additional rules shared by all three
  • CERTS: Applies CERT repeatedly for each certificate in the transaction
  • LEDGER: The full state update in response to a single transaction; includes the UTXOW, GOV, and CERTS rules
  • LEDGERS: Applies LEDGER repeatedly as needed, for each transaction in a list of transactions
  • ENACT: Applies the result of a previously ratified governance action, such as triggering a hard fork or updating the protocol parameters.
  • RATIFY: Decides whether a pending governance action has reached the thresholds it needs to be ratified.
  • SNAP: Computes new stake distribution snapshots
  • EPOCH: Computes the new state as of the end of an epoch; includes the ENACT, RATIFY, and SNAP transition systems.
  • NEWEPOCH: Computes the new state as of the start of a new epoch; includes the previous EPOCH transition.
  • CHAIN: The top level transition in response to a new block that applies the NEWEPOCH transition when crossing an epoch boundary, and the LEDGERS transition on the list of transactions in the body.

@Quantumplation
Copy link

I think if I had had something like the description above the first time reading through this, it would have gone a lot smoother; I have have someone on my team go through the exercise of reading it for the first time if you'd like to validate that hypothesis.

@Quantumplation
Copy link

regarding the initial question, I don't think stuff like that is the major barrier to comprehension; I think there are two major barriers to comprehension, one of which there's not much you can do about:

  • Lack of context / frame of reference for what the syntax is supposed to mean, in human terms
  • Unfamiliarity with dense symbolic logic

The first is what I'm trying to solve with my writeup above; The latter isn't impacted by removing "data" and "where"; the hard part is exactly what you leave behind, and is exactly the core of the spec. I think that's just going to be intimidating and impenetrable to people no matter what you do, and will just require effort to get familiar with / used to.

Where you can help the agda code is by being more verbose: longer variable names, more intermediate let bindings, less cryptic symbols like <$> or _|ᵒ_ or ¬? etc.

@sandtreader
Copy link

I think this kind of human-readable explanation - in parallel with the formal definition - is exactly what's needed. If we could shift it into Markdown+Agda as well, that would be perfect ;-)

I don't think that's a reason to make the 'dense symbolic logic' even denser, though - as a complete Agda-newbie (-phobe?) I at least had some hope that

data Credential : Type where
  KeyHashObj : KeyHash → Credential
  ScriptObj  : ScriptHash → Credential

would make sense given more context.

@WhatisRT
Copy link
Collaborator Author

WhatisRT commented Feb 5, 2025

Thanks for the comments!

@Quantumplation do you mind if we copy what you wrote into the introduction? It looks quite nice! It could also be integrated with the diagrams (#639) we have.

As for the symbols, there has been an effort to remove the more obscure one as much as possible. In particular <$> has been one that I wanted to get rid of for a while but it's just so convenient that it managed to stick around. With that one I'm tempted to just give in and explain it in the Notation section. I think I'll make an issue for discussing what symbol are and aren't ok, and for keeping track of previous decisions on these.

@sandtreader we just discussed literate Markdown in our meeting yesterday, but mostly as a way to cater to Agda experts. We now have issue #680 for this.

For the definition of the Credential type, we should probably add an explanation of sum types and their notation as well. This would probably even be useful to some Haskell developers that don't use GADTs. I'll make an issue for that as well.

Note that this feeds back into the original discussion: when we explain what data means, can we get away with just explaining sum types & simply typed pattern matching (if we go with option 1) or do we touch the topic of dependent types (if we go with option 2). I guess we could also go with option 2 & just not explain it (e.g. if you know, you know) but I really dislike that.

@Quantumplation
Copy link

@WhatisRT Yes, please feel free, if you find it useful!

@Quantumplation
Copy link

@WhatisRT I don't think you need to contort the code to remove <$> if it's that convenient; A note in the notation section, and some more verbose variable names probably go a long way there.

The question I would ask in each case is:

  • Is the symbol machinery needed to convince the computer of something, or hiding important details about the semantics and execution of this code, such that if I had no idea what it meant, would I still come away with the correct idea.

The former should be minimized, but aren't the worst sin, whereas the latter is what is really harmful; and because the document has both types, a new reader will generally feel that that they are all the latter.

If you can get to the point where they're all the former, then you can mention that in the notation section:

"The following agda constructs might be useful to understand the workings of the code, but generally are more about the mechanistic operation of the proof systems, and can be safely ignored".

In the case of <$>, I feel like it's a case of the former; it's basically just saying "run this function on this argument", and so if you told the user to ignore it, it wouldn't actually harm the human interpretation of the code.

Whereas something like _|ᵒ_, even though it's defined in the document, is fundamental to the logic you're trying to convey.

So replacing that with an infix updatedBy or distributedOver or whatever that's trying to convey, would be super helpful.

(I'll also disclaim, part of the challenge you'll face is that comprehension sometimes just takes time, and thorough reading, and you're never going to be able to completely circumvent that; if someone skims the document and misses a clause where you say you can ignore the ? and ¿ operators, they're still going to get confused when they read that elsewhere in the document 😅)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

4 participants