Extensibility, Reusability, and Testability. Let's chat about it! #212

smoya · 2021-12-29T14:15:06Z

Maintainer

AsyncAPI's popularity is increasing rapidly. AsyncAPI is being used more every day, and that growth is likely to continue over the next few years.
This means more developers will join the initiative, collaborating with the community and working on the tools that support the AsyncAPI spec, such as the parsers, the generators, etc.

Some of the issues that this popularity can bring to AsyncAPI software, triggered by the growing number of developers working on the code, are:

Duplication of effort. Different tools work on the same or a similar problem but don't really share code (or don't share "enough").
Duplication of code. Similar to the previous point, but emphasizing that the code is not the same:
- Bugs can be introduced in different copies of the code.
- Feature misalignment. Let's say the Studio introduces a new feature that applies to HTML documentation templates, but it is not available in the @asyncapi/react-component for whatever reason.
New spec changes have to be applied separately, in different code sources.
Etc.

Producing software that can be easily extended, is reusable, and is highly testable is no longer something we can postpone. IMHO, that future is now.

The problem

I want to illustrate the problems above with a real example.

Server-API is a server API used by other services. It provides different commands to interact with AsyncAPI documents like validating documents, generating code using templates, etc.

CLI is a command-line tool that lets users work with AsyncAPI documents: validating them, generating code using templates, creating new documents, editing them, etc.

Did you notice that both projects have similarities? That's because they share concerns, some of which are at the core of AsyncAPI tooling.

It is good that both projects use the same tools under the hood. For example, both projects use the Parser-JS to parse and validate AsyncAPI documents.
However, there is still a lot of code that is not shared.

Let's see the generator.service.ts file from the server-api project:

// @ts-ignore
import AsyncAPIGenerator from '@asyncapi/generator';
import { AsyncAPIDocument } from '@asyncapi/parser';

import { prepareParserConfig } from '../utils/parser';

/**
 * Service providing `@asyncapi/generate` functionality.
 */
export class GeneratorService {
  public async generate(
    asyncapi: AsyncAPIDocument | string,
    template: string,
    parameters: Record<string, any>,
    destDir: string,
    parserOptions: ReturnType<typeof prepareParserConfig>,
  ) {
    const generator = new AsyncAPIGenerator(template, destDir, {
      forceWrite: true,
      templateParams: parameters,
    });

    if (typeof asyncapi === 'string') {
      await generator.generateFromString(asyncapi, parserOptions);
    } else {
      await generator.generate(asyncapi, parserOptions);
    }
  }
}

That code is executed on every POST request made to the /generate endpoint.
Because the code is so simple, we can see that the generate method uses the @asyncapi/generator library to generate the code after checking whether the provided input is a string or an already parsed document.
However, the code has more side effects than one might imagine. Where are the generated files being written? To a filesystem? In memory or on disk? The reality is that the generated files are written to disk (the destDir argument is the hint).

Do we really want to write files to disk? We are writing new files to disk on every request. This has implications for the application's performance: I/O operations are expensive in terms of both resources and time, and they could eventually become a bottleneck in the application. Of course, the alternative would be to generate the files in memory, which has its own cons to consider.

For illustration purposes, let's say we decide to generate the files in memory instead. Does the @asyncapi/generator library support this? Not really.
@asyncapi/generator requires a targetDir argument when it is instantiated (the destDir passed to the constructor above). See generator.js for more details.

Would it make sense to add a new function that does the same as the generate method but generates the files in memory? Would it make sense to add a new function for every possible destination or filesystem?
Or would it make more sense to add a new argument to the generate method indicating whether the output should be written to an in-memory filesystem?

My point is that continuing to add code on top of the existing code without thinking more broadly is not ideal.

The way (?)

What are the issues we are trying to solve on the generate method?

It is not very extensible:
- Adding a new argument to discriminate whether the output goes to disk or to memory will:
  - Make the code harder to test. It is already hard to test since it has many arguments (a code smell). Now imagine having to test different outputs and their features.
  - Make the code harder to maintain.
  - Make the code harder to extend. Let's say we prefer to output the code to stdout (a similar feature already exists).
  - Make the code strongly dependent on yet another output format or filesystem type.
- Adding a new method will:
  - Keep the generate method more straightforward, but in the best scenario most of the logic will be shared, and in the worst it will be duplicated or semi-duplicated.
  - Set a trend for developers to create a new method for every newly supported output type. This could be fine initially, but it does not scale very well.
  - Still make the code strongly dependent on another output format or filesystem type.
It is not very reusable:
- It has a strong dependency on a real filesystem. Because it is tied to a concrete filesystem, you can never use another filesystem or output of your choice unless it is already supported in the code.
It is hard to test:
- The code does too many things at the same time.
- If we keep adding functionality, we will have to test it as well.

In the end, we want to make the code more extensible, more reusable and more testable.

Software design patterns and principles

Software design patterns and principles are collections of rules, applied to code, that solve common problems in software design. There are many of them; some are harder to implement than others, and some are only used in particular scenarios. Not all of them apply to every problem.

Let's focus on the core feature of the generate method: it generates code based on an AsyncAPI document and a template.
Given that core functionality, I'd argue that writing the output to the filesystem should not be a decision made by that method. Shouldn't it be the caller's decision?

Dependency inversion is a core principle of software design. It is part of the SOLID principles, and it says that the code should depend on abstractions, not on concretions.

Wait a moment, Sergio. What is the concretion here? The filesystem, or more precisely, the fs Node module. I agree we want to keep writing files to disk, but we might also want to write to other places (for example, in memory).
Why don't we abstract the filesystem somehow so users can choose which one to use?

For simplicity's sake, let's say the generator code only depends on the fs.writeFile method. For a moment, imagine the @asyncapi/generator library were written in TypeScript, so we had support for interfaces. What would stop us from doing the following?

import * as fs from 'fs';

interface filesystemInterface {
    writeFile(path: fs.PathLike | number, data: string | NodeJS.ArrayBufferView, options: fs.WriteFileOptions, callback: fs.NoParamCallback): void;
}

class Generator {
  // The rest of the arguments were omitted to simplify the example.
  // The real fs module satisfies the interface, so it works as a default.
  constructor(templateName: string, filesystem: filesystemInterface = fs) {
    // ...

    filesystem.writeFile('/tmp/foo', 'abcdefg', 'utf8', (err) => {
      if (err) throw err;
      console.log('File saved');
    });
  }
}

Callers could now pass the filesystem they want; for example, they could directly use memfs, an in-memory filesystem that is compatible with the fs API (and therefore with our interface).
Alternatively, they could implement whatever filesystem they want. They only need to stick to the interface we declared.
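To make the "implement whatever filesystem they want" option concrete, here is a minimal sketch of a hand-rolled implementation that keeps files in a Map. MapFileSystem and writeGeneratedFile are hypothetical names, and the interface is repeated so the snippet stands alone. Because the core function only sees filesystemInterface, a unit test can check what was written without touching the disk.

```typescript
import * as fs from 'fs';

// Same contract as the filesystemInterface sketched above, repeated so this snippet stands alone.
export interface filesystemInterface {
  writeFile(path: fs.PathLike | number, data: string | NodeJS.ArrayBufferView, options: fs.WriteFileOptions, callback: fs.NoParamCallback): void;
}

// Hypothetical in-memory filesystem: stores each written file in a Map instead of on disk.
export class MapFileSystem implements filesystemInterface {
  readonly files = new Map<string, string>();

  writeFile(path: fs.PathLike | number, data: string | NodeJS.ArrayBufferView, _options: fs.WriteFileOptions, callback: fs.NoParamCallback): void {
    // Normalize binary views to text so the stored value is easy to assert on.
    const text = typeof data === 'string' ? data : Buffer.from(data.buffer, data.byteOffset, data.byteLength).toString('utf8');
    this.files.set(String(path), text);
    callback(null);
  }
}

// Hypothetical core step: depends only on the abstraction, never on the fs module itself.
export function writeGeneratedFile(filesystem: filesystemInterface, filePath: string, content: string): void {
  filesystem.writeFile(filePath, content, 'utf8', (err) => {
    if (err) throw err;
  });
}
```

In production the same function can receive the real fs module, since it satisfies the same interface.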

The @asyncapi/generator (https://github.com/asyncapi/generator) library could also ship a set of ready-made filesystems, so users would only need to pick one:

import { fs as memfs } from 'memfs';

// memfs exposes an fs-compatible API backed by an in-memory volume.
export function InMemoryFileSystem(): filesystemInterface {
  return memfs;
}

We could go further and create another kind of interface, called Writer or similar: an interface that does not care about filesystems but only about writing bytes somewhere. Again, we could provide some implementations while leaving users free to write their own.
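A minimal sketch of that Writer idea, assuming a promise-based contract that receives a relative path and the content to write. Writer, MemoryWriter, StdoutWriter, DiskWriter, and emitFiles are hypothetical names, not part of @asyncapi/generator; emitFiles stands in for the step of a generator that hands rendered files over to an output.

```typescript
import { promises as fsp } from 'fs';
import * as path from 'path';

// Hypothetical Writer abstraction: it only cares about putting content somewhere,
// not about which filesystem (if any) sits behind it.
export interface Writer {
  write(relativePath: string, content: string): Promise<void>;
}

// Keeps outputs in memory, e.g. for a server that returns them in the HTTP response.
export class MemoryWriter implements Writer {
  readonly outputs = new Map<string, string>();

  async write(relativePath: string, content: string): Promise<void> {
    this.outputs.set(relativePath, content);
  }
}

// Prints outputs to stdout instead of persisting them.
export class StdoutWriter implements Writer {
  async write(relativePath: string, content: string): Promise<void> {
    process.stdout.write(`--- ${relativePath} ---\n${content}\n`);
  }
}

// Keeps today's behavior available: writes files under a target directory on disk.
export class DiskWriter implements Writer {
  constructor(private readonly targetDir: string) {}

  async write(relativePath: string, content: string): Promise<void> {
    const fullPath = path.join(this.targetDir, relativePath);
    await fsp.mkdir(path.dirname(fullPath), { recursive: true });
    await fsp.writeFile(fullPath, content, 'utf8');
  }
}

// Hypothetical core step: hands already-rendered files to whichever Writer it receives.
export async function emitFiles(files: Record<string, string>, writer: Writer): Promise<void> {
  for (const relativePath of Object.keys(files)) {
    await writer.write(relativePath, files[relativePath]);
  }
}
```

With this shape, the CLI could pass a DiskWriter while the server passes a MemoryWriter, and the generation step itself stays unchanged.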

Software architecture patterns

Software architecture patterns are very similar to software design patterns, but they have a broader scope: they solve problems in (and arising from) the architecture of the software. Some of them can be combined; others are exclusive to a particular scenario.

If we think again about the solution suggested above, the generate method became agnostic of the output type the user wants. That means it is more focused on its core functionality, leaving other decisions to the code that uses it.
The generate method delegates some responsibility to an upper layer: responsibilities that do not really depend on the core/business logic are moved to the applications that depend on the generate method.

This change grants us more flexibility to use the generate method.

Our current toolset focuses mainly on the following areas:

Validate AsyncAPI documents
Convert AsyncAPI documents from/to different versions and formats
Bundle AsyncAPI documents
Diff AsyncAPI documents
Generate code and documentation from AsyncAPI documents using templates
Generate visualizations (charts) from AsyncAPI documents
Analyze AsyncAPI documents
Etc.

At a glance, we can tell these areas have many shared concerns. For example, if you need to convert a document, you must first parse and validate it.
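A toy sketch of that shared concern, assuming JSON input and hypothetical function names (parseAndValidate, convert, diffVersions). The validation only checks for the asyncapi version field and convert only rewrites that field; a real tool would delegate both to @asyncapi/parser and a proper converter. The point is that the parse-and-validate step is written once and every feature builds on its output.

```typescript
// Hypothetical parsed representation; a real tool would use @asyncapi/parser instead.
export interface ParsedDocument {
  asyncapi: string;
  raw: Record<string, unknown>;
}

// Shared concern, written once: every feature starts from a parsed, validated document.
export function parseAndValidate(input: string): ParsedDocument {
  const raw = JSON.parse(input) as Record<string, unknown>;
  const version = raw['asyncapi'];
  // Toy validation: only checks that the version field is present.
  if (typeof version !== 'string') {
    throw new Error('Invalid document: missing "asyncapi" version field');
  }
  return { asyncapi: version, raw };
}

// Feature-specific logic only receives documents that already passed the shared step.
// Toy conversion: only rewrites the version field.
export function convert(doc: ParsedDocument, targetVersion: string): ParsedDocument {
  return { asyncapi: targetVersion, raw: { ...doc.raw, asyncapi: targetVersion } };
}

export function diffVersions(a: ParsedDocument, b: ParsedDocument): string | null {
  return a.asyncapi === b.asyncapi ? null : `${a.asyncapi} -> ${b.asyncapi}`;
}
```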

What if we apply the same concepts to the whole codebase? Separation of concerns then becomes the rule for designing software architectures.

Clean, Layered, Onion and Hexagonal Architecture

Very catchy names, right? They are just collections of patterns and rules to apply to your software architecture, mainly focused on separating (segregating) the concerns of an application. For example, you won't mix concerns related to the core/business logic with those related to the filesystem or the network.

If we design our applications with the core logic in mind, separation of concerns will follow progressively.
Applications can't be built based on those patterns if the libraries used in the core don't follow them. Because of that, it is crucial to design our libraries and modules with the same mindset (as shown already in the previous point).

Without going into which pattern is better, I'm going to try to illustrate the idea using some drawings (based on source files made by @hgraca).

There is a lot of literature about those patterns already. I don't aim to expand the info available out there.
My main point is to show how we could apply those patterns to our codebase.

Going back to the beginning, the @asyncapi/generator library's core logic is about generating code from AsyncAPI documents. It doesn't matter whether it is triggered from a CLI application or a web server application, or whether the output goes to the local filesystem, the cloud, or memory. It just generates code based on a well-defined input.

Suppose the application should generate code from a local file, a remote file, or raw input. In that case, those sources are not concerns of the application's core logic but only of the Adapters.
Adapters are just pieces of code that transform the input into something the core logic can understand (e.g., the generate method). Examples of adapters are Controllers in a web service, event handlers in an EDA (event-driven architecture) world, CLI command handlers, Command handlers from a Command Bus, etc.

The critical point is that under no circumstances should @asyncapi/generator be tied to those adapters.
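A minimal sketch of that separation, assuming a hypothetical core function generateCode that returns generated files as a path-to-content map, plus two hypothetical driving adapters: a CLI command handler and an HTTP controller. Neither adapter contains generation logic; each only translates its own input into the core's input and the core's result into its own output format. The template rendering is a placeholder.

```typescript
// Hypothetical core input: adapters translate their own inputs into this shape.
export interface GenerateInput {
  document: string;
  template: string;
}

// Hypothetical core logic: knows nothing about CLIs, HTTP, or where files end up.
// Returns generated files as a path -> content map.
export function generateCode(input: GenerateInput): Map<string, string> {
  const files = new Map<string, string>();
  // Placeholder for real template rendering.
  files.set('README.md', `Generated with ${input.template} from ${input.document.length} characters of AsyncAPI`);
  return files;
}

// Driving adapter 1: a CLI command handler (argv in, console lines out).
export function cliGenerate(argv: string[]): string[] {
  const [asyncapiDoc, template] = argv;
  if (!asyncapiDoc || !template) {
    throw new Error('usage: generate <document> <template>');
  }
  const files = generateCode({ document: asyncapiDoc, template });
  return Array.from(files.keys()).map((filePath) => `created ${filePath}`);
}

// Driving adapter 2: an HTTP controller (request body in, response object out).
export interface HttpRequest {
  body: { asyncapi?: string; template?: string };
}

export interface HttpResponse {
  status: number;
  body: Record<string, string>;
}

export function postGenerate(req: HttpRequest): HttpResponse {
  const { asyncapi, template } = req.body;
  if (!asyncapi || !template) {
    return { status: 422, body: { error: 'asyncapi and template are required' } };
  }
  const body: Record<string, string> = {};
  generateCode({ document: asyncapi, template }).forEach((content, filePath) => {
    body[filePath] = content;
  });
  return { status: 200, body };
}
```

Swapping the HTTP framework or the CLI library would only touch the adapter functions; generateCode stays the same.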

Based on a possible future scenario, let's move on to a more advanced picture.

Even though there is a lot of information to digest here, the main idea is to show how everything interconnects. The same application (do not assume the source code lives in a single repository) separates concerns: cross-cutting concerns are not tied to the core logic, and the core logic in turn is not tied to the filesystem, the network, or any database.

It also shows how we can unify behaviors by applying tools like the Command Bus (using Commands to represent use cases, similar to what the CLI does).

And how independent components can be used anywhere: not only across the whole application layer, but also in every port and adapter.
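A minimal, generic sketch of a Command Bus in TypeScript. CommandBus, ValidateCommand, and the stand-in validation handler are hypothetical; a real handler would delegate to the parser. A CLI command handler and an HTTP controller could both build the same ValidateCommand and dispatch it, so the use case is implemented once.

```typescript
// Hypothetical minimal Command Bus: a command is a plain object describing a use case,
// one handler is registered per command type, and every adapter dispatches through the bus.
export interface Command {
  readonly type: string;
}

export type Handler<C extends Command, R> = (command: C) => Promise<R>;

export class CommandBus {
  private readonly handlers = new Map<string, Handler<any, unknown>>();

  register<C extends Command, R>(type: C['type'], handler: Handler<C, R>): void {
    if (this.handlers.has(type)) {
      throw new Error(`Handler already registered for "${type}"`);
    }
    this.handlers.set(type, handler as Handler<any, unknown>);
  }

  async dispatch<R>(command: Command): Promise<R> {
    const handler = this.handlers.get(command.type);
    if (!handler) {
      throw new Error(`No handler registered for "${command.type}"`);
    }
    return (await handler(command)) as R;
  }
}

// Example use case modeled as a command, much like a CLI subcommand.
export interface ValidateCommand extends Command {
  readonly type: 'validate';
  readonly document: string;
}

export const bus = new CommandBus();
bus.register<ValidateCommand, boolean>('validate', async (command) => {
  // Stand-in validation; a real handler would delegate to the parser.
  try {
    return typeof JSON.parse(command.document).asyncapi === 'string';
  } catch {
    return false;
  }
});
```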

Patterns and tools that can help us

There are tons, but here are some of the most common ones:

SOLID principles. Well-known principles that usually help developers write better software (software that can be extended, reused, and tested).
Hexagonal Architecture. A very opinionated pattern that has remained popular for years.
Clean Architecture. A mix of several architecture patterns, including the Hexagonal one.
Domain Driven Design or DDD. An extensive collection of practices that help projects and companies write and manage software at scale. It is based on many good practices and patterns. Its main focus is on defining the domain together with domain experts: its entities and contexts, and the interactions and boundaries (called bounded contexts) between them. An interesting pattern here is the Shared Kernel, in which several components use shared code without ever depending on each other.
Event Sourcing. Actions end up producing events, and listeners act when those events are triggered. Even though the concept goes deeper, it probably sounds familiar to Redux users.
CQRS. Even if you only use a small portion of it, the concepts of Commands and Queries flowing through a Command Bus are phenomenal for decoupling architectures and reusing code across several applications.
12Factor. Mainly designed for web apps, but some of its points can also be applied to other kinds of applications.

Perception and last thoughts

I wrote these lines to remind our community, including myself, that creating software is not only about writing code.

The way I see it, the separation of concerns is an essential thing to do when designing software. Isolating the core logic from the rest of the application has a lot of benefits, as we saw earlier.

"The phrase 'This happened and it is bad' is actually two impressions. The first — 'This happened'— is objective. The second — 'it is bad' — is subjective".
— Ryan Holiday, The Obstacle is the Way.

However, everything should be taken as a starting point and never as a final solution. I like to say that treating what books say as "the only way" means submitting to what others think is best. And I think everything is about perception. Not everyone will find every pattern helpful, especially at first glance. But they are still worth reading about and reflecting on.

Because, in the end, who doesn't want software that is easily extended, reusable, and highly testable?

Bibliography and Links of interest

https://alistair.cockburn.us/hexagonal-architecture. The original post where Alistair Cockburn wrote about Hexagonal Architecture for the first time.
Domain-Driven Design, Eric Evans. Also known as "The big blue book".
Implementing Domain-Driven Design, Vaughn Vernon. Also known as "The big red book".
https://fideloper.com/hexagonal-architecture, an excellent summary article about Hexagonal by Chris Fidao.
https://herbertograca.com/2017/07/03/the-software-architecture-chronicles, an excellent saga about Software architecture patterns by Herberto Graça (@hgraca).

BOLT04 · 2021-12-30T23:23:48Z

Collaborator

wow @smoya, you've done a tremendous amount of work to put this together, and the outcome of your effort is incredible 💯 💯 💯
These 3 topics in particular really speak to me, so I think discussing this is very well worth our time 👍 .

Just to clarify on the topic of sharing code: are we focusing on the architecture of all AsyncAPI tools written in the same language (e.g. JS/TS or Go)? I'm asking because the tools that can reuse the domain layer, i.e. domain-asyncapi-js, are the tools written in that same language. If we want to reuse this logic/behavior and knowledge in Go, for example, we'd use a user interface that is available to Go, like standard protocols (e.g. HTTP using the Server API). Or we could choose to build a native library (domain-asyncapi-go perhaps) that would have its own infrastructure and UI layers.
But I think this is what we are assuming, and there are no issues with this level of "coupling". Just wanted to ask anyway 🙂.

In the concrete example, I agree the path forward is to decouple the generator and allow extension of the file output. The filesystemInterface also seems coupled to file systems, but I know this is a simple example 😅. We could make a simpler interface, since those parameters (output directory, encoding, etc.) are controlled by the caller. The data is the only variable and is controlled by the generator, so it's the only argument.
As you said, let's improve upon the interfaces and make them simpler. A simple contract but yet extensible 😃.
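A minimal sketch of that simpler contract, assuming the caller fixes the destination and encoding up front so the generator only passes the data. Output, fileOutput, memoryOutput, and render are hypothetical names used for illustration.

```typescript
import { promises as fsp } from 'fs';

// Hypothetical data-only contract: the generator supplies nothing but the content.
export type Output = (data: string) => Promise<void>;

// The caller fixes destination and encoding when creating the output.
export function fileOutput(filePath: string, encoding: BufferEncoding = 'utf8'): Output {
  return (data) => fsp.writeFile(filePath, data, { encoding });
}

// Alternatively, the caller picks an in-memory output that collects what it receives.
export function memoryOutput(sink: string[]): Output {
  return async (data) => {
    sink.push(data);
  };
}

// Hypothetical generator step that only knows about the Output contract.
export async function render(template: string, output: Output): Promise<void> {
  // Placeholder for real template rendering.
  await output(`rendered with ${template}`);
}
```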

The generate method delegates some responsibility to an upper layer:

I actually think about this in reverse 😅. The generate method is a high-level policy, the piece of software implementing the interface is the low-level policy. As the DI principle says, "The high-level module does not depend on the low-level module" 😃, so I tend to think of that diagram you show as the domain layer being at the top (top == center) and the rest being below it, but I totally get what you mean 😄 .

Perhaps we could come up with a list of other concrete examples and do some up-front design on the tools where this brings the most value. What does everyone think?

Thank you once again for this initiative 🙂.
