Skip to content

Commit

Permalink
Add documentation for hash partitioning and the related migration ste…
Browse files Browse the repository at this point in the history
…ps (#5302)

Co-authored-by: Simon Dumas <[email protected]>
  • Loading branch information
imsdu and Simon Dumas authored Mar 5, 2025
1 parent d5a0a05 commit 134382b
Show file tree
Hide file tree
Showing 7 changed files with 226 additions and 45 deletions.
5 changes: 5 additions & 0 deletions delta/app/src/main/resources/app.conf
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,13 @@ app {
# Access to database for streaming access (indexing / SSEs)
streaming = ${app.defaults.database.access}

# Partition strategy for the scoped_events and scoped_states tables
partition-strategy {
# List: a partition is created for every project
type = list
# Uncomment to use a hash strategy where the number of partitions will be fixed.
#type = hash
# modulo = 5
}

name = ${app.defaults.database.name}
Expand Down

This file was deleted.

1 change: 1 addition & 0 deletions docs/src/main/paradox/docs/releases/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ Please test this new version carefully and address any issues on the client side

### New features / enhancements

- Support for hash partitioning in PostgreSQL
- Full support of JSON-LD 1.1 via [Apache Jena 5.x](https://jena.apache.org/) and [Titanium JSON-LD](https://github.com/filip26/titanium-json-ld)
- The default indexing in Elasticsearch has been rewritten to push data in a single Elasticsearch index in order to reduce shard consumption.
- Projects, resolvers, schemas, storages and views are not indexed anymore to prepare further changes on how they are handled
Expand Down
124 changes: 124 additions & 0 deletions docs/src/main/paradox/docs/releases/v1.11-to-v1.12-migration.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,18 +26,142 @@ and [following the Elastic recommendations about sharding](https://www.elastic.c
* You can optionally allocate more resources to Delta, PostgreSQL, Blazegraph and Elasticsearch to accommodate for the load related to the indexing
* You may also want to reindex in Elasticsearch first and then in Blazegraph if the load is too high
* Run the following SQL script to delete the former default elasticsearch views and their progress:

```sql
DELETE FROM scoped_events WHERE type = 'elasticsearch' AND id = 'https://bluebrain.github.io/nexus/vocabulary/defaultElasticSearchIndex';
DELETE FROM scoped_states WHERE type = 'elasticsearch' AND id = 'https://bluebrain.github.io/nexus/vocabulary/defaultElasticSearchIndex';
DELETE FROM projection_offsets WHERE module = 'elasticsearch' AND resource_id = 'https://bluebrain.github.io/nexus/vocabulary/defaultElasticSearchIndex';
DELETE FROM projection_offsets WHERE name = 'event-metrics';
```
* Run the following SQL script to delete the progress of Blazegraph indexing:

```sql
DELETE FROM projection_offsets WHERE module = 'blazegraph';
```
* Start Nexus Delta
* Monitor the load of the different components and the indexing process by running the following query

```sql
SELECT * FROM projection_offsets ORDER BY updated_at DESC;
```

# Adopt hash partitioning

* Make a backup of your PostgreSQL instance
* Make sure to accommodate your deployment so that it can accomodate twice your data during the migration
* Scale down Delta
* Run the [following script](https://github.com/BlueBrain/nexus/blob/$git.branch$/delta/sourcing-psql/src/main/resources/scripts/postgres/init/common/V1_12_M02__partition_config.ddl) to init the partition_config table
* Run the following script to rename the partitioning tables and their indices:

```sql
-- Renaming scoped_events table and its indices
ALTER TABLE scoped_events RENAME TO scoped_events_list;
ALTER INDEX scoped_events_type_idx RENAME TO scoped_events_type_idx_list ;
ALTER INDEX scoped_events_ordering_idx RENAME TO scoped_events_ordering_idx_list ;

-- Renaming scoped_states table and its indices
ALTER TABLE scoped_states RENAME TO scoped_states_list;
ALTER INDEX scoped_states_type_idx RENAME TO scoped_states_type_idx_list ;
ALTER INDEX scoped_states_ordering_idx RENAME TO scoped_states_ordering_idx_list ;
ALTER INDEX project_uuid_idx RENAME TO project_uuid_idx_list ;
ALTER INDEX state_value_types_idx RENAME TO state_value_types_idx_list ;
```
* Run the following script to create the new tables:

```sql
-- scoped_events
CREATE TABLE IF NOT EXISTS public.scoped_events(
ordering bigint NOT NULL DEFAULT nextval('event_offset'),
type text NOT NULL,
org text NOT NULL,
project text NOT NULL,
id text NOT NULL,
rev integer NOT NULL,
value JSONB NOT NULL,
instant timestamptz NOT NULL,
PRIMARY KEY(org, project, id, rev)
) PARTITION BY HASH (org, project);

CREATE INDEX IF NOT EXISTS scoped_events_type_idx ON public.scoped_events(type);
CREATE INDEX IF NOT EXISTS scoped_events_ordering_idx ON public.scoped_events (ordering);

CREATE TABLE public.scoped_states_0000 PARTITION OF public.scoped_states FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE public.scoped_states_0001 PARTITION OF public.scoped_states FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- scoped_states
CREATE TABLE IF NOT EXISTS public.scoped_states(
ordering bigint NOT NULL DEFAULT nextval('state_offset'),
type text NOT NULL,
org text NOT NULL,
project text NOT NULL,
id text NOT NULL,
tag text NOT NULL,
rev integer NOT NULL,
value JSONB NOT NULL,
deprecated boolean NOT NULL,
instant timestamptz NOT NULL,
PRIMARY KEY(org, project, tag, id)
) PARTITION BY HASH (org, project);

CREATE INDEX IF NOT EXISTS scoped_states_type_idx ON public.scoped_states(type);
CREATE INDEX IF NOT EXISTS scoped_states_ordering_idx ON public.scoped_states (ordering);
CREATE INDEX IF NOT EXISTS project_uuid_idx ON public.scoped_states((value->>'uuid')) WHERE type = 'project';
CREATE INDEX IF NOT EXISTS state_value_types_idx ON public.scoped_states USING GIN ((value->'types'));
```
* Start Nexus with a configuration with the hash partition strategy without making it available to users.

The modulo value should remain low(at most a few hundreds) and should be set to fit your number of projects and resources

```hocon
app {
database {
partition-strategy {
type = hash
# Adapt the modulo value to the number of partitions you expect
modulo = 5
}
}
}
```
* Stop Nexus again once this log appears:

```
2025-03-05 14:05:25 INFO c.e.b.n.d.s.p.DatabasePartitioner - The partition strategy has not been set yet, initializing...
```
* The following queries should return the different partitions

```sql
SELECT table_name
FROM information_schema.tables WHERE table_name LIKE 'scoped_events_0%';
```

```sql
SELECT table_name
FROM information_schema.tables WHERE table_name LIKE 'scoped_states_0%';
```
* Running a query against the table partition_config should also return a row matching your configuration:

```sql
SELECT * FROM partition_config;
```
* Run the following scripts to copy data to the new tables

```sql
INSERT INTO scoped_events (SELECT * FROM scoped_events_list);
INSERT INTO scoped_states (SELECT * FROM scoped_states_list);
```
* Run a count against the old tables and the new ones to check they have the same number of rows

```sql
SELECT count(*) from scoped_events_list;
SELECT count(*) from scoped_events;
SELECT count(*) from scoped_states_list;
SELECT count(*) from scoped_states;
```
* Delete the previous tables:

```sql
DROP TABLE IF EXISTS public.scoped_events_list;
DROP TABLE IF EXISTS public.scoped_states_list;
```
* Start Delta again
12 changes: 11 additions & 1 deletion docs/src/main/paradox/docs/releases/v1.12-release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,4 +19,14 @@ This change allow Nexus to provide a full support of JSON-LD 1.1 including featu

### Default indexing

The default Elasticsearch view now points to a single index to avoid the oversharding problems when a Nexus instance contains a lot of projects.
The default Elasticsearch view now points to a single index to avoid the oversharding problems when a Nexus instance contains a lot of projects.

### Support of hash partitioning in PostgreSQL

Since 1.8 and the introduction of PostgreSQL as the primary store, Nexus only permitted list partitioning where each project was assigned
to a PostgreSQL partition.

While this approach stands for deployments where there are few projects which can be quite large, it is not adapted to the scenario where there are
a lot of small projects. The new hash strategy allows to address this case.

@ref:[See database config for more details](../running-nexus/configuration/index.md#postgres-configuration)
15 changes: 15 additions & 0 deletions docs/src/main/paradox/docs/running-nexus/configuration/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ The configuration flag `akka.http.server.parsing.max-content-length` can be used

## Postgres configuration

### Database pool
@link:[The `database` section](https://github.com/BlueBrain/nexus/blob/$git.branch$/delta/app/src/main/resources/app.conf#L23){ open=new } of the configuration defines the postgres specific configuration. As Nexus Delta uses three separate pools ('read', 'write', 'streaming'), it is recommended to set the host, port, database name, username, and password via the `app.defaults.database` field, as it will apply to all pools. It is however possible to accommodate more advanced setups by configuring each pool separately by changing its respective `app.database.{read|write|streaming}` fields.

The pool size can be set using the `app.defaults.database.access.pool-size` setting for all pools, or individually for each pool (`app.database.{read|write|streaming}.access.pool-size`).
Expand All @@ -32,8 +33,22 @@ A default Postgres deployment will limit the number of connections to 100, unles

@@@

### Partitioning

Nexus supports currently two types of partitioning list and hash.

This can be set via the `app.database.partition-strategy`.

Note that it is an essential setting which can not be changed automatically once it is set, so configure it carefully according to your needs.

More information about partitioning is available @ref:[here](../postgresql.md).

### Init scripts

Before running Nexus Delta, the @link:[init scripts](https://github.com/BlueBrain/nexus/tree/$git.branch$/delta/sourcing-psql/src/main/resources/scripts/postgres/init){ open=new } should be run in the lexicographical order.

The scripts in the subdirectory matching your partitioning strategy should be run first, followed by those in the common folder.

It is possible to let Nexus Delta automatically create them using the following configuration parameters: `app.database.tables-autocreate=true`.

@@@ note { .warning }
Expand Down
111 changes: 70 additions & 41 deletions docs/src/main/paradox/docs/running-nexus/postgresql.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,76 @@ Since this is the _primary store_ it is the most important system to be
that Nexus uses in other stores can be recomputed from the one stored in PostgreSQL as the other stores are used as
mere indexing systems.

### PostgreSQL partitioning

Nexus Delta takes advantage of PostgreSQL's @link:[Table Partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) feature. This allows for improved query performance, and facilitates loading, deleting, or transferring data.

It currently supports 2 partition strategies offered by PostgreSQL:

* List partitioning
* Hash partitioning

@@@ note { .warning }

List partitioning is the default strategy.

Please be careful when configuring partitioning as Delta does not allow out-of-the box to adopt another one.
@@@

#### List partitioning

**This strategy is best when you have a small number of projects which can contain a lot of resources.**

The `public.scoped_events` and `public.scoped_states` are partitioned by organization, which is itself partitioned by the projects it contains; this follows the natural hierarchy that can be found in Nexus Delta.

Nexus Delta takes care of handling the creation and deletion of the partitions.

* If the created project is the first one of a given organization, both the organization partition and the project subpartition will be created.
* If the organization partition already exist, then only the project subpartition will be created upon project creation.

The naming scheme of the (sub)partitions is as follows:

`{table_name}_{MD5_org_hash}` for organization partitions

`{table_name}_{MD5_project_hash}` for project partition

where

* `{table_name}` is either `scoped_events` or `scoped_states`
* `{MD5_org_hash}` is the MD5 hash of the organization name
* `{MD5_project_hash}` is the MD5 hash of the project reference (i.e. has the form `{org_name}/{project_name}`)

MD5 hashing is used in order to guarantee a constant partition name length (PostgreSQL table names are limited to 63 character by default), as well as to avoid any special characters that might be allowed in project names but not in PostgreSQL table names (such as `-`).

Example:

You create the organization called `myorg`, inside of which you create the `myproject` project. When the project is created, Nexus Delta will have created the following partitions:

* `scoped_events_B665280652D01C4679777AFD9861170C`, the partition of events from the `myorg` organization
* `scoped_events_7922DA7049D5E38C83053EE145B27596`, the subpartition of the events from the `myorg/myproject` project
* `scoped_states_B665280652D01C4679777AFD9861170C`, the partition of states from the `myorg` organization
* `scoped_states_7922DA7049D5E38C83053EE145B27596`, the subpartition of the states from the `myorg/myproject` project

#### Hash partitioning

Since 1.12, Nexus allows hash partitioning. This strategy is best when you have a lot of projects containing a small amount of resources.

The `public.scoped_events` and `public.scoped_states` are partitioned with a fixed number of partitions where all events and states from a given project
will end up in the same partition.

Nexus takes care of creating the different partitions the first time it starts with this configuration.

The naming scheme of the partitions is as follows:

- `{table_name}_{(%04d)}` (ex: `scoped_events_0001`)

#### Advanced subpartitioning

While Nexus Delta provides table partitioning out-of-the-box, it is primarily addressing the case where the data is more or less uniformly spread out across multiple projects. If however there is one or more project that are very large,
it is possible to add further subpartitions according to a custom rule. This custom subpartitioning must be decided on a case-by-case basis using your knowledge of the given project; the idea is to create uniform partitions of your project.

Please refer to the @link:[PostgreSQL Table Partitioning documentation](https://www.postgresql.org/docs/current/ddl-partitioning.html).

## Tables

### Initializing the schema
Expand Down Expand Up @@ -90,47 +160,6 @@ Stores the last_update and last_ordering properties for the different projects.

This table allows to power the passivation strategy for the different views.

### PostgreSQL partitioning

Nexus Delta takes advantage of PostgreSQL's @link:[Table Partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) feature. This allows for improved query performance, and facilitates loading, deleting, or transferring data.

The `public.scoped_events` and `public.scoped_states` are partitioned by organization, which is itself partitioned by the projects it contains; this follows the natural hierarchy that can be found in Nexus Delta.

Nexus Delta takes care of handling the creation and deletion of the partitions.

* If the created project is the first one of a given organization, both the organization partition and the project subpartition will be created.
* If the organization partition already exist, then only the project subpartition will be created upon project creation.

The naming scheme of the (sub)partitions is as follows:

`{table_name}_{MD5_org_hash}` for organization partitions

`{table_name}_{MD5_project_hash}` for project partition

where

* `{table_name}` is either `scoped_events` or `scoped_states`
* `{MD5_org_hash}` is the MD5 hash of the organization name
* `{MD5_project_hash}` is the MD5 hash of the project reference (i.e. has the form `{org_name}/{project_name}`)

MD5 hashing is used in order to guarantee a constant partition name length (PostgreSQL table names are limited to 63 character by default), as well as to avoid any special characters that might be allowed in project names but not in PostgreSQL table names (such as `-`).

Example:

You create the organization called `myorg`, inside of which you create the `myproject` project. When the project is created, Nexus Delta will have created the following partitions:

* `scoped_events_B665280652D01C4679777AFD9861170C`, the partition of events from the `myorg` organization
* `scoped_events_7922DA7049D5E38C83053EE145B27596`, the subpartition of the events from the `myorg/myproject` project
* `scoped_states_B665280652D01C4679777AFD9861170C`, the partition of states from the `myorg` organization
* `scoped_states_7922DA7049D5E38C83053EE145B27596`, the subpartition of the states from the `myorg/myproject` project

#### Advanced subpartitioning

While Nexus Delta provides table partitioning out-of-the-box, it is primarily addressing the case where the data is more or less uniformly spread out across multiple projects. If however there is one or more project that are very large,
it is possible to add further subpartitions according to a custom rule. This custom subpartitioning must be decided on a case-by-case basis using your knowledge of the given project; the idea is to create uniform partitions of your project.

Please refer to the @link:[PostgreSQL Table Partitioning documentation](https://www.postgresql.org/docs/current/ddl-partitioning.html).

## Running and monitoring

**CPU:**
Expand Down

0 comments on commit 134382b

Please sign in to comment.