Support having one json-file per content item #223

Merged
merged 3 commits into from
Oct 11, 2023

pbauer
Member

@pbauer pbauer commented Oct 7, 2023

No description provided.

@fredvd
Member

fredvd commented Oct 7, 2023

I chatted with @pbauer, @davisagli and maybe @flipmcf about this during PloneConf 2023. The problem here is one of the core problems in computer science: 'naming things'.

The goal of storing each item in its own file is that:

  • diffs can be handled by git/GitHub
  • you could patch an individual item's JSON or binary file contents, for example in a content export for plone.distribution demo data
  • for some purposes you don't need to find an editor that can handle the 100+ MB of data that larger exports produce.

I have only a budding love/hate relationship with generative LLMs and ChatGPT, but it has already helped me clear my mind a few times. Check this prompt and response:


I want to export a hierarchical database with nodes that have parent/children relations to a classical filesystem. The nodes should be individual files. What is a correct naming scheme or algorithm for the storage of the individual files so that the order is preserved for existing items, but new items can also be added or items can be deleted? All of this without renaming files or folder IDs on the filesystem.


To achieve a hierarchical database representation as individual files in a classical filesystem with the ability to preserve order, add new items, and delete items without renaming files or folder IDs, you can consider the following naming scheme:

  1. Unique Identifiers: Each node (item) in your hierarchical database should have a unique identifier. This identifier could be generated based on a UUID (Universally Unique Identifier) or some other scheme that ensures uniqueness. This identifier will not change even when items are added or deleted.

  2. Hierarchy in File Names: To preserve the hierarchical structure, you can include the parent-child relationships in the file names. For example, you can use a path-like structure where each file name contains the unique identifier of the node and its parent node.

    Example:

    • Root Node: root-uuid.txt
    • Child Node 1 of Root: root-uuid/child1-uuid.txt
    • Child Node 2 of Root: root-uuid/child2-uuid.txt
    • Sub-Child Node 1 of Child 1: root-uuid/child1-uuid/subchild1-uuid.txt
  3. Ordering: To preserve order, you can include a numeric or timestamp-based prefix or suffix in the file names. This way, the order of items can be maintained without changing the filenames when adding new items.

    Example:

    • Root Node with Order 1: 01-root-uuid.txt
    • Child Node 1 with Order 2: 02-root-uuid/child1-uuid.txt
    • Child Node 2 with Order 3: 03-root-uuid/child2-uuid.txt
  4. Deletion: When an item is deleted, you can mark it as deleted by adding a special prefix or suffix to the file name, e.g., deleted-. This way, you don't have to rename or remove the file immediately, and you can still keep track of deleted items.

    Example:

    • Deleted Child Node 1: deleted-02-root-uuid/child1-uuid.txt

This naming scheme allows you to preserve the hierarchical structure, maintain order, add new items without renaming existing ones, and keep track of deleted items in your classical filesystem-based representation of the hierarchical database.


So, with the added complexity that an export would need to support reading an existing hierarchical export in order to 'patch' the existing content tree, we would need some fancy numbering: when you insert an item in between positions 1 and 2, its position would become 1.1 or 1.1.1 (but GitHub's diffing algorithms could handle this correctly). We also have folderish items, where you'd create a file for a folderish page that could later become a hierarchy once content is added to it.

Is using folders and a naming scheme like objposid_[optional_extenstion_pos]-uuid.json, plus an accompanying folder with the identical name minus .json as soon as the item contains content, a naming scheme that would give us all of these features?
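To make the ordering question concrete, here is a minimal sketch of how such position prefixes could be sorted. It assumes one possible reading of the scheme, `<pos>[_<ext>]-<uuid>.json`, where the optional extension is a dotted run of integers; the function name `sort_key` and the sample file names are hypothetical and not part of this PR.

```python
def sort_key(filename: str) -> tuple:
    """Return a tuple that sorts files by their position prefix.

    Assumed layout (hypothetical): '<pos>[_<ext>]-<uuid>.json',
    for example '1_1.1-ccc.json'. Everything after the first '-' is ignored.
    """
    prefix, _, _rest = filename.partition("-")
    pos, _, ext = prefix.partition("_")
    parts = [int(pos)]
    if ext:
        # '1.1' becomes two extra sort levels: (1, 1)
        parts.extend(int(p) for p in ext.split("."))
    return tuple(parts)


names = ["2-ddd.json", "1_1.1-ccc.json", "1-aaa.json", "1_1-bbb.json"]
ordered = sorted(names, key=sort_key)
```

Parsing the prefix into integers avoids the plain string sort problem where `10-...` would land before `2-...`. The same key works for the accompanying folder name, since only the part before the first `-` is inspected.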

@rber474
Member

rber474 commented Oct 7, 2023

Yesterday I prototyped a hierarchical export/import for content types.
I haven't submitted any changes to the fork yet, but it's working.
I hadn't thought about order position, but will take note of that. All .json files go into their proper "folder" and the import reads the files using iglob, so they are imported by folder position.
I will post some pics and expand on this. Any feedback would be great!
PS: about names, I used type_uid.json to make it easier to identify elements.
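For readers who want to picture the iglob-based import, the following is a rough sketch only, not the prototype itself. It assumes the export directory mirrors the content tree; the helper name `iter_items` is made up for illustration. Because `glob.iglob` does not guarantee any particular order, the sketch sorts the paths by depth first so that a container's file is always read before the files of its children.

```python
import glob
import json
import os


def iter_items(export_dir):
    """Yield (path, data) for every .json file below export_dir.

    Hypothetical helper: paths are sorted by depth, then alphabetically,
    so parents are yielded before their children.
    """
    pattern = os.path.join(export_dir, "**", "*.json")
    paths = glob.iglob(pattern, recursive=True)
    for path in sorted(paths, key=lambda p: (p.count(os.sep), p)):
        with open(path, encoding="utf-8") as f:
            yield path, json.load(f)
```

Sorting materializes the full list of paths, which gives up the laziness of `iglob` but keeps the import order deterministic across filesystems.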

@pbauer pbauer requested a review from ericof October 8, 2023 08:43
@pbauer
Member Author

pbauer commented Oct 8, 2023

At the moment the export creates a structure like this:

└── Plone
    ├── 1.json
    ├── 2.json
    ├── 3.json
    [...]
    └── 543600.json

The import also works with the structure created by collective.jsonify:

└── Plone
    ├── 0
    │  ├── 1.json
    │  ├── [...]
    │  └── 999.json
    ├── 1
    │  ├── 1000.json
    │  ├── 1001.json
    │  ├── [...]
    [...]

The order of items in containers and the tree structure of the portal are not covered here.
Items are exported ordered by path, the same as in the default export (one large JSON file), so that items are only imported after their parents are already there.
The sorting of content in containers is deferred to a separate export and import.
I'm happy to adapt the export and import to use different filenames.
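As an illustration of the two layouts shown above, here is a minimal sketch of writing one numbered file per item and reading the files back in numeric order. It is not the code from this PR: `export_items` and `import_items` are hypothetical names, and the sketch assumes the items are already sorted by path before export.

```python
import json
from pathlib import Path


def export_items(items, target_dir):
    """Write each item to '<counter>.json', counting from 1 (hypothetical helper)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for number, item in enumerate(items, start=1):
        text = json.dumps(item, indent=4, sort_keys=True)
        (target / f"{number}.json").write_text(text, encoding="utf-8")


def import_items(source_dir):
    """Yield items in numeric filename order.

    rglob also descends into subfolders, so both the flat layout and a
    layout with numbered subfolders are read the same way.
    """
    files = Path(source_dir).rglob("*.json")
    for path in sorted(files, key=lambda p: int(p.stem)):
        yield json.loads(path.read_text(encoding="utf-8"))
```

Sorting on `int(path.stem)` rather than on the file name keeps `10.json` after `9.json`, which preserves the parents-before-children order of the export.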

Member

@ericof ericof left a comment

LGTM
