Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TRUNK-6300 Add Storage Service #4920

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rkorytkowski
Copy link
Member

@rkorytkowski rkorytkowski commented Feb 12, 2025

Description of what I changed

Issue I worked on

see https://issues.openmrs.org/browse/TRUNK-6300

Checklist: I completed these to help reviewers :)

  • My IDE is configured to follow the code style of this project.

    No? Unsure? -> configure your IDE, format the code and add the changes with git add . && git commit --amend

  • I have added tests to cover my changes. (If you refactored
    existing code that was well tested you do not have to add tests)

    No? -> write tests and add them to this commit git add . && git commit --amend

  • I ran mvn clean package right before creating this pull request and
    added all formatting changes to my commit.

    No? -> execute above command

  • All new and existing tests passed.

    No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

* <p>
* Data objects are stored under generated or provided unique keys. Data and keys are immutable.
*
* @since 2.8.0, 2.7.2, 2.6.16, 2.5.15
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we backport the interface and implementation to all versions used by o3, o2, bahmni, ozone... It is to be able to easily add support in modules. We will only modify the master branch to use the new service in api. Does this list of versions cover what is used by implementations these days?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for backporting this? I kind of imagine that we'll need to re-write stuff in various modules and that's going to be a lot simpler to just say "this works with 2.8.0" rather than "this works with 2.8.0+, 2.7.3+, 2.6.16+, and 2.5.15+".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that if you want to correct your module to use storage service and it's only available in 2.8.0 then your module is incompatible with all older versions. It's not something module authors wanted to do historically. If we backport it to older branches then modules only need to require new maintenance branches of core, but do not have to suddenly drop support for core older than 2.8.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not something module authors wanted to do historically

If that's the problem we're trying to solve, we'll need to backport to a lot further. While I don't know of any major deployments running against 2.3.x or lower, for example, very few modules are written that target anything newer than that and many of them target substantially older versions. E.g., attachments targets 2.2.0, module-spa targets 2.4.1, reporting targets 1.9.9.

I think the best path here is to add this to 2.8.0 and let module authors figure out how they want to handle the migration, e.g., force the module to only be compatible with 2.8.0 or add their own abstraction layer that handles both.

@rkorytkowski
Copy link
Member Author

Please do not merge until I refactor complex obs to use this service in https://openmrs.atlassian.net/jira/software/c/projects/TRUNK/issues/TRUNK-6304

@rkorytkowski rkorytkowski marked this pull request as draft February 12, 2025 09:39
@rkorytkowski
Copy link
Member Author

Also I need to add key encoding so it doesn't include special characters. It might be needed to add mimetype support as well as it's used in https://github.com/openmrs/openmrs-core/blob/master/api/src/main/java/org/openmrs/obs/handler/AbstractHandler.java#L117

* @return stream of keys
* @throws IOException IO error
*/
Stream<String> getKeys(String moduleId, String prefix) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i see! 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would a key be a folder name? For instance lucene as we currently have in the application data directory?

Copy link
Member Author

@rkorytkowski rkorytkowski Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key points to an actual file not a directory, but a key can include a directory e.g. lucene/data.dat, lucene/data0.dat, etc.

If you want to search for all files in lucene you would call the method with "lucene/". If you want to list all files starting with data in lucene dir you would do "lucene/data".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also give an example of how the prefix would differ from the key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I call it prefix, because it matches any keys starting with the prefix e.g. having 'abc/ab', 'abc/abc', 'ab', 'abc/abcd', 'abc/ac' and the prefix 'abc/' we would get back keys 'abc/ab', 'abc/abc', 'abc/abcd' but not 'ab' and 'abc/ac'.

Copy link
Member

@ibacher ibacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rkorytkowski! A few questions / comments.

* <p>
* Data objects are stored under generated or provided unique keys. Data and keys are immutable.
*
* @since 2.8.0, 2.7.2, 2.6.16, 2.5.15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for backporting this? I kind of imagine that we'll need to re-write stuff in various modules and that's going to be a lot simpler to just say "this works with 2.8.0" rather than "this works with 2.8.0+, 2.7.3+, 2.6.16+, and 2.5.15+".


private final String storageDir;

private final DateFormat dateFormat = new SimpleDateFormat("yyyy/MM-dd/HH-mm/yyyy-MM-dd-HH-mm-ss-SSS-");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any new code, we should probably use DateTimeFormatter (SimpleDateFormat is rather infamously not thread-safe).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!


public Path getPath(final String key) {
Path path = Paths.get(key);
if (path.isAbsolute()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of this service, it would seem safer to just enforce the rule that everything is interpreted relative to the storageDir.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to support old storage location and new storage location as well as temp files. I'll add more clarification around that.

private Path newKey(String moduleId, String keySuffix) {
if (keySuffix == null) {
Date date = new Date();
keySuffix = dateFormat.format(date) + UUID.randomUUID().toString().substring(0, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use Common Lang's RandomStringUtils#nextAlphanumeric(8)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.


@Override
public String saveTempData(InputStream inputStream) throws IOException {
Path tempFile = Files.createTempFile("openmrs-temp", ".tmp");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of my comment about not allowing absolute paths:

Suggested change
Path tempFile = Files.createTempFile("openmrs-temp", ".tmp");
Path tempFile = Files.createTempFile(Paths.get(storageDir, ".tmp"), "openmrs-temp", ".tmp");


/**
* Saves the given InputStream as a temporary file. Temporary files are kept, if possible, in local
* storage for fast IO. They are available only for the duration of a request and for the single
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the "request" that this is tied to?

Copy link
Member Author

@rkorytkowski rkorytkowski Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not tied to any request in particular. It's just mind setting. Even in replication you can always expect that a request runs in a single VM and has access to local temp files.

dirs.add(moduleId);
storagePath = Paths.get(storageDir, "modules", moduleId);
} else {
storagePath = Paths.get(storageDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If moduleId is null, should we filter the modules directory from the returned results? Or else are we creating the contract that every storage service should have a top-level key called modules that stores the module-specific data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll slightly change this in the next revision I push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants