
HDDS-11784. Allow aborting FSO multipart uploads with missing parent directories #7700

Draft
wants to merge 4 commits into master

Conversation

@sokui (Contributor) commented Jan 14, 2025

What changes were proposed in this pull request?

HDDS-11784. Handle missing parent directories for MPU abort and expired MPU abort requests

We observed a large number of open keys (files) in our FSO-enabled Ozone cluster, all of them incomplete MPU keys.

When I tried to abort an MPU using the S3 CLI as shown below, I got an exception complaining that the parent directory was not found.

aws s3api abort-multipart-upload --endpoint 'xxxx' --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.

Exception in the log:

NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
... 9 more

This issue is caused by missing parent directories. The cause and the solution are explained here: #7566 (comment).

There is another PR, now closed, where we discussed which approach to take. Please see #7566.

I think the original intention is to ensure that the MPU abort succeeds regardless of whether the parent directory exists. One way to do this: first fetch the OmMultipartKeyInfo from the multipartInfoTable, since that lookup does not require any parentId information. Then use OmMultipartKeyInfo#getParentID to derive the MPU open key, without calling OMFileRequest#getParentID. That way, we do not need to create the missing parent directories.
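A method-level sketch of that lookup order (names and signatures here are assumptions based on this discussion, not the final implementation; OMMetadataManager#getMultipartKey and getMultipartInfoTable follow the existing OM metadata APIs, and the returned key layout is shown schematically):

    // Sketch only: resolve the FSO multipart open key without walking the
    // directory tree, so abort works even when parents were deleted.
    static String getMultipartOpenKeyFSO(String volume, String bucket,
        String key, String uploadId, OMMetadataManager metadataManager)
        throws IOException {
      // 1. Look up the MPU metadata by its DB key. Unlike
      //    OMFileRequest#getParentID, this does not require the parent
      //    directory chain to still exist.
      String multipartKey =
          metadataManager.getMultipartKey(volume, bucket, key, uploadId);
      OmMultipartKeyInfo multipartKeyInfo =
          metadataManager.getMultipartInfoTable().get(multipartKey);
      if (multipartKeyInfo == null) {
        // The upload may already be aborted or completed; the caller should
        // map this to NO_SUCH_MULTIPART_UPLOAD_ERROR rather than risk an NPE.
        return null;
      }
      // 2. Reuse the parent id recorded at initiate time to build the FSO
      //    open key (layout shown schematically; the real layout comes from
      //    the OMMetadataManager helpers).
      long parentId = multipartKeyInfo.getParentID();
      String fileName = OzoneFSUtils.getFileName(key);
      return parentId + "/" + fileName + "/" + uploadId;
    }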

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11784

How was this patch tested?

It is tested by the CI. We also validated it in our cluster.

@adoroszlai added the s3 (S3 Gateway) and om labels on Jan 15, 2025
@ivandika3 (Contributor) left a comment

@sokui Thanks for the patch.

Please check the test failures.

getMultipartKeyFSO is used by a lot of MPU flows, and with this change multipartInfoTable would be accessed twice. I think we can create another function similar to getMultipartKeyFSO that takes the OmMultipartKeyInfo, and use it only for the abort case. This means switching the order in S3MultipartAbortRequest: first read from multipartInfoTable, then from openFileTable. All other implementations are welcome.

Also let's add a simple test as suggested in #7566 (comment).

For example, there is a directory "/a" and a pending MPU key with path "/a/mpu_key" that has been initiated but not yet completed or aborted. After the MPU key is initiated, the directory "/a" is deleted; since mpu_key has not been completed yet and does not exist in fileTable, DIRECTORY_NOT_EMPTY will not be thrown in OMKeyDeleteRequestWithFSO#validateAndUpdateCache. Therefore mpu_key is orphaned, and when it is completed or aborted it will fail in OMFileRequest#getParentID, since the parent directory has been deleted.
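That scenario might look roughly like this as an integration test (initiateMultipartUpload, deleteDirectory, and abortMultipartUpload follow the OzoneBucket client API; the bucket wiring and test name are assumptions, not the final test):

    // Illustrative outline of the orphaned-MPU scenario; not the final test.
    @Test
    public void testAbortMPUSucceedsAfterParentDirectoryDeleted()
        throws Exception {
      String keyName = "a/mpu_key";

      // Initiating the MPU creates the missing parent directory "a" in FSO.
      OmMultipartInfo mpuInfo = bucket.initiateMultipartUpload(keyName);

      // Delete the parent directory. Because "a/mpu_key" lives only in the
      // open file table, DIRECTORY_NOT_EMPTY is not raised and the pending
      // MPU key is orphaned.
      bucket.deleteDirectory("a", true);

      // Before this patch the abort fails with NO_SUCH_MULTIPART_UPLOAD_ERROR
      // (caused by DIRECTORY_NOT_FOUND); with the patch it should succeed.
      bucket.abortMultipartUpload(keyName, mpuInfo.getUploadID());
    }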

@ivandika3 changed the title from "HDDS-11784 get the parent id for MPU even it is missing parent direct…" to "HDDS-11784 get the parent id for MPU even it is missing parent directies for MPU abort and expired abort request" on Jan 15, 2025
@ivandika3 changed the title from "HDDS-11784 get the parent id for MPU even it is missing parent directies for MPU abort and expired abort request" to "HDDS-11784. Allow aborting FSO multipart uploads with missing parent directories" on Jan 15, 2025
@adoroszlai marked this pull request as draft on January 15, 2025, 13:08
@sokui (Contributor, Author) commented Jan 15, 2025

> @sokui Thanks for the patch.
>
> Please check the test failures.
>
> getMultipartKeyFSO is used by a lot of MPU flows, and with this change multipartInfoTable would be accessed twice. I think we can create another function similar to getMultipartKeyFSO that takes the OmMultipartKeyInfo, and use it only for the abort case. This means switching the order in S3MultipartAbortRequest: first read from multipartInfoTable, then from openFileTable. All other implementations are welcome.
>
> Also let's add a simple test as suggested in #7566 (comment).
>
> For example, there is a directory "/a" and a pending MPU key with path "/a/mpu_key" that has been initiated but not yet completed or aborted. After the MPU key is initiated, the directory "/a" is deleted; since mpu_key has not been completed yet and does not exist in fileTable, DIRECTORY_NOT_EMPTY will not be thrown in OMKeyDeleteRequestWithFSO#validateAndUpdateCache. Therefore mpu_key is orphaned, and when it is completed or aborted it will fail in OMFileRequest#getParentID, since the parent directory has been deleted.

My consideration is that org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils#getMultipartOpenKey is used in multiple places, including S3ExpiredMultipartUploadsAbortRequest and S3MultipartUploadAbortRequest. By updating this one place, both callers benefit (neither works currently). Secondly, if my implementation of getMultipartKeyFSO is more reliable, there is no reason to limit that benefit to S3MultipartUploadAbortRequest; all the other places should use it as well.

I originally implemented it exactly the way you describe, but for the reasons above I changed it to modify getMultipartKeyFSO directly. Please let me know if that makes sense.

For the interface, I noticed OmMultipartKeyInfo is already available in S3MultipartUploadAbortRequest. Let me take a look at how to make the interface better so that it can be used in all places.

For the tests, I will take a look at the failures. For the new test you suggested, do you know if there is an existing similar test that I can reference? I am not super familiar with the Ozone code base, so if there is no such code, could you please show me a snippet I can start with? Really appreciate it!

@ivandika3 (Contributor) left a comment

> My consideration is that org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils#getMultipartOpenKey is used in multiple places, including S3ExpiredMultipartUploadsAbortRequest and S3MultipartUploadAbortRequest. By updating this one place, both callers benefit (neither works currently). Secondly, if my implementation of getMultipartKeyFSO is more reliable, there is no reason to limit that benefit to S3MultipartUploadAbortRequest; all the other places should use it as well.

My worry is that there might be places where OMMultipartUploadUtils#getMultipartKeyFSO is expected to access the open key/file table, or places where the multipartInfoTable entry does not exist yet, which could result in an NPE that crashes the OM (we should handle the possible NPE). However, I'm OK with it as long as there are no test regressions.

> For the tests, I will take a look at the failures. For the new test you suggested, do you know if there is an existing similar test that I can reference? I am not super familiar with the Ozone code base, so if there is no such code, could you please show me a snippet I can start with? Really appreciate it!

You can start with the TestOzoneClientMultipartUploadWithFSO integration test.

Comment on lines +903 to +904
final OmMultipartKeyInfo multipartKeyInfo =
getMultipartInfoTable().get(nonFSOMultipartKey);
@ivandika3 (Contributor) commented Jan 17, 2025

Please handle a null OmMultipartKeyInfo when the entry does not exist in multipartInfoTable, to prevent an NPE that might crash all OMs, since RuntimeException is not caught in validateAndUpdateCache.
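For example, a minimal guard along these lines (the message text is illustrative; the result code mirrors the error already surfaced by the abort path in the stack trace above):

    final OmMultipartKeyInfo multipartKeyInfo =
        getMultipartInfoTable().get(nonFSOMultipartKey);
    // Sketch of the suggested null check: surface the S3-visible error
    // instead of letting a NullPointerException escape validateAndUpdateCache.
    if (multipartKeyInfo == null) {
      throw new OMException("No such multipart upload: " + nonFSOMultipartKey,
          OMException.ResultCodes.NO_SUCH_MULTIPART_UPLOAD_ERROR);
    }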

@sokui (Contributor, Author):

Sure

Comment on lines +901 to +902
final String nonFSOMultipartKey =
getMultipartKey(volume, bucket, key, uploadId);
@ivandika3 (Contributor):

Let's rename nonFSOMultipartKey to multipartKey; the name nonFSOMultipartKey is quite confusing since we are dealing with an FSO MPU key.

@sokui (Contributor, Author):

TBH, I think multipartKey is ambiguous: a multipartKey could be an FSO multipartKey or a non-FSO multipartKey. Since we are dealing with FSO here, I intentionally named it nonFSOMultipartKey so that the reader understands this value should not be returned directly.

If you think multipartKey conventionally means the non-FSO multipartKey, I can rename it. Please let me know. Thanks.

@sokui (Contributor, Author) commented Jan 17, 2025

>> My consideration is that org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils#getMultipartOpenKey is used in multiple places, including S3ExpiredMultipartUploadsAbortRequest and S3MultipartUploadAbortRequest. By updating this one place, both callers benefit (neither works currently). Secondly, if my implementation of getMultipartKeyFSO is more reliable, there is no reason to limit that benefit to S3MultipartUploadAbortRequest; all the other places should use it as well.

> My worry is that there might be places where OMMultipartUploadUtils#getMultipartKeyFSO is expected to access the open key/file table, or places where the multipartInfoTable entry does not exist yet, which could result in an NPE that crashes the OM (we should handle the possible NPE). However, I'm OK with it as long as there are no test regressions.

>> For the tests, I will take a look at the failures. For the new test you suggested, do you know if there is an existing similar test that I can reference? I am not super familiar with the Ozone code base, so if there is no such code, could you please show me a snippet I can start with? Really appreciate it!

> You can start with the TestOzoneClientMultipartUploadWithFSO integration test.

I started with the following test code, but it seems I cannot delete the directory.

    String parentDir = "a/b";
    keyName = parentDir + UUID.randomUUID();

    OzoneManager ozoneManager = cluster.getOzoneManager();
    String buckKey = ozoneManager.getMetadataManager()
        .getBucketKey(volume.getName(), bucket.getName());
    OmBucketInfo buckInfo =
        ozoneManager.getMetadataManager().getBucketTable().get(buckKey);
    BucketLayout bucketLayout = buckInfo.getBucketLayout();

    String uploadID = initiateMultipartUploadWithAsserts(bucket, keyName, RATIS,
        ONE);
    bucket.deleteDirectory(parentDir, false);

It gave me the following error:

KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to get file status: volume: 8f75be75-539f-4fb1-8cc0-8c1123f1710f bucket: 2ce72254-3315-4662-95ae-cd09132ef932 key: a/b

	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:763)
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.deleteKey(OzoneManagerProtocolClientSideTranslatorPB.java:962)
	at org.apache.hadoop.ozone.client.rpc.RpcClient.deleteKey(RpcClient.java:1631)
	at org.apache.hadoop.ozone.client.OzoneBucket.deleteDirectory(OzoneBucket.java:689)
	at org.apache.hadoop.ozone.client.rpc.TestOzoneClientMultipartUploadWithFSO.testAbortUploadSuccessWithMissingParentDirectories(TestOzoneClientMultipartUploadWithFSO.java:648)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)

I checked the bucket.deleteDirectory() method. It looks like this:

public void deleteDirectory(String key, boolean recursive)
      throws IOException {
    proxy.deleteKey(volumeName, name, key, recursive);
  }

Just wondering: is it deleting a key or a directory? And if it is a directory, why did I get the above exception?
