-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel upload issue with put for data objects with an apostrophe in the path #703
Comments
Can you please share any error output from the client, and, if possible, any relevant server logs? Is the data object stuck in the intermediate state forever, or does it eventually get marked as stale? If the object status never gets updated, that means something serious happened in the server and we should file an issue here: https://github.com/irods/irods/issues. If the iRODS agent eventually times out the connection and marks the data object as stale, then the problem is a bit more contained. |
FWIW I tried and succeeded right away in reproducing with v3.0.0 .... The issue does seem limited to paths containing an apostrophe, but I can't find anything the PRC is doing that is suspicious; the line given in the data dump is simply feeding the logical path as-is to another data object open API call, in an auxiliary thread for the parallel transfer.
|
@alanking When we open(DATA_OBJ_OPEN_AN) with REPLICA_TOKEN_KW , are we hitting actually a different code path, one that targets replicas? Wonder if the Genquery called in that case is more fragile than the one generally used by a data open and therefore reacts to the apostrophe? |
There are two API calls which are used for opening a replica: DataObjOpen and replica_open. For multi-1247 parallel transfer, replica_open is used for at least opening the secondary streams, to my knowledge.
It's possible that some of the underlying mechanics in replica_open differ from DataObjOpen, but, ultimately, replica_open is actually calling DataObjOpen under the covers. It's possible that some preliminary query runs before that call. I'm noticing that the path in your example has an apostrophe in the data object name whereas the path in OP indicates that there is an apostrophe in one of the parent collections of the data object (in addition to or instead of, I do not know). Not sure whether that matters, but may wish to try that scenario as well. Anyway, I think this is yet another manifestation of the GenQuery1 parser: irods/irods#3902. I wonder whether this problem exists in tip of main for the server... |
@alanking It seems to me the replica_open API would (or should) have enough information to go on with just the replica ID itself. Why would it need the path? But a DataObjInp* type is also needed for that API, so ... hmm. |
rc_replica_open just optimizes network usage by handling the calls to rsDataObjOpen and rs_get_file_descriptor_info for you. It has the same interface as rcDataObjOpen - i.e. it requires a logical path. |
Forgot to mention the only interface difference between rc_replica_open and rcDataObjOpen is that rc_replica_open returns the l1 descriptor info for parallel transfer purposes. |
This is the error output from the client as requested. Same as d-w-moore the object gets marked as stale. ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[10], line 5
3 path = "/ghum/home/datateam_ghum/reproduce_error/collection1/joachim's collection"
4 coll = session.collections.get(path)
----> 5 session.data_objects.put("files/dummyfile_33mb.txt", path)
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/manager/data_object_manager.py:330, in DataObjectManager.put(self, local_path, irods_path, return_data_object, num_threads, updatables, **options)
328 o = deferred_call(self.open, (obj, "w"), options)
329 f.close()
--> 330 if not self.parallel_put(
331 local_path,
332 (obj, o),
333 total_bytes=sizelist[0],
334 num_threads=num_threads,
335 target_resource_name=options.get(kw.RESC_NAME_KW, "")
336 or options.get(kw.DEST_RESC_NAME_KW, ""),
337 open_options=options,
338 updatables=updatables,
339 ):
340 raise RuntimeError("parallel put failed")
341 else:
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/manager/data_object_manager.py:449, in DataObjectManager.parallel_put(self, file_, data_or_path_, async_, total_bytes, num_threads, target_resource_name, open_options, updatables, progressQueue)
431 def parallel_put(
432 self,
433 file_,
(...)
441 progressQueue=False,
442 ):
443 """Call into the irods.parallel library for multi-1247 PUT.
444
445 Called from a session.data_objects.put(...) on the condition that the
446 data object is determined to be of appropriate size for parallel upload.
447
448 """
--> 449 return parallel.io_main(
450 self.sess,
451 data_or_path_,
452 parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0),
453 file_,
454 num_threads=num_threads,
455 total_bytes=total_bytes,
456 target_resource_name=target_resource_name,
457 open_options=open_options,
458 queueLength=(DEFAULT_QUEUE_DEPTH if progressQueue else 0),
459 updatables=updatables,
460 )
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/parallel.py:543, in io_main(session, Data, opr_, fname, R, **kwopt)
540 queueLength = kwopt.get("queueLength", 0)
542 pass_thru_options = ("updatables", "queueLength")
--> 543 retval = _io_multipart_threaded(
544 Operation,
545 (Data, Io),
546 replica_token,
547 resc_hier,
548 session,
549 fname,
550 total_bytes,
551 num_threads=num_threads,
552 **{k: v for k, v in kwopt.items() if k in pass_thru_options}
553 )
555 # SessionObject.data_objects.parallel_{put,get} will return:
556 # - immediately with an AsyncNotify instance, if Oper.NONBLOCKING flag is used.
557 # - upon completion with a boolean completion status, otherwise.
559 if Operation.isNonBlocking():
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/parallel.py:409, in _io_multipart_threaded(operation_, dataObj_and_IO, replica_token, hier_str, session, fname, total_size, num_threads, **extra_options)
407 for byte_range in ranges:
408 if Io is None:
--> 409 Io = session.data_objects.open(
410 Data_object.path,
411 Operation.data_object_mode(initial_open=False),
412 create=False,
413 finalize_on_close=False,
414 allow_redirect=False,
415 **{
416 kw.NUM_THREADS_KW: str(num_threads),
417 kw.DATA_SIZE_KW: str(total_size),
418 kw.RESC_HIER_STR_KW: hier_str,
419 kw.REPLICA_TOKEN_KW: replica_token,
420 }
421 )
422 mgr.add_io(Io)
423 logger.debug("target_host = %s", Io.raw.session.pool.account.host)
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/manager/data_object_manager.py:623, in DataObjectManager.open(self, path, mode, create, finalize_on_close, auto_close, returned_values, allow_redirect, **options)
619 message = iRODSMessage(
620 "RODS_API_REQ", msg=message_body, int_info=api_number["DATA_OBJ_OPEN_AN"]
621 )
622 conn.send(message)
--> 623 desc = conn.recv().int_info
625 raw = iRODSDataObjectFileRaw(
626 conn, desc, finalize_on_close=finalize_on_close, **options
627 )
628 raw.session = directed_sess
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/connection.py:179, in Connection.recv(self, into_buffer, return_message, acceptable_errors)
177 err_msg = None
178 if nominal_code(msg.int_info) not in acceptable_codes:
--> 179 exc = get_exception_by_code(msg.int_info, err_msg)
180 exc.server_msg = msg
181 raise exc
File ~/tests/python-irodsclient/venv/lib/python3.10/site-packages/irods/exception.py:214, in get_exception_by_code(code, message)
213 def get_exception_by_code(code, message=None):
--> 214 exc_class = iRODSExceptionMeta.codes[rounded_code(code)]
215 exc_instance = exc_class(message)
216 exc_instance.code = code
KeyError: -405000 |
Follow-up to my colleague Joachim's input – additional details:
iRODS Logs:
|
It looks like this is an issue in the server side instead of a pure PRC issue. Here is our trial result with the gocommand:
|
When uploading data to irods using data_objects.put() I get the following error
irods error: -405000 INTERMEDIATE_REPLICA_ACCESS
This happens when the file is larger than 32mb and there is an apostrophe in the path (eg. "zone/home/project/collections/user's collection"). The data object is created as a broken 0 bytes file. When I try to delete this file I get the following error:
irods error: -1803000 HIERARCHY_ERROR
If the parameter for data_objects.put() num_threads is set to 1 the upload is successful so it is most likely related to the threads.
The text was updated successfully, but these errors were encountered: