Singularity Revision #147

Closed
wants to merge 8 commits into from
3 changes: 3 additions & 0 deletions docker/.htmaprc
@@ -1 +1,4 @@
 DELIVERY_METHOD = "assume"
+
+[MAP_OPTIONS]
+REQUEST_DISK = "100MB"
12 changes: 6 additions & 6 deletions docker/condor_config.local
@@ -14,12 +14,12 @@ EXECUTE=$(LOCAL_DIR)/execute
 CRED_STORE_DIR=$(LOCAL_DIR)/cred_dir

 # Tuning so jobs start quickly
-SCHEDD_INTERVAL=5
-NEGOTIATOR_INTERVAL=2
-NEGOTIATOR_CYCLE_DELAY=5
-STARTER_UPDATE_INTERVAL=5
-SHADOW_QUEUE_UPDATE_INTERVAL=10
-UPDATE_INTERVAL=5
+SCHEDD_INTERVAL=1
+NEGOTIATOR_INTERVAL=1
+NEGOTIATOR_CYCLE_DELAY=1
+STARTER_UPDATE_INTERVAL=1
+SHADOW_QUEUE_UPDATE_INTERVAL=1
+UPDATE_INTERVAL=1
 RUNBENCHMARKS=0

 # Don't use all the machine resources
6 changes: 0 additions & 6 deletions docs/source/dependencies.rst
@@ -141,12 +141,6 @@ If you want to use your own Singularity image, just change the ``'SINGULARITY.IM
 When using this delivery method, HTMap will discover ``python3`` on the system ``PATH`` and use that to run your code.


-.. warning::
-
-    This delivery method relies on the directory ``/htmap/scratch`` either existing in the Singularity image, or Singularity being able to run with ``overlayfs``.
-    If you get a ``stderr`` message from Singularity about a bind mount directory not existing, that's the problem.
-
-
 Assume Dependencies are Present
 -------------------------------

26 changes: 26 additions & 0 deletions docs/source/versions/v0_3_2.rst
@@ -0,0 +1,26 @@
+v0.3.2
+======
+
+New Features
+------------
+
+* Singularity delivery no longer requires a specially-named directory in the
+  container and/or overlays.
+
+Bug Fixes
+---------
+
+* Hopefully finally resolved a recurring issue with checkpoint directories being
+  returned to the submit node after execution errors.
+  Issue: https://github.com/htcondor/htmap/issues/128
+
+Known Issues
+------------
+
+* Execution errors that result in the job being terminated but no output being
+  produced are still not handled entirely gracefully. Right now, the component
+  state will just show as ``ERRORED``, but there won't be an actual error report.
+* Map component state may become corrupted when a map is manually vacated.
+  Force-removal may be needed to clean up maps if HTCondor and HTMap disagree
+  about the state of their components.
+  Issue: https://github.com/htcondor/htmap/issues/129
2 changes: 1 addition & 1 deletion htmap/__init__.py
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-__version__ = '0.3.1'
+__version__ = '0.3.2'

 from typing import Tuple as _Tuple
 import logging as _logging
7 changes: 4 additions & 3 deletions htmap/run/run.py
@@ -179,8 +179,9 @@ def load_checkpoint(scratch_dir, transfer_dir):
         old_dir.rename(transfer_dir / curr_dir.name)


-def clean_and_remake_dir(dir):
-    shutil.rmtree(dir, ignore_errors = True)
+def clean_and_remake_dir(dir: Path):
+    if dir.exists():
+        shutil.rmtree(dir)
     dir.mkdir()


@@ -217,7 +218,6 @@ def main(component):
         result_or_error = func(*args, **kwargs)
         status = 'OK'
         print('\n----- MAP COMPONENT OUTPUT END -----\n')
-
     except Exception as e:
         print('\n------- MAP COMPONENT ERROR --------\n')

@@ -235,6 +235,7 @@ def main(component):
         )
         status = 'ERR'

+    clean_and_remake_dir(scratch_dir / CHECKPOINT_CURRENT)
     clean_and_remake_dir(transfer_dir)
     save_output(component, status, result_or_error, transfer_dir)

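Reviewer note: the revised `clean_and_remake_dir` drops `ignore_errors = True`, so a failed removal now raises instead of being silently skipped, while a missing directory is still handled by the `exists()` check. A minimal Python sketch of that behaviour against a temporary directory follows; the directory name `transfer` and the file `stale.txt` are made up for illustration and are not taken from HTMap.

```python
import shutil
import tempfile
from pathlib import Path


def clean_and_remake_dir(dir: Path) -> None:
    # Same logic as the revised helper in this diff: only remove the
    # directory if it is present, and let any removal error propagate.
    if dir.exists():
        shutil.rmtree(dir)
    dir.mkdir()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "transfer"  # hypothetical directory name

    # First call: the directory does not exist yet, so it is only created.
    clean_and_remake_dir(target)
    created_on_first_call = target.is_dir()

    # Leave a stale file behind, then call again: the directory is emptied.
    (target / "stale.txt").write_text("old output")
    clean_and_remake_dir(target)
    remaining = sorted(p.name for p in target.iterdir())

print(created_on_first_call, remaining)
```

The trade-off is that a removal problem on the execute node now surfaces as an exception in `run.py` rather than leaving stale files in place.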
8 changes: 5 additions & 3 deletions htmap/run/run_with_singularity.sh
@@ -5,7 +5,9 @@ set -e
 img=$1
 component=$2

-# would otherwise default to user home dir
-export SINGULARITY_CACHEDIR=${_CONDOR_SCRATCH_DIR}
+# singularity cachedir would otherwise default to user home dir
+d=${_CONDOR_SCRATCH_DIR}/.htmap_singularity
+mkdir ${d}
+export SINGULARITY_CACHEDIR=${d}

-singularity exec --bind ${_CONDOR_SCRATCH_DIR}:/htmap/scratch ${img} bash -c "cd /htmap/scratch && python3 run.py ${component}"
+singularity exec --contain --bind ${_CONDOR_SCRATCH_DIR}:/tmp --workdir /tmp ${img} bash -c "python3 run.py ${component}"
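Reviewer note: to make the new invocation easier to read piece by piece, here is a small Python sketch that assembles the same argument list. `build_singularity_command` is a hypothetical helper written for this note, not part of HTMap; the flags and the `/tmp` mount point are copied from the script in this diff, and the image name and scratch path in the usage lines are placeholders.

```python
import shlex
from typing import List


def build_singularity_command(image: str, component: str, scratch_dir: str) -> List[str]:
    """Assemble the revised `singularity exec` call as an argument list."""
    return [
        "singularity", "exec",
        "--contain",                      # flag added in this diff
        "--bind", f"{scratch_dir}:/tmp",  # scratch dir is now mounted at /tmp, not /htmap/scratch
        "--workdir", "/tmp",              # option added in this diff
        image,
        "bash", "-c", f"python3 run.py {component}",
    ]


# Placeholder values, for illustration only.
command = build_singularity_command("example.sif", "0", "/path/to/condor/scratch")
print(shlex.join(command))
```

Because the bind target is now `/tmp`, the image no longer has to ship a `/htmap/scratch` directory, which matches the release note in this PR.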
6 changes: 3 additions & 3 deletions tests/cli/test_rerun.py
@@ -20,7 +20,7 @@

 def test_rerun_map(cli):
     m = htmap.map(str, range(1))
-    m.wait()
+    m.wait(180)

     result = cli(['rerun', 'map', m.tag])
     m.wait(180)
@@ -30,7 +30,7 @@ def test_rerun_map(cli):

 def test_rerun_components(cli):
     m = htmap.map(str, [0, 1])
-    m.wait()
+    m.wait(180)

     result = cli(['rerun', 'components', m.tag, '0 1'])
     m.wait(180)
@@ -41,7 +41,7 @@ def test_rerun_components(cli):

 def test_rerun_components_out_range_cannot_rerun(cli):
     m = htmap.map(str, [0])
-    m.wait()
+    m.wait(180)

     result = cli(['rerun', 'components', m.tag, '5'])

2 changes: 1 addition & 1 deletion tests/integration/test_held_components.py
@@ -27,7 +27,7 @@ def test_waiting_on_held_component_raises(mapped_doubler):
     time.sleep(1) # wait for it to propagate

     with pytest.raises(htmap.exceptions.MapComponentHeld):
-        m.wait()
+        m.wait(timeout = 180)


 def test_getting_held_component_raises(mapped_doubler):
2 changes: 1 addition & 1 deletion tests/integration/test_usage_tracking.py
@@ -24,7 +24,7 @@ def test_memory_usage_is_nonzero_after_map_complete():
     # need it run for at least 5 seconds for it generate an image size event
     m = htmap.map(lambda x: time.sleep(10), [None])

-    m.wait()
+    m.wait(timeout = 180)
     print(m.memory_usage)

     assert all(x > 0 for x in m.memory_usage)
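
Reviewer note: the test changes in this PR all replace an unbounded `m.wait()` with a bounded one (`m.wait(180)` or `m.wait(timeout = 180)`), presumably so that a stuck map fails the test instead of hanging the suite. The general pattern can be sketched in plain Python as follows; `wait_until` is a hypothetical helper written for this note and is not how HTMap implements `wait`.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, poll_interval: float = 0.01) -> None:
    """Poll `predicate` until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# A condition that becomes true shortly after starting: returns normally.
start = time.monotonic()
wait_until(lambda: time.monotonic() - start > 0.05, timeout=1.0)

# A condition that never becomes true: the bounded wait gives up and raises.
try:
    wait_until(lambda: False, timeout=0.05)
except TimeoutError:
    timed_out = True
else:
    timed_out = False

print(timed_out)
```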