Commit ac280bd

v0.4.0 (#150)
* Act and Edit are no-ops on inactive maps (#155)
* resolves #145
* Improvements to the map.stderr/stdout API (#149)
* Add htmap-exec Docker image and change default image to it (#153)
* move test infrastructure into tests dir
* add htmap-exec image
* updates docs
* Transferring Arbitrary Output Files (#151)
1 parent 1af6a5d commit ac280bd

34 files changed, +1001 -582 lines changed

.travis.yml

+2-2
@@ -15,7 +15,7 @@ matrix:
   fast_finish: true

 install:
-  - travis_retry docker build -t htmap-test --file docker/Dockerfile --build-arg HTCONDOR_VERSION --build-arg PYTHON_VERSION=$TRAVIS_PYTHON_VERSION .
+  - docker build -t htmap-test --file tests/_inf/Dockerfile --build-arg HTCONDOR_VERSION --build-arg PYTHON_VERSION=$TRAVIS_PYTHON_VERSION .

 script:
-  - travis_retry docker run htmap-test tests/travis.sh
+  - docker run htmap-test tests/_inf/travis.sh

binder/.htmaprc

+3
@@ -1 +1,4 @@
 DELIVERY_METHOD = "assume"
+
+[MAP_OPTIONS]
+REQUEST_DISK = "100MB"

docs/source/api.rst

+15
@@ -78,6 +78,14 @@ See :ref:`error_handling` for more details on error handling.
 .. autoclass:: htmap.ComponentStatus
    :members:

+.. autoclass:: htmap.MapStdOut
+   :members: get
+
+.. autoclass:: htmap.MapStdErr
+   :members: get
+
+.. autoclass:: htmap.MapOutputFiles
+   :members: get

 .. _error_handling:

@@ -146,6 +154,13 @@ Input File Transfer

 .. autoclass:: htmap.TransferPath

+
+Output File Transfer
+--------------------
+
+.. autofunction:: htmap.transfer_output_files
+
+
 Checkpointing
 -------------

docs/source/conf.py

+1-1
@@ -26,7 +26,7 @@
 author = 'HTCondor Team'

 # The short X.Y version
-version = htmap.__version__[:5]
+version = htmap.__version__
 # The full version, including alpha/beta/rc tags
 release = htmap.__version__
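
A note on the ``docs/source/conf.py`` change above: the commit does not state why the ``[:5]`` slice was dropped. One plausible reason, sketched here purely as an assumption, is that a fixed-width slice only yields a sensible version while the version string happens to be five characters long. The helper names and sample version strings are hypothetical and are not part of HTMap.

```python
def short_version_by_slice(version: str) -> str:
    """Mimic the removed expression: keep the first five characters."""
    return version[:5]


def short_version_by_parts(version: str, parts: int = 2) -> str:
    """Hypothetical alternative: keep the first `parts` dot-separated fields."""
    return ".".join(version.split(".")[:parts])


if __name__ == "__main__":
    # Illustrative version strings only; they are not taken from the commit.
    for v in ("0.4.0", "0.10.1", "0.4.0rc1"):
        print(v, repr(short_version_by_slice(v)), repr(short_version_by_parts(v)))
```

With a two-digit minor version such as ``0.10.1``, the slice cuts the string mid-field, whereas using the full string (as the commit does) avoids the problem entirely.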

docs/source/dependencies.rst

+8-7
@@ -19,7 +19,7 @@ it's all of the execute nodes in the pool that your might map components might b
 Submit-side dependency management can be handled using standard Python package management tools.
 We recommend using ``miniconda`` as your package manager (https://docs.conda.io/en/latest/miniconda.html).

-HTMap itself requires that execute-side can run a Python script using a Python install that has the module ``cloudpickle`` installed.
+HTMap itself requires that execute-side can run a Python script using a Python install that also has ``htmap`` installed.
 That Python installation also needs whatever other packages your code needs to run.
 For example, if you ``import numpy`` in your code, you need to have ``numpy`` installed execute-side.

@@ -33,7 +33,8 @@ The built-in delivery methods are

 More details on each of these methods can be found below.

-The default delivery method is ``docker``, with image ``continuumio/anaconda3:latest``.
+The default delivery method is ``docker``, with the default image ``htcondor/htmap-exec:<version>``,
+where version will match the version of HTMap you are using submit-side.
 If your pool can run Docker jobs and your Python code does not depend on any custom packages
 (i.e., you never import any modules that you wrote yourself),
 this default behavior will likely work for you without requiring any changes.
@@ -73,8 +74,8 @@ At runtime:
     htmap.settings['DOCKER.IMAGE'] = "<repository>/<image>:<tag>"

 In this mode, HTMap will run inside a Docker image that you provide.
-Remember that this Docker image needs to have the ``cloudpickle`` module installed.
-The default Docker image is `continuumio/anaconda3:latest <https://hub.docker.com/r/continuumio/anaconda3/>`_,
+Remember that this Docker image needs to have the ``htmap`` module installed.
+The default Docker image is `htcondor/htmap-exec <https://hub.docker.com/r/htcondor/htmap-exec/>`_,
 which is based on Python 3 and has many useful packages pre-installed.

 If you want to use your own Docker image, just change the ``'DOCKER.IMAGE'`` setting.
@@ -83,11 +84,11 @@ For example, a very simple Dockerfile that can be used with HTMap is

 .. code-block:: docker

-    FROM python:latest
+    FROM python:3

-    RUN pip install --no-cache-dir cloudpickle
+    RUN pip install --no-cache-dir htmap

-This would create a Docker image with the latest version of Python and ``cloudpickle`` installed.
+This would create a Docker image with the latest versions of Python 3 and ``htmap`` installed.
 From here you could install more Python dependencies, or add more layers to account for other dependencies.

 .. attention::
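
The ``dependencies.rst`` change above says the default image is ``htcondor/htmap-exec:<version>``, with the tag matching the submit-side HTMap version. A minimal sketch of that naming rule follows; the helper ``default_exec_image`` is hypothetical, and how HTMap actually builds the image name internally is not shown in this diff.

```python
def default_exec_image(htmap_version: str, repository: str = "htcondor/htmap-exec") -> str:
    """Hypothetical helper: combine a repository and a version into '<repository>:<tag>'."""
    return f"{repository}:{htmap_version}"


if __name__ == "__main__":
    # "0.4.0" is the release named in this commit's title.
    print(default_exec_image("0.4.0"))
```

The result has the same ``<repository>/<image>:<tag>`` shape that the docs pass to the ``'DOCKER.IMAGE'`` setting.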

docs/source/index.rst

+3-3
@@ -34,15 +34,15 @@ Happy mapping!
 :doc:`dependencies`
     Information about how to manage your what your code depends on (e.g., other Python packages).

+:doc:`recipes`
+    Deeper dives on specific tasks.
+
 :doc:`api`
     Public API documentation.

 :doc:`settings`
     Documentation for the various settings.

-:doc:`recipes`
-    Deeper dives on specific, common tasks.
-
 :doc:`tips-and-tricks`
     Useful code snippets, tips, and tricks.

docs/source/recipes.rst

+5-1
@@ -5,7 +5,10 @@ Recipes

 :doc:`recipes/docker-image-cookbook`
     How to build HTMap-compatible Docker images.
-    Yes, this recipe is an entire cookbook!
+    Yes, this single recipe is an entire cookbook!
+
+:doc:`recipes/output-files`
+    How to move arbitrary files back to the submit node.

 :doc:`recipes/wrapping-external-programs`
     How to send input and output to an external (i.e., non-Python) program from inside a mapped function.
@@ -19,5 +22,6 @@ Recipes
    :hidden:

    recipes/docker-image-cookbook
+   recipes/output-files
    recipes/wrapping-external-programs
    recipes/checkpointing-maps

docs/source/recipes/checkpointing-maps.rst

-4
@@ -3,10 +3,6 @@
 Checkpointing Maps
 ------------------

-.. attention::
-
-    To use this feature, HTMap itself must be installed in your execute environment (not just ``cloudpickle``).
-
 When running on opportunistic resources, HTCondor might "evict" your map components from the execute locations.
 Evicted components return to the queue and, without your intervention, restart from scratch.
 However, HTMap can preserve files across an eviction and make them available in the next run.

docs/source/recipes/docker-image-cookbook.rst

+66-19
@@ -14,15 +14,20 @@ are installed on the computers your code actually runs on.

 To use Docker, you write a **Dockerfile** which tells Docker how to generate an **image**,
 which is a blueprint to construct a **container**.
-The Dockerfile is a list of instructions, such as shell commands or instructions for Docker to copy files from the build environment into the image.
+The Dockerfile is a list of instructions, such as shell commands or instructions
+for Docker to copy files from the build environment into the image.
 You then tell Docker to "build" the image from the Dockerfile.

-For use with HTMap, you then upload this image to `Docker Hub <https://hub.docker.com>`_, where it can then be downloaded to execute nodes in an HTCondor pool.
-When your HTMap component lands on an execute node, HTCondor will download your image from Docker Hub and run your code inside it using HTMap.
+For use with HTMap, you then upload this image to `Docker Hub <https://hub.docker.com>`_,
+where it can then be downloaded to execute nodes in an HTCondor pool.
+When your HTMap component lands on an execute node, HTCondor will download your
+image from Docker Hub and run your code inside it using HTMap.

-The following sections describe, roughly in order of increasing complexity, different ways to build Docker images for use with HTMap.
+The following sections describe, roughly in order of increasing complexity,
+different ways to build Docker images for use with HTMap.
 Each level of complexity is introduced to solve a more advanced dependency management problem.
-We recommend reading them in order until reach one that works for your dependencies (each section assumes knowledge of the previous sections).
+We recommend reading them in order until reach one that works for your dependencies
+(each section assumes knowledge of the previous sections).

 More detailed information on how Dockerfiles work can be found
 `in the Docker documentation itself <https://docs.docker.com/engine/reference/builder/>`_
@@ -37,10 +42,11 @@ This page only covers the bare minimum to get started with HTMap and Docker.
 Can I use HTMap's default image?
 --------------------------------

-HTMap's default Docker image is `continuumio/anaconda3:latest <https://hub.docker.com/r/continuumio/anaconda3/>`_.
+HTMap's default Docker image is `htcondor/htmap-exec <https://hub.docker.com/r/htcondor/htmap-exec/>`_,
+which is itself based on`continuumio/anaconda3 <https://hub.docker.com/r/continuumio/anaconda3/>`_.
 It is based on Python 3 and has many useful packages pre-installed, such as ``numpy``, ``scipy``, and ``pandas``.
-If your software only depends on packages included in the `Anaconda distribution <https://docs.anaconda.com/anaconda/packages/pkg-docs/>`_ by default,
-you can use HTMap's default and won't need to create your own image.
+If your software only depends on packages included in the `Anaconda distribution <https://docs.anaconda.com/anaconda/packages/pkg-docs/>`_,
+you can use HTMap's default image and won't need to create your own.


 I depend on Python packages that aren't in the Anaconda distribution
@@ -52,13 +58,14 @@ I depend on Python packages that aren't in the Anaconda distribution
 and `make an account on Docker Hub <https://hub.docker.com/>`_.


-Let's pretend that there's a package called ``foobar`` that your Python code depends on, but isn't part of the Anaconda distribution.
+Let's pretend that there's a package called ``foobar`` that your Python function depends on,
+but isn't part of the Anaconda distribution.
 You will need to write your own Dockerfile to include this package in your Docker image.

 Docker images are built in **layers**.
 You always start a Dockerfile by stating which existing Docker image you'd like to use as your base layer.
 A good choice is the same Anaconda image that HTMap uses as the default,
-which comes with both the ``conda`` package manager and the standard ``pip``.
+which comes with both the ``conda`` package manager and the standard ``pip``.
 Create a file named ``Dockerfile`` and write this into it:

 .. code-block:: docker
@@ -67,18 +74,41 @@ Create a file named ``Dockerfile`` and write this into it:

     FROM continuumio/anaconda3:latest

-Lines that begin with a ``#`` are comments in a Dockerfile.
+    RUN pip install --no-cache-dir htmap
+
+    ARG USER=htmap
+    RUN groupadd ${USER} \
+        && useradd -m -g ${USER} ${USER}
+    USER ${USER}

 Each line in the Dockerfile starts with a short, capitalized word which tells Docker what kind of build instruction it is.
-``FROM`` means "start with this base image".
-Now we need to tell Docker to run a shell command during the build to install ``foobar``.
+
+* ``FROM`` means "start with this base image".
+* ``RUN`` means "execute these shell commands in the container".
+* ``ARG`` means "set build argument" - it acts like an environment variable that's only set during the image build.
+
+Lines that begin with a ``#`` are comments in a Dockerfile.
+The above lines say that we want to inherit from the image ``continuumio/anaconda3:latest`` and build on top of it.
+To be compatible with HTMap, we install ``htmap`` via ``pip``.
+We also set up a non-root user to do the execution, which is important for security.
+Naming that user ``htmap`` is arbitrary and has nothing to do with the ``htmap`` package itself.
+
+Now we need to tell Docker to run a shell command during the build to install ``foobar``
+by adding one more line to the bottom of the Dockerfile.

 .. code-block:: docker

     # Dockerfile

     FROM continuumio/anaconda3:latest

+    RUN pip install --no-cache-dir htmap
+
+    ARG USER=htmap
+    RUN groupadd ${USER} \
+        && useradd -m -g ${USER} ${USER}
+    USER ${USER}
+
     # if foobar can be install via conda, use these lines
     RUN conda install -y foobar \
         && conda clean -y --all
@@ -101,6 +131,13 @@ If you need install many packages, we recommend writing a ``requirements.txt`` f

     FROM continuumio/anaconda3:latest

+    RUN pip install --no-cache-dir htmap
+
+    ARG USER=htmap
+    RUN groupadd ${USER} \
+        && useradd -m -g ${USER} ${USER}
+    USER ${USER}
+
     COPY requirements.txt requirements.txt
     RUN pip install --no-cache-dir -r requirements.txt

@@ -153,10 +190,13 @@ Instead of using the full Anaconda distribution, use a base Docker image that on

     FROM continuumio/miniconda3:latest

-    RUN conda install -y cloudpickle \
-        && conda clean -y -all
+    RUN pip install --no-cache-dir htmap
+
+    ARG USER=htmap
+    RUN groupadd ${USER} \
+        && useradd -m -g ${USER} ${USER}
+    USER ${USER}

-Note that we need to install ``cloudpickle``, which HTMap depends on execute-side, ourselves.
 From here, install your particular dependencies as above.

 If you prefer to not use ``conda``, an even-barer-bones image could be produced from
@@ -167,8 +207,14 @@ If you prefer to not use ``conda``, an even-barer-bones image could be produced

     FROM python:latest

-    RUN pip install --no-cache-dir cloudpickle
+    RUN pip install --no-cache-dir htmap
+
+    ARG USER=htmap
+    RUN groupadd ${USER} \
+        && useradd -m -g ${USER} ${USER}
+    USER ${USER}

+We use ``python:latest`` as our base image, so we don't have ``conda`` anymore.

 I want to use a Python package that's not on PyPI or Anaconda
 -------------------------------------------------------------
@@ -225,8 +271,9 @@ We recommend adding ``miniconda`` to the image by adding these lines to your Doc
         && conda install python=${PYTHON_VERSION} \
         && conda clean -y -all

-After this, you can install any other Python packages you need as in the preceeding sections.
+After this, you can install HTMap and any other Python packages you need as in the preceeding sections.

 Note that in this example we based the image on Ubuntu's base image and installed ``wget``,
 which we used to download the ``miniconda`` installer.
-Depending on your base image, you may need to use a different package manager (for example, ``yum``) or different command-line file download tool (for example, ``curl``).
+Depending on your base image, you may need to use a different package manager
+(for example, ``yum``) or different command-line file download tool (for example, ``curl``).

docs/source/recipes/output-files.rst

+90
@@ -0,0 +1,90 @@
+.. py:currentmodule:: htmap
+
+Output Files
+------------
+
+If the "output" of your map function is a file, HTMap's
+basic functionality will not be sufficient for you.
+As a toy example, consider a function which takes a string and a number, and
+writes out a file containing that string repeated that number of times, with
+a space between each repetition.
+The file itself will be the output artifact of our function.
+
+.. code-block:: python
+
+    import htmap
+
+    import itertools
+    from pathlib import Path
+
+    @htmap.mapped
+    def repeat(string, number):
+        output_path = Path('repeated.txt')
+
+        with output_path.open(mode = 'w') as f:
+            f.write(' '.join(itertools.repeat(string, number)))
+
+This would work great locally, producing a file named ``repeated.txt`` in
+the directory we ran the code from.
+If this same code runs execute-side, the file will still be produced, but
+HTMap won't know that we care about the file.
+In fact, the map will appear to be spectacularly useless:
+
+.. code-block:: python
+
+    with repeat.build_map() as mb:
+        mb('foo', 5)
+        mb('wiz', 3)
+        mb('bam', 2)
+
+    repeated = mb.map
+
+    print(list(repeated))
+    # [None, None, None]
+
+A function with no ``return`` statement implicitly returns ``None``.
+There's no sign of our output file.
+
+We need to tell HTMap that we are producing an output file.
+We can do this by adding a call to an HTMap hook function in our mapped function:
+
+.. code-block:: python
+
+    import htmap
+
+    import itertools
+    from pathlib import Path
+
+    @htmap.mapped
+    def repeat(string, number):
+        output_path = Path('repeated.txt')
+
+        with output_path.open(mode = 'w') as f:
+            f.write(' '.join(itertools.repeat(string, number)))
+
+        htmap.transfer_output_files(output_path)  # identical, except for this line
+
+The :func:`htmap.transfer_output_files` function tells HTMap to move the files
+at the given paths back for us.
+We can then access those files using the :attr:`Map.output_files` attribute,
+which behaves like a sequence indexed by component numbers.
+The elements of the sequence are :class:`pathlib.Path` pointing to the
+directories containing the output files from each component, like so:
+
+.. code-block:: python
+
+    with repeat.build_map() as mb:
+        mb('foo', 5)
+        mb('wiz', 3)
+        mb('bam', 2)
+
+    repeated = mb.map
+
+    for component, base in enumerate(repeated.output_files):
+        path = base / 'repeated.txt'
+        print(component, path.read_text())
+
+    # 0 foo foo foo foo foo
+    # 1 wiz wiz wiz
+    # 2 bam bam
+
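
The new ``output-files.rst`` recipe describes one output directory per component, each holding that component's ``repeated.txt``. The sketch below imitates that layout in plain Python with no HTMap involved, so the file-writing logic can be checked locally. The names ``repeat_to_file`` and ``run_locally`` and the use of a temporary directory are assumptions made for illustration; they are not part of HTMap's API and say nothing about where HTMap really stores transferred files.

```python
import itertools
import tempfile
from pathlib import Path


def repeat_to_file(string: str, number: int, directory: Path) -> Path:
    """Write `string` repeated `number` times, space-separated, to repeated.txt."""
    output_path = directory / "repeated.txt"
    output_path.write_text(" ".join(itertools.repeat(string, number)))
    return output_path


def run_locally(inputs):
    """Stand-in for a map: one directory per component, then read each file back."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for component, (string, number) in enumerate(inputs):
            base = Path(tmp) / str(component)  # plays the role of one output directory
            base.mkdir()
            repeat_to_file(string, number, base)
            results.append((base / "repeated.txt").read_text())
    return results


if __name__ == "__main__":
    for component, text in enumerate(run_locally([("foo", 5), ("wiz", 3), ("bam", 2)])):
        print(component, text)
```

Running the sketch with the same three inputs as the recipe prints the same three lines as the recipe's expected output.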
