Remove --jobinfo, standalone config.yml file #90

Merged
merged 2 commits into from
Apr 25, 2024
73 changes: 8 additions & 65 deletions README.md
@@ -26,76 +26,19 @@ pip install git+https://github.com/GreenScheduler/cats

## Documentation

Full documentation is available at [greenscheduler.github.io/cats/](https://greenscheduler.github.io/cats/). The below sections
demonstrate some capability, for illustration, but please consult
the documentation for more details.
Documentation is available at [greenscheduler.github.io/cats/](https://greenscheduler.github.io/cats/).

#### Basic example
We recommend the
[quickstart](https://greenscheduler.github.io/cats/quickstart.html#basic-usage)
if you are new to CATS. CATS can optionally [display carbon footprint
savings](https://greenscheduler.github.io/cats/quickstart.html#displaying-carbon-footprint-estimates)
using a [configuration file](cats/config.yml).

You can run `cats` with:

```bash
cats -d <job_duration> --loc <postcode>
```

The postcode is optional, and can be pulled from the `config.yml` file or, if that is not present, inferred using the server IP address. Job duration is in minutes, specified as an integer.

The scheduler then calls a function that estimates the best time to start the job given predicted carbon intensity over the next 48 hours. The workflow is the same as for other popular schedulers. Switching to `cats` should be transparent to cluster users.

By default, the optimal time to start the job is shown in a human readable format. This information can be output in a machine readable format by passing `--format=json`. The date format in the machine readable output can be controlled using `--dateformat` which accepts a [strftime(3)](https://manpages.debian.org/stable/manpages-dev/strftime.3.en.html) format date.


#### Use with schedulers

You can use CATS with, for example, the ``at`` job scheduler by running:

```bash
cats -d 5 --loc OX1 --scheduler at --command 'ls'
```
This schedules a command (`ls`) that has an expected runtime less than 5 minutes using the at scheduler.

#### Console demonstration
### Console demonstration
CATS predicting optimal start time for the `ls` command in the `OX1` postcode:

![CATS animated usage example](cats.gif)

#### Displaying carbon footprint estimates

`cats` is able to provide an estimate for the carbon footprint
reduction resulting from delaying your job. To enable the footprint
estimation, you must provide information about the machine in the form
of a YAML configuration file. An example is given below:

```yaml
location: "EH8"
api: "carbonintensity.org.uk"
PUE: 1.20 # > 1
partitions:
  CPU_partition:
    type: CPU # CPU or GPU
    model: "Xeon Gold 6142"
    TDP: 9.4 # Thermal Design Power in W/core
  GPU_partition:
    type: GPU
    model: "NVIDIA A100-SXM-80GB GPUs"
    TDP: 300
    CPU_model: "AMD EPYC 7763"
    TDP_CPU: 4.4
```

Use the `--config` option to specify a path to the configuration
file. If no path is specified, `cats` looks for a file named
`config.yml` in the current directory.

Additionally, to obtain carbon footprints, job-specific information
must be provided to `cats` through the `--jobinfo` option. The
example below demonstrates running `cats` with footprint estimation
for a job using 8GB of memory, 2 CPU cores and no GPU:

```bash
cats -d 120 --config .config/config.yml \
--jobinfo cpus=2,gpus=0,memory=8,partition=CPU_partition
```

## Contributing

We welcome contributions from the community! If you find a bug or have an idea for a new feature, please open an issue on our GitHub repository or submit a pull request.
51 changes: 37 additions & 14 deletions cats/__init__.py
@@ -3,8 +3,9 @@
import logging
import subprocess
import sys
from argparse import ArgumentParser
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from datetime import timedelta
from pathlib import Path
from typing import Optional

from .carbonFootprint import Estimates, get_footprint_reduction_estimate
@@ -19,6 +20,10 @@
SCHEDULER_DATE_FORMAT = {"at": "%Y%m%d%H%M"}
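
For readers of this hunk: the `at` entry in `SCHEDULER_DATE_FORMAT` is a `strftime` pattern. Below is a minimal sketch of how a start time renders with it. The `start` value is made up for illustration, and how CATS hands the result to `at` is not visible in this diff.

```python
from datetime import datetime

# Copied from the hunk above.
SCHEDULER_DATE_FORMAT = {"at": "%Y%m%d%H%M"}

# Hypothetical start time, chosen only for illustration.
start = datetime(2024, 4, 25, 13, 5)

# Year, month, day, hour and minute are concatenated without separators.
formatted = start.strftime(SCHEDULER_DATE_FORMAT["at"])
print(formatted)  # 202404251305
```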


def indent_lines(lines, spaces):
    return "\n".join(" " * spaces + line for line in lines.split("\n"))
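
A short sketch of what `indent_lines` does with blank lines and a trailing newline, next to the standard library's `textwrap.indent`. The sample text is invented for illustration, and the comparison is offered as an observation rather than a requested change.

```python
import textwrap


def indent_lines(lines, spaces):
    # Same logic as the helper added in this PR.
    return "\n".join(" " * spaces + line for line in lines.split("\n"))


# Invented sample with one blank line and a trailing newline.
text = "a\n\nb\n"

# Every piece produced by split("\n") is prefixed, including empty ones,
# so blank lines and the position after the final newline gain spaces.
custom = indent_lines(text, 2)
print(repr(custom))  # '  a\n  \n  b\n  '

# textwrap.indent skips whitespace-only lines by default.
stdlib = textwrap.indent(text, "  ")
print(repr(stdlib))  # '  a\n\n  b\n'
```

For help text the extra trailing spaces are invisible, so either approach should display the same way in a terminal.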


def parse_arguments():
"""
Parse command line arguments
@@ -35,9 +40,10 @@ def parse_arguments():
(gCO2/kWh) of running the calculation now with the carbon intensity at that
time in the future. To undertake this calculation, cats needs to know the
predicted duration of the calculation (which you must supply, see `-d`) and
your location (which can be inferred from your IP address (but see `-l`). If
additional information about the power consumption of your computer is
available (see `--jobinfo`) the predicted CO2 usage will be reported.
your location, either inferred from your IP address, or passed using `-l`.
If additional information about the power consumption of your computer is
available and passed to CATS via the `--config` option, the predicted CO2
usage will be reported.

To make use of this information, you will need to couple cats with a task
scheduler of some kind. The command to schedule is specified with the `-c`
@@ -48,24 +54,41 @@ def parse_arguments():
cats -d 1 --loc RG1 --scheduler=at --command='ls'
"""

example_text = """
Examples\n
********\n
config_text = indent_lines(
Path(__file__).with_name("config.yml").read_text(), spaces=8
)
example_text = f"""
Examples
********

Cats can be used to report information on the best time to run a calculation and the amount
of CO2. Information about a 90 minute calculation in central Oxford can be found by running:
CATS can be used to report information on the best time to run a calculation
and the amount of CO2. Information about a 90 minute calculation in central
Oxford can be found by running:

cats -d 90 --loc OX1 --jobinfo="cpus=2,gpus=0,memory=8,partition=CPU_partition"
cats -d 90 --loc OX1

The `at` scheduler is available from the command line on most Linux and MacOS computers,
and can be the easiest way to use cats to minimise the carbon intensity of calculations on
smaller computers. For example, the above calculation can be scheduled by running:
The `at` scheduler is available from the command line on most Linux and
MacOS computers, and can be the easiest way to use cats to minimise the
carbon intensity of calculations on smaller computers. For example, the
above calculation can be scheduled by running:

cats -d 90 --loc OX1 -s at -c 'mycommand'

To report carbon footprint, pass the `--config` option to select a
configuration file and the `--profile` option to select a profile. An
example config file is given below:

{config_text}

The configuration file is documented in the Quickstart section of the online
documentation.
"""

parser = ArgumentParser(
prog="cats", description=description_text, epilog=example_text
prog="cats",
description=description_text,
epilog=example_text,
formatter_class=RawDescriptionHelpFormatter,
)
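
The switch to `RawDescriptionHelpFormatter` matters because the default formatter collapses whitespace and re-wraps the description and epilog, which would flatten the indented commands and the embedded YAML. A minimal sketch of the difference follows. The `demo` program name and the shortened epilog are placeholders, not the CATS help text.

```python
from argparse import ArgumentParser, RawDescriptionHelpFormatter

# Shortened stand-in for the real epilog: a heading, an underline and an
# indented command line.
epilog = "Examples\n********\n\n    cats -d 90 --loc OX1\n"

# Default formatter: newlines and runs of spaces in the epilog are collapsed.
default_parser = ArgumentParser(prog="demo", epilog=epilog)

# Raw formatter: description and epilog are emitted as written.
raw_parser = ArgumentParser(
    prog="demo",
    epilog=epilog,
    formatter_class=RawDescriptionHelpFormatter,
)

default_help = default_parser.format_help()
raw_help = raw_parser.format_help()

print(raw_help)
```

Only the description and epilog are left untouched. Per-argument help strings are still wrapped by argparse.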

def positive_integer(string):
15 changes: 15 additions & 0 deletions cats/config.yml
@@ -0,0 +1,15 @@
profiles:
  my_cpu_only_profile:
    cpu:
      model: "Xeon Gold 6142"
      power: 9.4 # in W, per core
      nunits: 2
  my_gpu_profile:
    gpu:
      model: "NVIDIA A100-SXM-80GB GPUs"
      power: 300
      nunits: 2
    cpu:
      model: "AMD EPYC 7763"
      power: 4.4
      nunits: 1
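
The quickstart text touched by this PR states that each profile must contain a `cpu` section, a `gpu` section, or both. Below is a hedged sketch of that rule applied to the example file. The config is written as a Python literal so no YAML library is needed. `invalid_profiles` is a hypothetical helper, not part of CATS, and how CATS itself validates profiles is not shown in this diff.

```python
# Parsed form of cats/config.yml, written out by hand for this sketch.
config = {
    "profiles": {
        "my_cpu_only_profile": {
            "cpu": {"model": "Xeon Gold 6142", "power": 9.4, "nunits": 2},
        },
        "my_gpu_profile": {
            "gpu": {"model": "NVIDIA A100-SXM-80GB GPUs", "power": 300, "nunits": 2},
            "cpu": {"model": "AMD EPYC 7763", "power": 4.4, "nunits": 1},
        },
    }
}


def invalid_profiles(config):
    """Return the names of profiles with neither a cpu nor a gpu section."""
    return [
        name
        for name, profile in config.get("profiles", {}).items()
        if not ({"cpu", "gpu"} & set(profile))
    ]


print(invalid_profiles(config))  # []
```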
19 changes: 2 additions & 17 deletions docs/source/quickstart.rst
@@ -79,26 +79,11 @@ file <configuration-file>`.
You can define an arbitrary number of profiles as subsections of the
top-level ``profiles`` section:

.. code-block:: yaml
.. literalinclude :: ../../cats/config.yml
   :language: yaml
   :caption: *An example provision of machine information by YAML file
             to enable estimation of the carbon footprint reduction.*

   profiles:
     my_cpu_only_profile:
       cpu:
         model: "Xeon Gold 6142"
         power: 9.4 # in W, per core
         nunits: 2
     my_gpu_profile:
       gpu:
         model: "NVIDIA A100-SXM-80GB GPUs"
         power: 300
         nunits: 2
       cpu:
         model: "AMD EPYC 7763"
         power: 4.4
         nunits: 1

The name of the profile section is arbitrary, but each profile section
*must* contain one ``cpu`` section, or one ``gpu`` section, or both.
Each hardware type (``cpu`` or ``gpu``) section *must* contain the
1 change: 1 addition & 0 deletions pyproject.toml
@@ -4,6 +4,7 @@

[tool.setuptools]
packages = ["cats"]
package-data.cats = ["config.yml"]
Review comment (Collaborator):

I think config.yml would end up in lib/python3.X/site-packages/cats? Might not be straightforward to use it from there?

Reply (Contributor Author):

Can use relative file paths from `__file__`.
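
A small sketch of the approach described in the reply, which matches the `Path(__file__).with_name("config.yml")` call in `cats/__init__.py`. The temporary directory stands in for `site-packages`, and `packaged_file` is a hypothetical helper used only for this illustration.

```python
import tempfile
from pathlib import Path


def packaged_file(module_file, name):
    """Return the path of `name` in the same directory as `module_file`."""
    return Path(module_file).with_name(name)


# Simulate an installed package directory (hypothetical layout).
with tempfile.TemporaryDirectory() as site_packages:
    pkg = Path(site_packages) / "cats"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "config.yml").write_text("profiles: {}\n")

    # Inside cats/__init__.py, `__file__` would be this path.
    module_file = pkg / "__init__.py"
    text = packaged_file(module_file, "config.yml").read_text()

print(text)
```

This works wherever the package is installed as regular files on disk. An alternative, if that assumption ever becomes a concern, would be `importlib.resources.files("cats").joinpath("config.yml").read_text()` on Python 3.9 or later.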


[project]
name = "climate-aware-task-scheduler"