S3 maps generation and loading by berekuk · Pull Request #108 · Metta-AI/former_mettagrid

berekuk · 2025-04-20T01:56:38Z

Plus documentation - check out https://github.com/Metta-AI/mettagrid/blob/78921f7856b72d740dcf76322232c3fe6fbb08bc/docs/mapgen.md

berekuk · 2025-04-20T02:02:08Z

+    return ObjectTypeNames
+
+def get_object_type_ascii():
+    return ObjectTypeAscii


I use ascii symbols from Cython render_ascii code, so I needed these to decode the symbols back to names.

This is not 100% safe, btw - in ascii, agent types are lost. Would be good to clean up later.

we should clean this up. i used ascii in the env for fast debugging, and it was never meant to be used for anything else. instead of getting ascii from the env, can you add a "symbols" config somewhere, and use that instead?

berekuk · 2025-04-20T02:02:26Z

Moved to tools/, as requested by David in Asana

daveey · 2025-04-20T03:53:15Z

@@ -0,0 +1,11 @@
+defaults:


do we need to index, can we just read the dir content from s3?

It might be somewhat slow with 10k+ maps - s3 only allows to list 1k maps at a time, so would require pagination.

aws s3 ls s3://softmax-public/maps/ for me takes 1 second. might be because I'm far from the US or because of https cold start, but I didn't want to take a risk and waiting for 10 seconds for each map. Maybe I should measure it and pagination is well-optimized?

Even keeping the index on s3 is somewhat slow and could be 10x multiplier on data sent... Which is why I tried to keep that part agnostic.

I could add lazy indexing & local cache if you think it'd be better (but then I'd need to come up with conventions on where to cache, which is another layer of configuration, so I wanted to avoid that).

daveey · 2025-04-20T04:02:58Z

+
+
+@hydra.main(version_base=None, config_path="../configs", config_name="mapgen")
+def main(cfg):


can we use argparse here isntead of hydra? i think it's a better match for utilities. we can still use hydra.compose() to load the map config, but other than that, i think this adds unnecessary configs and complexity.

python -m tools.mapgen --output-uri=s3://softmax-public/maps --max-maps=5 --num-workers=4 --env=env/mettagrid/simple

it would loop while output-uri has fewer than max-maps, creating and dumping a map in there with a random name. it would use a multiprocessing pool to do this in parallel with --num-workers

i don't want to maintain an index file, because it makes paralell generation harder, and listing a directory is easy

daveey · 2025-04-20T04:04:58Z

+        ascii_map.save(target_uri)
+
+    # Show the map if requested
+    show_env(env, cfg.mapgen.show)


you can add --view flag that shows each map. by default output-uri can be /tmp/metta-maps and max-maps can be inf, so that when view is used, it dumps a new random map into /tmp/ and displays it

daveey · 2025-04-20T04:07:25Z

+    return ObjectTypeNames
+
+def get_object_type_ascii():
+    return ObjectTypeAscii


we should clean this up. i used ascii in the env for fast debugging, and it was never meant to be used for anything else. instead of getting ascii from the env, can you add a "symbols" config somewhere, and use that instead?

daveey · 2025-04-20T04:09:09Z

+        return len(self.lines)
+
+    @staticmethod
+    def from_env(env: MettaGridEnv, gen_time: float) -> "StorableMap":


this seems un-nessary. we should not need an env to get a map, since maps are needed to make an env.

daveey · 2025-04-20T04:12:50Z

+from mettagrid.map.scene import Scene, TypedChild
+
+
+class RemoveAgents(Scene):


i don't love this pattern. i don't think i understand the problem you're trying to solve. we don't have to define game.num_agents, but we do need to compute it from the map, and make sure it doesn't change during a training run. open to ideas on how to do that best.

I'm not happy about game.num_agents config duplication either (previously discussed here), but I think this handles a valid scenario even if we get rid of num_agents.

If we have the pre-generated maps, it's reasonable to want to run modifications of them without regenerating the map from scratch. Compare: in Starcraft, you can start the same for N players with M<N players.

Scenes already support drawing on top of existing maps, so this is not that much of a hack. (My initial take was to call this ForceAgentCount and support both increasing and decreasing the number of agents, but sharing code with Random scene was too annoying and I used two scene calls instead.)

In general, my policy for adding new scenes is to add things that might be useful, and we can remove the stuff that's not used later.

Okay, let's ship this for now. Wanting to update pre-existing maps makes sense. Depending on how much that happens we may want a generic set of "update map" utilities, but this seems like a good tool to figure out what we'll want.

daveey · 2025-04-20T04:13:15Z

@@ -0,0 +1,14 @@
+from mettagrid.map.node import Node


do we need this if we don't removeAgents?

Technically we don't need Nop even with RemoveAgents, it's here just for a bit more clear config.

Load random uses this:

extra_root: _target_: mettagrid.map.scenes.nop.Nop children: - where: full scene: _target_: mettagrid.map.scenes.remove_agents.RemoveAgents - where: full scene: _target_: mettagrid.map.scenes.random.Random agents: 40

Instead, it could be:

extra_root: _target_: mettagrid.map.scenes.remove_agents.RemoveAgents children: - where: full scene: _target_: mettagrid.map.scenes.random.Random agents: 40

These are equivalent, I think the former reads a bit better, but not too stuck up on it.

(I'm not sure extra_root is the best option here, maybe Load and LoadRandom generators should accept the plain list of scenes, or maybe even MapGen generator should take the list of scenes instead of a single root scene. where: full is kind of ugly.)

daveey · 2025-04-20T04:13:59Z

+        python -m tools.index_s3_maps index_s3_maps.dir=s3://...
+    """
+
+    def __init__(self, index_uri: str, extra_root: SceneCfg | None = None):


just support list-dir

berekuk · 2025-04-21T22:22:13Z

rewrote mapgen.py and index_s3_maps.py with argparse
changed LoadRandom to list s3 directly; moved previous version with index file to LoadRandomFromIndex in case we need it later
changed StorableMap so that it works from numpy grid instead of env's render_ascii (not all symbols are supported; I'll upgrade the format for storing maps soon, want to avoid pickle, I guess the easiest option is CSV? I don't love that it's less readable though)
remove a few places where Hydra was used; scenes in mapgen configs are now addressed by the path (unfortunately this means that scene paths are absolute from the mettagrid root... will continue to think how to improve, ideally we want to track DictConfig source somehow, wrap OmegaConf.load so that it returns our patched version of DictConfig that's file source-aware?)

sasmith · 2025-04-22T08:53:19Z

            else:
                # Type check and handling
-                if isinstance(where, dict) and "tags" in where:
+                if (isinstance(where, dict) or isinstance(where, DictConfig)) and "tags" in where:


(a) This smells funny, and makes me want to move in the direction of having a "where" object. I don't think we need to do that yet.
(b) isinstance's second argument can be a tuple of classes. So this could be if isinstance(where, (dict, DictConfig)): ...

sasmith · 2025-04-22T09:08:58Z

+        s3 = get_s3_client()
+        s3.put_object(Bucket=bucket, Key=key, Body=text)
+    else:
+        with open(uri, "w") as f:


Unless I'm somehow really confused, this won't open uri as a uri, but as a file. For example, you should be passing path/my_file.txt rather than file://path/my_file.txt.

My preference would be for us to be using uris, vs mixing s3 uris and filenames, so I think this should be changed to expect and strip the file:// prefix.

I'm treating uri here in a loose sense, "the parameter that can be one of these syntax versions".

I tried to cover some subset of other config options in metta that are called "uri" - policy_uri and eval_db_uri. AFAICT both of these support plain file names.

I've changed to code to strip file:// but I'd prefer to keep the support for filenames. I like running mapgen with python -m tools.mapgen --output-dir=., and there'd be no auto-completion with file:// version.

Btw, there's a whole rabbit hole with "how relative paths are treated in file URIs" - file://path/to/file is supposed to be the absolute /path/to/file, and none of uri functions in metta repo do this... My code does things in the same incorrect way, for now (blindly strips file://).

Maybe we'll be able to extract these common functions under mettautils after we get monorepo?

I've changed to code to strip file:// but I'd prefer to keep the support for filenames. I like running mapgen with python -m tools.mapgen --output-dir=., and there'd be no auto-completion with file:// version.

Okay, this sounds good. I still feel squeamish about the mixing of bonafide uris and not-quite-uris, but don't need to be dogmatic.

Maybe we'll be able to extract these common functions under mettautils after we get monorepo?

Agreed.

sasmith · 2025-04-22T18:37:41Z

+Preview a random map:
+
+```bash
+python -m tools.mapgen ./configs/game/map_builder/load_random.yaml --overrides='dir=s3://BUCKET/DIR'


This is confusing, since nothing in this command would make me expect I'm going to view a map.

For now, let's separate mapgen commands from view commands, since it's easier to reason about.

sasmith · 2025-04-22T18:43:13Z

+
+logger = logging.getLogger(__name__)
+
+ascii_symbols = {


Okay for this diff, but let's fast-follow with consolidating these into mettagrid.yaml. E.g.,

objects: altar: hp: 30 input_battery: 3 output_heart: 1 max_output: 5 conversion_ticks: 1 cooldown: ${sampling:1, 20, 10} initial_items: 1 ascii_symbol: a ...

Several quick thoughts:

ascii is not a good format to store maps anyway (see all the exceptions in grid_object_to_ascii function in this file; no support for colored mines and agents)

DCSS maps have metadata on symbols in maps, SUBST: 1=agent.prey, 2=agent.predator, that might be interesting to try

or just dump the full CSV with long names? sacrifice readability, but more robust

generally, I really want to unify all our serialization formats and/or have tools for converting between them, I know we have at least four by now (this ascii, another completely different ascii in mettagrid.config.room.ascii, full names, int type ids...) - this is something I'm keeping in mind as high priority

Oh, yeah. We're with you on not using ascii in the long run, and were considering asking you to rip it out -- but it's still being used by folks.

sasmith · 2025-04-22T18:45:12Z

+ShowMode = Literal["raylib", "ascii", "ascii_border"]
+
+
+def show_map(storable_map: StorableMap, mode: ShowMode | None):


As per the comment on the readme, can we split this into a separate entry point (mapview.py / mapshow / whatever). Maybe with a subdir, so we get

python -m tools.map.gen and python -m tools.map.view.

Done; tools.map.gen can still do everything it could before, because "gen-and-view" is convenient. tools.map.view supports the same heuristics for dir vs file and reuses "load random" code for dirs.

sasmith · 2025-04-22T19:02:46Z

+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--output-dir", type=str, help="Output directory, e.g. ./maps or s3://.../dir")


Can we figure out how to reasonably collapse --output-dir and --output-uri? Sorry for the back and forth, and for at least some level of inconsistency on my part. Dave and I chatted, and where we've landed is

no output-dir, just output-uri.

output-uri is optional. If not provided, then assert count is 1 and dump to stdout.

If output-uri has a scheme, it should be s3 or file. If it's not provided, assume it's file.

If output-uri ends with a file suffix (let's say . and <= 4 characters? Seems like .yaml matters the most, and .txt / .map / similar also seem reasonable), then assume it's the full target, assert count is 1, and write the output there.

Otherwise, assume output-uri is a directory (/ prefix)
--output-uri can be

I feel like these heuristics are going to backfire for someone eventually, but ok :) Changed according to your spec. (I don't have any better ideas...)

Note: I realize that map.gen.view could be a bit more clever and test "is this a dir?" empirically instead of relying on heuristics. I did it with heuristics for now (same uri_is_file function in both scripts), because all combinations of file/dir and local/s3 would be annoying to handle; on s3 I think it's possible that name is both a dir and a file, because dirs don't really exist. And also because I expect that if heuristics will misfire then tools.map.gen will break too.

sasmith · 2025-04-22T19:09:54Z

+from mettagrid.map.scene import Scene, TypedChild
+
+
+class RemoveAgents(Scene):


Okay, let's ship this for now. Wanting to update pre-existing maps makes sense. Depending on how much that happens we may want a generic set of "update map" utilities, but this seems like a good tool to figure out what we'll want.

…rid/pull/108#discussion_r2054699842

berekuk · 2025-04-22T22:24:54Z

Addressed everything + added CLI tests.

sasmith

Approving and merging, and I'll also bring this to the metta repo.

We've discussed and are ready to merge.

berekuk requested a review from sasmith April 20, 2025 01:57

berekuk commented Apr 20, 2025

View reviewed changes

Comment thread tests/mapgen.py

Copy link
Copy Markdown

Contributor Author

berekuk Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to tools/, as requested by David in Asana

berekuk added 6 commits April 20, 2025 00:41

tests.mapgen can store AsciiMap files to s3

86473e0

mettagrid.map.from_uri, mettagrid.map.from_s3_dir

b30704a

move mapgen script to tools/

b27cbca

nop and remove_agents scenes

5f1ac6d

rename builders, add documentation

ce7a530

doc update

446eb2d

daveey assigned sasmith Apr 20, 2025

berekuk force-pushed the s3-mapgen branch from 78921f7 to 446eb2d Compare April 20, 2025 03:47

daveey previously requested changes Apr 20, 2025

View reviewed changes

berekuk added 5 commits April 21, 2025 13:18

simplify hydra; load_random doesn't need index; refactorings

86a1fc4

index_s3_maps argparse

70307e9

rewrite mapgen without hydra; generate maps without env

c793fd3

revert constants.pyx

758e427

Merge branch 'main' into s3-mapgen

e664f74

berekuk requested a review from daveey April 21, 2025 22:17

sasmith reviewed Apr 22, 2025

View reviewed changes

sasmith suggested changes Apr 22, 2025

View reviewed changes

berekuk added 6 commits April 22, 2025 10:04

simplify type checking for "where"

95aec62

comment about boto3

ceb5a01

extract mettagrid.map.utils.storage code; handle file:// uris

448a0b3

remove old script configs

98440fe

better --help

3b3c6d0

update docs

484765f

berekuk requested a review from sasmith April 22, 2025 13:50

sasmith suggested changes Apr 22, 2025

View reviewed changes

change mapgen options according to https://github.com/Metta-AI/mettag…

b486454

…rid/pull/108#discussion_r2054699842

berekuk added 2 commits April 22, 2025 19:16

tools.map.view script; tests; update docs

3cb907d

normalize uri

8b8baa4

berekuk requested a review from sasmith April 22, 2025 22:24

sasmith approved these changes Apr 23, 2025

View reviewed changes

sasmith merged commit 851931e into main Apr 23, 2025
1 check passed

berekuk mentioned this pull request Apr 23, 2025

Terrain generation #79

Closed



		@hydra.main(version_base=None, config_path="../configs", config_name="mapgen")
		def main(cfg):

		from mettagrid.map.scene import Scene, TypedChild


		class RemoveAgents(Scene):

		ShowMode = Literal["raylib", "ascii", "ascii_border"]


		def show_map(storable_map: StorableMap, mode: ShowMode \| None):


		logger = logging.getLogger(__name__)

		ascii_symbols = {

Conversation

berekuk commented Apr 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berekuk Apr 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berekuk commented Apr 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berekuk Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

berekuk commented Apr 22, 2025

Uh oh!

sasmith left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

berekuk commented Apr 20, 2025 •

edited

Loading

berekuk Apr 20, 2025 •

edited

Loading

berekuk Apr 22, 2025 •

edited

Loading