This repository was archived by the owner on Jan 27, 2026. It is now read-only.

fix: change default task bridge socket file to be in home dir#605

Merged
noot merged 2 commits into develop from noot/home-dir-socket
Jul 2, 2025


@noot (Contributor) commented on Jun 26, 2025

  • Change the default task bridge socket file to be in the user's home directory.
  • Previously it was in /tmp, which is cleared on reboot. The container persists across reboots, so after a reboot it tries to recreate the missing directory, but with root permissions, which leaves it unwritable by the worker (the worker runs as a regular user).
  • The socket now lives in the user's prime-worker directory, which persists across reboots.
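A minimal sketch of the new default, written in Python for illustration only (the worker's own implementation is not part of this conversation). The `prime-worker` directory name follows the description above, but placing it directly under the home directory and naming the socket file `taskbridge.sock` are hypothetical. What it illustrates: the worker creates the directory itself, so it is owned by the worker's user rather than root, and a home directory is not wiped on reboot the way /tmp is.

```python
import os
from pathlib import Path
from typing import Optional

# HYPOTHETICAL names: the PR does not state the exact directory layout or file name.
WORKER_DIR_NAME = "prime-worker"
SOCKET_FILE_NAME = "taskbridge.sock"


def default_socket_path(home: Optional[Path] = None) -> Path:
    """Return a task bridge socket path under the user's home directory.

    The parent directory is created by the calling (non-root) user, so it is
    writable by that user, and it survives reboots, unlike a directory in /tmp.
    """
    base = (home if home is not None else Path.home()) / WORKER_DIR_NAME
    # parents=True / exist_ok=True make this idempotent across restarts.
    base.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Fail early with a clear message instead of failing later at bind() time.
    if not os.access(base, os.W_OK):
        raise PermissionError(f"{base} is not writable by uid {os.getuid()}")
    return base / SOCKET_FILE_NAME
```

Calling it twice returns the same path without error, which is the property a worker restarted after a reboot needs.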

@noot requested a review from JannikSt on June 27, 2025 01:25

@JannikSt (Member) left a comment

LGTM! Can you run this on a remote machine (e.g. a single, bigger H100) for testing? You should be able to deploy the following task, which actually generates data:

{
    "name": "Qwen3-4B:Math",
    "image": "primeintellect/prime-rl:commit-d769064",
    "env_vars": {
        "HF_HUB_CACHE": "/shared/hf_hub",
        "HF_HUB_DISABLE_PROGRESS_BARS": "0",
        "HF_HUB_ETAG_TIMEOUT": "500"
    },
    "cmd": [
        "@configs/inference/synthetic-2/base.toml",
        "@configs/inference/synthetic-2/qwen3-4b.toml",
        "--data.name",
        "PrimeIntellect/SYNTHETIC-2-Base-Math",
        "--parallel.pp.rank",
        "${GROUP_INDEX}",
        "--parallel.pp.world-size",
        "${GROUP_SIZE}",
        "--parallel.pp.iroh-seed",
        "${WORKER_P2P_SEED}",
        "--parallel.pp.iroh-peer-id",
        "${NEXT_P2P_ADDRESS}",
        "--group-id",
        "${GROUP_ID}",
        "--task-id",
        "${TASK_ID}",
        "--step-path",
        "/state/counter",
        "--log.level",
        "debug"
    ],
    "scheduling_config": {
        "plugins": {
            "node_groups": {
                "allowed_topologies": ["1x24GB", "1x40-48GB", "test-config"]
            }
        }
    },
    "storage_config": {
        "file_name_template": "Qwen/Qwen3-4B/PrimeIntellect/SYNTHETIC-2-Base-Math/1-${NODE_GROUP_ID}-${NODE_GROUP_SIZE}-${CURRENT_FILE_INDEX}-${NODE_GROUP_INDEX}.parquet"
    },
    "volume_mounts": [
        {
            "host_path": "/group-${GROUP_ID}-state",
            "container_path": "/state"
        }
    ]
}
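The `${...}` placeholders in `cmd`, `file_name_template`, and `host_path` are filled in per node group before the container starts; this conversation does not show how the worker does it. As a hedged sketch, Python's `string.Template` uses the same `${NAME}` syntax and could expand the arguments like this (`expand_args` is a hypothetical helper, and the values are made up):

```python
from string import Template
from typing import Dict, List


def expand_args(cmd: List[str], values: Dict[str, str]) -> List[str]:
    """Expand ${NAME} placeholders in each argument of a task's cmd list.

    Template.substitute raises KeyError for a missing variable, so a typo in
    the task spec fails loudly instead of passing a literal "${...}" through.
    """
    return [Template(arg).substitute(values) for arg in cmd]


# Made-up values for one node group; real values would come from the scheduler.
values = {"GROUP_INDEX": "0", "GROUP_SIZE": "2", "GROUP_ID": "g1"}
args = expand_args(["--parallel.pp.rank", "${GROUP_INDEX}",
                    "--parallel.pp.world-size", "${GROUP_SIZE}"], values)
print(args)  # ['--parallel.pp.rank', '0', '--parallel.pp.world-size', '2']
print(Template("/group-${GROUP_ID}-state").substitute(values))  # /group-g1-state
```

Using `safe_substitute` instead would leave unknown placeholders untouched, which is more forgiving but can hide typos in the task definition.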

You can also check the Docker logs: after a while, they should mention that it's logging data to the task bridge.

@noot merged commit b9e9b2b into develop on Jul 2, 2025
1 check passed
@noot deleted the noot/home-dir-socket branch on July 2, 2025 19:59

2 participants