This guide explains how one DeepScientist turn is actually driven.
Use it when you want to understand:
- how the runtime builds the prompt for each turn
- what each stage skill is for
- how the built-in MCP tools are structured
- which file or tool you should change when behavior feels wrong
If you only want the user-facing product overview, read 13 Core Architecture Guide first.
If you only want the built-in memory contract, read 07 Memory and MCP after this page.
DeepScientist does not run from one static mega-prompt.
For every turn, it rebuilds a prompt from:
- the core system prompt
- a shared interaction contract
- runtime state
- quest files
- startup contract
- selected memory
- connector-specific rules when needed
- the active skill structure
Then the agent works through three built-in MCP namespaces only:
- `memory`
- `artifact`
- `bash_exec`
The most important files are:
- `src/prompts/system.md`
- `src/prompts/contracts/shared_interaction.md`
- `src/prompts/connectors/qq.md`
- `src/prompts/connectors/weixin.md`
- `src/prompts/connectors/lingzhu.md`
- `src/deepscientist/prompts/builder.py`
- `src/skills/*/SKILL.md`
- `src/deepscientist/mcp/server.py`
In practice:
- `system.md` defines the global operating stance
- `shared_interaction.md` defines user-visible continuity rules
- `connectors/*.md` inject connector-specific behavior only when that connector is active or bound
- `builder.py` decides prompt assembly order and runtime context sections
- `SKILL.md` files define stage-specific execution discipline
- `mcp/server.py` defines the built-in tool surface
The current runtime assembles the turn prompt in roughly this order:
- `system.md`
- `contracts/shared_interaction.md`
- runtime context block
- active communication surface block
- optional connector contract block
- turn driver and continuation guard
- active user requirements
- quest context
- recent durable state
- research delivery policy
- paper and evidence snapshot
- retry recovery packet when this is a retry turn
- interaction style block
- priority memory for this turn
- recent conversation window
- current turn attachments
- current user message
That order matters.
The runtime is trying to answer three questions before the model acts:
- what is the quest trying to do now
- what durable state already exists
- what behavior rules apply on this surface and in this stage
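The ordered assembly above can be sketched as a fixed section list joined into one prompt, with optional blocks simply absent. This is a minimal illustration, not the real `PromptBuilder` API; the section names and the `build_prompt` helper are assumptions for the sketch.

```python
# Hypothetical sketch of per-turn prompt assembly in the order described above.
SECTION_ORDER = [
    "system", "shared_interaction", "runtime_context", "surface",
    "connector_contract", "turn_driver", "user_requirements", "quest_context",
    "durable_state", "delivery_policy", "evidence_snapshot", "retry_packet",
    "interaction_style", "priority_memory", "conversation_window",
    "attachments", "user_message",
]

def build_prompt(sections: dict[str, str]) -> str:
    """Join non-empty sections in the fixed order; optional blocks
    (e.g. the connector contract or retry packet) are simply missing."""
    parts = [sections[name] for name in SECTION_ORDER if sections.get(name)]
    return "\n\n".join(parts)

prompt = build_prompt({
    "system": "# system.md contents",
    "user_message": "please rerun the ablation",
})
```

The key property is that the fixed order survives no matter which optional sections a given turn happens to include.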
This is the global DeepScientist operating contract.
It defines things like:
- long-horizon evidence-first behavior
- use `bash_exec` for shell-like execution
- use durable files, logs, and artifacts as truth
- do not end a quest early
- treat web, TUI, and connectors as one quest
- user-facing reporting style
- baseline confirmation discipline, including preserving the richer metric surface instead of keeping only one headline scalar when the source baseline exposes multiple comparable metrics or variants
If the agent starts sounding wrong everywhere, system.md is one of the first places to inspect.
This file defines the common continuity spine around artifact.interact(...).
It tells the agent:
- `artifact.interact(...)` is the main user-visible thread
- queued inbound user messages must be acknowledged and handled first
- blocking replies are for real decisions only
- progress updates should be concise and human-readable
- the real user-facing interaction message should stay complete; the runtime may derive a shorter preview separately, so the agent should not manually truncate the actual connector answer with `...` or `…`
If the model is bad at staying in the same long-running thread, this file matters a lot.
The prompt builder adds a surface block each turn.
This tells the model:
- whether the current turn is local, QQ, Weixin, or another connector
- how many external connectors are bound
- which surface is active right now
- how much detail is appropriate on that surface
This is why connector behavior should not be hard-coded globally.
The same quest may be viewed from the web UI, TUI, or a connector, but the reply shape should adapt.
Connector prompt fragments are loaded only when needed.
Current connector prompt files are:
- `src/prompts/connectors/qq.md`
- `src/prompts/connectors/weixin.md`
- `src/prompts/connectors/lingzhu.md`
These files define surface-specific rules such as:
- reply length
- text-first versus media-enabled behavior
- how attachments should be sent
- what not to expose in chat
For example:
- QQ is treated as a milestone operator surface
- Weixin is treated as a concise phone-side operator surface with `context_token` continuity
- Lingzhu is treated as an even shorter, more constrained surface
If one connector needs behavior changes, change its connector prompt first before bloating the global system prompt.
The builder injects runtime facts such as:
- `quest_id`
- `quest_root`
- current workspace branch
- active idea id
- active analysis campaign id
- bound conversations
- startup contract
- baseline gate
- active interactions
- recent artifacts
- recent runs
This is what makes the prompt quest-aware instead of generic.
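Rendering those runtime facts into a prompt section could look like the following. The field names mirror the list above; the rendering function itself is a hypothetical sketch, not the builder's real code.

```python
def runtime_context_block(facts: dict) -> str:
    """Render known runtime facts as one prompt section, skipping unset ones."""
    lines = [f"- {key}: {value}" for key, value in facts.items() if value is not None]
    return "## Runtime context\n" + "\n".join(lines)
```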
The builder reads these quest files directly into the prompt:
- `brief.md`
- `plan.md`
- `status.md`
- `SUMMARY.md`
This is important:
- the live prompt is not only based on chat history
- durable quest docs are treated as first-class truth surfaces
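Folding the durable quest docs into the prompt can be sketched as reading each file that exists, in a fixed order. The file names come from the list above; the helper and its heading convention are illustrative assumptions.

```python
from pathlib import Path

QUEST_DOCS = ("brief.md", "plan.md", "status.md", "SUMMARY.md")

def quest_context_block(quest_root: Path) -> str:
    """Inline each durable quest doc that exists, in a fixed order."""
    chunks = []
    for name in QUEST_DOCS:
        path = quest_root / name
        if path.exists():
            chunks.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(chunks)
```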
This block converts startup choices into concrete execution rules.
It includes logic around:
- whether paper delivery is required
- launch mode
- custom profile
- baseline routing
- idea routing
- paper branch behavior
- review gate behavior
If Start Research behavior feels wrong, you usually need to inspect:
- the `startup_contract`
- `src/deepscientist/prompts/builder.py`
- the stage skill the quest is currently using
This block tells the model how to speak on this turn.
It includes:
- locale bias
- blocking versus threaded behavior
- long-run update cadence
- how to acknowledge mailbox messages
- how to compress progress into human-readable updates
This is why DeepScientist can keep the same runtime but behave differently across:
- long experiments
- connector replies
- writing stages
- waiting-for-decision stages
DeepScientist does not inject memory randomly.
PromptBuilder uses a stage-specific memory plan.
Examples:
- `scout` prefers `papers`, `knowledge`, `decisions`
- `baseline` prefers `papers`, `decisions`, `episodes`, `knowledge`
- `idea` prefers `papers`, `ideas`, `decisions`, `knowledge`
- `experiment` prefers `ideas`, `decisions`, `episodes`, `knowledge`
This means the prompt is stage-biased on purpose.
The agent should not see the same memory bundle on every turn.
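A stage-biased selection like this can be sketched as a plain preference table plus a filter-and-rank step. The stage-to-kind mapping below mirrors the examples in the text; the card shape and `select_memory` helper are illustrative assumptions, not the real `PromptBuilder` plan.

```python
# Stage-specific memory preferences, as described above (illustrative).
MEMORY_PLAN = {
    "scout": ["papers", "knowledge", "decisions"],
    "baseline": ["papers", "decisions", "episodes", "knowledge"],
    "idea": ["papers", "ideas", "decisions", "knowledge"],
    "experiment": ["ideas", "decisions", "episodes", "knowledge"],
}

def select_memory(stage: str, cards: list[dict]) -> list[dict]:
    """Keep only cards whose kind the stage prefers, ranked by that preference."""
    preferred = MEMORY_PLAN.get(stage, [])
    kept = [card for card in cards if card["kind"] in preferred]
    return sorted(kept, key=lambda card: preferred.index(card["kind"]))
```

The same card pool yields a different bundle per stage, which is exactly the "stage-biased on purpose" behavior.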
Prompt fragments can be overridden per quest under:
- `.codex/prompts/system.md`
- `.codex/prompts/contracts/shared_interaction.md`
- `.codex/prompts/connectors/<connector>.md`
This means:
- repo defaults live under `src/prompts/`
- quest-local prompt overrides live under `.codex/prompts/`
Use quest-local prompt overrides only when the quest truly needs a different local contract.
Do not fork the repo-level prompt unless the change should affect the product globally.
DeepScientist currently has two skill layers:
- standard stage skills
- companion skills
These are the main research anchors:
| Skill | Use when | Main job | Usually hands off to |
|---|---|---|---|
| scout | the task frame is still unclear | task framing, baseline discovery, metric and dataset clarification | baseline or idea |
| baseline | a trustworthy baseline does not yet exist | attach, import, reproduce, repair, and verify the baseline | idea |
| idea | the baseline is clear but the next direction is not | generate, compare, and select durable research directions | experiment |
| experiment | one selected idea is ready to run | implement and evaluate the main run on one durable line | analysis-campaign, write, or decision |
| analysis-campaign | follow-up experiments are needed | run slices, ablations, robustness checks, or reviewer-facing supplements | write, decision, or finalize |
| write | there is enough evidence to draft | turn accepted evidence into outline, draft, and paper bundle work | review or finalize |
| finalize | the quest is near closure | consolidate claims, summaries, final state, and closure checks | quest completion approval |
| decision | a durable route choice is needed | make a clear go/stop/branch/reuse decision from evidence | another anchor |
These are auxiliary entry or quality-control skills:
| Skill | Use when | Main job |
|---|---|---|
| figure-polish | a figure is important beyond debug use | render-inspect-revise a milestone or paper figure |
| intake-audit | the quest already has meaningful prior state | trust-rank old assets and choose the correct next anchor |
| review | a substantial draft already exists | run a skeptical paper-like audit before claiming done |
| rebuttal | reviewer comments or revision requests exist | map reviewer pressure into experiments, text deltas, and response artifacts |
The daemon is not supposed to contain a giant hard-coded research scheduler.
Instead:
- the prompt defines the operating contract
- the skill defines stage-specific discipline
- the runtime persists state and routes turns
That is the core DeepScientist design choice.
These are the durable outputs you should expect:
| Skill | Typical durable outputs |
|---|---|
| scout | updated brief.md, updated plan.md, literature notes, framing memory |
| baseline | PLAN.md, CHECKLIST.md, baseline verification notes, confirmed or waived baseline state |
| idea | durable idea draft, selected idea package, rationale for why this route won |
| experiment | implementation changes, run logs, record_main_experiment(...), result evidence |
| analysis-campaign | campaign manifest, slice records, synthesis notes |
| write | selected outline, writing plan, draft, references, claim-evidence map, paper bundle |
| finalize | final summary, closure state, final quest health check |
| decision | durable route decision, next-anchor recommendation |
| intake-audit | trusted-versus-untrusted asset map, next anchor recommendation |
| review | review report, revision log, experiment TODO list |
| rebuttal | review matrix, response letter, text deltas, evidence-update plan |
| figure-polish | final polished figure assets and render-checked outputs |
DeepScientist keeps the built-in MCP surface intentionally small.
Only these namespaces are built in:
- `memory`
- `artifact`
- `bash_exec`
There is no separate public built-in git namespace.
Git-aware behavior is exposed through artifact.
Purpose:
- reusable knowledge
- lessons that should survive beyond one turn
- quest-local or global memory cards
Current built-in tools:
- `memory.write(...)`
- `memory.read(...)`
- `memory.search(...)`
- `memory.list_recent(...)`
- `memory.promote_to_global(...)`
Use memory when the output should be remembered and reused later.
Do not use it for transient progress chatter.
Purpose:
- quest control plane
- durable research state
- Git-aware branch and worktree routing
- experiment and paper records
- user-visible interaction continuity
The artifact namespace is large, but it is still one family.
- `artifact.record(...)`
- `artifact.refresh_summary(...)`
- `artifact.render_git_graph(...)`
- `artifact.checkpoint(...)`
- `artifact.prepare_branch(...)`
- `artifact.activate_branch(...)`
- `artifact.submit_idea(...)`
- `artifact.list_research_branches(...)`
- `artifact.resolve_runtime_refs(...)`
- `artifact.publish_baseline(...)`
- `artifact.attach_baseline(...)`
- `artifact.confirm_baseline(...)`
- `artifact.waive_baseline(...)`
- `artifact.record_main_experiment(...)`
- `artifact.create_analysis_campaign(...)`
- `artifact.get_analysis_campaign(...)`
- `artifact.record_analysis_slice(...)`
- `artifact.submit_paper_outline(...)`
- `artifact.list_paper_outlines(...)`
- `artifact.submit_paper_bundle(...)`
- `artifact.arxiv(...)`
- `artifact.interact(...)`
- `artifact.complete_quest(...)`
The most important artifact tool for long-running collaboration is:
`artifact.interact(...)`
Because it keeps together:
- user-visible updates
- mailbox polling
- connector delivery
- threaded continuity
- attachment delivery
Purpose:
- durable shell execution
- monitored long runs
- durable logs
- stoppable and readable sessions
Current built-in tool:
- `bash_exec.bash_exec(...)`
This one tool supports multiple modes:
- `detach`
- `await`
- `read`
- `kill`
- `list`
- `history`
The design rule is simple:
- anything shell-like should go through `bash_exec`
- do not hide important execution in transient shell snippets
Use this mental model:
- `memory`: remember
- `artifact`: decide and record
- `bash_exec`: run and monitor
Examples:
- a reusable lesson from a failed run -> `memory.write(...)`
- confirming a baseline -> `artifact.confirm_baseline(...)`
- launching training -> `bash_exec.bash_exec(mode='detach', ...)`
- notifying the user about the next checkpoint -> `artifact.interact(...)`
If you mix these roles badly, the quest becomes harder to resume and audit.
A typical turn looks like this:
- a user or connector message arrives
- the daemon restores quest snapshot and history
- `PromptBuilder` assembles the current turn prompt
- the active skill defines the stage discipline
- priority memory is injected
- the agent uses `memory`, `artifact`, and `bash_exec`
- outputs are persisted into files, artifacts, memory cards, logs, and Git state
- `artifact.interact(...)` keeps the user-facing thread continuous
That is why DeepScientist feels more like a persistent workshop than a stateless chat.
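The steps above can be condensed into a skeletal turn loop. Every name here (`run_turn`, `agent_step`, the state dict shape) is a stand-in for runtime machinery, not a real API; it only shows that state persists across turns while the prompt is rebuilt each time.

```python
def agent_step(prompt: str) -> str:
    return f"handled: {prompt}"  # placeholder for the model + MCP tool loop

def run_turn(message: str, state: dict) -> dict:
    """One turn: record inbound, rebuild a stage-aware prompt, persist output."""
    state.setdefault("history", []).append(("user", message))  # restore + record inbound
    prompt = f"[{state.get('skill', 'scout')}] {message}"      # stage-aware prompt build
    reply = agent_step(prompt)
    state["history"].append(("agent", reply))                  # persist the outcome
    return state
```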
Use this quick rule:
- change `src/prompts/system.md` when the global operating stance is wrong
- change `src/prompts/contracts/shared_interaction.md` when continuity behavior is wrong
- change `src/prompts/connectors/*.md` when one connector behaves wrong
- change `src/skills/<skill>/SKILL.md` when one stage behaves wrong
- change `src/deepscientist/prompts/builder.py` when prompt assembly or runtime context selection is wrong
- change `src/deepscientist/mcp/server.py` when the built-in tool surface itself is wrong
Do not use a giant prompt patch to fix a real MCP contract bug.
Do not use a new MCP tool to fix a stage-discipline problem that belongs in a skill.