Replies: 8 comments
-
Don't rush into refactoring yet. We first need to deliver a stable version that ensures device compatibility. We can discuss the refactoring plan afterward. Making too many changes at once will easily introduce new bugs and break usability. Once the stable version is out, you can develop the refactored version on the dev branch.
-
I'll be firm here: I think your attempts at stability will be hampered considerably by the current structure. My practical experience fixing the pre-process and train issues on low VRAM required far too many review-and-fix loops, simply because handler.py is awful to work with; it is almost impossible to read, even for an experienced eye like mine. I do have a day job and a family, so this will probably take at least a couple of weeks!
-
You’re right, refactoring is necessary. We can start your refactoring plan once this PR has been fully tested and verified; we can begin with the handler first.
-
I understand this takes time, and I’ve been thinking about how to optimize these parts every day. Thank you for your thoughtful suggestions and contributions.
-
Let me know if you want to do this as a one-person job, or whether I would be able to help.
-
@sigalarm If you want, this one should be simple and low risk: "Migrate UI Wiring First (low risk)". I wanted to start with the wiring myself, but @ChuxiJ wanted handler.py done first, and I somewhat agree: getting that sorted would be a big win. It's a major blocker for the project at the moment and needs a lot of tedious care and attention, so it's where I have actually started.
-
@1larity Since the refactoring would break a lot of components across the entire repo, I think we should go ahead and open issues for each part. I'm not sure whether @ChuxiJ wants us to do this ourselves or wants these as framed issues. Once again, if there are any open tasks regarding refactoring, I'd be more than happy to tackle them.
-
@jayvenn21 That would be helpful; it is a lot of work. If I may be bold, I would break the issues down using the "Monolith → Structured Domains" section of the plan, as the file structure at the start is quite mutable depending on how the monolithic files are actually implemented. This will also help contain regressions somewhat. Adding AGENTS.md has provided some good steering for agents, reducing the knowledge threshold required from contributors considerably.
-
@ChuxiJ, I grabbed some post-its and came up with a code structure hierarchy for decomposing the whole project.
For new players who might like to know what this is and why it is required:
Functional decomposition is needed because the codebase has grown into a few very large files that mix unrelated responsibilities (UI wiring, inference logic, API handling, model loading, etc.), which makes changes risky and hard to review.
By splitting these into small, focused modules, contributors can understand and modify one part without accidentally breaking others, tests become more targeted, and merge conflicts are easier to resolve in a fast-moving project. In practice, this lowers defect risk, speeds onboarding, and gives us a clean foundation for future features like external LLM providers, multi-LoRA composition, and stem workflows without turning core files into bottlenecks.
I'll just have to accept that the project moves rapidly, making a code freeze impractical, so I have designed a safe, rolling migration strategy, which I have put at the end.
I've also tried to future-proof it, so there are some stub proposals for interesting feature requests and ideas I've seen discussed. This is what I propose to implement:
Directory and file structure:
acestep/
├─ api/ # External service boundary (HTTP / WebSocket / auth / jobs)
│ ├─ http/ # FastAPI app composition, routes, schemas, middleware
│ │ ├─ app.py # Build FastAPI app and register routers
│ │ ├─ lifespan.py # Startup/shutdown resource management
│ │ ├─ auth.py # API key / JWT verification
│ │ ├─ routes_generation.py # Generation endpoints
│ │ ├─ routes_models.py # Model discovery endpoints
│ │ ├─ routes_health.py # Health / readiness endpoints
│ │ ├─ schemas_requests.py # Request DTOs
│ │ └─ schemas_responses.py # Response DTOs
│ └─ jobs/ # Async task execution and job state
│ ├─ store.py # Job persistence interface / in-memory impl
│ ├─ executor.py # Worker orchestration and retries
│ └─ models.py # Job record / status models
│
├─ core/ # Domain logic, no UI or transport coupling
│ ├─ generation/ # Music generation orchestration
│ │ ├─ handler/ # Decomposed replacement for monolithic handler.py
│ │ │ ├─ __init__.py # Facade preserving public API
│ │ │ ├─ init_service.py # Model / service init flows
│ │ │ ├─ lora_manager.py # LoRA load / unload / activate logic
│ │ │ ├─ lora_stack.py # Stub: multi-LoRA merge / weights ordering
│ │ │ ├─ stem_pipeline.py # Stub: stem split / join orchestration
│ │ │ ├─ diffusion.py # Diffusion inference loop
│ │ │ ├─ decode.py # Latent / audio decode pipeline
│ │ │ ├─ io_audio.py # Audio input normalization / parsing
│ │ │ └─ progress.py # Progress estimation and telemetry
│ │ ├─ inference/ # Param validation + high-level generation functions
│ │ │ ├─ params.py
│ │ │ ├─ config.py
│ │ │ ├─ generate.py
│ │ │ ├─ sample.py
│ │ │ └─ format_text.py
│ │ └─ logits/ # Constraint decoding internals
│ │ ├─ processor.py
│ │ ├─ masks.py
│ │ └─ rules.py
│ │
│ ├─ llm/ # Text planning / inference abstraction layer
│ │ ├─ contracts/ # Provider-agnostic interfaces
│ │ │ ├─ provider.py # TextInferenceProvider protocol / base class
│ │ │ └─ planner.py # Stub: planning interface for long prompts / workflows
│ │ ├─ providers/ # Concrete provider adapters
│ │ │ ├─ acestep_local.py # Current built-in local LLM adapter
│ │ │ ├─ ollama.py # Stub: Ollama adapter
│ │ │ ├─ openai.py # Stub: OpenAI adapter
│ │ │ └─ anthropic.py # Stub: Claude / Anthropic adapter
│ │ └─ runtime/ # Routing, fallback, rate-limit, provider selection
│ │ ├─ router.py # Chooses provider by policy / config
│ │ ├─ fallback.py # Retry / fallback chain logic
│ │ └─ prompts.py # Prompt templates and rendering
│ │
│ ├─ audio/ # Reusable DSP / audio utility components
│ │ ├─ normalize.py
│ │ ├─ resample.py
│ │ ├─ loudness.py
│ │ ├─ codec.py
│ │ └─ stems/ # Stub area for stem functionality
│ │ ├─ splitter.py # Stub: source separation interface
│ │ ├─ merger.py # Stub: stem recomposition rules
│ │ └─ metadata.py # Stub: stem labels / types / schema
│ │
│ ├─ scoring/ # Alignment / evaluation metrics
│ │ └─ alignment/
│ │ ├─ aligner.py
│ │ ├─ lyric_scorer.py
│ │ └─ metrics.py
│ │
│ └─ system/ # Infra helpers used across domains
│ ├─ gpu/
│ │ ├─ detect.py
│ │ ├─ config.py
│ │ └─ policy.py
│ ├─ models/
│ │ ├─ downloader.py
│ │ └─ resolver.py
│ ├─ cache/
│ │ └─ local.py
│ ├─ constants.py
│ └─ debug.py
│
├─ dataset/ # Dataset lifecycle (runtime + training prep)
│ ├─ runtime/ # Dataset ops used by app / UI at runtime
│ │ └─ dataset_handler.py
│ └─ builder/ # Offline dataset build / preprocess pipeline
│ ├─ scan.py
│ ├─ preprocess.py
│ ├─ labels.py
│ ├─ metadata.py
│ ├─ serialization.py
│ └─ models.py
│
├─ training/ # Fine-tuning and training orchestration
│ ├─ trainer/
│ │ ├─ trainer.py
│ │ ├─ loop.py
│ │ ├─ checkpointing.py
│ │ └─ metrics.py
│ ├─ data/
│ │ ├─ module.py
│ │ └─ collate.py
│ └─ lora/
│ ├─ config.py
│ ├─ apply.py
│ ├─ save_load.py
│ └─ compose.py # Stub: multi-LoRA composition strategies
│
└─ ui/ # Presentation layer only
└─ gradio/
├─ events/ # UI event binding and callback wiring
│ ├─ __init__.py # Facade only; imports wiring modules
│ ├─ generation_wiring.py
│ ├─ sample_wiring.py
│ ├─ metadata_wiring.py
│ ├─ results_wiring.py
│ └─ training_wiring.py
├─ interfaces/ # UI layout / components construction
│ ├─ generation_layout.py
│ ├─ training_layout.py
│ ├─ result_layout.py
│ └─ dataset_layout.py
├─ i18n/ # Localization loading and lookup
│ ├─ service.py
│ └─ loaders.py
└─ api/ # UI-specific API bridge (if needed)
├─ routes.py
└─ dto.py
Split plan:
Per-file Assessment (all first-party code files)
Split now (currently >200 LOC):
Decomposition / Refactor Plan
Monolith → Structured Domains
handler.py
└─ acestep/core/generation/handler/ # Major monolith split by responsibility
├─ init.py # Public facade
├─ init_service.py # Model / service initialization
├─ diffusion.py # Diffusion inference loop
├─ decode.py # Latent → audio decode
├─ io_audio.py # Audio input parsing / normalization
├─ progress.py # Progress + telemetry
├─ lora_manager.py # LoRA load / unload / activate
├─ lora_stack.py # Stub: multi-LoRA composition
└─ stem_pipeline.py # Stub: stem split / join orchestration
llm_inference.py
└─ acestep/core/llm/runtime/
├─ router.py # Provider selection / routing
├─ fallback.py # Retry + fallback logic
└─ prompts.py # Prompt templates / rendering
api_server.py
└─ acestep/api/
├─ http/
│ ├─ app.py # FastAPI app wiring
│ ├─ auth.py # Auth / API key handling
│ ├─ routes_generation.py # Generation endpoints
│ ├─ routes_models.py # Model discovery
│ └─ routes_health.py # Health / readiness
└─ jobs/
├─ executor.py # Async job orchestration
├─ store.py # Job persistence
└─ models.py # Job state models
constrained_logits_processor.py
└─ acestep/core/generation/logits/
├─ processor.py # Logits processor
├─ masks.py # Token masks
└─ rules.py # Constraint rules
results_handlers.py
└─ ui/gradio/events/
├─ results_wiring.py # Event wiring
└─ helpers.py # Shared helpers
cli.py
└─ acestep/app/cli/
├─ __init__.py # Facade
├─ generate.py # Generation commands
├─ train.py # Training commands
└─ models.py # Model management commands
inference.py
└─ acestep/core/generation/inference/
├─ params.py # Parameter schemas / validation
├─ config.py # Inference config
├─ generate.py # High-level generation entrypoint
└─ sample.py # Sampling utilities
profile_inference.py
└─ acestep/app/profiling/
├─ __init__.py
├─ profiler.py # Profiling harness
└─ report.py # Timing / memory reports
__init__.py
└─ keep as facade only; move logic into wiring / runtime modules
openrouter_api_server.py
└─ options:
├─ openrouter/api/ # If kept as separate boundary
└─ or merge into:
acestep/api/http/
generation_handlers.py
└─ ui/gradio/events/
├─ generation_wiring.py # Generation flow wiring
└─ helpers.py
dit_alignment_score.py
└─ acestep/core/scoring/alignment/
├─ aligner.py
├─ lyric_scorer.py
└─ metrics.py
training_handlers.py
└─ ui/gradio/events/
├─ training_wiring.py
└─ helpers.py
generation.py
└─ ui/gradio/interfaces/
├─ generation_layout.py # Main layout
└─ sections/ # Logical UI sections
├─ prompt.py
├─ params.py
└─ output.py
openrouter_adapter.py
└─ openrouter/
├─ openrouter_client.py # HTTP client
├─ requests.py # Request builders
└─ responses.py # Response parsing
stress_test.py
└─ tests/stress/
├─ scenarios/
│ ├─ long_prompt.py
│ └─ batch_load.py
└─ runner.py
trainer.py
└─ training/trainer/
├─ trainer.py
├─ loop.py
├─ checkpointing.py
└─ metrics.py
model_downloader.py
└─ acestep/core/system/models/
├─ downloader.py
└─ resolver.py
training.py
└─ ui/gradio/interfaces/
├─ training_layout.py
└─ sections/
├─ dataset.py
├─ lora.py
└─ run.py
client_test.py
└─ tests/client/
├─ test_generation.py
└─ helpers.py
gpu_config.py
└─ acestep/core/system/gpu/
├─ detect.py
├─ config.py
└─ policy.py
result.py
└─ ui/gradio/interfaces/
├─ result_layout.py
└─ sections/
├─ summary.py
├─ audio.py
└─ metadata.py
api_routes.py
└─ ui/gradio/api/
├─ routes.py
└─ dto.py
data_module.py
└─ training/data/
├─ module.py
└─ collate.py
lora_utils.py
└─ training/lora/
├─ config.py
├─ apply.py
├─ save_load.py
└─ compose.py
acestep_v15_pipeline.py
└─ acestep/core/pipeline/acestep_v15/
├─ pipeline.py
├─ stages.py
└─ config.py
test_time_scaling.py
└─ tests/perf/
├─ scaling.py
└─ benchmarks.py
audio_utils.py
└─ acestep/core/audio/
├─ normalize.py
├─ resample.py
├─ loudness.py
└─ codec.py
check_gpu.py
└─ acestep/core/system/gpu/
├─ detect.py
├─ report.py
└─ bench.py
openrouter_models.py
└─ openrouter/models/
├─ catalog.py
├─ schema.py
└─ filters.py
preprocess.py
└─ dataset/builder/
├─ scan.py
├─ preprocess.py
├─ labels.py
├─ metadata.py
└─ serialization.py
Relocate (mostly keep, 80–200 LOC)
constants.py → core/system/constants.py
generate_examples.py → generate_examples.py
i18n.py → ui/gradio/i18n/service.py
local_cache.py → core/system/cache/local.py
label_single.py → dataset/builder/labels.py
prepare_vae_calibration_data.py → prepare_calibration.py
debug_utils.py → core/system/debug.py
models.py → models.py
configs.py → configs.py
dataset.py → ui/gradio/interfaces/dataset_layout.py
__init__.py → keep facade
scan.py → scan.py
dataset_handler.py → dataset/runtime/dataset_handler.py
Keep as Small Leaf Modules (<80 LOC)
__init__.py (facades)
dataset_builder.py
audio_io.py
builder.py
core.py
csv_metadata.py
dataframe.py
label_all.py
label_utils.py
metadata.py
preprocess_audio.py
preprocess_context.py
preprocess_encoder.py
preprocess_lyrics.py
preprocess_manifest.py
preprocess_text.py
preprocess_utils.py
preprocess_vae.py
serialization.py
update_sample.py
Vendored Code
acestep/third_parts/nano-vllm/*
→ No decomposition unless explicitly forked
Stub Contracts for Proposed Features
External Text Providers
├─ provider.py
│ ├─ generate_text(request) → response
│ ├─ plan(request) → plan_response
│ └─ health() → provider_status
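To make that contract concrete, here is a minimal sketch of what `provider.py` could define. All dataclass names and fields here are illustrative proposals, not existing repo API:

```python
# Illustrative sketch of contracts/provider.py; every name is a proposal.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class TextRequest:
    prompt: str
    max_tokens: int = 512


@dataclass
class TextResponse:
    text: str
    provider: str


@dataclass
class PlanResponse:
    steps: list[str] = field(default_factory=list)


@dataclass
class ProviderStatus:
    name: str
    available: bool
    detail: str = ""


class TextInferenceProvider(Protocol):
    """Contract every adapter (acestep_local, ollama, openai, ...) must satisfy."""

    def generate_text(self, request: TextRequest) -> TextResponse: ...

    def plan(self, request: TextRequest) -> PlanResponse: ...

    def health(self) -> ProviderStatus: ...
```

Using a `Protocol` rather than an abstract base class keeps the adapters free of any import dependency on the contracts module.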
Multi-LoRA
├─ lora_stack.py
│ ├─ register_lora(name, path)
│ ├─ set_active_loras([{name, weight}])
│ └─ compose_strategy(strategy_name) # sum / sequential / gated
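A sketch of the `lora_stack.py` surface implied above; the module-level registry and the strategy names are placeholders for whatever the real implementation settles on:

```python
# lora_stack.py -- stub sketch; registry layout and strategies are proposals.
_REGISTRY: dict[str, str] = {}   # name -> checkpoint path
_ACTIVE: list[dict] = []         # [{"name": ..., "weight": ...}]
_STRATEGY = "sum"                # sum / sequential / gated


def register_lora(name: str, path: str) -> None:
    _REGISTRY[name] = path


def set_active_loras(loras: list[dict]) -> None:
    unknown = [l["name"] for l in loras if l["name"] not in _REGISTRY]
    if unknown:
        raise KeyError(f"unregistered LoRAs: {unknown}")
    _ACTIVE[:] = loras


def compose_strategy(strategy_name: str) -> None:
    global _STRATEGY
    if strategy_name not in {"sum", "sequential", "gated"}:
        raise ValueError(f"unknown strategy: {strategy_name}")
    _STRATEGY = strategy_name
```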
Stem Split / Join
├─ splitter.py
│ └─ split(audio) → {vocals, drums, bass, other, ...}
└─ merger.py
└─ merge(stems, gains, pan) → audio
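And a sketch of the stem interfaces, assuming numpy arrays as the audio interchange type (pan handling and the separation backend are deliberately left open):

```python
# Interface stubs for stems/splitter.py and stems/merger.py.
from __future__ import annotations

import numpy as np


def split(audio: np.ndarray) -> dict[str, np.ndarray]:
    """Return named stems; a real impl would call a source-separation model."""
    raise NotImplementedError("source-separation backend not selected yet")


def merge(stems: dict[str, np.ndarray],
          gains: dict[str, float] | None = None) -> np.ndarray:
    """Recombine stems with optional per-stem gain (pan omitted in this sketch)."""
    gains = gains or {}
    return sum(stem * gains.get(name, 1.0) for name, stem in stems.items())
```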
Migration plan:
Due to the rapidly moving nature of the code-base, a code freeze, as would be usual for such a refactor, is impractical; I will have to use a rolling migration strategy.
Continuous Migration Model (no freeze)
Run decomposition as a rolling effort on top of upstream. Before each segment, rebase to latest upstream; after rebase, run a fixed quick parity suite. Keep each segment scoped to one subsystem and short-lived (target 1-3 days) to minimize conflict cost.
Define Safety Gates Once
Establish a stable “must-pass” gate used every segment: Gradio startup, API /health, one representative generation path, import smoke, and touched-subsystem tests. Add LOC policy checks (warn >150, fail >200) for first-party modules.
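A minimal sketch of the LOC policy check as a standalone CI script, using the thresholds above; the `acestep` root, blank-line exclusion, and vendored-code skip are my assumptions:

```python
#!/usr/bin/env python3
# loc_gate.py -- sketch of the warn >150 / fail >200 LOC policy check.
import pathlib
import sys

WARN, FAIL = 150, 200

failed = False
for path in pathlib.Path("acestep").rglob("*.py"):
    if "third_parts" in path.parts:  # skip vendored code (nano-vllm etc.)
        continue
    text = path.read_text(encoding="utf-8", errors="ignore")
    loc = sum(1 for line in text.splitlines() if line.strip())
    if loc > FAIL:
        print(f"FAIL {path}: {loc} LOC > {FAIL}")
        failed = True
    elif loc > WARN:
        print(f"WARN {path}: {loc} LOC > {WARN}")

sys.exit(1 if failed else 0)
```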
Create Target Package Skeleton
Add the target folder hierarchy (api/core/ui/dataset/training) and __init__.py intent comments. This is structure-only, with zero behavior change.
Add Compatibility Facades
For modules that will move, keep old import paths alive via thin re-exports. This lets upstream continue merging while decomposition proceeds.
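A sketch of one such facade, assuming `AceStepHandler` moves as planned below; the `DeprecationWarning` is optional during the migration window:

```python
# acestep/handler.py -- compatibility shim; real code now lives under core/.
# Old imports like `from acestep.handler import AceStepHandler` keep working.
import warnings

from acestep.core.generation.handler import AceStepHandler  # noqa: F401

warnings.warn(
    "acestep.handler is deprecated; import from "
    "acestep.core.generation.handler instead",
    DeprecationWarning,
    stacklevel=2,
)
```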
Migrate UI Wiring First (low risk)
Split __init__.py into dedicated wiring files, keeping __init__.py as a facade. Then split the large interface layout files similarly. Validate that all event bindings still work.
Migrate API Monolith
Decompose api_server.py into api/http/* and api/jobs/* while preserving existing entrypoint behavior. Validate endpoint parity (/health, /v1/models, /release_task, /query_result, /v1/audio).
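A minimal parity smoke sketch for the GET endpoints, assuming the pre-split and post-split servers can run side by side on local ports (the ports and side-by-side setup are assumptions; POST routes such as /release_task would need recorded request fixtures instead):

```python
# parity_smoke.py -- compare old vs new server responses on shared endpoints.
import requests

OLD, NEW = "http://127.0.0.1:8000", "http://127.0.0.1:8001"
ENDPOINTS = ["/health", "/v1/models"]

for ep in ENDPOINTS:
    old = requests.get(OLD + ep, timeout=10)
    new = requests.get(NEW + ep, timeout=10)
    assert old.status_code == new.status_code, f"status drift on {ep}"
    assert old.json() == new.json(), f"payload drift on {ep}"
    print(f"OK {ep}")
```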
Decompose Inference and Logits Core
Split inference.py and constrained_logits_processor.py into cohesive modules (params/config/generate, processor/rules/masks). Use deterministic seed-based parity checks.
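A sketch of what a seed-based parity check could look like under pytest; `run_pipeline` is a stand-in for whichever entrypoint is being split, and the golden-capture flow (save before the split, compare after) is the assumed workflow:

```python
# test_parity.py -- deterministic seed parity sketch (pytest).
import torch


def run_pipeline(seed: int) -> torch.Tensor:
    """Stand-in for the real generation entrypoint under test."""
    torch.manual_seed(seed)
    return torch.randn(1, 8, 16)  # placeholder for the real diffusion call


def test_seed_parity(tmp_path):
    golden_file = tmp_path / "seed1234.pt"
    torch.save(run_pipeline(1234), golden_file)  # captured before the split
    after = run_pipeline(1234)                   # rerun on the refactored code
    assert torch.equal(after, torch.load(golden_file))
```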
Decompose handler.py and llm_inference.py Incrementally
Break into responsibility-focused modules behind stable facades. Move first, then split internals. Avoid behavior changes in same PR as structural moves.
Add Future-Proof Stubs (off by default)
Introduce provider contracts and stub adapters (ollama/openai/anthropic), multi-LoRA composition scaffolding, and stem split/merge interfaces. Guard all with feature flags so runtime behavior stays unchanged.
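A sketch of how the feature-flag guard could look; the env-var prefix and flag name are invented for illustration, and everything defaults to off so runtime behavior stays unchanged:

```python
# feature_flags.py -- env-var flag guard sketch; names are proposals only.
import os


def flag(name: str) -> bool:
    return os.environ.get(f"ACESTEP_FF_{name}", "0") == "1"


def pick_text_provider() -> str:
    if flag("EXTERNAL_LLM_PROVIDERS"):
        # would consult the router over ollama / openai / anthropic adapters
        raise NotImplementedError("external providers are stubbed")
    return "acestep_local"  # default path: built-in local LLM
```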
Cutover and De-shim Gradually
Switch internal imports to new paths subsystem by subsystem. Keep shims through a deprecation window, then remove after downstream usage is updated. Keep rebasing-before-segment and the same safety gates until migration completes.
Segment execution template (repeat each PR):
Rebase on upstream.
Implement one scoped structural change.
Run fixed parity + targeted tests.
Merge quickly.
Start next segment from fresh rebase.
Revisions:
"Data" directory proposal renamed to "dataset" to prevent clash with .gitignored user data directory.
Progress update after merged PR #431 against decomposition plan #408:
Detailed variance + revised structure/breakdown plan
Compared against: decomposition plan #408 above.
Where this landed:
- First `handler.py` decomposition slice, focused on LoRA.
- Facade preserved at `acestep/core/generation/handler/__init__.py`.
- `progress.py` added in the handler path as part of the decomposition groundwork.

What is still pending from the original sequence:
Variance from original plan (intentional and required):
- Introduced `acestep/core/lora/*` earlier than planned, as a first-class reusable runtime domain.
- Moved from a single `lora_manager.py` to a two-layer model: `acestep/core/generation/handler/lora/*` on top of `acestep/core/lora/*`.
- `training/lora/*` stays scoped for training-specific concerns, while generation/runtime LoRA logic now resides in `core/lora/*`.

Revised plan doc:
`docs/functional-decomposition-plan-408-r1.md`

FD Update (#456 / #464): Init-Service decomposition progress
Completed today
1) Core decomposition slice delivered
- Extracted the init/service flows from `acestep/handler.py` into `acestep/core/generation/handler/init_service.py`.
- Kept the `AceStepHandler` API surface stable via mixin inheritance: `AceStepHandler(InitServiceMixin, LoraManagerMixin, ProgressMixin)`.
- Facade preserved at `acestep/core/generation/handler/__init__.py`.
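For readers unfamiliar with the pattern, this is roughly the shape of the facade after this slice; the mixin method names below are invented for illustration, only the composition line mirrors the actual change:

```python
# Illustrative shape of the mixin-based facade.
class InitServiceMixin:
    def initialize_service(self) -> None:  # model / service init flows
        ...


class LoraManagerMixin:
    def load_lora(self, path: str) -> None:  # LoRA load / unload / activate
        ...


class ProgressMixin:
    def report_progress(self, fraction: float) -> None:
        ...


class AceStepHandler(InitServiceMixin, LoraManagerMixin, ProgressMixin):
    """Public facade: the import path and API surface stay stable."""
```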
2) Review-driven fixes for pre-existing issues applied
- Corrected the `get_available_checkpoints` docstring to match its actual behavior/return value.
- Fixed `_empty_cache`.
- Re-wrapped moved weights as `torch.nn.Parameter` (no silent deregistration).
- Added an `_is_on_target_device` fallback for malformed device strings (warn + conservative `False`).
- Normalized `_device_type()` to backend tokens (`cuda`/`mps`/`xpu`/`cpu`) for the routing helpers.
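A standalone sketch of the fallback behavior described above; the real helper lives on the handler, and this version is illustrative only:

```python
# Sketch of _is_on_target_device: warn + conservative False on bad input.
import logging

logger = logging.getLogger(__name__)

VALID_BACKENDS = {"cuda", "mps", "xpu", "cpu"}


def _is_on_target_device(current: str, target: str) -> bool:
    try:
        cur_backend = current.split(":")[0]
        tgt_backend = target.split(":")[0]
        if cur_backend not in VALID_BACKENDS or tgt_backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend in {current!r} / {target!r}")
    except (AttributeError, ValueError) as exc:
        logger.warning("malformed device string (%s); assuming not on target", exc)
        return False  # conservative: force a (possibly redundant) move
    return cur_backend == tgt_backend  # device index ignored (deferred follow-up)
```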
3) Test coverage expanded
- Added `acestep/core/generation/handler/init_service_test.py`, including coverage of the `Parameter` handling.

4) Validation status
Branch / PR handling notes
- Rebased on `upstream/main`.

Known follow-up (explicitly deferred)
- Device index handling (`cuda:0`/`cuda:1`) is still not centralized.

Net result
This work package is a clean FD increment: behavior-preserving extraction plus targeted hardening and tests, with one explicit indexing follow-up tracked for a later slice.