Replies: 8 comments
-
Don't rush into refactoring yet. We first need to deliver a stable version that ensures device compatibility. We can discuss the refactoring plan afterward. Making too many changes at once will easily introduce new bugs and break usability. Once the stable version is out, you can develop the refactored version on the dev branch.
-
I'll be firm here: I think your attempts at stability will be hampered considerably by the current structure. My practical experience fixing the pre-process and train issues on low VRAM required far too many review-and-fix loops, simply because handler.py is awful to work with; it is almost impossible to read, even for an experienced eye like mine. I do have a day job and a family, so this will probably take at least a couple of weeks!
-
You’re right, refactoring is necessary. We can start your refactoring plan once this PR has been fully tested and verified; we can begin with the handler first.
-
I understand this takes time, and I’ve been thinking about how to optimize these parts every day. Thank you for your thoughtful suggestions and contributions.
-
Let me know if you want to do this as a one-person job, or whether I would be able to help.
-
@sigalarm If you want, this one should be simple and low risk: "Migrate UI Wiring First (low risk)". I wanted to start with the wiring myself, but @ChuxiJ wanted handler.py done first, and I somewhat agree: getting that sorted would be a big win. It's a major blocker for the project at the moment and needs a lot of tedious care and attention, so it's where I have actually started.
-
@1larity Since the refactoring would break a lot of components across the entire repo, I think we should go ahead and open issues for each part. I'm not sure whether @ChuxiJ wants us to do this ourselves or wants these as framed issues. Once again, if there are any open tasks regarding refactoring, I'd be more than happy to tackle them.
-
@jayvenn21 That would be helpful; it is a lot of work. If I may be bold, I would break the issues down using the "Monolith → Structured Domains" section of the plan, as the file structure at the start is quite mutable depending on how the monolithic files are actually implemented. This will also help contain regressions somewhat. Adding AGENTS.md has provided some good steering for agents, reducing the knowledge threshold required from contributors considerably.
-
@ChuxiJ, I grabbed some post-its and came up with a code structure hierarchy for decomposing the whole project.
For new players who might like to know what this is and why it is required:
Functional decomposition is needed because the codebase has grown into a few very large files that mix unrelated responsibilities (UI wiring, inference logic, API handling, model loading, etc.), which makes changes risky and hard to review.
By splitting these into small, focused modules, contributors can understand and modify one part without accidentally breaking others, tests become more targeted, and merge conflicts are easier to resolve in a fast-moving project. In practice, this lowers defect risk, speeds onboarding, and gives us a clean foundation for future features like external LLM providers, multi-LoRA composition, and stem workflows without turning core files into bottlenecks.
I'll just have to accept that the project moves rapidly, making a code freeze impractical, so I have designed a safe, rolling migration strategy, which I have put at the end.
I've also tried to future-proof it, so there are some stub proposals for interesting feature requests and ideas I've seen discussed. This is what I propose to implement:
Directory and file structure:
acestep/
├─ api/ # External service boundary (HTTP / WebSocket / auth / jobs)
│ ├─ http/ # FastAPI app composition, routes, schemas, middleware
│ │ ├─ app.py # Build FastAPI app and register routers
│ │ ├─ lifespan.py # Startup/shutdown resource management
│ │ ├─ auth.py # API key / JWT verification
│ │ ├─ routes_generation.py # Generation endpoints
│ │ ├─ routes_models.py # Model discovery endpoints
│ │ ├─ routes_health.py # Health / readiness endpoints
│ │ ├─ schemas_requests.py # Request DTOs
│ │ └─ schemas_responses.py # Response DTOs
│ └─ jobs/ # Async task execution and job state
│ ├─ store.py # Job persistence interface / in-memory impl
│ ├─ executor.py # Worker orchestration and retries
│ └─ models.py # Job record / status models
│
├─ core/ # Domain logic, no UI or transport coupling
│ ├─ generation/ # Music generation orchestration
│ │ ├─ handler/ # Decomposed replacement for monolithic handler.py
│ │ │ ├─ __init__.py # Facade preserving public API
│ │ │ ├─ init_service.py # Model / service init flows
│ │ │ ├─ lora_manager.py # LoRA load / unload / activate logic
│ │ │ ├─ lora_stack.py # Stub: multi-LoRA merge / weights ordering
│ │ │ ├─ stem_pipeline.py # Stub: stem split / join orchestration
│ │ │ ├─ diffusion.py # Diffusion inference loop
│ │ │ ├─ decode.py # Latent / audio decode pipeline
│ │ │ ├─ io_audio.py # Audio input normalization / parsing
│ │ │ └─ progress.py # Progress estimation and telemetry
│ │ ├─ inference/ # Param validation + high-level generation functions
│ │ │ ├─ params.py
│ │ │ ├─ config.py
│ │ │ ├─ generate.py
│ │ │ ├─ sample.py
│ │ │ └─ format_text.py
│ │ └─ logits/ # Constraint decoding internals
│ │ ├─ processor.py
│ │ ├─ masks.py
│ │ └─ rules.py
│ │
│ ├─ llm/ # Text planning / inference abstraction layer
│ │ ├─ contracts/ # Provider-agnostic interfaces
│ │ │ ├─ provider.py # TextInferenceProvider protocol / base class
│ │ │ └─ planner.py # Stub: planning interface for long prompts / workflows
│ │ ├─ providers/ # Concrete provider adapters
│ │ │ ├─ acestep_local.py # Current built-in local LLM adapter
│ │ │ ├─ ollama.py # Stub: Ollama adapter
│ │ │ ├─ openai.py # Stub: OpenAI adapter
│ │ │ └─ anthropic.py # Stub: Claude / Anthropic adapter
│ │ └─ runtime/ # Routing, fallback, rate-limit, provider selection
│ │ ├─ router.py # Chooses provider by policy / config
│ │ ├─ fallback.py # Retry / fallback chain logic
│ │ └─ prompts.py # Prompt templates and rendering
│ │
│ ├─ audio/ # Reusable DSP / audio utility components
│ │ ├─ normalize.py
│ │ ├─ resample.py
│ │ ├─ loudness.py
│ │ ├─ codec.py
│ │ └─ stems/ # Stub area for stem functionality
│ │ ├─ splitter.py # Stub: source separation interface
│ │ ├─ merger.py # Stub: stem recomposition rules
│ │ └─ metadata.py # Stub: stem labels / types / schema
│ │
│ ├─ scoring/ # Alignment / evaluation metrics
│ │ └─ alignment/
│ │ ├─ aligner.py
│ │ ├─ lyric_scorer.py
│ │ └─ metrics.py
│ │
│ └─ system/ # Infra helpers used across domains
│ ├─ gpu/
│ │ ├─ detect.py
│ │ ├─ config.py
│ │ └─ policy.py
│ ├─ models/
│ │ ├─ downloader.py
│ │ └─ resolver.py
│ ├─ cache/
│ │ └─ local.py
│ ├─ constants.py
│ └─ debug.py
│
├─ dataset/ # Dataset lifecycle (runtime + training prep)
│ ├─ runtime/ # Dataset ops used by app / UI at runtime
│ │ └─ dataset_handler.py
│ └─ builder/ # Offline dataset build / preprocess pipeline
│ ├─ scan.py
│ ├─ preprocess.py
│ ├─ labels.py
│ ├─ metadata.py
│ ├─ serialization.py
│ └─ models.py
│
├─ training/ # Fine-tuning and training orchestration
│ ├─ trainer/
│ │ ├─ trainer.py
│ │ ├─ loop.py
│ │ ├─ checkpointing.py
│ │ └─ metrics.py
│ ├─ data/
│ │ ├─ module.py
│ │ └─ collate.py
│ └─ lora/
│ ├─ config.py
│ ├─ apply.py
│ ├─ save_load.py
│ └─ compose.py # Stub: multi-LoRA composition strategies
│
└─ ui/ # Presentation layer only
└─ gradio/
├─ events/ # UI event binding and callback wiring
│ ├─ __init__.py # Facade only; imports wiring modules
│ ├─ generation_wiring.py
│ ├─ sample_wiring.py
│ ├─ metadata_wiring.py
│ ├─ results_wiring.py
│ └─ training_wiring.py
├─ interfaces/ # UI layout / components construction
│ ├─ generation_layout.py
│ ├─ training_layout.py
│ ├─ result_layout.py
│ └─ dataset_layout.py
├─ i18n/ # Localization loading and lookup
│ ├─ service.py
│ └─ loaders.py
└─ api/ # UI-specific API bridge (if needed)
├─ routes.py
└─ dto.py
Split plan:
Per-file Assessment (all first-party code files)
Split now (currently >200 LOC):
Decomposition / Refactor Plan
Monolith → Structured Domains
handler.py
└─ acestep/core/generation/handler/ # Major monolith split by responsibility
├─ init.py # Public facade
├─ init_service.py # Model / service initialization
├─ diffusion.py # Diffusion inference loop
├─ decode.py # Latent → audio decode
├─ io_audio.py # Audio input parsing / normalization
├─ progress.py # Progress + telemetry
├─ lora_manager.py # LoRA load / unload / activate
├─ lora_stack.py # Stub: multi-LoRA composition
└─ stem_pipeline.py # Stub: stem split / join orchestration
llm_inference.py
└─ acestep/core/llm/runtime/
├─ router.py # Provider selection / routing
├─ fallback.py # Retry + fallback logic
└─ prompts.py # Prompt templates / rendering
api_server.py
└─ acestep/api/
├─ http/
│ ├─ app.py # FastAPI app wiring
│ ├─ auth.py # Auth / API key handling
│ ├─ routes_generation.py # Generation endpoints
│ ├─ routes_models.py # Model discovery
│ └─ routes_health.py # Health / readiness
└─ jobs/
├─ executor.py # Async job orchestration
├─ store.py # Job persistence
└─ models.py # Job state models
constrained_logits_processor.py
└─ acestep/core/generation/logits/
├─ processor.py # Logits processor
├─ masks.py # Token masks
└─ rules.py # Constraint rules
results_handlers.py
└─ ui/gradio/events/
├─ results_wiring.py # Event wiring
└─ helpers.py # Shared helpers
cli.py
└─ acestep/app/cli/
├─ __init__.py # Facade
├─ generate.py # Generation commands
├─ train.py # Training commands
└─ models.py # Model management commands
inference.py
└─ acestep/core/generation/inference/
├─ params.py # Parameter schemas / validation
├─ config.py # Inference config
├─ generate.py # High-level generation entrypoint
└─ sample.py # Sampling utilities
profile_inference.py
└─ acestep/app/profiling/
├─ __init__.py
├─ profiler.py # Profiling harness
└─ report.py # Timing / memory reports
__init__.py
└─ keep as facade only; move logic into wiring / runtime modules
openrouter_api_server.py
└─ options:
├─ openrouter/api/ # If kept as separate boundary
└─ or merge into:
acestep/api/http/
generation_handlers.py
└─ ui/gradio/events/
├─ generation_wiring.py # Generation flow wiring
└─ helpers.py
dit_alignment_score.py
└─ acestep/core/scoring/alignment/
├─ aligner.py
├─ lyric_scorer.py
└─ metrics.py
training_handlers.py
└─ ui/gradio/events/
├─ training_wiring.py
└─ helpers.py
generation.py
└─ ui/gradio/interfaces/
├─ generation_layout.py # Main layout
└─ sections/ # Logical UI sections
├─ prompt.py
├─ params.py
└─ output.py
openrouter_adapter.py
└─ openrouter/
├─ openrouter_client.py # HTTP client
├─ requests.py # Request builders
└─ responses.py # Response parsing
stress_test.py
└─ tests/stress/
├─ scenarios/
│ ├─ long_prompt.py
│ └─ batch_load.py
└─ runner.py
trainer.py
└─ training/trainer/
├─ trainer.py
├─ loop.py
├─ checkpointing.py
└─ metrics.py
model_downloader.py
└─ acestep/core/system/models/
├─ downloader.py
└─ resolver.py
training.py
└─ ui/gradio/interfaces/
├─ training_layout.py
└─ sections/
├─ dataset.py
├─ lora.py
└─ run.py
client_test.py
└─ tests/client/
├─ test_generation.py
└─ helpers.py
gpu_config.py
└─ acestep/core/system/gpu/
├─ detect.py
├─ config.py
└─ policy.py
result.py
└─ ui/gradio/interfaces/
├─ result_layout.py
└─ sections/
├─ summary.py
├─ audio.py
└─ metadata.py
api_routes.py
└─ ui/gradio/api/
├─ routes.py
└─ dto.py
data_module.py
└─ training/data/
├─ module.py
└─ collate.py
lora_utils.py
└─ training/lora/
├─ config.py
├─ apply.py
├─ save_load.py
└─ compose.py
acestep_v15_pipeline.py
└─ acestep/core/pipeline/acestep_v15/
├─ pipeline.py
├─ stages.py
└─ config.py
test_time_scaling.py
└─ tests/perf/
├─ scaling.py
└─ benchmarks.py
audio_utils.py
└─ acestep/core/audio/
├─ normalize.py
├─ resample.py
├─ loudness.py
└─ codec.py
check_gpu.py
└─ acestep/core/system/gpu/
├─ detect.py
├─ report.py
└─ bench.py
openrouter_models.py
└─ openrouter/models/
├─ catalog.py
├─ schema.py
└─ filters.py
preprocess.py
└─ dataset/builder/
├─ scan.py
├─ preprocess.py
├─ labels.py
├─ metadata.py
└─ serialization.py
Relocate (mostly keep, 80–200 LOC)
constants.py → core/system/constants.py
generate_examples.py → generate_examples.py
i18n.py → ui/gradio/i18n/service.py
local_cache.py → core/system/cache/local.py
label_single.py → dataset/builder/labels.py
prepare_vae_calibration_data.py → prepare_calibration.py
debug_utils.py → core/system/debug.py
models.py → models.py
configs.py → configs.py
dataset.py → ui/gradio/interfaces/dataset_layout.py
__init__.py → keep facade
scan.py → scan.py
dataset_handler.py → dataset/runtime/dataset_handler.py
Keep as Small Leaf Modules (<80 LOC)
__init__.py (facades)
dataset_builder.py
audio_io.py
builder.py
core.py
csv_metadata.py
dataframe.py
label_all.py
label_utils.py
metadata.py
preprocess_audio.py
preprocess_context.py
preprocess_encoder.py
preprocess_lyrics.py
preprocess_manifest.py
preprocess_text.py
preprocess_utils.py
preprocess_vae.py
serialization.py
update_sample.py
Vendored Code
acestep/third_parts/nano-vllm/*
→ No decomposition unless explicitly forked
Stub Contracts for Proposed Features
External Text Providers
├─ provider.py
│ ├─ generate_text(request) → response
│ ├─ plan(request) → plan_response
│ └─ health() → provider_status
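To make that contract concrete, here is a minimal sketch of what `provider.py` could define. All dataclass names and fields here are illustrative proposals, not existing repo API:

```python
# Illustrative sketch of contracts/provider.py; every name is a proposal.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class TextRequest:
    prompt: str
    max_tokens: int = 512


@dataclass
class TextResponse:
    text: str
    provider: str


@dataclass
class PlanResponse:
    steps: list[str] = field(default_factory=list)


@dataclass
class ProviderStatus:
    name: str
    available: bool
    detail: str = ""


class TextInferenceProvider(Protocol):
    """Contract every adapter (acestep_local, ollama, openai, ...) must satisfy."""

    def generate_text(self, request: TextRequest) -> TextResponse: ...

    def plan(self, request: TextRequest) -> PlanResponse: ...

    def health(self) -> ProviderStatus: ...
```

Using a `Protocol` rather than an abstract base class keeps the adapters free of any import dependency on the contracts module.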
Multi-LoRA
├─ lora_stack.py
│ ├─ register_lora(name, path)
│ ├─ set_active_loras([{name, weight}])
│ └─ compose_strategy(strategy_name) # sum / sequential / gated
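A sketch of the `lora_stack.py` surface implied above; the module-level registry and the strategy names are placeholders for whatever the real implementation settles on:

```python
# lora_stack.py -- stub sketch; registry layout and strategies are proposals.
_REGISTRY: dict[str, str] = {}   # name -> checkpoint path
_ACTIVE: list[dict] = []         # [{"name": ..., "weight": ...}]
_STRATEGY = "sum"                # sum / sequential / gated


def register_lora(name: str, path: str) -> None:
    _REGISTRY[name] = path


def set_active_loras(loras: list[dict]) -> None:
    unknown = [l["name"] for l in loras if l["name"] not in _REGISTRY]
    if unknown:
        raise KeyError(f"unregistered LoRAs: {unknown}")
    _ACTIVE[:] = loras


def compose_strategy(strategy_name: str) -> None:
    global _STRATEGY
    if strategy_name not in {"sum", "sequential", "gated"}:
        raise ValueError(f"unknown strategy: {strategy_name}")
    _STRATEGY = strategy_name
```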
Stem Split / Join
├─ splitter.py
│ └─ split(audio) → {vocals, drums, bass, other, ...}
└─ merger.py
└─ merge(stems, gains, pan) → audio
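And a sketch of the stem interfaces, assuming numpy arrays as the audio interchange type (pan handling and the separation backend are deliberately left open):

```python
# Interface stubs for stems/splitter.py and stems/merger.py.
from __future__ import annotations

import numpy as np


def split(audio: np.ndarray) -> dict[str, np.ndarray]:
    """Return named stems; a real impl would call a source-separation model."""
    raise NotImplementedError("source-separation backend not selected yet")


def merge(stems: dict[str, np.ndarray],
          gains: dict[str, float] | None = None) -> np.ndarray:
    """Recombine stems with optional per-stem gain (pan omitted in this sketch)."""
    gains = gains or {}
    return sum(stem * gains.get(name, 1.0) for name, stem in stems.items())
```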
Migration plan:
Due to the rapidly moving nature of the code-base, a code freeze, as would be usual for such a refactor, is impractical; I will have to use a rolling migration strategy.
Continuous Migration Model (no freeze)
Run decomposition as a rolling effort on top of upstream. Before each segment, rebase to latest upstream; after rebase, run a fixed quick parity suite. Keep each segment scoped to one subsystem and short-lived (target 1-3 days) to minimize conflict cost.
Define Safety Gates Once
Establish a stable “must-pass” gate used every segment: Gradio startup, API /health, one representative generation path, import smoke, and touched-subsystem tests. Add LOC policy checks (warn >150, fail >200) for first-party modules.
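A minimal sketch of the LOC policy check as a standalone CI script, using the thresholds above; the `acestep` root, blank-line exclusion, and vendored-code skip are my assumptions:

```python
#!/usr/bin/env python3
# loc_gate.py -- sketch of the warn >150 / fail >200 LOC policy check.
import pathlib
import sys

WARN, FAIL = 150, 200

failed = False
for path in pathlib.Path("acestep").rglob("*.py"):
    if "third_parts" in path.parts:  # skip vendored code (nano-vllm etc.)
        continue
    text = path.read_text(encoding="utf-8", errors="ignore")
    loc = sum(1 for line in text.splitlines() if line.strip())
    if loc > FAIL:
        print(f"FAIL {path}: {loc} LOC > {FAIL}")
        failed = True
    elif loc > WARN:
        print(f"WARN {path}: {loc} LOC > {WARN}")

sys.exit(1 if failed else 0)
```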
Create Target Package Skeleton
Add the target folder hierarchy (api/core/ui/dataset/training) and __init__.py intent comments. This is structure-only, with zero behavior change.
Add Compatibility Facades
For modules that will move, keep old import paths alive via thin re-exports. This lets upstream continue merging while decomposition proceeds.
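A sketch of one such facade, assuming `AceStepHandler` moves as planned below; the `DeprecationWarning` is optional during the migration window:

```python
# acestep/handler.py -- compatibility shim; real code now lives under core/.
# Old imports like `from acestep.handler import AceStepHandler` keep working.
import warnings

from acestep.core.generation.handler import AceStepHandler  # noqa: F401

warnings.warn(
    "acestep.handler is deprecated; import from "
    "acestep.core.generation.handler instead",
    DeprecationWarning,
    stacklevel=2,
)
```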
Migrate UI Wiring First (low risk)
Split __init__.py into dedicated wiring files, keeping __init__.py as a facade. Then split the large interface layout files similarly. Validate that all event bindings still work.
Migrate API Monolith
Decompose api_server.py into api/http/* and api/jobs/* while preserving existing entrypoint behavior. Validate endpoint parity (/health, /v1/models, /release_task, /query_result, /v1/audio).
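A minimal parity smoke sketch for the GET endpoints, assuming the pre-split and post-split servers can run side by side on local ports (the ports and side-by-side setup are assumptions; POST routes such as /release_task would need recorded request fixtures instead):

```python
# parity_smoke.py -- compare old vs new server responses on shared endpoints.
import requests

OLD, NEW = "http://127.0.0.1:8000", "http://127.0.0.1:8001"
ENDPOINTS = ["/health", "/v1/models"]

for ep in ENDPOINTS:
    old = requests.get(OLD + ep, timeout=10)
    new = requests.get(NEW + ep, timeout=10)
    assert old.status_code == new.status_code, f"status drift on {ep}"
    assert old.json() == new.json(), f"payload drift on {ep}"
    print(f"OK {ep}")
```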
Decompose Inference and Logits Core
Split inference.py and constrained_logits_processor.py into cohesive modules (params/config/generate, processor/rules/masks). Use deterministic seed-based parity checks.
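A sketch of what a seed-based parity check could look like under pytest; `run_pipeline` is a stand-in for whichever entrypoint is being split, and the golden-capture flow (save before the split, compare after) is the assumed workflow:

```python
# test_parity.py -- deterministic seed parity sketch (pytest).
import torch


def run_pipeline(seed: int) -> torch.Tensor:
    """Stand-in for the real generation entrypoint under test."""
    torch.manual_seed(seed)
    return torch.randn(1, 8, 16)  # placeholder for the real diffusion call


def test_seed_parity(tmp_path):
    golden_file = tmp_path / "seed1234.pt"
    torch.save(run_pipeline(1234), golden_file)  # captured before the split
    after = run_pipeline(1234)                   # rerun on the refactored code
    assert torch.equal(after, torch.load(golden_file))
```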
Decompose handler.py and llm_inference.py Incrementally
Break into responsibility-focused modules behind stable facades. Move first, then split internals. Avoid behavior changes in same PR as structural moves.
Add Future-Proof Stubs (off by default)
Introduce provider contracts and stub adapters (ollama/openai/anthropic), multi-LoRA composition scaffolding, and stem split/merge interfaces. Guard all with feature flags so runtime behavior stays unchanged.
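A sketch of how the feature-flag guard could look; the env-var prefix and flag name are invented for illustration, and everything defaults to off so runtime behavior stays unchanged:

```python
# feature_flags.py -- env-var flag guard sketch; names are proposals only.
import os


def flag(name: str) -> bool:
    return os.environ.get(f"ACESTEP_FF_{name}", "0") == "1"


def pick_text_provider() -> str:
    if flag("EXTERNAL_LLM_PROVIDERS"):
        # would consult the router over ollama / openai / anthropic adapters
        raise NotImplementedError("external providers are stubbed")
    return "acestep_local"  # default path: built-in local LLM
```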
Cutover and De-shim Gradually
Switch internal imports to new paths subsystem by subsystem. Keep shims through a deprecation window, then remove after downstream usage is updated. Keep rebasing-before-segment and the same safety gates until migration completes.
Segment execution template (repeat each PR):
Rebase on upstream.
Implement one scoped structural change.
Run fixed parity + targeted tests.
Merge quickly.
Start next segment from fresh rebase.
Revisions:
"Data" directory proposal renamed to "dataset" to prevent clash with .gitignored user data directory.
Progress update after merged PR #431 against decomposition plan #408:
Detailed variance + revised structure/breakdown plan
Compared against: decomposition plan #408 above.
Where this landed:
- First `handler.py` decomposition slice, focused on LoRA.
- Facade preserved at `acestep/core/generation/handler/__init__.py`.
- `progress.py` added in the handler path as part of the decomposition groundwork.

What is still pending from the original sequence:
Variance from original plan (intentional and required):
- Introduced `acestep/core/lora/*` earlier than planned, as a first-class reusable runtime domain.
- Moved from a single `lora_manager.py` to a two-layer model: `acestep/core/generation/handler/lora/*` on top of `acestep/core/lora/*`.
- `training/lora/*` stays scoped for training-specific concerns, while generation/runtime LoRA logic now resides in `core/lora/*`.

Revised plan doc:
`docs/functional-decomposition-plan-408-r1.md`

FD Update (#456 / #464): Init-Service decomposition progress
Completed today
1) Core decomposition slice delivered
- Extracted the init/service flows from `acestep/handler.py` into `acestep/core/generation/handler/init_service.py`.
- Kept the `AceStepHandler` API surface stable via mixin inheritance: `AceStepHandler(InitServiceMixin, LoraManagerMixin, ProgressMixin)`.
- Facade preserved at `acestep/core/generation/handler/__init__.py`.
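For readers unfamiliar with the pattern, this is roughly the shape of the facade after this slice; the mixin method names below are invented for illustration, only the composition line mirrors the actual change:

```python
# Illustrative shape of the mixin-based facade.
class InitServiceMixin:
    def initialize_service(self) -> None:  # model / service init flows
        ...


class LoraManagerMixin:
    def load_lora(self, path: str) -> None:  # LoRA load / unload / activate
        ...


class ProgressMixin:
    def report_progress(self, fraction: float) -> None:
        ...


class AceStepHandler(InitServiceMixin, LoraManagerMixin, ProgressMixin):
    """Public facade: the import path and API surface stay stable."""
```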
2) Review-driven fixes for pre-existing issues applied
- Corrected the `get_available_checkpoints` docstring to match its actual behavior/return value.
- Fixed `_empty_cache`.
- Re-wrapped moved weights as `torch.nn.Parameter` (no silent deregistration).
- Added an `_is_on_target_device` fallback for malformed device strings (warn + conservative `False`).
- Normalized `_device_type()` to backend tokens (`cuda`/`mps`/`xpu`/`cpu`) for the routing helpers.
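A standalone sketch of the fallback behavior described above; the real helper lives on the handler, and this version is illustrative only:

```python
# Sketch of _is_on_target_device: warn + conservative False on bad input.
import logging

logger = logging.getLogger(__name__)

VALID_BACKENDS = {"cuda", "mps", "xpu", "cpu"}


def _is_on_target_device(current: str, target: str) -> bool:
    try:
        cur_backend = current.split(":")[0]
        tgt_backend = target.split(":")[0]
        if cur_backend not in VALID_BACKENDS or tgt_backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend in {current!r} / {target!r}")
    except (AttributeError, ValueError) as exc:
        logger.warning("malformed device string (%s); assuming not on target", exc)
        return False  # conservative: force a (possibly redundant) move
    return cur_backend == tgt_backend  # device index ignored (deferred follow-up)
```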
3) Test coverage expanded
- Added `acestep/core/generation/handler/init_service_test.py`, including coverage of the `Parameter` handling.

4) Validation status
Branch / PR handling notes
- Rebased on `upstream/main`.

Known follow-up (explicitly deferred)
- Device index handling (`cuda:0`/`cuda:1`) is still not centralized.

Net result
This work package is a clean FD increment: behavior-preserving extraction plus targeted hardening and tests, with one explicit indexing follow-up tracked for a later slice.