
Refactor losses instantiation and chunked CE #2531


Merged
25 commits merged into pytorch:main from felipemello1:loss_refactor on Apr 30, 2025

Conversation

@felipemello1 (Contributor) commented on Mar 27, 2025

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

IMPORTANT: Recipes do NOT work with the older version of ChunkedCrossEntropy anymore, because we no longer expect the transformer to chunk the outputs.

Problem:

  1. We have seen many chunked losses being added to torchtune. The current setup puts the chunking burden on the model.
  2. Users are interested in using losses that take model.output.weight as an input, e.g. Liger losses.

Solution:

  1. Enable the recipe to call loss(weight, input, targets)
  2. Reimplement ChunkedCE so that chunking and projection happen in the loss.
  3. Add a protocol so that new losses can follow the same pattern (see the sketch below).
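
For readers skimming the PR, a minimal sketch of the call pattern this enables is shown below. The names (get_output_weight, the linear_loss flag) come from hunks quoted later in this thread; the body is illustrative, not the PR's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearCrossEntropySketch(nn.Module):
    """Chunking and the final projection live inside the loss, not the model."""

    linear_loss: bool = True  # recipes check this flag to decide whether to pass the weight

    def __init__(self, num_output_chunks: int = 8, ignore_index: int = -100):
        super().__init__()
        self.num_output_chunks = num_output_chunks
        self.ignore_index = ignore_index

    def forward(
        self, weight: torch.Tensor, outputs: torch.Tensor, targets: torch.Tensor
    ) -> torch.Tensor:
        # outputs: (b, s, hidden) hidden states; weight: (vocab, hidden) output projection
        total_loss = 0.0
        total_elements = (targets != self.ignore_index).sum()
        hidden_chunks = outputs.tensor_split(self.num_output_chunks, dim=1)
        target_chunks = targets.tensor_split(self.num_output_chunks, dim=1)
        for hidden, target in zip(hidden_chunks, target_chunks):
            logits = F.linear(hidden, weight)  # projection happens per chunk, inside the loss
            total_loss = total_loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)).float(),
                target.reshape(-1),
                ignore_index=self.ignore_index,
                reduction="sum",
            )
        return total_loss / total_elements


# Recipe side (simplified): the model skips its output layer, the recipe fetches
# the output projection weight and hands it to the loss.
# weight = model.get_output_weight()
# loss = loss_fn(weight, hidden_states, labels)
```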

PROFILING: https://drive.google.com/drive/folders/1jHOCuOF74F9lmmJv7wxbcK-i_wtB2stf?usp=sharing

Changelog

  • Updated full_distributed and lora_distributed
  • Tested with lora llama 3.2 distributed (TiedLinear)
  • Implemented new ChunkedCE

TODO: once approved, roll this out to the other recipes/losses and update the configs.

Test

ChunkedCrossEntropyLoss

tune run --nproc_per_node 2 lora_finetune_distributed --config /data/users/felipemello/torchtune/recipes/configs/llama3_2/1B_lora.yaml \
metric_logger=torchtune.training.metric_logging.WandBLogger \
dataset.packed=True \
dataset.split=train[:50%] \
tokenizer.max_seq_len=4096 \
gradient_accumulation_steps=1 \
batch_size=4 \
max_steps_per_epoch=20 \
compile=True \
use_output_weight_in_loss=True \
loss=torchtune.modules.loss.sft_losses.ChunkedCrossEntropyLoss


To reproduce

Fork https://github.com/pytorch/torchtune, then:
git clone https://github.com/<YOUR_GITHUB_USER>/torchtune.git

cd torchtune
conda create -n torchtune python=3.11
conda activate torchtune
pip install --pre --upgrade torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu124
pip install -e ".[dev]"
pre-commit install

git remote add felipemello1 https://github.com/felipemello1/torchtune.git
git checkout -b loss_refactor felipemello1/loss_refactor

pytorch-bot commented on Mar 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2531

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f4b5fc1 with merge base 1be43b6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Mar 27, 2025
@felipemello1 mentioned this pull request on Mar 27, 2025
@felipemello1 changed the title from "Refactor losses installation and chunked CE" to "Refactor losses instantiation and chunked CE" on Mar 27, 2025
@felipemello1 marked this pull request as draft on March 31, 2025
@felipemello1 marked this pull request as ready for review on March 31, 2025

return total_loss / total_elements


class ChunkedCrossEntropywithAutograd(torch.autograd.Function):

Member: Why did you want to add these Autograd versions? How does this help you test?

Contributor (Author): This version is based on Horace's code from a few months back. In this implementation, the chunks are not held in memory. He wrote it to show that you don't need Triton.

I don't want to keep it in torchtune, because it would be hard to use for KD/RL losses. This is more of a reference for the compile folks. They are working on enabling chunking in compile to match the autograd memory perf.

Contributor (Author): Without autograd: [profiler screenshot]

Contributor (Author): With autograd: [profiler screenshot]

Member: Can you put a comment in the code to that effect?

@SalmanMohammadi (Contributor): @felipemello1 this is awesome. Out of curiosity, did you happen to benchmark against the existing CEWithChunkedOutputLoss?

I wonder if we could simplify the configuration further by removing the need for the user to also specify use_output_weight_in_loss? Could we define a

class BaseLoss(Protocol):
    is_chunked: bool

and do

-                if self.use_output_weight_in_loss:
+                if self.loss_fn.is_chunked:
                    weight = self._model.get_output_weight()
                    current_loss = self._loss_fn(weight, outputs, labels)
                else:
                    labels = labels.reshape(-1)
                    logits = logits.reshape(-1, logits.size(-1))
                    outputs = outputs.reshape(-1, outputs.size(-1))
                    current_loss = self._loss_fn(outputs, labels)

It would require either 1) requiring that all losses use this protocol (which tbh I wouldn't be opposed to as we start to support more custom losses without needing to modify recipes), or 2) doing a hasattr check on self._loss_fn and relying on an identifying field on just the chunked losses.

wdyt?

@felipemello1 (Contributor, Author) commented on Apr 4, 2025

I wonder if we could simplify the configuration further by removing the need for the user to also specify use_output_weight_in_loss?

@SalmanMohammadi, I thought about it and even implemented it, but then realized that it would be hard to support 3rd-party libraries unless we create some sort of loss adapter, which we may need to do anyway, because not all libraries follow the pattern (weight, input, label). They may follow (label, weight, input), for example.

the loss adapter could be something like:

config.yaml

loss:
  _component_: torchtune.loss.lossadapter
  loss: path.to.loss
  requires_weight_input: True
  input_order: ["label", "weight", "input"]
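
A rough Python sketch of what that adapter could look like (hypothetical; LossAdapter and its argument names are illustrative, not part of torchtune):

```python
from typing import List

import torch.nn as nn


class LossAdapter(nn.Module):
    """Wraps a third-party loss so recipes can always call loss(weight, input, label)."""

    def __init__(self, loss: nn.Module, requires_weight_input: bool, input_order: List[str]):
        super().__init__()
        self.loss = loss
        self.linear_loss = requires_weight_input  # flag the recipe checks
        self.input_order = input_order            # e.g. ["label", "weight", "input"]

    def forward(self, weight, input, label):
        named = {"weight": weight, "input": input, "label": label}
        # Reorder arguments into whatever convention the wrapped loss expects
        return self.loss(*(named[name] for name in self.input_order))
```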

# Shift labels to compute loss
# equivalent to doing labels[..., 1:] and logits[..., :-1, :]
# But this way we dont need to slice the logits. We just add an ignore index to labels.
labels = torch.hstack(
    (labels[..., 1:], self.ignore_labels_cache[: labels.shape[0]])
)
if not isinstance(logits, list):

if self.use_output_weight_in_loss:

Contributor: very nice
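
For reference, a toy check of the shift trick quoted above (illustrative; an ignore_index of -100 is assumed, matching F.cross_entropy's default):

```python
import torch
import torch.nn.functional as F

ignore_index = -100
logits = torch.randn(2, 5, 11)          # (batch, seq, vocab)
labels = torch.randint(0, 11, (2, 5))

# Pad the left-shifted labels with ignore_index instead of slicing the logits
shifted = torch.hstack((labels[..., 1:], torch.full((2, 1), ignore_index)))
a = F.cross_entropy(logits.reshape(-1, 11), shifted.reshape(-1), ignore_index=ignore_index)

# Classic formulation: slice both logits and labels
b = F.cross_entropy(logits[:, :-1, :].reshape(-1, 11), labels[:, 1:].reshape(-1))

assert torch.allclose(a, b)  # same loss, no logits slicing needed
```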

# set num_output_chunks for model
self._model.set_num_output_chunks(self._loss_fn.num_output_chunks)
# The loss may handle the output projection. If true, the model should skip it.
self.use_output_weight_in_loss = getattr(

Contributor: tangential point: if the contract is that SFT losses follow the protocols defined in loss_protocols, do we need to make this check?

Contributor (Author): Someone may try to use a loss that is not from torchtune, e.g. vanilla F.cross_entropy.
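
The getattr default quoted above is what keeps that case working; roughly (the flag name comes from the diff, the helper is illustrative):

```python
import torch.nn as nn


def uses_output_weight(loss_fn: nn.Module) -> bool:
    """Mirrors the recipe's check: losses without the flag fall back to the logits path."""
    return getattr(loss_fn, "linear_loss", False)


assert uses_output_weight(nn.CrossEntropyLoss()) is False  # vanilla loss keeps working
```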



class SFTLoss(Protocol):
"""Protocol for loss functions in torchtune used in sft recipes."""

Contributor suggested change:
-    """Protocol for loss functions in torchtune used in sft recipes."""
+    """Protocol for loss functions in torchtune used in SFT recipes."""

Contributor (Author): I don't know if I like "SFT" here, since it may not be obvious to a new reader what it means.

@Andrei-Aksionov: Well, I am a new reader, and if I see all capitalized letters I immediately think that it's an abbreviation, not just a word. Actually, it's an initialism, as I've just learned.



class SFTLossWithProjection(Protocol):
"""Protocol for loss functions in torchtune used in Supervised Finetune recipes and that require

Contributor suggested change:
-    """Protocol for loss functions in torchtune used in Supervised Finetune recipes and that require
+    """Protocol for loss functions in torchtune used in SFT recipes and that require

Contributor (Author): I prefer spelling it out; I don't know if I like "SFT" here, since it may not be obvious to a new reader what it means.

@SalmanMohammadi (Contributor) left a comment: real nice

@pbontrager (Contributor) left a comment: Thanks for this big effort. This looks good and I'm happy to approve it now. Please finish going through and resolving the open comments before landing.

Comment on lines 347 to 349
# skip final projection, since the loss takes hidden input instead of logits
self.skip_unembedding = cfg.get("loss_takes_embeddings", False)
self._model.set_skip_unembedding(self.skip_unembedding)

Contributor: nit: skip_output_layer

import torch


class SFTLossWithProjection(Protocol):

Contributor: I agree that this name is confusing. I think we should just standardize on "fused", "linear", or "chunked". All the names have issues, which we've discussed, but if we're consistent, at least people should be able to learn the term quickly.

target_chunks[idx],
)

return total_loss / total_elements

Contributor: nit: it'd be nice to offer the same 'reduction' option as most PyTorch losses to control returning the mean, sum, or no reduction.

@@ -301,9 +301,12 @@ def setup(self, cfg: DictConfig) -> None:
if self._compile:

Contributor: What's the plan for rolling this out to the other SFT recipes?

Contributor (Author):

  1. Recipes NOT being updated should still work with configs NOT being updated
  2. Recipes being updated should NOT work anymore with old ce_with_chunked_outputs_loss
  3. So any recipe that is changed also requires the configs to be updated with the new loss

TODO: need to check if the deprecation warnings work fine. This can be checked by running a recipe/config that has not been updated.

@@ -396,6 +400,7 @@ def __init__(
self.head_dim = head_dim
self.causal_mask = None
self.num_output_chunks = 0
self._skip_output_projection = False

Contributor: You should enforce in __init__ that the output module has the "weight" property.
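
A minimal version of that check could look like this (sketch only; where exactly it lands in __init__ is up to the PR):

```python
import torch.nn as nn


def validate_output_module(output: nn.Module) -> None:
    # Fail fast at construction time rather than on the first loss call
    if not hasattr(output, "weight"):
        raise ValueError(
            "Expected the output module to expose a `weight` attribute so the "
            "loss can perform the final projection."
        )
```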

Comment on lines +360 to +363
elif getattr(self._loss_fn, "linear_loss", False):
raise ValueError(
"Linear losses are not supported yet for KD. Please use the deprecated CEWithChunkedOutputLoss."
)

Contributor: Wrong error message? Also, can we open a high-priority issue for this? I don't like being in a state where half our configs are on a deprecated API.

Comment on lines +37 to +41
msg = (
"'CEWithChunkedOutputLoss' is deprecated and will be removed in future versions. "
"Please use `torchtune.modules.loss.LinearCrossEntropyLoss` instead."
)
log_once(logger=logger, msg=msg, level=logging.WARNING)

Contributor: nit: isn't this what the deprecated decorator is for?

@@ -42,6 +52,13 @@ def compute_cross_entropy(
logits.float(), labels, ignore_index=self.ignore_index, reduction="sum"
)

def apply_compile_strategy(self, *args, **kwargs):
"""Applies compile only to the fkl_loss function."""

Contributor: this isn't fkl?

@Andrei-Aksionov: No, it's not :)

@@ -96,6 +100,12 @@ def __init__(
def set_num_output_chunks(self, num_output_chunks: int) -> None:
"""Used to save memory in combination with :class:`~torchtune.modules.loss.CEWithChunkedOutputLoss`.
This should be called before the first forward pass, in the recipe."""
msg = (

Contributor: Same here, can we use the deprecated decorator?

"""
# Accessing the weight directly will not trigger FSDP hooks
# to gather the full tensor so we have to unshard manually
if isinstance(self.output, FSDPModule):

Contributor: So this is not relevant for early fusion or deep fusion?

# to gather the full tensor so we have to unshard manually
if isinstance(self.output, FSDPModule):
self.output.unshard()
weight = self.output.weight.clone()

Contributor: So the fix was the removal of detach? Also, are there any memory implications of the clone here?
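
For context, the full pattern under discussion looks roughly like this (sketch based on the quoted hunk; the reshard() call and the import path are assumptions, and on older torch releases FSDPModule lives under torch.distributed._composable.fsdp):

```python
import torch
from torch.distributed.fsdp import FSDPModule  # FSDP2


def get_output_weight(output: torch.nn.Module) -> torch.Tensor:
    """Gather the (possibly sharded) output projection weight for the loss."""
    if isinstance(output, FSDPModule):
        output.unshard()                # all-gather the full parameter
        weight = output.weight.clone()  # clone so the copy survives a later reshard
        output.reshard()                # assumption: free the unsharded storage again
        return weight
    return output.weight
```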

"""Protocol for loss functions in torchtune used in Supervised Finetune recipes that require
model output linear projection weights in loss computation."""

linear_loss: bool = True

Contributor: Can we leave a comment in these classes explaining what this field means? Also, an example usage in the docstrings of both would help a lot imo.
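
One possible shape for that comment plus a docstring example (illustrative wording; the final text in the PR may differ):

```python
from typing import Protocol

import torch


class SFTLinearLoss(Protocol):
    """Protocol for SFT losses that perform the output projection themselves.

    Example:
        >>> loss_fn = LinearCrossEntropyLoss(num_output_chunks=8)
        >>> weight = model.get_output_weight()  # (vocab_size, hidden_dim)
        >>> loss = loss_fn(weight, hidden_states, labels)
    """

    # Recipes check this flag to decide whether to skip the model's output layer
    # and pass hidden states plus the projection weight to the loss instead.
    linear_loss: bool = True

    def forward(
        self, weight: torch.Tensor, outputs: torch.Tensor, targets: torch.Tensor
    ) -> torch.Tensor: ...
```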

class LinearCrossEntropyLoss(nn.Module, SFTLinearLoss):
"""Memory efficient Cross-entropy loss that incrementally computes loss for chunks of tokens
by masking ignored tokens, calculating logits and then applying cross-entropy loss. Combines
the linear projection with the cross-entropy calculation for futher memory savings.

Contributor: nit
Suggested change:
-    the linear projection with the cross-entropy calculation for futher memory savings.
+    the linear projection with the cross-entropy calculation for further memory savings.

Comment on lines +31 to +32
mask_pre_projection (bool): Whether to mask the output tensor before projection, avoiding
computing it for tokens that will be ignored during CE anyway. Default is True.

Contributor: Maybe I'm out of the loop here, but why do we need to expose this? This doesn't seem like an intuitive parameter to me; is there a reason someone would want to modify it?

@Andrei-Aksionov: I'm also curious.

@ebsmothers (Contributor) left a comment: Just a few more comments, but I'm really happy with how this turned out. This addresses a longstanding problem of inflexibility in our losses with a clear UX, and opens us up to other more memory-efficient CE implementations.

@pbontrager merged commit 9c06c8b into pytorch:main on Apr 30, 2025 (14 checks passed).
Darktex pushed a commit to Darktex/torchtune that referenced this pull request Apr 30, 2025
Co-authored-by: Felipe Mello <[email protected]>
Co-authored-by: salman <[email protected]>
Co-authored-by: Philip Bontrager <[email protected]>
Co-authored-by: joecummings <[email protected]>

if not isinstance(logits, list):
if self.linear_loss:

@Andrei-Aksionov: I have a question: what if we encapsulate the logic specific to linear cross-entropy in the LinearCrossEntropy class itself? When we check whether it's a linear CE or not, we can assign self._loss_fn.output_weights = ... and then just inject it during the forward pass, without needing to provide it explicitly. This way we don't need custom logic for loss calculation, so it could be unified for all losses.

Maybe it can somehow affect compilation? 🤷
Or is there a plan to have a functional version of this loss in the future?
Or is it just plain dumb? 😆
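
A rough sketch of that alternative (hypothetical class, not what the PR implements); the recipe would set loss_fn.output_weight = model.get_output_weight() before computing the loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearCrossEntropyWithInjectedWeight(nn.Module):
    """Weight is injected once by the recipe; the call signature stays (outputs, targets)."""

    def __init__(self, ignore_index: int = -100):
        super().__init__()
        self.ignore_index = ignore_index
        self.output_weight = None  # set by the recipe before the forward pass

    def forward(self, outputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        logits = F.linear(outputs, self.output_weight)
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)).float(),
            targets.reshape(-1),
            ignore_index=self.ignore_index,
        )
```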


# Compute loss
# Loss is normalized by default so we multiply by the number of tokens

@Andrei-Aksionov: Nit: is it indeed normalized? To me it's more like "aggregated". Normalization is a different thing, no?

@@ -63,7 +63,7 @@ optimizer:
lr: 2e-5
optimizer_in_bwd: True # True saves memory. Requires gradient_accumulation_steps=1
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
_component_: torchtune.modules.loss.LinearCrossEntropyLoss

@Andrei-Aksionov: Idea: turn on compilation for LinearCrossEntropyLoss by default. Without compilation there shouldn't be any benefits like online softmax or computing logits and loss simultaneously, so it won't be, em, linear at all 🤓


total_elements = mask.sum()

# Chunk along sequence dimension
hidden_chunks = outputs.tensor_split(self.num_output_chunks, dim=1)

@Andrei-Aksionov: Out of curiosity: does it make a difference in peak memory consumption in the case of linear CE?

If compilation works as I anticipate for the LinearCrossEntropyLoss class, where logits calculation and loss calculation live in the same method, it can produce basically the same kernels as the custom Cut Cross-Entropy kernels; if that's true, then chunking is no longer needed.

Maybe it only matters when someone uses LinearCrossEntropyLoss without compilation 🤔




@Andrei-Aksionov: Hello @felipemello1, sorry for the late review of this PR. Just left a couple of comments. But overall, great job!

@felipemello1 deleted the loss_refactor branch on May 5, 2025, 18:49.