docs/source/reference/llms.rst (+23 -1: 23 additions & 1 deletion)
@@ -10,9 +10,29 @@ TorchRL offers a set of tools for LLM post-training, as well as some examples fo
 Collectors
 ----------
 
-TorchRL offers a specialized collector class (:class:`~torchrl.collectors.llm.LLMCollector`) that is tailored for LLM
+TorchRL offers specialized collector classes (:class:`~torchrl.collectors.llm.LLMCollector` and :class:`~torchrl.collectors.llm.RayLLMCollector`) that are tailored for LLM
 use cases. We also provide dedicated updaters for some inference engines.
 
+LLM Collectors allow to track the version of the policy, which is useful for some use cases.
+This is done by adding a :class:`~torchrl.envs.llm.transforms.PolicyVersion` transform to the environment, which is
+then incremented by the collector after each weight update. To do this, one either provides the stateful version of the
+transform, or a boolean to the collector constructor.
+
+>>> from torchrl.envs.llm.transforms import PolicyVersion
+>>> from torchrl.collectors.llm import LLMCollector
+>>> from torchrl.collectors.llm.weight_update import vLLMUpdater
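
The hunk is cut off after the imports, so the actual TorchRL usage is not visible here. As a rough illustration of the mechanism the added paragraph describes (a version counter that the collector increments after each weight update, supplied either as a stateful object or enabled with a boolean), here is a minimal pure-Python sketch. `VersionCounter`, `ToyCollector`, `track_policy_version`, `collect` and `update_policy_weights` are hypothetical names chosen for this sketch; they are not the TorchRL API, and the real `PolicyVersion` transform and `LLMCollector` may behave differently.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class VersionCounter:
    """Hypothetical stand-in for a stateful policy-version transform."""

    version: int = 0

    def increment(self) -> int:
        # Called once per weight update.
        self.version += 1
        return self.version

    def stamp(self, sample: dict) -> dict:
        # Tag a collected sample with the version of the policy that produced it.
        return {**sample, "policy_version": self.version}


class ToyCollector:
    """Hypothetical collector: accepts a counter instance or a boolean flag."""

    def __init__(self, track_policy_version: Union[bool, VersionCounter] = False):
        self.tracker: Optional[VersionCounter]
        if isinstance(track_policy_version, VersionCounter):
            # Caller supplied the stateful object and can inspect it directly.
            self.tracker = track_policy_version
        elif track_policy_version:
            # Boolean shortcut: the collector creates its own counter.
            self.tracker = VersionCounter()
        else:
            self.tracker = None
        self.weights: dict = {}

    def collect(self, prompts):
        samples = [{"prompt": p} for p in prompts]
        if self.tracker is None:
            return samples
        return [self.tracker.stamp(s) for s in samples]

    def update_policy_weights(self, new_weights: dict) -> None:
        self.weights = dict(new_weights)
        if self.tracker is not None:
            # The version is bumped after each weight update.
            self.tracker.increment()


collector = ToyCollector(track_policy_version=True)
before = collector.collect(["hello"])
collector.update_policy_weights({"w": 1.0})
after = collector.collect(["hello"])
print(before[0]["policy_version"], after[0]["policy_version"])  # prints: 0 1
```

Passing a shared `VersionCounter` instead of `True` lets code outside the toy collector read the same counter, which mirrors the "stateful version of the transform" option mentioned in the added text.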