Commit 4b8916b

[Store] Extract some code from Indexer to a dedicated class + Introduce Ingester
1 parent da3c61a commit 4b8916b

53 files changed: 564 additions, 444 deletions

demo/AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -62,7 +62,7 @@ symfony console mcp:server
 - **Agents**: blog, stream, youtube, wikipedia, audio
 - **Platform**: OpenAI integration
 - **Store**: ChromaDB vector store
-- **Indexer**: Text embedding model
+- **Ingester**: Text embedding model
 
 ### Chat Pattern
 - `Chat` class: Message flow and session management
@@ -76,4 +76,4 @@ symfony console mcp:server
 - OpenAI GPT-4o-mini default model
 - ChromaDB on port 8080
 - LiveComponents for real-time UI
-- Symfony DI and best practices
+- Symfony DI and best practices

demo/CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@ This is a Symfony 7.3 demo application showcasing AI integration capabilities us
 
 ### Core Components
 - **Chat Systems**: Multiple specialized chat implementations in `src/` (Blog, YouTube, Wikipedia, Audio, Stream)
-- **Twig LiveComponents**: Interactive UI components using Symfony UX for real-time chat interfaces
+- **Twig LiveComponents**: Interactive UI components using Symfony UX for real-time chat interfaces
 - **AI Agents**: Configured agents with different models, tools, and system prompts
 - **Vector Store**: ChromaDB integration for embedding storage and similarity search
 - **MCP Tools**: Model Context Protocol tools for extending agent capabilities
@@ -36,7 +36,7 @@ composer install
 echo "OPENAI_API_KEY='sk-...'" > .env.local
 
 # Initialize vector store
-symfony console ai:store:index blog -vv
+symfony console ai:store:ingest blog -vv
 
 # Test vector store
 symfony console ai:store:retrieve blog "Week of Symfony"
@@ -81,7 +81,7 @@ symfony console mcp:server
 - **Agents**: Multiple pre-configured agents (blog, stream, youtube, wikipedia, audio)
 - **Platform**: OpenAI integration with API key from environment
 - **Store**: ChromaDB vector store for similarity search
-- **Indexer**: Text embedding model configuration
+- **Ingester**: Text embedding model configuration
 
 ### Chat Implementations
 Each chat type follows the pattern:
@@ -100,4 +100,4 @@ Chat history stored in Symfony sessions with component-specific keys (e.g., 'blo
 - ChromaDB runs on port 8080 (mapped from container port 8000)
 - Application follows Symfony best practices with dependency injection
 - LiveComponents provide real-time UI updates without custom JavaScript
-- MCP server enables tool integration for AI agents
+- MCP server enables tool integration for AI agents

demo/config/packages/ai.yaml

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ ai:
         openai:
             platform: 'ai.platform.openai'
             model: 'text-embedding-ada-002'
-    indexer:
+    ingester:
         blog:
             loader: 'Symfony\AI\Store\Document\Loader\RssFeedLoader'
             source: 'https://feeds.feedburner.com/symfony/blog'

docs/bundles/ai-bundle.rst

Lines changed: 19 additions & 19 deletions
@@ -124,7 +124,7 @@ Advanced Example with Multiple Agents
             mistral_embeddings:
                 platform: 'ai.platform.mistral'
                 model: 'mistral-embed'
-        indexer:
+        ingester:
             default:
                 loader: 'Symfony\AI\Store\Document\Loader\InMemoryLoader'
                 vectorizer: 'ai.vectorizer.openai_embeddings'
@@ -721,26 +721,26 @@ The ``ai:store:drop`` command drops the infrastructure for a store (e.g., remove
 This command only works with stores that implement ``ManagedStoreInterface``.
 Not all store types support drop operations.
 
-``ai:store:index``
-~~~~~~~~~~~~~~~~~~
+``ai:store:ingest``
+~~~~~~~~~~~~~~~~~~~
 
-The ``ai:store:index`` command indexes documents into a store using a configured indexer.
+The ``ai:store:ingest`` command ingests documents into a store using a configured ingester.
 
 .. code-block:: terminal
 
-    $ php bin/console ai:store:index <indexer>
+    $ php bin/console ai:store:ingest <ingester>
 
-    # Index using the default indexer
-    $ php bin/console ai:store:index default
+    # Ingest using the default ingester
+    $ php bin/console ai:store:ingest default
 
     # Override the configured source with a single file
-    $ php bin/console ai:store:index blog --source=/path/to/file.txt
+    $ php bin/console ai:store:ingest blog --source=/path/to/file.txt
 
     # Override with multiple sources
-    $ php bin/console ai:store:index blog --source=/path/to/file1.txt --source=/path/to/file2.txt
+    $ php bin/console ai:store:ingest blog --source=/path/to/file1.txt --source=/path/to/file2.txt
 
-The ``--source`` (or ``-s``) option allows you to override the source(s) configured in your indexer.
-This is useful for ad-hoc indexing operations or testing different data sources.
+The ``--source`` (or ``-s``) option allows you to override the source(s) configured in your ingester.
+This is useful for ad-hoc ingesting operations or testing different data sources.
 
 Usage
 -----
@@ -935,7 +935,7 @@ Vectorizers
 -----------
 
 Vectorizers are components that convert text documents into vector embeddings for storage and retrieval.
-They can be configured once and reused across multiple indexers, providing better maintainability and consistency.
+They can be configured once and reused across multiple ingesters, providing better maintainability and consistency.
 
 Configuring Vectorizers
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -961,15 +961,15 @@ Vectorizers are defined in the ``vectorizer`` section of your configuration:
                 platform: 'ai.platform.mistral'
                 model: 'mistral-embed'
 
-Using Vectorizers in Indexers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Using Vectorizers in Ingesters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Once configured, vectorizers can be referenced by name in indexer configurations:
+Once configured, vectorizers can be referenced by name in ingester configurations:
 
 .. code-block:: yaml
 
     ai:
-        indexer:
+        ingester:
             documents:
                 loader: 'Symfony\AI\Store\Document\Loader\TextFileLoader'
                 vectorizer: 'ai.vectorizer.openai_small'
@@ -988,14 +988,14 @@ Once configured, vectorizers can be referenced by name in indexer configurations
 Benefits of Configured Vectorizers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-* **Reusability**: Define once, use in multiple indexers
-* **Consistency**: Ensure all indexers using the same vectorizer have identical embedding configuration
+* **Reusability**: Define once, use in multiple ingesters
+* **Consistency**: Ensure all ingesters using the same vectorizer have identical embedding configuration
 * **Maintainability**: Change vectorizer settings in one place
 
 Retrievers
 ----------
 
-Retrievers are the opposite of indexers. While indexers populate a vector store with documents,
+Retrievers are the opposite of ingesters. While ingesters populate a vector store with documents,
 retrievers allow you to search for documents in a store based on a query string.
 They vectorize the query and retrieve similar documents from the store.
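
To illustrate the retriever idea described in the hunk above (vectorize the query, then return the most similar stored documents), here is a minimal plain-PHP sketch. It does not use the Symfony AI API: `toyVectorize()`, `cosineSimilarity()`, `retrieve()` and the array-based store are hypothetical stand-ins, and the word-count "embedding" is an assumption chosen only to keep the example self-contained.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for an embedding model: counts how often each word
 * of a tiny fixed vocabulary occurs. A real vectorizer calls a model instead.
 *
 * @return list<float>
 */
function toyVectorize(string $text): array
{
    $vocabulary = ['symfony', 'release', 'gladiator', 'rome'];
    $words = str_word_count(strtolower($text), 1);

    return array_map(
        static fn (string $term): float => (float) count(array_keys($words, $term, true)),
        $vocabulary,
    );
}

/**
 * Cosine similarity of two equally sized vectors; 0.0 if either is all zeros.
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if (0.0 === $normA || 0.0 === $normB) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Vectorizes the query and returns the $limit most similar stored texts.
 *
 * @param array<string, list<float>> $store text => vector
 *
 * @return list<string>
 */
function retrieve(string $query, array $store, int $limit = 1): array
{
    $queryVector = toyVectorize($query);

    $scores = [];
    foreach ($store as $text => $vector) {
        $scores[$text] = cosineSimilarity($queryVector, $vector);
    }
    arsort($scores); // highest similarity first, keys preserved

    return array_slice(array_keys($scores), 0, $limit);
}

// Populate the toy store (the "ingest" side), then query it (the "retrieve" side).
$store = [];
foreach (['Symfony release notes', 'A gladiator fights in Rome'] as $text) {
    $store[$text] = toyVectorize($text);
}

$results = retrieve('gladiator rome', $store);
```

The two halves mirror the relationship stated in the docs: one side writes vectors into a store, the other vectorizes a query with the same function and ranks what was stored.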

docs/components/store.rst

Lines changed: 8 additions & 6 deletions
@@ -19,19 +19,21 @@ implemented by different concrete and vendor-specific implementations, so called
 On top of those bridges, the Store component provides higher level features to populate and query those stores with and
 for documents.
 
-Indexing
---------
+Ingesting
+---------
 
-One higher level feature is the :class:`Symfony\\AI\\Store\\Indexer`. The purpose of this service is to populate a store with documents.
+One higher level feature is the :class:`Symfony\\AI\\Store\\Ingester`. The purpose of this service is to populate a store with documents.
 Therefore it accepts one or multiple :class:`Symfony\\AI\\Store\\Document\\TextDocument` objects, converts them into embeddings and stores them in the
 used vector store::
 
     use Symfony\AI\Store\Document\TextDocument;
     use Symfony\AI\Store\Indexer;
+    use Symfony\AI\Store\Ingester;
 
-    $indexer = new Indexer($platform, $model, $store);
-    $document = new TextDocument('This is a sample document.');
-    $indexer->index($document);
+    $documents = [new TextDocument('This is a sample document.')];
+    $loader = new InMemoryLoader($documents);
+    $ingester = new Ingester($loader, new Indexer($vectorizer, $store));
+    $ingester->ingest();
 
 You can find more advanced usage in combination with an Agent using the store for RAG in the examples folder.
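
The examples in this commit suggest a split of responsibilities: the ingester owns the loader and the optional source, while the indexer owns vectorizing and storing. The sketch below shows that composition in plain PHP under stated assumptions. `SketchLoader`, `ArrayLoader`, `SketchIndexer` and `SketchIngester` are hypothetical stand-ins whose names and signatures are invented for illustration; they are not the `Symfony\AI\Store` classes.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins: names and signatures are assumptions, not the library API.
interface SketchLoader
{
    /** @return iterable<string> */
    public function load(?string $source): iterable;
}

final class ArrayLoader implements SketchLoader
{
    /** @param list<string> $documents */
    public function __construct(private array $documents)
    {
    }

    public function load(?string $source): iterable
    {
        // An in-memory loader ignores the source and yields what it was given.
        yield from $this->documents;
    }
}

final class SketchIndexer
{
    /** @var array<string, list<float>> text => vector */
    public array $store = [];

    public function __construct(private \Closure $vectorize)
    {
    }

    /** @param iterable<string> $documents */
    public function index(iterable $documents): void
    {
        // The indexer only vectorizes and stores; it does not know about sources.
        foreach ($documents as $document) {
            $this->store[$document] = ($this->vectorize)($document);
        }
    }
}

final class SketchIngester
{
    public function __construct(
        private SketchLoader $loader,
        private SketchIndexer $indexer,
    ) {
    }

    /** @param string|list<string>|null $source */
    public function ingest(string|array|null $source = null): void
    {
        // Normalize to a list; a null source means "let the loader decide".
        $sources = is_array($source) ? $source : [$source];

        foreach ($sources as $singleSource) {
            $this->indexer->index($this->loader->load($singleSource));
        }
    }
}

// Toy "embedding": the text length as a one-dimensional vector.
$indexer = new SketchIndexer(static fn (string $text): array => [(float) strlen($text)]);
$ingester = new SketchIngester(new ArrayLoader(['first document', 'second']), $indexer);
$ingester->ingest();
```

With this shape, swapping the loader (files, RSS, in-memory) leaves the indexing pipeline untouched, which appears to be the motivation for extracting the class.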

docs/cookbook/rag-implementation.rst

Lines changed: 9 additions & 6 deletions
@@ -89,17 +89,20 @@ Use a vectorizer to convert documents into embeddings and store them::
     use Symfony\AI\Store\Document\Loader\InMemoryLoader;
     use Symfony\AI\Store\Document\Vectorizer;
     use Symfony\AI\Store\Indexer;
+    use Symfony\AI\Store\Ingester;
 
     $platform = PlatformFactory::create(env('OPENAI_API_KEY'));
     $vectorizer = new Vectorizer($platform, 'text-embedding-3-small');
-    $indexer = new Indexer(
+    $ingester = new Ingester(
         new InMemoryLoader($documents),
-        $vectorizer,
-        $store
+        new Indexer(
+            $vectorizer,
+            $store
+        ),
     );
-    $indexer->index($documents);
+    $ingester->ingest();
 
-The indexer handles:
+The ingester handles:
 
 * Loading documents from the source
 * Generating vector embeddings
@@ -324,7 +327,7 @@ Index documents in batches for better performance::
 
     $batchSize = 100;
     foreach (array_chunk($documents, $batchSize) as $batch) {
-        $indexer->index($batch);
+        $ingester->ingest(options: $batch);
     }
 
 Caching Embeddings
Lines changed: 15 additions & 13 deletions
@@ -15,29 +15,31 @@
 use Symfony\AI\Store\Document\Transformer\TextSplitTransformer;
 use Symfony\AI\Store\Document\Vectorizer;
 use Symfony\AI\Store\Indexer;
+use Symfony\AI\Store\Ingester;
 use Symfony\AI\Store\InMemory\Store as InMemoryStore;
 
 require_once dirname(__DIR__).'/bootstrap.php';
 
 $platform = PlatformFactory::create(env('OPENAI_API_KEY'), http_client());
 $store = new InMemoryStore();
 $vectorizer = new Vectorizer($platform, 'text-embedding-3-small');
-$indexer = new Indexer(
+$ingester = new Ingester(
     loader: new TextFileLoader(),
-    vectorizer: $vectorizer,
-    store: $store,
-    source: [
-        dirname(__DIR__, 2).'/fixtures/movies/gladiator.md',
-        dirname(__DIR__, 2).'/fixtures/movies/inception.md',
-        dirname(__DIR__, 2).'/fixtures/movies/jurassic-park.md',
-    ],
-    transformers: [
-        new TextReplaceTransformer(search: '## Plot', replace: '## Synopsis'),
-        new TextSplitTransformer(chunkSize: 500, overlap: 100),
-    ],
+    indexer: new Indexer(
+        vectorizer: $vectorizer,
+        store: $store,
+        transformers: [
+            new TextReplaceTransformer(search: '## Plot', replace: '## Synopsis'),
+            new TextSplitTransformer(chunkSize: 500, overlap: 100),
+        ],
+    ),
 );
 
-$indexer->index();
+$ingester->ingest([
+    dirname(__DIR__, 2).'/fixtures/movies/gladiator.md',
+    dirname(__DIR__, 2).'/fixtures/movies/inception.md',
+    dirname(__DIR__, 2).'/fixtures/movies/jurassic-park.md',
+]);
 
 $vector = $vectorizer->vectorize('Roman gladiator revenge');
 $results = $store->query($vector);

examples/indexer/index-inmemory-loader.php renamed to examples/ingester/index-inmemory-loader.php

Lines changed: 10 additions & 8 deletions
@@ -16,6 +16,7 @@
 use Symfony\AI\Store\Document\Transformer\TextSplitTransformer;
 use Symfony\AI\Store\Document\Vectorizer;
 use Symfony\AI\Store\Indexer;
+use Symfony\AI\Store\Ingester;
 use Symfony\AI\Store\InMemory\Store as InMemoryStore;
 use Symfony\Component\Uid\Uuid;
 
@@ -38,17 +39,18 @@
     ),
 ];
 
-$indexer = new Indexer(
+$ingester = new Ingester(
     loader: new InMemoryLoader($documents),
-    vectorizer: $vectorizer,
-    store: $store,
-    source: null,
-    transformers: [
-        new TextSplitTransformer(chunkSize: 100, overlap: 20),
-    ],
+    indexer: new Indexer(
+        vectorizer: $vectorizer,
+        store: $store,
+        transformers: [
+            new TextSplitTransformer(chunkSize: 100, overlap: 20),
+        ],
+    ),
 );
 
-$indexer->index();
+$ingester->ingest();
 
 $vector = $vectorizer->vectorize('machine learning artificial intelligence');
 $results = $store->query($vector);
Lines changed: 13 additions & 11 deletions
@@ -14,6 +14,7 @@
 use Symfony\AI\Store\Document\Transformer\TextSplitTransformer;
 use Symfony\AI\Store\Document\Vectorizer;
 use Symfony\AI\Store\Indexer;
+use Symfony\AI\Store\Ingester;
 use Symfony\AI\Store\InMemory\Store as InMemoryStore;
 use Symfony\Component\HttpClient\HttpClient;
 
@@ -22,20 +23,21 @@
 $platform = PlatformFactory::create(env('OPENAI_API_KEY'), http_client());
 $store = new InMemoryStore();
 $vectorizer = new Vectorizer($platform, 'text-embedding-3-small');
-$indexer = new Indexer(
+$ingester = new Ingester(
     loader: new RssFeedLoader(HttpClient::create()),
-    vectorizer: $vectorizer,
-    store: $store,
-    source: [
-        'https://feeds.feedburner.com/symfony/blog',
-        'https://www.tagesschau.de/index~rss2.xml',
-    ],
-    transformers: [
-        new TextSplitTransformer(chunkSize: 500, overlap: 100),
-    ],
+    indexer: new Indexer(
+        vectorizer: $vectorizer,
+        store: $store,
+        transformers: [
+            new TextSplitTransformer(chunkSize: 500, overlap: 100),
+        ],
+    )
 );
 
-$indexer->index();
+$ingester->ingest([
+    'https://feeds.feedburner.com/symfony/blog',
+    'https://www.tagesschau.de/index~rss2.xml',
+]);
 
 $vector = $vectorizer->vectorize('Week of Symfony');
 $results = $store->query($vector);
Lines changed: 11 additions & 9 deletions
@@ -17,6 +17,7 @@
 use Symfony\AI\Store\Document\Transformer\TextTrimTransformer;
 use Symfony\AI\Store\Document\Vectorizer;
 use Symfony\AI\Store\Indexer;
+use Symfony\AI\Store\Ingester;
 use Symfony\AI\Store\InMemory\Store as InMemoryStore;
 use Symfony\Component\Uid\Uuid;
 
@@ -56,18 +57,19 @@
     new TextContainsFilter('SPAM:', caseSensitive: true),
 ];
 
-$indexer = new Indexer(
+$ingester = new Ingester(
     loader: new InMemoryLoader($documents),
-    vectorizer: $vectorizer,
-    store: $store,
-    source: null,
-    filters: $filters,
-    transformers: [
-        new TextTrimTransformer(),
-    ],
+    indexer: new Indexer(
+        vectorizer: $vectorizer,
+        store: $store,
+        filters: $filters,
+        transformers: [
+            new TextTrimTransformer(),
+        ],
+    ),
 );
 
-$indexer->index();
+$ingester->ingest();
 
 $vector = $vectorizer->vectorize('technology artificial intelligence');
 $results = $store->query($vector);
