Skip to content

Conversation

@chr-hertel
Copy link
Member

@chr-hertel chr-hertel commented Dec 15, 2025

Q A
Bug fix? no
New feature? yes
Docs? no
Issues Fix #911 ; fix #875
License MIT

This is not done, but rather to sketch the idea

This solves:

  • State in Indexer
  • Indexing from state with DocumentCollection
  • Enables factory for bundle to keep sources in indexer config (to be proven)
  • Opens up for more complex use cases like loading various types in one go, e.g. from one folder

Addressing Concerns from #997:

  • "We don't want to support everything. In my humble opinion, it lacks genericness."

    Users could actually quite easy implement their own source objects and loaders to throw them into the indexer - including custom entities or specific URLs, etc.

  • "Then, to me, Rss, Url, or Source are data objects, not stateless services. The naming is confusing."
    That's even the idea, yes - they are state objects to be handed over to the indexing process instead of being part of the constructor which lacks flexibility - (why we had the withSources)

  • "I'm not a fan of the Strategy Design Pattern" + "We are fighting this pattern so much in symfony/serializer to avoid performance issues!"

    I would argue that the performance challenge in serializer comes with object nesting and therefore the runtime complexity grows in a different way since we're mostly dealing with collections here - might be different with folders, but that's a different story - the slowest part of the indexing will be the vectorizing/http calls anyhow

  • "This class does too much. It should be split"

    Not sure i follow this reasoning since the amount of operations in total is the same, see sketch in [Store] Extract some code from Indexer to a dedicated class + Introduce Ingester #1138. "indexing" is our high level operation and the indexer itself acts as integrator of multiple operation services. this refactoring makes it a bit more visible now with shifting some loading logic.

cc @lyrixx @OskarStark

there are still quite some hypothesis and assumption in those line - but i think it helps to follow the idea

@chr-hertel chr-hertel added the Store Issues & PRs about the AI Store component label Dec 15, 2025
Comment on lines +36 to +40
$sources = TextFileLoader::createSource([
dirname(__DIR__, 2).'/fixtures/movies/gladiator.md',
dirname(__DIR__, 2).'/fixtures/movies/inception.md',
dirname(__DIR__, 2).'/fixtures/movies/jurassic-park.md',
]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also be like this:

Suggested change
$sources = TextFileLoader::createSource([
dirname(__DIR__, 2).'/fixtures/movies/gladiator.md',
dirname(__DIR__, 2).'/fixtures/movies/inception.md',
dirname(__DIR__, 2).'/fixtures/movies/jurassic-park.md',
]);
$sources = [
new TextFile(dirname(__DIR__, 2).'/fixtures/movies/gladiator.md'),
new TextFile(dirname(__DIR__, 2).'/fixtures/movies/inception.md'),
new TextFile(dirname(__DIR__, 2).'/fixtures/movies/jurassic-park.md'),
];
  • assumption is that factory will help with bundle config
  • validation could move from factory to constructor

Copy link
Member

@lyrixx lyrixx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool Cool! It's a good start.
I asked a few questions

use Psr\Log\NullLogger;
use Symfony\AI\Store\Document\EmbeddableDocumentInterface;
use Symfony\AI\Store\Document\FilterInterface;
use Symfony\AI\Store\Document\LoaderInterface;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface is not used anymore, it should be removed, shouldn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i get the point - it was renamed to SourceTypeLoaderInterface, but is not needed here anymore

}

public function index(array $options = []): void
public function index(iterable|SourceInterface $sources, array $options = []): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now, we must pass something when indexing? We loose the ability to run :

bin/console ai.store.index indexer_name

to index everything.

Or I missed something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that's open and needs to be tackled in the bundle 👍 will follow

foreach ($sources as $source) {
$loaderFound = false;
foreach ($this->sourceTypeLoaders as $sourceTypeLoader) {
if ($sourceTypeLoader->supports($source)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part I don't like.
We already have a strong coupling between a model, and its implementation.
We could use the DIC (service locator) to solve this.

$this->sourceTypeLoaders->get($source::class)->load($source);

To solve this we have a map like

[
    source class => loader,
    source class 2 => loader 2
]

I blogged about this pattern a while ago

It's quite simple and very performant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, especially without any additional argument or dynamic the supports(SourceInterface $source) is over the top - so is the dependency to the entire DIC component for that pattern.

will adopt to something slim like:

/**
 * @return class-string<SourceInterface>
 */
public static function supportedSource(): string;

so we could have a map in the loader that makes accessing the corresponding loader also quite efficient

$vectorizer = new Vectorizer($platform, 'text-embedding-3-small');
$indexer = new Indexer(
loader: new TextFileLoader(),
loader: new Loader([TextFileLoader::supportedSource() => new TextFileLoader()]),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also be

new Loader([TextFile::class => new TextFileLoader()]),

or we shift that part into the Loader implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature New feature Status: Needs Review Store Issues & PRs about the AI Store component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Store] Indexer is nice, but hard to use directly

3 participants