@grgr grgr commented Oct 29, 2025

What this does

This PR adds text-to-speech functionality. With it you can call, for example, "audio = RubyLLM.tts('Hello')".
Docs are extended in the core-functionality section.
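
A minimal sketch of how such a call might be shaped, assuming a module-level entry point that forwards to a provider method taking model: and voice: keywords. TtsSketch, FakeProvider, the default model and voice names, and the returned hash are all hypothetical stand-ins so the snippet runs without the gem or network access; they are not the PR's actual implementation.

```ruby
# Hypothetical stand-in for a provider; a real one would call a speech API.
class FakeProvider
  # Mirrors a provider-level signature with required keywords.
  def tts(input, model:, voice:)
    { text: input, model: model, voice: voice, bytes: "FAKE-AUDIO:#{input}" }
  end
end

# Hypothetical module-level entry point (not the real RubyLLM module).
module TtsSketch
  DEFAULT_MODEL = 'example-tts-model' # assumed name, for illustration only
  DEFAULT_VOICE = 'example-voice'     # assumed name, for illustration only

  # Validates the input, then delegates to the provider.
  def self.tts(input, model: DEFAULT_MODEL, voice: DEFAULT_VOICE, provider: FakeProvider.new)
    raise ArgumentError, 'input must not be empty' if input.to_s.strip.empty?

    provider.tts(input, model: model, voice: voice)
  end
end

audio = TtsSketch.tts('Hello')
puts audio[:voice]
```

Passing the provider as a keyword keeps the sketch testable without real API calls; the actual return type of the PR's method is not shown here and may differ.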

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

@instance ||= new
end

def provider_for(model)
Contributor

Was this change intentional?

Author

oh no! I will change it back.

parse_image_response(response, model:)
end

def tts(input, model:, voice:)
Contributor

Should this be named with a simple verb like the other ones? Maybe RubyLLM.speak?

Author

RubyLLM.speak was my first idea as well. Then I thought RubyLLM.tts might be clearer for users.
Also there is an alias 'say' for 'ask' in lib/ruby_llm/active_record/chat_methods.rb.
And the difference between 'say' and 'speak' might be confusing.
So, if you are happier with 'speak', I could change it.
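
For illustration, if both names were ever wanted, one option would be a singleton-method alias, similar in spirit to the 'say' alias mentioned above. This is a minimal sketch under that assumption: NamingSketch and its string return value are hypothetical and are not part of RubyLLM or of this PR.

```ruby
# Hypothetical module used only to illustrate aliasing; not the real RubyLLM.
module NamingSketch
  def self.tts(input)
    "audio for: #{input}"
  end

  # Expose the same singleton method under a second name.
  class << self
    alias_method :speak, :tts
  end
end

# Both names dispatch to the same implementation.
puts NamingSketch.speak('Hello') == NamingSketch.tts('Hello')
```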

Contributor

@tpaulshippy tpaulshippy Nov 7, 2025

Yeah, it is a bit confusing. Honestly, as I said in Discord, asking an LLM library to do this is a bit confusing in general. But I guess we have transcribe, so why not? Not sure what naming would be best. Up to @crmne ultimately.
