Add biomedical multimodal dataset preparation tools for Gemma fine-tu… #241 #247

Open
wants to merge 1 commit into
base: main

@SH20RAJ SH20RAJ commented Apr 5, 2025

…ning
#241
This commit addresses issue #210 by providing tools for preparing biomedical multimodal datasets with text, images, tables, and formulas for Gemma fine-tuning.

  • Add preprocess_pdfs.py for extracting content from PDFs
  • Add create_dataset.py for structuring the dataset
  • Add finetune_gemma.py for fine-tuning Gemma models
  • Add comprehensive documentation and requirements

The solution enables users to convert biomedical PDFs to a format suitable for Gemma fine-tuning while preserving the semantic relationships between text and non-text elements.
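To illustrate what "preserving the semantic relationships between text and non-text elements" could mean in practice, here is a minimal sketch of one possible record layout. It is not taken from the scripts in this PR: the `Element` class, the `build_record` function, the placeholder tag format, and the sample page content are all hypothetical and only show the general idea of keeping each image, table, or formula anchored at its position in the surrounding text.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    """One extracted item from a PDF page (hypothetical structure)."""
    kind: str      # "text", "image", "table", or "formula"
    content: str   # raw text, an image path, table markup, or LaTeX source


def build_record(doc_id, elements):
    """Interleave text with placeholder tags so that every non-text
    element keeps its position relative to the text around it."""
    parts = []
    attachments = []
    for el in elements:
        if el.kind == "text":
            parts.append(el.content)
        else:
            # The tag format is an assumption made for this sketch.
            tag = f"<{el.kind}_{len(attachments)}>"
            parts.append(tag)
            attachments.append(
                {"tag": tag, "kind": el.kind, "content": el.content}
            )
    return {"id": doc_id, "text": " ".join(parts), "attachments": attachments}


# Made-up sample page, in reading order.
page = [
    Element("text", "Figure 1 shows the dose-response curve."),
    Element("image", "figures/fig1.png"),
    Element("text", "The fitted model is"),
    Element("formula", r"E = E_{max} C / (EC_{50} + C)"),
]

record = build_record("paper-001", page)
print(json.dumps(record, indent=2))  # one such object per line would form a JSONL dataset
```

Because the placeholder tags stay inline in `text`, a downstream training script could look up each attachment by tag and substitute the image, table, or formula at the right position.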


google-cla bot commented Apr 5, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
