tarifin is a full-stack LLM-powered recipe assistant that lets users request recipes by voice and receive personalized, culturally diverse, and health-conscious suggestions in real time, as both text and speech.
The system comprises a fine-tuned Nous Hermes 2 - Mistral 7B model, a streaming Flask API backend, and a Flutter mobile client supporting voice input/output (STT/TTS).
This is the home screen showing the list of saved or past recipe conversations. Users can tap on any item to view the full response or start a new request using the floating action button.
**Example Prompt:**
"All I have are chickpeas, carrots, and some tahini. I want to make a healthy but different dinner with these. What can I make?"
**Model Response:**
A full recipe titled "Chickpea and Carrot Tahini Salad" including ingredients and step-by-step instructions.
The response is rendered with `flutter_markdown` and read aloud using `flutter_tts`. Input can be provided by voice using `speech_to_text`, making the experience entirely hands-free and user-friendly.
| Component | Specification |
|---|---|
| OS | Windows 11 Pro |
| Linux Subsystem | WSL2 (Ubuntu 22.04 LTS) |
| Python Env | venv-based isolated environment |
| CUDA Version | 11.8 |
| PyTorch | 2.2+ with CUDA support |
| Transformers | HuggingFace Transformers |
| TRL | `trl` (for `SFTTrainer`) |
| Component | Detail |
|---|---|
| GPU | NVIDIA RTX 4060 (Laptop), 8 GB VRAM |
| CPU | Intel Core i5-12500H (12-Core Hybrid) |
| RAM | 16 GB DDR4 RAM |
- Path: `/model_files/data/all_data.jsonl`
- Size: 4,800 Alpaca-style training samples
- Each sample contains:
  - `instruction`: the user's natural-language request
  - `input`: optional context
  - `output`: a detailed, minimum 1000-word recipe
  - `metadata`: nutritional info, allergens, cuisine type, etc.
```json
{
  "instruction": "Suggest a low-calorie Turkish dinner for a diabetic patient",
  "input": "",
  "output": "To prepare a balanced Turkish meal for someone managing diabetes...",
  "metadata": {
    "calories": "430 kcal",
    "diet": "diabetic-friendly",
    "cuisine": "Turkish",
    "allergens": "nut-free"
  }
}
```

- Base Model: Nous Hermes 2 - Mistral 7B
- Quantization: 4-bit NF4 (via `bitsandbytes`)
- Fine-tuning: LoRA (via `peft`) + `SFTTrainer`
- Load and quantize the model using NF4
- Apply LoRA (PEFT) for efficient training
- Filter and format the dataset (`prompt + metadata + output`)
- Tokenize with `max_length = 1024`
- Fine-tune using `SFTTrainer` for 2 epochs (a setup sketch follows this list)
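A minimal sketch of the load-quantize-LoRA setup described above; the Hugging Face model id, LoRA rank/alpha, and target modules are assumptions for illustration, not values taken from this repository:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE_MODEL = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"  # assumed model id

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters via peft; rank/alpha/target modules are illustrative
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```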
To enhance learning dynamics, training was split into 3 progressive phases based on output length:
**Phase 1**

- Goal: Teach the model the task format and cultural variability
- Result: Learned the question-answer pattern effectively

**Phase 2**

- Goal: Improve fluency and semantic consistency
- Result: Better contextual flow and structural awareness

**Phase 3**

- Goal: Handle complex, multi-step recipe generation
- Filter applied: `len(output) >= 3000` (see the sketch below)
- Result: Stable performance across lengthy, dense outputs
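The phase-3 length filter can be expressed in a few lines, assuming the JSONL dataset is loaded with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset(
    "json", data_files="model_files/data/all_data.jsonl", split="train"
)

# Keep only the longest, densest samples for the final phase
phase3 = dataset.filter(lambda ex: len(ex["output"]) >= 3000)
```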
```python
TrainingArguments(
    output_dir="./output_longest",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=2e-4,
    fp16=True,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=10
)
```

- Effective batch size: 4 (batch size 1 × 4 gradient-accumulation steps)
- Checkpointing: every 100 steps, only the 2 most recent retained
- Precision: mixed (fp16) for reduced memory usage
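A sketch of wiring these arguments into `SFTTrainer`; the keyword set below follows the classic `trl` API and shifts slightly between `trl` versions (newer releases move `dataset_text_field`/`max_seq_length` into `SFTConfig`):

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./output_longest",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=2e-4,
    fp16=True,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,                # quantized + LoRA model from earlier
    args=training_args,
    train_dataset=phase3,       # the length-filtered split
    tokenizer=tokenizer,
    dataset_text_field="text",  # merged prompt + metadata + output
    max_seq_length=1024,
)
trainer.train()
```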
```
AutoTokenizer + AutoModel (Nous Hermes 2 - Mistral 7B)
        ↓
4-bit quantization (NF4) + LoRA (PEFT)
        ↓
Dataset loaded → long outputs filtered (≥ 3000 words)
        ↓
Prompt + Metadata → `text` field merged
        ↓
Tokenizer applied (max length 1024)
        ↓
Trained with SFTTrainer (2 epochs)
        ↓
Saved to ./output_longest
```
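The `Prompt + Metadata → text` merge step could look like the sketch below; the exact prompt template is an assumption for illustration, not the repository's actual format:

```python
def build_text(example):
    # Flatten the metadata dict into a readable one-liner
    meta = ", ".join(f"{k}: {v}" for k, v in example["metadata"].items())
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n" + example["input"]
    example["text"] = (
        f"### Instruction:\n{prompt}\n\n"
        f"### Metadata:\n{meta}\n\n"
        f"### Response:\n{example['output']}"
    )
    return example

phase3 = phase3.map(build_text)  # `phase3` from the filter sketch above
```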
- Training proceeded over 3 stages using progressive dataset splits.
- Each phase resumed training from the previous checkpoint in `output_dir`.
- The model was re-saved after each phase using `.save_model()`.
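The per-phase resume/re-save pattern boils down to two Trainer calls:

```python
# Resume from the latest checkpoint left in output_dir by the previous phase
trainer.train(resume_from_checkpoint=True)

# Re-save the adapter weights at the end of the phase
trainer.save_model("./output_longest")
```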
After completing all fine-tuning stages, the training loss was monitored using `trainer_state.json`.
The plot below visualizes the loss trend across training steps:
- Initial loss was above 1.2, reflecting the difficulty of long-form generation at the start.
- A steady decline is observed throughout training.
- Final loss converged around 0.38–0.42, showing:
- Stable and effective fine-tuning
- No significant signs of overfitting
- Consistent generation quality even with long outputs
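The curve can be reproduced from `trainer_state.json` with a short script; note that the file path here is an assumption (the HF Trainer also writes this file inside each `checkpoint-*` directory):

```python
import json
import matplotlib.pyplot as plt

with open("output_longest/trainer_state.json") as f:
    state = json.load(f)

# log_history holds one entry per logging step; keep those with a loss value
entries = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in entries]
losses = [e["loss"] for e in entries]

plt.plot(steps, losses)
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.title("Fine-tuning loss")
plt.savefig("loss_curve.png")
```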
✅ After testing the model with various held-out prompts and unseen data, we confirmed that it produces rich, structured, and context-aware recipes, validating the success of the fine-tuning process.
File: `/model_files/gradio_exe.py`

```bash
python model_files/gradio_exe.py
```

- Token-wise streaming via `TextIteratorStreamer`
- Threaded generation with dynamic Markdown preview
- Real-time evaluation for developer convenience
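The streaming in `gradio_exe.py` presumably follows the standard `TextIteratorStreamer` pattern, sketched here assuming `model` and `tokenizer` are loaded as shown earlier:

```python
from threading import Thread
from transformers import TextIteratorStreamer

def stream_reply(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )

    # generate() blocks, so it runs in a worker thread while tokens are consumed
    Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=1024),
    ).start()

    text = ""
    for token in streamer:
        text += token
        yield text  # Gradio re-renders the Markdown preview on each yield
```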
File: `/model_files/app.py`

```bash
cd model_files
python app.py
```

Request:

```json
{ "text": "Suggest a quick and healthy gluten-free Turkish lunch option" }
```

Response:

- Content-Type: `text/plain`
- Token-wise streamed output using `yield`
- Receive the JSON request
- Run tokenizer + streamer in a separate thread
- `generate()` streams output line-by-line via Flask (a minimal sketch follows)

✅ The API runs at `http://localhost:5000/generate`
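A minimal sketch of such a streaming endpoint, assuming the model/tokenizer setup shown earlier (the actual `app.py` may differ in its details):

```python
from threading import Thread

from flask import Flask, Response, request
from transformers import TextIteratorStreamer

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json()["text"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )

    # model.generate() blocks, so it runs in a worker thread
    Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=1024),
    ).start()

    # Flask streams each chunk yielded by the streamer as plain text
    return Response(streamer, mimetype="text/plain")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```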
After Flask API deployment, a lightweight Android app was developed using Flutter to provide seamless voice-based interaction.
- User speaks a recipe request
- Speech is converted to text via `speech_to_text`
- Text is POSTed to the Flask API
- The response is streamed back
- It is both rendered on screen and spoken aloud via `flutter_tts` (see the sketch after this list)
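The Flutter client performs this consumption with the `http` package; for reference, the same pattern looks like this in Python (`requests` is used here purely for illustration):

```python
import requests

resp = requests.post(
    "http://localhost:5000/generate",
    json={"text": "Suggest a quick and healthy gluten-free Turkish lunch option"},
    stream=True,
)

for chunk in resp.iter_content(chunk_size=1024):
    # Each chunk is rendered/spoken as it arrives; naive decoding may split
    # multi-byte characters at chunk boundaries, hence errors="replace"
    print(chunk.decode("utf-8", errors="replace"), end="", flush=True)
```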
| Package | Functionality |
|---|---|
| `speech_to_text` | Converts voice to text |
| `http` | Sends requests to the backend API |
| `flutter_tts` | Text-to-speech playback of results |
| `flutter_markdown` | Rich text rendering for model output |
| `uuid` | Unique message/session identification |
```
User speaks into mic → STT (speech_to_text)
        ↓
Text sent to Flask API → HTTP POST
        ↓
Streaming response shown in Markdown
        ↓
Result spoken aloud via TTS (flutter_tts)
```
- `main.dart`: entry point, handles STT/TTS logic
- `chat_home.dart`: UI + HTTP streaming integration
- `ChatMessage`, `ChatSession`: message model structures
MIT License © 2025 Eren Yurtcu

