docs: Wiki structure for tutorials and AI API configuration #36
Rossi-Luciano wants to merge 50 commits into scieloorg:main
… of new apps and increased field limit
…s, language-tagged texts, and flexible date handling
… of references
…s and expansion of supported types (confproc, full_text, etc.)
… search views, utilities, and Wagtail hooks
…, utilities, and Wagtail hooks
… OMML to MathML
… inference, tasks, and Wagtail hooks
… data processing
…ial.py and removal of intermediate migrations
… of Django and translation of verbose_name to English
Fixes the exception type so that a 404 is returned when the record does not exist.
… links. Reduces log noise and keeps the function focused on its return value.
Improves readability and follows good error-handling practices.
… reference prompt. Adds quotes to text fields and fixes commas/keys to avoid prompt parsing errors.
Allows translation of 'Mixed Citation' and 'Rating from 1 to 10'.
…eference status' (includes migrations)
- function_llama became LlamaInputSettings in llama.py
- generic_llama became llama.py with LlamaService
Pull Request Overview
This pull request introduces a document markup and XML generation system for processing DOCX files and managing references. It adds two new applications (markup_doc and model_ai) with AI-powered metadata extraction, reference parsing, and XML/HTML generation. Key changes include renaming the menu identifier xml_manager to xml_files, consolidating the xml_manager admin group, adding new dependencies for AI processing (Google Generative AI, python-docx, langid), and implementing a complete workflow for converting DOCX documents to SciELO-compliant XML.
Key Changes
- Added markup_doc app with DOCX processing, AI-based labeling, XML generation, and SciELO package creation
- Added model_ai app for managing LLM models (Llama/Gemini) with download capabilities
- Renamed XML manager menu from xml_manager to xml_files and consolidated menu structure
- Added new package dependencies: google-generativeai, python-docx, and langid
Reviewed Changes
Copilot reviewed 59 out of 70 changed files in this pull request and generated 91 comments.
| File | Description |
|---|---|
| requirements/base.txt | Added AI processing dependencies (google-generativeai, langid, python-docx) |
| xml_manager/wagtail_hooks.py | Renamed menu identifiers and consolidated menu structure for XML management |
| reference/wagtail_hooks.py | Refactored import statements and renamed admin class with menu order adjustment |
| reference/models.py | Added ReferenceStatus enum and replaced estatus with status field |
| reference/marker.py | Updated imports to use new model_ai.llama module |
| reference/data_utils.py | Enhanced error handling and updated to use ReferenceStatus enum |
| model_ai/* | New app for managing AI models with Llama/Gemini integration |
| markup_doc/* | New app for DOCX processing, metadata extraction, and XML generation |
| markuplib/* | New library for DOCX processing and OMML to MathML conversion |
Comments suppressed due to low confidence (1)
markup_doc/sync_api.py:108
- Except block directly handles BaseException.
    'uri': {'type': 'string'},
    'access_date': {'type': 'string'},
    'version': {'type': 'string'},
    "full_text": {"type": "integer"},
The type for 'full_text' should be 'string', not 'integer'. This field contains textual reference content, not numeric data.
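A minimal sketch of the suggested fix, assuming the four keys in the diff belong to a JSON-Schema-style properties mapping. The names `REFERENCE_PROPERTIES`, `JSON_TYPES`, and `type_errors` are hypothetical and only illustrate why declaring `full_text` as a string matters.

```python
# Hypothetical fragment: only the four keys come from the diff.
REFERENCE_PROPERTIES = {
    "uri": {"type": "string"},
    "access_date": {"type": "string"},
    "version": {"type": "string"},
    "full_text": {"type": "string"},  # declared as "integer" in the diff
}

# Minimal mapping from schema type names to Python types (assumption).
JSON_TYPES = {"string": str, "integer": int}


def type_errors(record, properties=REFERENCE_PROPERTIES):
    """Return the keys whose values do not match the declared type."""
    errors = []
    for key, value in record.items():
        spec = properties.get(key)
        if spec is None:
            continue  # keys outside the schema are ignored in this sketch
        if not isinstance(value, JSON_TYPES[spec["type"]]):
            errors.append(key)
    return errors
```

With the corrected type, a textual reference passes the check, while a numeric `full_text` is reported.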
    # FIXME: Hardcoded model name
    model = genai.GenerativeModel('models/gemini-2.0-flash')
The Gemini model name is hardcoded. Consider making this configurable through the LlamaModel database entry or environment variable to support different model versions and avoid requiring code changes for model updates.
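One possible shape for this, sketched under assumptions: the environment variable `GEMINI_MODEL_NAME` and the helper `resolve_gemini_model_name` are hypothetical names, and the default is the value currently hardcoded in the diff. A name stored on the database entry could be passed in as `configured_name`.

```python
import os

# Value currently hardcoded in the diff, kept as the fallback.
DEFAULT_GEMINI_MODEL = "models/gemini-2.0-flash"


def resolve_gemini_model_name(configured_name=None, env=None):
    """Pick the model name from, in order: an explicit value (for example
    one read from a database entry), the GEMINI_MODEL_NAME environment
    variable (hypothetical name), or the default."""
    env = os.environ if env is None else env
    return configured_name or env.get("GEMINI_MODEL_NAME") or DEFAULT_GEMINI_MODEL
```

The call site from the diff would then read `genai.GenerativeModel(resolve_gemini_model_name())`, so a model update becomes a configuration change rather than a code change.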
    except:
        print('**ERROR url')
        print(url)
        url = None
Bare except clause catches all exceptions including SystemExit and KeyboardInterrupt. Use except Exception: instead and consider logging the actual exception for debugging.
Replace print with logging and add a more descriptive error message.
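Both suggestions in this thread can be combined roughly as follows. This is a sketch, not the PR's code: the body of the original `try` block is not shown in the diff, so `clean_url` and its validation logic are assumptions used only to demonstrate `except Exception` together with `logging`.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


def clean_url(url):
    """Return url unchanged if it parses with a scheme and host, else None."""
    try:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            raise ValueError("missing scheme or host")
    except Exception:
        # Unlike a bare except, this lets SystemExit and KeyboardInterrupt
        # propagate; logger.exception also records the traceback.
        logger.exception("Could not parse reference URL %r", url)
        return None
    return url
```

The log message names what failed and includes the offending value, which replaces the two `print` calls in the diff.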
    except Exception:
        # si no hay match, dejarlo como está
        pass
Silent exception handling without logging makes debugging difficult. Consider logging the exception to help diagnose image lookup failures.
    });

    document.addEventListener("DOMContentLoaded", function () {
        const journalInput = document.querySelector("#id_journal");
Unused variable journalInput.
    }
    stream_data.append(obj.copy())

    for i, auth in enumerate(output['authors']):
Nested for statement uses loop variable 'i' of enclosing for statement.
    }
    stream_data.append(obj.copy())

    for i, aff in enumerate(output['affiliations']):
Nested for statement uses loop variable 'i' of enclosing for statement.
    else:
        break

    for i, val in enumerate(vals[1:], start=1):
Nested for statement uses loop variable 'i' of enclosing for statement.
        and b.value.get('label') == '<kwd-group>'
    ]

    for i, val in enumerate(vals):
Nested for statement uses loop variable 'i' of enclosing for statement.
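The four warnings about the reused loop variable `i` share one remedy: give each loop its own index name. The sketch below uses hypothetical data and a hypothetical function `label_rows`; it is not the PR's code, only an illustration of the pattern.

```python
def label_rows(sections):
    """Flatten sections into (section_index, author_index, value) rows."""
    rows = []
    for section_index, section in enumerate(sections):
        for author_index, author in enumerate(section["authors"]):
            rows.append((section_index, author_index, author))
        # Because the inner loop uses its own name, section_index still
        # holds the outer position here.
        rows.append((section_index, None, section["title"]))
    return rows
```

Had both loops used `i`, the line after the inner loop would have seen the last author index instead of the section index whenever a section has authors.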
    )

    # Respuesta HTTP
    with open(zip_path, "rb") as fp:
File may not be closed if an exception is raised.
- Adds scielo_xml_tools.yml with new volume paths
- Moves volumes to the ../markup_data/ structure
- Fixes container names in the Makefile (markapi_local_*)
- Adds .ipython/ to .dockerignore
- Adds huggingface-hub to requirements/local.txt
- Updates .gitignore to ignore backups and temporary files
What does this PR do?
This PR improves the system's documentation and configuration:
Structured documentation via Wiki:
AI API configuration:
Code standardization:
xml_manager/models.py
Where could the review start?
Wiki documentation:
Code files:
/.envs/.local/.django
LLAMA_ENABLED=True
/xml_manager/models.py
XMLDocument, XMLDocumentPDF, XMLDocumentHTML
create() reformatted
How could this be tested manually?
Documentation:
Code:
python manage.py test xml_manager
Is there any context you would like to provide?
Documentation:
What are the relevant tickets?
NA
References