Open
131 commits
91a2457
upload: refactor choices - new statuses, validation categories and re…
robertatakenaka Feb 18, 2026
b6aaff4
upload: migrate PermissionHelper to wagtail_modeladmin and add analys…
robertatakenaka Feb 18, 2026
6e84e3d
upload: migrate ButtonHelper to wagtail_modeladmin and refactor butto…
robertatakenaka Feb 18, 2026
f80d9e2
Initial plan
Copilot Feb 18, 2026
6cba944
Add Company, JournalTeamMember, CompanyTeamMember, and JournalCompany…
Copilot Feb 18, 2026
713b562
Fix wagtail_modeladmin import and add migration for new team models
Copilot Feb 18, 2026
4f7d074
Fix linting issues in team app (remove unused imports, fix f-strings)
Copilot Feb 18, 2026
686515c
Add VisualIdentityMixin with url/logo fields and certified_since to C…
Copilot Feb 18, 2026
323c68e
Add personal_contact field to Company and migrate from ModelAdmin to …
Copilot Feb 18, 2026
f9b9369
Initial plan
Copilot Feb 19, 2026
d65bb47
Add role field to CollectionTeamMember with manager/member profiles
Copilot Feb 19, 2026
67d3b7c
Initial plan
Copilot Feb 19, 2026
022d9d2
Convert ModelAdmin to SnippetViewSet in all wagtail_hooks files
Copilot Feb 19, 2026
66f1a63
Initial plan
Copilot Feb 19, 2026
b944b3a
Add atomic = False to migration to fix PostgreSQL trigger events error
Copilot Feb 19, 2026
9461f19
Initial plan
Copilot Feb 19, 2026
f74a914
Implement get_queryset in collection SnippetViewSets with access control
Copilot Feb 19, 2026
4887fce
Revert changes to collection/utils.py; move helpers into wagtail_hook…
Copilot Feb 19, 2026
7c353e7
Use get_user_membership_ids from team.models in collection wagtail_hooks
Copilot Feb 19, 2026
e37af7a
Revert get_user_membership_ids from team/models.py; restore private h…
Copilot Feb 19, 2026
38db829
Use dict-based get_user_membership_ids in collection wagtail_hooks ge…
Copilot Feb 19, 2026
f22cd7c
Initial plan
Copilot Feb 19, 2026
38c885a
Implement get_queryset in IssueSnippetViewSet and TOCSnippetViewSet
Copilot Feb 19, 2026
ca45a45
Refactor: extract get_user_membership_ids to team/models for reuse
Copilot Feb 19, 2026
4c81f78
Resolve journals from company contracts in get_user_membership_ids
Copilot Feb 19, 2026
5ad306f
Initial plan
Copilot Feb 19, 2026
f695132
Fix duplicate MinioConfiguration snippet registration in collection/w…
Copilot Feb 19, 2026
1198f17
Import MinioConfigurationViewSet from files_storage in CollectionView…
Copilot Feb 19, 2026
d9d09e6
Replace get_user_membership_ids with priority-based implementation in…
Copilot Feb 19, 2026
96d1992
refactor: move ClassicWebsiteConfigurationViewSet to the app for migra…
robertatakenaka Feb 22, 2026
9406f7b
refactor: remove bucket_app_subdir field from MinioConfigurationViewSet
robertatakenaka Feb 22, 2026
262fb46
refactor: remove ClassicWebsiteConfigurationViewSet from the migration group
robertatakenaka Feb 22, 2026
78d3803
refactor: simplify permission filtering by removing redundancy of…
robertatakenaka Feb 22, 2026
f0d6204
fix: update reverse URL for the collection listing in wagtail snip…
robertatakenaka Feb 22, 2026
cb5a4e0
refactor: replace custom forms with CoreAdminModelForm …
robertatakenaka Feb 22, 2026
7951061
feat: standardize team ViewSets with CommonControlFieldCreateView and C…
robertatakenaka Feb 22, 2026
2597b1d
Add commands to Makefile
patymori Feb 11, 2026
c725073
Change production compose
patymori Feb 11, 2026
f70554d
Remove Celery Worker service container name
patymori Feb 11, 2026
94aeb50
Fix Makefile to not use worker container name
patymori Feb 11, 2026
470d68a
Set default value to Makefile's target var numworkers
patymori Feb 11, 2026
46dba0f
Improvements in Makefile commands
patymori Feb 12, 2026
817f476
Initial plan
Copilot Mar 16, 2026
b549431
Fix TypeError in get_contribs when affiliation lacks original and org…
Copilot Mar 16, 2026
f940b29
Initial plan
Copilot Mar 16, 2026
68e7f71
Fix KeyError in XMLError.get_numbers() for unexpected status/reaction…
Copilot Mar 16, 2026
c0218c2
Skip entries with unexpected status values entirely in XMLError.get_n…
Copilot Mar 16, 2026
a704216
Remove total_ok and total_ukn from _get_numbers() — only expected err…
Copilot Mar 16, 2026
0e3008f
Initial plan
Copilot Mar 16, 2026
63a0e46
Fix task_load_records_from_counter_dict: handle JSONDecodeError in fe…
Copilot Mar 16, 2026
a2877b5
Address code review: add clarifying comment for page increment in exc…
Copilot Mar 16, 2026
8d0636f
feat(pid_provider): add sps_pkg_name and deprecated_sps_pkg_name a…
robertatakenaka Mar 16, 2026
698e4a8
fix(pid_provider): make pid_v3 optional in get_by_pid_v3
robertatakenaka Mar 16, 2026
cc448e0
Fix journal data collection via the core API and assignment of …
robertatakenaka Mar 16, 2026
ee6f038
Potential fix for pull request finding
robertatakenaka Mar 17, 2026
bf6e525
Initial plan
Copilot Mar 17, 2026
dceb13a
Strip <br> HTML tags from journal contact address in publication payload
Copilot Mar 17, 2026
49c390b
Simplify strip logic in _clean_br_tags method
Copilot Mar 17, 2026
2dd8198
Initial plan
Copilot Mar 17, 2026
ceea9c4
Fix Unicode surrogate DataError by sanitizing data before saving to J…
Copilot Mar 17, 2026
3f0e2e0
Address code review: handle all surrogate types, restore try/except p…
Copilot Mar 17, 2026
efbcf2e
Initial plan
Copilot Mar 16, 2026
9ecc0a9
Fix duplicate Article records: add unique constraint on pid_v3 and ha…
Copilot Mar 16, 2026
0d194d6
Optimize duplicate cleanup using filter().exclude().delete() instead …
Copilot Mar 17, 2026
7593b89
Apply reviewer feedback: None guards, bulk delete in migration, norma…
Copilot Mar 17, 2026
1bb8199
Add PID status constants and choices for tracking arti…
robertatakenaka Mar 18, 2026
e78d442
Add get_pid_list to ClassicWebsiteConfiguration and get_lines to M…
robertatakenaka Mar 18, 2026
afb9467
Add pid_status field to the ArticleProc model
robertatakenaka Mar 18, 2026
c6aa344
Add ClassicWebsiteArticlePidTracker and function track_classic_websi…
robertatakenaka Mar 18, 2026
0a5a6ec
Add Celery task task_track_classic_website_article_pids
robertatakenaka Mar 18, 2026
d393031
Add pid_status to list_display and list_filter of ArticleProcViewSet
robertatakenaka Mar 18, 2026
f777381
Fix migration.choices and add proc/migrations/0013_articleproc…
robertatakenaka Mar 18, 2026
29ccbc5
Refactor ClassicWebsiteArticlePidTracker: optimize queries and simplif…
robertatakenaka Mar 18, 2026
ddfe0b0
Remove force_update parameter from task_track_classic_website_article_pids
robertatakenaka Mar 18, 2026
66ff968
Update scielo_migration 1.10.7
robertatakenaka Mar 18, 2026
c8c638e
Update packtools 4.16.1
robertatakenaka Mar 18, 2026
e02ad02
Initial plan
Copilot Mar 16, 2026
823e485
Fix supplementary material upload failing when file has no extension
Copilot Mar 16, 2026
d41be8b
Address code review: optimize os.path.splitext usage in migration code
Copilot Mar 16, 2026
a672d1d
Fix supplementary material registered without extension in basename
Copilot Mar 17, 2026
c9c6771
Initial plan
Copilot Mar 19, 2026
3c6ef11
Fix duplicate articles in Wagtail admin by adding .distinct() to quer…
Copilot Mar 19, 2026
7fd5e7c
Initial plan
Copilot Mar 19, 2026
f3ae709
Add detection and removal of migrated articles with invalid PID v2
Copilot Mar 19, 2026
d417e42
Use MigratedArticle.document.order (v121) instead of Article.position
Copilot Mar 19, 2026
861a603
Consolidate article removals into single unified operation
Copilot Mar 19, 2026
433a3cc
Apply review feedback: fix N+1 query, normalize suffix logging, remov…
Copilot Mar 19, 2026
bb1c248
Initial plan
Copilot Mar 18, 2026
64d8865
Improve package reception flow: use local journal/issue data first, t…
Copilot Mar 18, 2026
e2a1ee2
Add tests for upload controller local-first journal/issue lookup flow
Copilot Mar 18, 2026
197c7e2
Address code review: use Portuguese comments to match codebase conven…
Copilot Mar 18, 2026
baa18c4
Refactor: replace _check_journal/_check_issue functions with JournalD…
Copilot Mar 18, 2026
042e905
Fix: use item.publication_year in similar issues, clean up debug logg…
Copilot Mar 18, 2026
5ec8bdc
Refactor: move JournalDataChecker/IssueDataChecker to proc/source_cor…
Copilot Mar 18, 2026
916d5d4
Fix test: remove stale count() mock, use exists() to match implementa…
Copilot Mar 18, 2026
4988dc3
Absorb ensure_journal_proc_exists/ensure_issue_proc_exists into check…
Copilot Mar 18, 2026
a337c83
Remove redundant from_xmltree overrides from upload subclasses
Copilot Mar 19, 2026
cb4e1fa
Extract BaseDataChecker with common get_or_fetch/refresh using self.m…
Copilot Mar 19, 2026
408cfcd
Make BaseDataChecker an abstract base class with @abstractmethod
Copilot Mar 19, 2026
f94e575
Account for MigratedArticle possibly being created without data
robertatakenaka Mar 22, 2026
00e6f85
Fix get_queryset
robertatakenaka Mar 22, 2026
49601e3
Update version to v2.12.1rc
robertatakenaka Mar 23, 2026
e7016ef
Fix URL construction in OPACHarvester
robertatakenaka Mar 23, 2026
2a285de
Add configurable verify parameter to PublicationAPI
robertatakenaka Mar 23, 2026
d1ffa8e
Add configurable verify parameter to PidProviderAPIClient
robertatakenaka Mar 23, 2026
7b4fcd2
Add XMLURLViewSet to the PidProvider admin panel
robertatakenaka Mar 23, 2026
75a2230
Add verify parameter to the press release task
robertatakenaka Mar 23, 2026
f909a53
Add verify parameter to the press release loading script
robertatakenaka Mar 23, 2026
1daeacd
Add verify parameter to the publication tasks
robertatakenaka Mar 23, 2026
5c6d8ef
Add press release scheduling and verify parameter to the scheduler
robertatakenaka Mar 23, 2026
dff92ee
Fix record update and add exception logging in Pid…
robertatakenaka Mar 23, 2026
d80c35a
Add 'stop' parameter to limit collection in task_load_records_fr…
robertatakenaka Mar 23, 2026
5ada594
Include articles with pending status when selecting issues to process
robertatakenaka Mar 23, 2026
354562d
Initial plan
Copilot Mar 23, 2026
6bb6692
docs: add guide for task_load_records_from_counter_dict in EN, ES, PT-BR
Copilot Mar 23, 2026
1bb77d8
docs: convert guides to markdown, add PID v3 source warning, fix opac…
Copilot Mar 23, 2026
51b024a
docs: move guide files into docs/pid_provider/ subdirectory
Copilot Mar 24, 2026
731598a
Initial plan
Copilot Mar 23, 2026
a09cf38
Add documentation guides for task_track_classic_website_article_pids …
Copilot Mar 23, 2026
dde585b
Move guides to docs/processing/, rewrite purpose to emphasize migrati…
Copilot Mar 23, 2026
36c572f
Add install/update script
patymori Mar 11, 2026
0947691
Improvements to install.sh
patymori Mar 11, 2026
e78d3c7
Potential fix for pull request finding
patymori Mar 17, 2026
86fbadb
Fixes to the install.sh script
patymori Mar 17, 2026
92199bd
Potential fix for pull request finding
patymori Mar 23, 2026
9504722
Potential fix for pull request finding
patymori Mar 23, 2026
3f015ed
Refactor article migrations: remove AlterField of pid_v3 from 0008 and…
robertatakenaka Mar 24, 2026
80599fb
Reorganize journal migrations: remove JournalTOC, merges and migrati…
robertatakenaka Mar 24, 2026
5b35035
Reorganize publication migrations: replace 0004 and adjust dependen…
robertatakenaka Mar 24, 2026
1bd0f28
Update journal dependencies in upload migrations to the new 0005
robertatakenaka Mar 24, 2026
1aa7057
Add index rename migration in team
robertatakenaka Mar 24, 2026
13 changes: 12 additions & 1 deletion Makefile
@@ -9,6 +9,7 @@ ifneq ($(shell docker compose version 2>/dev/null),)
DOCKER_COMPOSE=docker compose
else
DOCKER_COMPOSE=docker-compose
DOCKER_COMPATIBILITY=--compatibility
endif

help: ## Show this help
@@ -47,6 +48,10 @@ build_no_cache: ## Build app using $(compose) --no-cache
up: ## Start app using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) up -d

up_scale: ## Start app using $(compose) and scaling worker up to $(numworkers)
	$(eval numworkers ?= 1)
	$(DOCKER_COMPOSE) $(DOCKER_COMPATIBILITY) -f $(compose) up -d --scale celeryworker=$(numworkers)

logs: ## See all app logs using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) logs -f

@@ -58,6 +63,12 @@ restart:
ps: ## See all containers using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) ps

top: ## See docker top using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) top

stats: ## See docker stats using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) stats

rm: ## Remove all containers using $(compose)
	$(DOCKER_COMPOSE) -f $(compose) rm -f

@@ -149,7 +160,7 @@ volume_down: ## Remove all volume
	$(DOCKER_COMPOSE) -f $(compose) down -v

clean_celery_logs:
	@sudo truncate -s 0 $$(docker inspect --format='{{.LogPath}}' upload_production_celeryworker)
	@sudo truncate -s 0 $$(docker inspect --format='{{.LogPath}}' $$($(DOCKER_COMPOSE) -f $(compose) ps -q celeryworker))

exclude_upload_production_django: ## Exclude all productions containers
	@if [ -n "$$(docker images --format '{{.Repository}}:{{.Tag}}' | grep 'infrascielo/upload' | grep -v 'upload_production_postgres')" ]; then \
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
v3.0.0rc20
v2.12.1rc
41 changes: 41 additions & 0 deletions article/migrations/0008_add_unique_pid_v3.py
@@ -0,0 +1,41 @@
# Generated by Django 5.2.3 on 2026-03-16 23:00

from django.db import migrations, models


def remove_duplicate_articles(apps, schema_editor):
    """Remove duplicate Article records with the same pid_v3, keeping the most recently updated one."""
    Article = apps.get_model("article", "Article")
    from django.db.models import Count

    # Normalize empty strings to NULL so the unique constraint ignores them
    Article.objects.filter(pid_v3="").update(pid_v3=None)

    duplicates = (
        Article.objects.values("pid_v3")
        .exclude(pid_v3__isnull=True)
        .annotate(count=Count("id"))
        .filter(count__gt=1)
    )
    for dup in duplicates:
        pid_v3 = dup["pid_v3"]
        keep = Article.objects.filter(pid_v3=pid_v3).order_by("-updated").first()
        if keep:
            Article.objects.filter(pid_v3=pid_v3).exclude(pk=keep.pk).delete()


class Migration(migrations.Migration):
    dependencies = [
        ("article", "0007_alter_article_options_article_first_pubdate_iso"),
    ]

    operations = [
        migrations.RunPython(
            remove_duplicate_articles,
            migrations.RunPython.noop,
        ),
        migrations.RemoveIndex(
            model_name="article",
            name="article_art_pid_v3_2370cc_idx",
        ),
    ]
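
The cleanup rule in `remove_duplicate_articles` (keep the most recently updated row per `pid_v3`, skip empty values) can be illustrated without a database. This is a minimal sketch, not the project's code: `pick_ids_to_delete` and the sample rows are hypothetical, and ties on `updated` are resolved here by keeping the first row seen, whereas the migration leaves ties to the database ordering.

```python
from datetime import datetime


def pick_ids_to_delete(rows):
    """Return the sorted ids of rows that duplicate a pid_v3 and are not the newest.

    Each row is a dict with "id", "pid_v3" and "updated" keys (hypothetical shape).
    Empty-string and None pid_v3 values are treated alike and never deleted,
    mirroring the normalization of "" to NULL done before the duplicate scan.
    """
    latest = {}
    for row in rows:
        pid = row["pid_v3"] or None  # "" is normalized to None
        if pid is None:
            continue
        current = latest.get(pid)
        if current is None or row["updated"] > current["updated"]:
            latest[pid] = row

    keep_ids = {row["id"] for row in latest.values()}
    return sorted(
        row["id"]
        for row in rows
        if (row["pid_v3"] or None) is not None and row["id"] not in keep_ids
    )


rows = [
    {"id": 1, "pid_v3": "abc", "updated": datetime(2026, 1, 1)},
    {"id": 2, "pid_v3": "abc", "updated": datetime(2026, 3, 1)},
    {"id": 3, "pid_v3": "", "updated": datetime(2026, 2, 1)},
    {"id": 4, "pid_v3": "", "updated": datetime(2026, 2, 2)},
    {"id": 5, "pid_v3": "xyz", "updated": datetime(2026, 2, 1)},
]
print(pick_ids_to_delete(rows))  # prints [1]
```

Only row 1 is selected: it shares `pid_v3="abc"` with the newer row 2, while the two rows with an empty `pid_v3` are left alone.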
19 changes: 19 additions & 0 deletions article/migrations/0009_alter_article_pid_v3_unique.py
@@ -0,0 +1,19 @@
# Generated by Django 5.2.3 on 2026-03-16 23:00

from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("article", "0008_add_unique_pid_v3"),
    ]

    operations = [
        migrations.AlterField(
            model_name="article",
            name="pid_v3",
            field=models.CharField(
                blank=True, max_length=23, null=True, unique=True, verbose_name="PID v3"
            ),
        ),
    ]
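
Migration 0008 converts empty `pid_v3` strings to NULL before this migration adds `unique=True`. The sketch below shows why that matters: a unique column accepts any number of NULLs but rejects a second empty string. It uses SQLite from the Python standard library only because it runs anywhere; the one-table schema is a simplified assumption, not the project's real table. PostgreSQL's unique constraints also treat NULLs as distinct by default.

```python
import sqlite3

# In-memory database with a simplified stand-in for the article table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, pid_v3 TEXT UNIQUE)")

# Two rows without a PID, stored as NULL: both inserts succeed.
conn.execute("INSERT INTO article (pid_v3) VALUES (NULL)")
conn.execute("INSERT INTO article (pid_v3) VALUES (NULL)")

# Two rows without a PID, stored as "": the second insert violates UNIQUE.
conn.execute("INSERT INTO article (pid_v3) VALUES ('')")
try:
    conn.execute("INSERT INTO article (pid_v3) VALUES ('')")
    second_empty_accepted = True
except sqlite3.IntegrityError:
    second_empty_accepted = False

null_count = conn.execute(
    "SELECT COUNT(*) FROM article WHERE pid_v3 IS NULL"
).fetchone()[0]
print(null_count, second_empty_accepted)  # prints 2 False
```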
159 changes: 156 additions & 3 deletions article/models.py
@@ -48,7 +48,7 @@ class Article(ClusterableModel, CommonControlField):
        SPSPkg, blank=True, null=True, on_delete=models.SET_NULL
    )
    # PID v3
    pid_v3 = models.CharField(_("PID v3"), max_length=23, blank=True, null=True)
    pid_v3 = models.CharField(_("PID v3"), max_length=23, blank=True, null=True, unique=True)
    pid_v2 = models.CharField(_("PID v2"), max_length=23, blank=True, null=True)

    # Article type
@@ -124,7 +124,6 @@ class Article(ClusterableModel, CommonControlField):

    class Meta:
        indexes = [
            models.Index(fields=["pid_v3"]),
            models.Index(fields=["status"]),
        ]
        ordering = ["position", "fpage", "-first_pubdate_iso"]
@@ -205,7 +204,15 @@ def article_langs(self):
    @classmethod
    def get(cls, pid_v3):
        if pid_v3:
            return cls.objects.get(pid_v3=pid_v3)
            try:
                return cls.objects.get(pid_v3=pid_v3)
            except cls.MultipleObjectsReturned:
                qs = cls.objects.filter(pid_v3=pid_v3).order_by("-updated")
                obj = qs.first()
                if obj is None:
                    raise cls.DoesNotExist
                qs.exclude(pk=obj.pk).delete()
                return obj
        raise ValueError("Article.get requires pid_v3")

    @classmethod
@@ -470,6 +477,152 @@ def get_repeated_items(cls, field_name, journal=None):
.values_list(field_name, flat=True)
)

    @staticmethod
    def has_valid_pid_v2(pid_v2, order):
        """
        Check if pid_v2 last 5 digits match the order value
        (from MigratedArticle.document.order / v121) padded with zeros.
        """
        try:
            if not pid_v2 or not order:
                return True
            if len(pid_v2) < 5:
                return True
            expected_suffix = str(int(str(order).strip())).zfill(5)
            actual_suffix = pid_v2[-5:]
            return actual_suffix == expected_suffix
        except (TypeError, IndexError, ValueError):
            return True

    @classmethod
    def exclude_articles_with_invalid_pid_v2(cls, journal=None):
        """
        Find and delete migrated articles whose pid_v2 last 5 digits
        don't match the order (v121) from MigratedArticle.document.order.
        Uses ArticleProc.migrated_data to access the migration data.
        Only applies to migrated articles.
        """
        from proc.models import ArticleProc

        filters = {
            "migrated_data__isnull": False,
            "sps_pkg__isnull": False,
        }
        if journal:
            filters["issue_proc__journal_proc__journal"] = journal

        article_procs = ArticleProc.objects.filter(
            **filters
        ).select_related("migrated_data", "sps_pkg")

        # Bulk-fetch Article records to avoid N+1 queries via ArticleProc.article
        sps_pkg_id_list = [
            ap.sps_pkg_id for ap in article_procs if ap.sps_pkg_id
        ]
        articles_by_sps_pkg = {}
        if sps_pkg_id_list:
            for article in Article.objects.filter(
                sps_pkg_id__in=sps_pkg_id_list
            ).only("id", "pid_v2", "sps_pkg_id", "pp_xml_id"):
                articles_by_sps_pkg[article.sps_pkg_id] = article

        events = []
        sps_pkg_ids = set()
        pp_xml_ids = set()
        article_ids = []

        for article_proc in article_procs:
            try:
                article = articles_by_sps_pkg.get(article_proc.sps_pkg_id)
                if not article or not article.pid_v2:
                    continue

                order = article_proc.migrated_data.document.order
                if not order:
                    continue

                if not cls.has_valid_pid_v2(article.pid_v2, order):
                    try:
                        expected_suffix = str(int(str(order).strip())).zfill(5)
                    except (TypeError, ValueError):
                        expected_suffix = str(order)
                    events.append(
                        f"Invalid pid_v2: {article.pid_v2} "
                        f"(order={order}, "
                        f"expected suffix={expected_suffix}, "
                        f"actual suffix={article.pid_v2[-5:]})"
                    )
                    article_ids.append(article.id)
                    if article.sps_pkg_id:
                        sps_pkg_ids.add(article.sps_pkg_id)
                    if article.pp_xml_id:
                        pp_xml_ids.add(article.pp_xml_id)
            except Exception as e:
                logging.exception(
                    f"Error checking pid_v2 for ArticleProc {article_proc}: {e}"
                )

        if not article_ids:
            events.append("No migrated articles with invalid pid_v2 found")
            return events

        with transaction.atomic():
            deleted_articles, _ = cls.objects.filter(id__in=article_ids).delete()
            events.append(f"Articles deletados: {deleted_articles}")

            if sps_pkg_ids:
                deleted_sps, _ = SPSPkg.objects.filter(id__in=sps_pkg_ids).delete()
                events.append(f"SPSPkg deletados: {deleted_sps}")

            if pp_xml_ids:
                deleted_pp, _ = PidProviderXML.objects.filter(id__in=pp_xml_ids).delete()
                events.append(f"PidProviderXML deletados: {deleted_pp}")

        return events

    @classmethod
    def exclude_inconvenient_articles(cls, journal, user, timeout=None):
        """
        Remove all inconvenient article records in a unified operation:
        1. Migrated articles with invalid pid_v2 (suffix doesn't match order from v121)
        2. Duplicate articles (repeated pid_v2 or sps_pkg_name)
        """
        results = {
            "events": [],
            "numbers": {},
            "exceptions": [],
        }

        try:
            events = cls.exclude_articles_with_invalid_pid_v2(journal)
            results["events"].extend(events)
        except Exception as e:
            results["exceptions"].append(
                {
                    "exclude_articles_with_invalid_pid_v2": str(e),
                    "traceback": traceback.format_exc(),
                }
            )

        for field_name in ("pid_v2", "sps_pkg__sps_pkg_name"):
            repeated_items = cls.get_repeated_items(field_name, journal)
            results["numbers"][f"repeated_by_{field_name}"] = repeated_items.count()
            for repeated_value in repeated_items:
                try:
                    events = cls.exclude_repetitions(
                        user, field_name, repeated_value, timeout=timeout
                    )
                    results["events"].extend(events)
                except Exception as e:
                    results["exceptions"].append(
                        {
                            f"repeated_by_{field_name}": repeated_value,
                            "traceback": traceback.format_exc(),
                        }
                    )

        return results

    @classmethod
    def select_articles(cls, journal_id_list=None, issue_id_list=None):
        kwargs = {}
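
The suffix rule used by `has_valid_pid_v2` is easy to exercise on its own. The function below repeats the logic from the diff as a standalone function so it runs without Django; the PID values are made-up strings that only follow the 23-character layout and are not real article identifiers. Note that the check is deliberately lenient: whenever it cannot decide (missing value, short PID, non-numeric order) it returns `True`, so only clear mismatches are flagged for removal.

```python
def has_valid_pid_v2(pid_v2, order):
    """Return False only when the last 5 characters of pid_v2 clearly differ
    from the order value zero-padded to 5 digits; return True otherwise."""
    try:
        if not pid_v2 or not order:
            return True  # nothing to compare
        if len(pid_v2) < 5:
            return True  # too short to carry a 5-digit suffix
        expected_suffix = str(int(str(order).strip())).zfill(5)
        return pid_v2[-5:] == expected_suffix
    except (TypeError, IndexError, ValueError):
        return True  # non-numeric order and similar cases are not flagged


pid = "S0034-89102020000100012"  # made-up example value, 23 characters

print(has_valid_pid_v2(pid, "12"))     # prints True: "00012" matches
print(has_valid_pid_v2(pid, " 012 "))  # prints True: whitespace and leading zeros are normalized
print(has_valid_pid_v2(pid, "13"))     # prints False: expected suffix "00013"
print(has_valid_pid_v2(pid, "abc"))    # prints True: non-numeric order is not flagged
print(has_valid_pid_v2(None, "12"))    # prints True: no pid_v2 to check
```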