
Make Mica portal document sorting case-insensitive #4545

@kazoompa


Case-Insensitive Sorting for Localized Fields

Problem

Sorting on string fields (e.g. name, acronym) is case-sensitive by default in Elasticsearch 8: keyword values sort in raw byte order, where every uppercase ASCII letter precedes every lowercase one, so "Zebra" sorts before "alpha". This affects the Network, Study, Dataset, and Variable entities.
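The ordering difference can be reproduced in plain Java. This is an illustration only, and the class name SortOrderDemo is hypothetical: String.compareTo compares UTF-16 code units, which for ASCII text matches the byte order a keyword sort uses, while comparing lowercased copies approximates what a .sort sub-field with a lowercase normalizer sorts on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: contrasts raw ordering with lowercase-normalized ordering.
class SortOrderDemo {

    // Raw ordering: for ASCII, code-unit order equals byte order,
    // so "Z" (90) sorts before "a" (97).
    static List<String> rawOrder(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Normalized ordering: compare on a lowercased key, approximating
    // what a keyword sub-field with a lowercase normalizer stores.
    static List<String> normalizedOrder(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((a, b) -> a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT)));
        return sorted;
    }
}
```

With the input alpha, Zebra, beta, the raw order puts Zebra first, while the normalized order yields alpha, beta, Zebra.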

Solution

Add a .sort keyword sub-field that uses a lowercase_normalizer to each sortable string field in the ES index mapping. At query time, RQLFieldResolver routes sort requests to the .sort sub-field when one exists. Its existence is detected dynamically from the live mapping via IndexFieldMapping.isSortable().
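A minimal sketch of the resulting mapping shape, assuming plain Java Maps in place of whatever builder AbstractIndexConfiguration actually uses. The field keeps its existing mapping and gains a keyword sort multi-field that references lowercase_normalizer. The class SortSubFieldMapping and its methods are hypothetical. The normalizer definition (custom type, lowercase filter) follows the standard ES normalizer format and is assumed to match the elasticsearch.yml change.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shows the mapping shape with plain Maps, not Mica's real builder.
final class SortSubFieldMapping {
    static final String NORMALIZER = "lowercase_normalizer";

    private SortSubFieldMapping() {}

    // Returns a copy of a field mapping with a keyword "sort" multi-field added.
    // Normalizers only apply to keyword fields, so the sub-field must be a keyword.
    static Map<String, Object> withSortSubField(Map<String, Object> fieldMapping) {
        Map<String, Object> sort = new LinkedHashMap<>();
        sort.put("type", "keyword");
        sort.put("normalizer", NORMALIZER);

        Map<String, Object> result = new LinkedHashMap<>(fieldMapping);
        Map<String, Object> fields = new LinkedHashMap<>();
        Object existing = result.get("fields");
        if (existing instanceof Map) {
            // Keep any sub-fields already declared on the field.
            ((Map<?, ?>) existing).forEach((k, v) -> fields.put(String.valueOf(k), v));
        }
        fields.put("sort", sort);
        result.put("fields", fields);
        return result;
    }

    // Index-settings fragment declaring the normalizer under analysis.normalizer.
    static Map<String, Object> normalizerSettings() {
        Map<String, Object> def = new LinkedHashMap<>();
        def.put("type", "custom");
        def.put("filter", List.of("lowercase"));
        return Map.of("analysis", Map.of("normalizer", Map.of(NORMALIZER, def)));
    }
}
```

Copying the input map keeps the caller's mapping untouched, which matters if the same base mapping is reused for several fields.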

Changes

mica-search-es8

  • elasticsearch.yml: add lowercase_normalizer to the analysis.normalizer block.
  • AbstractIndexConfiguration: add the createMappingWithAnalyzersAndSort (non-localized) and createLocalizedMappingWithAnalyzersAndSort (localized) mapping methods, and a createMappingWithAnalyzersAndSortNonLocalized wrapper.
  • NetworkIndexConfiguration, StudyIndexConfiguration, DatasetIndexConfiguration: exclude name/acronym from addTaxonomyFields and map them explicitly with createLocalizedMappingWithAnalyzersAndSort.
  • VariableIndexConfiguration: use createMappingWithAnalyzersAndSortNonLocalized for name.
  • ESIndexer.IndexFieldMappingImpl: implement isSortable(fieldName) using JSONPath to check for a .sort sub-field in the live ES mapping.
  • RQLQuery.RQLBuilder: add resolveFieldForSort, which delegates to RQLFieldResolver and is called from RQLSortBuilder.processArgument.
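The issue implements isSortable with JSONPath over the live ES mapping. A dependency-free sketch of the same check walks the mapping's properties tree as nested Maps. The class MappingSortCheck is hypothetical; it assumes the input is the index's mappings object (not the full get-mapping response, which wraps it under the index name) and that localized fields such as name.en are modeled as object sub-properties. Under those assumptions, the equivalent JSONPath for name.en would be $.properties.name.properties.en.fields.sort.

```java
import java.util.Map;

// Hypothetical sketch: checks a parsed ES "mappings" object (as nested Maps)
// for a "sort" multi-field under the given dotted field path.
final class MappingSortCheck {
    private MappingSortCheck() {}

    @SuppressWarnings("unchecked")
    static boolean isSortable(Map<String, Object> mappings, String fieldName) {
        Object node = mappings;
        // Each dotted segment descends through the parent's "properties" map,
        // e.g. "name.en" -> properties.name.properties.en.
        for (String segment : fieldName.split("\\.")) {
            if (!(node instanceof Map)) return false;
            Object properties = ((Map<String, Object>) node).get("properties");
            if (!(properties instanceof Map)) return false;
            node = ((Map<String, Object>) properties).get(segment);
        }
        if (!(node instanceof Map)) return false;
        // The case-insensitive sort target is declared as the multi-field fields.sort.
        Object fields = ((Map<String, Object>) node).get("fields");
        return fields instanceof Map && ((Map<String, Object>) fields).containsKey("sort");
    }
}
```

Missing paths simply return false, so fields absent from the mapping fall back to their normal sort behaviour.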

mica-spi

  • IndexFieldMapping: add isSortable(String fieldName).
  • RQLFieldResolver: add a public resolveFieldForSort method; update FieldData.Builder.sortable() to call indexFieldMapping.isSortable() instead of checking the localized vocabulary attribute, so that both localized and non-localized sortable fields are covered.
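At query time the routing reduces to a single decision per sort field. In this sketch, SortableLookup is a hypothetical one-method stand-in for IndexFieldMapping.isSortable(String), and SortFieldRouting.resolveFieldForSort is an illustration rather than the actual RQLFieldResolver signature. It assumes the sort field arrives as its full dotted path, so a localized field like name.en resolves to name.en.sort.

```java
// Hypothetical stand-in for IndexFieldMapping.isSortable(String).
interface SortableLookup {
    boolean isSortable(String fieldName);
}

// Sketch of routing a sort request to the ".sort" sub-field when the mapping has one.
final class SortFieldRouting {
    static final String SORT_SUFFIX = ".sort";

    private SortFieldRouting() {}

    static String resolveFieldForSort(String field, SortableLookup mapping) {
        // Fall back to the original field so fields without a .sort sub-field sort as before.
        return mapping.isSortable(field) ? field + SORT_SUFFIX : field;
    }
}
```

Because the decision is driven by the lookup rather than by a list of field names, any field that gains a .sort sub-field in the mapping is picked up without changes to the routing code.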

Notes

  • Re-indexing required: mapping changes do not apply to existing indices. Drop and rebuild affected indices via the Mica admin re-index function after deploying.
  • mica-search-os2: the same gap exists in the OpenSearch plugin; RQLFieldResolver changes are shared via the SPI.
  • Future: the current hardcoded approach for built-in fields (name, acronym, label) is a temporary measure. The plan is to introduce a sortable attribute on taxonomy vocabularies as the authoritative source for which fields get a .sort sub-field in the ES mapping. A companion internal attribute will be introduced to prevent system-managed vocabularies from appearing in the Mica Administration UI. Once implemented, custom fields defined in user taxonomies will automatically get case-insensitive sorting by setting sortable: true on their vocabulary — with no code changes required.
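The planned vocabulary-driven selection could be sketched as follows, purely as an assumption about the design. VocabularySketch and SortableFieldPlanner are hypothetical, Mica's real taxonomy Vocabulary type is richer, and the attribute key sortable with the string value true is taken from the note above. The companion internal attribute concerns Administration UI visibility and is not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model: only a vocabulary's name and its attributes are represented.
record VocabularySketch(String name, Map<String, String> attributes) {
    // A vocabulary opts in to case-insensitive sorting with sortable: true.
    boolean isSortable() {
        return "true".equalsIgnoreCase(attributes.getOrDefault("sortable", "false"));
    }
}

final class SortableFieldPlanner {
    private SortableFieldPlanner() {}

    // Names of vocabularies whose index field should get a ".sort" sub-field.
    static List<String> fieldsNeedingSortSubField(List<VocabularySketch> vocabularies) {
        return vocabularies.stream()
            .filter(VocabularySketch::isSortable)
            .map(VocabularySketch::name)
            .collect(Collectors.toList());
    }
}
```

The mapping step would then apply the .sort sub-field to exactly these fields instead of a hardcoded list.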
