
Cleanup Duplicate Previewed Finding Aids in Solr #1091

@amywieliczka


I believe these finding aids were uploaded to the dashboard, previewed, and assigned a newly minted ark, then deleted and re-uploaded, at which point a new, different ark was minted for them. We spotted these previewed duplicates because they lack an oac_normalized_title field; the versions still managed in the dashboard were re-indexed and now have an oac_normalized_title field.

Collection of California maps and guidebooks - published in oac4: https://oac4.cdlib.org/findaid/ark:/13030/c80p16sk/
REMOVE? - https://oac.cdlib.org/findaid/ark:/13030/c87h1sq5 - ark created 2025-07-24
REMOVE? - https://oac.cdlib.org/findaid/ark:/13030/c83r1207 - ark created 2025-07-24
KEEP? - https://oac.cdlib.org/findaid/ark:/13030/c80p16sk - also in the dashboard: https://dashboard.oac.cdlib.org/cdl-administration/findingaids/findingaid/?q=collection+of+california+maps+and+guidebooks - oac4-old-files also has this ark in prime2002 sc012.xml - ark created 2023-07-19 / ark modified 2025-07-11

Sam Green Papers - new to oac5
https://oac.cdlib.org/findaid/ark:/13030/c8rj4snd - ark created 2025-08-12
https://oac.cdlib.org/findaid/ark:/13030/c85m6dvq - ark created 2025-08-13
https://oac.cdlib.org/findaid/ark:/13030/c8ms41wm - ark created 2025-08-12
https://oac.cdlib.org/findaid/ark:/13030/c8nk3p6c - also in the dashboard: https://dashboard.oac.cdlib.org/cdl-administration/findingaids/findingaid/?q=sam+green+papers - m2913.xml is in neither prime2002 nor submission in oac4-old-files - ark created 2025-08-12

3rd and 4th Earls of Loudoun papers - new to oac5
https://oac.cdlib.org/findaid/ark:/13030/c8rb7crv - ark created 2025-08-15
https://oac.cdlib.org/findaid/ark:/13030/c83n2bhm - also in the dashboard: https://dashboard.oac.cdlib.org/cdl-administration/findingaids/findingaid/?q=3rd+and+4th+Earls+of+Loudoun+papers - loudoun.xml is in neither prime2002 nor submission in oac4-old-files - ark created 2025-08-16

Should we delete these duplicates? What should happen to the arks we presumably minted for them? We should also come up with a process to make sure this doesn't keep happening.
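One possible piece of such a process, sketched here under assumptions rather than as an agreed design: periodically re-run the Solr query used in this issue with `rows=0` and flag any non-zero `numFound`. The function names are hypothetical, `SOLR_URL` is assumed to be the same core URL used in this issue's query, and `numFound` is parsed with plain `grep`/`cut` (no `jq`), which assumes Solr's compact `"numFound":N` formatting.

```shell
# Sketch only: function names are hypothetical; SOLR_URL is assumed to be
# the same Solr core URL used in the query in this issue.

# Print the first "numFound" value from a Solr JSON response on stdin.
# Plain grep/cut (no jq); assumes compact formatting such as "numFound":6.
extract_num_found() {
  grep -o '"numFound":[0-9]*' | head -n 1 | cut -d: -f2
}

# Count Collection-level docs that lack oac_normalized_title_ssm.
# rows=0 asks Solr for the count only, without returning any documents.
count_missing_normalized_title() {
  curl -sS --fail "${SOLR_URL:?SOLR_URL must be set}/select" \
    --data-urlencode 'q=-oac_normalized_title_ssm:* AND level_ssim:"Collection"' \
    --data-urlencode 'rows=0' \
    | extract_num_found
}

# Return 0 when nothing is missing, 1 (with a warning) when some records
# lack the field, and 2 when the count could not be read at all.
check_normalized_titles() {
  local missing
  missing=$(count_missing_normalized_title)
  if [ -z "$missing" ]; then
    echo "ERROR: could not read numFound from Solr" >&2
    return 2
  fi
  if [ "$missing" -gt 0 ]; then
    echo "WARNING: ${missing} Collection-level record(s) lack oac_normalized_title_ssm" >&2
    return 1
  fi
  return 0
}
```

A scheduled job could run `check_normalized_titles` after each indexing pass and alert on a non-zero exit status. Note that this only detects previewed orphans after the fact; it does not stop a second ark from being minted on re-upload.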

# Query Solr for all Collection-level records that lack the oac_normalized_title_ssm field:
curl "$SOLR_URL/select" \
  --data-urlencode 'q=-oac_normalized_title_ssm:* AND level_ssim:"Collection"' \
  --data-urlencode 'fl=id,preview_ssi,oac_normalized_title_ssm,title_ssm,repository_ssim' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.field=repository_ssim' \
  --data-urlencode 'facet.mincount=1' \
  --data-urlencode 'rows=6'
{
  "responseHeader":{
    "status":0,
    "QTime":239,
    "params":{
      "q":"-oac_normalized_title_ssm:* AND level_ssim:\"Collection\"",
      "facet.field":"repository_ssim",
      "fl":"id,preview_ssi,oac_normalized_title_ssm,title_ssm,repository_ssim",
      "facet.mincount":"1",
      "rows":"6",
      "facet":"true"
    }
  },
  "response":{
    "numFound":6,
    "start":0,
    "numFoundExact":true,
    "docs":[{
      "id":"ark:/13030/c87h1sq5",
      "title_ssm":["Collection of California maps and guidebooks"],
      "preview_ssi":"true",
      "repository_ssim":["California State University, San Bernardino"]
    },{
      "id":"ark:/13030/c83r1207",
      "title_ssm":["Collection of California maps and guidebooks"],
      "preview_ssi":"true",
      "repository_ssim":["California State University, San Bernardino"]
    },{
      "id":"ark:/13030/c8rj4snd",
      "title_ssm":["Sam Green papers"],
      "preview_ssi":"true",
      "repository_ssim":["Stanford University, Manuscripts Division"]
    },{
      "id":"ark:/13030/c85m6dvq",
      "title_ssm":["Sam Green papers"],
      "preview_ssi":"true",
      "repository_ssim":["Stanford University, Manuscripts Division"]
    },{
      "id":"ark:/13030/c8rb7crv",
      "title_ssm":["3rd and 4th Earls of Loudoun papers"],
      "preview_ssi":"true",
      "repository_ssim":["Huntington Library, Manuscript Collections"]
    },{
      "id":"ark:/13030/c8ms41wm",
      "title_ssm":["Sam Green papers"],
      "preview_ssi":"true",
      "repository_ssim":["Stanford University, Manuscripts Division"]
    }]
  },
  "facet_counts":{
    "facet_queries":{ },
    "facet_fields":{
      "repository_ssim":["Stanford University, Manuscripts Division",3,"California State University, San Bernardino",2,"Huntington Library, Manuscript Collections",1]
    },
    "facet_ranges":{ },
    "facet_intervals":{ },
    "facet_heatmaps":{ }
  },
  "spellcheck":{
    "suggestions":[ ],
    "correctlySpelled":true
  }
}
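If the team decides to delete the six previewed duplicates returned above, a minimal sketch using Solr's JSON update syntax (a `{"delete":[...]}` list of ids posted to `/update`) might look like the following. The helper names and the `DRY_RUN` switch are hypothetical, and `SOLR_URL` is assumed to be the same core as the query above. It only removes documents whose `id` matches exactly; whether component-level documents for these finding aids also exist in the index, and what should happen to the minted arks, remain open questions from this issue.

```shell
# Sketch only: helper names and DRY_RUN are hypothetical.

# Build a Solr JSON delete-by-id payload, e.g. {"delete":["id1","id2"]}.
# Ids are not JSON-escaped; that is fine for ark ids (no quotes/backslashes).
build_delete_payload() {
  local payload='{"delete":['
  local sep=''
  local id
  for id in "$@"; do
    payload+="${sep}\"${id}\""
    sep=','
  done
  payload+=']}'
  printf '%s\n' "$payload"
}

# Send the payload to Solr's update handler and commit. DRY_RUN defaults to 1,
# so by default this only prints what it would send and deletes nothing.
delete_from_solr() {
  local payload
  payload=$(build_delete_payload "$@")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'DRY RUN: would POST to %s/update?commit=true:\n%s\n' \
      "${SOLR_URL:-(SOLR_URL unset)}" "$payload"
    return 0
  fi
  curl -sS --fail "${SOLR_URL:?SOLR_URL must be set}/update?commit=true" \
    -H 'Content-Type: application/json' \
    --data-binary "$payload"
}

# Usage (prints the payload only; set DRY_RUN=0 to actually send it):
#   delete_from_solr ark:/13030/c87h1sq5 ark:/13030/c83r1207 \
#     ark:/13030/c8rj4snd ark:/13030/c85m6dvq ark:/13030/c8ms41wm ark:/13030/c8rb7crv
```

Defaulting to a dry run keeps an accidental invocation from deleting anything; it also lets the exact id list be reviewed against the dashboard before the real request is sent.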
