18 search improvements #27

Draft
wants to merge 89 commits into
base: master

Contributor

@patrick-austin patrick-austin commented Apr 4, 2022

Covers a variety of improvements to free text search functionality.
Closes #18 #19 #25 #26 #30

Functional changes

Added the ability to:

  • Search across more than 2 billion documents
  • Apply sorting on specific entity fields
  • "Infinitely" search the data by using the searchAfter parameter
  • Request the facets of a search
  • Target specific fields, by expanding the text field into more specific fields that reflect the ICAT schema
  • Convert units on numeric Parameters
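
The searchAfter idea in the list above can be illustrated without Lucene: rather than asking for page N by offset, the client passes back the last hit it received, and the next request returns only hits that sort strictly after it. The following is a plain-Java sketch of that cursor pattern, not the code in this PR; `Hit`, `compare`, `searchAfter` and the (score, id) ordering are hypothetical names and choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterSketch {
    /** Hypothetical hit: a relevance score plus a unique id used as a tie-breaker. */
    static final class Hit {
        final long id;
        final double score;

        Hit(long id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return "Hit(id=" + id + ", score=" + score + ")";
        }
    }

    /** Descending score, then ascending id, so that the ordering is total. */
    static int compare(Hit a, Hit b) {
        int byScore = Double.compare(b.score, a.score);
        return byScore != 0 ? byScore : Long.compare(a.id, b.id);
    }

    /**
     * Returns up to pageSize hits that sort strictly after the cursor.
     * A null cursor means "start from the first hit".
     */
    static List<Hit> searchAfter(List<Hit> allHits, Hit cursor, int pageSize) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(SearchAfterSketch::compare);
        List<Hit> page = new ArrayList<>();
        for (Hit hit : sorted) {
            if (cursor != null && compare(hit, cursor) <= 0) {
                continue; // at or before the cursor: returned on an earlier page
            }
            if (page.size() >= pageSize) {
                break;
            }
            page.add(hit);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(1, 0.9), new Hit(2, 0.5), new Hit(3, 0.9), new Hit(4, 0.1));
        Hit cursor = null;
        List<Hit> page;
        // Keep requesting pages until one comes back empty.
        while (!(page = searchAfter(hits, cursor, 2)).isEmpty()) {
            System.out.println(page);
            cursor = page.get(page.size() - 1); // the last hit becomes the next cursor
        }
    }
}
```

Because the cursor is a position in the sort order rather than an offset, the client never needs to hold or skip earlier pages, which is what makes the paging effectively unbounded.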

Architectural changes

In addition to changes to various functions on the main Lucene class:

  • Moved the mapping of fields into its own class, DocumentMapping
  • Created FacetedDimension to store the results of a request to facet the documents
  • Moved searching out of Lucene into a dedicated SearchBucket class, which handles the building of complex queries and additional search parameters (e.g. sorting)
  • ShardBucket added alongside IndexBucket in Lucene to reflect the distinction between a single directory of Lucene Documents (ShardBucket) and a way to access all Documents of one entity type contained in one or more directories (IndexBucket)
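
The ShardBucket / IndexBucket distinction described above might be pictured as follows. This is a simplified, in-memory sketch and not the classes added in this PR: `IndexSketch`, `ShardSketch`, their methods and the tiny `maxDocsPerShard` cap are all hypothetical, and a list of strings stands in for a directory of Lucene Documents. It assumes that a single shard has a hard capacity, so the index-level object rolls over to a new shard when the current one is full and fans each search out across every shard.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for access to all documents of one entity type, across one or more shards. */
class IndexSketch {
    private final List<ShardSketch> shards = new ArrayList<>();
    private final int maxDocsPerShard;

    IndexSketch(int maxDocsPerShard) {
        if (maxDocsPerShard <= 0) {
            throw new IllegalArgumentException("maxDocsPerShard must be positive");
        }
        this.maxDocsPerShard = maxDocsPerShard;
        shards.add(new ShardSketch(maxDocsPerShard));
    }

    /** Adds to the newest shard, creating another one first if it is full. */
    void add(String document) {
        ShardSketch current = shards.get(shards.size() - 1);
        if (current.isFull()) {
            current = new ShardSketch(maxDocsPerShard);
            shards.add(current);
        }
        current.add(document);
    }

    int shardCount() {
        return shards.size();
    }

    /** Searches every shard and concatenates the matches in shard order. */
    List<String> search(String term) {
        List<String> matches = new ArrayList<>();
        for (ShardSketch shard : shards) {
            matches.addAll(shard.search(term));
        }
        return matches;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch(2); // unrealistically small cap, for the demo
        for (String doc : List.of("run 1", "run 2", "calibration", "run 3", "dark")) {
            index.add(doc);
        }
        System.out.println(index.shardCount() + " shards, matches: " + index.search("run"));
    }
}

/** Stand-in for a single directory of documents with a fixed capacity. */
class ShardSketch {
    private final List<String> documents = new ArrayList<>();
    private final int capacity;

    ShardSketch(int capacity) {
        this.capacity = capacity;
    }

    boolean isFull() {
        return documents.size() >= capacity;
    }

    void add(String document) {
        if (isFull()) {
            throw new IllegalStateException("shard is full");
        }
        documents.add(document);
    }

    List<String> search(String term) {
        List<String> matches = new ArrayList<>();
        for (String document : documents) {
            if (document.contains(term)) {
                matches.add(document);
            }
        }
        return matches;
    }
}
```

Splitting one entity type over several directories in this way is presumably what lets the search cover more documents than a single Lucene index can hold.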

Interdependencies on other components

dependabot bot and others added 30 commits August 11, 2020 09:28
Bumps `luceneVersion` from 5.3.0 to 8.6.0.

Updates `lucene-core` from 5.3.0 to 8.6.0

Updates `lucene-queryparser` from 5.3.0 to 8.6.0

Updates `lucene-analyzers-common` from 5.3.0 to 8.6.0

Updates `lucene-join` from 5.3.0 to 8.6.0

Signed-off-by: dependabot[bot] <[email protected]>
…n-8.6.0

Dependabot/maven/lucene version 8.6.0 into 18
@patrick-austin patrick-austin requested a review from VKTB September 13, 2022 14:25
Contributor

@VKTB VKTB left a comment

There is quite a lot going on here, so I must admit that it's hard for me to understand everything, but I tried my best.

I can see that you have refactored code into private methods to make some methods shorter, but a few methods (e.g. luceneFacetResult and luceneSearchResult in the Lucene class) are still long, so have a look to see if you can shorten them.

src/main/java/org/icatproject/lucene/SearchBucket.java: 5 review threads (outdated, resolved)
src/main/java/org/icatproject/lucene/Lucene.java: 4 review threads (outdated, resolved)
src/main/java/org/icatproject/lucene/IcatAnalyzer.java: 1 review thread (outdated, resolved)
@patrick-austin patrick-austin requested a review from VKTB September 30, 2022 10:27
@patrick-austin
Contributor Author

In addition to the smaller stuff (use of toString(), use of primitives, formatting etc.), I have:

  • Refactored code out of luceneFacetResult and luceneSearchResult to reduce their length (each now calls two new functions)
  • Changed more places that used objects to use primitives instead (where we parse dates and used to handle default values of null, we now use Long.MAX_VALUE and Long.MIN_VALUE)
  • Replaced addField and addSort field with a new Field class, with equivalent methods. The special treatment that depends on the type of the field and whether it is facetable or sortable is done in an InnerField class, instances of which are created from either Json or the Lucene IndexableField depending on circumstances.
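
The switch from nullable objects to primitives with sentinel defaults, mentioned in the date-parsing bullet above, could look roughly like this. The names `parseLower`, `parseUpper` and `inRange`, and the use of ISO-8601 instants as the input format, are assumptions for illustration only; the actual helpers and date format used in the PR are not shown in this conversation.

```java
import java.time.Instant;

class DateBoundsSketch {
    /** Lower bound in epoch milliseconds; a missing value means "no lower limit". */
    static long parseLower(String isoInstant) {
        return isoInstant == null ? Long.MIN_VALUE : Instant.parse(isoInstant).toEpochMilli();
    }

    /** Upper bound in epoch milliseconds; a missing value means "no upper limit". */
    static long parseUpper(String isoInstant) {
        return isoInstant == null ? Long.MAX_VALUE : Instant.parse(isoInstant).toEpochMilli();
    }

    /** With primitive bounds the range check needs no null handling at all. */
    static boolean inRange(long value, long lower, long upper) {
        return lower <= value && value <= upper;
    }

    public static void main(String[] args) {
        long lower = parseLower("2022-04-04T00:00:00Z");
        long upper = parseUpper(null); // open-ended upper bound
        long value = Instant.parse("2022-09-30T10:27:00Z").toEpochMilli();
        System.out.println(inRange(value, lower, upper));
    }
}
```

The sentinel values make an absent bound behave like an unbounded one, so callers can pass both limits straight into a range query without branching on null.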

Development

Successfully merging this pull request may close these issues.

Improve Lucene Search Functionality
4 participants