Issue 1601 match partial page citations #2209
Open
This is a first pass at addressing #1601, the other half of freelawproject/eyecite#30.
I've broken up our `search_db_for_full_citation()` function into two paths: one for when we have full citation information (which I've kept essentially the same as it is currently, just reordered slightly) and one for when we're looking for a case with a missing page citation. In the latter scenario, I've made it only return a result if the cited case was published within 5 years of the citing case, to guard against false positives.

The logic behind this is based on @brianwc's comment (#1226 (comment)):
However, the U.S. reporter, for one, takes a preposterous amount of time to assign page numbers, so for SCOTUS cases at least I think we need to go beyond a 1-year search frame. According to https://www.supremecourt.gov/opinions/USReports.aspx, as of today the U.S. reporter has only finalized its volumes up to the 2015 term. I put in the 5-year range before checking this, but maybe it needs to be even longer? If this is only a problem with SCOTUS cases, maybe we should treat them differently from all others, but I don't know whether other reporters also have this problem. Obviously, the longer we allow the date range to be, the more false positives there will be.
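To make the trade-off concrete, here is a rough, self-contained sketch of the two-path idea. It is not the code in this PR: `Case`, `find_cited_case`, and `max_years` are hypothetical names, the in-memory list stands in for the real search index, and narrowing candidates by volume and reporter alone is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Case:
    """Hypothetical stand-in for a stored opinion with a single citation."""
    name: str
    volume: int
    reporter: str
    page: Optional[int]  # None while the reporter has not assigned a page
    date_filed: date


def find_cited_case(
    cases: List[Case],
    volume: int,
    reporter: str,
    page: Optional[int],
    citing_date: date,
    max_years: int = 5,  # assumed window; the right value is an open question
) -> Optional[Case]:
    """Return the single matching case, or None if absent or ambiguous."""
    candidates = [
        c for c in cases if c.volume == volume and c.reporter == reporter
    ]
    if page is not None:
        # Path 1: full citation, so the page must match exactly.
        matches = [c for c in candidates if c.page == page]
    else:
        # Path 2: page missing, so only accept cases filed close in time
        # to the citing case, to limit false positives.
        window = timedelta(days=365 * max_years)
        matches = [
            c for c in candidates if abs(citing_date - c.date_filed) <= window
        ]
    # Refuse to guess when more than one case fits.
    return matches[0] if len(matches) == 1 else None


# Made-up data for illustration only.
cases = [Case("Example v. Sample", 575, "U.S.", 348, date(2015, 4, 21))]

full = find_cited_case(cases, 575, "U.S.", 348, date(2021, 1, 1))
near = find_cited_case(cases, 575, "U.S.", None, date(2016, 6, 1))
far = find_cited_case(cases, 575, "U.S.", None, date(2025, 1, 1))
```

With this data, `full` and `near` both resolve to the example case, while `far` is `None` because the citing date falls outside the window. Treating SCOTUS differently would amount to passing a larger `max_years` for that reporter, at the cost of more false positives.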
Incidentally, how do we get finalized citations into CL when they're assigned so late? E.g. this case does not have its U.S. citation (`575 U.S. 348`) in the system: https://www.courtlistener.com/opinion/2795278/rodriguez-v-united-states

99a3bd3 is unrelated to the substance of this PR, but I realized as I was working on this that my factory changes in #2183 caused it to not be testing what we wanted anymore, so I fixed it.