Find short case names and linkify them #186

Open
mlissner opened this issue Nov 27, 2024 · 8 comments
@mlissner
Member

If you have a full citation followed by a citation to simply "Roe", we should capture that second reference, and treat it like a supra reference.

There's some trickiness here, but we have some short case names in the DB, and others can be generated pretty well already. There's also spaCy, which should be able to do entity extraction.

@flooie flooie moved this to Backlog Dec 16 - Dec 27th in Case Law Sprint Dec 16, 2024
@flooie flooie moved this from Backlog Dec 16 - Dec 27th to To Do in Case Law Sprint Dec 16, 2024
@flooie
Contributor

flooie commented Dec 16, 2024

Here is a useful example.

In Roe v. Wade, 410 U.S. 113, 93 S.Ct. 705, 35 L.Ed.2d 147 (1973), the Supreme Court
held that the constitutional right of privacy was broad enough to encompass a woman's
decision whether or not to terminate her pregnancy. The Court specifically articulated
that an individual's right to control their reproductive function was a "fundamental" right.
Roe,410 U.S. 113 at 155.

from litowitz-v-litowitz

Our State Supreme Court has also recognized the fundamental right of an individual's reproductive autonomy. In State v. Koome, 84 Wn.2d 901, 904, 530 P.2d 260 (1975) the Washington Supreme Court recognized that the "constitutional rights of minors, including the right of privacy, are coextensive with those of adults." Therefore, the court held unconstitutional a Washington statute requiring parental consent for a minor to terminate her pregnancy because it abridged a fundamental freedom and was not justified by a compelling state interest. Koome, 84 Wn.2d at 909.

from litowitz-v-litowitz

@mlissner
Member Author

That's good. I think there are others that just say "Roe at 155", right? I ask because I think we'd already catch the examples you've provided so far.

@flooie flooie self-assigned this Dec 17, 2024
@flooie
Contributor

flooie commented Dec 17, 2024

Related to #76

The solution seems clear to me now: we should enhance eyecite to search for SHORT_CASE_CITATIONS after identifying a full case citation.

Proposed Approach
1. Start with the Full Case Citation: Once a full citation is identified with either a plaintiff or defendant, we can reliably search for its shortened version.

r {PLAINTIFF or DEFENDANT NAME} at \d+

2. Validation Rules:
   • The short citation must appear after the full citation.
   • The page number in the short citation must be greater than or equal to the page number in the full citation.
     • Would not apply to parallel citations found in full citation.

This approach ensures we can accurately associate short citations with their corresponding full case citations.
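The two validation rules above can be sketched with nothing but Python's `re` module. This is only an illustration under stated assumptions: `find_short_pincites` and its parameters are hypothetical names, not part of eyecite, and the parallel-citation exception is not handled.

```python
import re


def find_short_pincites(text, names, full_cite_end, first_page):
    """Return (name, page, start, end) tuples for '<name> at <page>' references.

    names: party names taken from the full citation (hypothetical input).
    full_cite_end: character offset where the full citation ends.
    first_page: starting page given in the full citation.
    """
    usable = [re.escape(n) for n in names if n]
    if not usable:
        # No plaintiff or defendant name, so there is nothing to search for.
        return []
    pattern = re.compile(
        r"\b(?P<name>" + "|".join(usable) + r"),? at (?P<page>\d+)"
    )
    results = []
    # Rule 1: only consider text that follows the full citation.
    for match in pattern.finditer(text, full_cite_end):
        page = int(match.group("page"))
        # Rule 2: a pin cite cannot precede the first page of the opinion.
        if page >= first_page:
            results.append(
                (match.group("name"), page, match.start(), match.end())
            )
    return results
```

The returned offsets are relative to the whole text, so they could be used directly as citation spans.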

@mlissner what do you think?

@mlissner
Member Author

Hm, that's good, but what about when there's no page number?

And are you envisioning a second pass with eyecite or that the first pass is able to accomplish all of this?

Can you do some pseudocode of how it'd work?

@mattdahl
Contributor

If I may, I suggest expanding ShortCaseCitation to recognize the new pattern.

We already support this list of short forms:

eyecite/eyecite/models.py

Lines 472 to 484 in 5d3cf67

class ShortCaseCitation(CaseCitation):
    """Convenience class which represents a short form citation, i.e., the kind
    of citation made after a full citation has already appeared. This kind of
    citation lacks a full case name and usually has a different page number
    than the canonical citation.
    Examples:
    ```
    Adarand, 515 U.S., at 241
    Adarand, 515 U.S. at 241
    515 U.S., at 241
    ```
    """

If we add regex for Adarand at 241 to this list, then we can do everything in one pass and handle the resolution using _resolve_shortcase_citation(). (I think I would put the logic for trying to match the short case name there.)

@flooie
Contributor

flooie commented Dec 17, 2024

I envision something like this...

if citations are returned

    citations = []

    for i, token in citation_tokens:
        citation: CitationBase
        token_type = type(token)

        # CASE 1: Token is a CitationToken (i.e., a reporter, a law journal,
        # or a law).
        # In this case, first try extracting it as a standard, full citation,
        # and if that fails try extracting it as a short form citation.
        if token_type is CitationToken:
            citation_token = cast(CitationToken, token)
            if citation_token.short:
                citation = _extract_shortform_citation(words, i)
            else:
                citation = _extract_full_citation(words, i)
...

then we do a second pass in get_citations function

something like this

    if find_short_pincites:
        citations = identify_short_pincites(citations, plain_text)

Pass in the citations found and, for each FullCaseCitation, build the regex based on the plaintiff/defendant.

something like this

    for citation in citations:
        if isinstance(citation, FullCaseCitation):
            # Only search the text that follows the full citation.
            offset = citation.token.end
            following_text = plain_text[offset:]
            pattern, is_valid = _build_short_citation_regex(citation)
            if not pattern:
                # TODO: log that there is no plaintiff or defendant
                # to build the regex from.
                continue
            for match in pattern.finditer(following_text):
                if is_valid(match):
                    # Match offsets are relative to following_text.
                    start, end = match.span()
                    short_citation = ShortCaseCitation(
                        CitationToken(
                            match.group(0),
                            start,
                            end,
                            groups=citation.groups,
                        ),
                        index=0,
                        span_start=start + offset,
                        span_end=end + offset,
                        metadata={
                            "plaintiff": match.group("name"),
                            "pin_cite": match.group("page"),
                        },
                    )

I don't know the citation token stuff that well, so forgive me if this is bad pseudocode.

@mattdahl I want to do it in one function call but we need to find plaintiff and defendant names first to find these short form citations.

@mlissner can you clarify what you mean by ones without page numbers? Like just straight-up references, e.g. "In XYZ"? I think that should be done on CL or in some other function.

@flooie
Contributor

flooie commented Dec 18, 2024

@mattdahl - I think I wasn't clear in my response to you - I thought about using the _resolve_shortcase_citation method but because we don't know the name to use I thought it wasn't the correct option?

@mlissner
Member Author

Sorry Bill, can you go up a level for me and start with something like:

text = get_text_from_db(opinion_id)

And end with the resolved short cases?

I'm not so interested in the internal code. I'm thinking more about the API that a user would interact with.

I think Matt is probably right that we can catch things like <TOKEN> at \d{1,4} pretty successfully. Maybe _resolve_shortcase_citation would need the short case names to make this work, but those seem pretty straightforward to me.
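A rough sketch of the one-pass idea: catch every `<TOKEN> at \d{1,4}` with a generic regex, then try to resolve each token against known short case names. The names `TOKEN_AT_PAGE`, `resolve_token_pincites`, and the `short_names` mapping are assumptions made for illustration; eyecite's actual tokenizer and resolver work differently.

```python
import re

# A capitalized token (apostrophes and hyphens allowed) followed by "at <page>".
TOKEN_AT_PAGE = re.compile(
    r"\b(?P<token>[A-Z][\w'-]*),? at (?P<page>\d{1,4})\b"
)


def resolve_token_pincites(text, short_names):
    """Pair each '<TOKEN> at <page>' hit with a known short case name.

    short_names: hypothetical dict mapping a short case name to an
    identifier for the full citation it belongs to.
    Unresolved tokens are kept with None so the caller can discard them.
    """
    resolved = []
    for match in TOKEN_AT_PAGE.finditer(text):
        token = match.group("token")
        page = int(match.group("page"))
        resolved.append((token, page, short_names.get(token)))
    return resolved
```

The generic pattern over-matches ordinary prose such as "Look at 12", which is why the lookup against short case names is what decides whether a hit is kept.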

I'll also suggest that we can't always figure out the short case names correctly, so ideally this would be robust to that. For example in this case, the case_name is:

Florida Citizens' Alliance, Inc. v. School Board of Indian River County

And the case_name_short is blank, indicating that we couldn't figure it out automatically. As a human, I'd suggest the short case name is probably "Citizens' Alliance". We should be able to figure that out if we're clever. They share the same (very unique) words.
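One possible way to be robust to a blank case_name_short is to check whether a candidate phrase shares all of its words with a single party in the full case name. The sketch below assumes that approach; `could_be_short_name` and the `STOPWORDS` set are hypothetical and would need tuning against real data.

```python
import re

# Corporate suffixes and connectors that carry no identifying information.
STOPWORDS = {"inc", "co", "corp", "llc", "ltd", "of", "the", "and", "v"}


def _words(text):
    """Lowercase word tokens (apostrophes kept), minus stopwords."""
    return [
        w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS
    ]


def could_be_short_name(candidate, case_name):
    """True if every word of candidate appears in one party's name."""
    wanted = set(_words(candidate))
    if not wanted:
        return False
    # Compare against each side of the "v." separately so a candidate
    # cannot mix words from the plaintiff and the defendant.
    parties = re.split(r"\s+v\.\s+", case_name)
    return any(wanted <= set(_words(party)) for party in parties)
```

With the case name above, "Citizens' Alliance" and "School Board" both pass, while a phrase that mixes the two parties, such as "Citizens' Board", does not.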

So the spec is to find and figure out examples like:

  1. "As said in Roe..."
  2. "Roe at 222 says..."
  3. "Roe,410 U.S. 113 at 155."
  4. "As said in Citizens' Alliance..."
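Items 1 and 4 have no page number, so they need a different check from the pin-cite forms. A minimal sketch, assuming the short name is already known and that any later mention not followed by "v." counts as a reference; `find_bare_references` is a hypothetical helper, not an eyecite function.

```python
import re


def find_bare_references(text, short_name, full_cite_end):
    """Return (start, end) spans of short_name after the full citation.

    Mentions followed by ' v.' are skipped because they begin a full
    case name rather than a short reference.
    """
    pattern = re.compile(
        r"\b" + re.escape(short_name) + r"(?!\w)(?!\s+v\.)"
    )
    # Searching from full_cite_end keeps offsets relative to the whole text.
    return [m.span() for m in pattern.finditer(text, full_cite_end)]
```

This will also match a name that happens to be an ordinary word, so in practice it would probably need the supra-style resolution step to confirm each hit.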
