Koala: Adding SSL disabling ENV control (See #198) #199

Open
csiefer2 wants to merge 3 commits into lanl:main from csiefer2:lammps-fix2

Conversation

@csiefer2
Contributor

As per #198, adding the URSA_DISABLE_SSL ENV var to allow disabling SSL for trafilatura. This helps URSA work correctly with SSL interception for the LAMMPS agent.

@mikegros
Collaborator

Please fix the ruff formatting issue.

cfg.set("DEFAULT", "favor_recall", "false")  # be stricter; less noise
try:
    # If you fetched HTML already, use extract() on string; otherwise, fetch_url(url)
    downloaded = trafilatura.fetch_url(html, no_ssl=no_ssl)
Collaborator

I think this is incorrectly implemented. As ruff indicates, "downloaded" isn't used anywhere.

It's also not clear why you are trying to fetch again. The information passed into this function should be a string representation of the webpage, not a URL.

Can we just remove this line?
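
For illustration, here is a minimal sketch of what the function could look like when it treats its input as already-fetched HTML, so no second fetch and no SSL flag are needed at this point. The helper name `extract_main_text` is hypothetical, and the real function in this PR may pass a config object or other options to `trafilatura.extract()`:

```python
def extract_main_text(html: str) -> str:
    """Return the main text of an already-downloaded HTML page.

    Hypothetical helper: the name and the empty-string fallback are
    assumptions made for this sketch, not code from the PR.
    """
    if not html or not html.strip():
        # Nothing to parse; skip the extractor entirely.
        return ""

    # Imported lazily so the module still loads where trafilatura is absent.
    import trafilatura

    # extract() operates on the HTML string itself. It performs no network
    # access, so an SSL setting has no effect here.
    text = trafilatura.extract(html)
    return text or ""
```

With this shape, the SSL option only matters at the call site that actually downloads the page with `fetch_url(url, no_ssl=...)`.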

from langchain_community.document_loaders import PyPDFLoader

# Optionally disable SSL for trafilatura
no_ssl = os.environ["URSA_DISABLE_SSL"]
Collaborator

I think you would actually want to use something like:
no_ssl = os.getenv("URSA_DISABLE_SSL")

As written, os.environ["URSA_DISABLE_SSL"] raises a KeyError whenever the variable is unset, which breaks almost every test.
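
One caveat with a bare `os.getenv`: it returns a string or `None`, so values such as `"0"` or `"false"` are still truthy when passed straight through as a flag. A possible way to turn the variable into a real boolean is sketched below. The helper name `ssl_disabled` and the set of accepted spellings are assumptions for this example, not something the PR defines:

```python
import os

# Assumed spellings that count as "disable SSL"; adjust to project convention.
_TRUTHY = {"1", "true", "yes", "on"}


def ssl_disabled(env=None) -> bool:
    """Return True only if URSA_DISABLE_SSL is set to a truthy value.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    """
    env = os.environ if env is None else env
    # .get() with a default avoids the KeyError raised by env["..."].
    value = env.get("URSA_DISABLE_SSL", "")
    return value.strip().lower() in _TRUTHY


no_ssl = ssl_disabled()
```

An unset variable then means SSL verification stays on, which keeps the default behavior unchanged for tests and for users who never set it.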

2 participants