Merged
38 changes: 19 additions & 19 deletions README.rst
@@ -2,47 +2,47 @@
Energy Language Model (ELM)
***************************

.. image:: https://github.com/NREL/elm/workflows/Documentation/badge.svg
:target: https://nrel.github.io/sup3r/
.. image:: https://github.com/NatLabRockies/elm/workflows/Documentation/badge.svg
:target: https://natlabrockies.github.io/elm/

.. image:: https://github.com/NREL/elm/workflows/pytests/badge.svg
:target: https://github.com/NREL/elm/actions?query=workflow%3A%22pytests%22
.. image:: https://github.com/NatLabRockies/elm/workflows/pytests/badge.svg
:target: https://github.com/NatLabRockies/elm/actions?query=workflow%3A%22pytests%22

.. image:: https://github.com/NREL/elm/workflows/Lint%20Code%20Base/badge.svg
:target: https://github.com/NREL/elm/actions?query=workflow%3A%22Lint+Code+Base%22
.. image:: https://github.com/NatLabRockies/elm/workflows/Lint%20Code%20Base/badge.svg
:target: https://github.com/NatLabRockies/elm/actions?query=workflow%3A%22Lint+Code+Base%22

.. image:: https://img.shields.io/pypi/pyversions/NREL-elm.svg
:target: https://pypi.org/project/NREL-elm/
.. image:: https://img.shields.io/pypi/pyversions/NLR-elm.svg
:target: https://pypi.org/project/NLR-elm/

.. image:: https://badge.fury.io/py/NREL-elm.svg
:target: https://badge.fury.io/py/NREL-elm
.. image:: https://badge.fury.io/py/NLR-elm.svg
:target: https://badge.fury.io/py/NLR-elm

.. image:: https://zenodo.org/badge/690793778.svg
:target: https://zenodo.org/doi/10.5281/zenodo.10070538

The Energy Language Model (ELM) software provides interfaces to apply Large Language Models (LLMs) like ChatGPT and GPT-4 to energy research. For example, you might be interested in:

- `Converting PDFs into a text database <https://nrel.github.io/elm/_autosummary/elm.pdf.PDFtoTXT.html#elm.pdf.PDFtoTXT>`_
- `Chunking text documents and embedding into a vector database <https://nrel.github.io/elm/_autosummary/elm.embed.ChunkAndEmbed.html#elm.embed.ChunkAndEmbed>`_
- `Performing recursive document summarization <https://nrel.github.io/elm/_autosummary/elm.summary.Summary.html#elm.summary.Summary>`_
- `Building an automated data extraction workflow using decision trees <https://nrel.github.io/elm/_autosummary/elm.tree.DecisionTree.html#elm.tree.DecisionTree>`_
- `Building a chatbot app that interfaces with reports from OSTI <https://github.com/NREL/elm/tree/main/examples/energy_wizard>`_
- `Converting PDFs into a text database <https://natlabrockies.github.io/elm/_autosummary/elm.pdf.PDFtoTXT.html#elm.pdf.PDFtoTXT>`_
- `Chunking text documents and embedding into a vector database <https://natlabrockies.github.io/elm/_autosummary/elm.embed.ChunkAndEmbed.html#elm.embed.ChunkAndEmbed>`_
- `Performing recursive document summarization <https://natlabrockies.github.io/elm/_autosummary/elm.summary.Summary.html#elm.summary.Summary>`_
- `Building an automated data extraction workflow using decision trees <https://natlabrockies.github.io/elm/_autosummary/elm.tree.DecisionTree.html#elm.tree.DecisionTree>`_
- `Building a chatbot app that interfaces with reports from OSTI <https://github.com/NatLabRockies/elm/tree/main/examples/energy_wizard>`_
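
The feature list above links ELM's chunking and embedding utilities. As a rough sketch of the underlying idea, fixed-size overlapping character chunking can be written as follows; the function name and parameters are illustrative assumptions for this sketch, not ELM's actual API:

```python
# Illustrative only: a minimal fixed-size, overlapping text chunker of the
# kind a tool like elm.embed.ChunkAndEmbed applies before embedding.
# The function name and parameters are assumptions, not ELM's API.
def chunk_text(text, chunk_size=100, overlap=20):
    """Split ``text`` into overlapping character chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("word " * 100)  # 500 characters of input
print(len(chunks))  # 6 overlapping chunks of 100 characters each
```

The overlap preserves context across chunk boundaries so that a sentence split mid-chunk still appears whole in at least one chunk before embedding.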

Installing ELM
==============

.. inclusion-install

NOTE: If you are installing ELM to run ordinance scraping and extraction,
see the `ordinance-specific installation instructions <https://github.com/NREL/elm/blob/main/elm/ords/README.md>`_.
see the `ordinance-specific installation instructions <https://github.com/NatLabRockies/elm/blob/main/elm/ords/README.md>`_.

Option #1 (basic usage):

#. ``pip install NREL-elm``
#. ``pip install NLR-elm``

Option #2 (developer install):

#. from home dir, ``git clone [email protected]:NREL/elm.git``
#. from home dir, ``git clone [email protected]:NatLabRockies/elm.git``
#. Create ``elm`` environment and install package
a) Create a conda env: ``conda create -n elm``
b) Run the command: ``conda activate elm``
@@ -58,4 +58,4 @@ Option #2 (developer install):
Acknowledgments
===============

This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the DOE Wind Energy Technologies Office (WETO), the DOE Solar Energy Technologies Office (SETO), and internal research funds at the National Renewable Energy Laboratory. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.
This work was authored by the National Laboratory of the Rockies, operated by Alliance for Energy Innovation, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the DOE Wind Energy Technologies Office (WETO), the DOE Solar Energy Technologies Office (SETO), and internal research funds at the National Laboratory of the Rockies. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -115,7 +115,7 @@

html_context = {
"display_github": True,
"github_user": "nrel",
"github_user": "nlr",
"github_repo": "elm",
"github_version": "main",
"conf_py_path": "/docs/source/",
10 changes: 5 additions & 5 deletions docs/source/dev/ords_architecture.rst
@@ -326,11 +326,11 @@ resources, we can use ``Services`` to monitor their use as well! Let's look at a


There are several other services provided out of the box - see the
`documentation <https://nrel.github.io/elm/_autosummary/elm.ords.services.html>`_ for details.
`documentation <https://natlabrockies.github.io/elm/_autosummary/elm.ords.services.html>`_ for details.
Alternatively, we provide two base classes that you can extend to get similar functionality:
`ThreadedService <https://nrel.github.io/elm/_autosummary/elm.ords.services.threaded.ThreadedService.html#elm.ords.services.threaded.ThreadedService>`_
`ThreadedService <https://natlabrockies.github.io/elm/_autosummary/elm.ords.services.threaded.ThreadedService.html#elm.ords.services.threaded.ThreadedService>`_
for threaded tasks and
`ProcessPoolService <https://nrel.github.io/elm/_autosummary/elm.ords.services.cpu.ProcessPoolService.html#elm.ords.services.cpu.ProcessPoolService>`_
`ProcessPoolService <https://natlabrockies.github.io/elm/_autosummary/elm.ords.services.cpu.ProcessPoolService.html#elm.ords.services.cpu.ProcessPoolService>`_
for multiprocessing tasks.

**4.2 Key Classes**
@@ -763,9 +763,9 @@ We give a rough breakdown of the following call:
from elm.web.search import web_search_links_as_docs

QUERIES = [
"NREL wiki",
"NLR wiki",
"National Renewable Energy Laboratory director",
"NREL leadership wikipedia",
"NLR leadership wikipedia",
]

async def main():
2 changes: 1 addition & 1 deletion elm/__init__.py
@@ -16,7 +16,7 @@
from elm.web.osti import OstiRecord, OstiList

__author__ = """Grant Buster"""
__email__ = "Grant.Buster@nrel.gov"
__email__ = "Grant.Buster@nlr.gov"

ELM_DIR = os.path.dirname(os.path.realpath(__file__))
TEST_DATA_DIR = os.path.join(os.path.dirname(ELM_DIR), 'tests', 'data')
6 changes: 3 additions & 3 deletions elm/ords/README.md
@@ -16,7 +16,7 @@ Then, install `pdftotext`:

$ pip install pytesseract pdf2image

At this point, you can install ELM per the [front-page README](https://github.com/NREL/elm/blob/main/README.rst) instructions, e.g.:
At this point, you can install ELM per the [front-page README](https://github.com/NatLabRockies/elm/blob/main/README.rst) instructions, e.g.:

$ pip install -e .

@@ -25,11 +25,11 @@ To do so, simply run:

$ rebrowser_playwright install

Now you are ready to run ordinance retrieval and extraction. See the [example](https://github.com/NREL/elm/blob/main/examples/ordinance_gpt/README.rst) to get started. If you get additional import errors, just install additional packages as necessary, e.g.:
Now you are ready to run ordinance retrieval and extraction. See the [example](https://github.com/NatLabRockies/elm/blob/main/examples/ordinance_gpt/README.rst) to get started. If you get additional import errors, just install additional packages as necessary, e.g.:

$ pip install beautifulsoup4 html5lib


## Architecture

For information on the architectural design of this code, see the [design document](https://nrel.github.io/elm/dev/ords_architecture.html).
For information on the architectural design of this code, see the [design document](https://natlabrockies.github.io/elm/dev/ords_architecture.html).
2 changes: 1 addition & 1 deletion elm/version.py
@@ -2,4 +2,4 @@
ELM version number
"""

__version__ = "0.0.34"
__version__ = "0.0.35"
16 changes: 8 additions & 8 deletions elm/web/rhub.py
@@ -17,8 +17,8 @@ class ProfilesRecord(dict):
"""Class to handle a single profiles as dictionary data.
This class requires setting an 'RHUB_API_KEY' environment
variable to access the Pure Web Service. The API key can be
obtained by contacting an NREL library representative:
Library@nrel.gov.
obtained by contacting an NLR library representative:
Library@nlr.gov.
"""
def __init__(self, record):
"""
@@ -333,8 +333,8 @@ class ProfilesList(list):
"""Class to retrieve and handle multiple profiles from an API URL.
This class requires setting an 'RHUB_API_KEY' environment
variable to access the Pure Web Service. The API key can be
obtained by contacting an NREL library representative:
Library@nrel.gov.
obtained by contacting an NLR library representative:
Library@nlr.gov.
"""
def __init__(self, url, n_pages=1):
"""
@@ -497,8 +497,8 @@ class PublicationsRecord(dict):
"""Class to handle a single publication as dictionary data.
This class requires setting an 'RHUB_API_KEY' environment
variable to access the Pure Web Service. The API key can be
obtained by contacting an NREL library representative:
Library@nrel.gov.
obtained by contacting an NLR library representative:
Library@nlr.gov.
"""
def __init__(self, record):
"""
@@ -736,8 +736,8 @@ class PublicationsList(list):
"""Class to retrieve and handle multiple publications from an API URL.
This class requires setting an 'RHUB_API_KEY' environment
variable to access the Pure Web Service. The API key can be
obtained by contacting an NREL library representative:
Library@nrel.gov.
obtained by contacting an NLR library representative:
Library@nlr.gov.
"""
def __init__(self, url, n_pages=1):
"""
2 changes: 1 addition & 1 deletion examples/energy_wizard/README.rst
@@ -17,7 +17,7 @@ Notes:
Downloading and Embedding PDFs
==============================

Run ``python ./retrieve_docs.py`` to retrieve 20 of the latest NREL technical
Run ``python ./retrieve_docs.py`` to retrieve 20 of the latest NLR technical
reports from OSTI. The script then converts the PDFs to text and then runs the
text through the OpenAI embedding model.

2 changes: 1 addition & 1 deletion examples/energy_wizard/retrieve_docs.py
@@ -17,7 +17,7 @@
init_logger('elm', log_level='INFO')


# NREL-Azure endpoint. You can also use just the openai endpoint.
# NLR-Azure endpoint. You can also use just the openai endpoint.
# NOTE: embedding values are different between OpenAI and Azure models!
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
2 changes: 1 addition & 1 deletion examples/energy_wizard/run_app.py
@@ -11,7 +11,7 @@

model = 'gpt-4'

# NREL-Azure endpoint. You can also use just the openai endpoint.
# NLR-Azure endpoint. You can also use just the openai endpoint.
# NOTE: embedding values are different between OpenAI and Azure models!
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
22 changes: 11 additions & 11 deletions examples/ordinance_gpt/README.rst
@@ -7,12 +7,12 @@ This example folder contains supporting documents, results, and code for the Ord
Prerequisites
=============
We recommend installing the pytesseract module to allow PDF retrieval for scanned documents.
See the `ordinance-specific installation instructions <https://github.com/NREL/elm/blob/main/elm/ords/README.md>`_
See the `ordinance-specific installation instructions <https://github.com/NatLabRockies/elm/blob/main/elm/ords/README.md>`_
for more details.

Running from Python
===================
This instruction set presents a simplified example to extract ordinance data from an ordinance document on disk. This corresponds with the ordinance data extraction from PDF results in `Buster et al., 2024 <https://doi.org/10.48550/arXiv.2403.12924>`_.
This instruction set presents a simplified example to extract ordinance data from an ordinance document on disk. This corresponds with the ordinance data extraction from PDF results in `Buster et al., 2024 <https://doi.org/10.48550/arXiv.2403.12924>`_.

To run this, first download one or more ordinance documents from `the Box folder <https://app.box.com/s/a8oi8jotb9vnu55rzdul7e291jnn7hmq>`_.

@@ -23,17 +23,17 @@ After downloading the ordinance document(s), set the relevant path for the ``fp_

Running from the Command Line Utility
=====================================
This instruction set is an experimental process to use LLMs to search the internet for relevant ordinance documents, download those documents, and then extract the relevant ordinance data.
This instruction set is an experimental process to use LLMs to search the internet for relevant ordinance documents, download those documents, and then extract the relevant ordinance data.

There are a few key things you need to set up in order to run ordinance retrieval and extraction.
First, you must specify which counties you want to process. You can do this by setting up a CSV file
with a ``County`` and a ``State`` column. Each row in the CSV file then represents a single county to process.
See the `example CSV <https://github.com/NREL/elm/blob/main/examples/ordinance_gpt/counties.csv>`_
See the `example CSV <https://github.com/NatLabRockies/elm/blob/main/examples/ordinance_gpt/counties.csv>`_
file for reference.

Once you have set up the county CSV, you can fill out the
`template JSON config <https://github.com/NREL/elm/blob/main/examples/ordinance_gpt/config.json>`_.
See the documentation for the `"process_counties_with_openai" function <https://github.com/NREL/elm/blob/main/elm/ords/process.py#L78>`_
`template JSON config <https://github.com/NatLabRockies/elm/blob/main/examples/ordinance_gpt/config.json>`_.
See the documentation for the `"process_counties_with_openai" function <https://github.com/NatLabRockies/elm/blob/main/elm/ords/process.py#L78>`_
for an explanation of all the allowed inputs to the configuration file.
Some notable inputs here are the ``azure*`` keys, which should be configured to match your Azure OpenAI API
deployment (unless it's defined in your environment with the ``AZURE_OPENAI_API_KEY``, ``AZURE_OPENAI_VERSION``,
@@ -60,7 +60,7 @@ Debugging
---------
Not sure why things aren't working? No error messages? Make sure you run the CLI call with a ``-v`` flag for "verbose" logging (e.g., ``$ elm ords -c config.json -v``)

Errors on import statements? Trouble importing ``pdftotext`` with cryptic error messages like ``symbol not found in flat namespace``? Follow the `ordinance-specific install instructions <https://github.com/NREL/elm/blob/main/elm/ords/README.md>`_ *exactly*.
Errors on import statements? Trouble importing ``pdftotext`` with cryptic error messages like ``symbol not found in flat namespace``? Follow the `ordinance-specific install instructions <https://github.com/NatLabRockies/elm/blob/main/elm/ords/README.md>`_ *exactly*.

Source Ordinance Documents
==========================
@@ -71,10 +71,10 @@ The ordinance documents downloaded using (an older version of) this example code
Extension to Other Technologies
===============================
Extending this functionality to other technologies is possible but requires deeper understanding of the underlying processes.
We recommend you start out by examining the decision tree queries in `graphs.py <https://github.com/NREL/elm/blob/main/elm/ords/extraction/graphs.py>`_
as well as how they are applied in `parse.py <https://github.com/NREL/elm/blob/main/elm/ords/extraction/parse.py>`_. Once you
We recommend you start out by examining the decision tree queries in `graphs.py <https://github.com/NatLabRockies/elm/blob/main/elm/ords/extraction/graphs.py>`_
as well as how they are applied in `parse.py <https://github.com/NatLabRockies/elm/blob/main/elm/ords/extraction/parse.py>`_. Once you
have a firm understanding of these two modules, look through the
`document validation routines <https://github.com/NREL/elm/blob/main/elm/ords/validation>`_ to get a better sense of how to
`document validation routines <https://github.com/NatLabRockies/elm/blob/main/elm/ords/validation>`_ to get a better sense of how to
adjust the web-scraping portion of the code to your technology. When you have set up the validation and parsing for your
technology, put it all together by adjusting the `"process_counties_with_openai" function <https://github.com/NREL/elm/blob/main/elm/ords/process.py#L78>`_
technology, put it all together by adjusting the `"process_counties_with_openai" function <https://github.com/NatLabRockies/elm/blob/main/elm/ords/process.py#L78>`_
to call your new routines.
2 changes: 1 addition & 1 deletion examples/research_hub/retrieve_docs.py
@@ -21,7 +21,7 @@
init_logger('elm', log_level='INFO')


# NREL-Azure endpoint. You can also use just the openai endpoint.
# NLR-Azure endpoint. You can also use just the openai endpoint.
# NOTE: embedding values are different between OpenAI and Azure models!
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
2 changes: 1 addition & 1 deletion examples/research_hub/run_app.py
@@ -11,7 +11,7 @@

model = 'gpt-4'

# NREL-Azure endpoint. You can also use just the openai endpoint.
# NLR-Azure endpoint. You can also use just the openai endpoint.
# NOTE: embedding values are different between OpenAI and Azure models!
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
2 changes: 1 addition & 1 deletion examples/web_information_retrieval/README.md
@@ -5,7 +5,7 @@ In this directory, we provide a set of examples that demonstrate how to use ELM
- [`Custom web information retrieval using search engines`](./example_search_retrieval_wiki.ipynb)

In this example, we demonstrate how to set up your own end-to-end web and information retrieval pipeline.
Specifically, we set up a procedure to extract the name of NREL's current director using only Wikipedia articles.
Specifically, we set up a procedure to extract the name of NLR's current director using only Wikipedia articles.

> [!NOTE]
> Due to the non-deterministic nature of several pipeline components (Google search, LLM), you may get