
README Typo and Failure Running retrieve_open_map_data.py #11

Open

allella opened this issue Dec 17, 2019 · 12 comments
allella (Collaborator) commented Dec 17, 2019

Two issues:

  1. The README has a typo in the script name, retrieve_open_map_data.py.
  2. Running python retrieve_open_map_data.py > data/map_data.json generates the error below.
python retrieve_open_map_data.py > data/map_data.json
  File "retrieve_open_map_data.py", line 107
    if count > 7
               ^
SyntaxError: invalid syntax
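The caret points at the end of the condition, which suggests a missing colon. A minimal sketch of the likely fix (the surrounding code is a guess, since only line 107 is shown above):

# Hypothetical context around line 107; the only certain change is the trailing colon.
if count > 7:
    break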

allella (Collaborator, Author) commented Dec 17, 2019

@SpaceCowboy326 I'm not sure I can assign these, so I'm pinging you for advice.

SpaceCowboy326 (Owner) commented

@allella I just committed some changes which should resolve both issues. I've never really run a public repository before; if there's anything I can do to give you some control over that, I'd be happy to. I'm also happy to continue fixing issues as they come up.

allella (Collaborator, Author) commented Dec 17, 2019

Thanks.

The access needed will depend on whether you want to continue to maintain this repo or have CFG fork it and continue any future development on the fork.

There's nothing stopping folks from forking and rolling their own, so if CFG needs to tweak it then a fork may make more sense.

allella (Collaborator, Author) commented Dec 17, 2019

Now I'm getting the error below. Do we need to run the script inside a Python environment? If so, is an environments.yml file necessary to specify the external packages?

python retrieve_open_map_data.py > data/map_data.json
Traceback (most recent call last):
  File "retrieve_open_map_data.py", line 15, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named bs4

SpaceCowboy326 (Owner) commented

@allella yeah, I totally forgot to add that to the README. The script relies on two Python packages: requests and bs4 (the Beautiful Soup HTML parser). I haven't worked with Python much before, but I've been using pip to install packages. It looks like environments.yml is associated with conda; would you recommend I look into that for automating the process?

I could probably rewrite it in JS pretty quickly if that makes things easier as well; I really only wrote it in Python for my own edification.
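For anyone hitting the ImportError above: bs4 is published on PyPI as beautifulsoup4, so the pip route mentioned here would presumably look like this (assuming pip points at the same Python that runs the script):

pip install requests beautifulsoup4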

allella (Collaborator, Author) commented Dec 17, 2019

I wouldn't worry about it. I think what we can/should do is just cache the JSON data on the server and create additional views that allow people to get lists without having to spider the site.

If I expose a list of .geojson links, would it be easy to have the script just reference those files? If they are cached / saved as flat files, then the serving should be fast. Eventually I'd like to put Cloudflare in front of the site too, so the caching is at a more robust level.
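For illustration, a minimal sketch of the script side, assuming the maps list endpoint returns entries with a name and a URL pointing at each .geojson file (those field names are guesses, not confirmed by the API):

import json
import requests

MAPS_URL = "https://data.openupstate.org/rest/maps?_format=json"

def fetch_layers():
    # Pull the list of map layers, then fetch each layer's .geojson file.
    layers = {}
    for entry in requests.get(MAPS_URL).json():
        geojson_url = entry.get("url")  # guessed field for the .geojson link
        if geojson_url:
            layers[entry.get("name", geojson_url)] = requests.get(geojson_url).json()
    return layers

if __name__ == "__main__":
    print(json.dumps(fetch_layers()))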

SpaceCowboy326 (Owner) commented

Totally agree regarding caching; wouldn't want this script running every time someone hits the site. I played around with using https://data.openupstate.org/rest/maps?_format=json to retrieve the data, but I think I ran into a CORS issue. I can set it up to pull from whatever works best for you. Ideally, I'd love to be able to make a single request and get all the layers back at once.

allella (Collaborator, Author) commented Dec 23, 2019

@SpaceCowboy326 I've set CORS to allow all origins.

Will this work now?
https://data.openupstate.org/rest/maps?_format=json

SpaceCowboy326 (Owner) commented

@allella that did the trick. I will change the script to pull from that URL soon. I can pretty easily pull the data right from the browser now, but I'll probably want to add some way to cache the data before I start doing that.

allella (Collaborator, Author) commented Dec 31, 2019

I'm not sure it's worth caching on that end, because the API is already caching on the backend and the browser should cache the response for up to 1 day due to the HTTP header:

Cache-Control: max-age=86400, public

Also, the API is currently set to cache for 1 day. I can set the API or HTTP header to expire more frequently if/as needed. I haven't tested it, but I think the API is also smart enough to invalidate the caching on both ends. It would use the ETag to invalidate the browser's copy.
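For context, this is what the ETag round trip would look like from a client; a minimal sketch using requests (whether the server actually returns an ETag is untested, as noted above):

import requests

MAPS_URL = "https://data.openupstate.org/rest/maps?_format=json"

# First request: remember the ETag the server sends, if any.
first = requests.get(MAPS_URL)
etag = first.headers.get("ETag")

# Later request: send If-None-Match; a 304 means the cached copy is still valid.
headers = {"If-None-Match": etag} if etag else {}
later = requests.get(MAPS_URL, headers=headers)
if later.status_code == 304:
    print("Not modified; reuse the cached copy")
else:
    print("Fresh data:", len(later.content), "bytes")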

I'm sure there are some advantages to what you're suggesting, but they may depend on me making the Drupal API caching strategy more flexible. I'm likely to put this domain behind Cloudflare or similar before long, and at that point it would be smart to let the browser HTTP headers deal with any data that's only used in the browser.

Does that make sense?

SpaceCowboy326 (Owner) commented

I think that makes sense, but it should actually be pretty easy for me to detect changes and reduce the number of calls I need to make overall while keeping my data in sync with the API. Each of the map layers from https://data.openupstate.org/rest/maps?_format=json looks like it has an "updated" property, so I can avoid making calls to all of the geojson URLs if mine match. Does that seem worthwhile?
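A minimal sketch of that check, assuming each entry in the maps list carries an "updated" timestamp and a URL field for its .geojson file (the field names are guesses from the discussion above):

import requests

MAPS_URL = "https://data.openupstate.org/rest/maps?_format=json"

def refresh(cache):
    # cache maps each layer's geojson URL to {"updated": ..., "geojson": ...}.
    for layer in requests.get(MAPS_URL).json():
        url = layer.get("url")          # guessed field for the .geojson link
        updated = layer.get("updated")  # per-layer timestamp from the API
        if not url:
            continue
        entry = cache.get(url)
        if entry is None or entry["updated"] != updated:
            # Only re-fetch layers whose "updated" value has changed.
            cache[url] = {"updated": updated, "geojson": requests.get(url).json()}
    return cache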

allella (Collaborator, Author) commented Jan 1, 2020

I haven't looked into the code to know if/when you'd want to cache things on the server running this application.

If we're talking about your application needing access to the JSON for its own use, like if you wanted to cache the list of JSON URLs, then that's probably fine; the HTTP headers do nothing if you're using curl, wget, or some Python fetching method to get a URL on every initial load.

However, if we're talking about the web browser making calls to get JSON, then I'd pull that directly from the data.openupstate.org URLs, since the browser will cache these responses and we'll continue to add performance improvements to the API server as more applications come to depend on it.
