
Refactor github provider as template for other providers #428

Closed
bollwyvl opened this issue Mar 18, 2015 · 6 comments

Comments

@bollwyvl
Contributor

Right now, the github/gist provider is:

  • ~600 lines of handlers.py
  • the github.py async client

Refactoring this into some common patterns that can be reused for other providers would be the first step towards #371 (gitlab), #302 (webgit), #402 (stash/bitbucket), and #427 (google drive). Additional providers:

  • allura/sourceforge
  • dropbox
  • confluence

Provider

The theoretical base class, Provider, might expect implementations to include:

  • handlers: à la notebookapp; this is all handlers.py would really care about, and it would handle the format story (a whole separate thing!)
  • entity(entity): user/org/project JSON-like data for templates
  • tree(entity, collection, ref=None): repo JSON-like data for templates
  • file(entity, collection, file): would a notebook need something special? The URLs will be poor predictors for the general case.
  • refs(entity, collection, file=None): history, etc.; presumably even Drive would support this

Not all of these would be necessary for a 1.0 of a given implementation.
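A minimal sketch of that base class, assuming Python's `abc` module and the method names listed above. The type hints, the choice of which methods are abstract, and the toy `DictProvider` are hypothetical illustrations, not nbviewer's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class Provider(ABC):
    """Hypothetical base class for nbviewer content providers."""

    # Tornado-style (pattern, handler) pairs, a la notebookapp.
    handlers: Tuple[Tuple[str, type], ...] = ()

    @abstractmethod
    def entity(self, entity: str) -> Dict[str, Any]:
        """User/org/project metadata as JSON-like data for templates."""

    @abstractmethod
    def tree(self, entity: str, collection: str,
             ref: Optional[str] = None) -> Dict[str, Any]:
        """Repository listing as JSON-like data for templates."""

    @abstractmethod
    def file(self, entity: str, collection: str, file: str) -> bytes:
        """Raw file contents; notebooks may need special handling."""

    def refs(self, entity: str, collection: str,
             file: Optional[str] = None) -> List[str]:
        """History/refs; deliberately optional for a 1.0 implementation."""
        raise NotImplementedError


class DictProvider(Provider):
    """Toy in-memory provider, used only to exercise the interface."""

    def __init__(self, data: Dict[str, Dict[str, Dict[str, bytes]]]):
        self.data = data

    def entity(self, entity):
        return {"name": entity, "collections": sorted(self.data[entity])}

    def tree(self, entity, collection, ref=None):
        return {"collection": collection,
                "files": sorted(self.data[entity][collection])}

    def file(self, entity, collection, file):
        return self.data[entity][collection][file]
```

Because `refs` is not abstract, a provider that implements only `entity`, `tree`, and `file` can still be instantiated, which fits the idea that not every method is needed for a first release.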

Client

Not sure if this is just an implementer convenience, but consistent logging, error reporting, caching, etc. would be handy, even if the individual methods were completely different from implementation to implementation... only the *Provider would care.
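A minimal sketch of such a shared client. It assumes a synchronous fetch for brevity (the existing github.py client is asynchronous), and the names `BaseClient`, `ProviderError`, `fetch`, `_fetch`, the in-memory TTL cache, and the toy `StaticClient` are all hypothetical. Subclasses only supply the transport; logging, error wrapping, and caching stay uniform.

```python
import logging
import time
from typing import Callable, Dict, Tuple


class ProviderError(Exception):
    """Uniform error type that handlers could turn into an HTTP response."""

    def __init__(self, provider: str, url: str, cause: Exception):
        super().__init__(f"{provider}: failed to fetch {url}: {cause!r}")
        self.provider = provider
        self.url = url
        self.cause = cause


class BaseClient:
    """Hypothetical shared client: logging, error wrapping and caching."""

    name = "base"

    def __init__(self, cache_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.log = logging.getLogger(f"nbviewer.providers.{self.name}")
        self.cache_ttl = cache_ttl
        self.clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def _fetch(self, url: str) -> bytes:
        """Provider-specific transport; subclasses must override this."""
        raise NotImplementedError

    def fetch(self, url: str) -> bytes:
        now = self.clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self.cache_ttl:
            self.log.debug("cache hit %s", url)
            return hit[1]
        self.log.info("fetching %s", url)
        try:
            body = self._fetch(url)
        except Exception as exc:  # wrap every transport failure the same way
            self.log.error("error fetching %s: %r", url, exc)
            raise ProviderError(self.name, url, exc) from exc
        self._cache[url] = (now, body)
        return body


class StaticClient(BaseClient):
    """Toy subclass serving canned responses, for illustration only."""

    name = "static"

    def __init__(self, pages: Dict[str, bytes], **kwargs):
        super().__init__(**kwargs)
        self.pages = pages
        self.calls = 0

    def _fetch(self, url: str) -> bytes:
        self.calls += 1
        return self.pages[url]  # a KeyError stands in for a 404
```

Injecting `clock` keeps cache expiry testable without sleeping; a real implementation would more likely reuse whatever cache nbviewer already has.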

Distribution

Monolith

If everything is to live in nbviewer itself, perhaps the right file structure is:

- app.py
- handlers.py
- ...
- providers/
  - provider.py
  - client.py
  - github/
    - provider.py
    - client.py
  - stash...
  - drive...
  - gitlab...
  - gitweb...
  - allura...
  - dropbox...
  - confluence...

The providers would then be turned on and off via configuration, which may be a prerequisite for doing this correctly.
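One way providers could be toggled from configuration, sketched under the assumption that each provider module registers its class in a name-to-class registry and that the configuration supplies a list of enabled names. `PROVIDERS`, `register`, and `enabled_providers` are hypothetical, and the two empty provider classes are placeholders.

```python
from typing import Callable, Dict, Iterable, List, Type

# Hypothetical registry: provider name -> provider class.
PROVIDERS: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a provider under `name`."""
    def decorator(cls: Type) -> Type:
        PROVIDERS[name] = cls
        return cls
    return decorator


def enabled_providers(enabled: Iterable[str]) -> List[Type]:
    """Return the provider classes named in configuration, in order.

    Unknown names raise early so a typo in the config is not silently ignored.
    """
    names = list(enabled)  # materialize once; `enabled` may be a generator
    unknown = [name for name in names if name not in PROVIDERS]
    if unknown:
        raise ValueError(f"unknown providers in config: {unknown}")
    return [PROVIDERS[name] for name in names]


@register("github")
class GithubProvider:
    pass


@register("gist")
class GistProvider:
    pass
```

Failing loudly on unknown names is a design choice; silently skipping them would be the more lenient alternative.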

Plugins

Another option would be to just go ahead and use setuptools (or distribute) entry points, and break the providers out into separate repos:

```python
from setuptools import setup

setup(
    # ...
    entry_points={"nbviewer.provider": ["github = nbviewer_github:GithubProvider"]},
)
```

They could then be managed, for example, in your Docker build:

```dockerfile
FROM jupyter/nbviewer
RUN pip install nbviewer-github
CMD ["python", "-m", "nbviewer"]
```

Personally, I favor this approach.

Let's chart a course to the future of publishing 🚀 !

@bollwyvl changed the title from "Refactor github provider as template for for other providers" to "Refactor github provider as template for other providers" on Mar 18, 2015
@bollwyvl
Contributor Author

Looking at this more, the basic URL handler could also benefit from being able to render a directory index view, such as Apache, WebDAV, or SVN might provide. This might be a better place to set the pattern than a synthetic provider, as it has a client, etc.

It would render, as a tree, all links that are neighbors (if the requested URL is an index.html) or children of the requested directory, and offer `..` and breadcrumbs.
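A rough sketch of scraping an Apache-style index page with the standard library's `html.parser`. It keeps only links that stay inside the requested directory, sorts them into folders, notebooks, and other files, and builds breadcrumbs. The classification rules (a trailing `/` means folder, `.ipynb` means notebook, an `.html`/`.htm` URL means "list its neighbors") and the names `IndexLinkParser`, `classify_index`, and `breadcrumbs` are assumptions; real index pages vary.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urljoin, urlsplit


class IndexLinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_index(base_url: str, html: str) -> Dict[str, List[str]]:
    """Sort links that are children of base_url into folders/notebooks/other."""
    if not base_url.endswith("/"):
        head, _, last = base_url.rpartition("/")
        # index.html-style URL: list its neighbors; otherwise treat as a directory.
        base_url = head + "/" if last.endswith((".html", ".htm")) else base_url + "/"
    parser = IndexLinkParser()
    parser.feed(html)
    result: Dict[str, List[str]] = {"folders": [], "notebooks": [], "other": []}
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        # Drop parent/absolute links, off-site links, self links and
        # Apache's column-sorting links such as "?C=N;O=D".
        if not url.startswith(base_url) or url == base_url or "?" in url:
            continue
        name = url[len(base_url):]
        if name.endswith("/"):
            result["folders"].append(name)
        elif name.endswith(".ipynb"):
            result["notebooks"].append(name)
        else:
            result["other"].append(name)
    return result


def breadcrumbs(url: str) -> List[Tuple[str, str]]:
    """(label, url) pairs from the site root down to the given directory."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}/"
    crumbs = [(parts.netloc, root)]
    path = root
    for segment in [s for s in parts.path.split("/") if s]:
        path += segment + "/"
        crumbs.append((segment, path))
    return crumbs
```

Every crumb except the last doubles as the `..` link, so the parent entries that the scraper filters out are still reachable from the rendered view.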

Of course, this would increase the spidering surface of the app, so maybe we'd want to start thinking about robots.txt.

@rgbkrk
Member

rgbkrk commented Mar 25, 2015

On robots.txt, @danabauer was going to take a crack at it when she got some time. If we need it sooner rather than later, I can head that up.

@bollwyvl
Contributor Author

Mention of robots was speculative... How do we even feel about scraping HTML and trying to deduce a list of folders, notebooks, and other files from it?

@rgbkrk
Member

rgbkrk commented Mar 25, 2015

We need a robots.txt anyway, because we don't currently serve one that respects the originating content's robots.txt.
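To respect the originating content's robots.txt, nbviewer could consult the upstream file before rendering a URL. A sketch using the standard library's `urllib.robotparser`; the helpers `robots_url` and `allowed_upstream` and the `nbviewer` user-agent string are assumptions. In practice the upstream robots.txt would be fetched once per host and cached rather than passed in as text.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(url: str) -> str:
    """Location of the robots.txt that governs `url`."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_upstream(url: str, robots_txt: str,
                     user_agent: str = "nbviewer") -> bool:
    """True if the upstream robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser(robots_url(url))
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A disallowed URL could then still be rendered for a visitor but marked `noindex`, or refused outright; which policy nbviewer should follow is the open question in this thread.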

@bollwyvl
Contributor Author

Gotcha.

@bollwyvl
Contributor Author

Hooray, this is closed by #443!
