Making OpenRefine Available from Notebooks #6

Closed
psychemedia opened this issue Dec 17, 2018 · 9 comments

@psychemedia
Contributor

Would it make sense to also include a pip dependency for an OpenRefine Python client, e.g. paulmakepeace/refine-client-py#17?

I have some fragments of demo notebooks using the client and could probably pull something together out of them for a demo notebook...

(I note there is also an R client.)

@betatim
Owner

betatim commented Dec 18, 2018

Yes please. I didn't even know there was a Python client/way to use OpenRefine that isn't via the web GUI.

@psychemedia
Contributor Author

Will try to pull something together.

Is the port number on which the OpenRefine service runs on localhost available anywhere? (The default is 3333.)

A quick connection test is something like:

!pip install git+https://github.com/dbutlerdb/refine-client-py #py3 fork

from open.refine import refine, facet

server = refine.RefineServer()
orefine = refine.Refine(server)

orefine.list_projects().items()

The OpenRefine server may need to be started with the env var REFINE_HOST=0.0.0.0 (or another appropriate value) to allow the connection to be made.
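Before involving the client library at all, it may help to check whether anything is listening on the expected port. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is made up for illustration, and 3333 is simply the default port mentioned above.

```python
import socket


def port_is_open(host="127.0.0.1", port=3333, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if the connection is refused or times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` from a notebook cell only shows that something accepts connections on that port, not that it is OpenRefine.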

@betatim
Owner

betatim commented Dec 18, 2018

I think the port on which OpenRefine itself listens is configured here:

cmd = ['openrefine-2.8/refine',
'-p', str(self.port)
]
return cmd

and is chosen at random by the proxy. I think we could change that.

@psychemedia
Contributor Author

Maybe set an env var that contains the port number?
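A rough sketch of what the notebook side of that could look like, assuming the port were published in an environment variable. The name `OPENREFINE_PORT` is hypothetical (nothing in the proxy sets it at present), and the fallback is the default port of 3333.

```python
import os

DEFAULT_PORT = 3333  # OpenRefine's default port


def openrefine_port(environ=os.environ, var="OPENREFINE_PORT"):
    """Read the port from an env var, falling back to the default.

    OPENREFINE_PORT is a hypothetical name used only for this sketch.
    """
    value = environ.get(var, "").strip()
    # Ignore missing or non-numeric values rather than raising
    return int(value) if value.isdigit() else DEFAULT_PORT
```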

@betatim
Owner

betatim commented Dec 20, 2018

Something to try. Though if we set an env var in the notebook server process (where this handler runs), will we see it from inside a notebook kernel?

Have you tried talking to OpenRefine through the proxy instead of directly?
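Going through the proxy would mean addressing OpenRefine by path rather than by port. A minimal sketch of building such a path, assuming the proxy is mounted at `openrefine/` relative to the notebook's base URL; the helper name is invented for illustration, and authentication (tokens, cookies) is not handled here.

```python
def proxied_openrefine_url(base_url, path=""):
    """Build the proxied path for an OpenRefine endpoint.

    Assumes the proxy lives at <base_url>/openrefine/.
    """
    prefix = "/" + base_url.strip("/")
    if not prefix.endswith("/"):
        prefix += "/"
    return prefix + "openrefine/" + path.lstrip("/")


# Example: the project listing endpoint behind a JupyterHub-style base URL
print(proxied_openrefine_url("/user/alice/", "command/core/get-all-project-metadata"))
```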

@psychemedia
Contributor Author

Setting a specific port (default is 3333) would make things easier.

I did a test where I added:

with open('openrefine.txt', 'w') as f:
    f.write(str(self.port))

to the port assigner, which writes the port into the notebook user's home directory. I could then connect with server = refine.RefineServer(port=open('openrefine.txt').read()). (I also set REFINE_HOST=0.0.0.0, but didn't try without it.)
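Reading the file back with a bare `open(...).read()` returns a string and leaves the file handle open. A slightly more defensive sketch is given here; the helper name `read_port` and the fallback to 3333 are assumptions for illustration.

```python
from pathlib import Path


def read_port(path="openrefine.txt", default=3333):
    """Return the port stored in `path`, or `default` if it is missing or invalid."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File not written yet, or not readable
        return default
    return int(text) if text.isdigit() else default
```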

Alternatively, we could just set things up to use the default port:

def setup_handlers(web_app):
    web_app.add_handlers('.*', [
        (ujoin(web_app.settings['base_url'], 'openrefine/(.*)'),
         OpenRefineProxyHandler, dict(state={'port':3333})), #Or allow this to be set
        (ujoin(web_app.settings['base_url'], 'openrefine'), AddSlashHandler)
        ])

I wonder whether OpenRefine should use a project directory in the Jupyter user's home path (starting OpenRefine with the -d flag pointing to the OpenRefine project directory). Note that the project directory needs to exist before OpenRefine is started with it; otherwise OpenRefine fails to start.
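One way the launch command could take care of that is sketched below. The function name `refine_command` is invented for illustration; the binary path and the `-p` flag are taken from the snippet earlier in the thread, and the directory is created up front so that `-d` never points at a missing path.

```python
import os


def refine_command(port, project_dir, binary="openrefine-2.8/refine"):
    """Create project_dir if needed and return the launch command as a list."""
    project_dir = os.path.abspath(os.path.expanduser(project_dir))
    # Make sure the directory exists before OpenRefine is started with -d
    os.makedirs(project_dir, exist_ok=True)
    return [binary, "-p", str(port), "-d", project_dir]
```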

With the following test, a new project is created, although the project name isn't respected (OpenRefine may well have changed over the years since the client was written).

from open.refine import refine

server = refine.RefineServer()
orefine = refine.Refine(server)

orefine.list_projects().items()

Using a project created via OpenRefine, and with a project key (one of the keys in the dict returned by orefine.list_projects()), we can look at a project:

p=orefine.open_project(projkey)
pr=p.get_rows(limit=10)
pr.rows[0].row

Creating a project from Python doesn't seem to work for me out of the box. An untitled project is created but no data is loaded:

#!pip install pandas
import pandas as pd
pd.DataFrame({'col1':[1,2],'col2':['fgfg','sd']}).to_csv('test.csv')

p=orefine.new_project(project_file='test.csv', project_name='dkbjhgjghfemo1')

I'm being forced onto a Christmas break now...! When I get back, I'll maybe fork the py client and try to get it working and document it a bit with a standalone OpenRefine server, then try again...

@betatim
Owner

betatim commented Dec 22, 2018

Happy Christmas break!

When you come back: I don't use OpenRefine all that much so I don't really have opinions/experience on what it should do. Happy for you to take the lead on that.

Should we move this project to a more neutral venue than my GitHub account? Might make it clearer that I don't really know what I am doing here and others can and should step up.

@psychemedia
Contributor Author

I made a start on a demo notebook here but it could take me some time before it's anywhere near complete as a walkthrough of the openrefine-py client.

@felixlohmeier
Contributor

I think this issue is solved by #8 and #11

@betatim closed this as completed Aug 23, 2019