Reorganize README to split container infra from pipeline construction #200

Open
MichaelTiemannOSC opened this issue Aug 26, 2022 · 2 comments
Labels
documentation: Improvements or additions to documentation
priority: Indicates that the issue is a priority and should be fixed asap.
user-experience: Indicates that the issue exists to improve the user experience of the demo

Comments

@MichaelTiemannOSC
Contributor

I am trying to use the latest documentation as a guide to creating a pipeline for the (still private) PCAF sovereign footprint POC. I appreciate that the AICoE demo is trying to address two audiences: those who build the actual containers that will run the jobs, and those who build the notebooks that use those containers. The second group is much more concerned with the calculations within the notebooks and the topology of the notebooks than with the underlying infrastructure.

For example, when I select Custom Elyra Notebook or AICoE Demo as a notebook type, how many of the infrastructure decisions can I expect that selection to have already made for me, leaving only simple GUI-based choices within a constrained environment? And how much do I need to grovel in the details, copy-pasting and editing every line of a Dockerfile, to get the right sort of "Hello, world" pipeline functionality?

Following along with the demo video (https://www.youtube.com/watch?v=lGeT615YNlM), I do see that users must create both a YAML file and a Dockerfile to define the container image. When the demo shows the construction of pipelines, it does not mention how much additional work is needed behind the scenes to make the demo2 notebooks magically link up with everything the YAML file and Dockerfile imply. Nor does it explain how a Jupyter notebook user can even edit /opt/app-root/src/PCAF-sovereign-footprint/.aicoe-ci.yaml, a hidden file that the file browser cannot open.
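
As a stopgap for the hidden-file problem, the file can at least be listed and read from a notebook cell. This is a minimal sketch, not something taken from the demo or the README: the helper names `list_hidden_files` and `read_hidden_config` are hypothetical, and the only detail taken from this issue is the `.aicoe-ci.yaml` file name.

```python
from pathlib import Path


def list_hidden_files(repo_dir):
    """Return the sorted names of dot-files directly inside repo_dir."""
    return sorted(
        p.name
        for p in Path(repo_dir).iterdir()
        if p.is_file() and p.name.startswith(".")
    )


def read_hidden_config(repo_dir, name=".aicoe-ci.yaml"):
    """Return the text of a dot-file in repo_dir, or None if it is absent."""
    path = Path(repo_dir) / name
    return path.read_text(encoding="utf-8") if path.is_file() else None


# Example call from a notebook cell (path taken from this issue):
# print(read_hidden_config("/opt/app-root/src/PCAF-sovereign-footprint"))
```

This only reads the file. Writing changes back would work the same way with `Path.write_text`, but whether that is the intended workflow is exactly what the documentation should say.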

In the part of the video that shows how runtime images are selected (https://youtu.be/lGeT615YNlM?t=701) there is no mention of how to find the quay.io server, nor any explanation of the relationship between what a project should magically inherit from the AICoE template and any OperateFirst instance values for projects that are part of an Op1st environment (such as os-climate). The requirement that os-climate create a redhat.com account to access quay.io repositories is confusing to an ODH user in a different organization. (The README does offer the name of a quay.io image that takes me to the right place, but that is buried well past the point where I ran into trouble trying to follow other directions first.)

I tried using the default https://ml-pipeline-ui.kubeflow.svc.cluster.local:80/pipeline advertised by the documentation, but that did not work. I then derived a different endpoint by taking the URL visible in the demo video's browser and changing CL1 to CL2:
http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline
That gave this error message:

Error making request
Failed to initialize `kfp.Client()` against: 'http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline' - Check Kubeflow Pipelines runtime configuration: 'pcaf_kubeflow'

Error details:
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/kfp/processor_kfp.py", line 123, in process
    client = TektonClient(
  File "/opt/app-root/lib64/python3.8/site-packages/kfp/_client.py", line 161, in __init__
    if not self._context_setting['namespace'] and self.get_kfp_healthz().multi_user is True:
  File "/opt/app-root/lib64/python3.8/site-packages/kfp/_client.py", line 363, in get_kfp_healthz
    raise TimeoutError('Failed getting healthz endpoint after {} attempts.'.format(max_attempts))
TimeoutError: Failed getting healthz endpoint after 5 attempts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/handlers.py", line 120, in post
    response = await PipelineProcessorManager.instance().process(pipeline)
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/processor.py", line 134, in process
    res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
  File "/usr/lib64/python3.8/asyncio/futures.py", line 260, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib64/python3.8/asyncio/tasks.py", line 349, in __wakeup
    future.result()
  File "/usr/lib64/python3.8/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/kfp/processor_kfp.py", line 148, in process
    raise RuntimeError(
RuntimeError: Failed to initialize `kfp.Client()` against: 'http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline' - Check Kubeflow Pipelines runtime configuration: 'pcaf_kubeflow'
Check the JupyterLab log for more details at 2022-08-26 09:39:48
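
For anyone reproducing this, the traceback shows `kfp.Client` timing out on its healthz check, so the endpoint can be sanity-checked outside Elyra first. The following is a rough sketch using only the standard library. The `/apis/v1beta1/healthz` path is an assumption about where the KFP v1 SDK looks, and the function names are hypothetical; nothing here comes from the demo documentation.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: the KFP v1 SDK health check lives under this path.
HEALTHZ_PATH = "/apis/v1beta1/healthz"


def endpoint_warnings(url):
    """Return a list of things about the URL that look suspicious."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    warnings = []
    if parts.scheme == "https" and parts.port == 80:
        warnings.append("https scheme on port 80, which normally serves plain http")
    if host.endswith(".svc.cluster.local"):
        warnings.append("cluster-internal service name, resolvable only inside the cluster")
    return warnings


def healthz_url(endpoint):
    """Append the assumed healthz path to the configured API endpoint."""
    return endpoint.rstrip("/") + HEALTHZ_PATH


def probe(endpoint, timeout=5.0):
    """Try the healthz URL once and return (ok, detail)."""
    try:
        with urllib.request.urlopen(healthz_url(endpoint), timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:  # server answered with an error status
        return False, f"HTTP {exc.code}"
    except OSError as exc:  # DNS failure, refused connection, timeout
        return False, f"unreachable: {exc}"
```

Calling `endpoint_warnings` on the two endpoints quoted above and then `probe` on whichever one looks plausible would at least separate a wrong URL from an authentication or namespace problem before the runtime configuration is edited again.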

Happy to try again with some guidance.

@MichaelTiemannOSC MichaelTiemannOSC added the documentation Improvements or additions to documentation label Aug 26, 2022
@schwesig

/kind bug

@sesheta
Member

sesheta commented Aug 26, 2022

@schwesig: The label(s) kind/bug cannot be applied, because the repository doesn't have them.

In response to this:

/kind bug

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Shreyanand Shreyanand added priority Indicates that the issue is a priority and should be fixed asap. user-experience Indicates that the issue exists to improve the user experience of the demo labels Aug 26, 2022
5 participants