[BUG]: Creating the two PipelineRuns makes one or both fail #175

Open
adelton opened this issue Nov 16, 2023 · 2 comments
Labels
kind/bug Something isn't working

Comments

adelton commented Nov 16, 2023

Describe the bug

When the user just pastes

oc create -f tekton/build-container-image-pipeline/aws-env-real.yaml
oc apply -k tekton/build-container-image-pipeline/
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-bike-rentals.yaml
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-tensorflow-housing.yaml

into a fresh namespace, there is a high chance that

  • build-container-image-bike-rentals-* PipelineRun will fail in git-clone-model-repo with
    + /ko-app/git-init -url=https://github.com/opendatahub-io/ai-edge.git -revision=main -refspec= -path=/workspace/output//model_dir/ -sslVerify=true -submodules=true -depth=1 -sparseCheckoutDirectories=
    {"level":"error","ts":1700162720.543732,"caller":"git/git.go:53","msg":"Error running git [remote add origin https://github.com/opendatahub-io/ai-edge.git]: exit status 128\nfatal: detected dubious ownership in repository at '/workspace/output/model_dir'\nTo add an exception for this directory, call:\n\n\tgit config --global --add safe.directory /workspace/output/model_dir\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\t/go/src/github.com/tektoncd/pipeline/pkg/git/git.go:53\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\t/go/src/github.com/tektoncd/pipeline/pkg/git/git.go:109\nmain.main\n\t/go/src/github.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
    {"level":"fatal","ts":1700162720.54378,"caller":"git-init/main.go:54","msg":"Error fetching git repository: exit status 128","stacktrace":"main.main\n\t/go/src/github.com/tektoncd/pipeline/cmd/git-init/main.go:54\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
    
  • build-container-image-tensorflow-housing-* PipelineRun will fail in build-mlflow-container with
    error evaluating symlinks in build context path: lstat /workspace/source/model_dir/tensorflow-housing: no such file or directory
    

This is likely caused by a race condition over the new static model_dir part of the path introduced by #112, which the two pods compete for on the shared volume.
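The suspected collision can be illustrated locally without a cluster. The sketch below is only an analogy: plain temporary directories stand in for the shared volume, and fetch_model and build_image are hypothetical stand-ins for the real tasks. It assumes the fetch step starts from a clean target directory, which has not been confirmed for the actual pipeline.

```shell
#!/bin/sh
# Local analogy only: plain directories stand in for the PVC-backed workspace,
# and the "steps" below are hypothetical, not the real Tekton tasks.
set -eu

workspace=$(mktemp -d)   # stands in for the single pre-existing PVC

# Hypothetical fetch step: recreate the target directory, then add the model.
# Assumption: the step starts from a clean directory.
fetch_model() {
    target=$1
    model=$2
    rm -rf "$target"
    mkdir -p "$target/$model"
}

# Hypothetical build step: needs the model directory fetched earlier.
build_image() {
    target=$1
    model=$2
    if [ -d "$target/$model" ]; then
        echo "$model: build context found"
    else
        echo "$model: no such file or directory"
    fi
}

# Shared static path: run A fetches, run B fetches, then run A builds.
fetch_model "$workspace/model_dir" tensorflow-housing
fetch_model "$workspace/model_dir" bike-rentals
shared_result=$(build_image "$workspace/model_dir" tensorflow-housing)

# Per-run paths: the same interleaving no longer collides.
fetch_model "$workspace/run-a/model_dir" tensorflow-housing
fetch_model "$workspace/run-b/model_dir" bike-rentals
isolated_result=$(build_image "$workspace/run-a/model_dir" tensorflow-housing)

echo "shared:   $shared_result"
echo "isolated: $isolated_result"
rm -rf "$workspace"
```

With the shared path, the second fetch wipes what the first run needs, so its build step finds nothing. With a separate directory per run, the same ordering is harmless.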

To Reproduce

Paste all the commands at once and then check the PipelineRuns in the console.

Alternatively, do

oc create -f tekton/build-container-image-pipeline/aws-env-real.yaml
oc apply -k tekton/build-container-image-pipeline/
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-tensorflow-housing.yaml

watch the first TaskRun (kserve-download-model) turn green in the console, then paste

oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-bike-rentals.yaml

and observe both the PipelineRuns in the console.

Expected behavior

Running the two PipelineRuns in parallel should still work the way it did before, even though one is now git-based and the other S3-based.

@adelton adelton added the kind/bug Something isn't working label Nov 16, 2023
@adelton adelton changed the title [BUG]: Creating the two PipelineRuns make one or both fail [BUG]: Creating the two PipelineRuns makes one or both fail Nov 21, 2023
@adelton adelton mentioned this issue Nov 21, 2023
3 tasks
LaVLaS commented Dec 13, 2023

@adelton The quick fix for this right now is to migrate your PipelineRun to use volumeClaimTemplate instead of a single pre-existing PVC for all Pipelines.
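A minimal sketch of what that migration could look like, assuming the PipelineRuns are created with generateName and use the tekton.dev/v1beta1 API. The Pipeline name, workspace name, and storage size are placeholders, not values taken from the repository; they need to match the real manifests.

```shell
#!/bin/sh
# Sketch only: the Pipeline name, workspace name, and storage size below are
# placeholders and must be matched to the real manifests in the repository.
set -eu

# Print a PipelineRun whose workspace gets a fresh PVC for every run.
emit_pipelinerun() {
    run_prefix=$1
    cat <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: ${run_prefix}-
spec:
  pipelineRef:
    name: example-pipeline            # placeholder Pipeline name
  workspaces:
    - name: example-workspace         # placeholder workspace name
      volumeClaimTemplate:            # replaces a reference to one fixed PVC
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi            # placeholder size
EOF
}

emit_pipelinerun build-container-image-bike-rentals
```

The output can be piped into `oc create -f -`. Because each run then gets its own volume, the two PipelineRuns no longer share a model_dir.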

As we combine and refactor the pipelines (#177) to improve the workflow, we can add optional support for a collection of Pipelines to share a single PVC for archiving data, with support for purging older logs.

adelton commented Dec 13, 2023

> @adelton The quick fix for this right now is to migrate your PipelineRun to use volumeClaimTemplate instead of a single pre-existing PVC for all Pipelines.

Could you open a PR so that we fix the repo content for everyone and folks don't need to make one-off changes?
