Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async completion API #55

Merged
merged 4 commits into from
Apr 16, 2021
Merged

Conversation

jpommerening
Copy link
Member

@jpommerening jpommerening commented Feb 26, 2021

Just starting with the API description to get the ball rolling and get some early-on support ;)

A lot going on here … so … I pulled in 3f83e18 while making the AppsService API async. Then updated kube (0.48.0) and dkregistry (some very recent git commit) to be compatible with Tokio 1.3.0 and Reqwest 0.11.

Adapting the tests took a while, but all should pass now.

I added rustls for reading PEM certs (kube now wants them already parsed to their binary X.509 representation) and hyper (which was already a transitive dependency) for parsing the Prefer header.


The create_app, delete_app and the new task endpoint now support Prefer: respond-async and Prefer: respond-async,wait=ts.

App creation and shutdown now trigger the creation of a temporary container identified by a task id that is present for as long as the operation takes. The new task endpoint can be used to "poll" the created container.

Known issues:

  • The creation API may respond before the deployment task container is present, so a poll immediately after that might indicate that the task is already completed. The API should probably create the deployment task container synchronously before responding.
  • Polling a task and waiting for it's completion produces a different result (Ok(Vec<Service>)) than polling a task that is already completed (NotFound(())).

@jpommerening jpommerening marked this pull request as ready for review March 18, 2021 13:54
@jpommerening jpommerening changed the title (wip) Add async completion API Add async completion API Mar 18, 2021
@jpommerening
Copy link
Member Author

jpommerening commented Mar 18, 2021

Test script create-poll.sh:

#!/bin/sh
PAYLOAD='[{"serviceName": "my-service", "image": "docker.io/library/my-image:latest" , "env": {}}]'

WAIT="2"
CONTENT_TYPE="Content-Type: application/json"
PREFER="Prefer: respond-async, wait=$WAIT"
HOST="http://localhost:8000"
URL="$HOST/api/apps/master"

RESPONSE=`curl -v -H "$CONTENT_TYPE" -H "$PREFER" -d "$PAYLOAD" "$URL"`
URL=`echo "$RESPONSE" | jq -re 'if .taskId? and .url? then .url else empty end'`

while test -n "$URL" ; do
  RESPONSE=`curl -v -H "$CONTENT_TYPE" -H "$PREFER" "$URL"`
  URL=`echo "$RESPONSE" | jq -re 'if .taskId? and .url? then .url else empty end'`
done

echo "$RESPONSE"

Test script delete-poll.sh:

#!/bin/sh
WAIT="1"
PREFER="Prefer: respond-async, wait=$WAIT"
HOST="http://localhost:8000"
URL="$HOST/api/apps/master"

RESPONSE=`curl -v -X DELETE -H "$PREFER" "$URL"`
URL=`echo "$RESPONSE" | jq -re 'if .taskId? and .url? then .url else empty end'`

while test -n "$URL" ; do
  RESPONSE=`curl -v -H "$CONTENT_TYPE" -H "$PREFER" "$URL"`
  URL=`echo "$RESPONSE" | jq -re 'if .taskId? and .url? then .url else empty end'`
done

echo "$RESPONSE"

Copy link
Contributor

@schrieveslaach schrieveslaach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. 😎 I found some minor issues and we should be able to work them out quickly.

However, we should test the K8s again but I can help you to test it (write me a private message and we do it together). Unfortunately, we do not have integrations tests for K8s.

api/res/openapi.yml Outdated Show resolved Hide resolved
api/res/openapi.yml Outdated Show resolved Hide resolved
api/res/openapi.yml Outdated Show resolved Hide resolved
api/res/openapi.yml Outdated Show resolved Hide resolved
api/src/apps/routes.rs Show resolved Hide resolved
api/src/infrastructure/kubernetes/infrastructure.rs Outdated Show resolved Hide resolved
api/src/infrastructure/kubernetes/infrastructure.rs Outdated Show resolved Hide resolved
api/src/infrastructure/kubernetes/infrastructure.rs Outdated Show resolved Hide resolved
api/src/infrastructure/kubernetes/payloads.rs Outdated Show resolved Hide resolved
api/src/webhooks.rs Outdated Show resolved Hide resolved
@jpommerening
Copy link
Member Author

Thanks for taking the time!
I'll go ahead and make some edits, and will get in contact for testing the k8s stuff 👍

@jpommerening
Copy link
Member Author

jpommerening commented Mar 22, 2021

Okay, I've got a little update:

I made some adjustments to get the kubernetes infrastructure running … to a degree.
I'm quite certain that the issues I'm still having with it are not caused by the update. I noticed some things:

  • Deployments are inspected to get the state of the services, however a k8s deployment spec may not be paused, have replicas >= 1 and still, the pods and containers may not yet be ready. Is this intended? Source
    Result: services are listed as running when, in fact, they are not.
    (curl localhost:8000/api/apps | jq .$app_name vs kubectl --namespace $app_name get deployments)
  • At least on my machine, running Docker for Mac, services can't communicate with each other. The issue seems to be that the ports of the databases can't be resolved, so the k8s services are created on port 80. kubectl --namespace $app_name get services displays port 80 for mariadb containers which is clearly not the port mariadb listens on.
    Result: services won't start up because they are waiting for their database. (An app companion in my case, which I assume is stopping the service instance(s) and some other companions from starting)

Edit: Just tested this with the current master, running into the same issues.

@jpommerening
Copy link
Member Author

jpommerening commented Mar 23, 2021

@schrieveslaach I think I addressed all of you notes, except for those regarding the OpenAPI documentation.

Would you mind doing another review/test, while I adjust the docs (and remove the response body for tasks)?

edit done 👇

@jpommerening
Copy link
Member Author

What's your opinion on #55 (comment) and #55 (comment) ?


self.connect_traefik(&network_id).await?;
let result = (async move {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right to make sure that we clean up the temporary container. However, I would prefer to move the code that is now contained in the async move block into its own method so that you can use it in following way:

      let deployment_container = self
            .create_deployment_task_container(deployment_id, app_name)
            .await?;

      let deployment_result =  self.deploy_services_impl().await;

      delete(deployment_container).await?;

      deployment_result

Putting it in an async move closure reduces readability IMHO.

api/src/infrastructure/docker.rs Outdated Show resolved Hide resolved
api/src/infrastructure/kubernetes/infrastructure.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@schrieveslaach schrieveslaach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I could not finish the review today (I'll continue tomorrow). So far the PR looks promising and I just found some minor issues (see comments).

The issue seems to be that the ports of the databases can't be resolved, so the k8s services are created on port 80

That's a known issue (at least for me). You have to use a private Docker registry that allows you to resolve the container manifest (hub.docker.io will give you 404 if you request the manifest). For testing, just push an nginx to a private registry and use the image from there.

api/src/apps/apps.rs Show resolved Hide resolved
api/src/infrastructure/docker.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@schrieveslaach schrieveslaach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the PR looks good to me. 👍 There are some minor issues but we will figure them out. 😉

api/src/apps/routes.rs Outdated Show resolved Hide resolved
api/src/apps/apps.rs Outdated Show resolved Hide resolved
api/src/infrastructure/docker.rs Outdated Show resolved Hide resolved
api/src/infrastructure/kubernetes/infrastructure.rs Outdated Show resolved Hide resolved
api/src/main.rs Show resolved Hide resolved
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct TaskId(uuid::Uuid);

impl TaskId {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm wondering if we should call it DeploymentId because task is soooo generic. And that makes me wondering if the route should be /deployment/{deploymentId} instead. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a slight preference for "task". I associate "deployment" either with something longer-lasting, in the sense that k8s uses the term, or with "the act of deploying something", in which case it would be confusing to poll a deployment when undeploying an app.

But there probably is a better term than "task", I just can't find one that's less generic but would still match both deploying and undeploying an app?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, deployment is too narrow here. What do you think about using another resource URL: /apps/{appName}/status-changes/{statusId} and then use AppStatusChangeId here.

Copy link
Contributor

@schrieveslaach schrieveslaach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in private we want to rename the resource path that has been proposed in this PR. Until then, I'm looking forward to merge this PR.

api/src/infrastructure/docker.rs Outdated Show resolved Hide resolved
@@ -85,6 +85,55 @@ impl DockerInfrastructure {
DockerInfrastructure {}
}

async fn find_deployment_task_container(
&self,
deployment_id: &String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deployment_id: &String,
deployment_id: &str,

@@ -185,6 +234,65 @@ impl DockerInfrastructure {
Ok(())
}

async fn deploy_services_impl(
&self,
app_name: &String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
app_name: &String,
app_name: &str,

Ok(services)
}

async fn stop_services_impl(&self, app_name: &String) -> Result<Vec<Service>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn stop_services_impl(&self, app_name: &String) -> Result<Vec<Service>, Error> {
async fn stop_services_impl(&self, app_name: &str) -> Result<Vec<Service>, Error> {

Comment on lines 521 to 523
app_name: Option<&String>,
service_name: Option<&String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
app_name: Option<&String>,
service_name: Option<&String>,
app_name: Option<&str>,
service_name: Option<&str>,

Comment on lines +545 to +547
app_name: &String,
service_name: &String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
app_name: &String,
service_name: &String,
app_name: &str,
service_name: &str,

};
async fn get_deployment_task(
&self,
deployment_id: &String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deployment_id: &String,
deployment_id: &str,

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct TaskId(uuid::Uuid);

impl TaskId {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, deployment is too narrow here. What do you think about using another resource URL: /apps/{appName}/status-changes/{statusId} and then use AppStatusChangeId here.

api/src/apps/apps.rs Show resolved Hide resolved
@@ -185,6 +207,29 @@ paths:
text/plain:
schema:
type: string
/tasks/{taskId}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated below, to have a better naming /apps/{appName}/status-changes/{statusId} would be my choice for naming this resource. In this PR we can ignore appName in the route implementation:

#[get("/<app_name>/status-changes/<status_change_id>", format = "application/json")]
fn task(
    _app_name: Result<AppName, AppNameError>,
    status_change_id: Result<AppStatusChangeId, AppStatusChangeIdError>,
    apps: State<Arc<Apps>>,
    tasks: State<Tasks>,
    options: RunOptions,
) -> Result<AsyncCompletion<Json<Vec<Service>>>, HttpApiProblem> {
   // …

I'll create than an issue that we will use the app_name and remove the deployment guard.

@jpommerening
Copy link
Member Author

Testing on our (p)review host revealed some issues coming from shiplift:

ERROR prevant::apps::host_meta_cache] Cannot load apps: Cannot interact with infrastructure: Docker Error: channel closed

Further investigation needed 😅

Updated:
- tokio (1.3)
- kube (0.48)
- hyper (0.11)
- dkregistry (pre-0.6)
- shiplift (0.7)
- reqwest (0.11)
- Use a static runtime instance (should be removed when updating to Rocket 0.5)
- Avoid spawning of Docker inspect commands due to instability when running
  many Docker container.
- Replace Tasks with function
@jpommerening jpommerening force-pushed the add-async-api branch 5 times, most recently from d1d1f18 to 0247b20 Compare April 7, 2021 14:47
Copy link
Contributor

@schrieveslaach schrieveslaach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some small code occurrences to improve. Please, have a look. Additionally, we should keep the number of commits on the main branch small. Code you squash your commit?

- respond-async
- respond-async, wait=1
- respond-async, wait=5
- respond-async, wait=10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we move this examples to the example block? wait isn't limited to these examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's kinda hard to model the header in a meaningful way.
Another method I could think of would be:

      example:
        respond-async,wait=10
      schema:
        type: array
        uniqueItems: true
        items:
          oneOf:
            - type: string
              description: Respond before the operation is finished
              pattern: ^respond-async$
              example: respond-async
            - type: string
              description: Wait the specified amount of seconds for the operation to finish
              pattern: ^wait=(\d+)$
              example: wait=10

Swagger UI can even validate it an provide a semi-useful UI (list of strings):
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

api/src/apps/apps.rs Show resolved Hide resolved
api/src/apps/routes.rs Show resolved Hide resolved

container_details.insert(app_name, details);
for container in container_list.into_iter() {
if let Some(details) = not_found_to_none(inspect(container).await)? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎆 Hurray

- Leave spawning threads to the runtime when crawling host-meta
- Fix an issue with shiplift
- Pull the busybox image before trying to create the container
- Fix error in crawler refactoring
- Return 404 when task disappears
- Handle containers disappearing before we can inspect them
- Be more explicit about allowed Prefer header formats
@schrieveslaach
Copy link
Contributor

We talked about the remaining comments and we came to the conclusion that the code is good and we tested it in our lab. We will merge it now.

@schrieveslaach schrieveslaach merged commit 6f48c67 into aixigo:master Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants