Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 fix: eliminate zombie jobs #96

Merged
merged 10 commits into from
Dec 16, 2024
Merged

🐛 fix: eliminate zombie jobs #96

merged 10 commits into from
Dec 16, 2024

Conversation

KagemniKarimu
Copy link
Contributor

@KagemniKarimu KagemniKarimu commented Dec 13, 2024

Checklist:

Important

Please review the checklist below before submitting your pull request.

  • Please open an issue before creating a PR or link to an existing issue
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas

Description

This PR is intended to deal with the several issues that exist with the CLI's list command and ListJobs logic. Dead jobs would show as being alive and vice versa. While it does not solve all of the surrounding issues, and could have more informative messages , it provides an initial technical framework for improving manuscript state detection:

The major changes are as follows:

  1. Implemented Manuscript State Machine in pkg/manuscript_state.go
  2. Refactored list to use the Manuscript State Machine to determine Manuscript state
  3. Updated list help documentation
  4. Added ls alias to list command
  5. Added directory search for list command - you can now search other directories besides the default one for manuscripts mimicking unix ls command. This was initially added for debugging purposes but proved quite useful.

Fixes #61
R.I.P. Zombie Jobs 🧟

Type of Change

  • Bug fix

This commit moves the hard-coded network chains endpoint into a `chainEndpoint` constant for better visibility and configurability.
Added a Manuscript State Detector which can be instanciated in the ListJobs() function for sake of identifying manuscript state with accuracy and finality.  The state machine executes a series of elaborate looks at container status and flink logs to  update to one  of five states: RUNNING, INITIALIZING, FAILED, STOPPED, and UNKNOWN (rare). This is the grounds for more advanced manuscript detection from the CLI.
This commit adds new logic to jobs_manuscipt that allows for the detection of manuscripts by using `manuscript_state.go`
Recognizing that there is a lot of complicated logic in the state detector, I added comments explaining its general operation throughout.
Within the DetectState method, manuscripts previously defaulted to initializing state, despite surviving all previous checks. Now, if a manuscript survives all checks it will display as `RUNNING` in the jobs list.
This adds as an additional state check - the checkGraphQLEndpoint function which hits the local endpoint to determine healthiness and ensure it is live before returning RUNNING. If it is determined not to be live, but survive all other checks, the manuscript will return INITIALIZING status.
This commit adds additional status indicators to the long help text of the list command.
This commit adds an alias, `ls` to the list command.
@KagemniKarimu
Copy link
Contributor Author

P.S. It may be time to bump the version number ;) @Liquidwe
We been piling new changes into v1.1.0 😆

This commit adds additional guard rails to fix broken functionality of `init` command that was affecting the jobs list. Previously, a person could init and partially overwrite an existing manuscript. Now, extensive checks are in place to prevent that from happening: manuscript is looked for in the directory of the manuscript path, the manuscript is already running, the manuscript is in the config file.
This commit adds new functionality to manuscript-cli list. It now accepts a directory as an argument. If no directory is specified it looks in the default directory for users. It also shows STOPPED jobs by looking at the manuscript yamls in the subdirectories of the specified directory and showing ALL manuscripts. This makes it more informative and usable.
@KagemniKarimu
Copy link
Contributor Author

Before merging, please verify this is not causing #97

@Liquidwe
Copy link
Member

thx @KagemniKarimu A very excellent repair proposal, I will submit some of the recent repairs to v1.1.1

@Liquidwe Liquidwe merged commit a9bdbdb into main Dec 16, 2024
@Liquidwe Liquidwe deleted the manuscript-core-fix branch December 16, 2024 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🧟 manuscript-cli list shows dead jobs
2 participants