
[Chore]: Added url-checker, updated circular-deps, documented new static analysis .txt pattern #7442


Merged
merged 16 commits into main from nadro/adhoc/url-checking
Jul 1, 2025

Conversation

nadr0
Contributor

@nadr0 nadr0 commented Jun 10, 2025

Update

I moved the 404 links to this PR #7448

  • Updated the circular-deps check to use ./scripts/known/circular.txt and ./scripts/known/urls.txt
  • Updated the bash scripts for the new paths
  • Updated package.json for the new paths
  • Added npm run url-checker:diff to the static analysis CI/CD job
  • Updated url-checker.sh to avoid tmp files on disk; everything stays in memory, which is less messy
  • Added a scripts section to contributing.md so others can understand the pattern
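The in-memory style mentioned above can be sketched roughly like this (the data here is hypothetical; the real script greps URLs out of src/):

```shell
#!/usr/bin/env bash
# Sketch of the in-memory approach: intermediate results live in shell
# variables instead of tmp files on disk. Sample data is made up.
set -euo pipefail

# Instead of writing grep output to /tmp and reading it back, keep it
# in a variable and pipe it through the next stage.
urls=$'https://example.com\nhttps://example.com\nhttps://example.org'
uniqued=$(printf '%s\n' "$urls" | sort -u)
printf '%s\n' "$uniqued"
```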

Issue

URLs in the application go stale and 404, and we have no automatic way to detect this.

Implementation

  • Fixed the 404 URLs
  • Implemented a bash script backed by a known-urls.txt file. If a run produces a diff against it, we can see what the problem is.
  • I am using two grep commands, which can produce some funky results. I think this is reasonable, since they caught all the URLs I had to migrate in this PR; I used that list to fix them.
  • Any link that does not return 200 goes into the known-urls.txt file
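The bullets above boil down to a diff-against-baseline check, sketched here with hypothetical paths and data (the real logic lives in ./scripts/url-checker.sh and ./scripts/diff-url-checker.sh):

```shell
#!/usr/bin/env bash
# Minimal sketch of the baseline pattern: the current run's output is
# compared against a checked-in known-urls.txt; any difference fails CI.
# Paths and contents here are illustrative only.
set -euo pipefail

baseline=$(mktemp)
printf '404 https://example.com/gone\n' > "$baseline"

# In the real script this comes from curling every URL found in src/.
current='404 https://example.com/gone'

if diff <(printf '%s\n' "$current") "$baseline" > /dev/null; then
  result="clean"          # output matches the known baseline
else
  result="stale baseline" # update via url-checker:overwrite
fi
echo "$result"
rm -f "$baseline"
```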


vercel bot commented Jun 10, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

modeling-app — Status: ✅ Ready — Updated (UTC): Jul 1, 2025 5:27pm

@nadr0 nadr0 requested a review from a team June 10, 2025 20:28
@nadr0 nadr0 changed the title from "Fixing 404 URLs, implemented known-url.txt for url checking." to "Fixing 404 URLs, implemented known-urls.txt for url checking." Jun 10, 2025
@@ -110,6 +110,9 @@
"remove-importmeta": "sed -i 's/import.meta.url/window.location.origin/g' \"./rust/kcl-wasm-lib/pkg/kcl_wasm_lib.js\"; sed -i '' 's/import.meta.url/window.location.origin/g' \"./rust/kcl-wasm-lib/pkg/kcl_wasm_lib.js\" || echo \"sed for both mac and linux\"",
"lint-fix": "eslint --fix --ext .ts --ext .tsx src e2e packages/codemirror-lsp-client/src rust/kcl-language-server/client/src",
"lint": "eslint --max-warnings 0 --ext .ts --ext .tsx src e2e packages/codemirror-lsp-client/src rust/kcl-language-server/client/src",
"url-checker":"./scripts/url-checker.sh",
"url-checker:overwrite":"npm run url-checker | sed '$d' > known-urls.txt",
"url-checker:diff":"./scripts/diff-url-checker.sh",
Contributor


Kind of a nitpick, so feel free to ignore, but most of these scripts are verbs so maybe check-urls: would be more consistent?

Contributor Author


    "url-checker":"./scripts/url-checker.sh",
    "url-checker:overwrite":"npm run url-checker | sed '$d' > known-urls.txt",
    "url-checker:diff":"./scripts/diff-url-checker.sh",
    "circular-deps": "dpdm --no-warning --no-tree -T --skip-dynamic-imports=circular src/index.tsx",
    "circular-deps:overwrite": "npm run circular-deps | sed '$d' | grep -v '^npm run' > known-circular.txt",
    "circular-deps:diff": "./scripts/diff-circular-deps.sh",
    "circular-deps:diff:nodejs": "npm run circular-deps:diff || node ./scripts/diff.js",

I followed the same pattern Jess implemented for circular deps, so the verbiage is different.

known-urls.txt Outdated
302 https://stackoverflow.com/a/58436959/22753272
301 https://text-to-cad.zoo.dev/dashboard
307 https://zoo.dev/
308 https://zoo.dev/docs/api/ml/generate-a-cad-model-from-text
Contributor

@jacebrowning jacebrowning Jun 10, 2025


I'm not sure we need this as another top-level file. Perhaps something like scripts/output/ could work? There could also be a nested .gitignore in there to hide the temporary files. Open to suggestions.

Contributor Author


I can move this; again, I followed Jess's pattern of known-circular.txt. This is the one file we want to use in the CI/CD testing.

I'd want to move all of these, IMO, so they follow the same pattern. This is the second script that we run against a whitelist that needs to be updated, otherwise a future test would fail.

Contributor


I'm in favor of moving both then. I don't think adding another text file to the root each time we want to store "last run" state is a pattern we want to replicate.

If you want to ship these new scripts, I'd like to see them hooked up to CI as well. Otherwise, I worry that these will be forgotten about and the code/state will go stale over time. I'd hope we catch broken links in reviews going forward, even without any custom scripts -- not doing so implies a mere "looks good" review that didn't actually try out the code. 😄

It might be worthwhile to ship the 404 fixes in a separate PR so we can get those out independent of any potential tooling discussion.

Contributor Author


I'll open a new PR for only the 404 links. I was going to make a CI/CD job, but I want this merged and visible to people before I make it an enforced CI/CD job.

Contributor Author


I'm in favor of moving both then. I don't think adding another text file to the root each time we want to store "last run" state is a pattern we want to replicate.

I will move this into the scripts folder, but I do think these types of files fall into a configuration style: they are project-wide state that can be read.

@nadr0
Contributor Author

nadr0 commented Jun 11, 2025

#7448

PR with 404 fixes only.

while read line; do
# || true this curl request to bypass any failures and not have the script panic.
# the set -euo pipefail will cause a panic if a curl fails
status=$(curl -o /dev/null -s -w "%{http_code}\n" $line || true)
Contributor


The $line variable should be quoted to prevent issues with URLs containing special characters or spaces:

status=$(curl -o /dev/null -s -w "%{http_code}\n" "$line" || true)

This ensures the entire URL is passed as a single argument to curl, avoiding potential command injection or unexpected behavior with special characters.

Suggested change
status=$(curl -o /dev/null -s -w "%{http_code}\n" $line || true)
status=$(curl -o /dev/null -s -w "%{http_code}\n" "$line" || true)

Spotted by Diamond


@nadr0
Contributor Author

nadr0 commented Jun 11, 2025

@jacebrowning, this is ready for a re-review. I've moved everything into ./scripts. I've updated contributing.md to outline this pattern as well.

I added a step in the CI/CD for the known URLs.

@nadr0 nadr0 changed the title from "Fixing 404 URLs, implemented known-urls.txt for url checking." to "[Chore]: Added url-checker, updated circular-deps, documented new static analysis .txt pattern" Jun 11, 2025
@nadr0 nadr0 requested a review from a team June 12, 2025 14:19
Contributor

@jacebrowning jacebrowning left a comment


Thanks for consolidating the scripts directory and adding the URL checks to CI!

Comment on lines +30 to +33
val1=$(grep -Eoh "(https)://[^']+" src/**/*.ts | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma)

# Search all src/**/*.tsx files
val2=$(grep -Eoh "(https)://[^']+" src/**/*.tsx | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma)
Contributor


The grep commands might exit with a non-zero status if no URLs are found in the source files, which would cause the script to fail due to set -euo pipefail. Consider adding || true to handle this case:

val1=$(grep -Eoh "(https)://[^']+" src/**/*.ts | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma || true)

# Search all src/**/*.tsx files
val2=$(grep -Eoh "(https)://[^']+" src/**/*.tsx | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma || true)

This ensures the script continues execution even when no URLs are found in either file type.

Suggested change
val1=$(grep -Eoh "(https)://[^']+" src/**/*.ts | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma)
# Search all src/**/*.tsx files
val2=$(grep -Eoh "(https)://[^']+" src/**/*.tsx | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma)
val1=$(grep -Eoh "(https)://[^']+" src/**/*.ts | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma || true)
# Search all src/**/*.tsx files
val2=$(grep -Eoh "(https)://[^']+" src/**/*.tsx | remove_after_space | remove_after_backtick | remove_after_end_paren | remove_after_double_quote | remove_after_gt | remove_after_comma || true)

Spotted by Diamond


Comment on lines +2 to +4
> [email protected] url-checker
> ./scripts/url-checker.sh

Contributor


The baseline file in scripts/known/urls.txt contains npm command output headers that won't be present in actual script output during comparison:

> [email protected] url-checker
> ./scripts/url-checker.sh

These headers will cause the diff comparison to fail since they're artifacts of how the file was initially generated. Consider either:

  1. Removing these lines from the baseline file, or
  2. Modifying the url-checker:overwrite command to filter out these headers (similar to how circular-deps:overwrite uses sed and grep to clean its output)

This will ensure consistent comparisons between baseline and current output during CI checks.

Spotted by Diamond

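Option 2 could look something like this (a sketch, not the repo's actual command; the package name and sample output are made up):

```shell
#!/usr/bin/env bash
# Sketch of filtering npm's command-echo headers out of the baseline,
# mirroring how circular-deps:overwrite pipes through sed/grep.
set -euo pipefail

# Simulated `npm run url-checker` output, headers included.
raw=$'> my-app@1.0.0 url-checker\n> ./scripts/url-checker.sh\n404 https://example.com/gone'

# npm's header lines start with ">"; drop them so only status/URL
# pairs reach the baseline file.
cleaned=$(printf '%s\n' "$raw" | grep -v '^>')
printf '%s\n' "$cleaned"
```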

while read line; do
# || true this curl request to bypass any failures and not have the script panic.
# the set -euo pipefail will cause a panic if a curl fails
status=$(curl -o /dev/null -s -w "%{http_code}\n" $line || true)
Contributor


The curl command doesn't properly handle URLs containing template variables like ${BASE_URL} or ${version} which appear in the urls.txt file. These variables won't be interpolated in the shell context, resulting in failed requests or misleading status codes. Consider adding pattern detection to either skip these templated URLs or implement special handling for strings containing ${} patterns to prevent false positives in the URL checking process.

Suggested change
status=$(curl -o /dev/null -s -w "%{http_code}\n" $line || true)
if [[ "$line" =~ \$\{.*\} ]]; then
echo "Skipping templated URL: $line"
continue
fi
status=$(curl -o /dev/null -s -w "%{http_code}\n" $line || true)

Spotted by Diamond


problematic+="$status\t$line\n"
fi
done < <(echo "$uniqued")
echo -e $problematic | column -t
Contributor


The echo command should include double quotes around the $problematic variable to properly handle special characters and whitespace in the output:

echo -e "$problematic" | column -t

Without quotes, shell word splitting could break the formatting or cause unexpected behavior when the variable contains special characters.

Suggested change
echo -e $problematic | column -t
echo -e "$problematic" | column -t

Spotted by Diamond


Co-authored-by: Jace Browning <[email protected]>
@nadr0 nadr0 merged commit 7f9851a into main Jul 1, 2025
57 checks passed
@nadr0 nadr0 deleted the nadro/adhoc/url-checking branch July 1, 2025 18:01