71 changes: 71 additions & 0 deletions skills/.experimental/app-testing/SKILL.md
@@ -0,0 +1,71 @@
---
name: app-testing
description: Create a tailored app test checklist and then run in-depth testing for web apps, mobile apps, desktop apps, internal tools, and API-backed products. Use when Codex is asked to QA an app, test a feature or release, perform smoke, regression, or end-to-end testing, hunt bugs, validate UX or edge cases, or systematically explore product behavior before sign-off.
---

# App Testing

## Overview

Create a context-specific checklist before interacting with the app. Then execute deep testing against that checklist, expand coverage when risk appears, and report findings with reproduction steps, impact, and residual risk.

## Workflow

1. Gather context fast
- Inspect the repo, run instructions, routes or screens, auth roles, data model, feature flags, recent changes, and existing tests.
- Identify what "app" means in context: web UI, mobile app, desktop app, API-backed workflow, or a mixed system.
- Note blockers early: missing credentials, unavailable services, absent fixtures, or platform limits.

2. Build the checklist first
- Start from `references/checklist-template.md`.
- Keep only relevant sections and add app-specific flows.
- Include both requested scope and nearby regression risk.
- Make the checklist visible in the response or working notes before deep testing.
- Mark each item as `pending`, `passed`, `failed`, or `blocked` as testing progresses.

3. Run testing in deliberate passes
- Start with a smoke pass to confirm the app boots and the main entry path works.
- Cover primary user journeys end to end before spending time on polish issues.
- Run negative and edge-case passes after the happy path is stable.
- Validate integrations, persistence, permissions, and state transitions.
- Finish with broader quality passes such as responsiveness, accessibility, security sanity checks, and performance observations when applicable.

4. Go deep when issues appear
- Minimize repro steps.
- Check whether the problem is isolated or systemic.
- Probe adjacent states, roles, inputs, and recovery paths.
- Record the smallest reliable reproduction and the broadest credible impact.
- Use `references/test-depth-guide.md` for additional heuristics.

5. Report outcomes clearly
- List findings ordered by severity.
- For each confirmed issue include: title, severity, setup or account used, steps to reproduce, expected result, actual result, and evidence.
- Separate confirmed bugs from weak signals, assumptions, and untested areas.
- End with checklist coverage, blocked items, and the highest remaining risks.
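
The repro-minimization in step 4 can be sketched as a greedy delta reduction: repeatedly drop steps that are not needed to trigger the failure. The flow steps and failure condition below are hypothetical, not from a real app:

```python
def minimize_repro(steps, still_fails):
    """Greedily drop steps that aren't needed to reproduce the failure."""
    steps = list(steps)
    i = 0
    while i < len(steps):
        candidate = steps[:i] + steps[i + 1:]
        if still_fails(candidate):
            steps = candidate  # step i was unnecessary; try the same index again
        else:
            i += 1  # step i is required; keep it and move on
    return steps

# Hypothetical bug: it reproduces whenever both of these steps are present.
def fails(steps):
    return "open settings" in steps and "toggle dark mode" in steps

minimal = minimize_repro(
    ["log in", "open settings", "scroll", "toggle dark mode", "log out"],
    fails,
)
```

The result is the smallest step list that still reproduces, which is exactly what step 4 asks to record.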

## Coverage Priorities

Prefer this order unless the user gives tighter scope:

1. App start-up and environment sanity
2. Authentication, authorization, and role boundaries
3. Core value paths
4. Data creation, editing, deletion, and persistence
5. Validation, empty states, and error handling
6. Navigation, back, refresh, retry behavior, and session continuity
7. Integrations, background jobs, uploads, downloads, webhooks, or payments
8. Responsiveness, accessibility, localization, timezones, and browser or device differences
9. Security and performance sanity checks

## Testing Tactics

- Use the same tools the app uses in real life: local dev servers, seeded data, logs, network panels, CLI scripts, and database inspection.
- Cross-check UI claims against API responses or stored state when possible.
- Prefer representative, risk-based coverage over exhaustive but shallow clicking.
- Do not invent coverage. Call out what you could not test and why.
- If the user asks for a review, prioritize concrete findings over narrative.
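
Cross-checking UI claims against API responses or stored state can be as simple as diffing record identifiers from both sides. A minimal sketch, with illustrative row data:

```python
def cross_check(ui_rows, api_rows, key="id"):
    """Compare the records the UI shows against the records the backend returned."""
    ui_ids = {row[key] for row in ui_rows}
    api_ids = {row[key] for row in api_rows}
    return {
        "missing_in_ui": sorted(api_ids - ui_ids),  # backend has it, UI hides it
        "extra_in_ui": sorted(ui_ids - api_ids),    # UI shows it, backend doesn't
    }

diff = cross_check(
    ui_rows=[{"id": 1}, {"id": 2}],
    api_rows=[{"id": 1}, {"id": 2}, {"id": 3}],
)
```

A non-empty diff in either direction is a concrete finding, not a vague "counts look off".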

## References

- Read `references/checklist-template.md` when preparing the first-pass checklist.
- Read `references/test-depth-guide.md` when broadening from smoke testing into deeper exploratory and regression coverage.
6 changes: 6 additions & 0 deletions skills/.experimental/app-testing/agents/openai.yaml
@@ -0,0 +1,6 @@
interface:
  display_name: "App Testing"
  short_description: "Create test checklists and run deep app testing"
  icon_small: "./assets/favicon.svg"
  icon_large: "./assets/favicon.png"
  default_prompt: "Use $app-testing to build a checklist and test this app thoroughly."
25 changes: 25 additions & 0 deletions skills/.experimental/app-testing/assets/favicon.svg
65 changes: 65 additions & 0 deletions skills/.experimental/app-testing/references/checklist-template.md
@@ -0,0 +1,65 @@
# Checklist Template

Copy only the sections that apply to the app under test. Add app-specific journeys, roles, integrations, and release risks.
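
Once copied, items can be tracked through the four statuses the skill uses (`pending`, `passed`, `failed`, `blocked`). A minimal sketch, with illustrative item names:

```python
STATUSES = {"pending", "passed", "failed", "blocked"}

def new_checklist(items):
    """Every item starts pending until a testing pass touches it."""
    return {item: "pending" for item in items}

def mark(checklist, item, status):
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    checklist[item] = status

def coverage(checklist):
    """Fraction of items with a definite outcome (passed or failed)."""
    total = len(checklist)
    done = sum(1 for s in checklist.values() if s in {"passed", "failed"})
    return done / total if total else 0.0

cl = new_checklist(["login", "checkout", "refund"])
mark(cl, "login", "passed")
mark(cl, "checkout", "failed")
```

Reporting the coverage fraction alongside the checklist makes blocked and untouched items impossible to miss at closeout.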

## Setup and access

- [ ] Install and start the app successfully
- [ ] Confirm required environment variables, services, and fixtures
- [ ] Confirm available test accounts, roles, and permissions
- [ ] Confirm logs or errors are visible somewhere useful

## Smoke pass

- [ ] App loads without fatal errors
- [ ] Main route or landing screen renders correctly
- [ ] Basic navigation works
- [ ] Critical dependencies respond
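
The smoke pass works as a gate: if it fails, deeper passes are pointless. A minimal sketch of that ordering, with stand-in pass functions:

```python
def run_passes(passes):
    """Run named test passes in order; halt if the smoke pass fails."""
    results = {}
    for name, fn in passes:
        ok = fn()
        results[name] = ok
        if name == "smoke" and not ok:
            break  # no point probing journeys if the app doesn't boot
    return results

halted = run_passes([("smoke", lambda: False), ("journeys", lambda: True)])
full = run_passes([("smoke", lambda: True), ("journeys", lambda: True)])
```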

## Core user journeys

- [ ] Primary persona can complete the main task end to end
- [ ] Data written in one step appears correctly in later steps
- [ ] Refresh or reopen does not corrupt the flow
- [ ] Success states are clear and trustworthy

## Inputs and validation

- [ ] Required fields enforce constraints
- [ ] Boundary values behave correctly
- [ ] Invalid formats show useful errors
- [ ] Duplicate submissions and rapid repeat actions are handled safely
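
Boundary values for an inclusive numeric range follow a standard pattern: probe just below, at, and just above each edge. A sketch, assuming a hypothetical field that accepts 1–100:

```python
def boundary_values(lo, hi):
    """Classic boundary probes for an inclusive numeric range [lo, hi]."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

# Hypothetical "quantity" field documented as accepting 1 through 100.
probes = boundary_values(1, 100)
```

The two out-of-range probes (`lo - 1`, `hi + 1`) should produce useful errors; the other four should succeed.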

## Permissions and roles

- [ ] Signed-out behavior is correct
- [ ] Low-privilege users cannot access restricted actions
- [ ] Role changes take effect correctly
- [ ] Unauthorized API calls fail safely
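
Role checks are easiest to test exhaustively as a role-by-action matrix. The roles and actions below are illustrative, not from a real app:

```python
# Hypothetical permission model: which actions each role may perform.
ROLE_ACTIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def authorize(role, action):
    """Unknown roles get no permissions, which fails safely."""
    return action in ROLE_ACTIONS.get(role, set())

# Probe every (role, action) pair rather than spot-checking a few.
matrix = {
    (role, action): authorize(role, action)
    for role in ["viewer", "editor", "admin"]
    for action in ["read", "write", "delete"]
}
```

Comparing the observed matrix against the intended one catches both missing permissions and privilege escalation in a single pass.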

## State and resilience

- [ ] Retry, cancel, back, and refresh behave safely
- [ ] Slow or failed network calls surface actionable feedback
- [ ] Partial failures do not leave corrupted state
- [ ] Concurrent edits or repeated requests are handled correctly
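
Correct handling of repeated requests usually means idempotency: a replayed submission returns the original result instead of creating a second record. A toy sketch of the behavior to look for, with hypothetical names:

```python
class OrderService:
    """Toy service that replays, rather than repeats, duplicate submissions."""

    def __init__(self):
        self.orders = {}  # idempotency key -> order id

    def submit(self, idempotency_key):
        if idempotency_key in self.orders:
            return self.orders[idempotency_key]  # replay, don't double-charge
        order_id = len(self.orders) + 1
        self.orders[idempotency_key] = order_id
        return order_id

svc = OrderService()
first = svc.submit("key-123")   # user clicks "Place order"
second = svc.submit("key-123")  # rapid double-click sends it again
```

If the app under test creates two records here, that is an incorrect-persistent-state finding, typically high severity.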

## Integrations and assets

- [ ] External services return expected outcomes or fail safely
- [ ] Uploads, downloads, payments, emails, or webhooks behave correctly
- [ ] Background jobs reflect status in the app
- [ ] Audit trails or logs capture important actions when relevant

## Broader quality

- [ ] Layout works on target device sizes and browsers
- [ ] Keyboard and screen-reader basics are usable
- [ ] Dates, currency, locale, and timezone behavior are correct
- [ ] Performance is acceptable on critical flows

## Closeout

- [ ] Findings are documented with repro steps and impact
- [ ] Failed and blocked items are called out explicitly
- [ ] Remaining risk areas are listed
65 changes: 65 additions & 0 deletions skills/.experimental/app-testing/references/test-depth-guide.md
@@ -0,0 +1,65 @@
# Deep Testing Guide

Use this after the checklist exists and the smoke pass is complete.

## Expand around every important flow

For each important journey, probe:

- Entry
- Success
- Failure
- Retry
- Cancel
- Refresh or reopen
- Timeout or slow dependency
- Duplicate submit
- Permission change
- Stale session or expired token
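
The probes above combine with the app's journeys as a cross product, which keeps coverage systematic instead of ad hoc. A sketch with hypothetical flow names:

```python
FLOWS = ["signup", "checkout"]  # illustrative journeys for the app under test
PROBES = [
    "entry", "success", "failure", "retry", "cancel",
    "refresh", "timeout", "duplicate submit",
    "permission change", "stale session",
]

# Every (flow, probe) pair is a candidate test; prune by risk, not by accident.
test_matrix = [(flow, probe) for flow in FLOWS for probe in PROBES]
```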

## Web app heuristics

- Check deep links, browser back and forward, reload, and tab restore.
- Check form preservation, loading states, and optimistic UI.
- Check upload and download flows, clipboard behavior, and keyboard navigation.
- Check responsive layouts, overflow, focus handling, and error visibility.
- Check cache, local storage, and session transitions across tabs when relevant.

## Mobile and desktop heuristics

- Check relaunch, background and foreground transitions, and interrupted flows.
- Check poor network behavior, offline states, and recovery.
- Check OS permissions such as files, camera, notifications, or location when relevant.
- Check device-specific layout, scaling, and input behavior.

## API-backed and data-heavy heuristics

- Check idempotency, retries, pagination, sorting, and filtering.
- Check stale reads, race conditions, and partial writes.
- Check webhook retries, duplicate events, and eventual consistency.
- Check validation on both client and server boundaries.
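
Pagination bugs typically show up as duplicated or skipped records across page boundaries. A minimal check, fed with illustrative page data:

```python
def check_pagination(pages):
    """Flatten fetched pages and compare unique vs total record counts."""
    seen = []
    for page in pages:
        seen.extend(page)
    return {
        "unique": len(set(seen)),
        "fetched": len(seen),
        "has_duplicates": len(seen) != len(set(seen)),
    }

# Hypothetical overlap: record 3 appears on both page 1 and page 2,
# a common symptom of paginating over data that changed mid-fetch.
overlap = check_pagination([[1, 2, 3], [3, 4, 5]])
```

Run the same check after sorting and filtering changes, since those often reset or shift page offsets.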

## Data integrity checks

- Verify user-visible state against persisted state when possible.
- Check create, edit, delete, rollback, and cross-role visibility paths.
- Look for duplicate records, orphaned records, mismatched counts, or stuck jobs.
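
Orphaned records can be found by checking every child's foreign key against the surviving parents. A sketch over illustrative in-memory rows; against a real database the same comparison runs as a query:

```python
def find_orphans(children, parents, fk="parent_id"):
    """Return child records whose referenced parent no longer exists."""
    parent_ids = {p["id"] for p in parents}
    return [c for c in children if c[fk] not in parent_ids]

# Hypothetical data: parent 99 was deleted but its child survived.
orphans = find_orphans(
    children=[{"id": 10, "parent_id": 1}, {"id": 11, "parent_id": 99}],
    parents=[{"id": 1}],
)
```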

## Severity calibration

- Critical: data loss, auth bypass, payment or security failure, or app unusable.
- High: main flow blocked, incorrect persistent state, or major integration failure.
- Medium: degraded flow with workaround, incorrect validation, or notable UX break.
- Low: copy, layout, or polish issue with minor impact.
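
When assembling the report, confirmed findings can be sorted by these severity buckets so the critical items lead. The findings below are illustrative:

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def order_findings(findings):
    """Sort findings so the most severe appear first in the report."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])

findings = [
    {"title": "Tooltip typo", "severity": "low"},
    {"title": "Checkout 500s on retry", "severity": "critical"},
    {"title": "Stale cart count after refresh", "severity": "medium"},
]
ordered = order_findings(findings)
```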

## Reporting minimum

Include:

- Environment or build under test
- Preconditions
- Steps to reproduce
- Expected result
- Actual result
- Evidence
- Scope or suspected blast radius
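
These fields can be enforced with a simple template so no finding ships with a hole in it. The filled-in values below are a hypothetical example, not a real finding:

```python
FINDING_TEMPLATE = """\
## {title} ({severity})
- Environment: {environment}
- Preconditions: {preconditions}
- Steps to reproduce: {steps}
- Expected: {expected}
- Actual: {actual}
- Evidence: {evidence}
- Blast radius: {scope}
"""

# Illustrative finding; every value here is made up for the example.
report = FINDING_TEMPLATE.format(
    title="Refund leaves order stuck in 'processing'",
    severity="high",
    environment="staging build of the app under test",
    preconditions="one paid order exists",
    steps="1) open the order 2) click refund 3) refresh",
    expected="order moves to 'refunded'",
    actual="order stays 'processing'; refund email sent twice",
    evidence="screenshot and server log excerpt",
    scope="suspected: all refunds issued via the web UI",
)
```

Because `str.format` raises on a missing key, an incomplete finding fails loudly instead of silently shipping.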