71 changes: 71 additions & 0 deletions skills/.experimental/app-testing/SKILL.md
@@ -0,0 +1,71 @@
---
name: app-testing
description: Create a tailored app test checklist and then run in-depth testing for web apps, mobile apps, desktop apps, internal tools, and API-backed products. Use when Codex is asked to QA an app, test a feature or release, perform smoke, regression, or end-to-end testing, hunt bugs, validate UX or edge cases, or systematically explore product behavior before sign-off.
---

# App Testing

## Overview

Create a context-specific checklist before interacting with the app. Then execute deep testing against that checklist, expand coverage when risk appears, and report findings with reproduction steps, impact, and residual risk.

## Workflow

1. Gather context fast
- Inspect the repo, run instructions, routes or screens, auth roles, data model, feature flags, recent changes, and existing tests.
- Identify what "app" means in context: web UI, mobile app, desktop app, API-backed workflow, or a mixed system.
- Note blockers early: missing credentials, unavailable services, absent fixtures, or platform limits.

2. Build the checklist first
- Start from `references/checklist-template.md`.
- Keep only relevant sections and add app-specific flows.
- Include both requested scope and nearby regression risk.
- Make the checklist visible in the response or working notes before deep testing.
- Mark each item as `pending`, `passed`, `failed`, or `blocked` as testing progresses.

3. Run testing in deliberate passes
- Start with a smoke pass to confirm the app boots and the main entry path works.
- Cover primary user journeys end to end before spending time on polish issues.
- Run negative and edge-case passes after the happy path is stable.
- Validate integrations, persistence, permissions, and state transitions.
- Finish with broader quality passes such as responsiveness, accessibility, security sanity checks, and performance observations when applicable.

4. Go deep when issues appear
- Minimize repro steps.
- Check whether the problem is isolated or systemic.
- Probe adjacent states, roles, inputs, and recovery paths.
- Record the smallest reliable reproduction and the broadest credible impact.
- Use `references/test-depth-guide.md` for additional heuristics.

5. Report outcomes clearly
- List findings ordered by severity.
- For each confirmed issue include: title, severity, setup or account used, steps to reproduce, expected result, actual result, and evidence.
- Separate confirmed bugs from weak signals, assumptions, and untested areas.
- End with checklist coverage, blocked items, and the highest remaining risks.
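
The repro-minimization in step 4 can be sketched as a greedy delta reduction: repeatedly drop steps that are not needed to trigger the failure. The flow steps and failure condition below are hypothetical, not from a real app:

```python
def minimize_repro(steps, still_fails):
    """Greedily drop steps that aren't needed to reproduce the failure."""
    steps = list(steps)
    i = 0
    while i < len(steps):
        candidate = steps[:i] + steps[i + 1:]
        if still_fails(candidate):
            steps = candidate  # step i was unnecessary; try the same index again
        else:
            i += 1  # step i is required; keep it and move on
    return steps

# Hypothetical bug: it reproduces whenever both of these steps are present.
def fails(steps):
    return "open settings" in steps and "toggle dark mode" in steps

minimal = minimize_repro(
    ["log in", "open settings", "scroll", "toggle dark mode", "log out"],
    fails,
)
```

The result is the smallest step list that still reproduces, which is exactly what step 4 asks to record.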

## Coverage Priorities

Prefer this order unless the user gives tighter scope:

1. App start-up and environment sanity
2. Authentication, authorization, and role boundaries
3. Core value paths
4. Data creation, editing, deletion, and persistence
5. Validation, empty states, and error handling
6. Navigation, back, refresh, retry behavior, and session continuity
7. Integrations, background jobs, uploads, downloads, webhooks, or payments
8. Responsiveness, accessibility, localization, timezones, and browser or device differences
9. Security and performance sanity checks

## Testing Tactics

- Use the same tools the app uses in real life: local dev servers, seeded data, logs, network panels, CLI scripts, and database inspection.
- Cross-check UI claims against API responses or stored state when possible.
- Prefer representative, risk-based coverage over exhaustive but shallow clicking.
- Do not invent coverage. Call out what you could not test and why.
- If the user asks for a review, prioritize concrete findings over narrative.
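
Cross-checking UI claims against API responses or stored state can be as simple as diffing record identifiers from both sides. A minimal sketch, with illustrative row data:

```python
def cross_check(ui_rows, api_rows, key="id"):
    """Compare the records the UI shows against the records the backend returned."""
    ui_ids = {row[key] for row in ui_rows}
    api_ids = {row[key] for row in api_rows}
    return {
        "missing_in_ui": sorted(api_ids - ui_ids),  # backend has it, UI hides it
        "extra_in_ui": sorted(ui_ids - api_ids),    # UI shows it, backend doesn't
    }

diff = cross_check(
    ui_rows=[{"id": 1}, {"id": 2}],
    api_rows=[{"id": 1}, {"id": 2}, {"id": 3}],
)
```

A non-empty diff in either direction is a concrete finding, not a vague "counts look off".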

## References

- Read `references/checklist-template.md` when preparing the first-pass checklist.
- Read `references/test-depth-guide.md` when broadening from smoke testing into deeper exploratory and regression coverage.
6 changes: 6 additions & 0 deletions skills/.experimental/app-testing/agents/openai.yaml
@@ -0,0 +1,6 @@
interface:
  display_name: "App Testing"
  short_description: "Create test checklists and run deep app testing"
  icon_small: "./assets/favicon.svg"
  icon_large: "./assets/favicon.png"
  default_prompt: "Use $app-testing to build a checklist and test this app thoroughly."
25 changes: 25 additions & 0 deletions skills/.experimental/app-testing/assets/favicon.svg
65 changes: 65 additions & 0 deletions skills/.experimental/app-testing/references/checklist-template.md
@@ -0,0 +1,65 @@
# Checklist Template

Copy only the sections that apply to the app under test. Add app-specific journeys, roles, integrations, and release risks.
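
Once copied, items can be tracked through the four statuses the skill uses (`pending`, `passed`, `failed`, `blocked`). A minimal sketch, with illustrative item names:

```python
STATUSES = {"pending", "passed", "failed", "blocked"}

def new_checklist(items):
    """Every item starts pending until a testing pass touches it."""
    return {item: "pending" for item in items}

def mark(checklist, item, status):
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    checklist[item] = status

def coverage(checklist):
    """Fraction of items with a definite outcome (passed or failed)."""
    total = len(checklist)
    done = sum(1 for s in checklist.values() if s in {"passed", "failed"})
    return done / total if total else 0.0

cl = new_checklist(["login", "checkout", "refund"])
mark(cl, "login", "passed")
mark(cl, "checkout", "failed")
```

Reporting the coverage fraction alongside the checklist makes blocked and untouched items impossible to miss at closeout.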

## Setup and access

- [ ] Install and start the app successfully
- [ ] Confirm required environment variables, services, and fixtures
- [ ] Confirm available test accounts, roles, and permissions
- [ ] Confirm logs or errors are visible somewhere useful

## Smoke pass

- [ ] App loads without fatal errors
- [ ] Main route or landing screen renders correctly
- [ ] Basic navigation works
- [ ] Critical dependencies respond
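
The smoke pass works as a gate: if it fails, deeper passes are pointless. A minimal sketch of that ordering, with stand-in pass functions:

```python
def run_passes(passes):
    """Run named test passes in order; halt if the smoke pass fails."""
    results = {}
    for name, fn in passes:
        ok = fn()
        results[name] = ok
        if name == "smoke" and not ok:
            break  # no point probing journeys if the app doesn't boot
    return results

halted = run_passes([("smoke", lambda: False), ("journeys", lambda: True)])
full = run_passes([("smoke", lambda: True), ("journeys", lambda: True)])
```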

## Core user journeys

- [ ] Primary persona can complete the main task end to end
- [ ] Data written in one step appears correctly in later steps
- [ ] Refresh or reopen does not corrupt the flow
- [ ] Success states are clear and trustworthy

## Inputs and validation

- [ ] Required fields enforce constraints
- [ ] Boundary values behave correctly
- [ ] Invalid formats show useful errors
- [ ] Duplicate submissions and rapid repeat actions are handled safely
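
Boundary values for an inclusive numeric range follow a standard pattern: probe just below, at, and just above each edge. A sketch, assuming a hypothetical field that accepts 1–100:

```python
def boundary_values(lo, hi):
    """Classic boundary probes for an inclusive numeric range [lo, hi]."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

# Hypothetical "quantity" field documented as accepting 1 through 100.
probes = boundary_values(1, 100)
```

The two out-of-range probes (`lo - 1`, `hi + 1`) should produce useful errors; the other four should succeed.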

## Permissions and roles

- [ ] Signed-out behavior is correct
- [ ] Low-privilege users cannot access restricted actions
- [ ] Role changes take effect correctly
- [ ] Unauthorized API calls fail safely
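
Role checks are easiest to test exhaustively as a role-by-action matrix. The roles and actions below are illustrative, not from a real app:

```python
# Hypothetical permission model: which actions each role may perform.
ROLE_ACTIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def authorize(role, action):
    """Unknown roles get no permissions, which fails safely."""
    return action in ROLE_ACTIONS.get(role, set())

# Probe every (role, action) pair rather than spot-checking a few.
matrix = {
    (role, action): authorize(role, action)
    for role in ["viewer", "editor", "admin"]
    for action in ["read", "write", "delete"]
}
```

Comparing the observed matrix against the intended one catches both missing permissions and privilege escalation in a single pass.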

## State and resilience

- [ ] Retry, cancel, back, and refresh behave safely
- [ ] Slow or failed network calls surface actionable feedback
- [ ] Partial failures do not leave corrupted state
- [ ] Concurrent edits or repeated requests are handled correctly
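
Correct handling of repeated requests usually means idempotency: a replayed submission returns the original result instead of creating a second record. A toy sketch of the behavior to look for, with hypothetical names:

```python
class OrderService:
    """Toy service that replays, rather than repeats, duplicate submissions."""

    def __init__(self):
        self.orders = {}  # idempotency key -> order id

    def submit(self, idempotency_key):
        if idempotency_key in self.orders:
            return self.orders[idempotency_key]  # replay, don't double-charge
        order_id = len(self.orders) + 1
        self.orders[idempotency_key] = order_id
        return order_id

svc = OrderService()
first = svc.submit("key-123")   # user clicks "Place order"
second = svc.submit("key-123")  # rapid double-click sends it again
```

If the app under test creates two records here, that is an incorrect-persistent-state finding, typically high severity.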

## Integrations and assets

- [ ] External services return expected outcomes or fail safely
- [ ] Uploads, downloads, payments, emails, or webhooks behave correctly
- [ ] Background jobs reflect status in the app
- [ ] Audit trails or logs capture important actions when relevant

## Broader quality

- [ ] Layout works on target device sizes and browsers
- [ ] Keyboard and screen-reader basics are usable
- [ ] Dates, currency, locale, and timezone behavior are correct
- [ ] Performance is acceptable on critical flows

## Closeout

- [ ] Findings are documented with repro steps and impact
- [ ] Failed and blocked items are called out explicitly
- [ ] Remaining risk areas are listed
65 changes: 65 additions & 0 deletions skills/.experimental/app-testing/references/test-depth-guide.md
@@ -0,0 +1,65 @@
# Deep Testing Guide

Use this after the checklist exists and the smoke pass is complete.

## Expand around every important flow

For each important journey, probe:

- Entry
- Success
- Failure
- Retry
- Cancel
- Refresh or reopen
- Timeout or slow dependency
- Duplicate submit
- Permission change
- Stale session or expired token
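
The probes above combine with the app's journeys as a cross product, which keeps coverage systematic instead of ad hoc. A sketch with hypothetical flow names:

```python
FLOWS = ["signup", "checkout"]  # illustrative journeys for the app under test
PROBES = [
    "entry", "success", "failure", "retry", "cancel",
    "refresh", "timeout", "duplicate submit",
    "permission change", "stale session",
]

# Every (flow, probe) pair is a candidate test; prune by risk, not by accident.
test_matrix = [(flow, probe) for flow in FLOWS for probe in PROBES]
```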

## Web app heuristics

- Check deep links, browser back and forward, reload, and tab restore.
- Check form preservation, loading states, and optimistic UI.
- Check upload and download flows, clipboard behavior, and keyboard navigation.
- Check responsive layouts, overflow, focus handling, and error visibility.
- Check cache, local storage, and session transitions across tabs when relevant.

## Mobile and desktop heuristics

- Check relaunch, background and foreground transitions, and interrupted flows.
- Check poor network behavior, offline states, and recovery.
- Check OS permissions such as files, camera, notifications, or location when relevant.
- Check device-specific layout, scaling, and input behavior.

## API-backed and data-heavy heuristics

- Check idempotency, retries, pagination, sorting, and filtering.
- Check stale reads, race conditions, and partial writes.
- Check webhook retries, duplicate events, and eventual consistency.
- Check validation on both client and server boundaries.
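
Pagination bugs typically show up as duplicated or skipped records across page boundaries. A minimal check, fed with illustrative page data:

```python
def check_pagination(pages):
    """Flatten fetched pages and compare unique vs total record counts."""
    seen = []
    for page in pages:
        seen.extend(page)
    return {
        "unique": len(set(seen)),
        "fetched": len(seen),
        "has_duplicates": len(seen) != len(set(seen)),
    }

# Hypothetical overlap: record 3 appears on both page 1 and page 2,
# a common symptom of paginating over data that changed mid-fetch.
overlap = check_pagination([[1, 2, 3], [3, 4, 5]])
```

Run the same check after sorting and filtering changes, since those often reset or shift page offsets.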

## Data integrity checks

- Verify user-visible state against persisted state when possible.
- Check create, edit, delete, rollback, and cross-role visibility paths.
- Look for duplicate records, orphaned records, mismatched counts, or stuck jobs.
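
Orphaned records can be found by checking every child's foreign key against the surviving parents. A sketch over illustrative in-memory rows; against a real database the same comparison runs as a query:

```python
def find_orphans(children, parents, fk="parent_id"):
    """Return child records whose referenced parent no longer exists."""
    parent_ids = {p["id"] for p in parents}
    return [c for c in children if c[fk] not in parent_ids]

# Hypothetical data: parent 99 was deleted but its child survived.
orphans = find_orphans(
    children=[{"id": 10, "parent_id": 1}, {"id": 11, "parent_id": 99}],
    parents=[{"id": 1}],
)
```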

## Severity calibration

- Critical: data loss, auth bypass, payment or security failure, or app unusable.
- High: main flow blocked, incorrect persistent state, or major integration failure.
- Medium: degraded flow with workaround, incorrect validation, or notable UX break.
- Low: copy, layout, or polish issue with minor impact.
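
When assembling the report, confirmed findings can be sorted by these severity buckets so the critical items lead. The findings below are illustrative:

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def order_findings(findings):
    """Sort findings so the most severe appear first in the report."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])

findings = [
    {"title": "Tooltip typo", "severity": "low"},
    {"title": "Checkout 500s on retry", "severity": "critical"},
    {"title": "Stale cart count after refresh", "severity": "medium"},
]
ordered = order_findings(findings)
```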

## Reporting minimum

Include:

- Environment or build under test
- Preconditions
- Steps to reproduce
- Expected result
- Actual result
- Evidence
- Scope or suspected blast radius
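
These fields can be enforced with a simple template so no finding ships with a hole in it. The filled-in values below are a hypothetical example, not a real finding:

```python
FINDING_TEMPLATE = """\
## {title} ({severity})
- Environment: {environment}
- Preconditions: {preconditions}
- Steps to reproduce: {steps}
- Expected: {expected}
- Actual: {actual}
- Evidence: {evidence}
- Blast radius: {scope}
"""

# Illustrative finding; every value here is made up for the example.
report = FINDING_TEMPLATE.format(
    title="Refund leaves order stuck in 'processing'",
    severity="high",
    environment="staging build of the app under test",
    preconditions="one paid order exists",
    steps="1) open the order 2) click refund 3) refresh",
    expected="order moves to 'refunded'",
    actual="order stays 'processing'; refund email sent twice",
    evidence="screenshot and server log excerpt",
    scope="suspected: all refunds issued via the web UI",
)
```

Because `str.format` raises on a missing key, an incomplete finding fails loudly instead of silently shipping.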