UI testing (the "Playwright for native") 🎯¶
You already know how to build screens, wire events and run them in the simulator.
Now let's test a screen the way Playwright tests a web page — except with no
browser, no pixels and no magic sleep.
The tempestroid test driver drives your tree (the IR): it mounts the app,
finds nodes by key/text/semantics, injects the same typed events a real tap
produces, and asserts the result — with auto-wait built into every action and
assertion (the tree must settle before proceeding).
Why it's stronger than Playwright
Playwright talks to the DOM. Here the "DOM" is our IR — identical across
renderers. So the same script runs on the headless backend (fast, local)
and on the emulator backend (a REAL Compose app on an Android emulator) —
with no change to the test.
The minimal example¶
A UI test file is an ordinary app module — it defines view(app) and
make_state() (the same contract a runnable app satisfies) — plus one or more
async def test_*(page) functions.
Let's test the counter. examples/counter/app.py already exists; we add an
examples/counter/test_counter.py next to it:
from app import make_state, view # re-use the sibling app's contract
from tempestroid.testing import Page
__all__ = ["make_state", "view"]
async def test_counter_starts_at_zero(page: Page) -> None:
await page.expect_text("Count: 0")
async def test_increment_button_updates_count(page: Page) -> None:
await page.expect_text("Count: 0")
await page.tap(page.get_by_key("inc")) # tap the "+" button
await page.expect_text("Count: 1") # auto-wait until the UI settles
Run it with:
Output:
[PASS] test_counter_starts_at_zero
[PASS] test_increment_button_updates_count
2/2 passed on target 'headless'.
🚀 Done: an end-to-end flow test (event → state → re-render) with no renderer and no timing flake.
Understanding it piece by piece¶
page — the mounted app¶
Each test_* receives a page: an app freshly mounted on a backend. Every
test gets its own page and its own state, so one test never contaminates
another.
Locators — lazy queries¶
A locator is a query, not a captured node. It resolves against the
current tree every time an action or assertion needs it — so it survives a
rebuild. Create them off the page:
page.get_by_key("inc") # by the stable IR key
page.get_by_text("Count: 0") # by text (substring)
page.get_by_text("Count: 0", exact=True) # whole-text exact match
page.get_by_role("button", name="Save") # by Semantics role/label (E9)
page.get_by_semantics(label="counter") # by accessibility semantics
Always late resolution
locator.first / locator.all() / locator.count() walk the live scene.
An action (tap/fill) uses locator.resolve(), which requires exactly
one node: zero or many → a clear error, never a silent "the first one".
Actions — injected typed events¶
await page.tap(locator) # inject a TapEvent into the node's on_click/on_tap
await page.fill(locator, "abc") # inject a TextChangeEvent(value="abc") into on_change
await page.back() # pop the navigation stack (system back)
The action resolves the locator, picks the node's handler, validates the payload
into the typed event the widget declares (via event_schemas) and calls the
handler — exactly the path the device's dispatchEvent and Qt's _invoke take.
Auto-wait — the end of sleep¶
Every assertion waits for the tree to settle before checking:
await page.expect_text("Count: 2") # until some node's text contains it
await page.expect_visible(locator) # until the locator matches ≥ 1 node
await page.expect_count(locator, 3) # until the locator matches exactly 3 nodes
"Settle" = no rebuild pending in the coalesced cycle (A4) and two consecutive
snapshots equal. No sleep: the wait ends the instant the UI stops changing, or
times out (default 5s) with the current tree dumped for diagnosis.
Async handlers work for free
An async handler that awaits before set_state is awaited by the action
before the settle. The "+ (async)" button test passes with no sleep.
snapshot() — the IR "screenshot"¶
It is the headless analogue of a screenshot: a stable serialization of the tree (types, keys, string/number props, children) for golden comparison.
Pixel screenshots belong to the renderer
A real pixel screenshot is the Qt/device backend's job and lands with Trilho
F8. The headless snapshot() covers what a flow test cares about: the tree's
shape.
Failures produce diagnostics¶
When an assertion does not hold within the timeout, the error carries the tree at the moment of failure:
[FAIL] test_increment
AssertionError: expected text 'Count: 9' to be visible
Traceback (most recent call last):
...
tree at failure:
{'root': {'type': 'Column', 'key': None, 'props': {}, 'children': [...]}}
Targets (--target)¶
| Target | State | What it drives |
|---|---|---|
headless |
✅ available | The IR/state/events in-process, no renderer |
emulator |
✅ available | A REAL app through the Compose renderer on an Android emulator |
qt |
⏳ reserved | The Qt simulator in-process |
device |
⏳ reserved | Compose on a physical device, over the bridge |
Since they all speak the same IR + typed events, your headless test runs on the other targets unchanged.
The emulator target — REAL Compose render + N in parallel¶
# one emulator (reuses a running one, e.g. emulator-5554)
uv run tempest uitest examples/counter/test_counter.py --target emulator
# N isolated emulators in parallel (capped by host CPU/RAM)
uv run tempest uitest examples/ --target emulator -j 3
# multiple AGENTS in parallel: each with its own PRIVATE adb server
uv run tempest uitest examples/ --target emulator -j 2 --isolate-adb
Per-agent adb-server isolation (--isolate-adb)
The emulator instances are already isolated (own port/userdata). The last
shared resource is the adb server (TCP 5037 by default): when two agents
drive emulators at once they both hammer that one server, it wedges, and the
recovery path used to kill every adb process — taking down the sibling
agent's server too. With --isolate-adb (or -P <port>) the run uses a
private server (ANDROID_ADB_SERVER_PORT, auto-allocated), and recovery
(device_loop.sh) is scoped to that port only. Each agent is then fully
isolated — this is what closes true N-emulator parallelism across agents.
The emulator backend (EmulatorBackend) drives a real app through the
Compose renderer: it owns a DevServer in harness mode, adb -s <serial>
reverses, and launches the host in dev mode. The device's code-push client:
- device → host: POSTs the
mountJSON and everypatchbatch back; the server keeps a host-side mirror (Scene, viatempestroid.testing.mirror).page.scene()reads that mirror. - host → device:
page.tap(...)reads the handler token from the mirrored node ({"$handler": token}), enqueues the event; the client long-polls, consumes it, and feedsDeviceApp.handle_event— the same path a real Compose tap takes — and the resulting patch flows back and updates the mirror. Noadb input tapby coordinate, no C/JNI change.
Auto-wait (settle) polls the mirror revision until it has been quiet for a
short window AND the enqueued event was consumed — no fixed sleep as the primary
mechanism. EmulatorPool allocates/recycles N isolated instances (port
5554 + i*2), reusing running emulators and capping N by the host's CPU/RAM.
REAL pixel screenshots
On the emulator target every test saves a screenshot under
docs/assets/emulator/uitest/<test>.png captured via adb exec-out
screencap — real Compose pixels, not the tree dump. That is the proof the
device leaf behaves like the core.
In code (no CLI)¶
You can also drive a backend directly, handy inside a pytest:
from tempestroid.testing import HeadlessBackend, Page
page = Page(HeadlessBackend(make_state, view))
await page.mount()
await page.tap(page.get_by_key("inc"))
await page.expect_text("Count: 1")
Recap¶
- A UI test is an app +
async def test_*(page)functions. - Locators (
get_by_key/get_by_text/get_by_role/get_by_semantics) resolve against the live IR. - Actions (
tap/fill/back) inject typed events — the same path the real renderer takes. - Assertions (
expect_text/expect_visible/expect_count) auto-wait: they wait for the tree to settle, nosleep, no flake. - The
headlessbackend drives the renderer-agnostic core; theemulatorbackend runs the same script against a REAL Compose app on an Android emulator (-j Nin parallel, a pixel screenshot per test). ✅