ADR-0026: UI Testing Strategy

Date: February 24, 2026

Category: Infrastructure

Context

The UI layer described in ADR-0025 now has a stable, feature-first package layout under ui/. Test coverage across that layer is currently low and inconsistent:

Several *ViewModel classes have no unit tests (ReferenceViewModel, SomedayMaybeViewModel), and EditItemViewModel is covered only by a single error-path test file.
No feature has a Compose UI test for its *Screen composable.
Shared components in ui/components/ and feature-scoped components in ui/features/*/components/ are entirely untested.
The existing NavigationTest.kt covers route constant naming but does not assert that every registered destination actually renders.
The three androidTest user-journey smoke tests (PrimaryJourneySmokeTest, EditAndReclassifySmokeTest, SomedayPromotionSmokeTest) are the only end-to-end coverage in the project.

The project already carries Turbine (1.2.1), MockK (1.14.9), Robolectric (4.16.1), the Compose BOM (2026.01.01), and Espresso Core (3.7.0) in libs.versions.toml, so the necessary infrastructure is in place. What is missing is a documented, enforceable policy that specifies what to test, at which layer, with which tools, and to what coverage threshold.

Decision

We will enforce a three-tier testing pyramid for all UI code, with each tier having a defined scope, tooling choice, co-location rule, and minimum coverage expectation. No feature may be considered complete until every tier that applies to it has been addressed.

Tier 1 — ViewModel unit tests (JVM, fast)

Scope: Every *ViewModel class. Tests cover state transitions, business-logic branching inside the ViewModel, error paths, and any derived StateFlow / SharedFlow emission.

Tooling:

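kotlinx-coroutines-test with UnconfinedTestDispatcher as the default dispatcher, so coroutines launched by the ViewModel run eagerly and StateFlow updates are observable without manually advancing the test scheduler.
Turbine for all Flow/StateFlow/SharedFlow assertions: flow.test { } for a single flow, and turbineScope { } with testIn() when several flows must be observed together. Prefer either over collectList() hacks.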
MockK for mocking infrastructure dependencies (repositories, use cases). Use coEvery / coVerify for suspend functions.
Prefer hand-written Fake* implementations (e.g. FakeInboxRepository) over MockK stubs for repositories that are exercised by many test cases — this matches the existing FakeLlmService pattern already in the codebase.

Canonical test class template:

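import app.cash.turbine.test
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.test.UnconfinedTestDispatcher
import kotlinx.coroutines.test.runTest
import org.junit.Before
import org.junit.Rule
import org.junit.Test
// Project imports and the assertThat import (from the project's assertion library) are omitted.

@OptIn(ExperimentalCoroutinesApi::class)
class InboxViewModelTest {

    private val testDispatcher = UnconfinedTestDispatcher()

    @get:Rule
    val mainDispatcherRule = MainDispatcherRule(testDispatcher)   // custom TestRule

    private val fakeRepository = FakeInboxRepository()
    private lateinit var viewModel: InboxViewModel

    @Before fun setUp() {
        viewModel = InboxViewModel(fakeRepository)
    }

    @Test fun `initial state is Loading`() = runTest {
        viewModel.uiState.test {
            // A StateFlow replays its current value as the first item.
            assertThat(awaitItem()).isEqualTo(InboxUiState.Loading)
        }
    }

    @Test fun `inbox items emitted after load`() = runTest {
        viewModel.uiState.test {
            awaitItem()   // consume the initial state before triggering the load
            fakeRepository.emit(listOf(anInboxItem()))
            // The next emission is the loaded content; cast to reach its items.
            val state = awaitItem() as InboxUiState.Content
            assertThat(state.items).hasSize(1)
        }
    }
}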
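
When a ViewModel exposes one-off events on a SharedFlow next to its state, both flows can be observed in the same test with turbineScope { } and testIn(). The following is a minimal sketch under stated assumptions: CounterModel is a hypothetical stand-in for a real ViewModel and does not exist in the codebase, and JUnit's assertEquals is used only to keep the example free of project dependencies.

```kotlin
import app.cash.turbine.turbineScope
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asSharedFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical stand-in for a ViewModel: one state flow, one event flow.
class CounterModel {
    private val _count = MutableStateFlow(0)
    val count: StateFlow<Int> = _count.asStateFlow()

    // Buffer of one so tryEmit succeeds without suspending.
    private val _events = MutableSharedFlow<String>(extraBufferCapacity = 1)
    val events: SharedFlow<String> = _events.asSharedFlow()

    fun increment() {
        _count.value += 1
        _events.tryEmit("incremented")
    }
}

class CounterModelTest {

    @Test fun `increment updates state and emits an event`() = runTest {
        turbineScope {
            val model = CounterModel()
            // Start collecting both flows before triggering the action.
            val counts = model.count.testIn(backgroundScope)
            val events = model.events.testIn(backgroundScope)

            assertEquals(0, counts.awaitItem())   // StateFlow replays its current value

            model.increment()

            assertEquals(1, counts.awaitItem())
            assertEquals("incremented", events.awaitItem())

            // Neither flow completes on its own, so cancel explicitly.
            counts.cancelAndEnsureAllEventsConsumed()
            events.cancelAndEnsureAllEventsConsumed()
        }
    }
}
```

Subscribing to the event flow before calling the action matters here: a SharedFlow without replay drops emissions that occur while nobody is collecting.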

Co-location rule: Every ui/features/<feature>/*ViewModel.kt must have a corresponding ui/features/<feature>/*ViewModelTest.kt under app/src/test/.

Coverage threshold: ≥ 80 % line coverage per ViewModel class, measured by Kover on the CI pipeline (see ADR-0022).

Tier 2 — Compose UI tests (JVM via Robolectric, medium)

Scope:

Every *Screen composable — at minimum, one test per distinct UI state the screen can render (Loading, Content, Empty, Error where applicable).
Every shared component in ui/components/ — at minimum a smoke test that the component renders without crashing across its primary variants.
Feature-scoped components in ui/features/*/components/ — tested when they contain non-trivial conditional rendering or interaction logic.

Tooling:

compose-ui-test-junit4 (androidx.compose.ui:ui-test-junit4) with createComposeRule().
Robolectric as the test runtime so these run on the JVM without an emulator. Annotate test classes with @RunWith(RobolectricTestRunner::class).
All screens are tested in state-driven isolation: the composable is called with an explicit, pre-constructed state value and a no-op or recording lambda for each callback. ViewModels are not instantiated inside Compose UI tests.
ComposeContentTestRule.onNodeWithTag() / onNodeWithText() / onNodeWithContentDescription() for assertions. testTag modifiers must be added to stateful or interactive elements that are otherwise difficult to target.
The LocusFlowTheme {} wrapper must be applied inside each setContent {} call so tokens and typography resolve correctly.

Canonical screen test pattern:

@RunWith(RobolectricTestRunner::class)
class InboxScreenTest {

    @get:Rule val composeTestRule = createComposeRule()

    @Test fun `loading state shows progress indicator`() {
        composeTestRule.setContent {
            LocusFlowTheme {
                InboxScreen(
                    uiState = InboxUiState.Loading,
                    onAddItem = {},
                    onItemClick = {},
                )
            }
        }
        composeTestRule.onNodeWithTag("LoadingIndicator").assertIsDisplayed()
    }

    @Test fun `content state renders item list`() {
        val items = listOf(anInboxItem(title = "Buy groceries"))
        composeTestRule.setContent {
            LocusFlowTheme {
                InboxScreen(
                    uiState = InboxUiState.Content(items),
                    onAddItem = {},
                    onItemClick = {},
                )
            }
        }
        composeTestRule.onNodeWithText("Buy groceries").assertIsDisplayed()
    }
}

Canonical shared component test pattern:

@RunWith(RobolectricTestRunner::class)
class PrimaryButtonTest {

    @get:Rule val composeTestRule = createComposeRule()

    @Test fun `renders label and invokes onClick`() {
        var clicked = false
        composeTestRule.setContent {
            LocusFlowTheme { PrimaryButton(label = "Save", onClick = { clicked = true }) }
        }
        composeTestRule.onNodeWithText("Save").performClick()
        assertThat(clicked).isTrue()
    }

    @Test fun `disabled state blocks click`() {
        var clicked = false
        composeTestRule.setContent {
            LocusFlowTheme { PrimaryButton(label = "Save", enabled = false, onClick = { clicked = true }) }
        }
        composeTestRule.onNodeWithText("Save").performClick()
        assertThat(clicked).isFalse()
    }
}

Co-location rules:

Screen tests: app/src/test/.../ui/features/<feature>/*ScreenTest.kt
Shared component tests: app/src/test/.../ui/components/*Test.kt
Feature component tests: app/src/test/.../ui/features/<feature>/components/*Test.kt

Coverage threshold: Every branch of a screen's sealed-interface state type (e.g. InboxUiState) must be exercised by at least one Compose UI test. No numeric line-coverage gate is applied here; branch completeness is verified by code review.
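
The branch-completeness rule can be illustrated with a small, self-contained sketch. DemoUiState and DemoScreen below are hypothetical and exist only for this example; the real state types and screens may differ, and project tests would additionally wrap the content in LocusFlowTheme { }.

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.platform.testTag
import androidx.compose.ui.test.assertIsDisplayed
import androidx.compose.ui.test.junit4.createComposeRule
import androidx.compose.ui.test.onNodeWithTag
import androidx.compose.ui.test.onNodeWithText
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.robolectric.RobolectricTestRunner

// Hypothetical state model used only for this sketch.
sealed interface DemoUiState {
    data object Loading : DemoUiState
    data object Empty : DemoUiState
    data class Content(val titles: List<String>) : DemoUiState
    data class Error(val message: String) : DemoUiState
}

// Hypothetical screen: one `when` branch per state, so one test per branch.
@Composable
fun DemoScreen(uiState: DemoUiState) {
    when (uiState) {
        DemoUiState.Loading -> Text("Loading", modifier = Modifier.testTag("LoadingIndicator"))
        DemoUiState.Empty -> Text("Nothing here yet")
        is DemoUiState.Content -> Column { uiState.titles.forEach { Text(it) } }
        is DemoUiState.Error -> Text(uiState.message)
    }
}

@RunWith(RobolectricTestRunner::class)
class DemoScreenTest {

    @get:Rule val composeTestRule = createComposeRule()

    // Renders the screen with an explicit state; no ViewModel is involved.
    private fun render(state: DemoUiState) =
        composeTestRule.setContent { DemoScreen(state) }

    @Test fun `loading state shows indicator`() {
        render(DemoUiState.Loading)
        composeTestRule.onNodeWithTag("LoadingIndicator").assertIsDisplayed()
    }

    @Test fun `empty state shows placeholder`() {
        render(DemoUiState.Empty)
        composeTestRule.onNodeWithText("Nothing here yet").assertIsDisplayed()
    }

    @Test fun `content state shows titles`() {
        render(DemoUiState.Content(listOf("Buy groceries")))
        composeTestRule.onNodeWithText("Buy groceries").assertIsDisplayed()
    }

    @Test fun `error state shows message`() {
        render(DemoUiState.Error("Could not load"))
        composeTestRule.onNodeWithText("Could not load").assertIsDisplayed()
    }
}
```

Because the `when` over a sealed interface must be exhaustive, adding a new state breaks compilation of the screen, which gives reviewers a natural prompt to ask for the matching test.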

Tier 3 — User-journey instrumented tests (Emulator / device, slow)

The existing smoke tests in app/src/androidTest/.../userjourney/ cover end-to-end critical paths. This tier is retained as-is; the strategy for it is documented in ADR-0022 (CI pipeline) and ADR-0020 (LLM testing). No new user-journey test should be added unless:

It covers a cross-feature interaction that cannot be exercised at Tier 1 or 2.
It is validating a runtime capability tied to the Android OS (permissions, deep links, back-stack behaviour against real Activity lifecycle).

Navigation tests

ui/navigation/NavigationTest.kt is expanded to assert:

Every constant in NavigationRoutes has a corresponding composable(...) registration reachable via locusFlowNavGraph.
Navigation to each destination via navController.navigate(route) results in the expected screen composable being displayed, verified with a stable testTag on each screen's root node.

These tests use createComposeRule() with a TestNavHostController (from the navigation-testing artifact) driving the real navigation graph, and run under Robolectric (Tier 2 tooling).
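
A navigation test along these lines might look like the sketch below. It assumes a hypothetical two-destination graph with the routes "inbox" and "reference" declared inline; the real test would register destinations through locusFlowNavGraph and NavigationRoutes, whose signatures are not shown in this ADR.

```kotlin
import androidx.compose.material3.Text
import androidx.compose.ui.Modifier
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.platform.testTag
import androidx.compose.ui.test.assertIsDisplayed
import androidx.compose.ui.test.junit4.createComposeRule
import androidx.compose.ui.test.onNodeWithTag
import androidx.navigation.compose.ComposeNavigator
import androidx.navigation.compose.NavHost
import androidx.navigation.compose.composable
import androidx.navigation.testing.TestNavHostController
import org.junit.Assert.assertEquals
import org.junit.Before
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.robolectric.RobolectricTestRunner

@RunWith(RobolectricTestRunner::class)
class NavigationSketchTest {

    @get:Rule val composeTestRule = createComposeRule()
    private lateinit var navController: TestNavHostController

    @Before fun setUp() {
        composeTestRule.setContent {
            navController = TestNavHostController(LocalContext.current)
            // The test controller needs a ComposeNavigator to resolve composable() destinations.
            navController.navigatorProvider.addNavigator(ComposeNavigator())
            // Hypothetical inline graph; each root node carries a "<FeatureName>Screen" tag.
            NavHost(navController = navController, startDestination = "inbox") {
                composable("inbox") {
                    Text("Inbox", modifier = Modifier.testTag("InboxScreen"))
                }
                composable("reference") {
                    Text("Reference", modifier = Modifier.testTag("ReferenceScreen"))
                }
            }
        }
    }

    @Test fun `start destination is displayed`() {
        composeTestRule.onNodeWithTag("InboxScreen").assertIsDisplayed()
    }

    @Test fun `navigating to a route shows its screen`() {
        // Navigation must be triggered on the UI thread.
        composeTestRule.runOnUiThread { navController.navigate("reference") }

        composeTestRule.onNodeWithTag("ReferenceScreen").assertIsDisplayed()
        assertEquals("reference", navController.currentBackStackEntry?.destination?.route)
    }
}
```

Asserting on both the root testTag and the current back-stack route catches two different failures: a route that is registered but renders the wrong screen, and a screen that renders without the back stack being updated.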

Test data and fixtures

A shared app/src/test/.../testutil/ package provides:

| File | Contents |
| --- | --- |
| `Fixtures.kt` | Top-level builder functions: `anInboxItem()`, `aProcessedItem()`, `aDailyReflection()`, etc. All parameters have sensible defaults so tests only specify what they care about. |
| `Fake*.kt` | Hand-written fake implementations of each `*Repository` interface (`FakeInboxRepository`, `FakeProcessedItemRepository`, etc.), emitting `MutableStateFlow` values that tests can control. |
| `MainDispatcherRule.kt` | `TestWatcher` that installs `UnconfinedTestDispatcher` as `Dispatchers.Main` for the duration of each test. |

The existing FakeLlmService in domain/llm/ is the reference implementation. All new fakes follow the same pattern.
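
The three kinds of file listed above could be sketched as follows. This is an illustration under assumptions, not the project's actual code: the InboxItem fields, the InboxRepository interface, and its observeItems() method are hypothetical, and the fake is not claimed to match FakeLlmService line for line. Only MainDispatcherRule follows a widely used, standard shape.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.test.TestDispatcher
import kotlinx.coroutines.test.UnconfinedTestDispatcher
import kotlinx.coroutines.test.resetMain
import kotlinx.coroutines.test.setMain
import org.junit.rules.TestWatcher
import org.junit.runner.Description

// Hypothetical domain type and repository contract, for illustration only.
data class InboxItem(val id: Long, val title: String)

interface InboxRepository {
    fun observeItems(): Flow<List<InboxItem>>
}

// Fixtures.kt: every parameter has a default, so tests name only what they care about.
fun anInboxItem(id: Long = 1L, title: String = "Untitled item") = InboxItem(id, title)

// FakeInboxRepository.kt: state lives in a MutableStateFlow that the test controls.
class FakeInboxRepository : InboxRepository {
    private val items = MutableStateFlow<List<InboxItem>>(emptyList())

    override fun observeItems(): Flow<List<InboxItem>> = items

    // Test-only hook: push a new list to every collector.
    fun emit(newItems: List<InboxItem>) {
        items.value = newItems
    }
}

// MainDispatcherRule.kt: swaps Dispatchers.Main for a test dispatcher around each test.
@OptIn(ExperimentalCoroutinesApi::class)
class MainDispatcherRule(
    private val testDispatcher: TestDispatcher = UnconfinedTestDispatcher(),
) : TestWatcher() {

    override fun starting(description: Description) {
        Dispatchers.setMain(testDispatcher)
    }

    override fun finished(description: Description) {
        Dispatchers.resetMain()
    }
}
```

A test then reads as `fakeRepository.emit(listOf(anInboxItem(title = "Buy groceries")))`, with no knowledge of which repository method the ViewModel happens to call.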

What is explicitly out of scope

Repository and DAO tests — covered by the existing strategy in data/ tests; this ADR does not modify them.
Use-case tests — covered by the existing SummarizeDailyReflectionUseCaseTest pattern; out of scope here.
LLM integration tests — governed by ADR-0020.
Performance / startup benchmarks — tracked in the roadmap; not part of this ADR.
Screenshot / golden regression tests — deferred until the UI has been reviewed against final design. A future ADR will introduce screenshot testing (e.g. Roborazzi) once the visual design is stable.
Accessibility audits — deferred; Semantics correctness is implicitly covered by content-description assertions in Tier 2 tests, which is sufficient for now.

Coverage enforcement

Kover is configured in app/build.gradle.kts with the following rules:

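import kotlinx.kover.gradle.plugin.dsl.AggregationType
import kotlinx.kover.gradle.plugin.dsl.MetricType

koverReport {
    // Exclude the DI and local-entity packages from every report.
    filters { excludes { packages("*.di", "*.data.local.entity") } }
    verify {
        rule("ViewModel coverage") {
            // Class filters are declared on the rule; a bound only carries the threshold.
            filters { includes { classes("*ViewModel") } }
            bound {
                minValue = 80
                metric = MetricType.LINE
                aggregation = AggregationType.COVERED_PERCENTAGE
            }
        }
    }
}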

The koverVerifyDebug Gradle task is added to the test stage of the CI pipeline (ADR-0022). A pull request that drops ViewModel line coverage below 80 % fails the pipeline.

Rationale

Tier 1 as the primary investment. ViewModels encapsulate all meaningful UI logic. Testing them on the JVM is fast, deterministic, and gives the highest return on investment. Turbine eliminates the brittle delay()-based Flow assertions prevalent in the existing tests.
State-driven Compose UI tests. Passing a concrete state value directly to a composable is simpler, more refactor-resilient, and faster than any screenshot-based approach. Functional correctness (state rendering, interaction callbacks) is what matters at this stage of the project.
Robolectric over emulator for Tier 2. Running Compose UI tests on the JVM via Robolectric cuts feedback time from minutes to seconds and makes them viable inside the standard unit-test Gradle task. Emulator tests are reserved for coverage that genuinely requires the Android runtime.
Fakes over mocks for repositories. Mocks specify behaviour per-call, which makes tests brittle to argument changes. Fake repositories hold a MutableStateFlow that any test can set up without knowing the call signature — the same design the FakeLlmService already uses.
Co-location with features. Tests mirror the source package layout under test/, so a feature's tests sit at the same package path as its production code and are easy to find from the feature folder. This is consistent with ADR-0025's feature-first principle.
Screenshot testing deferred. Visual regression testing requires stable, design-reviewed UI. Introducing golden images before the design is finalised would create noise from intentional visual churn rather than catching regressions.

Consequences

Positive: A developer adding a new feature has an unambiguous checklist: ViewModel test → Screen state tests → Navigation registration test.
Positive: The CI pipeline gains a deterministic, fast test suite that runs without an emulator for the majority of UI coverage.
Positive: Kover enforcement prevents ViewModel coverage regression without requiring manual review of coverage reports.
Negative: Robolectric does not perfectly replicate all Android rendering behaviour. Tests that rely on hardware-accelerated canvas operations or system WindowInsets may behave differently on a real device.
Follow-up: MainDispatcherRule, Fixtures.kt, and all Fake* implementations should be created in a single bootstrapping task before new feature tests are written, to prevent duplication.
Follow-up: Once placeholder screens (Vision, Settings, Projects) are promoted to full features (per ADR-0025), their ViewModel and Screen tests must be added in the same pull request as the implementation.
Follow-up: Accessibility audits (TalkBack, minimum touch-target size) should be formalised in a future ADR once the feature set stabilises.

Alternatives Considered

Espresso-only instrumented tests for UI — Rejected. Instrumented tests are slow (minutes per run on CI), require an emulator, and are difficult to run locally without configuration overhead. The existing smoke tests cover the end-to-end surface adequately; adding per-screen Espresso tests would duplicate Tier 2 coverage at much higher cost.
Screenshot testing now (Roborazzi or Paparazzi) — Deferred, not rejected. Golden image tests are high-value once the visual design is stable, but introducing them before the design review phase would generate constant golden-update churn with no regression signal. This will be addressed in a follow-up ADR.
MockK for all test doubles including repositories — Rejected. As noted above, mock-per-call verification is fragile under interface changes. The FakeLlmService pattern is already established in the codebase as the preferred approach for infrastructure fakes; this ADR extends that convention uniformly.
Coverage thresholds on composables — Rejected at this stage. Line coverage metrics for composables are noisy because tooling counts the Compose runtime's generated code rather than meaningful branches. Branch completeness per sealed-interface state, enforced by review, is a more meaningful signal.

Notes

Related ADRs: ADR-0003 (MVVM/UseCase layering), ADR-0006 (UI state modelling), ADR-0020 (LLM testing), ADR-0022 (CI pipeline), ADR-0025 (UI package structure).
The testTag convention for screen root nodes should be "<FeatureName>Screen" (e.g. "InboxScreen", "ProcessingScreen") to allow stable targeting from both Tier 2 tests and navigation tests.
Screenshot / golden regression testing is deferred pending design review. When ready, Roborazzi is the preferred tool as it reuses the existing Robolectric setup and supports captureRoboImage() directly on Compose nodes.
