docs: complete project research

2026-03-23 19:45:06 +01:00
parent 5b273e17bd
commit e4d59d4788
5 changed files with 1199 additions and 0 deletions


# Domain Pitfalls
**Domain:** Go dashboard — SQLite to dual-database (SQLite + PostgreSQL) migration + dashboard UX improvements
**Researched:** 2026-03-23
**Confidence:** HIGH for SQLite/Go-specific pitfalls (sourced directly from codebase evidence); MEDIUM for PostgreSQL dialect differences (from training knowledge, verified against known Go `database/sql` contract)
---
## Critical Pitfalls
Mistakes that cause rewrites, data loss, or silent test passes.
---
### Pitfall 1: Leaking SQLite-specific SQL into "shared" query layer
**What goes wrong:** When adding a PostgreSQL path, developers copy existing SQLite queries and swap the driver, but keep SQLite-isms in the SQL itself. The two most common in this codebase are `datetime('now')` (SQLite built-in, line 225) and `INSERT OR REPLACE` (SQLite-only, lines 109 and 352). Both break on PostgreSQL, which uses `NOW()` and `INSERT ... ON CONFLICT DO UPDATE` instead.
**Why it happens:** The queries are embedded as raw strings throughout handler functions rather than in a dedicated SQL layer. Each query must be individually audited and conditionally branched or abstracted.
**Consequences:** The PostgreSQL path fails at query time: `datetime('now')` raises an undefined-function error (PostgreSQL has no `datetime` built-in), and `INSERT OR REPLACE` is a syntax error that PostgreSQL rejects outright.
**Warning signs:**
- Any raw `db.Exec` or `db.Query` call with `datetime(`, `OR REPLACE`, `AUTOINCREMENT`, `PRAGMA`, or `?` placeholders — all must be replaced or branched for PostgreSQL.
- `?` is the SQLite/MySQL placeholder; PostgreSQL requires `$1`, `$2`, etc.
**Prevention:**
- Define a `Store` interface with methods (`UpsertEvent`, `GetUpdates`, `DismissImage`, etc.) and provide two concrete implementations: `sqliteStore` and `pgStore`.
- Never write raw SQL in HTTP handlers. All SQL lives in the store implementation only.
- Add an integration test that runs against both stores for every write operation; if the schema or SQL diverges the test fails at the driver level.
**Phase mapping:** Must be resolved before any PostgreSQL code is written — this is the foundational refactor that makes dual-DB possible without a maintenance nightmare.
---
### Pitfall 2: `INSERT OR REPLACE` silently deletes tag assignments before PostgreSQL is even added
**What goes wrong:** `UpdateEvent()` (line 109) uses `INSERT OR REPLACE INTO updates`. SQLite implements this as DELETE + INSERT when a conflict is found. Because `tag_assignments.image` is a foreign key referencing `updates.image`, the DELETE step removes the child row — unless `PRAGMA foreign_keys = ON` is active (it is not; confirmed at lines 58-103). Even with FK enforcement, an `ON DELETE CASCADE` would delete the assignment rather than preserve it. The result: every time DIUN sends a new event for a tracked image, its tag assignment vanishes.
**Why it happens:** The intent of `INSERT OR REPLACE` is to update existing rows, but the mechanism is destructive. The UPSERT syntax (`INSERT ... ON CONFLICT(image) DO UPDATE SET ...`) is the correct tool and has been available since SQLite 3.24 (2018).
**Consequences:** This bug is already in production. Users lose tag assignments every time an image receives a new DIUN event. This directly contributed to the trust erosion described in PROJECT.md. Adding PostgreSQL without fixing this first means the bug ships in both DB paths.
**Warning signs:**
- Tag assignments disappear after DIUN reports a new update for a previously-tagged image.
- `TestDismissHandler_ReappearsAfterNewWebhook` tests the acknowledged-state reset correctly, but no test asserts that the tag survives a second `UpdateEvent` call on the same image.
**Prevention:**
- Replace line 109 with: `INSERT INTO updates (...) VALUES (...) ON CONFLICT(image) DO UPDATE SET diun_version=excluded.diun_version, ...` (the UPDATE touches only the listed columns, so `tag_assignments` is never affected).
- Add `PRAGMA foreign_keys = ON` immediately after `sql.Open` in `InitDB()`.
- Add a regression test: `UpdateEvent` twice on the same image with a tag assigned between calls; assert tag survives.
**Phase mapping:** Fix before any other work — this is a data-correctness bug affecting existing users.
---
### Pitfall 3: Global package-level state makes database abstraction structurally impossible without a refactor
**What goes wrong:** The codebase uses `var db *sql.DB` and `var mu sync.Mutex` at package level (lines 48-52). The `InitDB` function sets the global `db`. Adding PostgreSQL means calling a different `sql.Open` and storing it — but there is only one `db` variable. You cannot run SQLite and PostgreSQL tests in the same process, cannot dependency-inject the store into handlers, and cannot test the two stores independently.
**Why it happens:** The package was written as a single-instance tool, which was appropriate at first. Dual-DB support requires the concept of a "store" that can be swapped — which requires struct-based design.
**Consequences:** If you try to add PostgreSQL without refactoring, you end up with `if dbType == "postgres" { ... } else { ... }` branches scattered across every handler. This is unmaintainable, untestable, and will break if a third DB is ever added.
**Warning signs:**
- Any attempt to pass a PostgreSQL `*sql.DB` to the existing handlers requires changing the global variable, which breaks concurrent tests.
- The test file uses `UpdatesReset()` to reset global state between tests — a design smell that signals the global state problem.
**Prevention:**
- Introduce `type Server struct { store Store; secret string }` where `Store` is an interface.
- Move all handler functions to methods on `Server`.
- `InitDB` becomes a factory: `NewSQLiteStore(path)` or `NewPostgresStore(dsn)` returning the interface.
- Tests construct a fresh `Server` with an in-memory SQLite store; no global state to reset.
**Phase mapping:** This refactor is the prerequisite for dual-DB. Do it as the first step of the milestone, before any PostgreSQL driver work.
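The struct-based shape could look roughly like this. `memStore` is a hypothetical test double (not in the codebase) included to show why the refactor makes handlers testable without any global state:

```go
package main

import (
	"fmt"
	"net/http"
)

// Store stands in for the interface from Pitfall 1; one method suffices here.
type Store interface {
	DismissImage(image string) error
}

// memStore is a test double: no database, no global state to reset.
type memStore struct{ dismissed []string }

func (m *memStore) DismissImage(img string) error {
	m.dismissed = append(m.dismissed, img)
	return nil
}

// Server replaces the package-level `var db *sql.DB`; handlers become methods.
type Server struct {
	store  Store
	secret string
}

func (s *Server) DismissHandler(w http.ResponseWriter, r *http.Request) {
	if err := s.store.DismissImage(r.URL.Query().Get("image")); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// Each test constructs its own Server; two can run concurrently.
	st := &memStore{}
	srv := &Server{store: st}
	_ = srv.store.DismissImage("nginx")
	fmt.Println(len(st.dismissed)) // 1
}
```

With this shape, `UpdatesReset()` disappears entirely: a test that wants fresh state constructs a fresh `Server`.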
---
### Pitfall 4: Schema migration strategy does not scale to dual-DB or multi-version upgrades
**What goes wrong:** The current migration strategy is a single silent `ALTER TABLE` at line 87: `_, _ = db.Exec("ALTER TABLE updates ADD COLUMN acknowledged_at TEXT")`. This works for one SQLite column addition but fails in two ways when expanded: (1) PostgreSQL requires different syntax and error handling, (2) there is no version tracking, so there is no way to know which migrations have already run on an existing database.
**Why it happens:** The approach was acceptable for a single-column addition in a personal project. It does not generalise.
**Consequences:**
- On PostgreSQL, `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` is available, but the `_, _ =` pattern that swallows errors will still hide real migration failures.
- If a second column is added in a future milestone, there is no mechanism to skip it on databases that already have it (SQLite's `IF NOT EXISTS` on `ADD COLUMN` is only available in SQLite 3.37+).
- Existing user databases upgrading from the current version need all migrations to run in order and idempotently.
**Warning signs:**
- More than one `ALTER TABLE` in `InitDB()`.
- Any `_, _ = db.Exec(...)` where the underscore discards an error on a DDL statement.
**Prevention:**
- Introduce a `schema_migrations` table with a single `version INTEGER` column.
- Write migrations as numbered functions: `migration001`, `migration002`, etc.
- `InitDB` reads the current version and runs only pending migrations.
- Keep migrations simple: pure SQL, no application logic.
- A lightweight library (`golang-migrate/migrate`) can handle this, but for this project's scale a 30-line hand-rolled runner is sufficient and avoids a new dependency.
**Phase mapping:** Implement alongside the Store interface refactor. The migration runner must support both SQLite and PostgreSQL SQL dialects.
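The hand-rolled runner can stay close to this sketch. The migration table and SQL shown are illustrative; the real runner would read `SELECT version FROM schema_migrations`, execute each pending statement with full error checking (never `_, _ =`), and record the new version in the same transaction:

```go
package main

import "fmt"

// One migration per schema version, with SQL per dialect.
// The statements shown are illustrative.
type migration struct {
	version int
	sqlite  string
	pg      string
}

var migrations = []migration{
	{1, "ALTER TABLE updates ADD COLUMN acknowledged_at TEXT",
		"ALTER TABLE updates ADD COLUMN IF NOT EXISTS acknowledged_at TEXT"},
}

// pending selects the migrations newer than the version stored in
// schema_migrations; InitDB runs them in order and records each success.
func pending(current int) []migration {
	var out []migration
	for _, m := range migrations {
		if m.version > current {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// A database at version 0 needs migration 1; one at version 1 needs none.
	fmt.Println(len(pending(0)), len(pending(1))) // 1 0
}
```

Keeping the migration list as ordered data rather than ad-hoc `ALTER TABLE` calls is what makes "which migrations have already run?" answerable.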
---
## Moderate Pitfalls
---
### Pitfall 5: PostgreSQL connection pooling behaves differently than SQLite's forced single connection
**What goes wrong:** The SQLite configuration uses `db.SetMaxOpenConns(1)` to serialize all DB access (line 64). This was the correct choice for SQLite's single-writer model. For PostgreSQL, `MaxOpenConns(1)` is a severe bottleneck and eliminates one of the primary reasons to use PostgreSQL. Removing that call is not enough on its own, though: the accompanying `sync.Mutex` must also be retired, or every write stays serialized behind an application-level lock.
**Why it happens:** The mutex was added as belt-and-suspenders to the `SetMaxOpenConns(1)` constraint. For PostgreSQL, transactions handle isolation and the driver manages connection pooling correctly. The mutex is not needed and actively harmful at scale.
**Consequences:** Keeping `SetMaxOpenConns(1)` on PostgreSQL caps throughput to sequential queries. Removing it without reviewing the mutex usage can cause incorrect locking (the mutex guards writes, but PostgreSQL transactions should guard atomicity instead).
**Warning signs:**
- The `pgStore` implementation sets `MaxOpenConns(1)` — that is wrong.
- The `pgStore` implementation acquires a `sync.Mutex` around individual `db.Exec` calls instead of using transactions.
**Prevention:**
- In `sqliteStore`: keep `SetMaxOpenConns(1)` and the mutex (SQLite needs it).
- In `pgStore`: use PostgreSQL's default pooling (`SetMaxOpenConns` appropriate to load, e.g. 10-25), use `db.BeginTx` for operations that require atomicity, no application-level mutex.
- Document the difference in code comments.
**Phase mapping:** During the `pgStore` implementation phase.
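One way to keep the per-driver policy from drifting is to make it explicit data. The PostgreSQL pool size below is an illustrative starting point, not a tuned value; each store would apply its settings via `db.SetMaxOpenConns(...)` at construction:

```go
package main

import "fmt"

// poolSettings makes the per-driver policy explicit.
// The PostgreSQL number is an illustrative default, not a tuned value.
type poolSettings struct {
	maxOpen  int
	useMutex bool // application-level lock; only SQLite's single writer needs it
}

func settingsFor(driver string) poolSettings {
	if driver == "postgres" {
		// Rely on the driver's pool and db.BeginTx for atomicity.
		return poolSettings{maxOpen: 25, useMutex: false}
	}
	// SQLite: one writer, serialized through a single connection.
	return poolSettings{maxOpen: 1, useMutex: true}
}

func main() {
	fmt.Println(settingsFor("sqlite").maxOpen, settingsFor("postgres").useMutex) // 1 false
}
```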
---
### Pitfall 6: Optimistic UI updates in `assignTag` have no rollback on failure
**What goes wrong:** `assignTag()` in `useUpdates.ts` (lines 60-84) applies the state change optimistically before the API call. If the PUT/DELETE fails, the UI shows the new tag state but the server retained the old one. The next poll at most 5 seconds later will overwrite the optimistic state with the real server state — but during that window the user sees incorrect data. Worse, the error is only `console.error`, so the user gets no feedback that their action failed.
**Why it happens:** Optimistic updates are a good UX pattern, but require pairing with: (a) rollback on failure, and (b) user-visible error feedback.
**Consequences:**
- During a 5-second window after a failed tag assignment, the UI shows the wrong tag.
- If the backend is down and the user assigns multiple tags, all changes appear to succeed. The next poll resets all of them silently.
**Warning signs:**
- No `try/catch` that restores `prev` state on `assignTag` failure.
- No error toast or inline error state for tag assignment failures.
**Prevention:**
- Capture `prevState` before the optimistic update.
- In the `catch` block: restore `prevState` and surface an error message to the user (inline or toast).
- Example pattern: `const prev = updates[image]; setUpdates(optimistic); try { await api() } catch { setUpdates(restore(prev)); showError() }`.
**Phase mapping:** Part of the UX improvements phase.
---
### Pitfall 7: Bulk acknowledge actions hitting the backend sequentially instead of in a single operation
**What goes wrong:** "Dismiss all" and "dismiss by group" are planned features. The naive implementation fires one `PATCH /api/updates/{image}` per image from the frontend. For a user with 30 tracked images, this is 30 sequential API calls. Each call acquires the mutex and executes a SQL UPDATE. This is fine for single-user loads but is the wrong pattern: it creates 30 round trips, 30 DB transactions, and 30 state updates in the React UI (causing 30 re-renders).
**Why it happens:** The existing dismiss path is single-image by design; bulk is an afterthought unless an explicit bulk endpoint is designed from the start.
**Consequences:**
- 30 re-renders in rapid succession cause visible UI flickering.
- If one request fails in the middle, some images are acknowledged and others are not, with no clear feedback to the user.
**Warning signs:**
- A "dismiss all" button that loops over `updates` calling `acknowledge(image)` in sequence or in `Promise.all`.
**Prevention:**
- Add a `POST /api/updates/acknowledge-bulk` endpoint that accepts an array of image names and wraps all UPDATEs in a single transaction.
- The frontend calls one endpoint and updates state once.
- For "dismiss by group": pass `tag_id` as the filter parameter so the backend does `UPDATE updates SET acknowledged_at = $1 WHERE image IN (SELECT image FROM tag_assignments WHERE tag_id = $2)` on PostgreSQL (with `?` placeholders and a bound timestamp on SQLite).
**Phase mapping:** Design the bulk endpoint before implementing the frontend bulk UI; the API contract drives the UI, not the other way around.
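The bulk endpoint's single UPDATE can be built with dialect-correct placeholders like this. Table and column names are illustrative, and the handler would execute the result once inside a transaction:

```go
package main

import (
	"fmt"
	"strings"
)

// bulkAckSQL builds one UPDATE covering every selected image, with
// dialect-correct placeholders (? for SQLite, $1..$n for PostgreSQL).
// Table and column names are illustrative.
func bulkAckSQL(driver string, n int) string {
	ts := "?"
	ph := make([]string, n)
	for i := range ph {
		if driver == "postgres" {
			ph[i] = fmt.Sprintf("$%d", i+2) // $1 is reserved for the timestamp
		} else {
			ph[i] = "?"
		}
	}
	if driver == "postgres" {
		ts = "$1"
	}
	return "UPDATE updates SET acknowledged_at = " + ts +
		" WHERE image IN (" + strings.Join(ph, ", ") + ")"
}

func main() {
	fmt.Println(bulkAckSQL("postgres", 2))
	// UPDATE updates SET acknowledged_at = $1 WHERE image IN ($2, $3)
}
```

One statement, one transaction, one response: the frontend then applies a single state update instead of thirty.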
---
### Pitfall 8: No rollback path for existing SQLite users upgrading to a version with dual-DB
**What goes wrong:** When an existing user upgrades their Docker image to the version that includes PostgreSQL support, they continue using SQLite. If the migration runner runs new DDL migrations on their existing SQLite database (e.g., a new column added for PostgreSQL compatibility), and the migration fails silently due to the `_, _` pattern, they are left with a database in an intermediate state. On the next restart the migration runner does not know whether to retry or skip.
**Why it happens:** No migration version tracking means "already migrated" cannot be distinguished from "never migrated."
**Consequences:** Database schema becomes inconsistent. Queries that expect the new column fail. The user has no recourse except to delete the database (losing all data) or manually run SQL.
**Warning signs:**
- `InitDB` has no `SELECT version FROM schema_migrations` step.
- Migration SQL errors are swallowed.
**Prevention:**
- Implement the versioned migration runner (see Pitfall 4).
- Log migration progress visibly at startup: `INFO: running migration 002 (add_xyz_column)`.
- For the column that already exists implicitly (`acknowledged_at`), migration 001 should check for the column first (e.g., via `PRAGMA table_info(updates)`) or use `ADD COLUMN IF NOT EXISTS` where the SQLite version (3.37+) allows it, logging the outcome either way.
**Phase mapping:** Part of the store interface refactor phase, before any new schema changes land.
---
## Minor Pitfalls
---
### Pitfall 9: Drag handle invisible by default breaks tag reorganization discoverability
**What goes wrong:** The `GripVertical` icon in `ServiceCard.tsx` (line 96) has `opacity-0 group-hover:opacity-100`. On touch devices, on keyboard navigation, and for users who do not hover over each card, the drag-to-regroup feature is entirely invisible. Drag-and-drop is the only way to assign a tag to an image (the `assignTag` API is only called from the drag-and-drop handler).
**Why it happens:** The design prioritized a clean visual for non-interactive browsing, but made the interactive feature undiscoverable.
**Consequences:** Users who cannot use hover (touch devices, keyboard-only) have no way to reorganize images. As noted in CONCERNS.md, the delete button on `TagSection.tsx` has the same problem.
**Warning signs:**
- The drag handle has `opacity-0` without a `focus-visible:opacity-100` counterpart.
- No alternative assignment mechanism exists (e.g., a dropdown on the card).
**Prevention:**
- Make the grip handle always visible at reduced opacity (e.g., `opacity-30 group-hover:opacity-100`), or make it visible on focus.
- Add an accessible fallback: a "Move to group" dropdown on the card's context menu or `...` menu. This also gives keyboard and touch users the ability to assign tags.
**Phase mapping:** UX improvements phase. Not a blocker for DB work but should be addressed before the milestone closes.
---
### Pitfall 10: `datetime('now')` in DismissHandler produces SQLite-only timestamps
**What goes wrong:** `DismissHandler` (line 225) writes `acknowledged_at` using `datetime('now')`, a SQLite built-in. This is a SQL dialect issue distinct from the `INSERT OR REPLACE` problem. When the PostgreSQL path is added, this query must become `NOW()` or an application-layer timestamp.
**Why it happens:** It is a small single-line SQL call, easy to overlook during the migration to dual-DB.
**Consequences:** `DismissHandler` breaks entirely on PostgreSQL; `datetime` is not a PostgreSQL function, so the query fails with an undefined-function error.
**Warning signs:**
- Any raw `datetime(` in query strings.
**Prevention:**
- In the Store interface, the `DismissImage(image string) error` method takes no timestamp argument — the store implementation generates `NOW()` in SQL or passes `time.Now()` as a parameter from Go. Passing the timestamp from Go (`?` / `$1`) is the most portable approach: both SQLite and PostgreSQL accept a bound `time.Time` value, removing all dialect issues for timestamps.
**Phase mapping:** Resolve during the `pgStore` implementation. Can be fixed in `sqliteStore` at the same time for consistency.
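The bound-parameter approach keeps the timestamp out of the SQL entirely, so only the placeholder style differs between dialects. The statement below is a sketch; the `driver` field and column names are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// dismissSQL keeps the timestamp out of the SQL entirely: both dialects
// accept a bound time.Time, so neither datetime('now') nor NOW() is needed.
// Table and column names are illustrative.
func dismissSQL(driver string) string {
	if driver == "postgres" {
		return "UPDATE updates SET acknowledged_at = $1 WHERE image = $2"
	}
	return "UPDATE updates SET acknowledged_at = ? WHERE image = ?"
}

// Inside a store method the call would look roughly like:
//   _, err := s.db.Exec(dismissSQL(s.driver), time.Now().UTC(), image)
func main() {
	now := time.Now().UTC()
	fmt.Println(dismissSQL("sqlite"), !now.IsZero())
}
```

Passing `time.Now().UTC()` also sidesteps a second subtlety: SQLite's `datetime('now')` is UTC, so generating timestamps in Go keeps the stored format consistent across both stores.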
---
### Pitfall 11: `AUTOINCREMENT` in SQLite schema vs PostgreSQL `SERIAL` or `GENERATED ALWAYS AS IDENTITY`
**What goes wrong:** The `tags` table uses `INTEGER PRIMARY KEY AUTOINCREMENT` (line 90). PostgreSQL does not have `AUTOINCREMENT`; it uses `SERIAL`, `BIGSERIAL`, or `GENERATED ALWAYS AS IDENTITY`. When writing the `CREATE TABLE` DDL for PostgreSQL, this must be translated.
**Why it happens:** A detail that is invisible in the SQLite path because `CREATE TABLE IF NOT EXISTS` never re-runs.
**Consequences:** `CREATE TABLE` fails on PostgreSQL if the SQLite DDL is used verbatim.
**Warning signs:**
- A single `schema.sql` file used for both databases.
**Prevention:**
- Store DDL per-driver: `schema_sqlite.sql` and `schema_pg.sql`, or generate DDL in code with driver-specific constants.
- For PostgreSQL, use `id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY`.
**Phase mapping:** Part of the initial `pgStore` schema setup.
---
## Phase-Specific Warnings
| Phase Topic | Likely Pitfall | Mitigation |
|---|---|---|
| Fix SQLite bugs (UPSERT + FK enforcement) | INSERT OR REPLACE deletes tag assignments (Pitfall 2) | Use `ON CONFLICT DO UPDATE`; add `PRAGMA foreign_keys = ON` |
| Store interface refactor | Global state prevents dual-DB (Pitfall 3) | Struct-based `Server` with `Store` interface before any PostgreSQL work |
| Migration runner | Silent failures leave DB in unknown state (Pitfalls 4, 8) | Versioned migrations with visible logging; never swallow DDL errors |
| PostgreSQL implementation | SQLite SQL dialect in shared queries (Pitfall 1) | All SQL in store implementations, never in handlers; integration test both stores |
| PostgreSQL connection setup | Single-connection constraint applied to Postgres (Pitfall 5) | `pgStore` uses pooling and transactions, not mutex + `MaxOpenConns(1)` |
| Timestamp writes | `datetime('now')` fails on PostgreSQL (Pitfall 10) | Pass `time.Now()` as a bound parameter from Go instead of using SQL built-ins |
| Schema creation | `AUTOINCREMENT` not valid PostgreSQL syntax (Pitfall 11) | Separate DDL per driver |
| Bulk acknowledge UI | Sequential API calls cause flickering and partial state (Pitfall 7) | Design bulk endpoint first; one API call, one state update |
| Tag UX improvements | Optimistic updates without rollback confuse users (Pitfall 6) | Always pair optimistic updates with `catch` rollback and user-visible error |
| Accessibility improvements | Drag handle invisible; keyboard users cannot reorganize (Pitfall 9) | Always-visible handle at reduced opacity + dropdown alternative |
---
## Sources
- Codebase analysis: `/pkg/diunwebhook/diunwebhook.go`, lines 48-117, 225, 352 (HIGH confidence — direct code evidence)
- Codebase analysis: `/frontend/src/hooks/useUpdates.ts`, lines 60-84 (HIGH confidence — direct code evidence)
- Codebase analysis: `/frontend/src/components/ServiceCard.tsx`, line 96 (HIGH confidence — direct code evidence)
- `.planning/codebase/CONCERNS.md` — confirmed INSERT OR REPLACE and FK enforcement issues (HIGH confidence — prior audit)
- Go `database/sql` package contract and SQLite vs PostgreSQL dialect differences (MEDIUM confidence — training knowledge, no external verification available; recommend verifying PostgreSQL placeholder syntax `$1` format before implementation)