# Architecture Research
**Domain:** Public-first discovery platform with catalog enrichment — v2.1 milestone
**Researched:** 2026-04-09
**Confidence:** HIGH (based on direct codebase inspection)
## Standard Architecture
### System Overview
```
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT (React 19 SPA)                                                │
├──────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────┐   ┌───────────────────────────────┐        │
│  │ Public Shell         │   │ Auth Shell (isAuthenticated)  │        │
│  │ Discovery / Catalog  │   │ Collection / Threads / Setups │        │
│  │ Public Setups        │   │ Settings / FAB / TotalsBar    │        │
│  └──────────┬───────────┘   └───────────────┬───────────────┘        │
│             │                               │                        │
│  ┌──────────┴───────────────────────────────┴─────────────────────┐  │
│  │ __root.tsx: single root layout, conditional chrome             │  │
│  │ TanStack Router (file-based) + React Query + Zustand           │  │
│  └────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ fetch /api/*
┌───────────────────────────────────┴──────────────────────────────────┐
│ SERVER (Hono on Bun)                                                 │
├──────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ Auth Middleware: public bypass list + three-way auth           │  │
│  │ Existing bypasses: GET /api/global-items, GET /api/tags,       │  │
│  │ GET /api/setups/:id/public, GET /api/users/:id/profile         │  │
│  │ NEW bypass: GET /api/discovery/*                               │  │
│  └────────────────────────────────────────────────────────────────┘  │
│  ┌────────────┐ ┌──────────┐ ┌──────────────┐ ┌───────────────────┐  │
│  │ items      │ │ setups   │ │ global-items │ │ discovery [NEW]   │  │
│  │ threads    │ │ profiles │ │ tags         │ │ bulk import [NEW] │  │
│  │ categories │ │ auth     │ │ images       │ │                   │  │
│  └────────────┘ └──────────┘ └──────────────┘ └───────────────────┘  │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ Service Layer (db as first param)                              │  │
│  └────────────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────┐   ┌──────────────┐                         │
│  │ PostgreSQL (Drizzle) │   │ MinIO (S3)   │                         │
│  └──────────────────────┘   └──────────────┘                         │
├──────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ MCP Server (/mcp, streamable-http)                             │  │
│  │ 19 existing tools + NEW catalog seeding tools                  │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Responsibility | Status for v2.1 |
|-----------|----------------|-----------------|
| `__root.tsx` | Auth gate, layout shell, global modals | MODIFY — remove hard redirect for public routes |
| `routes/index.tsx` | Home page | REPLACE — becomes Discovery landing page |
| `routes/global-items/` | Catalog browsing and item detail | EXTEND — show enrichment fields, attribution |
| `server/index.ts` auth bypass list | Public route exceptions | EXTEND — add discovery feed bypass |
| `server/routes/global-items.ts` | Catalog CRUD API | EXTEND — add bulk import endpoint |
| `server/services/global-item.service.ts` | Catalog queries | EXTEND — trending query, bulk upsert |
| `db/schema.ts` globalItems table | Catalog data model | EXTEND — attribution and provenance fields |
| `server/mcp/` | Agent tool interface | EXTEND — add catalog seeding tools |
## Recommended Project Structure
New files slot into existing conventions. Nothing moves; additions only (except `routes/index.tsx` replacement).
```
src/
├── client/
│ ├── routes/
│ │ ├── index.tsx # REPLACE: Discovery landing (was Dashboard)
│ │ └── global-items/
│ │ ├── index.tsx # EXTEND: enrichment fields in catalog list
│ │ └── $globalItemId.tsx # EXTEND: attribution, source URL display
│ ├── components/
│ │ ├── DiscoveryFeed.tsx # NEW: trending setups + popular items feed
│ │ ├── FeedCard.tsx # NEW: card component for feed items
│ │ └── CatalogSearchBar.tsx # NEW: prominent hero search bar
│ └── hooks/
│ └── useDiscovery.ts # NEW: React Query hook for /api/discovery/feed
└── server/
├── routes/
│ ├── global-items.ts # EXTEND: add POST /bulk endpoint
│ └── discovery.ts # NEW: GET /feed, GET /trending
├── services/
│ ├── global-item.service.ts # EXTEND: bulkUpsert, getTrending functions
│ └── discovery.service.ts # NEW: feed composition queries
└── mcp/
└── tools/
├── catalog.ts # NEW: upsert_catalog_item, bulk_upsert_catalog
└── items.ts # UNCHANGED (user collection tools)
```
### Structure Rationale
- **Replace `routes/index.tsx` directly**: The home route IS the discovery page for v2.1. No separate `/discover` URL needed — that creates two entry points and splits SEO value.
- **`discovery.ts` route separate from `global-items.ts`**: Feed queries are read-only, public, and compositional (join multiple tables). Catalog CRUD stays in `global-items.ts`. Separation keeps route files single-responsibility.
- **`catalog.ts` MCP tools separate from `items.ts`**: User collection tools (`create_item`) and global catalog tools (`upsert_catalog_item`) have different semantics. Mixing them invites agents to use the wrong tool.
## Architectural Patterns
### Pattern 1: Auth-Aware Root Layout (Modify Existing)
**What:** `__root.tsx` currently hard-redirects all unauthenticated users to `/login` except `/users/*` and `/login` itself. The `isPublicRoute` check must be expanded to include the discovery landing page and catalog routes.
**When to use:** Every new public-facing route requires an addition to this check.
**Current code (lines 130-132 of `__root.tsx`):**
```typescript
const isPublicRoute =
  location.pathname.startsWith("/users/") || location.pathname === "/login";
```
**Change required:**
```typescript
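const isPublicRoute =
  location.pathname === "/" ||
  location.pathname === "/login" ||
  location.pathname.startsWith("/users/") ||
  // The catalog list lives at "/global-items", which startsWith("/global-items/") would miss.
  location.pathname === "/global-items" ||
  location.pathname.startsWith("/global-items/");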
```
**Trade-offs:** Minimal change, zero new infrastructure. Risk: the list grows and becomes the source of security-adjacent bugs (forgetting to add a route). Consider extracting to a named constant `PUBLIC_ROUTE_PREFIXES` so it's discoverable.
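A minimal sketch of that extraction, assuming a shared module that `__root.tsx` imports; `PUBLIC_EXACT_PATHS`, `PUBLIC_ROUTE_PREFIXES`, and `isPublicPath` are hypothetical names. Matching on a segment boundary also covers the catalog list at `/global-items`, which a bare `startsWith("/global-items/")` check misses.

```typescript
// Hypothetical shared constants for __root.tsx; names are illustrative.
// Exact paths that are public.
export const PUBLIC_EXACT_PATHS: readonly string[] = ["/", "/login"];

// Prefixes whose whole subtree is public. Matching on a segment boundary means
// "/global-items" and "/global-items/42" match, but "/global-itemsx" does not.
// Note: unlike the original startsWith("/users/"), this also makes "/users" public.
export const PUBLIC_ROUTE_PREFIXES: readonly string[] = ["/users", "/global-items"];

export function isPublicPath(pathname: string): boolean {
  if (PUBLIC_EXACT_PATHS.includes(pathname)) return true;
  return PUBLIC_ROUTE_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
}
```

In `__root.tsx` the check then becomes `const isPublicRoute = isPublicPath(location.pathname);`, and adding a public route is a one-line change to a named list.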
### Pattern 2: Discovery Feed as a Composed Read Endpoint
**What:** A new `GET /api/discovery/feed` endpoint returns pre-composed content: trending public setups + popular global items in a single response. No auth required.
**When to use:** Discovery landing page initial load. Client calls once on mount.
**Server-side composition (discovery.service.ts):**
```typescript
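import { count, desc, eq, getTableColumns } from "drizzle-orm";
import { globalItems, items, setups, users } from "../../db/schema";
import type { Db } from "../../db"; // assumed location of the shared Db type

export async function getDiscoveryFeed(db: Db) {
  // Trending setups: public setups, most recently updated.
  // The users join keeps author fields available to the select list.
  const trendingSetups = await db
    .select({
      id: setups.id,
      name: setups.name,
      userId: setups.userId,
      updatedAt: setups.updatedAt,
    })
    .from(setups)
    .innerJoin(users, eq(users.id, setups.userId))
    .where(eq(setups.isPublic, true))
    .orderBy(desc(setups.updatedAt))
    .limit(10);

  // Popular catalog items: most widely owned. LEFT JOIN keeps unowned items
  // (count(items.id) is 0 for them); one GROUP BY avoids per-item COUNT queries.
  const popularItems = await db
    .select({ ...getTableColumns(globalItems), ownerCount: count(items.id) })
    .from(globalItems)
    .leftJoin(items, eq(items.globalItemId, globalItems.id))
    .groupBy(globalItems.id)
    .orderBy(desc(count(items.id)))
    .limit(6);

  return { trendingSetups, popularItems };
}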
```
**Trade-offs:** Single round-trip for the landing page. Risk: query grows expensive as setups table grows. Mitigation: composite index on `(is_public, updated_at DESC)`.
### Pattern 3: Catalog Enrichment via Schema Extension
**What:** Add attribution and provenance fields to `globalItems`. The attribution columns are optional (nullable), so existing records are unaffected until an agent or admin populates them.
**Schema additions:**
```typescript
// In globalItems pgTable definition
sourceUrl: text("source_url"),               // Product page or spec sheet
manufacturer: text("manufacturer"),          // Normalized manufacturer name
imageAttribution: text("image_attribution"), // Credit text for catalog image
verifiedAt: timestamp("verified_at"),        // Last verification date
updatedAt: timestamp("updated_at")           // Track catalog edits
  .defaultNow()
  .notNull(),
```
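**Trade-offs:** The four attribution columns are nullable, so existing rows need no backfill. `updatedAt` is `NOT NULL` but has a default, so Postgres fills existing rows with the migration timestamp. `defaultNow()` only applies on insert: updates must set the column explicitly (as the bulk upsert's `updated_at = NOW()` does) or through Drizzle's `$onUpdate()`. Kept current, `updatedAt` is useful for cache invalidation and agent re-verification workflows.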
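To illustrate the re-verification workflow, here is a sketch of how an agent-side or admin job might pick stale catalog rows. It assumes a verification write sets `verifiedAt` and `updatedAt` to the same timestamp; the `needsReverification` name and the 180-day window are hypothetical.

```typescript
// Hypothetical helper; field names mirror the proposed globalItems columns.
interface CatalogProvenance {
  verifiedAt: Date | null;
  updatedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

export function needsReverification(
  item: CatalogProvenance,
  now: Date,
  maxAgeDays = 180, // assumed policy, not a project setting
): boolean {
  // Never verified: always a candidate.
  if (item.verifiedAt === null) return true;
  // Edited after the last verification, so the verified data may be stale.
  // (Assumes verification writes set both columns to the same timestamp.)
  if (item.updatedAt.getTime() > item.verifiedAt.getTime()) return true;
  // Verified, but longer ago than the allowed window.
  return now.getTime() - item.verifiedAt.getTime() > maxAgeDays * MS_PER_DAY;
}
```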
### Pattern 4: Bulk Upsert for Agent Catalog Seeding
**What:** A new `POST /api/global-items/bulk` endpoint accepts an array of catalog items and upserts on the natural key `(brand, model)`. Returns counts of created/updated/skipped.
**Auth:** Required (API key or MCP OAuth Bearer token). This is a write operation.
**Upsert strategy:**
```sql
INSERT INTO global_items (brand, model, category, weight_grams, ...)
VALUES (...)
ON CONFLICT (brand, model) DO UPDATE
  SET source_url   = EXCLUDED.source_url,
      manufacturer = EXCLUDED.manufacturer,
      updated_at   = NOW()
  WHERE global_items.verified_at IS NULL
     OR EXCLUDED.source_url IS NOT NULL;
```
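**Trade-offs:** A natural-key upsert is robust for seeding, but `ON CONFLICT (brand, model)` only works if a unique index or constraint exists on exactly those columns; otherwise Postgres rejects the statement. Risk: "Osprey" vs "Osprey Packs" creates duplicates. Mitigation: run `normalizeText()` before insert, and have the agent prompt instruct canonical brand naming. Normalization that only folds case and whitespace cannot merge two different brand strings, so the prompt instruction (or an alias map) has to carry that part.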
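The upsert semantics can be modeled in memory, which makes the created/updated/skipped accounting easy to unit-test before wiring up the SQL. This sketch is hypothetical throughout: `naturalKey`, `BRAND_ALIASES`, and `simulateBulkUpsert` are illustrative names (not the project's `normalizeText()`), and only `sourceUrl` is tracked to keep it short.

```typescript
interface CatalogInput {
  brand: string;
  model: string;
  sourceUrl: string | null;
}

interface CatalogRow extends CatalogInput {
  verifiedAt: Date | null;
}

// Assumed alias table for canonical brand names ("Osprey Packs" -> "Osprey").
// Case/whitespace normalization alone cannot merge different brand strings.
const BRAND_ALIASES: Record<string, string> = { "osprey packs": "osprey" };

function normalize(value: string): string {
  return value.trim().replace(/\s+/g, " ").toLowerCase();
}

export function naturalKey(brand: string, model: string): string {
  const b = normalize(brand);
  return `${BRAND_ALIASES[b] ?? b}|${normalize(model)}`;
}

// In-memory stand-in for INSERT ... ON CONFLICT (brand, model) DO UPDATE ... WHERE ...
export function simulateBulkUpsert(
  table: Map<string, CatalogRow>,
  inputs: CatalogInput[],
): { created: number; updated: number; skipped: number } {
  const counts = { created: 0, updated: 0, skipped: 0 };
  for (const input of inputs) {
    const key = naturalKey(input.brand, input.model);
    const existing = table.get(key);
    if (!existing) {
      table.set(key, { ...input, verifiedAt: null });
      counts.created++;
    } else if (existing.verifiedAt === null || input.sourceUrl !== null) {
      // Mirrors: WHERE verified_at IS NULL OR EXCLUDED.source_url IS NOT NULL
      existing.sourceUrl = input.sourceUrl;
      counts.updated++;
    } else {
      // Verified row and no new source: the conflicting row is left untouched.
      counts.skipped++;
    }
  }
  return counts;
}
```

One Postgres detail the model hides: a single `INSERT ... ON CONFLICT DO UPDATE` statement cannot update the same row twice, so a batch containing two inputs with the same natural key fails with an error. Deduplicating by key before building the statement avoids that.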
### Pattern 5: MCP Catalog Tools (No User Scope)
**What:** New MCP tools write to `globalItems` (shared catalog), not `items` (per-user collection). The existing MCP server passes `userId` to every tool handler — catalog tools must accept userId for auth but ignore it for data scope.
**New tools:**
```
upsert_catalog_item — insert or update a single global catalog entry
bulk_upsert_catalog — batch version for efficiency (up to 50 items)
get_catalog_stats — item counts by category for agent planning
search_catalog — wrapper over existing searchGlobalItems
```
**Registration pattern (mirrors existing tools):**
```typescript
// catalog.ts
export const catalogToolDefinitions = [
  { name: "upsert_catalog_item", description: "...", inputSchema: {...} },
  { name: "bulk_upsert_catalog", description: "...", inputSchema: {...} },
  { name: "get_catalog_stats", description: "...", inputSchema: {...} },
  { name: "search_catalog", description: "...", inputSchema: {...} },
];

export function registerCatalogTools(db: Db) {
  // No userId parameter: catalog writes are not user-scoped. The MCP server still
  // authenticates the caller and can pass userId separately for audit logging.
  return {
    upsert_catalog_item: ...,
    bulk_upsert_catalog: ...,
    get_catalog_stats: ...,
    search_catalog: ...,
  };
}
```
**Trade-offs:** Keeps catalog seeding distinct from personal collection management. The `userId` is available in the MCP server context but catalog tools simply don't use it for data scope — they use it only for audit logging if needed.
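Because agents send arbitrary JSON, the bulk tool should reject malformed or oversized batches before touching the database. A sketch, assuming the arguments arrive as `{ items: [...] }` (the actual `inputSchema` is not specified above) and that only `brand` and `model` are required; `validateBulkArgs` is a hypothetical name.

```typescript
const MAX_BULK_ITEMS = 50; // cap documented for bulk_upsert_catalog

interface CatalogItemArg {
  brand: string;
  model: string;
  weightGrams?: number;
}

type ValidationResult =
  | { ok: true; items: CatalogItemArg[] }
  | { ok: false; error: string };

export function validateBulkArgs(args: unknown): ValidationResult {
  if (typeof args !== "object" || args === null || !("items" in args)) {
    return { ok: false, error: "expected an object with an items array" };
  }
  const items = (args as { items: unknown }).items;
  if (!Array.isArray(items) || items.length === 0) {
    return { ok: false, error: "items must be a non-empty array" };
  }
  if (items.length > MAX_BULK_ITEMS) {
    return { ok: false, error: `at most ${MAX_BULK_ITEMS} items per call` };
  }
  for (let i = 0; i < items.length; i++) {
    // Non-object entries destructure to undefined fields and fail below.
    const { brand, model, weightGrams } = (items[i] ?? {}) as Record<string, unknown>;
    if (typeof brand !== "string" || brand.trim() === "") {
      return { ok: false, error: `items[${i}].brand is required` };
    }
    if (typeof model !== "string" || model.trim() === "") {
      return { ok: false, error: `items[${i}].model is required` };
    }
    if (
      weightGrams !== undefined &&
      (typeof weightGrams !== "number" || !Number.isFinite(weightGrams) || weightGrams < 0)
    ) {
      return { ok: false, error: `items[${i}].weightGrams must be a non-negative number` };
    }
  }
  return { ok: true, items: items as CatalogItemArg[] };
}
```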
## Data Flow
### Public Discovery Page Load (Unauthenticated)
```
Browser (no session)
→ GET / → React SPA loads (served as static file in prod)
→ __root.tsx: isAuthenticated=false, isPublicRoute=true → render layout
→ DiscoveryPage mounts
→ useDiscovery() → GET /api/discovery/feed (auth bypassed)
→ discovery.service.getDiscoveryFeed() → queries setups + globalItems
→ Returns { trendingSetups: [...], popularItems: [...] }
→ DiscoveryFeed renders FeedCard list
→ CatalogSearchBar renders (calls existing GET /api/global-items?q=...)
→ User clicks item → /global-items/:id (public) or /users/:userId (public)
```
### Agent Catalog Seeding
```
Claude agent (API key or MCP OAuth)
→ MCP: get_catalog_stats
→ Returns: { byCategory: [{ name: "Bags", count: 3 }, ...] }
→ Agent identifies "Bags" as underserved (target: 20 items)
→ Agent researches 17 bag products
→ MCP: bulk_upsert_catalog([{ brand, model, weightGrams, ... }])
→ global-item.service.bulkUpsert() → normalizeText() → INSERT ON CONFLICT
→ Returns { created: 14, updated: 2, skipped: 1 }
→ Agent repeats per category until coverage target met
```
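The planning step in this flow reduces to simple arithmetic over the `get_catalog_stats` response: with a target of 20 and 3 existing bags, the gap is 17. A sketch of that calculation plus batching to the 50-item cap; the function names and the per-category target are assumptions.

```typescript
// Response shape follows the get_catalog_stats example above.
interface CategoryStat {
  name: string;
  count: number;
}

interface CategoryGap {
  name: string;
  needed: number;
}

export function findUnderservedCategories(
  byCategory: CategoryStat[],
  target: number, // assumed per-category coverage goal
): CategoryGap[] {
  return byCategory
    .filter((c) => c.count < target)
    .map((c) => ({ name: c.name, needed: target - c.count }))
    // Largest gaps first, so research effort goes where coverage is thinnest.
    .sort((a, b) => b.needed - a.needed);
}

// Splits researched products into bulk_upsert_catalog-sized calls.
export function chunk<T>(items: T[], size = 50): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```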
### Catalog Enrichment Display
```
User navigates to /global-items/:id (public or authenticated)
→ GET /api/global-items/:id
→ getGlobalItemWithOwnerCount() → item + ownerCount
→ Response includes: sourceUrl, manufacturer, imageAttribution, verifiedAt
→ $globalItemId.tsx renders attribution section if sourceUrl present
→ "Source: [manufacturer] via [domain]" with external link
```
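A sketch of the attribution formatting, assuming `manufacturer` can be null independently of `sourceUrl`. The `formatAttribution` name, the `www.` stripping, and the http(s)-only rule are illustrative choices; the protocol check matters because `sourceUrl` is agent-written data rendered as a link.

```typescript
// Returns null when there is nothing safe to show, so the component can
// skip the attribution section entirely.
export function formatAttribution(
  manufacturer: string | null,
  sourceUrl: string | null,
): { text: string; href: string } | null {
  if (!sourceUrl) return null;
  let url: URL;
  try {
    url = new URL(sourceUrl);
  } catch {
    return null; // malformed URL stored in the catalog
  }
  // Only link out to web pages; reject javascript:, data:, and similar schemes.
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;
  const domain = url.hostname.replace(/^www\./, "");
  const text = manufacturer ? `Source: ${manufacturer} via ${domain}` : `Source: ${domain}`;
  return { text, href: url.href };
}
```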
### Authenticated User — Unchanged
```
Browser (OIDC session)
→ __root.tsx: isAuthenticated=true → existing behavior
→ / → DiscoveryPage (same component, but "Go to Collection" CTA visible)
→ TotalsBar, FAB, OnboardingWizard shown as today
→ All collection/thread/setup routes unchanged
```
## Scaling Considerations
| Scale | Architecture Adjustments |
|-------|--------------------------|
| Current (< 1k users, ~18 catalog items) | Monolith fine, no changes needed beyond feature additions |
| 1k-50k users | Add indexes: `CREATE INDEX ON setups (is_public, updated_at DESC)` and `CREATE INDEX ON items (global_item_id)` for ownerCount aggregation |
| 50k+ users | Cache `/api/discovery/feed` response server-side (Redis or in-memory with 60s TTL). Feed accuracy does not need to be real-time. |
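For the 50k+ tier, an in-process TTL cache is the simplest form of that caching. A sketch, assuming a single Bun process (each instance keeps its own copy, unlike Redis); `cachedWithTtl` is a hypothetical helper, and in use the loader would wrap `getDiscoveryFeed(db)`.

```typescript
export function cachedWithTtl<T>(
  load: () => Promise<T>,
  ttlMs = 60_000, // 60s TTL, per the table above
  now: () => number = Date.now, // injectable clock for tests
): () => Promise<T> {
  let pending: Promise<T> | null = null;
  let expiresAt = 0;
  return () => {
    // Serve the cached promise (resolved or still in flight) while fresh.
    if (pending !== null && now() < expiresAt) return pending;
    expiresAt = now() + ttlMs;
    const attempt: Promise<T> = load().catch((err: unknown) => {
      // Do not cache failures: the next call retries the loader.
      if (pending === attempt) pending = null;
      throw err;
    });
    pending = attempt;
    return attempt;
  };
}
```

Caching the promise rather than the resolved value means concurrent requests during a miss share one database round-trip. The TTL is measured from when the load starts, which is close enough for a feed that does not need to be real-time.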
### Scaling Priorities
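1. **First bottleneck:** The ownerCount aggregation in `getDiscoveryFeed` (and `getGlobalItemWithOwnerCount`) joins `items` on `global_item_id`, so its cost grows with the size of the `items` table. Add an index on `items.global_item_id` immediately: Postgres creates indexes automatically only for primary keys and unique constraints, not for foreign-key columns, so it likely does not exist yet. The index makes per-item counts in `getGlobalItemWithOwnerCount` cheap; the feed-wide `GROUP BY` still aggregates over every owned item, which is what server-side caching at the 50k+ tier addresses.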
2. **Second bottleneck:** Public setup listing for the feed scans the `setups` table for `is_public = true`. Composite index `(is_public, updated_at DESC)` makes this a fast index scan.
## Anti-Patterns
### Anti-Pattern 1: Growing the Auth Bypass List Indefinitely
**What people do:** Add more regex path checks to the 15-line bypass block in `server/index.ts` every time a new public endpoint appears.
**Why it's wrong:** The bypass list in `server/index.ts` already has 5 special cases (lines 125-137). Each addition is a security decision made in the wrong place. A typo in a regex silently exposes an endpoint or silently breaks a public one.
**Do this instead:** For this milestone, add the one needed bypass (`GET /api/discovery/*`) cleanly. Longer term, consider route-level middleware via Hono's `.use()` on specific route groups, moving auth decisions to where routes are defined.
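As an intermediate step before route-level middleware, the bypass list can at least become data instead of regexes. A sketch with hypothetical names; the entries mirror the bypasses listed in the system overview, and the `/api/global-items/:id` entry is an assumption based on the anonymous catalog detail flow. Because each entry names its HTTP method, `POST /api/global-items/bulk` stays protected even though `GET` requests under `/api/global-items` are public.

```typescript
// ":param" matches exactly one segment; a trailing "*" matches one or more.
// Assumes `path` is already normalized (URL parsing resolves "." and "..").
type Method = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

interface PublicRoute {
  method: Method;
  pattern: string;
}

export const PUBLIC_API_ROUTES: readonly PublicRoute[] = [
  { method: "GET", pattern: "/api/global-items" },
  { method: "GET", pattern: "/api/global-items/:id" }, // assumed, for anonymous detail pages
  { method: "GET", pattern: "/api/tags" },
  { method: "GET", pattern: "/api/setups/:id/public" },
  { method: "GET", pattern: "/api/users/:id/profile" },
  { method: "GET", pattern: "/api/discovery/*" }, // the one new v2.1 bypass
];

function matchesPattern(pattern: string, path: string): boolean {
  const want = pattern.split("/").filter(Boolean);
  const got = path.split("/").filter(Boolean);
  for (let i = 0; i < want.length; i++) {
    const seg = want[i] as string;
    // Trailing "*": accept the rest of the path if at least one segment remains.
    if (seg === "*") return i === want.length - 1 && got.length > i;
    const actual = got[i];
    if (actual === undefined) return false;
    if (!seg.startsWith(":") && seg !== actual) return false;
  }
  return want.length === got.length;
}

export function isPublicApiRequest(method: string, path: string): boolean {
  return PUBLIC_API_ROUTES.some(
    (route) => route.method === method && matchesPattern(route.pattern, path),
  );
}
```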
### Anti-Pattern 2: Two Separate Root Layouts for Public vs Auth
**What people do:** Create a new `__public-root.tsx` with completely different structure for unauthenticated users.
**Why it's wrong:** TanStack Router file-based routing would require a `_public` layout segment and routing decisions at the top that duplicate `__root.tsx` logic. The existing root already does conditional rendering of TotalsBar and FAB based on `isAuthenticated`. Extend that pattern — don't duplicate the layout.
**Do this instead:** One root, conditional chrome. Public users see the page content without TotalsBar/FAB/OnboardingWizard. The auth check gates those components, not the entire layout.
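The "one root, conditional chrome" rule fits in a pure function that `__root.tsx` can call and tests can exercise directly. A sketch; the `RootChrome` shape and `resolveRootChrome` name are hypothetical, and the existing onboarding condition would still apply on top of `showOnboarding`.

```typescript
interface RootChrome {
  redirectToLogin: boolean;
  showTotalsBar: boolean;
  showFab: boolean;
  showOnboarding: boolean; // eligibility only; existing onboarding rules still apply
}

export function resolveRootChrome(isAuthenticated: boolean, isPublicRoute: boolean): RootChrome {
  return {
    // Only private routes redirect; public routes render the same layout for everyone.
    redirectToLogin: !isAuthenticated && !isPublicRoute,
    // Authenticated-only chrome is gated per component, not by swapping layouts.
    showTotalsBar: isAuthenticated,
    showFab: isAuthenticated,
    showOnboarding: isAuthenticated,
  };
}
```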
### Anti-Pattern 3: Using `create_item` MCP Tool for Catalog Seeding
**What people do:** Use the existing `create_item` tool during agent seeding sessions, since it already exists and takes brand/model/weight fields.
**Why it's wrong:** `create_item` writes to the user-scoped `items` table, not `globalItems`. Items added this way belong to the service account, are invisible to other users as catalog entries, pollute that account's weight/cost totals, and cannot be found via catalog search.
**Do this instead:** Use dedicated `upsert_catalog_item` / `bulk_upsert_catalog` tools that target the `globalItems` table. The distinction should be documented clearly in tool descriptions.
### Anti-Pattern 4: Fetching ownerCount on Every Feed Card Render
**What people do:** Call `getGlobalItemWithOwnerCount()` for each item in the discovery feed, resulting in N+1 queries.
**Why it's wrong:** The feed might render 6-10 catalog items, and each one triggers a separate COUNT query. Invisible at low scale, this becomes a noticeable latency hit at medium scale on the most-loaded endpoint (the public landing page).
**Do this instead:** Compute ownerCount in the feed query itself via a single LEFT JOIN + COUNT in the `getDiscoveryFeed` service function. One query returns all items with their counts.
## Integration Points
### Existing Architecture — What Changes
| Boundary | Change | Risk |
|----------|--------|------|
| `__root.tsx` `isPublicRoute` | Add `/` and `/global-items/*` | Low — additive change to conditional |
| `server/index.ts` bypass list | Add `GET /api/discovery/*` | Low — same pattern as existing bypasses |
| `db/schema.ts` globalItems | Add 5 columns (4 nullable, plus `updatedAt` NOT NULL with a default) | Low: nullable and defaulted columns need no manual backfill for existing rows |
| `routes/index.tsx` | Replace Dashboard with Discovery page | Medium — existing authenticated users see different home page |
| `server/routes/global-items.ts` | Add `POST /bulk` route | Low — new route, existing routes unchanged |
| `server/mcp/index.ts` | Register catalogToolDefinitions | Low — existing registration pattern, additive |
### New Components — No Existing Touch
| Component | Location | Depends On |
|-----------|----------|------------|
| `discovery.service.ts` | `server/services/` | Schema migration (globalItems.updatedAt), setups table |
| `discovery.ts` route | `server/routes/` | `discovery.service.ts` |
| `useDiscovery.ts` hook | `client/hooks/` | `GET /api/discovery/feed` endpoint |
| `DiscoveryFeed.tsx` | `client/components/` | `useDiscovery.ts`, `FeedCard.tsx` |
| `FeedCard.tsx` | `client/components/` | None — pure presentational |
| `CatalogSearchBar.tsx` | `client/components/` | Existing `GET /api/global-items` endpoint |
| `catalog.ts` MCP tools | `server/mcp/tools/` | `bulkUpsert` function in `global-item.service.ts` |
### External Services
| Service | Change | Notes |
|---------|--------|-------|
| MinIO (S3) | None | Agent can already use `upload_image_from_url` MCP tool for catalog images |
| Logto (OIDC) | None | Public routes bypass Logto entirely |
| PostgreSQL | Schema migration | One `ALTER TABLE global_items ADD COLUMN ...` migration |
## Build Order (Dependency-Ordered)
**Phase 1 — Foundation (no UI yet)**
1. Schema migration: add `sourceUrl`, `manufacturer`, `imageAttribution`, `verifiedAt`, `updatedAt` to `globalItems`. Run `bun run db:generate && bun run db:push`. Unblocks all subsequent work.
2. Auth bypass: add `GET /api/discovery/*` to bypass list in `server/index.ts`. Trivial change, enables endpoint testing.
3. Add indexes: `global_item_id` on items table, `(is_public, updated_at DESC)` on setups table. Drizzle migration.
**Phase 2 — Server (can parallel with Phase 3)**
4. `discovery.service.ts` + `discovery.ts` route + register in `server/index.ts`. Pure reads, testable independently.
5. `bulkUpsert` in `global-item.service.ts` + `POST /api/global-items/bulk` endpoint.
**Phase 3 — Client (can parallel with Phase 2)**
6. Modify `__root.tsx` to expand `isPublicRoute`. Must land before discovery page renders for anon users.
7. Replace `routes/index.tsx` with Discovery landing page. Requires Phase 3 step 6 and Phase 2 step 4 (or mock data while API is in progress).
**Phase 4 — MCP and Polish**
8. `catalog.ts` MCP tools + register in `server/mcp/index.ts`. Requires the `bulkUpsert` service function (Phase 2 step 5); the tools call the service layer directly, not the HTTP endpoint.
9. Update `global-items/$globalItemId.tsx` to display attribution fields. Requires schema migration (Phase 1 step 1).
## Sources
- Direct inspection: `/src/server/index.ts` (auth bypass list at lines 121-139, route registration)
- Direct inspection: `/src/client/routes/__root.tsx` (isPublicRoute logic at lines 130-143, auth gate)
- Direct inspection: `/src/db/schema.ts` (globalItems table definition)
- Direct inspection: `/src/server/routes/global-items.ts` (existing catalog endpoints)
- Direct inspection: `/src/server/services/global-item.service.ts` (query patterns, ILIKE search)
- Direct inspection: `/src/server/mcp/index.ts` (tool registration pattern)
- Direct inspection: `/src/server/middleware/auth.ts` (three-way auth flow)
- Direct inspection: `/src/client/routes/index.tsx` (current dashboard — what is being replaced)
- `.planning/PROJECT.md` (v2.1 milestone goals and constraints)
---
*Architecture research for: GearBox v2.1 Public Discovery milestone*
*Researched: 2026-04-09*