# Stack Research
**Domain:** Public-first gear discovery platform: catalog enrichment, discovery feed, agent-powered seeding (v2.1)

**Researched:** 2026-04-09

**Confidence:** HIGH (existing stack verified against package.json; additions verified against npm/official docs)

---
## Context: What Already Exists (Do Not Re-Research)
The following are validated and in production at v2.0. This file covers ADDITIONS AND CHANGES only.
| Layer | Current |
|-------|---------|
| Runtime | Bun |
| Frontend | React 19, TanStack Router/Query v5, Tailwind CSS v4, Zustand, Zod 4.x, framer-motion, Recharts, Lucide React |
| Backend | Hono 4.12.x, Drizzle ORM 0.45.x, PostgreSQL (postgres.js 3.4.x driver) |
| Auth | @hono/oidc-auth 1.8.x (Logto), API key auth, MCP OAuth 2.1 |
| Storage | @aws-sdk/client-s3 3.x (MinIO) |
| MCP | @modelcontextprotocol/sdk 1.29.x (19 tools) |
| Rate limiting | Custom in-process Map (auth endpoints only, 5 req/15 min per IP) |
---
## New Capability Areas
### 1. Public Access Auth Model
**What's needed:** The `requireAuth` middleware in `src/server/middleware/auth.ts` already handles three auth paths (API key, OAuth Bearer, OIDC session). The skip-list pattern in `src/server/index.ts` already exempts public GETs on `/api/global-items`, `/api/tags`, `/api/users/:id/profile`, and `/api/setups/:id/public`.
**This milestone extends the skip-list** to cover new discovery endpoints (`/api/discovery/*`). Additionally, a new `tryAuth` middleware variant is needed for endpoints that work for both anonymous and authenticated users — it resolves `userId` if credentials are present but does NOT 401 on absence. This enables auth-aware responses (e.g., annotating feed items with "in your collection" for logged-in users).
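A framework-agnostic sketch of how `tryAuth` could resolve a caller. The lookup functions and input field names are hypothetical stand-ins for the three existing paths in `auth.ts`, checked in the order listed above (API key, OAuth Bearer, OIDC session). The one behavioral difference from `requireAuth` is the fall-through: missing or unrecognized credentials yield `null` instead of a 401. Treating present-but-invalid credentials as anonymous is a design choice made here for illustration. In Hono, the result would be stored with `c.set("userId", ...)` before `await next()`.

```typescript
// Hypothetical stand-ins for the existing lookups in auth.ts.
type AuthLookups = {
  findUserIdByApiKey: (key: string) => number | null;
  findUserIdByBearer: (token: string) => number | null;
};

// Credentials as they arrive on a request (field names are illustrative).
type AuthInput = {
  apiKey?: string; // API key header value, if sent
  authorization?: string; // raw Authorization header, if sent
  sessionUserId?: number | null; // user already resolved from the OIDC session
};

// Resolve the caller without ever rejecting: a userId for an authenticated
// caller, null for an anonymous one. Unrecognized credentials fall through
// to anonymous (a design choice; a 401 for present-but-invalid credentials
// is also defensible).
function tryResolveUserId(input: AuthInput, lookups: AuthLookups): number | null {
  if (input.apiKey) {
    const id = lookups.findUserIdByApiKey(input.apiKey);
    if (id !== null) return id;
  }
  const bearer = input.authorization?.match(/^Bearer\s+(.+)$/i)?.[1];
  if (bearer) {
    const id = lookups.findUserIdByBearer(bearer);
    if (id !== null) return id;
  }
  return input.sessionUserId ?? null;
}

// In-memory lookups for demonstration.
const lookups: AuthLookups = {
  findUserIdByApiKey: (k) => (k === "key-123" ? 7 : null),
  findUserIdByBearer: (t) => (t === "tok-abc" ? 9 : null),
};

console.log(tryResolveUserId({}, lookups)); // null: anonymous, no 401
console.log(tryResolveUserId({ apiKey: "key-123" }, lookups)); // 7
console.log(tryResolveUserId({ authorization: "Bearer tok-abc" }, lookups)); // 9
```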
**No new dependency.** Pure middleware logic: add `tryAuth` to `auth.ts` and update the skip-list in `index.ts`.

---
### 2. Discovery Feed (Popular Setups, Trending Items)
The feed requires: ranked/scored queries, cursor-based pagination, and cheap repeated reads by anonymous users.
#### Trending Score
Use a hot-score computed in PostgreSQL SQL — no external search engine or materialized view needed at this scale.
```sql
-- Hacker News-style decay: engagement / time^gravity
SELECT id, brand, model,
(owner_count::float / power((extract(epoch from now()) - extract(epoch from created_at)) / 3600.0 + 2, 1.8)) AS hot_score
FROM global_items
ORDER BY hot_score DESC
LIMIT 20;
```
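The same formula in TypeScript, as a sketch for unit-testing the ranking (the function and constant names are hypothetical). Note what the formula measures: `owner_count` is cumulative and age runs from the item's `created_at`, so it ranks ownership relative to catalog age rather than recent activity. Passing `now` explicitly, rather than reading the clock inside the function, keeps results reproducible and lets a single reference time be reused across paginated requests.

```typescript
// TypeScript mirror of the SQL hot score: owner_count / (age_hours + 2)^1.8.
const GRAVITY = 1.8;
const MS_PER_HOUR = 3_600_000;

function hotScore(ownerCount: number, createdAt: Date, now: Date): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / MS_PER_HOUR;
  return ownerCount / Math.pow(ageHours + 2, GRAVITY);
}

// A fixed reference time makes the ranking reproducible.
const now = new Date("2026-04-09T12:00:00Z");
const hoursAgo = (h: number): Date => new Date(now.getTime() - h * MS_PER_HOUR);

// Same owner count: the newer item ranks higher.
console.log(hotScore(10, hoursAgo(1), now) > hotScore(10, hoursAgo(48), now)); // true
// Ten times the owners does not rescue a three-day-old item against a fresh one.
console.log(hotScore(10, hoursAgo(1), now) > hotScore(100, hoursAgo(72), now)); // true
```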
This requires `ownerCount` as a real column (not a JOIN-time COUNT) on `globalItems`. The column already logically exists via join — promote it to a denormalized integer that the collection add/remove service path updates. No trigger needed; update it in the same database transaction as the collection operation.
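A minimal in-memory sketch of that service-layer rule: the collection insert and the `ownerCount` change commit together or not at all. The `Db` shape, the `transaction` helper, and the function names are hypothetical stand-ins; in the real service the body would run inside Drizzle's `db.transaction(async (tx) => ...)`, and the counter change should be a single relative update (in SQL, `SET owner_count = owner_count + 1`) so concurrent adds cannot lose increments.

```typescript
// Hypothetical in-memory model of the two tables involved.
type Db = {
  ownerCount: Map<number, number>; // globalItemId -> denormalized owner_count
  collectionItems: Map<number, number>; // collectionItemId -> globalItemId
};

// Stand-in for db.transaction(): work on copies and publish them only if
// fn returns normally, so a throw leaves the original state untouched.
function transaction(db: Db, fn: (tx: Db) => void): void {
  const tx: Db = {
    ownerCount: new Map(db.ownerCount),
    collectionItems: new Map(db.collectionItems),
  };
  fn(tx);
  db.ownerCount = tx.ownerCount;
  db.collectionItems = tx.collectionItems;
}

function addCollectionItem(db: Db, collectionItemId: number, globalItemId: number): void {
  transaction(db, (tx) => {
    // Counter first, insert second: if the insert fails, the increment rolls back with it.
    tx.ownerCount.set(globalItemId, (tx.ownerCount.get(globalItemId) ?? 0) + 1);
    if (tx.collectionItems.has(collectionItemId)) {
      throw new Error(`collection item ${collectionItemId} already exists`);
    }
    tx.collectionItems.set(collectionItemId, globalItemId);
  });
}

function removeCollectionItem(db: Db, collectionItemId: number): void {
  transaction(db, (tx) => {
    const globalItemId = tx.collectionItems.get(collectionItemId);
    if (globalItemId === undefined) return; // nothing to remove
    tx.collectionItems.delete(collectionItemId);
    // Clamp at zero so a drifted counter never goes negative.
    tx.ownerCount.set(globalItemId, Math.max((tx.ownerCount.get(globalItemId) ?? 0) - 1, 0));
  });
}

const db: Db = { ownerCount: new Map(), collectionItems: new Map() };
addCollectionItem(db, 1, 42);
addCollectionItem(db, 2, 42);
try {
  addCollectionItem(db, 2, 42); // duplicate id: throws, so the increment is discarded
} catch {
  // expected
}
removeCollectionItem(db, 1);
console.log(db.ownerCount.get(42)); // 1
```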
**No new dependency.** Schema migration + service-layer update.
#### Cursor-Based Pagination
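Drizzle's documentation includes a cursor-based pagination guide that builds a two-column keyset from the standard `and`/`or`/`eq`/`lt`/`gt` operators; the pattern works unchanged in 0.45.x. Use `(hotScore DESC, id DESC)` for the trending feed and `(createdAt DESC, id DESC)` for "recently added." Encode the cursor as base64 JSON so it stays opaque to the client. Because the hot score is computed from `now()`, it drifts between requests: carry the reference timestamp in the cursor and compute every page against it, otherwise items can be skipped or repeated across pages.
The Hono + Drizzle cursor pattern is documented and actively used in the ecosystem. No pagination library is needed.
**No new dependency.** The keyset filter uses Drizzle's existing query operators.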
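A minimal sketch of the opaque cursor and the keyset filter, assuming a hypothetical `TrendingCursor` shape that carries the last row's score and id plus an `asOf` timestamp for the reference time (the hot score depends on the clock, so a cursor without it cannot guarantee stable pages). `Buffer`'s `base64url` encoding (available in Node and Bun) keeps the cursor URL-safe. An in-memory array stands in for the SQL query, and `isAfterCursor` is the JavaScript equivalent of the row-value predicate `(hot_score, id) < ($score, $id)`.

```typescript
// asOf would replace now() in the SQL score expression for every page.
type TrendingCursor = { score: number; id: number; asOf: string };
type Row = { id: number; score: number };

function encodeCursor(c: TrendingCursor): string {
  return Buffer.from(JSON.stringify(c), "utf8").toString("base64url");
}

function decodeCursor(raw: string): TrendingCursor | null {
  try {
    const v = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
    if (typeof v?.score === "number" && Number.isInteger(v?.id) && typeof v?.asOf === "string") {
      return { score: v.score, id: v.id, asOf: v.asOf };
    }
  } catch {
    // Malformed base64 or JSON: fall through and treat it as no cursor.
  }
  return null;
}

// Keyset predicate for (score DESC, id DESC); the SQL equivalent is the
// row-value comparison (hot_score, id) < ($score, $id).
function isAfterCursor(row: Row, c: TrendingCursor): boolean {
  return row.score < c.score || (row.score === c.score && row.id < c.id);
}

// In-memory stand-in for the query, already sorted by (score DESC, id DESC).
const asOf = "2026-04-09T12:00:00.000Z";
const rows: Row[] = [
  { id: 5, score: 3.0 },
  { id: 4, score: 2.0 },
  { id: 3, score: 2.0 },
  { id: 2, score: 1.0 },
  { id: 1, score: 0.5 },
];

function fetchPage(cursor: TrendingCursor | null, size: number): { items: Row[]; next: string | null } {
  const items = rows.filter((r) => cursor === null || isAfterCursor(r, cursor)).slice(0, size);
  const last = items[items.length - 1];
  // A full page may have more rows behind it; a short page ends the feed.
  const next = items.length === size && last ? encodeCursor({ score: last.score, id: last.id, asOf }) : null;
  return { items, next };
}

const seen: number[] = [];
let next: string | null = null;
do {
  const page = fetchPage(next === null ? null : decodeCursor(next), 2);
  seen.push(...page.items.map((r) => r.id));
  next = page.next;
} while (next !== null);
console.log(seen.join(", ")); // 5, 4, 3, 2, 1
```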
#### Full-Text Catalog Search
`globalItems` needs fast free-text search across `brand + model + description`. Use PostgreSQL native `tsvector` with a GIN index.
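Drizzle has no built-in `tsvector` column type. The Drizzle generated-columns guide (see Sources) defines one with `customType` and `.generatedAlwaysAs()`; this plan does not rely on that route and instead adds the `search_vector` column and its GIN index in a hand-written SQL migration (`drizzle-kit generate --custom` creates an empty migration file for this).
For the Hono route, use Drizzle's `sql` template tag with `plainto_tsquery`, which accepts arbitrary user input (`to_tsquery` raises a syntax error on stray operators). Interpolated values such as `${q}` are sent as bound parameters; `db`, `globalItems`, and the search string `q` come from the existing service module: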
```typescript
import { sql } from "drizzle-orm";
const rows = await db.select().from(globalItems)
  .where(sql`search_vector @@ plainto_tsquery('english', ${q})`)
  .orderBy(sql`ts_rank(search_vector, plainto_tsquery('english', ${q})) DESC`);
```
**No new dependency.** Schema migration + raw SQL in service layer.
#### Feed Client (TanStack Query + IntersectionObserver)
`useInfiniteQuery` from `@tanstack/react-query` (already at 5.90.x) handles cursor pagination natively: in v5, `initialPageParam` supplies the first page's cursor and `getNextPageParam` derives the next one from the last page, with `undefined` signaling the end of the feed. The scroll trigger uses the browser-native IntersectionObserver API: implement a small `useIntersectionObserver(ref, callback)` hook (~12 lines) rather than adding a scroll library. This matches the existing GearBox pattern of minimal third-party UI dependencies.
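The feed-client pieces that carry logic can be sketched as plain functions (the names and the page shape are hypothetical): mapping the server's `nextCursor` to TanStack Query's page param, and the check an IntersectionObserver callback should apply so fast scrolling does not fire duplicate fetches. The commented wiring at the end uses the v5 `useInfiniteQuery` options; `fetchTrending` is a hypothetical API client.

```typescript
// Hypothetical page shape for a discovery endpoint such as /api/discovery/trending.
type FeedPage<T> = { items: T[]; nextCursor: string | null };

// Maps the server's cursor to TanStack Query's next page param;
// returning undefined tells useInfiniteQuery there are no more pages.
function getNextPageParam<T>(lastPage: FeedPage<T>): string | undefined {
  return lastPage.nextCursor ?? undefined;
}

// Check for the IntersectionObserver callback: fetch only when the sentinel
// element is visible, another page exists, and no fetch is already running.
function shouldFetchNextPage(state: {
  isIntersecting: boolean;
  hasNextPage: boolean;
  isFetchingNextPage: boolean;
}): boolean {
  return state.isIntersecting && state.hasNextPage && !state.isFetchingNextPage;
}

console.log(getNextPageParam({ items: ["a"], nextCursor: null })); // undefined
console.log(shouldFetchNextPage({ isIntersecting: true, hasNextPage: true, isFetchingNextPage: true })); // false

// Wiring inside a component (not executed here; fetchTrending is hypothetical):
// const query = useInfiniteQuery({
//   queryKey: ["discovery", "trending"],
//   queryFn: ({ pageParam }) => fetchTrending(pageParam),
//   initialPageParam: undefined as string | undefined,
//   getNextPageParam,
// });
```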
**No new dependency.**

---
### 3. Catalog Enrichment Infrastructure
#### Schema Additions to `globalItems`
New fields for attribution, source tracking, and feed ranking:
| Field | Type | Purpose |
|-------|------|---------|
| `sourceUrl` | `text` | Canonical product page (retailer or manufacturer) |
| `sourceAttribution` | `text` | Human-readable credit ("via REI", "via manufacturer") |
| `imageAttributionUrl` | `text` | URL where product image was originally sourced |
| `imageAttributionText` | `text` | License or credit line for the image |
| `submittedByUserId` | `integer FK → users` | Who submitted this catalog entry (null = seeded by admin/agent) |
| `verifiedAt` | `timestamp` | When an admin approved the entry (null = unverified) |
| `ownerCount` | `integer NOT NULL DEFAULT 0` | Denormalized count of collection items referencing this |
| `productUrl` | `text` | Retailer/manufacturer product link (duplicates item-level, but catalog-owned) |
These are Drizzle schema additions. **No new dependency.**
#### Zod Schemas for Enriched Catalog
Add `CreateCatalogItemSchema` in `src/shared/schemas.ts` with the attribution fields; Zod 4.3.x supports this without extra libraries. The schema validates the new `POST /api/global-items` route. Today only GET on that path is public; the new write route will require authentication but will be open to non-admin users for catalog submissions.

---
### 4. Agent-Powered Catalog Seeding via MCP
The existing MCP server (`@modelcontextprotocol/sdk` 1.29.x, 19 tools) already provides the infrastructure. The agent workflow:
1. Claude agent receives a category or brand as a prompt
2. Uses a new `create_catalog_item` MCP tool — purpose-built for `globalItems` insertion with full attribution fields
3. Server validates via Zod and inserts into `globalItems` (`ownerCount` starts at its default of 0 and changes only when collection items reference the entry)
4. Agent uses the existing `upload_image_from_url` tool to fetch and store product images
The new tool registers identically to existing tools in `src/server/mcp/index.ts`. Batch seeding sessions: the agent runs N `create_catalog_item` calls in sequence within one MCP session — no parallel execution framework needed at catalog bootstrap scale.
For standalone seed scripts (`bun run src/db/dev-seed.ts` extensions), use the Drizzle db instance directly. No external seeding framework.
**No new dependency.**

---
### 5. HTTP Caching for Public Endpoints
Public GET endpoints (discovery feed, catalog detail pages) will be hit repeatedly by anonymous users. Add HTTP cache hints so browsers (and any CDN or proxy in front of the app) can reuse responses instead of re-requesting them.
- **Catalog item detail pages** (`GET /api/global-items/:id`): Use Hono's built-in `etag()` middleware. It hashes the response body after the handler runs and returns 304 Not Modified when the client's `If-None-Match` matches, so it saves bandwidth but not the database query.
- **Discovery feed endpoints** (`GET /api/discovery/*`): Set `Cache-Control: public, max-age=60, stale-while-revalidate=300` manually in route handlers. Feed data tolerates 60s of staleness.
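A sketch of the feed's `Cache-Control` choice as a plain Web `Response` (a global in Bun). One assumption goes beyond the bullet above: if a feed response is annotated per user (the "in your collection" case that `tryAuth` enables), marking it `public` would let a shared cache serve one user's annotations to others, so the sketch switches to `private, no-cache` for authenticated callers. `feedCacheControl` and `feedResponse` are hypothetical helper names.

```typescript
// Anonymous responses are identical for every caller, so shared caches may
// store them for 60s. Authenticated responses carry per-user annotations,
// so "private, no-cache" keeps them out of shared caches and makes the
// browser revalidate on each use.
function feedCacheControl(isAuthenticated: boolean): string {
  return isAuthenticated
    ? "private, no-cache"
    : "public, max-age=60, stale-while-revalidate=300";
}

// Plain Web Response; in a Hono handler the equivalent is
// c.header("Cache-Control", feedCacheControl(...)) followed by return c.json(data).
function feedResponse(data: unknown, isAuthenticated: boolean): Response {
  return new Response(JSON.stringify(data), {
    headers: {
      "Content-Type": "application/json; charset=utf-8",
      "Cache-Control": feedCacheControl(isAuthenticated),
    },
  });
}

const res = feedResponse({ items: [], nextCursor: null }, false);
console.log(res.headers.get("cache-control")); // public, max-age=60, stale-while-revalidate=300
```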
**Do NOT use Hono's `cache()` middleware.** It is built on the Web Cache API, and Hono documents it as supported only on Cloudflare Workers and Deno; on Bun it performs no caching. Known issue #4401 in the Hono repo also reports that `etag()` can generate inconsistent ETags when combined with other middleware, so cover it in integration tests before shipping.
**No new dependency.** `etag` is built into Hono 4.12.x.

---
### 6. Rate Limiting for Public Traffic
The existing `rateLimit.ts` in-process Map handles auth endpoints correctly (5 req/15 min per IP). It is inappropriate for public discovery traffic because:
- 5 req/15 min is far too strict for anonymous browsing
- In-process state resets on server restart (tolerable for auth, wrong for general rate limiting)
- No way to differentiate authenticated vs anonymous callers in the current implementation
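**Recommendation:** Keep the existing `rateLimit.ts` for auth endpoints only. Add `hono-rate-limiter` for the public discovery/catalog endpoints with a permissive anonymous limit (e.g., 100 req/min per IP) and no limit for authenticated callers. The limiter configuration below applies to every caller; exempting authenticated users requires checking the `userId` resolved by `tryAuth` before delegating to the limiter.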
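```typescript
import { Hono } from "hono";
import { rateLimiter } from "hono-rate-limiter";

const app = new Hono();

// Anonymous discovery traffic: 100 requests per minute per client IP.
const discoveryLimiter = rateLimiter({
  windowMs: 60 * 1000, // 1 minute
  limit: 100,
  // First X-Forwarded-For hop. Only trustworthy behind a reverse proxy that
  // overwrites this header; otherwise clients can spoof it. Requests without
  // the header all share the "unknown" bucket.
  keyGenerator: (c) => c.req.header("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown",
});

app.use("/api/discovery/*", discoveryLimiter);
```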
The in-process storage adapter (default in `hono-rate-limiter`) is sufficient for single-instance deployment. If the app scales horizontally, swap to `@hono-rate-limiter/redis` — but that is a future decision, not a v2.1 concern.
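The recommendation exempts authenticated callers, which the limiter configuration alone does not do. One way is to wrap the limiter so it is skipped when `tryAuth` has already resolved a user. The sketch uses minimal local types shaped like Hono's context and middleware signature so it runs without the framework; `skipWhenAuthenticated` is a hypothetical helper, and in the app `tryAuth` would need to be registered on the route before the wrapped limiter (with `userId` declared in the app's context variables).

```typescript
// Minimal local types shaped like Hono's Context and middleware signature,
// so the sketch runs without the framework.
type Ctx = { get: (key: "userId") => number | null | undefined };
type Next = () => Promise<void>;
type Middleware = (c: Ctx, next: Next) => Promise<unknown>;

// Hypothetical wrapper: callers that tryAuth already resolved to a user
// bypass the limiter; anonymous callers go through it.
function skipWhenAuthenticated(limiter: Middleware): Middleware {
  return async (c, next) => {
    if (c.get("userId") != null) {
      await next();
      return;
    }
    return limiter(c, next);
  };
}

// Demo with a counting stand-in for the real limiter.
let limiterCalls = 0;
const countingLimiter: Middleware = async (_c, next) => {
  limiterCalls++;
  await next();
};
const guarded = skipWhenAuthenticated(countingLimiter);
const noop: Next = async () => {};

Promise.all([
  guarded({ get: () => 7 }, noop), // authenticated: limiter bypassed
  guarded({ get: () => null }, noop), // anonymous: limiter applied
]).then(() => console.log(limiterCalls)); // 1
```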
**New dependency:**
| Library | Version | Purpose |
|---------|---------|---------|
| `hono-rate-limiter` | `^0.5.3` | Per-route rate limiting with configurable windows for public endpoints |
```bash
bun add hono-rate-limiter
```
---
## Full Stack Additions Summary
### New Dependencies (v2.1 only)
| Library | Version | Purpose | Why |
|---------|---------|---------|-----|
| `hono-rate-limiter` | `^0.5.3` | Configurable rate limits for public discovery routes | Existing in-process limiter is auth-only with a 5-req cap; public browse traffic needs separate, permissive limits |
### No New Dependencies Needed For
| Capability | Existing Stack Component Used |
|------------|------------------------------|
| Public auth model (`tryAuth` variant) | Hono middleware — no library |
| Discovery feed cursor pagination | Drizzle 0.45.x cursor pagination docs |
| Full-text catalog search (tsvector GIN) | PostgreSQL native + Drizzle `sql` template |
| Trending score computation | PostgreSQL SQL expression — no extension |
| Infinite scroll client | TanStack Query `useInfiniteQuery` + native IntersectionObserver |
| Catalog attribution fields | Drizzle schema migration |
| Agent catalog seeding | Existing MCP SDK + new `create_catalog_item` tool |
| HTTP cache headers | Hono built-in `etag()` + manual `Cache-Control` |
| Feed ranking denormalization | Service-layer transaction update (no trigger, no cron) |
---
## Schema Changes Required (Not Library Changes)
These are Drizzle schema additions generating migrations:
### `globalItems` additions
```typescript
// In src/db/schema.ts — globalItems table additions
sourceUrl: text("source_url"),
sourceAttribution: text("source_attribution"),
imageAttributionUrl: text("image_attribution_url"),
imageAttributionText: text("image_attribution_text"),
submittedByUserId: integer("submitted_by_user_id").references(() => users.id),
verifiedAt: timestamp("verified_at"),
ownerCount: integer("owner_count").notNull().default(0),
productUrl: text("product_url"),
```
### Raw SQL migration additions (hand-written, kept outside the Drizzle schema)
```sql
-- Add after Drizzle-generated migration runs:
-- Generated tsvector column for full-text search
ALTER TABLE global_items
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english',
coalesce(brand, '') || ' ' ||
coalesce(model, '') || ' ' ||
coalesce(description, '')
)
) STORED;
CREATE INDEX global_items_search_vector_idx ON global_items USING GIN(search_vector);
-- Partial index for public setup discovery feed
CREATE INDEX setups_public_updated_idx ON setups (updated_at DESC) WHERE is_public = true;
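-- Owner-count ordering index: serves ORDER BY owner_count DESC, id DESC (e.g. a "most owned" sort); it cannot serve ORDER BY hot_score, which is computed from now() per query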
CREATE INDEX global_items_owner_count_id_idx ON global_items (owner_count DESC, id DESC);
```
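> **Note:** drizzle-kit has no built-in `tsvector` column type, so these statements stay as hand-written SQL outside `schema.ts`. Put them in a custom migration file (`drizzle-kit generate --custom` creates an empty one) so they run after the Drizzle-generated migration, and apply both with `drizzle-kit migrate`. Do not rely on `drizzle-kit push` here: it diffs `schema.ts` against the live database instead of running migration files, and may propose dropping `search_vector` because the schema does not declare it.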
### `setups` additions
```typescript
// In src/db/schema.ts — setups table additions
viewCount: integer("view_count").notNull().default(0),
```
---
## Alternatives Considered
| Recommended | Alternative | Why Not |
|-------------|-------------|---------|
| PostgreSQL tsvector + GIN | Meilisearch / Typesense | Separate search service adds infra ops complexity; tsvector covers structured gear catalog search at GearBox scale without additional containers |
| PostgreSQL tsvector + GIN | pg_textsearch (BM25 extension) | Requires installing a PostgreSQL extension in production; BM25 ranking is unnecessary for a catalog of branded products where exact brand/model matches dominate |
| Denormalized `ownerCount` column | COUNT JOIN per feed request | Feed queries fire on every anonymous page load; a JOIN COUNT becomes a bottleneck before any other part of the stack does |
| Native IntersectionObserver hook | react-infinite-scroll-component | Zero-dependency — 12-line hook replaces an 8KB library; consistent with GearBox's minimal-external-dependency UI philosophy |
| Manual `Cache-Control` headers | Hono `cache()` middleware | Hono `cache()` supports only Cloudflare Workers and Deno; it performs no caching on Bun |
| `hono-rate-limiter` in-process | Redis-backed rate limiter | Single-instance deployment — Redis adds an infra dependency not justified at current scale |
| Extend existing MCP toolset | Separate seeding CLI script | MCP agents already have auth and structured tool calling; a dedicated `create_catalog_item` tool is cleaner than a one-off script that bypasses the service layer |
| Service-layer `ownerCount` update | PostgreSQL trigger | Triggers are invisible to the TypeScript codebase, harder to test, and prone to silent failures in complex transactions |
---
## What NOT to Add
| Avoid | Why | Use Instead |
|-------|-----|-------------|
| Elasticsearch / OpenSearch | Separate cluster, ops overhead, overkill for a structured product catalog | PostgreSQL tsvector with GIN index |
| pg_textsearch / VectorChord-BM25 | PostgreSQL extension install required in prod; BM25 precision unnecessary for brand+model search | PostgreSQL native `ts_rank` |
| Hono `cache()` middleware | Platform-specific to Cloudflare/Deno; does nothing on Bun | Manual `Cache-Control` headers in route handlers |
| react-virtual / windowing | Feed is paginated, not a virtual list; pages of ~20 items keep the DOM small at expected scroll depths | Standard DOM list with cursor pagination |
| Prisma | Already using Drizzle ORM; two ORMs in one codebase is a maintenance trap | drizzle-orm (existing) |
| Materialized views for feed caching | drizzle-kit does not fully support materialized view migrations; manual REFRESH logic is brittle | Denormalized counter columns (`ownerCount`) + partial indexes |
| Separate seeding framework (Faker, etc.) | Catalog data is real product data, not fake; agent seeding produces real structured records | MCP `create_catalog_item` tool |
---
## Version Compatibility
| Package | Current Version | v2.1 Notes |
|---------|----------------|------------|
| `hono` | 4.12.x (4.12.12 latest) | `etag()` built-in available; `cache()` is NOT compatible with Bun — do not use |
| `drizzle-orm` | 0.45.x (0.45.2 latest stable) | Cursor pagination confirmed; generated tsvector column requires raw SQL migration appended to drizzle-kit output |
| `@tanstack/react-query` | 5.90.x | `useInfiniteQuery` with `getNextPageParam` fully supports cursor pattern natively |
| `hono-rate-limiter` | 0.5.3 (latest as of 2026-04-09) | In-process storage adapter works on Bun; actively maintained |
| `@modelcontextprotocol/sdk` | 1.29.x | Existing MCP tooling is sufficient for adding `create_catalog_item` |
| `zod` | 4.3.x | New catalog attribution schemas are straightforward additions to existing `schemas.ts` |
| `@hono/zod-validator` | 0.7.x | Already used for all routes; covers new discovery/catalog endpoints |
---
## Installation
```bash
# Only one new package for v2.1
bun add hono-rate-limiter
```
Everything else is schema migrations, new service/route/middleware code, and one new MCP tool, all on the existing stack.

---
## Sources
- [Drizzle ORM cursor-based pagination](https://orm.drizzle.team/docs/guides/cursor-based-pagination) — two-column keyset pattern, v0.45.x confirmed (HIGH)
- [Drizzle ORM PostgreSQL full-text search](https://orm.drizzle.team/docs/guides/postgresql-full-text-search) — tsvector approach confirmed (HIGH)
- [Drizzle ORM full-text search with generated columns](https://orm.drizzle.team/docs/guides/full-text-search-with-generated-columns) — generated column pattern for tsvector (HIGH)
- [Hono ETag middleware](https://hono.dev/docs/middleware/builtin/etag) — built-in, no install required (HIGH)
- [Hono Cache middleware](https://hono.dev/docs/middleware/builtin/cache) — explicitly listed as Cloudflare/Deno only, not Bun (HIGH)
- [Hono ETag issue #4401](https://github.com/honojs/hono/issues/4401) — known inconsistency bug in etag middleware (MEDIUM)
- [hono-rate-limiter GitHub](https://github.com/rhinobase/hono-rate-limiter) — v0.5.3, active, Bun compatible (HIGH)
- [hono-rate-limiter npm](https://www.npmjs.com/package/hono-rate-limiter) — version 0.5.3 confirmed (HIGH)
- [TanStack Query infinite queries](https://tanstack.com/query/latest/docs/framework/react/guides/infinite-queries) — `useInfiniteQuery` cursor pattern (HIGH)
- [Drizzle ORM materialized views issue #2653](https://github.com/drizzle-team/drizzle-orm/issues/2653) — confirmed drizzle-kit does not fully support materialized view migrations (MEDIUM)
- [Hono middleware docs](https://hono.dev/docs/guides/middleware) — selective auth middleware pattern (HIGH)
- GearBox `package.json` — all existing dependency versions verified directly (HIGH)
- GearBox `src/server/index.ts` — existing skip-list pattern verified directly (HIGH)
- GearBox `src/server/middleware/auth.ts` — existing three-way auth verified directly (HIGH)
- GearBox `src/db/schema.ts` — existing `globalItems` table columns verified directly (HIGH)
---
*Stack research for: GearBox v2.1 Public Discovery milestone*
*Researched: 2026-04-09*