
Jean-Luc Makiola c4ad5c1b2a docs: complete project research

2026-04-09 14:44:12 +02:00


Stack Research

Domain: Public-first gear discovery platform: catalog enrichment, discovery feed, agent-powered seeding (v2.1)

Researched: 2026-04-09

Confidence: HIGH (existing stack verified against package.json; additions verified against npm/official docs)

Context: What Already Exists (Do Not Re-Research)

The following are validated and in production at v2.0. This file covers ADDITIONS AND CHANGES only.

Layer	Current
Runtime	Bun
Frontend	React 19, TanStack Router/Query v5, Tailwind CSS v4, Zustand, Zod 4.x, framer-motion, Recharts, Lucide React
Backend	Hono 4.12.x, Drizzle ORM 0.45.x, PostgreSQL (postgres.js 3.4.x driver)
Auth	@hono/oidc-auth 1.8.x (Logto), API key auth, MCP OAuth 2.1
Storage	@aws-sdk/client-s3 3.x (MinIO)
MCP	@modelcontextprotocol/sdk 1.29.x (19 tools)
Rate limiting	Custom in-process Map (auth endpoints only, 5 req/15 min per IP)

New Capability Areas

1. Public Access Auth Model

What's needed: The requireAuth middleware in src/server/middleware/auth.ts already handles three auth paths (API key, OAuth Bearer, OIDC session). The skip-list pattern in src/server/index.ts already exempts public GETs on /api/global-items, /api/tags, /api/users/:id/profile, and /api/setups/:id/public.

This milestone extends the skip-list to cover new discovery endpoints (/api/discovery/*). Additionally, a new tryAuth middleware variant is needed for endpoints that work for both anonymous and authenticated users — it resolves userId if credentials are present but does NOT 401 on absence. This enables auth-aware responses (e.g., annotating feed items with "in your collection" for logged-in users).

No new dependency. Pure middleware logic — add tryAuth to auth.ts, update skip-list in index.ts.
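
As a rough illustration of the tryAuth idea described above, the sketch below is written against a small structural context type rather than Hono's real Context so that it runs standalone. The resolver functions, the x-api-key header name, and the userId context key are all assumptions made for the example; the real middleware would reuse whatever credential checks requireAuth already performs.

```typescript
// Hypothetical shapes for illustration only; not GearBox's actual types.
type UserId = number;
type CredentialKind = "apiKey" | "bearer" | "session";
type CredentialResolver = (headers: Headers) => Promise<UserId | null>;

interface AuthContext {
  req: { raw: { headers: Headers } };
  set(key: "userId", value: UserId | null): void;
}

// Pure helper: report which kind of credential a request carries, if any.
// The header names are assumptions, not the project's confirmed ones.
function detectCredential(headers: Headers): CredentialKind | "none" {
  if (headers.has("x-api-key")) return "apiKey";
  if ((headers.get("authorization") ?? "").startsWith("Bearer ")) return "bearer";
  if (headers.has("cookie")) return "session";
  return "none";
}

// tryAuth: resolve userId when credentials are present, never reject.
// A resolver returning null (unknown or expired credential) is treated
// the same as an anonymous request.
function tryAuth(resolvers: Record<CredentialKind, CredentialResolver>) {
  return async (c: AuthContext, next: () => Promise<void>): Promise<void> => {
    const kind = detectCredential(c.req.raw.headers);
    const userId = kind === "none" ? null : await resolvers[kind](c.req.raw.headers);
    c.set("userId", userId);
    await next();
  };
}
```

Whether a present-but-invalid credential should fall back to anonymous (as here) or still produce a 401 is a design choice the text above leaves open.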

2. Discovery Feed

The feed requires: ranked/scored queries, cursor-based pagination, and cheap repeated reads by anonymous users.

Use a hot-score computed in PostgreSQL SQL — no external search engine or materialized view needed at this scale.

-- Hacker News-style decay: engagement / time^gravity
SELECT id, brand, model,
  (owner_count::float / power((extract(epoch from now()) - extract(epoch from created_at)) / 3600.0 + 2, 1.8)) AS hot_score
FROM global_items
ORDER BY hot_score DESC
LIMIT 20;

This requires ownerCount as a real column (not a JOIN-time COUNT) on globalItems. The column already logically exists via join — promote it to a denormalized integer that the collection add/remove service path updates. No trigger needed; update it in the same database transaction as the collection operation.

No new dependency. Schema migration + service-layer update.
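
To make the decay formula concrete, here is the same expression as a small TypeScript function. It is a sketch for reasoning about the ranking (for example in unit tests), not a replacement for the SQL; the function name and parameter names are illustrative.

```typescript
// Mirrors the SQL expression: owner_count / (age_in_hours + 2) ^ gravity.
function hotScore(
  ownerCount: number,
  createdAt: Date,
  now: Date,
  gravity = 1.8,
): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / 3600000;
  return ownerCount / Math.pow(ageHours + 2, gravity);
}
```

With gravity at 1.8, an item with 10 owners created just now scores higher than a three-day-old item with 100 owners, which is the intended "fresh items surface quickly" behavior. Lowering gravity would let older, widely owned items hold their position longer.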

Cursor-Based Pagination

Drizzle ORM 0.45.x has documented cursor pagination support (two-column keyset). Use (hotScore DESC, id DESC) for the trending feed and (createdAt DESC, id DESC) for "recently added." Encode cursor as base64 JSON — opaque to the client.

The Hono + Drizzle cursor pattern is documented and actively used in the ecosystem. No pagination library needed.

No new dependency. Drizzle already supports this natively.
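
The opaque cursor and the two-column keyset comparison can be sketched without any database. The example below pages over an in-memory array to show the logic; the FeedCursor shape and the function names are assumptions, and in the real service the isAfterCursor predicate would become a WHERE clause.

```typescript
interface FeedCursor {
  score: number;
  id: number;
}

// Opaque to the client: base64-encoded JSON.
function encodeCursor(cursor: FeedCursor): string {
  return btoa(JSON.stringify(cursor));
}

// Returns null for anything that is not a well-formed cursor.
function decodeCursor(token: string): FeedCursor | null {
  try {
    const parsed = JSON.parse(atob(token));
    if (typeof parsed?.score === "number" && typeof parsed?.id === "number") {
      return { score: parsed.score, id: parsed.id };
    }
  } catch {
    // malformed base64 or JSON falls through to null
  }
  return null;
}

// Keyset predicate for ORDER BY score DESC, id DESC.
function isAfterCursor(row: FeedCursor, cursor: FeedCursor): boolean {
  return row.score < cursor.score || (row.score === cursor.score && row.id < cursor.id);
}

function pageAfter<T extends FeedCursor>(rows: T[], token: string | null, limit: number) {
  const cursor = token ? decodeCursor(token) : null;
  const sorted = [...rows].sort((a, b) => b.score - a.score || b.id - a.id);
  const remaining = cursor ? sorted.filter((r) => isAfterCursor(r, cursor)) : sorted;
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  const nextCursor =
    remaining.length > limit && last ? encodeCursor({ score: last.score, id: last.id }) : null;
  return { items, nextCursor };
}
```

Two simplifications are worth noting. An invalid token is treated as "first page" here, where a real route might prefer a 400. And because the hot score changes with now(), a trending cursor may also need to carry the reference timestamp used to compute scores so that ordering stays stable between requests; that is omitted from this sketch.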

Full-Text Catalog Search

globalItems needs fast free-text search across brand + model + description. Use PostgreSQL native tsvector with a GIN index.

Drizzle 0.45.x does not generate GENERATED ALWAYS AS ... STORED syntax for tsvector columns in drizzle-kit. Add the searchVector column and GIN index via a raw SQL migration file (create via drizzle-kit generate then manually add the ALTER TABLE and CREATE INDEX statements to the generated file).

For the Hono route, use Drizzle's sql template tag with plainto_tsquery:

.where(sql`search_vector @@ plainto_tsquery('english', ${q})`)
.orderBy(sql`ts_rank(search_vector, plainto_tsquery('english', ${q})) DESC`)

No new dependency. Schema migration + raw SQL in service layer.

Feed Client (TanStack Query + IntersectionObserver)

useInfiniteQuery from @tanstack/react-query (already at 5.90.x) handles cursor pagination natively via getNextPageParam. The scroll trigger uses the browser-native IntersectionObserver API — implement a useIntersectionObserver(ref, callback) hook (~12 lines) rather than adding a scroll library. This matches the existing GearBox pattern of minimal third-party UI dependencies.

No new dependency.
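
On the client, the piece that connects the server's cursor to useInfiniteQuery is small. The sketch below assumes a feed response shaped as { items, nextCursor }, which is an illustrative shape rather than a confirmed API contract.

```typescript
// Assumed response shape for one page of the discovery feed.
interface FeedPage<T> {
  items: T[];
  nextCursor: string | null;
}

// Intended to be passed as useInfiniteQuery's getNextPageParam option.
// Returning undefined signals that there are no more pages.
function getNextPageParam<T>(lastPage: FeedPage<T>): string | undefined {
  return lastPage.nextCursor ?? undefined;
}

// Flattens the accumulated pages array into a single list for rendering.
function flattenPages<T>(pages: FeedPage<T>[]): T[] {
  return pages.flatMap((page) => page.items);
}
```

The IntersectionObserver callback would then call fetchNextPage only while hasNextPage is true, both of which useInfiniteQuery returns.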

3. Catalog Enrichment Infrastructure

Schema Additions to `globalItems`

New fields for attribution, source tracking, and feed ranking:

Field	Type	Purpose
`sourceUrl`	`text`	Canonical product page (retailer or manufacturer)
`sourceAttribution`	`text`	Human-readable credit ("via REI", "via manufacturer")
`imageAttributionUrl`	`text`	URL where product image was originally sourced
`imageAttributionText`	`text`	License or credit line for the image
`submittedByUserId`	`integer FK → users`	Who submitted this catalog entry (null = seeded by admin/agent)
`verifiedAt`	`timestamp`	When an admin approved the entry (null = unverified)
`ownerCount`	`integer NOT NULL DEFAULT 0`	Denormalized count of collection items referencing this
`productUrl`	`text`	Retailer/manufacturer product link (duplicates item-level, but catalog-owned)

These are Drizzle schema additions. No new dependency.

Zod Schemas for Enriched Catalog

Add CreateCatalogItemSchema in src/shared/schemas.ts with attribution fields. Zod 4.3.x handles this natively. The schema feeds the new POST /api/global-items route (currently only GET is public; writes will require auth but will be open to non-admins for catalog submissions).

4. Agent-Powered Catalog Seeding via MCP

The existing MCP server (@modelcontextprotocol/sdk 1.29.x, 19 tools) already provides the infrastructure. The agent workflow:

1. Claude agent receives a category or brand as a prompt
2. Uses a new create_catalog_item MCP tool, purpose-built for globalItems insertion with full attribution fields
3. Server validates via Zod, inserts into globalItems, updates ownerCount denormalization
4. Agent uses the existing upload_image_from_url tool to fetch and store product images

The new tool registers identically to existing tools in src/server/mcp/index.ts. Batch seeding sessions: the agent runs N create_catalog_item calls in sequence within one MCP session — no parallel execution framework needed at catalog bootstrap scale.

For standalone seed scripts (bun run src/db/dev-seed.ts extensions), use the Drizzle db instance directly. No external seeding framework.

No new dependency.

5. HTTP Caching for Public Endpoints

Public GET endpoints (discovery feed, catalog detail pages) will be hit by anonymous users repeatedly. Add HTTP-level cache hints to reduce DB round-trips.

Catalog item detail pages (GET /api/global-items/:id): Use Hono's built-in etag() middleware. Content-addressed — returns 304 Not Modified when item hasn't changed.
Discovery feed endpoints (GET /api/discovery/*): Set Cache-Control: public, max-age=60, stale-while-revalidate=300 manually in route handlers. Feed data tolerates 60s staleness.

Do NOT use Hono's cache() middleware — it is platform-specific to Cloudflare Workers and Deno, and silently does nothing on Bun. This is a documented limitation. Known issue #4401 in the Hono repo also shows the etag() middleware can generate inconsistent ETags when combining with other middleware — test in integration tests before shipping.

No new dependency. etag is built into Hono 4.12.x.
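
A small helper keeps the feed's Cache-Control policy in one place. The sketch below uses only the standard Headers class; the function names are illustrative, and the "private, no-store" branch for personalized responses is an assumption added here (a feed annotated with "in your collection" should presumably not be stored by shared caches), not something the research above specifies.

```typescript
// Policy for anonymous feed responses, as recommended above.
const PUBLIC_FEED_CACHE = "public, max-age=60, stale-while-revalidate=300";

// Assumption: responses personalized for a logged-in user are not shareable.
function feedCacheControl(isPersonalized: boolean): string {
  return isPersonalized ? "private, no-store" : PUBLIC_FEED_CACHE;
}

// Returns a copy of the headers with the feed caching policy applied.
function applyFeedCaching(headers: Headers, isPersonalized: boolean): Headers {
  const result = new Headers(headers);
  result.set("Cache-Control", feedCacheControl(isPersonalized));
  return result;
}
```

In a Hono route handler the equivalent would be a single c.header("Cache-Control", feedCacheControl(...)) call before returning the JSON response.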

6. Rate Limiting for Public Traffic

The existing rateLimit.ts in-process Map handles auth endpoints correctly (5 req/15 min per IP). It is inappropriate for public discovery traffic because:

5 req/15 min is far too strict for anonymous browsing
In-process state resets on server restart (tolerable for auth, wrong for general rate limiting)
No way to differentiate authenticated vs anonymous callers in the current implementation

Recommendation: Keep the existing rateLimit.ts for auth endpoints only. Add hono-rate-limiter for discovery/catalog public endpoints with a permissive anonymous limit (e.g., 100 req/min per IP) and no limit for authenticated callers.

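import { rateLimiter } from "hono-rate-limiter";

const discoveryLimiter = rateLimiter({
  windowMs: 60 * 1000, // 1-minute window
  limit: 100, // max requests per window per key
  // Key on the first x-forwarded-for hop; only trustworthy behind a proxy that sets it.
  // Requests without the header all share the "unknown" bucket.
  keyGenerator: (c) => c.req.header("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown",
  // No limit for authenticated callers.
  // Assumption: an earlier middleware (e.g. tryAuth) stores userId on the context.
  skip: (c) => Boolean(c.get("userId")),
});

app.use("/api/discovery/*", discoveryLimiter);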

The in-process storage adapter (default in hono-rate-limiter) is sufficient for single-instance deployment. If the app scales horizontally, swap to @hono-rate-limiter/redis — but that is a future decision, not a v2.1 concern.

New dependency:

Library	Version	Purpose
`hono-rate-limiter`	`^0.5.3`	Per-route rate limiting with configurable windows for public endpoints

bun add hono-rate-limiter

Full Stack Additions Summary

New Dependencies (v2.1 only)

Library	Version	Purpose	Why
`hono-rate-limiter`	`^0.5.3`	Configurable rate limits for public discovery routes	Existing in-process limiter is auth-only with a 5-req cap; public browse traffic needs separate, permissive limits

No New Dependencies Needed For

Capability	Existing Stack Component Used
Public auth model (`tryAuth` variant)	Hono middleware — no library
Discovery feed cursor pagination	Drizzle 0.45.x cursor pagination docs
Full-text catalog search (tsvector GIN)	PostgreSQL native + Drizzle `sql` template
Trending score computation	PostgreSQL SQL expression — no extension
Infinite scroll client	TanStack Query `useInfiniteQuery` + native IntersectionObserver
Catalog attribution fields	Drizzle schema migration
Agent catalog seeding	Existing MCP SDK + new `create_catalog_item` tool
HTTP cache headers	Hono built-in `etag()` + manual `Cache-Control`
Feed ranking denormalization	Service-layer transaction update (no trigger, no cron)

Schema Changes Required (Not Library Changes)

These are Drizzle schema additions generating migrations:

`globalItems` additions

// In src/db/schema.ts — globalItems table additions
sourceUrl: text("source_url"),
sourceAttribution: text("source_attribution"),
imageAttributionUrl: text("image_attribution_url"),
imageAttributionText: text("image_attribution_text"),
submittedByUserId: integer("submitted_by_user_id").references(() => users.id),
verifiedAt: timestamp("verified_at"),
ownerCount: integer("owner_count").notNull().default(0),
productUrl: text("product_url"),

Raw SQL migration additions (cannot be expressed in Drizzle schema)

-- Add after Drizzle-generated migration runs:

-- Generated tsvector column for full-text search
ALTER TABLE global_items
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english',
      coalesce(brand, '') || ' ' ||
      coalesce(model, '') || ' ' ||
      coalesce(description, '')
    )
  ) STORED;

CREATE INDEX global_items_search_vector_idx ON global_items USING GIN(search_vector);

-- Partial index for public setup discovery feed
CREATE INDEX setups_public_updated_idx ON setups (updated_at DESC) WHERE is_public = true;

-- Trending feed index
CREATE INDEX global_items_owner_count_id_idx ON global_items (owner_count DESC, id DESC);

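Note: Drizzle Kit does not generate GENERATED ALWAYS AS ... STORED for tsvector. Add these statements either by appending them to the Drizzle-generated migration file or as a separate custom migration file in the migrations folder. Run via bun run db:push after the Drizzle migration applies, provided that script executes migration files; drizzle-kit push on its own syncs the database from schema.ts and does not run SQL files in the migrations folder, in which case use the migrate command instead.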

`setups` additions

// In src/db/schema.ts — setups table additions
viewCount: integer("view_count").notNull().default(0),

Alternatives Considered

Recommended	Alternative	Why Not
PostgreSQL tsvector + GIN	Meilisearch / Typesense	Separate search service adds infra ops complexity; tsvector covers structured gear catalog search at GearBox scale without additional containers
PostgreSQL tsvector + GIN	pg_textsearch (BM25 extension)	Requires installing a PostgreSQL extension in production; BM25 ranking is unnecessary for a catalog of branded products where exact brand/model matches dominate
Denormalized `ownerCount` column	COUNT JOIN per feed request	Feed queries fire on every anonymous page load; a JOIN COUNT becomes a bottleneck before any other part of the stack does
Native IntersectionObserver hook	react-infinite-scroll-component	Zero-dependency — 12-line hook replaces an 8KB library; consistent with GearBox's minimal-external-dependency UI philosophy
Manual `Cache-Control` headers	Hono `cache()` middleware	Hono `cache()` is Cloudflare Workers/Deno only — silently does nothing on Bun
`hono-rate-limiter` in-process	Redis-backed rate limiter	Single-instance deployment — Redis adds an infra dependency not justified at current scale
Extend existing MCP toolset	Separate seeding CLI script	MCP agents already have auth and structured tool calling; a dedicated `create_catalog_item` tool is cleaner than a one-off script that bypasses the service layer
Service-layer `ownerCount` update	PostgreSQL trigger	Triggers are invisible to the TypeScript codebase, harder to test, and prone to silent failures in complex transactions

What NOT to Add

Avoid	Why	Use Instead
Elasticsearch / OpenSearch	Separate cluster, ops overhead, overkill for a structured product catalog	PostgreSQL tsvector with GIN index
pg_textsearch / VectorChord-BM25	PostgreSQL extension install required in prod; BM25 precision unnecessary for brand+model search	PostgreSQL native `ts_rank`
Hono `cache()` middleware	Platform-specific to Cloudflare/Deno; does nothing on Bun	Manual `Cache-Control` headers in route handlers
react-virtual / windowing	Feed is paginated, not a virtual list; items per page (~20) never hit DOM performance limits	Standard DOM list with cursor pagination
Prisma	Already using Drizzle ORM; two ORMs in one codebase is a maintenance trap	drizzle-orm (existing)
Materialized views for feed caching	drizzle-kit does not fully support materialized view migrations; manual REFRESH logic is brittle	Denormalized score columns + partial indexes
Separate seeding framework (Faker, etc.)	Catalog data is real product data, not fake; agent seeding produces real structured records	MCP `create_catalog_item` tool

Version Compatibility

Package	Current Version	v2.1 Notes
`hono`	4.12.x (4.12.12 latest)	`etag()` built-in available; `cache()` is NOT compatible with Bun — do not use
`drizzle-orm`	0.45.x (0.45.2 latest stable)	Cursor pagination confirmed; generated tsvector column requires raw SQL migration appended to drizzle-kit output
`@tanstack/react-query`	5.90.x	`useInfiniteQuery` with `getNextPageParam` fully supports cursor pattern natively
`hono-rate-limiter`	0.5.3 (latest as of 2026-04-09, published ~16 days earlier)	In-process storage adapter works on Bun; actively maintained
`@modelcontextprotocol/sdk`	1.29.x	Existing MCP tooling is sufficient for adding `create_catalog_item`
`zod`	4.3.x	New catalog attribution schemas are straightforward additions to existing `schemas.ts`
`@hono/zod-validator`	0.7.x	Already used for all routes; covers new discovery/catalog endpoints

Installation

# Only one new package for v2.1
bun add hono-rate-limiter

Everything else is schema migrations, new service/route/middleware code, and one new MCP tool — all on the existing stack.

Sources

Drizzle ORM cursor-based pagination — two-column keyset pattern, v0.45.x confirmed (HIGH)
Drizzle ORM PostgreSQL full-text search — tsvector approach confirmed (HIGH)
Drizzle ORM full-text search with generated columns — generated column pattern for tsvector (HIGH)
Hono ETag middleware — built-in, no install required (HIGH)
Hono Cache middleware — explicitly listed as Cloudflare/Deno only, not Bun (HIGH)
Hono ETag issue #4401 — known inconsistency bug in etag middleware (MEDIUM)
hono-rate-limiter GitHub — v0.5.3, active, Bun compatible (HIGH)
hono-rate-limiter npm — version 0.5.3 confirmed (HIGH)
TanStack Query infinite queries — useInfiniteQuery cursor pattern (HIGH)
Drizzle ORM materialized views issue #2653 — confirmed drizzle-kit does not fully support materialized view migrations (MEDIUM)
Hono middleware docs — selective auth middleware pattern (HIGH)
GearBox package.json — all existing dependency versions verified directly (HIGH)
GearBox src/server/index.ts — existing skip-list pattern verified directly (HIGH)
GearBox src/server/middleware/auth.ts — existing three-way auth verified directly (HIGH)
GearBox src/db/schema.ts — existing globalItems table columns verified directly (HIGH)

Stack research for: GearBox v2.1 Public Discovery milestone

Researched: 2026-04-09
