docs: complete project research

2026-04-09 14:44:12 +02:00
parent f9c69a1366
commit c4ad5c1b2a
4 changed files with 910 additions and 1592 deletions
--- a/.planning/research/ARCHITECTURE.md
+++ b/.planning/research/ARCHITECTURE.md
--- a/.planning/research/FEATURES.md
+++ b/.planning/research/FEATURES.md
@@ -1,28 +1,26 @@
 # Feature Research

-**Domain:** Multi-user gear management and discovery platform
-**Researched:** 2026-04-03
+**Domain:** Public-first gear discovery platform with catalog enrichment
+**Researched:** 2026-04-09
 **Confidence:** MEDIUM-HIGH
+**Milestone scope:** v2.1 Public Discovery — builds on v2.0 multi-user foundation

 ---

-## Context
+## Context: What Already Exists (v2.0)

-This is the feature research for **v2.0 Platform Foundation** -- transforming GearBox from a single-user gear tracker into a multi-user platform with discovery, global item database, structured reviews, and setup sharing.
+These are shipped. New features below only mention them when v2.1 extends them:

-**Existing features (already built through v1.4):**
- Gear collection CRUD with categories, weight/price, images, quantity
+- Full gear collection CRUD with weight/price tracking, categories, images
 - Planning threads with candidate comparison, ranking, pros/cons, impact preview
- Named setups (loadouts) with classification, donut chart visualization
- Search/filter, CSV import/export, item duplication
- Dashboard home page, onboarding wizard
- Single-user auth (cookie sessions + API keys), MCP server (19 tools)
+- Named setups with classification, donut chart, weight breakdowns
+- PostgreSQL multi-user data model, Logto OIDC external auth
+- S3 image storage (MinIO), global item catalog with tags and search
+- User profiles with avatar/bio, public setup sharing
+- Catalog-driven add flow, global FAB, item/catalog detail pages
+- MCP server (19 tools), API key + OAuth auth methods

-**Key project constraints:**
- No freeform UGC until moderation infrastructure exists (structured input only)
- Discovery-first, not social-first
- External auth provider (self-hosted, open-source)
- Postgres for multi-user platform
+All features below are **new for v2.1** unless explicitly marked "extend existing."

 ---

@@ -30,150 +28,113 @@ This is the feature research for **v2.0 Platform Foundation** -- transforming Ge

 ### Table Stakes (Users Expect These)

-Features users assume exist on any multi-user gear platform. Missing these makes the platform feel broken or pointless.
+Features that public-first gear discovery platforms are expected to provide. Missing these makes the product feel broken or hostile to new visitors.

 | Feature | Why Expected | Complexity | Notes |
 |---------|--------------|------------|-------|
-| **User registration and authentication** | Cannot have multi-user without accounts. Every platform has sign-up/login. | HIGH | External auth provider integration (Authentik, Keycloak, or similar). Replaces current single-user cookie auth. All existing entities need userId FK. |
-| **User profiles (public)** | Every community platform has profiles. Users need identity to share and be discovered. | LOW | Minimal: display name, avatar URL, bio text, joined date. Public profile page lists user's public setups. No follower counts needed. |
-| **Setup visibility controls** | Users will not share setups if they cannot control what is public. Privacy is table stakes for any sharing platform. | LOW | Binary public/private toggle per setup. Default to private (opt-in sharing). Existing setups migrated as private. |
-| **Public setup detail pages** | Shared setup links must resolve to a readable page. If sharing is a feature, the shared thing must be viewable. | MEDIUM | Read-only view with item list, weight/cost totals, donut chart, creator attribution. No auth required for public setups. Extends existing setup detail view. |
-| **Global item database (searchable)** | Users expect to find gear by name rather than entering specs from scratch every time. LighterPack's weakness is fully manual data entry. | HIGH | Central product catalog with brand, model, category, manufacturer weight, MSRP, product URL, image. Users search and link rather than re-enter. Seed with 200-500 items in core categories to bootstrap. This is the foundational dependency for reviews, aggregation, and item detail pages. |
-| **Link personal items to global items** | Once a global DB exists, users expect to connect their gear to canonical entries for richer data. | MEDIUM | Optional FK from user items to global items. Enables aggregation (owner count, avg weight, reviews). Must handle items not yet in global DB gracefully. |
-| **Item detail page (aggregated)** | When browsing gear, clicking an item should show consolidated info: specs, who owns it, ratings. Standard on any product platform. | HIGH | Aggregated view combining: manufacturer specs from global DB, owner count, setup appearances, average ratings, crowd-reported weights. This is the integration hub for all platform features. |
-| **Structured reviews (ratings)** | Any product-oriented community needs evaluation. Users expect to rate gear and see what others think. | MEDIUM | Overall 1-5 star rating plus 3-5 dimension ratings (varies by product category). Attached to global items, not personal items. One review per user per global item. No freeform text per project constraint. |
-| **Discovery browse page** | Users expect a way to find interesting setups and gear beyond their own collection. Without this, multi-user adds no value. | MEDIUM | Not algorithmic for v2.0. Three sections: recent public setups, recently reviewed items, popular gear (most owned). Simple sorted lists with pagination. |
-| **Search global items** | Must be able to find products by name/brand in the global database. Powers linking, browsing, and review discovery. | MEDIUM | Full-text search on name, brand, category. Used in "link my item" flow, discovery browsing, and review lookup. Postgres full-text search or trigram index. |
+| Browse catalog and setups without login | Comparable platforms (LighterPack shared lists, BikeGearDatabase, RTINGS) allow full read access. Forcing login before browsing kills SEO and casual discovery. | LOW | Middleware change: lift auth guard from all GET /api/* endpoints. Public setup sharing already exists in v2.0; generalize it to all read routes. The session-optional pattern is already proven there. |
+| Discovery landing page with catalog search prominent | RTINGS, Wirecutter, and BikeGearDatabase all lead with search or category browse above the fold. Users arriving from search engines expect to search immediately, not to log in. | MEDIUM | Replace dashboard for unauthenticated visitors. Search bar + tag chips already exist as FAB overlay — promote to inline page hero. Authenticated users still see their dashboard. Route-level auth split. |
+| Contextual auth prompt only on write actions | Users must understand the access model without reading documentation. "Browse freely, sign in to save" must be self-evident. Confusing this causes drop-off. | LOW | Inline "Sign in to add to your collection" CTA on catalog item detail pages. No login wall on any browse action. |
+| Product attribution: brand and manufacturer fields | Any gear database users trust shows where a product originates. Missing attribution makes the catalog look scraped or unverifiable. | LOW | Add `brand` and `manufacturer` fields to the catalog items schema, alongside the existing `name`. Display prominently on detail pages and cards. |
+| Image source attribution display | Often a licensing condition, and always a trust signal. Gear Patrol, BikeGearDatabase, and manufacturer catalogs all credit image source. Omitting it creates IP risk on manufacturer-supplied images. | LOW | Add `imageCredit` (display text, e.g. "Apidura") and `imageSourceUrl` fields to catalog items. Display as "Photo: [credit]" beneath product images on detail pages. |
+| Community usage signal on catalog items | Users expect to see "owned by N people" or "in N setups" to gauge real-world adoption. LighterPack shows this per shared list. RTINGS shows review counts. | LOW | `ownerCount` already exists on catalog items in v2.0. Surface it prominently on catalog cards and detail pages. Add an "appears in N setups" count derived from setupItems. |
+| Shareable catalog item and setup URLs resolve without login | Public-first means deep-linking works. If a setup or catalog item URL is shared, it must render for anyone, with no login redirect. | LOW | Detail pages already exist in v2.0. Verify that unauthenticated API responses work end-to-end, meta tags render, and no auth redirect fires on page load. Likely mostly working already, given public setup sharing. |

 ### Differentiators (Competitive Advantage)

-Features that set GearBox apart from LighterPack, GearGrams, Trailspace, and MyGear. Aligned with core value: "help people make better gear decisions."
+Features that set GearBox apart from LighterPack (lists only, no catalog), BikeGearDatabase (editorial, not user collections), and generic wishlist tools.

 | Feature | Value Proposition | Complexity | Notes |
 |---------|-------------------|------------|-------|
-| **Crowd-verified specs** | LighterPack trusts user-entered data blindly. GearBox can show "manufacturer says 450g, 12 owners measured avg 478g." Real-world weight verification is unique and high-value for weight-conscious users. | MEDIUM | Aggregate weightGrams from all user items linked to a global item. Compare against manufacturer spec. Display on item detail page. Needs sufficient linked items to be meaningful (threshold: 3+ owners). |
-| **Review dimensions per product category** | Trailspace and OutdoorGearLab use editorial ratings with fixed dimensions. GearBox crowd-sources structured ratings with category-specific dimensions: a tent gets "weather protection, ventilation, setup ease" while a stove gets "boil time, fuel efficiency, packability." More relevant than one-size-fits-all. | MEDIUM | Define 3-5 rating dimensions per product category via admin config. Store dimension ratings alongside overall rating. Display as radar chart or bar chart on item detail page. |
-| **"X people own this" social proof** | Shows popularity and real adoption. No gear tracker does this because they lack a global item database. Simple count, powerful signal. | LOW | Count of users who linked a collection item to this global item. Displayed prominently on item detail page and in search results. Zero implementation complexity once linking exists. |
-| **Setup composition insights** | "This item appears in 47 bikepacking setups, commonly paired with Y and Z." Cross-setup analysis no competitor offers. Answers "what do people use this with?" | MEDIUM | Query across all public setups containing a given global item. Show co-occurrence patterns. Powerful but can be deferred to v2.x if query performance is a concern. |
-| **Setup impact preview with global items** | Already built for personal items. Extending to global items lets users preview "adding this from the store to my setup changes weight by X." Bridges research and collection management. | LOW | Already exists for personal items. Add "preview in my setup" button on global item detail pages. Reuse existing impact preview logic. |
-| **Planning threads with global item integration** | Research threads that pull in specs, reviews, and owner data from the global DB. Candidates link to global items for richer comparison than manual data entry. | MEDIUM | Add optional globalItemId to thread candidates. Auto-populate weight, price, image from global item. Show community ratings and owner count inline on candidates. |
-| **Real-world weight distribution** | Histogram showing "owners report weights between 440g-490g" for a product. Beats a single manufacturer number. Valuable for ultralight community. | LOW | Aggregate weightGrams from all linked items. Display min/max/avg. Histogram if 10+ data points. |
-| **Copy/fork public setups** | Use someone else's setup as a starting template. LighterPack has clunky CSV-based copying. One-click fork is much better UX. | LOW | Create new setup copying all items from a public setup. Items must exist in user's collection (or be linked to same global items). Clear UX for "items you do not own yet." |
+| Discovery landing feed (community setups + catalog items) | No direct competitor combines a global gear catalog with user setup feeds. LighterPack has no discovery page. BikeGearDatabase is editorial, not community-driven. GearBox can show real user gear choices with weight data. | MEDIUM | Two feed sections: (a) recently shared public setups sorted by recency, filterable by category; (b) popular/new catalog items by ownerCount. No algorithm is needed at launch: recency + ownerCount is sufficient and honest. |
+| Agent-powered catalog seeding via MCP tools | Unique to GearBox. No other gear platform has agent-friendly structured import. Enables rapid catalog population by Claude agent swarms without manual data entry. Programmatic SEO value compounds with catalog size. | HIGH | Requires: bulk create MCP tool, structured import with dry-run/preview mode, attribution tracking on agent-inserted records. GearBox already has an MCP server and API key auth, so the foundation exists. |
+| Catalog enrichment infrastructure with provenance tracking | Enables crowd + agent contributions with full source tracking. Comparable to Wikipedia's citation model but structured. Builds long-term trust in catalog data quality. | MEDIUM | New schema fields: `sourceUrl`, `sourceType` (enum: manufacturer / community / agent / import), `contributedBy` (userId or agent identifier string), `verifiedAt`. Mostly a migration; only a lightweight UI is needed initially. |
+| SEO-indexable catalog pages ranking for product searches | Public catalog pages that rank for "[product name] weight specs" are a major organic acquisition channel. RTINGS built a durable traffic moat this way via programmatic SEO. GearBox can do the same for gear. | MEDIUM | Pages already exist. Add: `<title>` tags with product name + category, OG meta tags, JSON-LD Product schema markup. Primary complexity: TanStack Router is client-rendered, so crawlers that do not execute JavaScript need either SSR or a static prerender. This is the phase's primary technical risk. |
+| Setup impact preview teaser on public catalog pages | Showing "add this to your setup and base weight changes by +Xg" is unique. No other gear catalog does this. Showing the feature on public pages teases value and drives sign-up intent. | MEDIUM | Extend existing impact preview (v1.3) to show a teaser CTA on unauthenticated catalog detail pages: "See how this affects your setup → [Sign in to try]". Requires no new backend work — frontend auth-conditional render. |
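
The two feed sections described in the table above (recent public setups, popular catalog items) could be sketched as plain sort functions. This is a minimal illustration, not the project's actual query code: the `PublicSetup` and `CatalogItem` shapes, the `sharedAt` and `createdAt` fields, and both function names are assumptions. Only `ownerCount` comes from the source.

```typescript
// Hypothetical record shapes; only `ownerCount` is named in the source document.
interface PublicSetup {
  id: string;
  category: string;
  sharedAt: Date; // assumed timestamp of when the setup was made public
}

interface CatalogItem {
  id: string;
  ownerCount: number;
  createdAt: Date; // assumed creation timestamp, used as a tie-breaker
}

// Feed section (a): most recently shared public setups, optionally filtered by category.
function recentSetups(setups: PublicSetup[], limit: number, category?: string): PublicSetup[] {
  return setups
    .filter((s) => category === undefined || s.category === category) // filter() copies, so the input is not mutated
    .sort((a, b) => b.sharedAt.getTime() - a.sharedAt.getTime())
    .slice(0, limit);
}

// Feed section (b): catalog items by ownerCount, newest first among ties.
function popularItems(items: CatalogItem[], limit: number): CatalogItem[] {
  return [...items]
    .sort(
      (a, b) =>
        b.ownerCount - a.ownerCount ||
        b.createdAt.getTime() - a.createdAt.getTime(),
    )
    .slice(0, limit);
}
```

In a real deployment this ordering would more likely live in a SQL `ORDER BY` with pagination. The in-memory version only shows that no ranking model is involved.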

 ### Anti-Features (Commonly Requested, Often Problematic)

 | Feature | Why Requested | Why Problematic | Alternative |
 |---------|---------------|-----------------|-------------|
-| **Freeform text reviews** | Users want to explain their experience in detail | Requires moderation, spam filtering, content policy, reporting infrastructure. PROJECT.md explicitly defers until moderation exists. | Structured ratings with predefined dimensions. Short predefined tags for pros/cons (e.g., "lightweight", "durable", "runs small"). |
-| **Comments on setups** | Social engagement, questions about gear choices | Moderation burden, notification system, spam, harassment risk. Deferred in PROJECT.md. | Link to user profile. Contact happens outside platform. |
-| **Follow users / activity feed** | Social graph, staying updated on people | Turns a gear tool into a social network. Notification infrastructure, feed ranking, engagement metrics, retention loops. Project decision: discovery-first, not social-first. | Discovery feed shows popular/recent content without requiring social connections. |
-| **Marketplace / buy-sell** | Users want to trade used gear | Payment processing, fraud prevention, disputes, shipping logistics, tax compliance. Massive liability. | Link to product URLs on global items. Users buy through retailers. |
-| **AI gear recommendations** | "What tent should I buy for bikepacking?" | Training data requirements, bias, liability for bad recommendations, hallucination risk. | Global item pages with ratings, owner counts, and setup co-occurrence do implicit recommendation. "People who own X also own Y." |
-| **Wiki-style open item editing** | Community wants to correct/enrich global item specs | Edit wars, vandalism, quality degradation, dispute resolution. PROJECT.md explicitly rules this out. | Structured contributions only: report measured weight, submit rating. Admin approval for spec corrections. Trusted contributor program later. |
-| **Price tracking / deal alerts** | Users want to know when gear goes on sale | Requires scraping retailer sites, fragile, legal gray area, maintenance burden. PROJECT.md rules this out. | Store product URL so users can check prices manually. |
-| **Real-time collaborative setups** | "Plan a group trip together" | WebSocket infrastructure, conflict resolution, permissions model, presence indicators. Massive complexity for niche use case. | Each user builds their own setup. Fork public setups as templates. |
-| **Gamification (badges, points, levels)** | Drive engagement and contributions | Incentivizes quantity over quality. Users game systems for points rather than providing genuine data. Creates toxic dynamics. | Soft social proof: "contributed X reviews" on profile. No points, no leaderboards. |
-| **Instagram-style infinite scroll feed** | Addictive browsing experience | Engagement-maximizing design conflicts with utility-focused tool. Users come to research decisions, not scroll endlessly. | Paginated, filterable discovery page. Browse with intent, not addiction. |
+| Algorithmic feed ranking using engagement signals | "Show popular content" feels natural | Requires engagement data volume that does not exist at v2.1 scale. An empty or manipulated feed is worse than no feed, and gaming and spam become a risk immediately. | Simple recency + ownerCount sort. Add engagement signals only when data volume and moderation infrastructure justify it. |
+| Open wiki-style catalog editing (anyone edits any item) | Fastest path to catalog enrichment | Data quality collapses without moderation: adversarial edits, edit wars. Requires revert/history infrastructure. Already decided out-of-scope in PROJECT.md. | Structured contributions: users submit items, agents bulk-seed with attribution, admins verify. Provenance fields track every change. |
+| Bulk catalog import from scraped external sources | "Just import all BikeGearDB items" | Copyright risk. Data quality issues. Stale data. Attribution is impossible when you do not know who owns the content. Legal exposure. | Agent seeding via MCP with explicit source tracking. Manual and agent entry creates a clean provenance chain with a `sourceUrl` per item. |
+| Real-time "X users viewing this" presence indicators | Social proof, FOMO feeling | Zero signal value at current traffic scale, adds WebSocket complexity, privacy concern for a utility tool. | ownerCount ("X people own this") is sufficient social proof without live presence tracking. |
+| Comments on catalog items or setups | Community enrichment, Q&A | Freeform UGC explicitly blocked in PROJECT.md until moderation infrastructure exists. Moderation requires policy, tooling, reporting. | Structured fields only: tags, ratings, attribution. Defer freeform to future milestone after moderation is designed. |
+| Social follow / activity feed | "See what friends added" | Social graph is a separate product. Deferred explicitly in PROJECT.md. Notification infrastructure, feed ranking, retention loops all out of scope. | Public setup browsing by category or recency is sufficient discovery without requiring a follow graph. |
+| Infinite scroll personalized feed | "Netflix for gear" | Personalization requires user history. Unauthenticated visitors have no history. Personalized recommendations require ML infrastructure far beyond v2.1 scope. | Category-filtered browse + search. Personalization post-login once collection data exists is a v3+ feature. |

 ---

 ## Feature Dependencies

 ```
-[External Auth Provider]
-    |
-    v
-[Multi-User Data Model (userId FK on all entities)]
-    |
-    +---> [Postgres Migration] (concurrent access, auth provider needs Postgres)
-    |
-    +---> [User Profiles (public)]
-    |         |
-    |         +---> [Public Profile Pages]
-    |         |         |
-    |         |         +---> [Discovery Feed (browse users' public content)]
-    |         |
-    |         +---> [Setup Visibility Controls (public/private)]
-    |                   |
-    |                   +---> [Public Setup Detail Pages]
-    |                             |
-    |                             +---> [Copy/Fork Public Setups]
-    |
-    +---> [Global Item Database]
-              |
-              +---> [Search Global Items]
-              |
-              +---> [Link Personal Items to Global Items]
-              |         |
-              |         +---> [Owner Count ("X people own this")]
-              |         |
-              |         +---> [Crowd-Verified Specs (aggregated weight)]
-              |         |
-              |         +---> [Setup Appearances Count]
-              |         |
-              |         +---> [Real-World Weight Distribution]
-              |
-              +---> [Structured Reviews]
-              |         |
-              |         +---> [Review Dimensions per Category]
-              |         |
-              |         +---> [Average Ratings Display]
-              |
-              +---> [Item Detail Pages (aggregated hub)]
-              |         |
-              |         +---> [Setup Composition Insights]
-              |
-              +---> [Planning Thread Global Item Integration]
-                        |
-                        +---> [Candidate Auto-populate from Global DB]
+Public browse without login
+    └──prerequisite for──> Discovery landing page (needs unauth API render)
+    └──prerequisite for──> SEO-indexable catalog pages (bots must reach pages)
+    └──prerequisite for──> Setup impact preview teaser on public pages
+    └──prerequisite for──> Shareable URLs confirmed working without auth
+
+Catalog enrichment schema (attribution fields)
+    └──prerequisite for──> Agent-powered MCP catalog seeding (tools write into these fields)
+    └──prerequisite for──> Image attribution display (imageCredit field must exist)
+    └──prerequisite for──> Source provenance display on detail pages
+
+Agent-powered MCP catalog seeding tools
+    └──requires──> Catalog enrichment schema (attribution fields must exist first)
+    └──enhances──> Discovery landing feed (more items = richer feed)
+    └──enhances──> SEO surface area (more pages = more potential rankings)
+
+Discovery landing page
+    └──requires──> Public browse without login
+    └──requires──> Feed query API (recent setups + popular catalog items)
+    └──uses existing──> Catalog search (FAB overlay promoted to page hero)
+
+SEO metadata on catalog pages
+    └──requires──> Public browse without login (bots must reach pages)
+    └──depends on──> Crawlability solution (SSR or prerender for TanStack Router)
+    └──enhances──> Agent-seeded catalog (more items = more indexed pages)
+
+Setup impact preview teaser (public)
+    └──requires──> Public browse without login
+    └──depends on existing──> Impact preview feature (v1.3, already shipped)
 ```

 ### Dependency Notes

- **Multi-user data model is the absolute foundation.** Every feature depends on userId ownership. Items, setups, threads, categories, reviews -- all need user scoping. This is the biggest single migration.
- **Postgres migration is coupled with auth.** The external auth provider (Authentik, Keycloak) needs Postgres. Migrating the app DB at the same time avoids running two databases. Do these together.
- **Global item database is the second foundation.** Reviews, item detail pages, owner counts, crowd-verified specs, and planning thread integration all depend on canonical global item records. Without this, multi-user is just "LighterPack with accounts."
- **Structured reviews require global items.** Reviews attach to global items, not personal collection items. Otherwise reviews fragment across duplicate user-entered items with no way to aggregate.
- **Item detail pages are the integration point.** They combine global item specs, aggregated user data, reviews, owner count, and setup appearances. Should be built after all data sources exist.
- **Discovery feed requires profiles + public content.** Cannot browse without user identity and visibility controls producing public content to show.
- **Linking is the bridge.** Personal items link to global items. This single FK enables owner count, crowd-verified specs, weight distribution, and setup appearances. Prioritize this flow.
+- **Public browse is the prerequisite for every visitor-facing feature.** The auth middleware change must land first. The discovery landing page, SEO pages, impact preview teaser, and shareable URLs all depend on unauthenticated API access working correctly.
+- **Catalog enrichment schema must precede agent MCP tools.** The bulk create and import MCP tools write attribution fields. Building the tools before the schema means breaking changes to the tools later.
+- **SEO crawlability is the primary technical risk.** TanStack Router renders client-side. Many crawlers (social link previewers and some search bots) do not execute JavaScript, and those that do, such as Googlebot, render it in a deferred pass. Without SSR or a static prerender pass, catalog pages may be indexed late, incompletely, or not at all. This is a known gap with the current stack and needs a solution before SEO-targeted work makes sense. Defer SEO metadata work to P2 until crawlability is resolved.
+- **Agent seeding is high complexity but high leverage.** It is both a catalog population tool and a v2.1 launch enabler. Without sufficient catalog items, the discovery feed is thin and the platform feels empty. Prioritize MCP tooling early so catalog seeding can run in parallel with UI work.
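
The "browse freely, sign in to save" access model behind the first note can be sketched as a framework-agnostic decision function. This is an illustration under stated assumptions, not GearBox's middleware: the `IncomingRequest` shape, the `authorize` name, and the `PRIVATE_READ_PREFIXES` list are hypothetical. The sketch also assumes some GET routes (for example a user's own collection) stay private, which the source does not spell out.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

// Hypothetical request summary: userId is null when no session or API key is present.
interface IncomingRequest {
  method: HttpMethod;
  path: string;
  userId: string | null;
}

type Decision =
  | { allow: true }
  | { allow: false; status: 401; prompt: string };

// Assumption: reads under these prefixes expose per-user data and keep their guard.
const PRIVATE_READ_PREFIXES = ["/api/me", "/api/keys"];

function authorize(req: IncomingRequest): Decision {
  // Signed-in callers pass; per-resource ownership checks would happen downstream.
  if (req.userId !== null) return { allow: true };

  const isRead = req.method === "GET";
  const isPrivate = PRIVATE_READ_PREFIXES.some(
    (prefix) => req.path === prefix || req.path.startsWith(prefix + "/"),
  );

  // Anonymous visitors may read anything that is not explicitly private.
  if (isRead && !isPrivate) return { allow: true };

  // Everything else gets a contextual prompt instead of a login wall on browse.
  return { allow: false, status: 401, prompt: "Sign in to save" };
}
```

Keeping the decision in a pure function makes the shareable-URL audit easy to express as a table of request and expected-decision pairs.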

 ---

 ## MVP Definition

-### Launch With (v2.0 Platform Foundation)
+This is a subsequent milestone on an existing shipped product. MVP here means the minimum needed to deliver the v2.1 goal: a public-first discovery platform.

- [ ] **External auth provider integration** -- Nothing works without multi-user identity
- [ ] **Postgres migration** -- Required for concurrent access; auth provider dependency
- [ ] **Multi-user data model** -- userId on items, setups, threads, categories; data isolation
- [ ] **User profiles (minimal)** -- Display name, avatar, bio; public profile page
- [ ] **Setup visibility controls** -- Public/private toggle, default private
- [ ] **Public setup detail pages** -- Shareable read-only view with attribution
- [ ] **Global item database with seed data** -- Schema, admin seeding, search
- [ ] **Link personal items to global items** -- Association flow in collection UI
- [ ] **Structured reviews** -- Overall rating + dimension ratings on global items
- [ ] **Item detail pages** -- Aggregated specs, owner count, average ratings
- [ ] **Discovery browse page** -- Recent public setups, recently reviewed, popular items
+### Launch With (v2.1 core)

-### Add After Validation (v2.x)
+- [ ] Public browse without login — lift auth guard from all GET routes. Every other feature depends on this.
+- [ ] Discovery landing page — replaces dashboard for unauthenticated visitors. Catalog search hero + two feed sections (recent setups, popular catalog items). Recency + ownerCount sort, no algorithm.
+- [ ] Catalog enrichment schema migration — add `brand`, `manufacturer`, `sourceUrl`, `sourceType`, `imageCredit`, `imageSourceUrl`, `contributedBy` fields. Schema first, UI follows.
+- [ ] Image attribution display on catalog detail pages — "Photo: [credit]" below product images, sourced from new `imageCredit` field.
+- [ ] Agent MCP catalog seeding tools — bulk create endpoint/tool, structured import with attribution, dry-run/preview mode, batch result reporting.
+- [ ] Initial catalog population via agent — run agent seeding for 3-5 priority categories (bikepacking bags, tents, sleeping bags, navigation devices, cycling computers). Target: 100+ catalog items with attribution.
+- [ ] Community usage signals surfaced — ownerCount and "appears in N setups" count prominent on catalog cards and detail pages.
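
The schema migration and seeding-tool items in the list above fit together roughly as follows. This is a hedged sketch: the field names follow the list, and the `sourceType` values follow the enum given in the differentiators table. The `CatalogSeedItem` type, the `validateSeedItem` and `bulkCreate` functions, the `BatchResult` shape, and the validation rules themselves are assumptions for illustration, not the project's MCP tool contract.

```typescript
type SourceType = "manufacturer" | "community" | "agent" | "import";
const SOURCE_TYPES: readonly SourceType[] = ["manufacturer", "community", "agent", "import"];

// Field names follow the migration list; the required/optional split is an assumption.
interface CatalogSeedItem {
  name: string;
  brand: string;
  manufacturer: string;
  sourceUrl: string;
  sourceType: SourceType;
  contributedBy: string; // userId or agent identifier string
  imageCredit?: string;
  imageSourceUrl?: string;
}

interface BatchResult {
  dryRun: boolean;
  accepted: number; // rows that were (or, in a dry run, would be) written
  rejected: { index: number; reason: string }[];
}

// Returns a rejection reason, or null when the row carries full attribution.
function validateSeedItem(item: Partial<CatalogSeedItem>): string | null {
  if (!item.name?.trim()) return "name is required";
  if (!item.brand?.trim()) return "brand is required";
  if (!item.manufacturer?.trim()) return "manufacturer is required";
  if (!item.sourceUrl || !/^https?:\/\//.test(item.sourceUrl)) return "sourceUrl must be an http(s) URL";
  if (!item.sourceType || !SOURCE_TYPES.includes(item.sourceType)) return "sourceType is invalid";
  if (!item.contributedBy?.trim()) return "contributedBy is required";
  if (item.imageSourceUrl && !item.imageCredit) return "imageCredit is required when an image is supplied";
  return null;
}

// `store` stands in for the catalog table; a dry run validates without writing.
function bulkCreate(items: Partial<CatalogSeedItem>[], store: CatalogSeedItem[], dryRun: boolean): BatchResult {
  const result: BatchResult = { dryRun, accepted: 0, rejected: [] };
  items.forEach((item, index) => {
    const reason = validateSeedItem(item);
    if (reason !== null) {
      result.rejected.push({ index, reason });
      return;
    }
    if (!dryRun) store.push(item as CatalogSeedItem); // safe: all required fields were checked
    result.accepted += 1;
  });
  return result;
}
```

Running the same batch first with `dryRun` set to `true` and then `false` gives the preview-then-commit flow the list describes, with per-row rejection reasons serving as the batch result report.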

- [ ] **Crowd-verified specs display** -- "Manufacturer: 450g, Community avg: 478g" (needs 3+ owners per item to be meaningful)
- [ ] **Setup composition insights** -- "Commonly paired with" co-occurrence analysis
- [ ] **Planning thread global item integration** -- Candidates auto-populate from global DB
- [ ] **Popular gear rankings by category** -- Most owned, highest rated per category
- [ ] **Copy/fork public setups** -- One-click template from public setups
- [ ] **Review dimension customization** -- Admin configures rating dimensions per product category
- [ ] **Real-world weight distribution** -- Histogram on item detail pages
- [ ] **Global item suggestion workflow** -- Users propose new items for admin review
+### Add After Core is Stable (v2.1.x)

-### Future Consideration (v3+)
+- [ ] Contextual "See how this affects your setup" CTA on public catalog pages — setup impact preview teaser with login prompt. Add once public browse is confirmed stable.
+- [ ] Manufacturer/brand filter on catalog browse — add brand as a filterable facet. Only valuable once catalog volume justifies filtering (target: after initial seeding).
+- [ ] SEO metadata on catalog pages — `<title>`, OG tags, JSON-LD Product schema. Add after crawlability solution is determined.

- [ ] **Freeform reviews with moderation** -- After moderation infrastructure exists
- [ ] **Comments on setups** -- After moderation infrastructure exists
- [ ] **Follow users / activity feed** -- After discovery model is validated
- [ ] **OAuth / social login** -- After external auth provider is stable
- [ ] **Trusted contributor program** -- Verified users can edit global item specs
+### Future Consideration (v2.2+)
+
+- [ ] Personalized discovery feed post-login — requires collection data volume and recommendation design.
+- [ ] Verified catalog item badge — admin-marked verified items. Requires admin tooling.
+- [ ] User-submitted catalog enrichment — structured form to suggest corrections or add missing items. Requires contribution review workflow.
+- [ ] Engagement signals in feed — view count, saves. Requires data volume to be meaningful.

 ---

@@ -181,122 +142,57 @@ Features that set GearBox apart from LighterPack, GearGrams, Trailspace, and MyG

 | Feature | User Value | Implementation Cost | Priority |
 |---------|------------|---------------------|----------|
-| External auth provider | HIGH | HIGH | P1 |
-| Postgres migration | HIGH | HIGH | P1 |
-| Multi-user data model (userId on entities) | HIGH | HIGH | P1 |
-| User profiles (basic) | HIGH | LOW | P1 |
-| Setup visibility controls | HIGH | LOW | P1 |
-| Public setup detail pages | HIGH | MEDIUM | P1 |
-| Global item database (schema + seed) | HIGH | HIGH | P1 |
-| Link personal items to global items | HIGH | MEDIUM | P1 |
-| Search global items | HIGH | MEDIUM | P1 |
-| Structured reviews | HIGH | MEDIUM | P1 |
-| Item detail pages (aggregated) | HIGH | HIGH | P1 |
-| Discovery browse page | MEDIUM | MEDIUM | P1 |
-| Crowd-verified specs | HIGH | LOW | P2 |
-| Setup composition insights | MEDIUM | MEDIUM | P2 |
-| Planning thread global DB integration | MEDIUM | MEDIUM | P2 |
-| Copy/fork public setups | MEDIUM | LOW | P2 |
-| Popular gear rankings | MEDIUM | LOW | P2 |
-| Freeform reviews + moderation | MEDIUM | HIGH | P3 |
-| Follow users | LOW | MEDIUM | P3 |
-| Setup comments | LOW | MEDIUM | P3 |
+| Public browse without login | HIGH | LOW | P1 |
+| Discovery landing page | HIGH | MEDIUM | P1 |
+| Catalog enrichment schema (attribution fields) | HIGH | LOW | P1 |
+| Image attribution display | MEDIUM | LOW | P1 |
+| Agent MCP catalog seeding tools | HIGH | HIGH | P1 |
+| Initial catalog population (agent run) | HIGH | MEDIUM (depends on MCP tools) | P1 |
+| Community usage signals (ownerCount visible) | MEDIUM | LOW | P1 |
+| Shareable URL audit (confirm unauth render) | HIGH | LOW | P1 |
+| Setup impact preview teaser (public) | MEDIUM | MEDIUM | P2 |
+| Brand/manufacturer filter on catalog browse | LOW | LOW | P2 |
+| SEO metadata on catalog pages | MEDIUM | MEDIUM (crawlability dependency) | P2 |
+| Personalized discovery feed | MEDIUM | HIGH | P3 |
+| Verified catalog badge | LOW | MEDIUM | P3 |
+| User-submitted enrichment form | LOW | MEDIUM | P3 |

 **Priority key:**
- P1: Must have for v2.0 platform launch
- P2: Should have, add in v2.x once core is validated
- P3: Future consideration, requires new infrastructure (moderation, notifications)
+- P1: Required for v2.1 milestone goal
+- P2: Add once v2.1 core is validated
+- P3: Future consideration, requires new infrastructure

 ---

 ## Competitor Feature Analysis

-| Feature | LighterPack | GearGrams | Trailspace | MyGear | GearBox v2.0 |
-|---------|-------------|-----------|------------|--------|-------------|
-| Gear lists/setups | Yes, drag-and-drop | Yes, trip-based | No (review only) | Yes, "Locker" | Yes, named setups with classification |
-| Weight tracking | Base/worn/consumable | Carried/worn/consumable | No | Basic | Base/worn/consumable + unit conversion + donut charts |
-| User profiles | Minimal (no bio) | Minimal | Review history page | Full social profile | Display name, avatar, bio, public setups |
-| Sharing | Public link, embed code | Public link | N/A | Social feed posts | Public/private toggle, shareable URLs |
-| Global item database | No (all user-entered) | No | Yes (editorial catalog) | No | Yes, seeded + crowd-enriched with verified specs |
-| Structured reviews | No | No | Yes (summary/pros/cons + rating) | Basic star rating | Dimension ratings per product category |
-| Item aggregation | No | No | Editorial scores only | No | Owner count, avg weight, setup appearances, crowd specs |
-| Discovery/browse | No | No | Browse by category | AI-tagged social feed | Browse setups, items, popular gear (intent-driven, not feed) |
-| Purchase research | No | No | Price comparison links | No | Planning threads with candidates, ranking, impact preview |
-| Crowd-verified specs | No | No | No | No | Manufacturer vs. community-measured weight comparison |
-| Mobile app | No | Yes (iOS/Android) | No | Yes (iOS/Android) | No (responsive web, per project constraint) |
-
-### Competitive Positioning
-
-GearBox occupies a unique niche: the only platform combining **gear management** (LighterPack's strength), **structured community reviews** (Trailspace's strength), and **crowd-verified specs** (nobody does this). The planning threads feature has no direct competitor equivalent in the gear domain.
-
-**Key advantages over each competitor:**
- **vs. LighterPack:** Global item database eliminates manual spec entry. Multi-user with profiles and sharing. Structured reviews provide community intelligence.
- **vs. GearGrams:** Richer comparison tools (planning threads). Crowd-verified specs. Item detail pages with aggregated data.
- **vs. Trailspace:** Not just reviews -- full gear management and setup composition. Users own and track their gear, not just review it. Crowd ratings, not editorial-only.
- **vs. MyGear:** Not social-first (no engagement loops, no AI tagging gimmicks). Utility-focused: research decisions, verify specs, compare options. Hobby-agnostic data model.
-
-**Accepted gaps:**
- No mobile native app (web-first, responsive design sufficient per project constraints)
- No social feed in the Instagram sense (intentional: discovery-first, not social-first)
- No freeform text content (intentional: structured input only until moderation exists)
-
---
-
-## Implementation Notes for Key Features
-
-### Global Item Database Schema
-
-The global item table is distinct from user items. It represents canonical products:
-
- `globalItems`: id, brand, model, name (display), categoryId, manufacturerWeightGrams, manufacturerPriceCents, productUrl, imageFilename, description, createdAt, updatedAt, createdByUserId
- User items get optional `globalItemId` FK for linking
- Admin-seeded initially; later users can suggest additions via a proposal workflow
-
-### Structured Review Schema
-
- `reviews`: id, userId, globalItemId, overallRating (1-5), createdAt, updatedAt
- `reviewDimensionRatings`: id, reviewId, dimensionId, rating (1-5)
- `reviewDimensions`: id, categoryId, name (e.g., "durability", "packability"), sortOrder
- Unique constraint: one review per user per global item
- Dimensions are per-category, admin-defined
-
-### Discovery Feed Approach
-
-Not a personalized algorithmic feed. Three content streams, each a simple sorted query:
-
-1. **Recent public setups** -- ORDER BY createdAt DESC, paginated
-2. **Recently reviewed items** -- Global items with recent reviews, ORDER BY latest review date
-3. **Popular gear** -- Global items ORDER BY linked owner count DESC
-
-No recommendation engine. No engagement scoring. Users browse with intent.
-
-### User Profile Data
-
-Minimal profile extending the auth provider's user record:
-
- Display name (from auth provider or custom)
- Avatar URL (from auth provider or uploaded)
- Bio (short text, 280 char limit)
- Joined date
- Public setups list (derived from setup visibility)
- Review count (derived)
- Collection size (count of items, public stat)
+| Feature | LighterPack | BikeGearDatabase | RTINGS | GearBox v2.1 |
+|---------|-------------|------------------|--------|--------------|
+| Browse without login | Yes (shared list links only) | Yes (all content public) | Yes (fully public) | Yes — all catalog + setups public |
+| Discovery landing page | No (no general discovery; only shared list links are public) | Yes (editorial feed + categories) | Yes (category browse + new/updated) | Yes — catalog search hero + community feed |
+| Global gear catalog | No (fully user-entered) | Editorial reviews only | Product test database | Yes — crowd + agent-seeded with attribution |
+| Image attribution | N/A (no images) | Editorial photo credit | Manufacturer-supplied images | Explicit imageCredit + imageSourceUrl fields |
+| Community setups visible publicly | Yes (shared list links) | No | No | Yes — public setups with weight data |
+| Setup weight analysis | Yes (per list) | No | No | Yes + impact preview |
+| Agent-friendly catalog API (MCP) | No | No | No | Yes — unique differentiator |
+| SEO catalog pages | No | Yes (editorial articles) | Yes (programmatic product pages) | Target for v2.1.x after crawlability resolved |
+| Provenance / source tracking | No | Editorial byline only | "Tested by RTINGS staff" | Yes — sourceType enum, contributedBy, sourceUrl |

 ---

 ## Sources

- [LighterPack](https://lighterpack.com/) -- Gear list builder, community standard for ultralight hikers. Public sharing via link, no profiles or reviews.
- [LighterPack tutorial (99Boulders)](https://www.99boulders.com/lighterpack-tutorial) -- Feature overview including sharing, linking, limitations.
- [GearGrams](https://www.geargrams.com/) -- Trip-based gear list tracker with weight classification.
- [Trailspace](https://www.trailspace.com/) -- User gear reviews with structured Summary/Pros/Cons format and Review Corps program.
- [Trailspace Review Form](https://www.trailspace.com/blog/2012/02/29/new-gear-review-form.html) -- Details on structured review fields with category-specific suggestions.
- [MyGear](https://mygear.world/) -- Social app for sports gear with Locker, feed, AI gear recognition, challenges.
- [Outdoor Gear Lab](https://www.outdoorgearlab.com/) -- Professional structured gear reviews with side-by-side comparison methodology.
- [Ultralight App](https://trailsmag.net/blogs/hiker-box/ultralight-the-gear-tracking-app-i-m-leaving-lighterpack-for) -- LighterPack alternative analysis showing community pain points.
- [Ready Set Sim](https://www.readysetsim.com/) -- Sim racing gear profiles and build sharing (cross-domain reference for hobby-agnostic patterns).
- [GetStream Social Feed Architecture](https://getstream.io/blog/social-media-feed/) -- Feed implementation patterns and anti-patterns.
+- [LighterPack](https://lighterpack.com/) — public list sharing model, community usage patterns. Public browse only via shared links, no general discovery. (MEDIUM confidence, WebSearch)
+- [Bike Gear Database](https://www.bikegeardatabase.com/) — public editorial gear catalog, category browse patterns, ~30k monthly visitors. (MEDIUM confidence, WebSearch)
+- [RTINGS SEO Case Study — Ahrefs](https://ahrefs.com/blog/rtings-seo-case-study/) — programmatic SEO via catalog pages, category-based navigation, discovery-oriented layout. (MEDIUM confidence)
+- [NN/G E-commerce Homepages and Listing Pages](https://www.nngroup.com/articles/ecommerce-homepages-listing-pages/) — subcategory surfacing above listings improves discoverability; 30-50% of product interactions come from unintended category navigation. (HIGH confidence)
+- [Sales Layer MCP Server for catalog enrichment](https://www.saleslayer.com/ai-pim/mcp) — agent-powered product information management, bulk update patterns, audit and quality scoring via MCP tools. (MEDIUM confidence)
+- [Creative Commons Attribution Best Practices](https://wiki.creativecommons.org/wiki/Recommended_practices_for_attribution) — TASL attribution standard; attribution must be visible and associated with the image. (HIGH confidence)
+- [Pixsy Image Credits Guide](https://www.pixsy.com/image-licensing/image-credits) — legal requirements and UX placement for image credits; "image courtesy of" as standard phrasing. (HIGH confidence)
+- [GS1 Image Standards](https://orbitvu.com/blog/gs1-image-standards-how-automation-can-help-effective-product-representation/) — product image metadata standards including GTIN linkage and consistent attribution for catalog platforms. (MEDIUM confidence)
+- PROJECT.md — existing feature set, out-of-scope decisions, constraints, v2.1 milestone definition. (HIGH confidence, first-party)

 ---
-*Feature research for: GearBox v2.0 Platform Foundation -- multi-user gear discovery platform*
-*Researched: 2026-04-03*
+
+*Feature research for: GearBox v2.1 Public Discovery — public-first gear discovery platform*
+*Researched: 2026-04-09*
--- a/.planning/research/PITFALLS.md
+++ b/.planning/research/PITFALLS.md
@@ -1,314 +1,187 @@
 # Pitfalls Research

-**Domain:** Single-user to multi-user gear platform migration (GearBox v2.0)
-**Researched:** 2026-04-03
-**Confidence:** HIGH (based on direct codebase analysis of v1.4 + established migration patterns)
+**Domain:** Public-first discovery platform with catalog enrichment (GearBox v2.1)
+**Researched:** 2026-04-09
+**Confidence:** HIGH (based on direct codebase inspection of v2.0 + verified ecosystem patterns)
+
+> v2.0 migration pitfalls (SQLite→Postgres, single→multi-user) are archived in git history.
+> This document covers pitfalls specific to the v2.1 milestone: public access model, discovery feed, catalog enrichment, and agent-powered seeding.
+
+---

 ## Critical Pitfalls

-### Pitfall 1: Missing userId Filters Leak Data Between Users
+### Pitfall 1: Frontend Auth Guard Blocks All New Public Routes

 **What goes wrong:**
-Every query in the existing codebase operates without a `userId` filter. After adding `userId` columns to `items`, `categories`, `threads`, `setups`, and `settings`, any service function not updated to filter by `userId` will return or mutate other users' data. The current `getAllItems()` returns `db.select().from(items).innerJoin(...)` with zero WHERE clauses. One missed function means User A sees User B's gear.
-
-The surface area is large: 6 service files, 19 MCP tools, 7 route files, aggregate queries in `totals`, the `duplicateItem` function, the `getCollectionSummary` MCP resource, setup-item joins, and thread resolution (which creates a new item).
+The root layout (`__root.tsx`) hard-redirects any unauthenticated visitor to `/login` unless they are already on `/users/*` or `/login`. Once public routes are added (a discovery landing page at `/` and a public catalog at `/global-items/` that is meant to be the new entry point), they will silently redirect anonymous users before rendering anything. The server already correctly skips auth middleware for `GET /api/global-items` (line 136 of `src/server/index.ts`), but the frontend guard is a separate allowlist that has not been updated.

 **Why it happens:**
-Developers add `userId` to the schema, update the obvious CRUD functions, but miss edge cases. The codebase has enough query sites (~30+) that manual "find all queries" misses something. Thread resolution is particularly dangerous because it creates an item as a side effect of updating a thread.
+The client-side guard and the server-side middleware allowlist live in different files (`__root.tsx` vs `server/index.ts`) and can drift. Developers add routes to the server-side skip list but forget the frontend guard, then wonder why authenticated users see the feature but unauthenticated visitors hit the login page.

 **How to avoid:**
-1. Enable Postgres Row-Level Security (RLS) as a safety net -- even if the app filters by `userId`, RLS prevents cross-user access at the database level.
-2. Add `userId` as NOT NULL to the Drizzle schema first, then use TypeScript compiler errors to find every query that needs updating (insert calls will fail where `userId` is required but not provided).
-3. Write one integration test per entity: create data as User A, query as User B, assert empty results.
-4. Grep the codebase for every `.from(items)`, `.from(categories)`, `.from(threads)`, `.from(setups)`, `.from(settings)` and verify each has a `userId` filter.
+Refactor the auth guard before building any public UI. Invert the logic: instead of allowlisting public routes, define a small `PROTECTED_ROUTES` set (collection, planning, settings, threads) and use TanStack Router's `beforeLoad` to protect those specific routes. Everything else renders without auth. The root layout should not gate render — it should only determine which UI chrome elements to show based on auth state.
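
The inverted guard described above can be sketched as a pure function. This is a minimal sketch, not the project's implementation: the route prefixes, the `redirect` query parameter, and the function names are assumptions chosen for illustration.

```typescript
// Hypothetical prefixes for the protected areas named above
// (collection, planning, settings, threads); the real paths may differ.
const PROTECTED_PREFIXES = ["/collection", "/planning", "/settings", "/threads"] as const;

// True only for a protected prefix itself or a path nested under it,
// so "/collections" does not accidentally match "/collection".
function isProtectedRoute(pathname: string): boolean {
  return PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
}

// Returns the login URL to redirect to, or null when the route may render.
// Everything that is not explicitly protected renders without auth.
function resolveRedirect(pathname: string, isAuthenticated: boolean): string | null {
  if (isAuthenticated || !isProtectedRoute(pathname)) return null;
  return `/login?redirect=${encodeURIComponent(pathname)}`;
}
```

In a TanStack Router setup, a check like this would typically run inside each protected route's `beforeLoad` rather than in the root layout, so new public routes need no allowlist entry.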

 **Warning signs:**
- Any service function that does not accept a `userId` parameter after migration.
- Tests that pass without specifying which user is performing the action.
- MCP tools that work without user context.
+- Loading `/global-items/` in a private browser window redirects to `/login`
+- The `isPublicRoute` check in `__root.tsx` is a string allowlist that grows as features are added
+- New routes work for authenticated users but are invisible to anonymous users during testing

 **Phase to address:**
-Multi-user data model phase. This is the single most important thing to get right. Do not add public content or discovery features until every query is provably user-scoped.
+Public access auth model phase — must be the first change made. Every other public feature depends on this being correct.

 ---

-### Pitfall 2: Category Name Uniqueness Breaks in Multi-User
+### Pitfall 2: `useAuth()` Spinner Blocks Public Page First Contentful Paint

 **What goes wrong:**
-The current schema has `name: text("name").notNull().unique()` on the `categories` table -- a global unique constraint. When User A creates a "Bikepacking" category, User B cannot. The migration must change this to a composite unique constraint on `(userId, name)`.
+The root layout shows a full-screen spinner while `useAuth()` resolves. For authenticated users this is imperceptible (~50ms for a cached session). For anonymous visitors on a public discovery page, this is 300–800ms of blank white screen before any content appears — because the auth check hits `/api/auth/me` which must complete before the page renders. This directly undercuts "public-first" positioning.
+
+Additionally, `useOnboardingComplete()` fires for all users. For anonymous visitors it will hit an auth-required endpoint and produce a 401. Even if the UI that depends on it is conditionally rendered, verify that the hook itself does not fetch when `isAuthenticated` is false.

 **Why it happens:**
-Single-user apps use simple unique constraints. Developers add `userId` to the table but forget to update the unique constraint from `unique(name)` to `unique(userId, name)`. The migration runs fine on an empty database but fails the moment a second user creates a category with a common name.
+Login-first apps legitimately gate the entire UI on auth resolution — there is nothing useful to show an unauthenticated user. The same pattern applied to a public discovery page creates a perceived login wall.

 **How to avoid:**
-Audit every `.unique()` constraint in the schema during migration. `categories.name` must become a composite unique on `(userId, name)`. The `users.username` unique stays global (desired). No other tables currently have unique constraints, but new tables (reviews, products) should use composite uniqueness from the start.
+Public routes must render immediately with unauthenticated defaults. Auth state loads in the background and hydrates progressive elements (nav user avatar, "Add to collection" CTAs) without blocking content. Use React Query's `enabled: isAuthenticated` on all hooks that call auth-required endpoints. The `useAuth()` query itself should never block page render — only auth-gated actions should wait on it.

 **Warning signs:**
- Database constraint errors when a second user creates categories.
- Tests that only ever use one user.
+- Full-screen spinner visible to anonymous visitors on the landing page
+- Lighthouse FCP score degrades after the public access change
+- Network tab shows 401 on `/api/settings` or `/api/totals` for logged-out users

 **Phase to address:**
-Multi-user data model phase, during schema migration.
+Public access auth model phase — same as Pitfall 1, tackled together.

 ---

-### Pitfall 3: Drizzle Schema Rewrite Is a Replacement, Not a Migration
+### Pitfall 3: Root-Level Components Fire Auth-Required Queries for Anonymous Users

 **What goes wrong:**
-Drizzle ORM schemas are dialect-specific. The current schema imports from `drizzle-orm/sqlite-core` and uses `sqliteTable`, `integer().primaryKey({ autoIncrement: true })`, and `real()`. The Postgres schema must import from `drizzle-orm/pg-core` and use `pgTable`, `serial()` or `integer().generatedAlwaysAsIdentity()`, and `doublePrecision()`. This is not a migration Drizzle can auto-generate -- it requires a full schema rewrite and a fresh migration history.
-
-Specific differences that will cause bugs if missed:
- `integer("id").primaryKey({ autoIncrement: true })` becomes `serial("id").primaryKey()` or `integer("id").primaryKey().generatedAlwaysAsIdentity()`.
- `integer("created_at", { mode: "timestamp" })` -- SQLite stores timestamps as integers. Postgres has native `timestamp` type. Must decide: keep integer storage or switch to Postgres `timestamp()`.
- `real("weight_grams")` -- SQLite `REAL` is 8-byte float. Postgres `real` is 4-byte float (less precision). Use `doublePrecision()` for equivalent behavior.
- SQLite `text("status")` with string values works as pseudo-enum. Postgres has native `pgEnum` for type safety.
- The `Db` type alias (`typeof prodDb`) changes entirely -- every service file and MCP tool imports this type.
+`TotalsBar` is rendered at the root layout level for all routes and calls `useTotals()` which hits `GET /api/totals`. The auth middleware does not skip `/api/totals` for GET requests (verified in `server/index.ts`) — it requires a `userId`. Anonymous visitors will receive a 401 on every public page load, and React Query will retry the failed query three times. `FabMenu`, `CatalogSearchOverlay`, `AddToCollectionModal`, and `AddToThreadModal` are similarly rendered at root level and may trigger auth-gated operations.

 **Why it happens:**
-Developers assume Drizzle abstracts away database differences. It does not at the schema layer. The query builder is mostly compatible, but schema definition is dialect-specific by design.
+Root layout components were designed when every user was authenticated. Adding public routes does not automatically suppress these components' data fetches.

 **How to avoid:**
-1. Write a new `schema.ts` from scratch using `pg-core`, not edit the existing one.
-2. Start a fresh Drizzle migration history for Postgres. SQLite migrations are irrelevant.
-3. Write a data migration script that reads from old SQLite and inserts into new Postgres.
-4. Update the `Db` type alias in all service files.
-5. Use `doublePrecision()` not `real()` for weight values to maintain precision parity with SQLite.
+Audit every component rendered in the root layout. For each one: (1) does it make an API call? (2) does that endpoint require auth? If yes, add `enabled: isAuthenticated` to the query, or conditionally render the component itself behind `{isAuthenticated && <TotalsBar />}`. `TotalsBar` should not appear on the new public discovery landing page at all — it is a user-specific widget.
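
A minimal sketch of gating a root-level query on auth state. The `Totals` shape and the `totalsQueryOptions` helper are hypothetical; `queryKey`, `queryFn`, and `enabled` are the standard React Query option names, declared here as a local interface so the sketch stays dependency-free.

```typescript
// Hypothetical response shape for GET /api/totals.
interface Totals {
  itemCount: number;
  totalWeightGrams: number;
}

// Structural subset of the React Query options object.
interface GatedQueryOptions<T> {
  queryKey: readonly unknown[];
  queryFn: () => Promise<T>;
  enabled: boolean;
}

function totalsQueryOptions(isAuthenticated: boolean): GatedQueryOptions<Totals> {
  return {
    queryKey: ["totals"],
    queryFn: async () => {
      const res = await fetch("/api/totals");
      if (!res.ok) throw new Error(`GET /api/totals failed with ${res.status}`);
      return (await res.json()) as Totals;
    },
    // While false, the query never runs, so anonymous visitors
    // never trigger the 401 or the retries that follow it.
    enabled: isAuthenticated,
  };
}
```

Passing the result to `useQuery` would keep the fetch dormant for logged-out visitors; hiding the component entirely on public pages remains the simpler option where the widget is irrelevant.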

 **Warning signs:**
- Weight values losing precision (245.5g becoming 245.49999...).
- Timestamps behaving differently (integer epoch vs. native timestamp).
- drizzle-kit refusing to generate migrations against the wrong dialect.
+- Network tab shows 401 on `/api/totals` for anonymous users
+- React Query error boundaries firing on public pages for components that are not relevant to anonymous users
+- Console shows `[auth] OIDC auth failed` log spam from root-level queries

 **Phase to address:**
-Database migration phase. Must complete before any other v2.0 feature.
+Public access auth model phase — audit and guard every root-level component before deploying the public landing page.

 ---

-### Pitfall 4: Test Infrastructure Collapses During Database Switch
+### Pitfall 4: Discovery Feed Built as Per-Card Queries (N+1)

 **What goes wrong:**
-The entire test infrastructure is built on SQLite. `createTestDb()` uses `bun:sqlite` with `Database(":memory:")` and `drizzle-orm/bun-sqlite`. E2E tests use a file-based SQLite (`e2e/test.db`). After switching to Postgres, every test needs a Postgres connection -- no more in-memory databases.
+A discovery feed showing popular public setups or recently added catalog items typically starts as a list query followed by per-item detail fetches. For example: `getAllPublicSetups()` returns 20 setup IDs, then the frontend or backend fetches each setup's item count, owner display name, and total weight individually. At 20 items this is invisible; at 100+ items or with multiple feed sections it causes 2+ second response times and high DB connection pressure.

-The MCP server hard-codes `db as prodDb` which is an SQLite Drizzle instance. The Hono context variable type for `db` changes. Every route handler that does `c.get("db")` gets a different type.
+The existing `getPublicSetupWithItems()` service function is optimized for a single-setup detail view. Reusing it in a loop for a feed is the most common trap.

 **Why it happens:**
-In-memory SQLite is the best testing story in the Bun ecosystem -- fast, isolated, no external services. Postgres testing requires either: (a) a running Postgres instance, (b) testcontainers with Docker, or (c) PGlite (lightweight Postgres in WebAssembly). Developers delay updating tests and end up with a broken test suite for weeks.
+Developers reach for familiar service functions. The function works. Performance issues only appear under real data volumes, not in development with 3 test setups.

 **How to avoid:**
-1. Adopt PGlite (`@electric-sql/pglite`) for unit/integration tests. It provides in-memory Postgres without Docker. Drizzle supports PGlite via `drizzle-orm/pglite`.
-2. Update `createTestDb()` to use PGlite instead of bun:sqlite.
-3. For E2E tests, use Docker Compose with a test Postgres instance, or PGlite if performance is acceptable.
-4. Update the Hono context variable type to the new Postgres Drizzle instance type.
-5. Migrate test infrastructure in the same phase as the schema, not after.
+Write dedicated feed query functions using Drizzle joins from day one. A single SQL query should return all feed cards with their aggregates (item count, total weight in grams, owner display name). Add PostgreSQL indexes on `setups.is_public`, `setups.created_at`, and `setups.updated_at` before building the feed query. Mirror the pattern already used for aggregate totals (computed via SQL on read, not stored).
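
As a rough illustration of the single-query shape, the sketch below pairs one aggregate SQL statement with a row mapper. Only `setups.is_public` and `setups.created_at` come from the text above; the `users`, `setup_items`, and `items` table and column names are assumptions, and the actual implementation would presumably express the same joins through Drizzle.

```typescript
// One round trip returns every feed card with its aggregates.
// Table and column names other than setups.is_public / setups.created_at are assumed.
const PUBLIC_SETUP_FEED_SQL = `
  SELECT s.id, s.name, s.created_at,
         u.display_name AS owner_display_name,
         COUNT(si.item_id) AS item_count,
         COALESCE(SUM(i.weight_grams), 0) AS total_weight_grams
  FROM setups s
  JOIN users u ON u.id = s.user_id
  LEFT JOIN setup_items si ON si.setup_id = s.id
  LEFT JOIN items i ON i.id = si.item_id
  WHERE s.is_public = true
  GROUP BY s.id, s.name, s.created_at, u.display_name
  ORDER BY s.created_at DESC
  LIMIT $1
`;

// Some Postgres drivers return COUNT/SUM results as strings, so accept both.
interface FeedRow {
  id: number;
  name: string;
  owner_display_name: string;
  item_count: string | number;
  total_weight_grams: string | number;
}

interface FeedCard {
  setupId: number;
  name: string;
  ownerDisplayName: string;
  itemCount: number;
  totalWeightGrams: number;
}

// Normalizes one aggregate row into the shape a feed card would consume.
function toFeedCard(row: FeedRow): FeedCard {
  return {
    setupId: row.id,
    name: row.name,
    ownerDisplayName: row.owner_display_name,
    itemCount: Number(row.item_count),
    totalWeightGrams: Number(row.total_weight_grams),
  };
}
```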

 **Warning signs:**
- `bun test` fails across the board after schema change.
- "Type 'BunSQLiteDatabase' is not assignable to type 'PgDatabase'" errors everywhere.
- E2E tests silently skipped or disabled "temporarily."
+- Feed query time scales linearly with results count
+- `pg_stat_statements` shows repeated single-row lookups for users or items
+- Adding a second feed section doubles total response time

 **Phase to address:**
-Database migration phase. Tests must migrate alongside the schema.
+Discovery landing page phase — design feed queries as joins from the first implementation, not as a later optimization.

 ---

-### Pitfall 5: Auth Provider Integration Breaks Existing Sessions, API Keys, and MCP
+### Pitfall 5: Image Attribution Stored as Unstructured Text

 **What goes wrong:**
-The current auth stores users, sessions, and API keys in the local database. Switching to an external auth provider means: (1) user identity moves external, (2) session management changes (JWT or OAuth flow vs. cookie sessions), (3) existing API keys become orphaned because they reference the old user table, (4) the MCP server authenticates via API keys stored locally, (5) E2E tests authenticate via `POST /api/auth/login` with a seeded user, (6) the onboarding flow (`POST /api/auth/setup`) creates the first user.
+If image attribution for catalog items is stored as a single `attribution: text` field (the fastest approach), it becomes impossible to: programmatically render a copyright badge, distinguish manufacturer press images from community uploads from AI-generated placeholders, enforce a "no scraped retailer images" policy, or filter catalog items by image source type. Agent-seeded catalog items will have inconsistent attribution formats that are expensive to clean up retroactively.
+
+The current `globalItems` schema has only `imageUrl: text`. There is no `imageSourceType` or structured attribution.

 **Why it happens:**
-Auth migration is treated as "swap the login page" when it touches the entire authentication surface: user identity, session lifecycle, API key management, MCP authentication, E2E test setup, and onboarding.
+"We'll add a text note" is the zero-friction path. Attribution structure seems like a nice-to-have until you need to answer "how many catalog items have manufacturer-licensed images?" or build a compliance filter.

 **How to avoid:**
-1. Keep API keys in the local database even after auth moves external. API keys are long-lived credentials managed by the application, not the auth provider.
-2. Map external provider user IDs to a local `users` table. The external provider handles authentication; the local table handles application-level data (userId foreign keys, API keys, preferences). Foreign keys reference local `users.id`, not the provider's UUID.
-3. Replace the onboarding flow: instead of "create admin account," it becomes "sign up via external provider, first user gets admin role."
-4. Update E2E tests to either mock the auth provider or use API key authentication exclusively for E2E.
+Define a structured attribution model at schema design time before any seeding. Minimum: `imageSourceType: text` (enum: `manufacturer`, `community`, `agent_seeded`, `no_image`), `imageAttribution: text` (human-readable credit line), and `imageSourceUrl: text` (already exists on items but not on globalItems). This allows source-type-specific rendering and filtering without a schema migration mid-catalog-build.
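
A minimal sketch of what the structured model enables on the rendering side. The field names follow the paragraph above; the `CatalogImageMeta` type, the fallback wording, and the `creditLine` helper are assumptions for illustration.

```typescript
type ImageSourceType = "manufacturer" | "community" | "agent_seeded" | "no_image";

// Hypothetical view of the attribution columns on a catalog item.
interface CatalogImageMeta {
  imageUrl: string | null;
  imageSourceType: ImageSourceType;
  imageAttribution: string | null;
  imageSourceUrl: string | null;
}

// Fallback credit per source type when no human-readable credit was stored.
const DEFAULT_CREDIT: Record<ImageSourceType, string | null> = {
  manufacturer: "Image: manufacturer",
  community: "Image: community contributor",
  agent_seeded: "Image: source unverified",
  no_image: null,
};

// Returns the credit text to show next to the image, or null when there is nothing to credit.
function creditLine(meta: CatalogImageMeta): string | null {
  if (meta.imageUrl === null || meta.imageSourceType === "no_image") return null;
  return meta.imageAttribution
    ? `Image courtesy of ${meta.imageAttribution}`
    : DEFAULT_CREDIT[meta.imageSourceType];
}
```

With a single free-text field, neither the per-type fallback nor a filter such as "manufacturer-sourced images only" would be possible without parsing strings.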

 **Warning signs:**
- MCP server stops working after auth migration.
- E2E tests that log in via `POST /api/auth/login` all fail.
- API keys created before migration stop working.
- No local `users` table -- everything delegated to external provider.
+- Seeding agent instructions say "put attribution in the description field"
+- Catalog items display images without any credit indication
+- No way to query "show me only manufacturer-sourced images"

 **Phase to address:**
-Auth migration phase. Should be done early because user identity is the foundation.
+Catalog enrichment infrastructure phase — schema changes must be in the migration before any seeding begins.

 ---

-### Pitfall 6: Global Item Database Creates a Data Model Fork
+### Pitfall 6: Agent Catalog Seeding Creates Duplicate Global Items

 **What goes wrong:**
-The current `items` table represents user-owned gear. The v2.0 vision includes a "global item database" with manufacturer specs. These are fundamentally different entities: a user's item has quantity, personal notes, setup associations, and belongs to a user. A global item is a product definition with canonical specs, owned by nobody. Conflating them in one table (via `isGlobal` flag or `NULL userId`) creates an unmaintainable mess. Separating them creates a sync problem.
+The `globalItems` table currently has no unique constraint on `(brand, model)`, so running an MCP agent seeding pass twice creates duplicate rows for the same product. Agents also retry on API errors, compounding the issue. The current `create_item` MCP tool creates a new row unconditionally because it was designed for personal collection management, where duplicates are intentional (a user can own two of the same item). Reusing it for catalog seeding provides no deduplication.

 **Why it happens:**
-It seems efficient to add an `isGlobal` flag. But then queries need to handle both cases, user items need to link to global items for spec inheritance, and the API surface doubles with different permission models.
+The catalog seeding flow is built on top of existing personal item tools because they are already available via MCP. The semantic mismatch (user-owned vs. global reference item) is not obvious until duplicates appear.

 **How to avoid:**
-1. Create a separate `products` table for the global database. A product has: name, manufacturer, canonical weight, canonical price, product URL, image, category.
-2. User `items` gets a nullable `productId` foreign key. When set, the item inherits specs from the product but can override them (user's measured weight vs. manufacturer spec).
-3. User items without a `productId` are standalone (backward-compatible with all existing items).
-4. Reviews, owner counts, and setup appearances link to `products`, not user `items`.
+Add a unique constraint on `globalItems(brand, model)` as part of the catalog enrichment schema migration. Create a dedicated `upsert_catalog_item` MCP tool or admin API endpoint that uses `ON CONFLICT (brand, model) DO UPDATE` semantics. This tool should be explicitly different from personal collection tools: no `userId`, upsert behavior, admin-scoped access.
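
The upsert semantics can be illustrated with an in-memory stand-in keyed on `(brand, model)`. This is a sketch of the behavior only: the `InMemoryCatalog` class and its field names are hypothetical, and the real tool would rely on the database constraint and `ON CONFLICT (brand, model) DO UPDATE` instead of a `Map`.

```typescript
interface CatalogItemInput {
  brand: string;
  model: string;
  weightGrams?: number;
}

interface CatalogItem extends CatalogItemInput {
  id: number;
}

// In-memory stand-in for a table with a unique constraint on (brand, model).
class InMemoryCatalog {
  private rows = new Map<string, CatalogItem>();
  private nextId = 1;

  // Inserts a new row, or updates the existing row with the same (brand, model).
  upsert(input: CatalogItemInput): CatalogItem {
    const key = JSON.stringify([input.brand, input.model]);
    const existing = this.rows.get(key);
    const row: CatalogItem = existing
      ? { ...existing, ...input }
      : { id: this.nextId++, ...input };
    this.rows.set(key, row);
    return row;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Running the same seed twice leaves one row and keeps its id stable.
const catalog = new InMemoryCatalog();
const first = catalog.upsert({ brand: "Apidura", model: "Backcountry Food Pouch" });
const second = catalog.upsert({ brand: "Apidura", model: "Backcountry Food Pouch", weightGrams: 60 });
```

Because the key is the exact `(brand, model)` pair, spelling variants would still produce separate rows; any normalization policy would need to be decided separately.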

 **Warning signs:**
- `items` table query complexity increases beyond what is reasonable.
- Ambiguity about whether an operation affects "my item" or "the global product."
- Permission model becomes unclear (who can edit a global product?).
+- Catalog search returns two entries for the same product ("Apidura Backcountry Food Pouch")
+- Owner count on a duplicate item is 0 because user-owned items link to the wrong copy
+- Re-running a seed script doubles the catalog size

 **Phase to address:**
-Global item database phase. Must come after multi-user data model is stable.
+Catalog enrichment infrastructure phase — unique constraint and upsert endpoint before any agent seeding run.

 ---

-### Pitfall 7: Image Storage Migration Breaks Existing URLs and the MCP Tool
+### Pitfall 7: Storing Third-Party Product Images in S3 Creates Legal and Cost Exposure

 **What goes wrong:**
-Images are stored in `./uploads/` on the filesystem, served via `app.use("/uploads/*", serveStatic({ root: "./" }))`, and referenced by `imageFilename` in the database. Moving to object storage changes URLs from `/uploads/uuid.jpg` to `https://bucket.s3.region.amazonaws.com/uuid.jpg`. Every existing `imageFilename` reference becomes a broken image.
-
-Both `items` and `threadCandidates` have `imageFilename` and `imageSourceUrl` fields. The MCP tool `upload_image_from_url` saves to the local filesystem. The image route `POST /api/images` saves to `./uploads/`.
+The existing `upload_image_from_url` MCP tool fetches a URL and saves it to MinIO/S3. If an agent uses it to seed manufacturer product images from brand websites, retailer pages, or Amazon listings, the catalog ends up storing and publicly serving copyright-protected images. That creates: (1) legal liability for hosting images without a license (US statutory damages can reach $150,000 per work for willful infringement); (2) storage and egress costs that grow with public traffic; (3) dependency on external URLs that 404 silently when retailers change their CDN paths.

 **Why it happens:**
-The current design stores only the filename, not the full URL. The serving path is implicit (prepend `/uploads/`). When storage moves to S3, the "prepend `/uploads/`" pattern breaks.
+"Just grab the product image from the brand website" produces accurate images immediately. It feels like fair use. It is not — attribution does not create a license, and copyright does not require a watermark or notice.

 **How to avoid:**
-1. Add a reverse proxy route: keep `/uploads/*` working but proxy to S3 instead of local filesystem. This maintains backward compatibility during transition.
-2. Or migrate `imageFilename` to store full URLs. Existing filenames get prefixed with the S3 URL during data migration.
-3. Write a migration script that uploads all `./uploads/` files to S3 and updates database references.
-4. Update `POST /api/images`, `POST /api/images/from-url`, and the MCP `upload_image_from_url` tool to write to S3.
-5. Create an image storage abstraction layer so dev can use local filesystem and production uses S3.
+Define a clear image sourcing policy before seeding begins. Safest options in order: (1) store `imageUrl` as a reference to the external source without copying to S3; (2) use manufacturer-provided press/media kit images that explicitly grant redistribution; (3) use Creative Commons–licensed images from Wikimedia Commons or similar. Option (1) avoids hosting the file but is effectively hotlinking, so it keeps the external dependency and the silent-404 risk; treat it as a stopgap rather than the long-term answer for publicly served pages. Document which sources are permitted in the seeding agent's prompt. Distinguish permitted images from unverified ones using `imageSourceType`.
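
One way to make the policy enforceable in code is a small gate keyed on `imageSourceType`. The source does not define the allowed values, so the four variants below are assumptions chosen to match the three options in the text plus an "unverified" default. `mayCopyToS3` is a hypothetical helper name.

```typescript
// Assumed values for imageSourceType; the actual enum is not defined in the source.
type ImageSourceType =
  | "external_reference" // URL stored only, nothing copied
  | "press_kit"          // manufacturer media kit that grants redistribution
  | "creative_commons"   // CC-licensed, e.g. from Wikimedia Commons
  | "unverified";        // anything the seeding agent could not classify

// Only sources with an explicit redistribution grant may be copied to storage.
const PERMITTED_FOR_COPY: ReadonlyArray<ImageSourceType> = [
  "press_kit",
  "creative_commons",
];

// Returns true when an image of this source type may be stored in S3/MinIO.
function mayCopyToS3(sourceType: ImageSourceType): boolean {
  return PERMITTED_FOR_COPY.indexOf(sourceType) !== -1;
}
```

A seeding tool could call a check like this before invoking `upload_image_from_url`, and fall back to storing no image when the check fails.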

 **Warning signs:**
- Broken images after deployment.
- Mixed URLs (some `/uploads/`, some `https://s3...`) in the database.
- MCP tool `upload_image_from_url` silently failing.
+- Seeding instructions tell the agent to call `upload_image_from_url` on Amazon product listing URLs
+- All catalog items have `imageFilename` values from manufacturer/retailer URLs
+- No documented image licensing policy before seeding starts

 **Phase to address:**
-Infrastructure phase. Should be done before discovery/public profiles (which serve images to many users).
+Catalog enrichment infrastructure phase — establish policy and `imageSourceType` schema before any seeding.

 ---

-### Pitfall 8: Thread Resolution Creates Items Without Proper User Scoping
+### Pitfall 8: MCP Catalog Tools Share the Seeding Agent's Personal userId

 **What goes wrong:**
-Thread resolution copies a candidate's data into a new item. In multi-user, the newly created item must inherit the thread owner's `userId`. If the resolution logic does not explicitly set `userId` on the new item, it either fails (NOT NULL constraint) or creates an orphaned item.
-
-This is a specific instance of Pitfall 1 but deserves its own callout because resolution is a multi-step transaction: update thread status, set `resolvedCandidateId`, create new item. Any step that forgets `userId` breaks the chain.
+The MCP server binds every tool invocation to the `userId` of the authenticated API key or OAuth token. When an agent uses a regular user API key to create catalog items, those items are implicitly associated with that user's account context. This creates two problems: (1) catalog items appear in the seeding user's personal collection or produce permission collisions; (2) running the seeding agent as a specific user creates a "ghost user" with thousands of catalog entries that pollutes collection analytics and owner counts.

 **Why it happens:**
-The resolution logic is tested as a unit but the test does not set a `userId` because none existed. After adding `userId`, the test still passes if using a default/NULL value. The bug only surfaces with a second user.
+There is no separation between personal collection MCP tools and catalog admin tools in the current implementation. The `userId` context flows through all tool handlers automatically.

 **How to avoid:**
-1. Make `userId` NOT NULL on all entity tables from day one.
-2. Update `resolveThread` to accept and propagate `userId`.
-3. Write a test: resolve thread as User A, verify created item belongs to User A.
+Catalog write operations must not carry a personal `userId`. Options: (1) create a separate admin-scoped API key that maps to a "system" user with no personal collection; (2) build dedicated catalog MCP tools that explicitly ignore `userId` for the globalItems table while still requiring authentication for authorization; (3) use a separate REST endpoint (`POST /api/admin/catalog-items`) with admin-only auth, bypassing the user-scoped MCP tools entirely.
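
The common thread in the options above is that authentication stays, but the personal `userId` never reaches the catalog write. A minimal sketch of that separation, assuming a hypothetical `scope` field on API keys (the source does not say how admin scope is represented):

```typescript
// Hypothetical auth context; field names are illustrative only.
type Scope = "user" | "admin";

interface ApiKeyContext {
  keyId: string;
  scope: Scope;
  userId: string | null; // personal owner of the key, if any
}

// What a catalog write handler receives: deliberately no userId.
interface CatalogWriteContext {
  actorKeyId: string; // kept for audit logging, not for ownership
}

// Authorizes a catalog write and drops the personal userId from the context.
function authorizeCatalogWrite(ctx: ApiKeyContext): CatalogWriteContext {
  if (ctx.scope !== "admin") {
    throw new Error("catalog writes require admin scope");
  }
  return { actorKeyId: ctx.keyId };
}
```

Because the returned context has no `userId` field, a catalog handler built on it cannot attach seeded items to anyone's personal collection by accident.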

 **Warning signs:**
- Items appearing in the wrong user's collection after resolution.
- Thread resolution failing with constraint violations.
+- Running the seeding agent creates items visible in someone's personal collection
+- Owner count on seeded global items starts at 1 (the seeding user's implicit ownership)
+- Catalog items appear in the seeding user's dashboard totals

 **Phase to address:**
-Multi-user data model phase.
-
---
-
-### Pitfall 9: Public Content Without Explicit Privacy Controls
-
-**What goes wrong:**
-The v2.0 plan includes "public user profiles with shared setups" and a "discovery feed." Without explicit visibility controls, the default state is ambiguous: are new setups public? Are all items in a public setup visible? Can someone discover gear a user has not chosen to share? Users expecting a private gear tracker are surprised when their collection appears in search results.
-
-**Why it happens:**
-The developer defaults to "everything public" because it is simpler to build discovery features. Privacy controls are added as an afterthought, requiring a retroactive audit of all existing data.
-
-**How to avoid:**
-1. Default to private. Every entity (setup, profile) is private unless explicitly published.
-2. Add a `visibility` column (`private` | `public`) to setups. Items are visible publicly only through public setups.
-3. User profiles are private by default. Public profile is opt-in.
-4. Public API endpoints (discovery, search) only query entities with `visibility = 'public'`.
-5. Build the visibility model in the data layer before building any discovery UI.
-
-**Warning signs:**
- No `visibility` or `isPublic` column in the schema.
- Discovery queries that do not filter by visibility.
- User complaints about unexpected data exposure.
-
-**Phase to address:**
-Multi-user data model phase (add visibility columns) and discovery phase (enforce in queries).
-
---
-
-### Pitfall 10: SQLite-Specific Patterns That Silently Break on Postgres
-
-**What goes wrong:**
-The codebase has SQLite-specific patterns that will not error but will behave differently on Postgres:
- `src/db/index.ts` runs `PRAGMA journal_mode = WAL` and `PRAGMA foreign_keys = ON` -- Postgres has no PRAGMAs. Foreign keys are always enforced. WAL is always on.
- `bun:sqlite` is used as the driver. Postgres needs `postgres` (postgres.js) or `pg` (node-postgres) as the driver.
- The existing Drizzle migrator import is `drizzle-orm/bun-sqlite/migrator`. Postgres uses `drizzle-orm/node-postgres/migrator` or `drizzle-orm/postgres-js/migrator`.
- SQLite allows inserting strings into integer columns silently. Postgres will error.
- SQLite `AUTOINCREMENT` guarantees IDs never reuse. Postgres `serial` sequences do not reuse IDs either, but they can leave gaps (for example after a rolled-back insert), so code must not assume consecutive IDs.
- The test helper's `Database(":memory:")` has no Postgres equivalent without PGlite.
-
-**Why it happens:**
-These patterns are invisible in a working SQLite app. They only surface during or after the switch, often as runtime errors in production.
-
-**How to avoid:**
-1. Remove all PRAGMA statements when switching to Postgres.
-2. Replace `bun:sqlite` driver with `postgres` (postgres.js is recommended for Bun compatibility).
-3. Update all migrator imports.
-4. Run the full test suite against Postgres to catch type strictness differences.
-5. Use `serial` or `identity` columns for auto-increment; accept that IDs may have gaps (this should not matter for a web app).
-
-**Warning signs:**
- "PRAGMA" in the Postgres codebase.
- `bun:sqlite` imports anywhere in production code after migration.
- Tests passing against SQLite but failing against Postgres.
-
-**Phase to address:**
-Database migration phase.
-
---
-
-### Pitfall 11: Setup-Item Delete-All-Reinsert Pattern Causes Phantom Reads
-
-**What goes wrong:**
-The current setup item sync uses delete-all-then-re-insert: `DELETE FROM setup_items WHERE setupId = X`, then re-insert all items. In single-user SQLite this is fine. In multi-user Postgres with concurrent writes: (a) race conditions if two users modify setups simultaneously, (b) brief windows where a public setup appears empty to concurrent readers.
-
-**Why it happens:**
-The pattern was chosen for simplicity (noted in CLAUDE.md: "Simpler than diffing, atomic in transaction"). "Atomic in transaction" only holds if the transaction isolation level prevents phantom reads, which is not the default in Postgres (`READ COMMITTED`).
-
-**How to avoid:**
-1. Wrap in an explicit transaction with `SERIALIZABLE` or `REPEATABLE READ` isolation for the sync operation.
-2. Or switch to diff-based approach for public setups: compare existing vs. new list, delete removed, insert added.
-3. For private setups, the delete-reinsert pattern with a basic transaction is acceptable.
-
-**Warning signs:**
- Public setups briefly appearing empty.
- Foreign key violations in concurrent scenarios.
-
-**Phase to address:**
-Multi-user data model phase, when updating the setup service.
-
---
-
-### Pitfall 12: Existing Data Has No Owner After Multi-User Migration
-
-**What goes wrong:**
-The existing SQLite database has items, categories, threads, setups -- all without a `userId` column. When the schema adds `userId NOT NULL`, the existing data needs an owner. If the migration script does not assign existing data to the original user, the data is either lost (NOT NULL violation prevents migration) or orphaned.
-
-**Why it happens:**
-The developer writes the new schema with `userId NOT NULL`, runs `db:push`, and the migration fails because existing rows have no `userId`. The "fix" is to make `userId` nullable, which undermines the entire data isolation model.
-
-**How to avoid:**
-1. The data migration script must: (a) create the original user in the new system, (b) assign all existing data to that user's ID, (c) then apply the NOT NULL constraint.
-2. Migration order: create tables with `userId` nullable, insert data with the owner's userId, then ALTER to NOT NULL.
-3. Verify row counts match before and after migration.
-
-**Warning signs:**
- `userId` column is nullable in the final schema "because of migration."
- Existing data missing after migration.
- Migration script that only handles schema, not data.
-
-**Phase to address:**
-Database migration phase, specifically the data migration step.
+Catalog enrichment infrastructure phase — design catalog write path before building seeding tooling.

 ---

@@ -316,121 +189,116 @@ Database migration phase, specifically the data migration step.

 | Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
 |----------|-------------------|----------------|-----------------|
-| Keeping SQLite test infrastructure while developing Postgres features | Tests keep passing during migration | Two database dialects to maintain, false confidence from tests that do not match production | Never -- migrate tests alongside schema |
-| Storing both old `/uploads/` paths and new S3 URLs | Avoid data migration script | Every image-rendering component handles both URL formats forever | Only as a 1-2 week transition |
-| Using `userId` as nullable during migration | Existing data does not need backfilling | Every query must handle NULL userId, privacy bugs when userId is missing | Only during the migration transaction itself, then enforce NOT NULL |
-| Skipping RLS and relying only on app-level userId filtering | Faster to implement | Single missed WHERE clause = data leak | Never for multi-user platforms |
-| Deferring visibility controls to "after discovery ships" | Ship discovery faster | Retroactive privacy audit, potential data exposure, user trust damage | Never |
-| Keeping the local `users` table password hash after external auth | Avoid migration complexity | Dead column confuses future developers, potential security liability | Never -- remove password hash column after auth migration |
+| Single `isPublicRoute` allowlist in `__root.tsx` | Simple to reason about | Every new public route requires updating this list; lists drift | Never — use per-route `beforeLoad` guards on protected routes instead |
+| Reuse personal item MCP tools for catalog seeding | No new tools to build | Creates wrong userId semantics, no deduplication, wrong ownership | Never for bulk ops — build a dedicated catalog upsert tool |
+| `attribution: text` free-form field for image credit | Zero schema change | Cannot programmatically distinguish source types, filter, or enforce licensing policy | Only for internal admin-only catalog; never for public content |
+| Hotlink external product images without copying to S3 | Zero storage cost | Silent 404s when retailers change CDN URLs; external dependency | Only for dev/prototype with a clear plan to replace |
+| Discovery feed as multiple React Query calls per card | Familiar pattern | N+1 queries degrade at scale; visible at ~30 feed cards | Only for MVP with < 20 items and a committed optimization plan |
+| No unique constraint on `globalItems(brand, model)` | Faster initial schema | Duplicate catalog entries after every re-seed or agent retry | Never — add the constraint before any seeding |
+
+---

 ## Integration Gotchas

 | Integration | Common Mistake | Correct Approach |
 |-------------|----------------|------------------|
-| External auth provider | Removing the local `users` table entirely | Keep a local `users` table with `externalId` (from auth provider) + local fields (preferences, API keys). Foreign keys reference local `users.id`, not the external provider's UUID. |
-| External auth provider | Storing user profile data in the auth provider and querying it at runtime | Store only identity in auth provider. Sync user profile to local `users` table on login. Application queries local table only. |
-| External auth provider | Using auth provider's session tokens directly as API authentication | Auth provider handles login/logout. Application mints its own session after verifying the auth provider's token. This decouples session lifecycle from the provider. |
-| S3-compatible object storage | Using the S3 SDK directly in route handlers | Create an image storage abstraction (interface with `upload`, `getUrl`, `delete`). Swap implementations (local filesystem for dev, S3 for production) via environment config. |
-| Postgres driver | Assuming `bun:sqlite` patterns work with Postgres | Postgres uses `postgres` (postgres.js) or `pg`. Connection pooling, async queries, and error handling differ. SQLite is synchronous; Postgres is async. Service functions may need to become async. |
-| Postgres | Assuming SQLite PRAGMA behaviors exist | Postgres has no PRAGMAs. Foreign keys are always on. WAL is always on. Remove all PRAGMA code. |
-| Drizzle ORM Postgres driver | Using synchronous `.get()` and `.all()` query methods | SQLite Drizzle uses `.get()` (sync). Postgres Drizzle uses `.execute()` or `await` on queries. Every service function that calls `.get()` or `.all()` must be updated. |
+| Logto OIDC + public routes | `oidcAuthMiddleware()` throws or redirects when there is no session, breaking public routes | Use `getAuth(c)` which returns null gracefully for unauthenticated requests; only apply `oidcAuthMiddleware()` on login-gated routes |
+| MCP tools + catalog seeding | Using user-scoped tools (bound to API key owner's `userId`) to write global catalog entries | Build separate catalog admin tools or a REST endpoint that writes to `globalItems` without personal userId semantics |
+| MinIO/S3 + public catalog | Using presigned URLs (which expire) for catalog image delivery | Catalog item images need stable public paths or a CDN URL; presigned URLs are for user-private content only |
+| TanStack Router `beforeLoad` + auth check | `beforeLoad` that re-fetches auth on every navigation creates a waterfall | Read from React Query cache (already has 5-min `staleTime` in `useAuth`); `beforeLoad` should read cached auth state, not re-fetch |
+| PostgreSQL + public feed queries | Missing indexes on `is_public` and `created_at` cause full-table scans | Add a composite index on `(is_public, created_at)` to the setups table before the feed goes live |
+
+---

 ## Performance Traps

 | Trap | Symptoms | Prevention | When It Breaks |
 |------|----------|------------|----------------|
-| N+1 queries in discovery feed | Feed page takes 2+ seconds | Use joins or batch queries for setups with items and categories | 50+ setups in feed, each with 10+ items |
-| Unindexed `userId` columns | All queries slow after adding userId filtering | Add indexes on `userId` for every table. Composite indexes for `(userId, categoryId)` on items. | 1000+ items across 50+ users |
-| Full-table scans for aggregates | Dashboard slow for large collections | Current aggregates are computed via SQL on read. Add materialized views or cache for public setup totals. | 100+ items per user, or public setups viewed by 100+ visitors |
-| Image serving from app server | Server CPU/bandwidth saturated | Serve images from S3/CDN. Current `serveStatic` for uploads hits the app server for every request. | 100+ concurrent users browsing image-heavy pages |
-| Global product search without full-text index | Product search slow or inaccurate | Use Postgres full-text search (`tsvector`/`tsquery`) or `pg_trgm` trigram index. | 10,000+ products |
-| Synchronous service functions on Postgres | Request timeouts, connection pool exhaustion | SQLite Drizzle is sync. Postgres Drizzle is async. Service functions that were sync must become async. | Any usage under load |
+| Per-card queries in discovery feed | Feed loads in > 2s; each section multiplies DB time | Single JOIN query returning all feed card data with aggregates | At ~30 items in feed |
+| Auth check blocking public FCP | Blank page with a spinner on first load; LCP degraded | Render public content immediately; auth state hydrates progressively | Immediately on first deploy; visible in Lighthouse |
+| Full-table scan on `globalItems` text search | Search feels fine at 18 items; slows visibly at 500+ | Add `pg_trgm` trigram index or `tsvector` GIN index before catalog grows | At ~200 catalog items |
+| Image egress costs without CDN | MinIO egress scales with public traffic | CDN in front of public catalog images, or store external `imageUrl` references | Once catalog is publicly discoverable |
+| React Query refetching public feed on every window focus | Unnecessary server load for anonymous browsing | Set appropriate `staleTime` (5–10 min) on public catalog/feed queries | At moderate traffic |
+
+---

 ## Security Mistakes

 | Mistake | Risk | Prevention |
 |---------|------|------------|
-| No RLS, relying only on app-level userId filtering | Single missed WHERE clause exposes all user data | Enable Postgres RLS on all user-owned tables. App filtering is primary; RLS is safety net. |
-| Public setup exposes private item details | Users share a setup but private notes/pricing leak | Public setup views project only public fields (name, weight, category). Define a "public item projection" and enforce it. |
-| API keys not scoped to users after auth migration | API key created by User A operates on User B's data | API keys must associate with a userId. After validation, the key's userId scopes all operations. |
-| Auth provider misconfigured for open self-registration | Random users create accounts without approval | Configure auth provider for admin-approval or invite-only registration. Test explicitly. |
-| Image upload accepts any file type | Stored XSS via SVG uploads, executable content | Validate MIME type on upload (JPEG, PNG, WebP only). Set `Content-Type` and `Content-Disposition` headers. Strip EXIF metadata. |
-| External auth provider callback URL not validated | OAuth redirect attack | Whitelist exact callback URLs in auth provider config. Never use wildcard redirect URIs. |
+| Regular user API key authorized to write global catalog items | Any user with an API key can pollute the shared catalog | Catalog write operations require admin scope or a designated system API key; regular user keys are read-only on globalItems |
+| Public setup pages exposing private item fields | Public setup view leaks item notes, threads, or product URLs not intended for sharing | Audit `getPublicSetupWithItems` — return only explicitly public fields (name, weight, image); strip notes and thread data |
+| No rate limiting on public catalog search endpoint | `GET /api/global-items?q=...` is unauthenticated; bots can enumerate or abuse it | Add basic rate limiting middleware to unauthenticated GET endpoints before making them discoverable |
+| `imageSourceUrl` storing retailer order URLs with auth tokens in query params | Private session or order data in stored URLs | Normalize and validate `imageSourceUrl` before storage; strip query params that resemble auth or session tokens |
+
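The `imageSourceUrl` normalization in the last row could look roughly like the sketch below. It uses only the standard `URL` API. The parameter denylist and the function name `sanitizeImageSourceUrl` are assumptions for illustration; a real list would need tuning against the retailers actually being seeded.

```typescript
// Hypothetical denylist of query parameter names that tend to carry credentials.
const SENSITIVE_PARAM =
  /^(token|access_token|auth|session|sessionid|sid|sig|signature|key|api_key)$/i;

// Returns a cleaned URL string, or null when the input is not an http(s) URL.
function sanitizeImageSourceUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not parseable as an absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return null;
  }

  // Collect names first; deleting while iterating can skip entries.
  const names: string[] = [];
  url.searchParams.forEach((_value, name) => names.push(name));
  for (const name of names) {
    if (SENSITIVE_PARAM.test(name)) {
      url.searchParams.delete(name);
    }
  }

  // Drop embedded credentials and fragments as well.
  url.username = "";
  url.password = "";
  url.hash = "";
  return url.toString();
}
```

A name-based denylist will miss tokens hidden under unusual parameter names, so an allowlist of known-safe parameters may be the stricter choice if the set of sources is small.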
+---

 ## UX Pitfalls

 | Pitfall | User Impact | Better Approach |
 |---------|-------------|-----------------|
-| Forcing existing single user to re-register via external auth | User loses access to their own data until they figure out new login | Migration path: on first visit after upgrade, guide user to create auth provider account and automatically link to existing data. |
-| Public profiles default to showing everything | Users surprised their gear list is public | Default profile to private. Public is opt-in with clear preview of what others see. |
-| Review system with only star ratings | Ratings without context are useless for gear decisions | Structured reviews with predefined fields (durability, weight accuracy, value) per category. "Weight is 15g heavier than listed" is actionable; a 4-star rating is not. |
-| Discovery feed dominated by one hobby | Users in other hobbies see irrelevant content | Category-based feed filtering. Show content relevant to user's categories. |
-| No indication of data ownership when browsing others' setups | User tries to edit someone else's setup and gets error | Clear visual distinction between "my setup" and "someone else's setup." Read-only view with "copy to my setups" action. |
-| Settings lost during migration | User's weight unit preference, onboarding state disappear | Migrate the `settings` table data alongside everything else. Map settings to the original user. |
+| Hard login wall immediately after discovery | Anonymous users discover value, click a setup, hit a login wall — they leave | Show full public setup/item detail to anonymous users; only prompt login at the point of a write action (add to collection) |
+| Empty state on catalog search with no query | Users expect to browse; zero results on open page is confusing | Return a curated/ranked set for empty queries (popular, recently added, or featured tags) |
+| Catalog feed with no images | Text-only cards look sparse and unfinished | Ensure most catalog items have images before the feed is public; add a styled placeholder with brand initial |
+| Replacing dashboard for logged-in users | Existing users lose their familiar personal dashboard entry point | Discovery page is the anonymous entry point; authenticated users see a hybrid or a personal dashboard — do not remove the existing dashboard |
+| Agent-seeded content displayed raw without quality review | Inconsistent formatting, wrong weights, or invalid product links visible publicly | Implement `status: draft \| published` on catalog items; agents create drafts, a review step publishes them |
+
+---

 ## "Looks Done But Isn't" Checklist

- [ ] **Multi-user data model:** Often missing userId on the `settings` table -- verify settings are user-scoped (weight unit preference, onboarding state).
- [ ] **Multi-user data model:** Often missing userId filter on `threadCandidates` queries that join through `threads` -- verify candidates are not directly queryable across users.
- [ ] **Multi-user data model:** Often missing userId on thread resolution -- verify `resolveThread` propagates userId to the newly created item.
- [ ] **Auth migration:** Often missing MCP server auth update -- verify MCP tools operate in context of the authenticated user, not as global admin.
- [ ] **Auth migration:** Often missing E2E test auth update -- verify E2E tests authenticate against new auth system or use API keys.
- [ ] **Auth migration:** Often missing API key userId association -- verify API keys created after migration are scoped to the creating user.
- [ ] **Database migration:** Often missing data migration script -- verify existing SQLite data is actually moved to Postgres, not just the schema.
- [ ] **Database migration:** Often missing timestamp conversion -- verify SQLite integer timestamps are correctly handled in Postgres schema.
- [ ] **Database migration:** Often missing weight precision check -- verify `real()` vs `doublePrecision()` does not lose decimal precision.
- [ ] **Database migration:** Often missing sync-to-async conversion -- verify all service functions are async after Postgres switch.
- [ ] **Image migration:** Often missing MCP tool update -- verify `upload_image_from_url` writes to S3, not local filesystem.
- [ ] **Image migration:** Often missing `imageSourceUrl` field -- verify source URL metadata is preserved during migration.
- [ ] **Public content:** Often missing visibility filtering on aggregate endpoints -- verify `/api/totals` only counts requesting user's items.
- [ ] **Reviews:** Often missing rate limiting -- verify a user cannot submit 100 reviews in a minute.
- [ ] **Discovery feed:** Often missing pagination -- verify feed does not load all public setups at once.
- [ ] **Global items:** Often missing product-vs-item distinction -- verify adding a product to global database does not add it to anyone's collection.
+- [ ] **Public route guard:** Routes `/`, `/global-items/`, `/global-items/:id`, and `/users/:id` render without redirect in a private browser window with no session cookies — verify manually before shipping
+- [ ] **Root-level component suppression:** No 401 responses in the network tab when browsing public pages as an anonymous user — `TotalsBar`, `FabMenu`, and `OnboardingWizard` must not fire auth-required queries
+- [ ] **Catalog deduplication:** Running the agent seed script twice does not increase the row count in `globalItems` — verify unique constraint exists and upsert behavior works
+- [ ] **Image attribution schema:** `globalItems` has `imageSourceType` column in the migration before any seeding starts — verify migration file exists and was applied
+- [ ] **Feed query efficiency:** Discovery feed data loads from a single JOIN query — verify using `EXPLAIN ANALYZE` or query logging, not by eyeballing response time
+- [ ] **Public setup privacy:** `getPublicSetupWithItems` response does not include item `notes`, thread data, or private product URLs — verify the response shape manually
+- [ ] **Catalog write authorization:** A regular user's API key cannot create or modify `globalItems` — verify the catalog tool/endpoint requires admin scope
+- [ ] **Image copyright policy:** Seeding instructions explicitly specify which image sources are permitted; no `upload_image_from_url` calls against brand/retailer URLs — verify in the agent prompt before any seeding run
+
+---

 ## Recovery Strategies

 | Pitfall | Recovery Cost | Recovery Steps |
 |---------|---------------|----------------|
-| Data leaked between users (missing userId filter) | HIGH | Audit all queries, add RLS immediately, notify affected users, review access logs. Reputation damage is the real cost. |
-| Broken images after storage migration | MEDIUM | Keep old uploads directory as fallback. Re-upload missing images. Update database references. |
-| Test suite broken for weeks during DB migration | MEDIUM | Pause feature work. Set up PGlite test infrastructure. Port tests one file at a time. |
-| Auth migration breaks MCP server | LOW | MCP server can fall back to API key auth (already implemented). Fix isolated to MCP auth middleware. |
-| Category unique constraint failures | LOW | Drop old unique constraint, add composite unique. Single transaction. |
-| Weight precision loss (SQLite real to Postgres real) | LOW | Alter column to `doublePrecision`. One-time verification script. |
-| Public data exposure before visibility controls | HIGH | Emergency: set all entities to private, deploy, then build visibility controls properly. Cannot undo exposure. |
-| Existing data orphaned after migration | MEDIUM | Re-run data migration script with correct userId assignment. Verify row counts. |
-| Service functions still sync after Postgres switch | MEDIUM | Systematic conversion of all service functions to async. Update all callers. TypeScript will catch most issues. |
+| Login redirect blocking public routes | LOW | Update `isPublicRoute` allowlist in `__root.tsx` and add server-side guard bypasses; redeploy; verify in incognito |
+| Duplicate catalog items from agent seeding | MEDIUM | Write a deduplication migration that merges duplicates while preserving owner links; add the unique constraint after the merge; re-run the seed in upsert mode |
+| Copyrighted images stored in S3 | HIGH | Identify affected items via `imageSourceType`; delete S3 objects; replace with permitted images or null `imageFilename`; legal review |
+| N+1 feed queries causing degraded response times | MEDIUM | Write an optimized JOIN query; the API response shape may change, requiring a frontend update; deploy both together |
+| Auth-scoped queries firing for anonymous users | LOW | Add `enabled: isAuthenticated` to each affected query; guard root-level components with auth check |
+| Catalog items created with seeding user's userId | MEDIUM | Migration to null out `userId` on globalItems created during seeding; update catalog write path to not accept userId |
+
+---

 ## Pitfall-to-Phase Mapping

 | Pitfall | Prevention Phase | Verification |
 |---------|------------------|--------------|
-| Missing userId filters (P1) | Multi-user data model | Integration tests: create as User A, query as User B, assert empty. RLS policies active. |
-| Category uniqueness (P2) | Multi-user data model | Two users create identically-named categories without constraint violations. |
-| Drizzle schema rewrite (P3) | Database migration | Schema compiles with pg-core. drizzle-kit generates valid Postgres migrations. Weight values maintain precision. |
-| Test infrastructure collapse (P4) | Database migration | `bun test` passes with PGlite. E2E tests pass against Postgres. No SQLite imports in test code. |
-| Auth provider breaks sessions/keys (P5) | Auth migration | Existing API keys work. MCP server authenticates. E2E tests pass. First-time setup works via external provider. |
-| Global item data model fork (P6) | Global item database | Separate `products` table exists. User items optionally reference a product. CRUD operations distinct. |
-| Image URL breakage (P7) | Infrastructure / Image storage | Existing images render. New uploads go to S3. MCP upload tool works. |
-| Thread resolution userId (P8) | Multi-user data model | Resolving a thread creates an item owned by the thread's owner. Tested with multiple users. |
-| Privacy/visibility (P9) | Multi-user data model + Discovery | Default is private. Public queries filter by visibility. No private data in discovery feed. |
-| SQLite-specific patterns (P10) | Database migration | No PRAGMAs in codebase. No bun:sqlite imports. All queries async. |
-| Setup sync race conditions (P11) | Multi-user data model | Concurrent setup modifications do not produce empty setups or constraint violations. |
-| Existing data ownership (P12) | Database migration | All existing data assigned to original user. Row counts match. userId NOT NULL enforced. |
+| Frontend auth guard blocks public routes (P1) | Public access auth model | Load `/global-items/` and `/` in private window — no redirect |
+| `useAuth()` spinner blocks public FCP (P2) | Public access auth model | Lighthouse FCP on landing page with cold cache — no full-screen spinner |
+| Root-level components 401 for anonymous users (P3) | Public access auth model | Zero 401 responses in network tab on public pages |
+| Discovery feed N+1 queries (P4) | Discovery landing page | `EXPLAIN ANALYZE` on feed endpoint confirms single query, no per-row loops |
+| Image attribution stored as free text (P5) | Catalog enrichment infrastructure | Schema review — `imageSourceType` column exists on `globalItems` before seeding |
+| Agent seeding creates duplicates (P6) | Catalog enrichment infrastructure | Run seed script twice — row count unchanged on second run |
+| Copyrighted images in S3 (P7) | Catalog enrichment infrastructure | Seeding instructions reviewed — no calls to `upload_image_from_url` on brand URLs |
+| Agent catalog tools carry personal userId (P8) | Catalog enrichment infrastructure | Seeded items have null userId or system userId; not in any user's collection |
+
+---

 ## Sources

- Direct codebase analysis of GearBox v1.4 (schema.ts, services, auth middleware, MCP server, test helpers, db/index.ts, E2E seed)
- [Drizzle ORM PostgreSQL documentation](https://orm.drizzle.team/docs/get-started/postgresql-new)
- [Drizzle ORM SQLite column types](https://orm.drizzle.team/docs/column-types/sqlite)
- [Drizzle ORM migrations documentation](https://orm.drizzle.team/docs/migrations)
- [SQLite to PostgreSQL migration pitfalls (Open WebUI discussion)](https://github.com/open-webui/open-webui/discussions/21609)
- [How to migrate from SQLite to PostgreSQL (Render)](https://render.com/articles/how-to-migrate-from-sqlite-to-postgresql)
- [Multi-tenant architecture guide (WorkOS)](https://workos.com/blog/developers-guide-saas-multi-tenant-architecture)
- [Multi-tenant vs single-tenant SaaS (Clerk)](https://clerk.com/blog/multi-tenant-vs-single-tenant)
- [Migrating file storage to Amazon S3 (DZone)](https://dzone.com/articles/migrating-file-storage-to-amazon-s3)
- [Drizzle ORM PostgreSQL best practices 2025 (GitHub Gist)](https://gist.github.com/productdevbook/7c9ce3bbeb96b3fabc3c7c2aa2abc717)
+- GearBox codebase: `src/client/routes/__root.tsx` — root auth guard and `isPublicRoute` allowlist (direct inspection)
+- GearBox codebase: `src/server/index.ts` — server-side public route bypass patterns (direct inspection)
+- GearBox codebase: `src/db/schema.ts` — `globalItems` table confirming no unique constraint on brand/model, no `imageSourceType` (direct inspection)
+- GearBox codebase: `src/server/mcp/index.ts` — MCP userId binding per API key (direct inspection)
+- [TanStack Router: Auth performance issue with recommended patterns (GitHub #3997)](https://github.com/TanStack/router/issues/3997)
+- [TanStack Router: Authenticated Routes documentation](https://tanstack.com/router/v1/docs/guide/authenticated-routes)
+- [Practical Ecommerce: Online Retailer's Guide to Photo Copyrights](https://www.practicalecommerce.com/Online-Retailers-Guide-to-Photo-Copyrights)
+- [MCP Idempotency: Best Practices 2025 (BytePlus)](https://www.byteplus.com/en/topic/542207)
+- [Six Fatal Flaws of MCP (Scalifiai, 2025)](https://www.scalifiai.com/blog/model-context-protocol-flaws-2025)
+- [Hostwinds: Hotlinking Pitfalls and How to Protect Yourself](https://www.hostwinds.com/blog/hotlinking-pitfalls-and-how-to-protect-yourself)

 ---
-*Pitfalls research for: GearBox v2.0 -- Single-user to multi-user platform migration*
-*Researched: 2026-04-03*
+*Pitfalls research for: GearBox v2.1 — Public-first discovery platform with catalog enrichment*
+*Researched: 2026-04-09*
--- a/.planning/research/STACK.md
+++ b/.planning/research/STACK.md
@@ -1,260 +1,333 @@
 # Stack Research

-**Domain:** Multi-user gear management platform (v2.0 platform additions)
-**Researched:** 2026-04-03
-**Confidence:** MEDIUM-HIGH
+**Domain:** Public-first gear discovery platform — catalog enrichment, discovery feed, agent-powered seeding (v2.1)
+**Researched:** 2026-04-09
+**Confidence:** HIGH (existing stack verified against package.json; additions verified against npm/official docs)

-This document covers ONLY the new stack additions for v2.0. The existing stack (React 19, Hono, Drizzle ORM, TanStack Router/Query, Tailwind CSS v4, Lucide React, Recharts, framer-motion, Zustand, Zod, Bun) is validated and unchanged.
+---

-## Recommended Stack
+## Context: What Already Exists (Do Not Re-Research)

-### Authentication -- Logto (Self-Hosted)
+The following are validated and in production at v2.0. This file covers ADDITIONS AND CHANGES only.

-| Technology | Version | Purpose | Why Recommended |
-|------------|---------|---------|-----------------|
-| Logto OSS | v1.36+ | External OIDC/OAuth 2.1 auth provider | TypeScript-native, purpose-built for app auth (not enterprise IAM), requires Postgres (shared infra), beautiful pre-built sign-in UI, React SDK with hooks, lightweight JWT validation on backend. MIT-licensed core. |
-| @logto/react | ^4.0.13 | React SDK for auth flows | LogtoProvider wraps app, provides useLogto() hook for sign-in/sign-out/token access. Handles OIDC redirect flow, token refresh, and user info. |
-| jose | ^6.2.2 | JWT validation on Hono backend | Zero-dependency, Bun-compatible, used to verify Logto-issued access tokens via JWKS. Recommended by Logto docs over heavier alternatives. |
+| Layer | Current |
+|-------|---------|
+| Runtime | Bun |
+| Frontend | React 19, TanStack Router/Query v5, Tailwind CSS v4, Zustand, Zod 4.x, framer-motion, Recharts, Lucide React |
+| Backend | Hono 4.12.x, Drizzle ORM 0.45.x, PostgreSQL (postgres.js 3.4.x driver) |
+| Auth | @hono/oidc-auth 1.8.x (Logto), API key auth, MCP OAuth 2.1 |
+| Storage | @aws-sdk/client-s3 3.x (MinIO) |
+| MCP | @modelcontextprotocol/sdk 1.29.x (19 tools) |
+| Rate limiting | Custom in-process Map (auth endpoints only, 5 req/15 min per IP) |

-**Why Logto over alternatives:**
+---

-| Provider | Why Not |
-|----------|---------|
-| Authentik | Python-based, heavyweight (designed for enterprise proxy/SSO), overkill for app-level auth. No React SDK -- requires raw OIDC integration. Better for infra-level SSO (Portainer, Grafana). |
-| Zitadel | Go-based, Kubernetes-first architecture, AGPL 3.0 license (copyleft since 2025). Stronger for multi-tenant B2B SaaS. Over-engineered for a single-product platform. |
-| SuperTokens | Session-based by default (not OIDC), requires embedding their middleware into your backend. Tighter coupling than external provider model. |
-| Keycloak | Java-based, heavy memory footprint (1-2GB RAM), complex admin UI. Industry standard but vastly over-scoped for this use case. |
+## New Capability Areas

-**Integration pattern:** Logto runs as a separate Docker container alongside Postgres. React app redirects to Logto's hosted sign-in page for auth flows. Hono backend validates JWT access tokens from the Authorization header using `jose` JWKS verification -- no Logto SDK needed on the backend, just standard OIDC token validation. User identity is the Logto `sub` claim (a stable string ID), stored as `userId` on all user-owned records.
+### 1. Public Access Auth Model

-**Backend middleware pattern (Hono):**
+**Starting point:** The `requireAuth` middleware in `src/server/middleware/auth.ts` already handles three auth paths (API key, OAuth Bearer, OIDC session). The skip-list pattern in `src/server/index.ts` already exempts public GETs on `/api/global-items`, `/api/tags`, `/api/users/:id/profile`, and `/api/setups/:id/public`.
+
+**This milestone extends the skip-list** to cover new discovery endpoints (`/api/discovery/*`). Additionally, a new `tryAuth` middleware variant is needed for endpoints that work for both anonymous and authenticated users — it resolves `userId` if credentials are present but does NOT 401 on absence. This enables auth-aware responses (e.g., annotating feed items with "in your collection" for logged-in users).
+
+**No new dependency.** Pure middleware logic — add `tryAuth` to `auth.ts`, update skip-list in `index.ts`.
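+
+A minimal sketch of what `tryAuth` could look like, assuming a simplified context interface in place of Hono's `Context`. `AuthContext`, `CredentialResolver`, and `makeTryAuth` are hypothetical names, and treating invalid credentials as anonymous (rather than rejecting them) is an assumption the real middleware may not share:
+
+```typescript
+// Hypothetical minimal context; the real middleware would receive Hono's Context.
+interface AuthContext {
+  header(name: string): string | undefined;
+  set(key: "userId", value: number | null): void;
+}
+
+// A resolver returns a userId for valid credentials, or null when none are present.
+type CredentialResolver = (c: AuthContext) => Promise<number | null>;
+
+function makeTryAuth(resolvers: CredentialResolver[]) {
+  return async (c: AuthContext, next: () => Promise<void>): Promise<void> => {
+    let userId: number | null = null;
+    for (const resolve of resolvers) {
+      try {
+        userId = await resolve(c);
+      } catch {
+        userId = null; // assumption: bad credentials degrade to anonymous, never a 401
+      }
+      if (userId !== null) break; // first resolver that identifies the caller wins
+    }
+    c.set("userId", userId);
+    await next(); // always continue, unlike requireAuth
+  };
+}
+```
+
+With this shape, `requireAuth` could stay on write routes while discovery handlers read `userId` and branch on `null`.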
+
+---
+
+### 2. Discovery Feed (Popular Setups, Trending Items)
+
+The feed requires: ranked/scored queries, cursor-based pagination, and cheap repeated reads by anonymous users.
+
+#### Trending Score
+
+Use a hot-score computed in PostgreSQL SQL — no external search engine or materialized view needed at this scale.
+
+```sql
+-- Hacker News-style decay: engagement / time^gravity
+SELECT id, brand, model,
+  (owner_count::float / power((extract(epoch from now()) - extract(epoch from created_at)) / 3600.0 + 2, 1.8)) AS hot_score
+FROM global_items
+ORDER BY hot_score DESC
+LIMIT 20;
+```
+
+This requires `ownerCount` as a real column (not a JOIN-time COUNT) on `globalItems`. The column already logically exists via join — promote it to a denormalized integer that the collection add/remove service path updates. No trigger needed; update it in the same database transaction as the collection operation.
+
+**No new dependency.** Schema migration + service-layer update.
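+
+The same decay can be mirrored in TypeScript, for example to unit-test the ranking or to reason about the gravity constant. This is only a restatement of the SQL expression above; `hotScore` is a hypothetical helper, not an existing GearBox function:
+
+```typescript
+// Mirrors the SQL expression: owner_count / (age_hours + 2)^gravity.
+function hotScore(ownerCount: number, createdAt: Date, now: Date, gravity = 1.8): number {
+  const ageHours = (now.getTime() - createdAt.getTime()) / (3600 * 1000);
+  return ownerCount / Math.pow(ageHours + 2, gravity);
+}
+
+const now = new Date("2026-04-09T12:00:00Z");
+const fresh = hotScore(10, new Date("2026-04-09T11:00:00Z"), now); // 1 hour old, 10 owners
+const stale = hotScore(40, new Date("2026-04-08T12:00:00Z"), now); // 24 hours old, 40 owners
+console.log(fresh > stale); // true: age outweighs the 4x owner lead
+```
+
+Because the score depends on the current time, two requests a few minutes apart can order the same rows differently.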
+
+#### Cursor-Based Pagination
+
+Drizzle ORM 0.45.x has documented cursor pagination support (two-column keyset). Use `(hotScore DESC, id DESC)` for the trending feed and `(createdAt DESC, id DESC)` for "recently added." Encode cursor as base64 JSON — opaque to the client.
+
+The Hono + Drizzle cursor pattern is documented and actively used in the ecosystem. No pagination library needed.
+
+**No new dependency.** Drizzle's query builder expresses the keyset predicate directly with `where` and `orderBy`.
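+
+A sketch of the opaque cursor and the two-column keyset predicate for the "recently added" ordering, run against an in-memory array instead of a database so it is self-contained. `FeedRow`, `Cursor`, `encodeCursor`, `decodeCursor`, `isAfterCursor`, and `pageOf` are hypothetical names, and it assumes timestamps are ISO-8601 UTC strings of identical precision (which compare correctly as strings):
+
+```typescript
+interface FeedRow { id: number; createdAt: string }
+interface Cursor { createdAt: string; id: number }
+
+// Opaque cursor: base64-encoded JSON.
+const encodeCursor = (c: Cursor): string => btoa(JSON.stringify(c));
+const decodeCursor = (s: string): Cursor => JSON.parse(atob(s)) as Cursor;
+
+// Keyset predicate for ORDER BY created_at DESC, id DESC:
+// a row follows the cursor when (createdAt, id) < (cursor.createdAt, cursor.id).
+function isAfterCursor(row: FeedRow, cur: Cursor): boolean {
+  return row.createdAt < cur.createdAt ||
+    (row.createdAt === cur.createdAt && row.id < cur.id);
+}
+
+function pageOf(rows: FeedRow[], limit: number, cursor?: string) {
+  const sorted = [...rows].sort((a, b) =>
+    a.createdAt === b.createdAt ? b.id - a.id : a.createdAt < b.createdAt ? 1 : -1);
+  const cur = cursor ? decodeCursor(cursor) : undefined;
+  const remaining = cur ? sorted.filter((r) => isAfterCursor(r, cur)) : sorted;
+  const items = remaining.slice(0, limit);
+  const last = items[items.length - 1];
+  // Only hand out a cursor when more rows exist beyond this page.
+  const nextCursor = remaining.length > limit && last
+    ? encodeCursor({ createdAt: last.createdAt, id: last.id })
+    : null;
+  return { items, nextCursor };
+}
+```
+
+In PostgreSQL the same predicate can be written as the row comparison `(created_at, id) < ($1, $2)`. For the trending feed, the score moves with `now()`, so a stable cursor would likely also need to carry the reference time used for scoring; that is an open design point rather than something this sketch solves.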
+
+#### Full-Text Catalog Search
+
+`globalItems` needs fast free-text search across `brand + model + description`. Use PostgreSQL native `tsvector` with a GIN index.
+
+Drizzle 0.45.x does not generate `GENERATED ALWAYS AS ... STORED` syntax for tsvector columns in drizzle-kit. Add the `searchVector` column and GIN index via a raw SQL migration file (create via `drizzle-kit generate` then manually add the ALTER TABLE and CREATE INDEX statements to the generated file).
+
+For the Hono route, use Drizzle's `sql` template tag with `plainto_tsquery`, which accepts raw user input without requiring tsquery operator syntax:

 ```typescript
-import { createRemoteJWKSet, jwtVerify } from "jose";
+.where(sql`search_vector @@ plainto_tsquery('english', ${q})`)
+.orderBy(sql`ts_rank(search_vector, plainto_tsquery('english', ${q})) DESC`)
+```

-const jwks = createRemoteJWKSet(
-  new URL("https://logto.example.com/oidc/jwks")
-);
+**No new dependency.** Schema migration + raw SQL in service layer.

-const authMiddleware = createMiddleware(async (c, next) => {
-  const token = c.req.header("Authorization")?.replace("Bearer ", "");
-  if (!token) return c.json({ error: "Unauthorized" }, 401);
+#### Feed Client (TanStack Query + IntersectionObserver)

-  const { payload } = await jwtVerify(token, jwks, {
-    issuer: "https://logto.example.com/oidc",
-    audience: "your-api-resource-indicator",
+`useInfiniteQuery` from `@tanstack/react-query` (already at 5.90.x) handles cursor pagination natively via `getNextPageParam`. The scroll trigger uses the browser-native IntersectionObserver API — implement a `useIntersectionObserver(ref, callback)` hook (~12 lines) rather than adding a scroll library. This matches the existing GearBox pattern of minimal third-party UI dependencies.
+
+**No new dependency.**
+
+---
+
+### 3. Catalog Enrichment Infrastructure
+
+#### Schema Additions to `globalItems`
+
+New fields for attribution, source tracking, and feed ranking:
+
+| Field | Type | Purpose |
+|-------|------|---------|
+| `sourceUrl` | `text` | Canonical product page (retailer or manufacturer) |
+| `sourceAttribution` | `text` | Human-readable credit ("via REI", "via manufacturer") |
+| `imageAttributionUrl` | `text` | URL where product image was originally sourced |
+| `imageAttributionText` | `text` | License or credit line for the image |
+| `submittedByUserId` | `integer FK → users` | Who submitted this catalog entry (null = seeded by admin/agent) |
+| `verifiedAt` | `timestamp` | When an admin approved the entry (null = unverified) |
+| `ownerCount` | `integer NOT NULL DEFAULT 0` | Denormalized count of collection items referencing this |
+| `productUrl` | `text` | Retailer/manufacturer product link (duplicates item-level, but catalog-owned) |
+
+These are Drizzle schema additions. **No new dependency.**
+
+#### Zod Schemas for Enriched Catalog
+
+Add `CreateCatalogItemSchema` in `src/shared/schemas.ts` with attribution fields. Zod 4.3.x handles this natively. The schema feeds the new `POST /api/global-items` route. Today only GET on this path is public; writes will require auth but will be open to non-admin users so they can submit catalog entries.
+
+---
+
+### 4. Agent-Powered Catalog Seeding via MCP
+
+The existing MCP server (`@modelcontextprotocol/sdk` 1.29.x, 19 tools) already provides the infrastructure. The agent workflow:
+
+1. Claude agent receives a category or brand as a prompt
+2. Uses a new `create_catalog_item` MCP tool — purpose-built for `globalItems` insertion with full attribution fields
+3. Server validates via Zod and inserts into `globalItems`; `ownerCount` keeps its default of 0 until a user adds the item to a collection
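+4. Agent uses the existing `upload_image_from_url` tool to fetch and store product images, limited to sources whose terms permit re-hosting (the pitfalls research rates recovery from copyrighted images in S3 as HIGH cost)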
+
+The new tool registers identically to existing tools in `src/server/mcp/index.ts`. Batch seeding sessions: the agent runs N `create_catalog_item` calls in sequence within one MCP session — no parallel execution framework needed at catalog bootstrap scale.
+
+For standalone seed scripts (`bun run src/db/dev-seed.ts` extensions), use the Drizzle db instance directly. No external seeding framework.
+
+**No new dependency.**
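+
+The pitfalls research asks that running the seed twice leaves the row count unchanged. One way to get there is to upsert on a normalized brand/model key. The sketch below uses an in-memory `Map` as a stand-in for `globalItems`; `CatalogInput`, `catalogKey`, and `seedCatalog` are hypothetical names, and the normalization rule (trim, lowercase, collapse whitespace) is an assumption rather than an agreed dedup policy:
+
+```typescript
+interface CatalogInput { brand: string; model: string; sourceUrl?: string }
+
+// Hypothetical dedup key: case-insensitive, whitespace-collapsed brand + model.
+function catalogKey(brand: string, model: string): string {
+  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
+  return `${norm(brand)}|${norm(model)}`;
+}
+
+// In-memory stand-in for an upsert against globalItems; returns how many rows were new.
+function seedCatalog(store: Map<string, CatalogInput>, batch: CatalogInput[]): number {
+  let inserted = 0;
+  for (const item of batch) {
+    const key = catalogKey(item.brand, item.model);
+    if (!store.has(key)) inserted++;
+    // Re-runs refresh fields on the existing entry and never add a second row.
+    store.set(key, { ...store.get(key), ...item });
+  }
+  return inserted;
+}
+```
+
+In the database, the equivalent would presumably be a unique index on the normalized key plus an insert-or-update in the `create_catalog_item` service path.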
+
+---
+
+### 5. HTTP Caching for Public Endpoints
+
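+Public GET endpoints (discovery feed, catalog detail pages) will be hit by anonymous users repeatedly. Add HTTP-level cache hints so clients and intermediaries can reuse responses instead of re-fetching them.
+
+- **Catalog item detail pages** (`GET /api/global-items/:id`): Use Hono's built-in `etag()` middleware. It derives the ETag from the response body and returns 304 Not Modified when the client's `If-None-Match` matches. This saves transfer size only; the handler and its query still run on every request.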
+- **Discovery feed endpoints** (`GET /api/discovery/*`): Set `Cache-Control: public, max-age=60, stale-while-revalidate=300` manually in route handlers. Feed data tolerates 60s staleness.
+
+**Do NOT use Hono's `cache()` middleware.** It is platform-specific to Cloudflare Workers and Deno, and silently does nothing on Bun. This is a documented limitation. Known issue #4401 in the Hono repo also shows the `etag()` middleware can generate inconsistent ETags when combined with other middleware, so cover it with integration tests before shipping.
+
+**No new dependency.** `etag` is built into Hono 4.12.x.
+
+---
+
+### 6. Rate Limiting for Public Traffic
+
+The existing `rateLimit.ts` in-process Map handles auth endpoints correctly (5 req/15 min per IP). It is inappropriate for public discovery traffic because:
+
+- 5 req/15 min is far too strict for anonymous browsing
+- In-process state resets on server restart (tolerable for auth, wrong for general rate limiting)
+- No way to differentiate authenticated vs anonymous callers in the current implementation
+
+**Recommendation:** Keep the existing `rateLimit.ts` for auth endpoints only. Add `hono-rate-limiter` for discovery/catalog public endpoints with a permissive anonymous limit (e.g., 100 req/min per IP) and no limit for authenticated callers.
+
+```typescript
+import { rateLimiter } from "hono-rate-limiter";
+
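+// Permissive limit for anonymous browsing; auth endpoints keep the stricter rateLimit.ts.
+const discoveryLimiter = rateLimiter({
+  windowMs: 60 * 1000, // 1 minute window
+  limit: 100, // max requests per window per key
+  // Key on the first X-Forwarded-For hop; requests without the header share the "unknown" bucket.
+  keyGenerator: (c) => c.req.header("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown",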
 });

-  c.set("userId", payload.sub);
-  await next();
-});
+app.use("/api/discovery/*", discoveryLimiter);
 ```

-**React provider pattern:**
+The in-memory store (the default in `hono-rate-limiter`) is sufficient for single-instance deployment. Like the existing limiter, its counters reset on server restart, which is acceptable for a permissive browse limit. If the app scales horizontally, swap to `@hono-rate-limiter/redis`; that is a future decision, not a v2.1 concern.

-```typescript
-import { LogtoProvider, LogtoConfig } from "@logto/react";
+**New dependency:**

-const config: LogtoConfig = {
-  endpoint: "https://logto.example.com",
-  appId: "<your-app-id>",
-  resources: ["https://api.gearbox.example.com"],
-};
-
-// Wrap app root
-<LogtoProvider config={config}>
-  <App />
-</LogtoProvider>
-```
-
-### Database -- PostgreSQL via Bun Native Driver
-
-| Technology | Version | Purpose | Why Recommended |
-|------------|---------|---------|-----------------|
-| PostgreSQL | 16+ | Primary database | Required by Logto anyway, proper concurrent access for multi-user, JSONB for flexible spec fields, full-text search for discovery feed. |
-| drizzle-orm | ^0.45.1 (existing) | Type-safe ORM | Already in use. Switch from `drizzle-orm/bun-sqlite` to `drizzle-orm/bun-sql` for Postgres. Schema definitions move from `sqlite-core` to `pg-core`. |
-| Bun native SQL | built-in | Postgres driver | Zero additional dependencies. `import { SQL } from "bun"` provides native Postgres bindings. Drizzle ORM supports it via `drizzle-orm/bun-sql`. |
-| postgres (postgres.js) | ^3.4.8 | Fallback Postgres driver | Only needed if Bun native SQL has issues with drizzle-kit CLI tooling (known issue #4122). More mature ecosystem, proven with Drizzle. Install as dev dependency for drizzle-kit. |
-
-**Schema migration approach:**
-
-1. Rewrite `src/db/schema.ts` imports from `drizzle-orm/sqlite-core` to `drizzle-orm/pg-core`
-2. Replace `sqliteTable` with `pgTable`
-3. Replace `integer().primaryKey({ autoIncrement: true })` with `integer().primaryKey().generatedAlwaysAsIdentity()` for PKs
-4. Replace `integer("created_at", { mode: "timestamp" })` with `timestamp("created_at").defaultNow().notNull()`
-5. Add `userId text("user_id").notNull()` to all user-owned tables (items, threads, setups, categories)
-6. Add `visibility text("visibility").notNull().default("private")` to setups and profiles
-7. Generate fresh Postgres migration with `drizzle-kit generate`
-8. Write a one-time data migration script (SQLite read -> Postgres insert) for existing data
-
-**drizzle.config.ts change:**
-
-```typescript
-// Before
-{ dialect: "sqlite", dbCredentials: { url: "./gearbox.db" } }
-
-// After
-{ dialect: "postgresql", dbCredentials: { url: process.env.DATABASE_URL } }
-```
-
-**Known issue:** drizzle-kit CLI does not use the Bun SQL driver for `push`/`generate` commands (GitHub issue #4122). Workaround: install `postgres` (postgres.js) as a dev dependency for drizzle-kit, while the app runtime uses Bun native SQL.
-
-### Image Storage -- Bun Native S3 + MinIO
-
-| Technology | Version | Purpose | Why Recommended |
-|------------|---------|---------|-----------------|
-| Bun S3Client | built-in | S3 API client | Zero dependencies, native Bun bindings, extends Blob interface. Supports presigned URLs, streaming uploads. Built-in MinIO compatibility. |
-| MinIO | latest | Self-hosted S3-compatible object storage | Replaces local `./uploads/` directory. Single Go binary, Docker-friendly, S3 API compatible. Handles multi-user image scaling without cloud vendor lock-in. |
-
-**Why Bun native S3 over @aws-sdk/client-s3:**
-
- Zero additional dependencies (Bun ships with it)
- Simpler API (extends Blob, web-standard patterns)
- Native performance bindings
- Full MinIO compatibility documented by Bun team
-
-**Migration from ./uploads/:**
-
-1. Deploy MinIO container alongside app
-2. Create `gearbox-images` bucket
-3. Write migration script to upload existing files from `./uploads/` to MinIO
-4. Update image service to use S3Client for reads/writes
-5. Serve images via presigned URLs or a proxy route on Hono
-
-**Configuration:**
-
-```typescript
-import { S3Client } from "bun";
-
-const storage = new S3Client({
-  accessKeyId: process.env.S3_ACCESS_KEY!,
-  secretAccessKey: process.env.S3_SECRET_KEY!,
-  bucket: "gearbox-images",
-  endpoint: process.env.S3_ENDPOINT!, // e.g., http://minio:9000
-});
-```
-
-### Supporting Libraries
-
-| Library | Version | Purpose | When to Use |
-|---------|---------|---------|-------------|
-| jose | ^6.2.2 | JWKS-based JWT verification | Every authenticated API request -- validate Logto access tokens on Hono middleware |
-| @logto/react | ^4.0.13 | React auth provider + hooks | Wrap app root, sign-in/sign-out flows, access token retrieval for API calls |
-
-### Development / Infrastructure
-
-| Tool | Purpose | Notes |
-|------|---------|-------|
-| Docker Compose | Local dev environment | Postgres + Logto + MinIO containers. App still runs on bare Bun for HMR. |
-| drizzle-kit | Schema management | Same tool, different dialect config. `bun run db:generate` and `bun run db:push` still work. |
-
-## Installation
+| Library | Version | Purpose |
+|---------|---------|---------|
+| `hono-rate-limiter` | `^0.5.3` | Per-route rate limiting with configurable windows for public endpoints |

 ```bash
-# New production dependencies
-bun add @logto/react jose
-
-# New dev dependencies (for drizzle-kit Postgres support)
-bun add -D postgres
-
-# No install needed for:
-# - Bun native S3 (built-in)
-# - Bun native SQL/Postgres (built-in)
-# - drizzle-orm (already installed, just change imports)
+bun add hono-rate-limiter
 ```

+---
+
+## Full Stack Additions Summary
+
+### New Dependencies (v2.1 only)
+
+| Library | Version | Purpose | Why |
+|---------|---------|---------|-----|
+| `hono-rate-limiter` | `^0.5.3` | Configurable rate limits for public discovery routes | Existing in-process limiter is auth-only with a 5-req cap; public browse traffic needs separate, permissive limits |
+
+### No New Dependencies Needed For
+
+| Capability | Existing Stack Component Used |
+|------------|------------------------------|
+| Public auth model (`tryAuth` variant) | Hono middleware — no library |
+| Discovery feed cursor pagination | Drizzle 0.45.x query builder (documented keyset pattern) |
+| Full-text catalog search (tsvector GIN) | PostgreSQL native + Drizzle `sql` template |
+| Trending score computation | PostgreSQL SQL expression — no extension |
+| Infinite scroll client | TanStack Query `useInfiniteQuery` + native IntersectionObserver |
+| Catalog attribution fields | Drizzle schema migration |
+| Agent catalog seeding | Existing MCP SDK + new `create_catalog_item` tool |
+| HTTP cache headers | Hono built-in `etag()` + manual `Cache-Control` |
+| Feed ranking denormalization | Service-layer transaction update (no trigger, no cron) |
+
+---
+
+## Schema Changes Required (Not Library Changes)
+
+These are Drizzle schema additions generating migrations:
+
+### `globalItems` additions
+
+```typescript
+// In src/db/schema.ts — globalItems table additions
+sourceUrl: text("source_url"),
+sourceAttribution: text("source_attribution"),
+imageAttributionUrl: text("image_attribution_url"),
+imageAttributionText: text("image_attribution_text"),
+submittedByUserId: integer("submitted_by_user_id").references(() => users.id),
+verifiedAt: timestamp("verified_at"),
+ownerCount: integer("owner_count").notNull().default(0),
+productUrl: text("product_url"),
+```
+
+### Raw SQL migration additions (cannot be expressed in Drizzle schema)
+
+```sql
+-- Add after Drizzle-generated migration runs:
+
+-- Generated tsvector column for full-text search
+ALTER TABLE global_items
+  ADD COLUMN search_vector tsvector
+  GENERATED ALWAYS AS (
+    to_tsvector('english',
+      coalesce(brand, '') || ' ' ||
+      coalesce(model, '') || ' ' ||
+      coalesce(description, '')
+    )
+  ) STORED;
+
+CREATE INDEX global_items_search_vector_idx ON global_items USING GIN(search_vector);
+
+-- Partial index for public setup discovery feed
+CREATE INDEX setups_public_updated_idx ON setups (updated_at DESC) WHERE is_public = true;
+
+-- Index for owner-count ordering; hot_score depends on now(), so it cannot be indexed directly
+CREATE INDEX global_items_owner_count_id_idx ON global_items (owner_count DESC, id DESC);
+```
+
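+> **Note:** Drizzle Kit does not generate `GENERATED ALWAYS AS ... STORED` for tsvector. Add these statements either by appending them to the Drizzle-generated migration file or as a separate custom migration file in the migrations folder (`drizzle-kit generate --custom` creates an empty one). Apply them through the migration runner: `drizzle-kit push` syncs the schema definition directly and does not execute SQL migration files, so if `bun run db:push` wraps it, the raw SQL will not run that way.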
+
+### `setups` additions
+
+```typescript
+// In src/db/schema.ts — setups table additions
+viewCount: integer("view_count").notNull().default(0),
+```
+
+---
+
 ## Alternatives Considered

-### Authentication Provider
+| Recommended | Alternative | Why Not |
+|-------------|-------------|---------|
+| PostgreSQL tsvector + GIN | Meilisearch / Typesense | Separate search service adds infra ops complexity; tsvector covers structured gear catalog search at GearBox scale without additional containers |
+| PostgreSQL tsvector + GIN | pg_textsearch (BM25 extension) | Requires installing a PostgreSQL extension in production; BM25 ranking is unnecessary for a catalog of branded products where exact brand/model matches dominate |
+| Denormalized `ownerCount` column | COUNT JOIN per feed request | Feed queries fire on every anonymous page load; a JOIN COUNT becomes a bottleneck before any other part of the stack does |
+| Native IntersectionObserver hook | react-infinite-scroll-component | Zero-dependency — 12-line hook replaces an 8KB library; consistent with GearBox's minimal-external-dependency UI philosophy |
+| Manual `Cache-Control` headers | Hono `cache()` middleware | Hono `cache()` is Cloudflare Workers/Deno only — silently does nothing on Bun |
+| `hono-rate-limiter` in-process | Redis-backed rate limiter | Single-instance deployment — Redis adds an infra dependency not justified at current scale |
+| Extend existing MCP toolset | Separate seeding CLI script | MCP agents already have auth and structured tool calling; a dedicated `create_catalog_item` tool is cleaner than a one-off script that bypasses the service layer |
+| Service-layer `ownerCount` update | PostgreSQL trigger | Triggers are invisible to the TypeScript codebase, harder to test, and prone to silent failures in complex transactions |

-| Recommended | Alternative | When to Use Alternative |
-|-------------|-------------|-------------------------|
-| Logto | Authentik | If you need proxy-mode SSO for non-OIDC apps (Portainer, legacy tools) |
-| Logto | Zitadel | If building multi-tenant B2B SaaS with organization-level isolation |
-| Logto | Keycloak | If enterprise LDAP/AD integration is mandatory |
-
-### Database Driver
-
-| Recommended | Alternative | When to Use Alternative |
-|-------------|-------------|-------------------------|
-| Bun native SQL (`bun:sql`) | postgres.js | If Bun native SQL has concurrency bugs (known issue in Bun 1.2.0 with concurrent statements) |
-| Bun native SQL (`bun:sql`) | @neondatabase/serverless | If deploying to serverless/edge where persistent connections are not possible |
-
-### Image Storage
-
-| Recommended | Alternative | When to Use Alternative |
-|-------------|-------------|-------------------------|
-| MinIO (self-hosted) | Cloudflare R2 | If you want zero-ops storage with no egress fees and don't mind cloud dependency |
-| MinIO (self-hosted) | Local filesystem (current) | For development/testing only. Not viable for multi-user at scale. |
+---

 ## What NOT to Add

 | Avoid | Why | Use Instead |
 |-------|-----|-------------|
-| @aws-sdk/client-s3 | 60+ transitive dependencies, Bun has native S3 support | Bun built-in S3Client |
-| passport.js / express-session | Wrong paradigm -- we want external OIDC, not embedded session auth | Logto + jose JWT validation |
-| next-auth / auth.js | Designed for Next.js, assumes framework integration we don't have | Logto (external provider) |
-| better-auth | Embedded auth library, opposite of external provider model | Logto (external provider) |
-| pg (node-postgres) | Callback-based API, Bun has native Postgres bindings | Bun native SQL or postgres.js |
-| sharp / image processing libs | Premature optimization -- serve originals first, add resizing later if needed | Direct S3 storage of originals |
-| Redis | Not needed at this scale. Postgres handles sessions (via Logto), caching is premature | Postgres for everything |
-| Prisma | Already using Drizzle ORM, no reason to add a second ORM | drizzle-orm (existing) |
-| nanoid / cuid2 | Postgres `gen_random_uuid()` is built-in for public-facing IDs if needed | Postgres native UUID generation |
-| TypeORM / Sequelize | Legacy ORMs with worse TypeScript support than Drizzle | drizzle-orm (existing) |
+| Elasticsearch / OpenSearch | Separate cluster, ops overhead, overkill for a structured product catalog | PostgreSQL tsvector with GIN index |
+| pg_textsearch / VectorChord-BM25 | PostgreSQL extension install required in prod; BM25 precision unnecessary for brand+model search | PostgreSQL native `ts_rank` |
+| Hono `cache()` middleware | Platform-specific to Cloudflare/Deno; does nothing on Bun | Manual `Cache-Control` headers in route handlers |
+| react-virtual / windowing | Feed is paginated, not a virtual list; items per page (~20) never hit DOM performance limits | Standard DOM list with cursor pagination |
+| Prisma | Already using Drizzle ORM; two ORMs in one codebase is a maintenance trap | drizzle-orm (existing) |
+| Materialized views for feed caching | drizzle-kit does not fully support materialized view migrations; manual REFRESH logic is brittle | Denormalized score columns + partial indexes |
+| Separate seeding framework (Faker, etc.) | Catalog data is real product data, not fake; agent seeding produces real structured records | MCP `create_catalog_item` tool |

-## Infrastructure Architecture
-
-```
-Docker Compose (dev) / Docker (prod)
-+-- gearbox-app        (Bun, port 3000)
-+-- gearbox-postgres   (PostgreSQL 16, port 5432)
-|   +-- gearbox DB     (app data)
-|   +-- logto DB       (Logto data, separate database same instance)
-+-- gearbox-logto      (Logto OSS, port 3001 app / 3002 admin)
-+-- gearbox-minio      (MinIO, port 9000 API / 9001 console)
-```
-
-Logto and the app share a single Postgres instance (different databases). This keeps infrastructure simple -- one Postgres to back up, one to monitor. Logto requires PostgreSQL 14+; using 16 covers both.
+---

 ## Version Compatibility

-| Package | Compatible With | Notes |
-|---------|-----------------|-------|
-| drizzle-orm@0.45.x | Bun native SQL | Supported via `drizzle-orm/bun-sql` driver |
-| drizzle-orm@0.45.x | postgres.js@3.4.x | Supported via `drizzle-orm/postgres-js` driver (fallback) |
-| drizzle-kit@0.31.x | PostgreSQL 16 | Generates Postgres-dialect migrations |
-| @logto/react@4.x | React 19 | Uses React context/hooks, compatible |
-| jose@6.x | Bun runtime | Explicitly lists Bun support in docs |
-| Logto OSS v1.36 | PostgreSQL 14+ | Logto requires PG 14 minimum; use PG 16 for both app and Logto |
-| Bun S3Client | MinIO latest | Documented compatibility with endpoint configuration |
+| Package | Current Version | v2.1 Notes |
+|---------|----------------|------------|
+| `hono` | 4.12.x (4.12.12 latest) | `etag()` built-in available; `cache()` is NOT compatible with Bun — do not use |
+| `drizzle-orm` | 0.45.x (0.45.2 latest stable) | Cursor-based (keyset) pagination confirmed against the official guide; generated tsvector column requires raw SQL migration appended to drizzle-kit output |
+| `@tanstack/react-query` | 5.90.x | `useInfiniteQuery` with `getNextPageParam` fully supports cursor pattern natively |
+| `hono-rate-limiter` | 0.5.3 (latest, published ~16 days before the 2026-04-09 research date) | In-process storage adapter works on Bun; actively maintained |
+| `@modelcontextprotocol/sdk` | 1.29.x | Existing MCP tooling is sufficient for adding `create_catalog_item` |
+| `zod` | 4.3.x | New catalog attribution schemas are straightforward additions to existing `schemas.ts` |
+| `@hono/zod-validator` | 0.7.x | Already used for all routes; covers new discovery/catalog endpoints |
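
The cursor pattern referenced for `drizzle-orm` and `useInfiniteQuery` above can be illustrated with a small in-memory sketch of two-column keyset pagination. Everything here is hypothetical: the `FeedRow` shape, the `(createdAt, id)` cursor columns, and the `fetchPage` function stand in for a real query and are not taken from the GearBox schema.

```typescript
// Hypothetical row and cursor shapes; a real feed would read these from the database.
interface FeedRow {
  id: number;
  createdAt: string; // ISO-8601 UTC timestamp, so string comparison matches time order
}
interface Cursor {
  createdAt: string;
  id: number;
}
interface Page {
  rows: FeedRow[];
  nextCursor: Cursor | null;
}

// Keyset pagination: newest first, with id as a tie-breaker so rows that
// share a timestamp are never skipped or repeated across pages.
function fetchPage(all: FeedRow[], cursor: Cursor | null, limit: number): Page {
  const sorted = [...all].sort((a, b) =>
    a.createdAt === b.createdAt ? b.id - a.id : a.createdAt < b.createdAt ? 1 : -1,
  );
  const remaining =
    cursor === null
      ? sorted
      : sorted.filter(
          (r) =>
            r.createdAt < cursor.createdAt ||
            (r.createdAt === cursor.createdAt && r.id < cursor.id),
        );
  const rows = remaining.slice(0, limit);
  const last = rows[rows.length - 1];
  const hasMore = remaining.length > limit;
  return {
    rows,
    nextCursor:
      hasMore && last !== undefined ? { createdAt: last.createdAt, id: last.id } : null,
  };
}

// Shape expected by useInfiniteQuery's getNextPageParam: undefined ends the list.
const getNextPageParam = (lastPage: Page): Cursor | undefined =>
  lastPage.nextCursor ?? undefined;

// Walks every page and collects ids, mimicking repeated fetchNextPage calls.
function collectAllIds(all: FeedRow[], limit: number): number[] {
  const ids: number[] = [];
  let cursor: Cursor | null = null;
  do {
    const page = fetchPage(all, cursor, limit);
    ids.push(...page.rows.map((r) => r.id));
    cursor = getNextPageParam(page) ?? null;
  } while (cursor !== null);
  return ids;
}
```

On the server side, the `filter` step would correspond to a `WHERE` clause over the same two columns, and the client would pass `getNextPageParam` to `useInfiniteQuery` so that each `fetchNextPage` call sends the previous page's cursor.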

-## Migration Checklist (SQLite to Postgres)
+---

-1. **Schema rewrite**: `sqlite-core` -> `pg-core` imports, adjust column types
-2. **Driver swap**: `drizzle-orm/bun-sqlite` -> `drizzle-orm/bun-sql`
-3. **Config update**: `drizzle.config.ts` dialect and credentials
-4. **Fresh migrations**: Generate from scratch for Postgres (do not try to convert SQLite migrations)
-5. **Data migration**: One-time script reads SQLite, writes to Postgres
-6. **Test infrastructure**: Update `createTestDb()` helper to use Postgres test database (or pg-mem for in-memory testing)
-7. **CI pipeline**: Add Postgres service container for test runs
-8. **Remove SQLite deps**: Remove `better-sqlite3` from devDependencies after migration confirmed
+## Installation
+
+```bash
+# Only one new package for v2.1
+bun add hono-rate-limiter
+```
+
+Everything else is schema migrations, new service/route/middleware code, and one new MCP tool — all on the existing stack.
+
+---

 ## Sources

- [Logto official docs -- React quickstart](https://docs.logto.io/quick-starts/react) -- SDK setup, LogtoProvider config (HIGH confidence)
- [Logto API protection -- JWT validation](https://docs.logto.io/api-protection/nodejs/express) -- jose-based middleware pattern (HIGH confidence)
- [Logto OSS getting started](https://docs.logto.io/logto-oss/get-started-with-oss) -- Docker deployment, Postgres requirements (HIGH confidence)
- [Logto @logto/react npm](https://www.npmjs.com/package/@logto/react) -- Version 4.0.13 confirmed (HIGH confidence)
- [Drizzle ORM -- Bun SQL driver](https://orm.drizzle.team/docs/connect-bun-sql) -- Native Postgres via Bun (HIGH confidence)
- [Drizzle ORM -- PostgreSQL column types](https://orm.drizzle.team/docs/column-types/pg) -- pg-core schema definitions (HIGH confidence)
- [drizzle-kit Bun SQL issue #4122](https://github.com/drizzle-team/drizzle-orm/issues/4122) -- Known CLI limitation with Bun driver (MEDIUM confidence)
- [Bun S3 documentation](https://bun.com/docs/runtime/s3) -- Native S3 client, MinIO config (HIGH confidence)
- [MinIO GitHub](https://github.com/minio/minio) -- S3-compatible self-hosted storage (HIGH confidence)
- [jose GitHub](https://github.com/panva/jose) -- JWT library v6.2.2, explicit Bun support (HIGH confidence)
- [Authentik vs Zitadel comparison](https://wz-it.com/en/blog/authentik-vs-zitadel-identity-provider-comparison/) -- Auth provider analysis (MEDIUM confidence)
- [Keycloak vs Authentik vs Zitadel 2026](https://blog.houseoffoss.com/post/keycloak-vs-authentik-vs-zitadel-2026-which-open-source-login-tool-should-you-use) -- Ecosystem overview (MEDIUM confidence)
- [postgres.js npm](https://www.npmjs.com/package/postgres) -- Version 3.4.8, fallback driver (HIGH confidence)
+- [Drizzle ORM cursor-based pagination](https://orm.drizzle.team/docs/guides/cursor-based-pagination) — two-column keyset pattern, v0.45.x confirmed (HIGH)
+- [Drizzle ORM PostgreSQL full-text search](https://orm.drizzle.team/docs/guides/postgresql-full-text-search) — tsvector approach confirmed (HIGH)
+- [Drizzle ORM full-text search with generated columns](https://orm.drizzle.team/docs/guides/full-text-search-with-generated-columns) — generated column pattern for tsvector (HIGH)
+- [Hono ETag middleware](https://hono.dev/docs/middleware/builtin/etag) — built-in, no install required (HIGH)
+- [Hono Cache middleware](https://hono.dev/docs/middleware/builtin/cache) — explicitly listed as Cloudflare/Deno only, not Bun (HIGH)
+- [Hono ETag issue #4401](https://github.com/honojs/hono/issues/4401) — known inconsistency bug in etag middleware (MEDIUM)
+- [hono-rate-limiter GitHub](https://github.com/rhinobase/hono-rate-limiter) — v0.5.3, active, Bun compatible (HIGH)
+- [hono-rate-limiter npm](https://www.npmjs.com/package/hono-rate-limiter) — version 0.5.3 confirmed (HIGH)
+- [TanStack Query infinite queries](https://tanstack.com/query/latest/docs/framework/react/guides/infinite-queries) — `useInfiniteQuery` cursor pattern (HIGH)
+- [Drizzle ORM materialized views issue #2653](https://github.com/drizzle-team/drizzle-orm/issues/2653) — confirmed drizzle-kit does not fully support materialized view migrations (MEDIUM)
+- [Hono middleware docs](https://hono.dev/docs/guides/middleware) — selective auth middleware pattern (HIGH)
+- GearBox `package.json` — all existing dependency versions verified directly (HIGH)
+- GearBox `src/server/index.ts` — existing skip-list pattern verified directly (HIGH)
+- GearBox `src/server/middleware/auth.ts` — existing three-way auth verified directly (HIGH)
+- GearBox `src/db/schema.ts` — existing `globalItems` table columns verified directly (HIGH)

 ---
-*Stack research for: GearBox v2.0 Platform Foundation*
-*Researched: 2026-04-03*
+
+*Stack research for: GearBox v2.1 Public Discovery milestone*
+*Researched: 2026-04-09*