# Phase 17: Object Storage - Research
**Researched:** 2026-04-04
**Domain:** S3-compatible object storage (MinIO), AWS SDK v3, image upload/serve refactoring
**Confidence:** MEDIUM
## Summary
This phase replaces local filesystem image storage (`uploads/` directory) with S3-compatible object storage. The user has decided on MinIO with `@aws-sdk/client-s3` and `@aws-sdk/s3-request-presigner`. However, research uncovered a significant development: **MinIO's GitHub repository was archived on February 13, 2026**, and official Docker images have not been published to Docker Hub or Quay.io since October 2025. The last pre-built image on quay.io is `quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z`, which was verified as still pullable and is functional for development use.
For a development-only dependency (local Docker Compose), pinning to this release is pragmatic. Because the S3 API is a de facto standard, a future migration to SeaweedFS, Garage, or AWS S3 itself should need only configuration changes (endpoint, credentials, `forcePathStyle`) rather than application code changes, since `@aws-sdk/client-s3` works with any S3-compatible service.
**Primary recommendation:** Proceed with MinIO using the pinned quay.io image for Docker Compose. The storage service abstraction via `@aws-sdk/client-s3` ensures the underlying S3 provider is swappable without code changes. Document the MinIO archival status and alternatives in a code comment.
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- **D-01:** Use `@aws-sdk/client-s3` (AWS SDK v3) for MinIO communication
- **D-02:** Use `@aws-sdk/s3-request-presigner` for generating presigned URLs
- **D-03:** Create `src/server/services/storage.service.ts` with functions: `uploadImage(buffer, filename, contentType)`, `deleteImage(filename)`, `getImageUrl(filename)`
- **D-04:** `getImageUrl()` returns a presigned URL with configurable expiry (default 1 hour)
- **D-05:** Environment variables: `S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_BUCKET` (default: `gearbox-images`), `S3_REGION` (default: `us-east-1`)
- **D-06:** `POST /api/images` and `POST /api/images/from-url` upload to MinIO instead of local filesystem
- **D-07:** `fetchImageFromUrl()` uploads fetched buffer to MinIO instead of writing to disk
- **D-08:** Remove static file serving for `/uploads/*` from the server
- **D-09:** API resolves `imageFilename` to a presigned MinIO URL. Add `imageUrl` field to API responses
- **D-10:** Client components use presigned URL directly
- **D-11:** Migration script `scripts/migrate-images-to-minio.ts`
- **D-12:** No filename changes during migration -- existing `imageFilename` values become MinIO object keys
- **D-13:** MinIO service in docker-compose.yml with automatic bucket creation on startup
- **D-14:** Dev compose uses fixed credentials. Prod compose uses env vars.
### Claude's Discretion
- Presigned URL expiry duration (1h default, configurable)
- Whether to add a GET /api/images/:filename proxy endpoint as fallback
- MinIO Docker image version
- Bucket policy (private with presigned URLs vs public-read)
- Whether to delete local files after successful migration
- Error handling strategy for upload failures
### Deferred Ideas (OUT OF SCOPE)
None
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| IMG-01 | Images are stored in MinIO (S3-compatible) instead of local filesystem | Storage service wraps @aws-sdk/client-s3; upload routes refactored to call storage.uploadImage() |
| IMG-02 | Existing uploaded images are migrated to MinIO | Migration script reads uploads/ dir, uploads each file to MinIO bucket |
| IMG-03 | Image upload and retrieval work through the new storage layer | Upload endpoints use storage service; API responses include presigned URLs via getImageUrl() |
| IMG-04 | Docker Compose provides MinIO for local development | MinIO + mc init container in docker-compose.dev.yml with auto bucket creation |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| @aws-sdk/client-s3 | 3.1024.0 | S3 API operations (PutObject, DeleteObject, GetObject) | Official AWS SDK v3, tree-shakeable, works with any S3-compatible service |
| @aws-sdk/s3-request-presigner | 3.1024.0 | Generate presigned URLs for direct client access | Official companion package for presigned URL generation |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| minio/minio (Docker) | RELEASE.2025-09-07T16-13-09Z | S3-compatible object storage for dev/prod | Docker Compose only -- not an npm dependency |
| minio/mc (Docker) | latest | MinIO client CLI for bucket initialization | Init container in Docker Compose |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| MinIO (archived) | SeaweedFS | More complex Docker setup (master+volume+filer+s3 = 4 containers vs 1); better long-term viability |
| MinIO (archived) | Garage | Lightweight, Rust-based; but complex configuration for single-node |
| Presigned URLs | Proxy endpoint | Proxy adds server load but avoids CORS and presigned URL complexity |
**Installation:**
```bash
bun add @aws-sdk/client-s3 @aws-sdk/s3-request-presigner
```
**Version verification:** Versions confirmed via `npm view` on 2026-04-04. Both packages at 3.1024.0.
## Architecture Patterns
### Recommended Project Structure
```
src/server/
├── services/
│   ├── storage.service.ts       # NEW: S3 storage abstraction
│   └── image.service.ts         # MODIFIED: Uses storage service instead of Bun.write
├── routes/
│   └── images.ts                # MODIFIED: Uses storage service for uploads
scripts/
└── migrate-images-to-minio.ts   # NEW: One-time migration script
```
### Pattern 1: S3 Client Singleton
**What:** Create the S3Client once at module level with configuration from env vars. Export functions that use it.
**When to use:** All storage operations.
**Example:**
```typescript
// src/server/services/storage.service.ts
import {
	DeleteObjectCommand,
	GetObjectCommand,
	PutObjectCommand,
	S3Client,
} from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

// Created once at module level and reused by every exported function
const s3 = new S3Client({
	endpoint: process.env.S3_ENDPOINT,
	region: process.env.S3_REGION ?? "us-east-1",
	credentials: {
		accessKeyId: process.env.S3_ACCESS_KEY!,
		secretAccessKey: process.env.S3_SECRET_KEY!,
	},
	forcePathStyle: true, // REQUIRED for MinIO and most S3-compatible services
});

const bucket = process.env.S3_BUCKET ?? "gearbox-images";
const presignExpiry = parseInt(process.env.S3_PRESIGN_EXPIRY ?? "3600", 10);

export async function uploadImage(
	buffer: Buffer | ArrayBuffer,
	filename: string,
	contentType: string,
): Promise<void> {
	// Buffer is already a Uint8Array; wrap a raw ArrayBuffer without copying
	const body = buffer instanceof ArrayBuffer ? new Uint8Array(buffer) : buffer;
	await s3.send(
		new PutObjectCommand({
			Bucket: bucket,
			Key: filename,
			Body: body,
			ContentType: contentType,
		}),
	);
}

export async function deleteImage(filename: string): Promise<void> {
	await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: filename }));
}

export async function getImageUrl(filename: string): Promise<string> {
	const command = new GetObjectCommand({ Bucket: bucket, Key: filename });
	// Signing happens locally; expiresIn is in seconds
	return getSignedUrl(s3, command, { expiresIn: presignExpiry });
}
```
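The singleton above relies on non-null assertions for the credentials, so a missing variable only surfaces on the first request. As a minimal sketch, assuming a hypothetical `loadStorageConfig` helper (not part of the locked decisions), the D-05 variables and defaults could be validated once at startup:

```typescript
// Hypothetical helper: only the S3_* variable names and defaults come from D-05;
// the function and type names are illustrative.
interface StorageConfig {
	endpoint: string;
	region: string;
	accessKeyId: string;
	secretAccessKey: string;
	bucket: string;
	presignExpirySeconds: number;
}

type Env = Record<string, string | undefined>;

function loadStorageConfig(env: Env): StorageConfig {
	// These three have no safe default, so fail fast with a clear message
	const required = ["S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY"] as const;
	const missing = required.filter((name) => !env[name]);
	if (missing.length > 0) {
		throw new Error(`Missing storage env vars: ${missing.join(", ")}`);
	}

	const expiry = Number.parseInt(env.S3_PRESIGN_EXPIRY ?? "3600", 10);
	if (!Number.isInteger(expiry) || expiry <= 0) {
		throw new Error("S3_PRESIGN_EXPIRY must be a positive integer (seconds)");
	}

	return {
		endpoint: env.S3_ENDPOINT as string,
		region: env.S3_REGION ?? "us-east-1",
		accessKeyId: env.S3_ACCESS_KEY as string,
		secretAccessKey: env.S3_SECRET_KEY as string,
		bucket: env.S3_BUCKET ?? "gearbox-images",
		presignExpirySeconds: expiry,
	};
}
```

Calling `loadStorageConfig(process.env)` at module load would keep the singleton pattern while replacing the `!` assertions with an explicit error.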
### Pattern 2: Presigned URL Injection in API Responses
**What:** When returning items/candidates with `imageFilename`, resolve to presigned URL and add `imageUrl` field.
**When to use:** All GET endpoints that return records with `imageFilename`.
**Example:**
```typescript
// Helper to enrich records with presigned URLs
async function withImageUrl<T extends { imageFilename: string | null }>(
	record: T,
): Promise<T & { imageUrl: string | null }> {
	return {
		...record,
		imageUrl: record.imageFilename
			? await getImageUrl(record.imageFilename)
			: null,
	};
}
```
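For list endpoints, the same enrichment can be applied to many records at once. The sketch below is an assumption-laden variant: `withImageUrls` and the injected `resolveUrl` parameter are hypothetical names, with `resolveUrl` standing in for `getImageUrl` so the helper can be exercised without a storage backend.

```typescript
type WithImage = { imageFilename: string | null };
// Stand-in for getImageUrl from storage.service.ts (injected for testability)
type UrlResolver = (filename: string) => Promise<string>;

async function withImageUrls<T extends WithImage>(
	records: T[],
	resolveUrl: UrlResolver,
): Promise<Array<T & { imageUrl: string | null }>> {
	// Start all signing operations together instead of awaiting one by one
	return Promise.all(
		records.map(async (record) => ({
			...record,
			imageUrl: record.imageFilename
				? await resolveUrl(record.imageFilename)
				: null,
		})),
	);
}
```

Result order matches input order, and records without an image get `imageUrl: null` without invoking the resolver.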
### Pattern 3: Docker Compose Init Container for Bucket Creation
**What:** Use a `minio/mc` container that waits for MinIO, then creates the bucket.
**When to use:** Docker Compose dev and prod setups.
**Example:**
```yaml
minio:
  image: quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z
  command: server /data --console-address ":9001"
  environment:
    MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
    MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
  ports:
    - "9000:9000"
    - "9001:9001"
  volumes:
    - minio-data:/data
  healthcheck:
    test: ["CMD", "mc", "ready", "local"]
    interval: 5s
    timeout: 3s
    retries: 5
minio-init:
  image: quay.io/minio/mc:latest
  depends_on:
    minio:
      condition: service_healthy
  # Use the same credential and bucket defaults as the minio service above;
  # chaining with && lets a failed step fail the init container
  entrypoint: >
    /bin/sh -c "
    mc alias set myminio http://minio:9000 ${S3_ACCESS_KEY:-minioadmin} ${S3_SECRET_KEY:-minioadmin} &&
    mc mb --ignore-existing myminio/${S3_BUCKET:-gearbox-images}
    "
```
### Anti-Patterns to Avoid
- **Storing presigned URLs in the database:** URLs expire. Always generate on read.
- **Not setting `forcePathStyle: true`:** MinIO and most S3-compatible services require path-style access. Virtual-hosted style will fail.
- **Using `minio/minio:latest` from Docker Hub:** Images are no longer updated. Pin to a specific quay.io release.
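- **Awaiting presigned URLs one at a time in list endpoints:** Signing is local and cheap per URL, but sequential awaits add up on long lists. Resolve them together (e.g. `Promise.all`), and consider short-lived in-memory caching or on-demand generation if lists grow large.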
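If caching turns out to be worthwhile, one option is a small in-memory cache that reuses a URL until shortly before it expires. This is a minimal sketch, not a recommendation from the research: `createCachedUrlResolver`, the safety margin, and the injected clock are all hypothetical, and the cache is unbounded, so it only suits modest numbers of images.

```typescript
type UrlResolver = (filename: string) => Promise<string>;

interface CacheEntry {
	url: string;
	expiresAtMs: number;
}

function createCachedUrlResolver(
	resolveUrl: UrlResolver, // e.g. getImageUrl
	expirySeconds: number, // must match the presign expiry
	safetyMarginSeconds = 60, // stop reusing a URL this long before it expires
	now: () => number = () => Date.now(),
): UrlResolver {
	const cache = new Map<string, CacheEntry>();
	return async (filename) => {
		const hit = cache.get(filename);
		if (hit && hit.expiresAtMs > now()) {
			return hit.url;
		}
		const url = await resolveUrl(filename);
		cache.set(filename, {
			url,
			expiresAtMs: now() + (expirySeconds - safetyMarginSeconds) * 1000,
		});
		return url;
	};
}
```

Unlike storing URLs in the database, entries here are never handed out past their reuse window, and the cache disappears on restart.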
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| S3 request signing | Custom HMAC signing | @aws-sdk/s3-request-presigner | Signature V4 is complex, error-prone |
| Multipart upload | Custom chunked upload | `Upload` utility from `@aws-sdk/lib-storage` (separate package) | Handles retries, progress, chunk management |
| Content type detection | Custom magic byte checking | File extension mapping (already in codebase) | Existing validation is sufficient for jpeg/png/webp |
**Key insight:** The entire value of this phase is replacing local filesystem calls (Bun.write, unlink) with S3 SDK calls. The business logic (validation, filename generation, content type checking) stays unchanged.
## Common Pitfalls
### Pitfall 1: CORS Issues with Presigned URLs
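**What goes wrong:** The browser blocks script-initiated requests (`fetch()`/XHR, or canvas reads) against a MinIO presigned URL due to CORS. Plain `<img src>` loads are not subject to CORS checks, so this mainly matters if the client reads image bytes in script.
**Why it happens:** MinIO is a different origin (port 9000) from the app (port 3000).
**How to avoid:** Configure MinIO's allowed CORS origins via environment or mc command. Alternatively, add a proxy endpoint as fallback.
**Warning signs:** Uploads succeed, but script-initiated image requests fail with CORS errors in the browser console.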
### Pitfall 2: Presigned URL Expiry in Long-Lived Pages
**What goes wrong:** User opens page, leaves it open > 1 hour, images stop loading.
**Why it happens:** Presigned URLs expire after the configured duration.
**How to avoid:** 1-hour default is generous. For extra safety, re-fetch image URLs on focus/visibility change, or use a longer expiry for GET operations.
**Warning signs:** Intermittent broken images in production.
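The "re-fetch on focus/visibility change" idea can be reduced to a small predicate. The sketch below assumes the client records when it last fetched image URLs; `shouldRefreshImageUrls` and the 5-minute safety margin are hypothetical choices, not values from the research.

```typescript
// Returns true once the URLs are close enough to expiry that they should be re-fetched.
function shouldRefreshImageUrls(
	fetchedAtMs: number, // when the API response containing imageUrl was received
	nowMs: number,
	expirySeconds = 3600, // presign expiry (1 hour default)
	safetyMarginSeconds = 300, // refresh this long before actual expiry
): boolean {
	const ageSeconds = (nowMs - fetchedAtMs) / 1000;
	return ageSeconds >= expirySeconds - safetyMarginSeconds;
}
```

In the browser, this could be called from a `visibilitychange` listener with `Date.now()` and, when it returns true, trigger a refetch of the affected queries.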
### Pitfall 3: MinIO Health Check Timing
**What goes wrong:** App container starts before MinIO is ready, first uploads fail.
**Why it happens:** Docker Compose `depends_on` only waits for container start, not readiness.
**How to avoid:** Use health checks with `condition: service_healthy` in Docker Compose.
**Warning signs:** Startup failures in CI or fresh dev environments.
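Outside Docker Compose (for example in CI scripts that start MinIO separately), an application-side readiness wait can complement health checks. This is only a sketch: `waitForStorage` is a hypothetical helper, and the injected `check` is assumed to be something like a `HeadBucketCommand` call against the configured bucket.

```typescript
// Retries an arbitrary readiness check with a fixed delay between attempts.
async function waitForStorage(
	check: () => Promise<void>, // assumed: rejects while storage is unavailable
	attempts = 10,
	delayMs = 500,
): Promise<void> {
	let lastError: unknown;
	for (let attempt = 1; attempt <= attempts; attempt++) {
		try {
			await check();
			return; // storage answered, safe to start serving
		} catch (error) {
			lastError = error;
			if (attempt < attempts) {
				await new Promise((resolve) => setTimeout(resolve, delayMs));
			}
		}
	}
	throw new Error(
		`Storage not ready after ${attempts} attempts: ${String(lastError)}`,
	);
}
```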
### Pitfall 4: Missing forcePathStyle Configuration
**What goes wrong:** SDK tries virtual-hosted style URLs (`bucket.endpoint`) which don't resolve for MinIO.
**Why it happens:** AWS SDK v3 defaults to virtual-hosted style for AWS S3.
**How to avoid:** Always set `forcePathStyle: true` in S3Client config for non-AWS S3 services.
**Warning signs:** DNS resolution errors or "bucket not found" errors.
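To make the difference concrete, the two addressing styles can be illustrated with plain URL construction. These helpers are purely illustrative (the SDK builds request URLs internally), and they assume flat object keys like the existing `imageFilename` values.

```typescript
// Path-style: the bucket is the first path segment (what forcePathStyle: true selects).
function pathStyleUrl(endpoint: string, bucket: string, key: string): string {
	const url = new URL(endpoint);
	url.pathname = `/${bucket}/${encodeURIComponent(key)}`;
	return url.toString();
}

// Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
function virtualHostedUrl(endpoint: string, bucket: string, key: string): string {
	const url = new URL(endpoint);
	url.hostname = `${bucket}.${url.hostname}`;
	url.pathname = `/${encodeURIComponent(key)}`;
	return url.toString();
}
```

With the Compose endpoint `http://minio:9000`, the virtual-hosted form points at the host `gearbox-images.minio`, which the Docker network has no DNS entry for; the path-style form keeps the host as `minio`.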
### Pitfall 5: Performance Impact of Presigned URL Generation
**What goes wrong:** List endpoints become slow because each item needs a presigned URL.
**Why it happens:** `getSignedUrl` is a crypto operation per URL.
**How to avoid:** `getSignedUrl` from @aws-sdk/s3-request-presigner is a local signing operation (no network call), so the per-URL cost is small. For lists of 100+ items, resolve URLs with `Promise.all` rather than awaiting them one by one, and measure before optimizing further. If it is still slow, consider generating URLs lazily, for example only for items the client actually displays.
**Warning signs:** GET /api/items response time increases noticeably.
## Code Examples
### Current Upload Flow (to be replaced)
```typescript
// src/server/routes/images.ts - current
await mkdir("uploads", { recursive: true });
await Bun.write(join("uploads", filename), buffer);
return c.json({ filename }, 201);
```
### New Upload Flow
```typescript
// After refactoring
import { uploadImage } from "../services/storage.service";
await uploadImage(buffer, filename, file.type);
return c.json({ filename }, 201);
```
### Current Image Deletion (to be replaced)
```typescript
// src/server/routes/items.ts - current
if (deleted.imageFilename) {
	try {
		await unlink(join("uploads", deleted.imageFilename));
	} catch { /* File missing is not an error */ }
}
```
### New Image Deletion
```typescript
// After refactoring
if (deleted.imageFilename) {
	try {
		await deleteImage(deleted.imageFilename);
	} catch { /* Object missing is not an error */ }
}
```
### Current Client Image Display (to be changed)
```typescript
// Multiple components currently use:
src={`/uploads/${imageFilename}`}
```
### New Client Image Display
```typescript
// Components will use the presigned URL from API response:
src={imageUrl}
```
### Files That Reference `/uploads/` (must all be updated)
**Server-side and infrastructure (6 entries, 7 call sites):**
1. `src/server/services/image.service.ts` -- `Bun.write(join(uploadsDir, filename), buffer)`
2. `src/server/routes/images.ts` -- `Bun.write(join("uploads", filename), buffer)` + `mkdir("uploads")`
3. `src/server/routes/items.ts` -- `unlink(join("uploads", deleted.imageFilename))`
4. `src/server/routes/threads.ts` -- `unlink(join("uploads", filename))` (2 locations)
5. `src/server/index.ts` -- `app.use("/uploads/*", serveStatic({ root: "./" }))`
6. `docker-compose.yml` -- `volumes: - uploads:/app/uploads`
**Client-side (6 components):**
1. `src/client/components/ImageUpload.tsx` -- ``src={`/uploads/${value}`}``
2. `src/client/components/ItemCard.tsx` -- ``src={`/uploads/${imageFilename}`}``
3. `src/client/components/CandidateCard.tsx` -- ``src={`/uploads/${imageFilename}`}``
4. `src/client/components/CandidateListItem.tsx` -- ``src={`/uploads/${candidate.imageFilename}`}``
5. `src/client/components/ComparisonTable.tsx` -- ``src={`/uploads/${c.imageFilename}`}``
6. `src/client/routes/setups/$setupId.tsx` -- `imageFilename={item.imageFilename}`
**MCP tools (1 location):**
1. `src/server/mcp/tools/images.ts` -- calls `fetchImageFromUrl()` which writes to local fs
## Discretion Recommendations
### Presigned URL Expiry
**Recommendation:** 1 hour default, configurable via `S3_PRESIGN_EXPIRY` env var. 1 hour balances security with usability. For GET-only presigned URLs, there is minimal security risk even with longer expiry.
### Proxy Endpoint Fallback
**Recommendation:** Do NOT add a proxy endpoint. Presigned URLs are the standard pattern. Adding a proxy creates two code paths to maintain and defeats the purpose of offloading image serving to the storage service. If CORS is an issue in dev, configure MinIO CORS instead.
### MinIO Docker Image Version
**Recommendation:** Use `quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z` (last stable release before project archival). Pin explicitly -- do not use `latest` tag. Add a comment noting the archival status and that the S3 API abstraction makes the provider swappable.
### Bucket Policy
**Recommendation:** Private bucket with presigned URLs. This is the standard secure approach. Public-read would work but is less secure and unnecessary since presigned URL generation is a local operation with negligible overhead.
### Delete Local Files After Migration
**Recommendation:** Do NOT auto-delete. The migration script should log success per file but leave originals intact. Add a manual cleanup step documented in the script output: "Run `rm -rf uploads/` after verifying all images load correctly from MinIO."
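A minimal sketch of that behavior, assuming a hypothetical `migrateImages` function with an injected `upload` callback (standing in for `uploadImage`) and an extension-to-content-type map limited to the jpeg/png/webp formats mentioned earlier:

```typescript
import { readdir, readFile } from "node:fs/promises";
import { extname, join } from "node:path";

type Uploader = (buffer: Buffer, filename: string, contentType: string) => Promise<void>;

// Assumed mapping; the real script should reuse the codebase's existing one
const CONTENT_TYPES: Record<string, string> = {
	".jpg": "image/jpeg",
	".jpeg": "image/jpeg",
	".png": "image/png",
	".webp": "image/webp",
};

interface MigrationResult {
	migrated: string[];
	skipped: string[];
	failed: string[];
}

async function migrateImages(uploadsDir: string, upload: Uploader): Promise<MigrationResult> {
	const result: MigrationResult = { migrated: [], skipped: [], failed: [] };
	const entries = await readdir(uploadsDir, { withFileTypes: true });
	for (const entry of entries) {
		if (!entry.isFile()) continue;
		const contentType = CONTENT_TYPES[extname(entry.name).toLowerCase()];
		if (!contentType) {
			result.skipped.push(entry.name);
			continue;
		}
		try {
			const buffer = await readFile(join(uploadsDir, entry.name));
			// D-12: the existing filename becomes the object key unchanged
			await upload(buffer, entry.name, contentType);
			result.migrated.push(entry.name);
			console.log(`migrated ${entry.name}`);
		} catch (error) {
			result.failed.push(entry.name);
			console.error(`failed ${entry.name}:`, error);
		}
	}
	// Originals are intentionally left in place; cleanup is a manual step
	return result;
}
```

Because uploads overwrite by key, re-running the script after a partial failure should be safe.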
### Error Handling for Upload Failures
**Recommendation:** Let S3 SDK errors propagate. Wrap in try/catch at the route level and return 500 with a generic error message. Log the full error server-side. No retry logic needed -- uploads are user-initiated and can be retried manually.
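A framework-agnostic sketch of that route-level handling follows. `handleUpload`, its result shape, and the error message text are hypothetical; `upload` stands in for `uploadImage`.

```typescript
type Uploader = (buffer: ArrayBuffer, filename: string, contentType: string) => Promise<void>;

type UploadResult =
	| { status: 201; body: { filename: string } }
	| { status: 500; body: { error: string } };

async function handleUpload(
	upload: Uploader,
	buffer: ArrayBuffer,
	filename: string,
	contentType: string,
	log: (message: string, error: unknown) => void = console.error,
): Promise<UploadResult> {
	try {
		await upload(buffer, filename, contentType);
		return { status: 201, body: { filename } };
	} catch (error) {
		// Full error stays server-side; the client only sees a generic message
		log("Image upload failed", error);
		return { status: 500, body: { error: "Failed to upload image" } };
	}
}
```

In the Hono route, the result would map onto the existing response style, e.g. `c.json(result.body, result.status)`.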
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| MinIO Docker Hub images | quay.io pinned release or build from source | Oct 2025 | Must use quay.io registry or alternative S3 provider |
| MinIO community edition | MinIO archived, AIStor commercial | Feb 2026 | No new features/security patches; S3 API is stable so existing images work |
| AWS SDK v2 (monolithic) | AWS SDK v3 (modular) | 2021+ | Tree-shakeable, smaller bundles, per-service packages |
**Deprecated/outdated:**
- MinIO community Docker images on Docker Hub: No longer updated as of Oct 2025
- MinIO GitHub repository: Archived Feb 2026, read-only
- AWS SDK v2 (`aws-sdk` monolithic package): Use the v3 modular packages such as `@aws-sdk/client-s3`
## Open Questions
1. **MinIO CORS Configuration for Dev**
- What we know: Presigned URLs from MinIO (port 9000) will be fetched by the browser app (port 5173/3000), creating a cross-origin request.
- What's unclear: Whether MinIO's default CORS settings allow this, or if explicit configuration is needed.
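- Recommendation: Test in dev. If CORS blocks requests, adjust MinIO's allowed origins (for example via the `MINIO_API_CORS_ALLOW_ORIGIN` environment variable; verify against the pinned release). Note that `mc anonymous set download myminio/gearbox-images` makes the bucket publicly readable rather than configuring CORS, which conflicts with the private-bucket recommendation above. The Vite dev server proxy could also be used as a workaround.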
2. **Presigned URL Performance at Scale**
- What we know: `getSignedUrl` is a local crypto operation (no network call). For small collections (< 100 items), overhead is negligible.
- What's unclear: Performance impact when listing 500+ items with images.
- Recommendation: Implement with `Promise.all` for list endpoints. Monitor and optimize only if measurable slowdown occurs.
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Docker | MinIO container | Yes | 29.0.0 | -- |
| Docker Compose | Multi-container setup | Yes | v2.40.3 | -- |
| Bun | Runtime | Yes | (project runtime) | -- |
| MinIO (quay.io) | S3 storage | Yes (verified pullable) | RELEASE.2025-09-07T16-13-09Z | SeaweedFS, Garage, or any S3-compatible service |
**Missing dependencies with no fallback:** None
**Missing dependencies with fallback:** None
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | Bun test runner |
| Config file | bunfig.toml (if exists) or default |
| Quick run command | `bun test tests/services/image.service.test.ts` |
| Full suite command | `bun test` |
### Phase Requirements to Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| IMG-01 | Upload stores in S3 instead of filesystem | unit | `bun test tests/services/storage.service.test.ts` | No -- Wave 0 |
| IMG-01 | Image routes call storage service | integration | `bun test tests/routes/images.test.ts` | Yes -- needs update |
| IMG-02 | Migration script uploads all files from uploads/ to MinIO | integration | `bun test tests/scripts/migrate-images.test.ts` | No -- Wave 0 |
| IMG-03 | getImageUrl returns presigned URL | unit | `bun test tests/services/storage.service.test.ts` | No -- Wave 0 |
| IMG-03 | API responses include imageUrl field | integration | `bun test tests/routes/items.test.ts` | Yes -- needs update |
| IMG-04 | Docker Compose MinIO starts and bucket is created | manual | `docker compose -f docker-compose.dev.yml up -d && mc alias set ...` | N/A -- manual |
### Sampling Rate
- **Per task commit:** `bun test tests/services/storage.service.test.ts tests/routes/images.test.ts`
- **Per wave merge:** `bun test`
- **Phase gate:** Full suite green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `tests/services/storage.service.test.ts` -- covers IMG-01, IMG-03 (mock S3Client)
- [ ] Update `tests/services/image.service.test.ts` -- refactor to use mocked storage service
- [ ] Update `tests/routes/images.test.ts` -- verify routes call storage service
### Testing Strategy for S3 Operations
Storage service tests should mock the S3Client. The `@aws-sdk/client-s3` SDK supports the `aws-sdk-client-mock` library for unit testing, but for this project's scope, simple mock functions injected via a factory pattern or module-level mocking with `bun:test`'s `mock` are sufficient. Do NOT require a running MinIO instance for unit tests.
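One way to read the factory-pattern option is sketched below. The `ObjectStore` interface, `createStorageService`, and `createFakeStore` are hypothetical names; in production the store would be a thin adapter around `s3.send(new PutObjectCommand(...))` and `s3.send(new DeleteObjectCommand(...))`, while tests pass an in-memory fake.

```typescript
// Minimal port the storage service depends on (assumed shape, not the SDK's API)
interface ObjectStore {
	put(key: string, body: Uint8Array, contentType: string): Promise<void>;
	remove(key: string): Promise<void>;
}

function createStorageService(store: ObjectStore) {
	return {
		uploadImage: (buffer: Uint8Array, filename: string, contentType: string) =>
			store.put(filename, buffer, contentType),
		deleteImage: (filename: string) => store.remove(filename),
	};
}

// In-memory fake for unit tests: no MinIO instance required
function createFakeStore() {
	const objects = new Map<string, { body: Uint8Array; contentType: string }>();
	const store: ObjectStore = {
		async put(key, body, contentType) {
			objects.set(key, { body, contentType });
		},
		async remove(key) {
			objects.delete(key); // deleting a missing key is not an error, as in S3
		},
	};
	return { store, objects };
}
```

A `bun test` case would then build the service from the fake and assert on `objects`, leaving presigned URL generation to a separate test that mocks the signer.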
## Project Constraints (from CLAUDE.md)
- **Runtime:** Bun (not Node.js)
- **Server framework:** Hono with Zod validation
- **Service pattern:** Pure functions, no HTTP awareness -- storage.service.ts follows this (stateless, no db needed)
- **Path alias:** `@/*` maps to `./src/*`
- **Formatting:** Biome (tabs, double quotes, organized imports)
- **Testing:** Bun test runner, service-level and route-level tests
- **Branching:** Feature branch off Develop, merge back via PR
- **Releases:** Via Gitea Actions pipeline only
## Sources
### Primary (HIGH confidence)
- npm registry -- @aws-sdk/client-s3 version 3.1024.0 (verified via `npm view`)
- npm registry -- @aws-sdk/s3-request-presigner version 3.1024.0 (verified via `npm view`)
- quay.io -- minio/minio:RELEASE.2025-09-07T16-13-09Z image manifest (verified pullable)
- Codebase analysis -- all 12+ locations referencing `/uploads/` or `imageFilename` identified
### Secondary (MEDIUM confidence)
- [AWS Developer Blog - Presigned URLs](https://aws.amazon.com/blogs/developer/generate-presigned-url-modular-aws-sdk-javascript/) -- presigned URL patterns
- [MinIO Docker Compose bucket creation](https://banach.net.pl/posts/2025/creating-bucket-automatically-on-local-minio-with-docker-compose/) -- mc init container pattern
- [Alternatives to MinIO for single-node local S3](https://rmoff.net/2026/01/14/alternatives-to-minio-for-single-node-local-s3/) -- post-archival alternatives
### Tertiary (LOW confidence)
- MinIO CORS configuration requirements -- not verified with current version, needs testing
- Presigned URL performance at scale -- theoretical, not benchmarked
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH -- AWS SDK v3 versions verified, well-documented, stable API
- Architecture: HIGH -- Pattern is straightforward S3 wrapper, codebase touchpoints fully mapped
- Pitfalls: MEDIUM -- CORS and presigned URL expiry are known issues but specific MinIO behavior with current quay.io image not verified
- MinIO availability: MEDIUM -- quay.io image verified pullable today, but no future updates expected
**Research date:** 2026-04-04
**Valid until:** 2026-05-04 (stable -- S3 API unlikely to change; MinIO image is pinned)