diff --git a/README.md b/README.md
index 7fe9681..900b66c 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,15 @@ Image bauen und in Coolify neben Qdrant + Ollama deployen:
 docker build -f docker/Dockerfile -t rag-ingestor .
 ```
 
+### Ollama resource limits
+
+Embedding inference is CPU-only and by default scales across all available cores. For production, put hard limits on Ollama so that ingest spikes cannot block the host:
+
+- `cpus: "2.0"` (container cap)
+- `OLLAMA_NUM_PARALLEL=1` (serializes embedding requests internally)
+
+Both values are set in `docker-compose.yml` for local development and should be mirrored in Coolify. The result is a steady ~2 CPUs instead of peaks of up to 8 CPUs, at the cost of longer bulk ingest runs.
+
 ## Tests
 
 ```bash
diff --git a/docker-compose.yml b/docker-compose.yml
index 01483ee..40b4bc1 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -27,6 +27,12 @@ services:
       - "11434:11434"
     volumes:
       - ollama_data:/root/.ollama
+    # Cap CPU so embedding peaks don't starve the host. Mirror these
+    # limits in the production Coolify config, because Ollama otherwise
+    # scales inference threads to all available cores.
+    cpus: "2.0"
+    environment:
+      OLLAMA_NUM_PARALLEL: "1"
 
 volumes:
   qdrant_data: