# Manuals Qdrant Readiness
## Purpose
- The long-term source of truth for this pipeline is now the shared `manuals-platform` package at the workspace root.
- The RMV repo keeps this document as a consumer-side reference for the tenant-filtered artifacts Rocky reads.
## Source inputs
- Shared package location: `../manuals-platform`
- Shared build outputs: `../manuals-platform/output/full/*`
- Rocky tenant outputs: `../manuals-platform/output/tenants/rocky-mountain-vending/*`
## What the corpus builder does
- The shared package scans the full portfolio manual set, classifies every PDF, assigns tenant entitlements, and publishes tenant-filtered Qdrant-ready artifacts.
- It keeps `public_safe` and `internal_tech` retrieval profiles on top of one central corpus.
- Rocky consumes the prebuilt Rocky tenant export instead of rebuilding from raw manuals data inside the app.
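The tenant and profile filtering described above can be pictured as a filter over one central list of chunks. This is a minimal sketch only: the field names `tenants` and `profiles`, the `other-tenant` value, and the `filter_chunks` helper are assumptions for illustration, not the shared package's actual schema or API.

```python
# Hypothetical chunk records; the real artifact schema may differ.
CORPUS = [
    {"id": "c1", "tenants": ["rocky-mountain-vending"],
     "profiles": ["public_safe", "internal_tech"]},
    {"id": "c2", "tenants": ["rocky-mountain-vending"],
     "profiles": ["internal_tech"]},
    {"id": "c3", "tenants": ["other-tenant"],
     "profiles": ["public_safe"]},
]


def filter_chunks(chunks, tenant, profile):
    """Keep chunks the tenant is entitled to and the retrieval profile allows."""
    return [
        chunk for chunk in chunks
        if tenant in chunk["tenants"] and profile in chunk["profiles"]
    ]


# One corpus, two views for the same tenant.
public_ids = [c["id"] for c in filter_chunks(CORPUS, "rocky-mountain-vending", "public_safe")]
internal_ids = [c["id"] for c in filter_chunks(CORPUS, "rocky-mountain-vending", "internal_tech")]
print(public_ids, internal_ids)
```

The point of the design is that both profiles read from the same corpus, so a chunk is classified once and exposed or hidden by filter rather than duplicated per audience.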
## Build and evaluation commands
- Build artifacts:
  - `pnpm manuals:qdrant:build`
- Build artifacts into a custom directory:
  - `pnpm manuals:qdrant:build -- --output-dir /absolute/path`
- Run the evaluation set:
  - `pnpm manuals:qdrant:eval`
## Artifact output
- Default output directory: `output/manuals-qdrant`
- Important files:
  - `summary.json`
  - `manuals.json`
  - `chunks.json`
  - `chunks-high-confidence.json`
  - `chunks-public-safe.json`
  - `chunks-internal-tech.json`
  - `evaluation-cases.json`
  - `evaluation-report.json`
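A quick way to confirm that a build produced every file listed above is to check the output directory before ingesting anything. The sketch below uses only the Python standard library; the `missing_artifacts` helper is hypothetical and not part of the repo, and it checks file presence only, not file contents.

```python
from pathlib import Path

# File names taken from the "Important files" list in this document.
EXPECTED_FILES = [
    "summary.json",
    "manuals.json",
    "chunks.json",
    "chunks-high-confidence.json",
    "chunks-public-safe.json",
    "chunks-internal-tech.json",
    "evaluation-cases.json",
    "evaluation-report.json",
]


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Default output directory from this document; pass another path
    # if the build was run with --output-dir.
    missing = missing_artifacts("output/manuals-qdrant")
    print("missing:", missing or "none")
```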
## Operational notes
- The first Qdrant prototype should ingest `chunks-high-confidence.json` or `chunks-internal-tech.json`, not the full raw corpus.
- Public-facing experiences should stay on `public_safe` filters even after Qdrant is introduced.
- After any change to the manuals data, rebuild the artifacts so the normalized corpus and the evaluation report stay in sync.
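One rough way to spot a corpus and evaluation report that have drifted apart is to compare file modification times. This is a heuristic sketch under stated assumptions: it assumes the evaluation report is written at or after the corpus files in a healthy run, and the `report_is_current` helper and its default file choices are hypothetical, not something the repo provides.

```python
from pathlib import Path


def report_is_current(output_dir,
                      corpus_files=("manuals.json", "chunks.json"),
                      report="evaluation-report.json"):
    """Return True if the report exists and is at least as new as each corpus file.

    Modification time is only a proxy for "in sync"; it does not inspect contents.
    """
    root = Path(output_dir)
    report_path = root / report
    if not report_path.is_file():
        return False
    report_mtime = report_path.stat().st_mtime
    for name in corpus_files:
        path = root / name
        # A missing corpus file, or one newer than the report, counts as stale.
        if not path.is_file() or path.stat().st_mtime > report_mtime:
            return False
    return True


if __name__ == "__main__":
    print("report current:", report_is_current("output/manuals-qdrant"))
```

If the check returns `False`, rebuilding the artifacts and rerunning the evaluation set is the safe default.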