# Manuals Qdrant Readiness

## Purpose

- The long-term source of truth for this pipeline is now the shared `manuals-platform` package at the workspace root.
- The RMV repo keeps this document as a consumer-side reference for the tenant-filtered artifacts Rocky reads.

## Source inputs

- Shared package location: `../manuals-platform`
- Shared build outputs: `../manuals-platform/output/full/*`
- Rocky tenant outputs: `../manuals-platform/output/tenants/rocky-mountain-vending/*`

## What the corpus builder does

- The shared package scans the full portfolio manual set, classifies every PDF, assigns tenant entitlements, and publishes tenant-filtered, Qdrant-ready artifacts.
- It maintains two retrieval profiles, `public_safe` and `internal_tech`, on top of one central corpus.
- Rocky consumes the prebuilt Rocky tenant export instead of rebuilding from raw manuals data inside the app.
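
A minimal sketch of the tenant and profile filtering described above, assuming each manual lists the tenants entitled to it and each chunk lists the retrieval profiles it belongs to. The `ManualRecord` and `ChunkRecord` shapes, their field names, and `selectTenantChunks` are hypothetical and are not taken from `manuals-platform`.

```typescript
// Hypothetical record shapes for illustration only; the real
// manuals-platform schema and field names may differ.
type RetrievalProfile = "public_safe" | "internal_tech";

interface ManualRecord {
  manualId: string;
  entitledTenants: string[]; // assumed: tenants allowed to read this manual
}

interface ChunkRecord {
  chunkId: string;
  manualId: string;
  profiles: RetrievalProfile[]; // assumed: profiles this chunk is exposed under
  text: string;
}

// Return the chunks a tenant may retrieve under one profile: the parent
// manual must be entitled to the tenant, and the chunk must carry the profile.
function selectTenantChunks(
  manuals: ManualRecord[],
  chunks: ChunkRecord[],
  tenant: string,
  profile: RetrievalProfile,
): ChunkRecord[] {
  const entitledManualIds = new Set(
    manuals
      .filter((manual) => manual.entitledTenants.includes(tenant))
      .map((manual) => manual.manualId),
  );
  return chunks.filter(
    (chunk) => entitledManualIds.has(chunk.manualId) && chunk.profiles.includes(profile),
  );
}
```

Checking manual entitlement before the profile means a chunk can only reach a tenant whose manual set includes it, whichever profile is requested.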

## Build and evaluation commands

- Build artifacts:
  - `pnpm manuals:qdrant:build`
- Build artifacts into a custom directory:
  - `pnpm manuals:qdrant:build -- --output-dir /absolute/path`
- Run the evaluation set:
  - `pnpm manuals:qdrant:eval`
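
How `pnpm manuals:qdrant:eval` scores the evaluation set is not spelled out here. As one plausible shape for such a pass, the sketch below computes hit rate at k: a case counts as a hit when any of the top-k retrieved manual IDs is one the case expects. The `EvaluationCase` shape, the `retrieve` callback, and `hitRateAtK` are hypothetical; the metrics actually recorded in `evaluation-report.json` may differ.

```typescript
// Hypothetical shape of one entry in evaluation-cases.json.
interface EvaluationCase {
  caseId: string;
  query: string;
  expectedManualIds: string[];
}

interface HitRateResult {
  total: number;
  hits: number;
  hitRate: number; // hits / total, or 0 for an empty case list
  missedCaseIds: string[];
}

// Score hit rate at k against any retrieval function that returns
// manual IDs ordered best-first.
function hitRateAtK(
  cases: EvaluationCase[],
  retrieve: (query: string) => string[],
  k: number,
): HitRateResult {
  const missedCaseIds: string[] = [];
  for (const evalCase of cases) {
    const topK = retrieve(evalCase.query).slice(0, k);
    const expected = new Set(evalCase.expectedManualIds);
    if (!topK.some((manualId) => expected.has(manualId))) {
      missedCaseIds.push(evalCase.caseId);
    }
  }
  const total = cases.length;
  const hits = total - missedCaseIds.length;
  return { total, hits, hitRate: total === 0 ? 0 : hits / total, missedCaseIds };
}
```

Keeping the retriever as a callback lets the same scoring run against the JSON artifacts today and a Qdrant-backed search later.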

## Artifact output

- Default output directory: `output/manuals-qdrant`
- Important files:
  - `summary.json`
  - `manuals.json`
  - `chunks.json`
  - `chunks-high-confidence.json`
  - `chunks-public-safe.json`
  - `chunks-internal-tech.json`
  - `evaluation-cases.json`
  - `evaluation-report.json`
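
The chunk files above presumably relate as subsets of `chunks.json`. A sketch of that relationship, assuming each chunk carries a classifier `confidence` score and a `profiles` array (both hypothetical fields) and a high-confidence cutoff of 0.8 chosen purely for illustration; `splitChunkArtifacts` is likewise hypothetical.

```typescript
type RetrievalProfile = "public_safe" | "internal_tech";

// Hypothetical chunk shape; the real chunk records may differ.
interface ChunkRecord {
  chunkId: string;
  confidence: number; // assumed: classifier confidence in [0, 1]
  profiles: RetrievalProfile[];
}

type ChunkArtifactFile =
  | "chunks.json"
  | "chunks-high-confidence.json"
  | "chunks-public-safe.json"
  | "chunks-internal-tech.json";

// Assumed cutoff for illustration; the builder's real rule is not documented here.
const HIGH_CONFIDENCE_THRESHOLD = 0.8;

// Derive each chunk file's contents from the full chunk list.
function splitChunkArtifacts(chunks: ChunkRecord[]): Record<ChunkArtifactFile, ChunkRecord[]> {
  return {
    "chunks.json": chunks,
    "chunks-high-confidence.json": chunks.filter((c) => c.confidence >= HIGH_CONFIDENCE_THRESHOLD),
    "chunks-public-safe.json": chunks.filter((c) => c.profiles.includes("public_safe")),
    "chunks-internal-tech.json": chunks.filter((c) => c.profiles.includes("internal_tech")),
  };
}
```

Each array would then be serialized with `JSON.stringify` and written under the output directory.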

## Operational notes

- The first Qdrant prototype should ingest `chunks-high-confidence.json` or `chunks-internal-tech.json`, not the full raw corpus.
- Public-facing experiences should keep using `public_safe` filters even after Qdrant is introduced.
- After the manuals data changes, rebuild the artifacts so the normalized corpus and the evaluation report stay in sync.
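
The notes above imply two things for a first ingest: convert chunks from one of the filtered files into Qdrant points, and keep public-facing queries behind a `public_safe` filter. A minimal sketch of both, assuming each chunk carries a `profiles` array that is copied into the point payload. The REST paths in the comments and the filter syntax (`must`, `key`, `match.value`) follow Qdrant's HTTP API; the payload field names, `toQdrantPoint`, and `buildProfileSearch` are hypothetical, and embedding generation is out of scope.

```typescript
// Hypothetical chunk shape; field names are assumptions.
type RetrievalProfile = "public_safe" | "internal_tech";

interface ChunkRecord {
  chunkId: string;
  manualId: string;
  profiles: RetrievalProfile[];
  text: string;
}

// A point as sent to Qdrant's upsert endpoint
// (PUT /collections/{collection}/points with body { points: [...] }).
interface QdrantPoint {
  id: string; // Qdrant point IDs must be unsigned integers or UUIDs
  vector: number[];
  payload: Record<string, unknown>;
}

// Wrap one chunk and its precomputed embedding as a Qdrant point.
function toQdrantPoint(chunk: ChunkRecord, pointId: string, vector: number[]): QdrantPoint {
  return {
    id: pointId,
    vector,
    payload: {
      chunk_id: chunk.chunkId,
      manual_id: chunk.manualId,
      profiles: chunk.profiles,
      text: chunk.text,
    },
  };
}

// Body for POST /collections/{collection}/points/search that only returns
// points whose "profiles" payload array contains the requested profile.
function buildProfileSearch(vector: number[], profile: RetrievalProfile, limit = 5) {
  return {
    vector,
    limit,
    with_payload: true,
    filter: {
      must: [{ key: "profiles", match: { value: profile } }],
    },
  };
}
```

Qdrant treats a `match` on an array payload as satisfied when any element matches, so one `profiles` array per point can serve both retrieval profiles from a single collection.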