matt/Rocky_Mountain_Vending

DMleadgen 087fda7ce6

Add manuals knowledge retrieval and corpus tooling

2026-04-07 15:38:55 -06:00


Manuals Qdrant Readiness

Purpose

The long-term source of truth for this pipeline is now the shared manuals-platform package at the workspace root.
The Rocky Mountain Vending (RMV) repo keeps this document as a consumer-side reference for the tenant-filtered artifacts that the Rocky app reads.

Source inputs

Shared package location: ../manuals-platform
Shared build outputs: ../manuals-platform/output/full/*
Rocky tenant outputs: ../manuals-platform/output/tenants/rocky-mountain-vending/*

What the corpus builder does

The shared package scans the full portfolio manual set, classifies every PDF, assigns tenant entitlements, and publishes tenant-filtered Qdrant-ready artifacts.
It maintains two retrieval profiles, public_safe and internal_tech, on top of one central corpus.
Rocky consumes the prebuilt Rocky tenant export instead of rebuilding from raw manuals data inside the app.
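
As a rough illustration of the tenant and profile filtering described above, the sketch below filters an in-memory list of chunks. The `ManualChunk` shape, its `tenants` and `profiles` fields, and the `filterForTenant` helper are all hypothetical; the real manuals-platform schema may differ.

```typescript
// Hypothetical shapes for illustration only; not the actual manuals-platform schema.
type RetrievalProfile = "public_safe" | "internal_tech";

interface ManualChunk {
  id: string;
  tenants: string[]; // assumed: tenant slugs entitled to this chunk
  profiles: RetrievalProfile[]; // assumed: profiles this chunk may be served under
  text: string;
}

// Keep only chunks the tenant is entitled to, optionally narrowed to one profile.
function filterForTenant(
  chunks: ManualChunk[],
  tenant: string,
  profile?: RetrievalProfile,
): ManualChunk[] {
  return chunks.filter(
    (c) =>
      c.tenants.includes(tenant) &&
      (profile === undefined || c.profiles.includes(profile)),
  );
}

// Made-up sample data.
const sample: ManualChunk[] = [
  { id: "a", tenants: ["rocky-mountain-vending"], profiles: ["public_safe", "internal_tech"], text: "..." },
  { id: "b", tenants: ["rocky-mountain-vending"], profiles: ["internal_tech"], text: "..." },
  { id: "c", tenants: ["other-tenant"], profiles: ["public_safe"], text: "..." },
];

const rockyAll = filterForTenant(sample, "rocky-mountain-vending");
const rockyPublic = filterForTenant(sample, "rocky-mountain-vending", "public_safe");
```

In this sketch, one corpus serves both profiles: the profile is a filter applied on top of the tenant entitlement, not a separate copy of the data.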

Build and evaluation commands

Build artifacts:
- pnpm manuals:qdrant:build
Build artifacts into a custom directory:
- pnpm manuals:qdrant:build -- --output-dir /absolute/path
Run the evaluation set:
- pnpm manuals:qdrant:eval

Artifact output

Default output directory: output/manuals-qdrant
Important files:
- summary.json
- manuals.json
- chunks.json
- chunks-high-confidence.json
- chunks-public-safe.json
- chunks-internal-tech.json
- evaluation-cases.json
- evaluation-report.json
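
A consumer may want to confirm that a build produced every file listed above before reading any of them. The following is a minimal sketch; `EXPECTED_ARTIFACTS` simply mirrors the list in this document, and `missingArtifacts` is a hypothetical helper, not part of the shared package.

```typescript
// File names copied from the "Important files" list in this document.
const EXPECTED_ARTIFACTS = [
  "summary.json",
  "manuals.json",
  "chunks.json",
  "chunks-high-confidence.json",
  "chunks-public-safe.json",
  "chunks-internal-tech.json",
  "evaluation-cases.json",
  "evaluation-report.json",
] as const;

// Return the expected artifact names that are absent from a directory listing.
function missingArtifacts(present: readonly string[]): string[] {
  const have = new Set<string>(present);
  return EXPECTED_ARTIFACTS.filter((name) => !have.has(name));
}

// Example with a made-up, incomplete listing.
const missing = missingArtifacts(["summary.json", "manuals.json", "chunks.json"]);
```

In a Node script, the listing could come from `readdirSync` in `node:fs`, pointed at whichever output directory the build was run with.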

Operational notes

The first Qdrant prototype should ingest chunks-high-confidence.json or chunks-internal-tech.json, not the full raw corpus.
Public-facing experiences should stay on public_safe filters even after Qdrant is introduced.
After manuals-data changes, rebuild the artifacts so the new normalized corpus and evaluation report stay in sync.
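
One way to keep public-facing queries on the public_safe profile is to derive the Qdrant filter from the caller's audience so that it cannot be skipped by accident. The sketch below uses the `must` / `key` / `match` structure from Qdrant's filter syntax. The payload field name `retrieval_profile`, the `Audience` type, and the `profileFilter` helper are assumptions for illustration, not names taken from the shared package.

```typescript
type Audience = "public" | "internal";

// Assumption: each ingested point carries a `retrieval_profile` payload field
// whose value is "public_safe" or "internal_tech".
function profileFilter(audience: Audience) {
  const profile = audience === "public" ? "public_safe" : "internal_tech";
  return {
    must: [{ key: "retrieval_profile", match: { value: profile } }],
  };
}

const publicFilter = profileFilter("public");
const internalFilter = profileFilter("internal");
```

The returned object is meant to be passed as the `filter` of a Qdrant search request. Whether internal callers should also see public_safe chunks is a policy decision this sketch does not settle.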