Why this follows HTML cleanup
Tool A removes page chrome. The chunk inspector checks whether the cleaned Markdown still splits into retrieval-friendly sections.
PromptStage
AI workflow staging tools
RAG chunk staging
Paste cleaned Markdown and inspect how it splits before embedding, retrieval, or agent-context storage.
Start chunks at Markdown headings, then split long sections conservatively.
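As a rough illustration of heading-first splitting, here is a minimal sketch. It assumes ATX headings (`#` to `######`) and treats H1 to H3 as section boundaries. The names `split_at_headings` and `Section` are hypothetical and are not the inspector's actual code. Each section keeps its heading path, so a chunk retrieved alone still carries its H1/H2/H3 context.

```python
import re
from dataclasses import dataclass, field

# ATX headings only ("## Title", optional closing "##"); setext headings are not handled.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


@dataclass
class Section:
    heading_path: list[str]                      # e.g. ["Guide", "Install"]
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def split_at_headings(markdown: str, max_level: int = 3) -> list[Section]:
    """Start a new section at every heading up to max_level (H1-H3 by default)."""
    sections: list[Section] = []
    open_headings: list[tuple[int, str]] = []    # stack of (level, title)
    current = Section(heading_path=[])
    in_fence = False
    for line in markdown.splitlines():
        # Naive fence tracking: a "#" line inside ``` or ~~~ is code, not a heading.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            if current.text:
                sections.append(current)
            level = len(match.group(1))
            # Close headings at the same or deeper level, then open this one.
            open_headings = [(lvl, t) for lvl, t in open_headings if lvl < level]
            open_headings.append((level, match.group(2)))
            current = Section(heading_path=[t for _, t in open_headings])
        current.lines.append(line)
    if current.text:
        sections.append(current)
    return sections
```

Long sections returned by this pass would still need the conservative second split. A production splitter would also use a real CommonMark parser rather than this line-based fence toggle.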
Use Tool A first for public pages or raw HTML, then paste the cleaned Markdown here before embedding or storing retrieval context.
Paste cleaned Markdown, choose a preset, then inspect heading context, token estimates, code/table boundaries, and JSONL-ready chunk records.
Short answer
Use Markdown Chunk Inspector after HTML cleanup and before embedding. It shows chunk boundaries, heading context, token estimates, overlap previews, whether tables and fenced code stay intact, and the export shape, so you can check retrieval records before they enter a vector database or agent context store.
Workflow fit
Use this when a page is already clean enough to read, but not yet inspected enough to trust as retrieval context.
The first pass flags missing heading context, tiny chunks, oversized chunks, and chunks that contain Markdown tables or fenced code. It also previews the overlap between neighboring chunks.
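The first-pass checks can be approximated with the sketch below. The thresholds, the 4-characters-per-token heuristic, and the names `estimate_tokens` and `flag_chunk` are assumptions for illustration, not the inspector's documented behavior. Overlap is omitted because it is a preview, not a warning.

```python
# Hypothetical thresholds; tune them for your embedding model and retrieval window.
MIN_TOKENS = 40
MAX_TOKENS = 800


def estimate_tokens(text: str) -> int:
    """Rough, deterministic estimate: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def _is_table_delimiter(line: str) -> bool:
    """GFM table delimiter rows look like |---|:--:| (pipes, dashes, colons, spaces)."""
    stripped = line.strip()
    return "|" in stripped and "-" in stripped and set(stripped) <= set("|:- ")


def flag_chunk(heading_path: list[str], text: str) -> list[str]:
    """Return warning labels for one chunk."""
    warnings = []
    tokens = estimate_tokens(text)
    lines = text.splitlines()
    if not heading_path:
        warnings.append("missing heading context")
    if tokens < MIN_TOKENS:
        warnings.append("tiny chunk")
    elif tokens > MAX_TOKENS:
        warnings.append("oversized chunk")
    if any(_is_table_delimiter(line) for line in lines):
        warnings.append("markdown table")
    if any(line.lstrip().startswith(("```", "~~~")) for line in lines):
        warnings.append("fenced code")
    return warnings
```

A plain `---` thematic break contains no pipe, so it is not mistaken for a table delimiter row.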
Export as Markdown for human QA, JSON for app pipelines, or JSONL when each chunk should become one retrieval or agent-context record.
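To make the JSONL shape concrete, here is a minimal sketch: one JSON object per line, one line per chunk. The field names (`id`, `source`, `heading_path`, `text`, `token_estimate`) and the `source#index` ID scheme are hypothetical, not the tool's actual export schema.

```python
import json


def to_jsonl(chunks: list[dict], source: str) -> str:
    """Serialize chunks as JSON Lines: one JSON object per line, one line per chunk."""
    out = []
    for index, chunk in enumerate(chunks):
        record = {
            "id": f"{source}#{index}",           # hypothetical ID scheme
            "source": source,
            "heading_path": chunk["heading_path"],
            "text": chunk["text"],
            # Rough ~4 characters/token assumption; not a real tokenizer.
            "token_estimate": (len(chunk["text"]) + 3) // 4,
        }
        # json.dumps escapes newlines inside "text", so each record stays on one line.
        out.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(out)
```

Keeping `heading_path` as its own field lets a retrieval layer prepend it to the text at query time instead of baking it into every chunk.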
| Warning | What it means | Best next fix |
|---|---|---|
| Missing heading context | A chunk may be technically valid but hard to interpret when retrieved alone. | Preserve the nearest useful H2/H3 path or split from a cleaner section boundary. |
| Oversized chunk | A dense section may exceed the target window before the retrieval layer sees it. | Use paragraph-safe or token-window chunking, then check that the split keeps meaning intact. |
| Code or table boundary risk | Splitting inside a table or fenced code block can damage the retrieved answer. | Use the code/table-safe preset and review the exported JSONL record before ingestion. |
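The paragraph-safe and code/table-safe fixes in the table above can be approximated with the sketch below, which treats fenced code and tables as indivisible blocks. It assumes blank lines separate Markdown blocks. GFM tables contain no blank lines, so each table stays in one block. The greedy packing, the optional block overlap, the 4-characters-per-token estimate, and all names are illustrative assumptions, not the presets' actual algorithm.

```python
def estimate_tokens(text: str) -> int:
    """Rough, deterministic estimate: ~4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def split_blocks(text: str) -> list[str]:
    """Split on blank lines, but never inside a fenced code block."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def pack_blocks(blocks: list[str], max_tokens: int, overlap_blocks: int = 0) -> list[str]:
    """Greedily pack whole blocks into chunks of at most max_tokens (estimated).

    A single block larger than max_tokens becomes its own oversized chunk rather
    than being cut mid-table or mid-fence, so it can be flagged for review.
    """
    chunks: list[str] = []
    current: list[str] = []
    for block in blocks:
        if current and estimate_tokens("\n\n".join(current + [block])) > max_tokens:
            chunks.append("\n\n".join(current))
            # Repeat the last overlap_blocks blocks as context, if they still fit.
            carried = current[-overlap_blocks:] if overlap_blocks > 0 else []
            if carried and estimate_tokens("\n\n".join(carried + [block])) > max_tokens:
                carried = []
            current = carried + [block]
        else:
            current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Overlap here is counted in whole blocks rather than tokens, which keeps repeated context from starting in the middle of a table row or code line.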
FAQ
The inspector does not replace a vector database, RAG framework, or agent context store. It only inspects and exports chunks so you can review them before they go into one.
Use HTML to Markdown for AI first when the source is a public page or raw HTML. Then paste the cleaned Markdown here to inspect chunk boundaries.
Token estimates are not exact counts. They are deterministic estimates for planning and comparison, so always leave margin for the tokenizer used by your final model or retrieval stack.
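One way to leave that margin is to compare a deterministic estimate against a reduced limit, as in the minimal sketch below. The 4-characters-per-token ratio, the 20% default margin, and the function names are placeholders, not values the inspector uses. For exact counts, use the tokenizer of your final model.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Deterministic estimate: the same text always yields the same number."""
    return math.ceil(len(text) / chars_per_token)


def fits_with_margin(text: str, token_limit: int, margin: float = 0.2) -> bool:
    """True if the estimate stays within the limit minus a safety margin."""
    usable = int(token_limit * (1 - margin))  # e.g. 20% headroom for tokenizer drift
    return estimate_tokens(text) <= usable
```

A 400-character chunk estimates to 100 tokens. It fits an exact 120-token limit, but with a 20% margin the check fails (usable budget 96). That reserve covers the case where the real tokenizer counts more than the estimate.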