@stnd/ingest
Document-to-Markdown converter for the Standard Framework.
Converts DOCX, PDF, HTML, and RTF files into clean Markdown — ready for @stnd/press to render beautifully.
Philosophy
Strip everything that isn’t content. No formatting noise. The Press will apply beautiful typography automatically.
Supported Formats
| Format | Extensions | Engine |
|---|---|---|
| Plain text / Markdown | .txt, .md | Pass-through |
| Microsoft Word | .docx, .doc | mammoth (lazy-loaded) |
| HTML | .html, .htm | html-to-md |
| PDF | .pdf | pdfjs-dist (lazy-loaded) |
| Rich Text Format | .rtf | Basic text extraction |
Heavy dependencies (mammoth, pdfjs-dist) are lazy-loaded — they only get imported when you actually convert that format.
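The lazy-loading idea can be illustrated with a small memoized loader. This is a minimal sketch, not the package's actual implementation: `lazy` and `loadFakeEngine` are hypothetical names, and the fake engine stands in for a dynamic import such as `() => import("mammoth")`.

```typescript
// Hypothetical helper: wraps a loader so the heavy module is requested at most once,
// and only when a caller first asks for it.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader();
    }
    return cached;
  };
}

// Stand-in for a heavy dependency. In a real converter the loader would be a
// dynamic import; a fake engine keeps this sketch self-contained.
let loadCount = 0;
const loadFakeEngine = lazy(async () => {
  loadCount += 1;
  return { convert: (input: string) => input.trim() };
});
```

Nothing is loaded until `loadFakeEngine()` is first called, and repeated calls share the same promise, so the cost is paid once per format actually used.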
Usage
import {
  convertDocumentToMarkdown,
  extractDocumentTitle,
  isSupportedDocument,
} from "@stnd/ingest";

// Check if a file can be converted
if (isSupportedDocument(file)) {
  // Convert to clean markdown
  const markdown = await convertDocumentToMarkdown(file);
  // Extract a title from the resulting markdown
  const title = extractDocumentTitle(markdown, file.name);
}
API
convertDocumentToMarkdown(file): Promise<string>
Accepts a File object, or any object with name and type properties and text() and arrayBuffer() methods. Returns clean Markdown.
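Outside the browser there may be no File object at hand. The sketch below builds a minimal stand-in with the four members listed above. The `FileLike` interface and the `fileLikeFromString` helper are hypothetical names used for illustration only; they are not exported by @stnd/ingest.

```typescript
// Assumed shape, derived from the description above (not an official type).
interface FileLike {
  name: string;
  type: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper: wraps an in-memory string as a file-like object.
function fileLikeFromString(name: string, type: string, content: string): FileLike {
  return {
    name,
    type,
    text: async () => content,
    arrayBuffer: async () => {
      // Encode as UTF-8 and copy into a standalone ArrayBuffer.
      const bytes = new TextEncoder().encode(content);
      const buffer = new ArrayBuffer(bytes.byteLength);
      new Uint8Array(buffer).set(bytes);
      return buffer;
    },
  };
}
```

An object built this way, for example `fileLikeFromString("notes.md", "text/markdown", "# Hi")`, has the shape that convertDocumentToMarkdown is described as accepting.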
extractDocumentTitle(markdown, filename?): string
Extracts a title from converted Markdown content. Checks headings first, then the first line, then falls back to the filename.
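The fallback order described above (heading, then first line, then filename) could look roughly like this. It is a sketch under assumptions, not the package's code: `sketchExtractTitle` is a hypothetical name, only `#`-style headings are recognized, the file extension is stripped from the filename, and "Untitled" is an assumed last resort.

```typescript
// Hypothetical re-implementation of the documented fallback order.
function sketchExtractTitle(markdown: string, filename?: string): string {
  const lines = markdown.split(/\r?\n/).map((line) => line.trim());

  // 1. The first ATX heading ("# Title" through "###### Title") wins.
  for (const line of lines) {
    const heading = /^#{1,6}\s+(.+?)\s*#*$/.exec(line)?.[1];
    if (heading) return heading;
  }

  // 2. Otherwise use the first non-empty line.
  const firstLine = lines.find((line) => line.length > 0);
  if (firstLine) return firstLine;

  // 3. Otherwise fall back to the filename without its extension (assumption).
  if (filename) return filename.replace(/\.[^.]+$/, "");

  // Assumed final fallback when nothing else is available.
  return "Untitled";
}
```

Under these assumptions, a document with no heading and no text at all would take its title from the filename, so `meeting-notes.pdf` becomes `meeting-notes`.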
isSupportedDocument(file): boolean
Checks whether a file can be converted based on its MIME type or extension.
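A check of this kind can be sketched as a MIME type lookup with an extension fallback. The extension list below mirrors the Supported Formats table; the MIME type list is an assumption based on commonly used types for those formats, and `sketchIsSupported` is a hypothetical name rather than the package's implementation.

```typescript
// Extensions taken from the Supported Formats table.
const SKETCH_EXTENSIONS = [".txt", ".md", ".docx", ".doc", ".html", ".htm", ".pdf", ".rtf"];

// Assumed MIME types; the real list is whatever getSupportedMimeTypes() returns.
const SKETCH_MIME_TYPES = [
  "text/plain",
  "text/markdown",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "application/msword",
  "text/html",
  "application/pdf",
  "application/rtf",
  "text/rtf",
];

// Hypothetical check: accept on a known MIME type, else on a known extension.
function sketchIsSupported(file: { name: string; type: string }): boolean {
  if (SKETCH_MIME_TYPES.includes(file.type)) return true;
  const dot = file.name.lastIndexOf(".");
  if (dot === -1) return false;
  return SKETCH_EXTENSIONS.includes(file.name.slice(dot).toLowerCase());
}
```

The extension fallback matters because browsers sometimes report an empty `type` for dropped files, in which case the filename is the only signal available.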
getSupportedMimeTypes(): string[]
Returns all MIME types that can be converted.
getSupportedExtensions(): string[]
Returns all file extensions that can be converted.
Relationship to other packages
User drops a file
↓
@stnd/ingest → “Here's clean Markdown”
↓
@stnd/press → “Here's beautiful HTML”
@stnd/ingest handles the messy input. @stnd/press handles the beautiful output. Clean separation of concerns.