@stnd/ingest

Document-to-Markdown converter for the Standard Framework.

Converts DOCX, PDF, HTML, and RTF files into clean Markdown — ready for @stnd/press to render beautifully.

Philosophy

Strip everything that isn’t content. No formatting noise. The Press will apply beautiful typography automatically.

Supported Formats

Format	Extensions	Engine
Plain text / Markdown	`.txt`, `.md`	Pass-through
Microsoft Word	`.docx`, `.doc`	mammoth (lazy-loaded)
HTML	`.html`, `.htm`	html-to-md
PDF	`.pdf`	pdfjs-dist (lazy-loaded)
Rich Text Format	`.rtf`	Basic text extraction

Heavy dependencies (mammoth, pdfjs-dist) are lazy-loaded: each one is imported only when you actually convert a file in the corresponding format.
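As a rough illustration of the lazy-loading idea, the sketch below caches the result of a loader so the expensive import runs at most once, and only on first use. The `lazy` helper and the stand-in engine are hypothetical names for this example and are not taken from the package's source; in real code the loader body would typically be a dynamic `import()` of the heavy dependency.

```typescript
// Hypothetical helper: wraps a loader so it runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load(); // first call triggers the load
    }
    return cached; // later calls reuse the same promise
  };
}

// Counts how often the loader body actually runs.
let loadCount = 0;

// Stand-in for a heavy engine. A real loader might instead do
// `return import("pdfjs-dist")` (assumption, not confirmed by the package).
const getPdfEngine = lazy(async () => {
  loadCount += 1;
  return { name: "stand-in PDF engine" };
});
```

Nothing is loaded until `getPdfEngine()` is first called, so code paths that only handle plain text or HTML never pay for the PDF engine.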

Usage

import {
  convertDocumentToMarkdown,
  extractDocumentTitle,
  isSupportedDocument,
} from "@stnd/ingest";

// Check if a file can be converted
if (isSupportedDocument(file)) {
  // Convert to clean markdown
  const markdown = await convertDocumentToMarkdown(file);

  // Extract a title from the resulting markdown
  const title = extractDocumentTitle(markdown, file.name);
}

API

`convertDocumentToMarkdown(file): Promise<string>`

Accepts a File object (or any object with `name` and `type` properties and `text()` and `arrayBuffer()` methods). Returns a promise that resolves to clean Markdown.
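Outside the browser there may be no `File` at hand, so a minimal stand-in with the shape described above can be useful, for example in tests. The sketch below builds one from a string; `FileLike` and `fileLikeFromString` are hypothetical names for illustration and are not exported by the package.

```typescript
// The shape described above: two properties and two async methods.
// `FileLike` is a hypothetical name, not a type exported by @stnd/ingest.
interface FileLike {
  name: string;
  type: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Builds a File-like object from in-memory text.
function fileLikeFromString(
  name: string,
  content: string,
  type: string = "text/plain",
): FileLike {
  return {
    name,
    type,
    text: async () => content,
    arrayBuffer: async () => {
      // Copy the UTF-8 bytes into a fresh ArrayBuffer.
      const bytes = new TextEncoder().encode(content);
      const buffer = new ArrayBuffer(bytes.byteLength);
      new Uint8Array(buffer).set(bytes);
      return buffer;
    },
  };
}

const note = fileLikeFromString("notes.md", "# Notes\n\nHello.", "text/markdown");
```

An object like `note` could then be passed to `convertDocumentToMarkdown(note)`, assuming the converter reads only these four members.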

`extractDocumentTitle(markdown, filename?): string`

Extracts a title from converted Markdown content. Checks headings first, then the first line, then falls back to the filename.
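The fallback order described above can be sketched roughly as follows. This is an illustrative re-implementation, not the package's source: `sketchExtractTitle` is a hypothetical name, and the heading pattern, the "first non-empty line" rule, the extension stripping, and the final `"Untitled"` default are all assumptions.

```typescript
// Hypothetical sketch of the documented order: heading, first line, filename.
function sketchExtractTitle(markdown: string, filename?: string): string {
  const lines = markdown.split(/\r?\n/);

  // 1. Prefer the first ATX heading (# through ######), wherever it appears.
  for (const line of lines) {
    const match = /^\s{0,3}#{1,6}\s+(.+?)\s*#*\s*$/.exec(line);
    const heading = match?.[1];
    if (heading) {
      return heading.trim();
    }
  }

  // 2. Otherwise use the first non-empty line (assumed interpretation).
  const firstLine = lines.map((l) => l.trim()).find((l) => l.length > 0);
  if (firstLine) {
    return firstLine;
  }

  // 3. Otherwise fall back to the filename without its extension.
  if (filename) {
    return filename.replace(/\.[^./\\]+$/, "");
  }

  // 4. Assumed last resort when nothing else is available.
  return "Untitled";
}
```

With this sketch, a document whose heading comes after an introductory paragraph still yields the heading text, and an empty document named `report.docx` yields `report`.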

`isSupportedDocument(file): boolean`

Checks whether a file can be converted based on its MIME type or extension.
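A check of this kind might look like the sketch below. The extension list is taken from the Supported Formats table; the MIME type list is an assumption based on the standard types for those formats and may not match what the package actually accepts. `sketchIsSupported` is a hypothetical name.

```typescript
// Extensions from the Supported Formats table.
const ASSUMED_EXTENSIONS: string[] = [
  ".txt", ".md", ".docx", ".doc", ".html", ".htm", ".pdf", ".rtf",
];

// Assumed MIME types; the package's real list may differ.
const ASSUMED_MIME_TYPES: string[] = [
  "text/plain",
  "text/markdown",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "application/msword",
  "text/html",
  "application/pdf",
  "application/rtf",
];

// Hypothetical sketch: accept on a known MIME type, else on a known extension.
function sketchIsSupported(file: { name: string; type: string }): boolean {
  if (ASSUMED_MIME_TYPES.includes(file.type.toLowerCase())) {
    return true;
  }
  const dot = file.name.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension to fall back on
  }
  return ASSUMED_EXTENSIONS.includes(file.name.slice(dot).toLowerCase());
}
```

Falling back to the extension matters in practice because browsers often report an empty `type` for files they do not recognize.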

`getSupportedMimeTypes(): string[]`

Returns all MIME types that can be converted.

`getSupportedExtensions(): string[]`

Returns all file extensions that can be converted.
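One plausible use for these two lists is building the `accept` attribute of a file input. The helper below is a hypothetical sketch that takes the lists as arguments, so it makes no claim about the exact values the package returns; in particular, whether extensions come back with a leading dot is not stated above, so the sketch normalizes both forms.

```typescript
// Hypothetical helper: joins extensions and MIME types into an `accept` string.
function buildAcceptAttribute(extensions: string[], mimeTypes: string[]): string {
  // Ensure every extension has a leading dot and is lowercase.
  const normalized = extensions.map((ext) => {
    const lower = ext.trim().toLowerCase();
    return lower.startsWith(".") ? lower : `.${lower}`;
  });
  // Remove duplicates while keeping the original order.
  const unique = Array.from(new Set([...normalized, ...mimeTypes]));
  return unique.join(",");
}

// Example with literal values standing in for the package's return values.
const acceptValue = buildAcceptAttribute(["pdf", ".docx"], ["application/pdf"]);
```

In an application the arguments would presumably be `getSupportedExtensions()` and `getSupportedMimeTypes()`, with the result assigned to an input element's `accept` property.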

Relationship to other packages

User drops a file
       ↓
  @stnd/ingest     →  “Here's clean Markdown”
       ↓
  @stnd/press      →  “Here's beautiful HTML”

@stnd/ingest handles the messy input. @stnd/press handles the beautiful output. Clean separation of concerns.