Thursday, September 26, 2024

Mammoth: JavaScript docx file reader lib

mammoth - npm

Mammoth converts .docx documents, such as those created by Microsoft Word, Google Docs and LibreOffice, to HTML. It aims to produce simple, clean HTML by using the semantic information in the document and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to an h1 element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading.
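To make the "semantic, not visual" idea concrete, here is a toy sketch of mapping a paragraph's style name to an HTML tag while ignoring font and size. This is not Mammoth's implementation or API; the `paragraphToHtml` function, the `STYLE_TO_TAG` table and the paragraph object shape are all made up for illustration.

```javascript
// Toy illustration only: NOT Mammoth's implementation or API.
// Maps a paragraph's style name to a semantic HTML tag and ignores visual styling.
const STYLE_TO_TAG = new Map([
    ["Heading 1", "h1"],
    ["Heading 2", "h2"],
]);

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// `paragraph` is a hypothetical shape: { styleName, text, font, size }.
// Only styleName and text are used; font and size are deliberately ignored.
function paragraphToHtml(paragraph) {
    const tag = STYLE_TO_TAG.get(paragraph.styleName) || "p";
    return `<${tag}>${escapeHtml(paragraph.text)}</${tag}>`;
}

console.log(paragraphToHtml({ styleName: "Heading 1", text: "Intro", font: "Arial", size: 24 }));
// prints <h1>Intro</h1>
```

Any style name that is not in the table falls back to a plain p element, which mirrors the spirit of producing simple HTML rather than copying the look of the document.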

const mammoth = require("mammoth");

mammoth.extractRawText({ path: "example.docx" })
    .then(result => {
        console.log(result.value); // The raw text
    })
    .catch(err => {
        console.error(err);
    });
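
// Converting to HTML (reuses the `mammoth` import from the first snippet)
const fs = require('fs');

async function readDocxFile(filePath) {
    // In Node.js, pass the file contents as `buffer`;
    // the `arrayBuffer` option is the one used in the browser.
    const buffer = await fs.promises.readFile(filePath);
    const result = await mammoth.convertToHtml({ buffer });
    console.log(result.value); // The HTML content
}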
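
Besides `value`, the result object also carries a `messages` array with any warnings or errors raised during conversion, each with a `type` and a `message`. Below is a small sketch for pulling out just the warnings. `collectWarnings` is a hypothetical helper, not part of Mammoth, and `sampleResult` is a hand-written stand-in for a real conversion result.

```javascript
// `collectWarnings` is a hypothetical helper, not part of Mammoth.
// It expects an object shaped like a conversion result: { value, messages }.
function collectWarnings(result) {
    return (result.messages || [])
        .filter(m => m.type === "warning")
        .map(m => m.message);
}

// Hand-written stand-in for a conversion result; the message text is made up.
const sampleResult = {
    value: "<p>Hello</p>",
    messages: [
        { type: "warning", message: "example warning" },
        { type: "error", message: "example error" },
    ],
};

console.log(collectWarnings(sampleResult)); // prints [ 'example warning' ]
```

In a real script you would pass the `result` from `mammoth.convertToHtml` to the helper instead of the sample object.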

Other supported platforms

alternatives



