Tuesday, February 04, 2025

extract PDF data

JS libs/modules

pdf.js-extract - npm

"extracts text from PDF files
This is just a library packaged out of the examples for usage of pdf.js with nodejs.
It reads a pdf file and exports all pages & texts with coordinates. This can be e.g. used to extract structured table data.
This package includes a build of pdf.js. why? pdfjs-dist installs not needed dependencies into production deployment.
Note: NO OCR!"
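
A minimal sketch of the "structured table data" idea from the quote above: since each text item comes back with coordinates, items whose y values are close together can be treated as one table row and ordered by x. The `PDFExtract` class, its promise-style `extract(path, options)` call, and the `pages[].content[]` items with `x`, `y`, and `str` fields follow the package README as I understand it. The helper names `groupIntoRows` and `extractRows`, the 2-unit tolerance, and the file name `sample.pdf` are my own assumptions for illustration.

```javascript
// Group text items (each with x, y, str) into rows by similar y coordinate.
// `tolerance` is an assumed threshold in PDF units; tune it per document.
function groupIntoRows(items, tolerance = 2) {
  const sorted = items
    .filter((item) => item.str.trim() !== "") // drop whitespace-only items
    .sort((a, b) => a.y - b.y || a.x - b.x);  // top to bottom, then left to right

  const rows = [];
  for (const item of sorted) {
    const last = rows[rows.length - 1];
    if (last && Math.abs(item.y - last.y) <= tolerance) {
      last.cells.push(item); // same visual line as the current row
    } else {
      rows.push({ y: item.y, cells: [item] }); // start a new row
    }
  }

  // Order each row left to right and keep only the text.
  return rows.map((row) =>
    row.cells.sort((a, b) => a.x - b.x).map((cell) => cell.str)
  );
}

// Hypothetical wrapper: extract one PDF and return rows per page.
// The library is required lazily so groupIntoRows works without it installed.
async function extractRows(pdfPath) {
  const { PDFExtract } = require("pdf.js-extract");
  const data = await new PDFExtract().extract(pdfPath, {});
  return data.pages.map((page) => groupIntoRows(page.content));
}
```

Usage would look something like `extractRows("sample.pdf").then((pages) => console.log(pages[0]))`. Grouping by y only works for text-based PDFs with reasonably aligned rows; scanned documents contain no text items at all, which is where the OCR option below comes in.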

alternative libs:


pdf-parse

pdf2json

pdfreader

AWS OCR API (Amazon Textract), for scanned PDFs that need OCR




