Cannot catch error in asBuffer() #312
Comments
This is something pdfjs can handle, besides the described issue with a defective one in between?
I think you are right.
Are you able to provide a small example to reproduce either or both errors? |
Yes, and it's blazingly fast ;-) First I had to extract the 1500 single pages from around 25 different multi-page PDFs and also needed them as JPG files. This took a little time, mostly because of the image extraction:
const jpgScale = 5;
for (const filename of fs.readdirSync(multiPagesPdfDirectory)) {
  if (!filename.endsWith('.pdf')) continue;
  const prefix = path.basename(filename, '.pdf');
  const filepath = path.join(multiPagesPdfDirectory, filename);
  const src = new pdfjs.ExternalDocument(fs.readFileSync(filepath));
  for (let num = 1; num <= src.pageCount; num++) {
    const pdfFilepath = path.join(singlePagesPdfDirectory, `${prefix}-${String(num).padStart(4, '0')}.pdf`);
    const jpgFilepath = pdfFilepath + '.jpg';
    const doc = new pdfjs.Document();
    doc.addPageOf(num, src);
    // This created some invalid PDFs (Error: Invalid xref object at 54524 - only noticed when merging again in the next code block)
    // doc.pipe(fs.createWriteStream(pdfFilepath));
    // await doc.end();
    // This created mostly valid PDFs (except: Name must start with a leading slash, found: 0 - only noticed when merging again in the next code block)
    await doc.asBuffer().then(data => fs.writeFileSync(pdfFilepath, data, { encoding: 'binary' }));
    const image = (await convert(pdfFilepath, { scale: jpgScale }))[0]; // pdf-img-convert
    const jpg = sharp(image, { failOn: 'none' })
      .flatten({ background: '#ffffff' })
      .toColourspace('srgb')
      .jpeg({ quality: 85, progressive: true });
    await jpg.toFile(jpgFilepath);
  }
}
Then I had to merge all of them into a single PDF file. This took under 1 second; the only issue could have been memory (especially when automatically retrying and not destroying a failed writeStream):
const doc = new pdfjs.Document();
for (const filename of fs.readdirSync(singlePagesPdfDirectory)) {
  if (!filename.endsWith('.pdf')) continue;
  const filepath = path.join(singlePagesPdfDirectory, filename);
  try {
    const src = fs.readFileSync(filepath);
    const ext = new pdfjs.ExternalDocument(src);
    doc.addPagesOf(ext);
  } catch (err) {
    // Fallback: rebuild the page from the previously extracted JPG
    const { data, info } = await sharp(filepath + '.jpg', { failOn: 'none' }).toBuffer({ resolveWithObject: true });
    const width = info.width / jpgScale;
    const height = info.height / jpgScale;
    const pdf = pdfmake.createPdfKitDocument({
      pageSize: { width, height },
      pageOrientation: 'portrait',
      pageMargins: [0, 0, 0, 0],
      content: [{
        image: data,
        left: 0,
        top: 0,
        width: width,
        height: height,
      }],
    });
    const buf = await new Promise((resolve, reject) => {
      const chunks = [];
      pdf.on('data', chunk => chunks.push(chunk));
      pdf.on('end', () => resolve(Buffer.concat(chunks)));
      pdf.on('error', reject);
      pdf.end();
    });
    const ext = new pdfjs.ExternalDocument(buf);
    doc.addPagesOf(ext);
  }
}
doc.pipe(fs.createWriteStream(fullPdfPath));
await doc.end();
As I was in a rush, I added a quick fix falling back to the extracted image. But this only helped with "Invalid xref object at 54524" errors, as they occurred while reading the single-page PDFs. The "Name must start with a leading slash, found: 0" error occurred while writing the fully merged PDF; this is where I could not catch the error to find out which page was responsible. Trying to repair the affected PDFs (after narrowing down which single page was actually to blame) did not help either. Lots of unnecessary code, but I thought you might be interested in how I used your library.
Unfortunately the repo is private, but I'll send you example PDFs via email, that you can use along my code above, as soon as I have time to find the relevant files. |
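For anyone trying to narrow a failure like this down to one file: a minimal sketch of the "try each input on its own" approach described above. The helper name `findFailingInputs` and the `tryOne` callback are hypothetical and not part of pdfjs; with pdfjs, `tryOne` might build a fresh Document, add the pages of a single file, and await asBuffer(). This only helps once the error actually rejects the awaited promise; an unhandled rejection inside the library would still bypass the try/catch.

```javascript
// Run a check against every input separately and collect the ones that fail,
// so an error can be attributed to a single file instead of the whole batch.
// `tryOne` is a caller-supplied async function (hypothetical, not a pdfjs API).
async function findFailingInputs(names, tryOne) {
  const failures = [];
  for (const name of names) {
    try {
      await tryOne(name);
    } catch (err) {
      // Keep going after a failure so every bad input is reported.
      const message = err instanceof Error ? err.message : String(err);
      failures.push({ name, message });
    }
  }
  return failures;
}
```

Running the inputs sequentially rather than with Promise.all keeps memory use flat, which matters with around 1500 files.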
I'm having the exact same error with a few different PDF files, which I can send you privately. I'm using the latest pdfjs, 2.5.0. My real code downloads two file buffers from external PDFs, combines them, and then throws the unhandled error while converting the combined document to a buffer. Here's my minimal repro code:
import * as pdfjs from 'pdfjs';
import * as https from 'https';
(async () => {
  try {
    console.log('downloading...');
    const pdfBuffer = await downloadExternalReport('contact me for URLs');
    const doc = new pdfjs.ExternalDocument(pdfBuffer);
    const outputDoc = new pdfjs.Document();
    outputDoc.addPagesOf(doc);
    console.log('converting to buffer...');
    const outBuffer = await outputDoc.asBuffer(); // <- error thrown here but not caught
    console.log('done!');
  } catch (e) {
    console.error('error combining PDFs', e);
  }
})();
function downloadExternalReport(url: string) {
  const data: Buffer[] = [];
  return new Promise<Buffer>((resolve, reject) => {
    const request = https.get(url, (response) => {
      if (response.statusCode !== 200) {
        reject('Error downloading external report');
      } else {
        response.on('data', (d: Buffer) => data.push(d));
        response.on('end', () => resolve(Buffer.concat(data)));
      }
    });
    request.on('error', reject);
  });
}
One file gives me this error
Another gives me this error
And another gives me this error
|
In my case, all three of my files were previously generated by This error gets triggered when exporting the combined report to a buffer, even though both files were previously exported to buffer by |
The unhandled promise rejection error should be fixed on I am afraid though that this issue isn't very high on my list, since it does not affect my own use-case. So that you can plan, you should know that I don't expect to work on that in the foreseeable future. |
Thanks @rkusa. Do you have an ETA on when you can publish the unhandled promise rejection fix to npm? |
@wildhart just released as 2.5.1 |
I've tried 2.5.1 and I'm afraid I still get the same unhandled errors as before. If I edit your code directly in my
Then the error is properly caught and handled by my own error handler:
|
Well, ... 🤦♂️ – apparently returning a promise to chain it was only a thing inside of a |
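As general background, one JavaScript behaviour that produces this kind of uncatchable error: a promise returned from inside a `new Promise` executor is ignored, whereas a promise returned from a `.then()` callback is chained. The sketch below is only an illustration of that language rule and is not the pdfjs source; `failingEnd`, `brokenAsBuffer` and `fixedAsBuffer` are hypothetical names, and whether this matches the actual cause in pdfjs is an assumption.

```javascript
// Hypothetical stand-in for an asynchronous step that fails (not pdfjs code).
function failingEnd() {
  return Promise.reject(new Error('simulated write failure'));
}

// Pattern 1: the executor's return value is discarded, so the rejection from
// failingEnd() is never linked to the outer promise. The outer promise stays
// pending, and Node reports the inner rejection as unhandled.
function brokenAsBuffer() {
  return new Promise((resolve, reject) => {
    return failingEnd();
  });
}

// Pattern 2: forward the outcome explicitly, so callers can catch the error
// with try/catch around `await fixedAsBuffer()` or with `.catch()`.
function fixedAsBuffer() {
  return new Promise((resolve, reject) => {
    failingEnd().then(resolve, reject);
  });
}
```

With the second pattern, `await fixedAsBuffer()` inside a try block lands in the caller's catch handler. Calling `brokenAsBuffer()` instead leaves the caller waiting while the process reports an unhandled rejection.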
@rkusa thanks for the toBuffer error fix. As for the I identified the single page PDF causing this error and I'll send it to you via mail. You can just use it with your newly written test as |
@rkusa if it helps, do you accept Github sponsorship? My client uses this in a commercial environment and this bug is costing them time (when we come across this error the only solution is to "reprint" the offending PDF, then we are able to append it to our own PDFs). I was going to see if I could investigate it myself if I could find any time. So if you could fix this, you'd save us time and therefore $$, so would be happy to send something your way... |
@wildhart If it helps, I can send you our failing single PDF (60-100KB) as well. I was able to repair the PDF via iLovePDF and was then able to use I just found out, that they have a NodeJS library (https://github.com/ilovepdf/ilovepdf-nodejs) to access their API. Feels like a really ugly workaround but I was thinking of implementing this on failing PDFs. |
@wildhart I appreciate the offer, but Anyway, the documents @7freaks-otte sent over made it very easy for me to spot the issue. Thanks a lot for narrowing it down to a single page @7freaks-otte! I've just pushed a fix. However, adding pages of Mind checking |
I've tried installing your latest pdfjs direct from github, but I continue to get "Name must start with a leading slash, found: (" with some files. Also, with another file I get your new error "Tried to write reference with I've sent you two files by email... |
@rkusa Thank you very much, I'll try to test your fix the next days and give you feedback. |
@wildhart Thanks for testing. To be sure, my fix prevents that The error |
In the example I sent you "Tax-Invoice-M590936.pdf" that file was not generated by pdfjs (at least not by me) - that file was uploaded by one of our clients and triggers the "Name must start with a leading slash, found: (" error when appended to a pdf using pdfjs, then that pdf is appended to another pdf. |
@rkusa sorry for the delay, I was quite busy the last few weeks. Your commit b6cdd70 seems to fix the Maybe it's worth noting that I just want to add a single page (3) from the previously generated PDF. What worked for me (though not practical) is:
|
Just FYI, I've moved away from using pdfjs for merging PDFs, due to this issue with certain PDFs causing errors, and also excessive file sizes (#314). Instead I'm using pdf-lib, which makes it really easy to copy pages from one PDF to another. It doesn't have any problems with the files we've provided here that throw errors in pdfjs, and the output file size is never bigger than the original files. It also seems a bit faster. I'm still using pdfjs to generate PDFs from HTML, but then I use pdf-lib to combine those with other PDF files. |
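For readers who want to try the same route, a minimal sketch of merging PDFs with pdf-lib, assuming the `pdf-lib` npm package is installed. The function name `mergeWithPdfLib` and its second parameter are illustrative choices, not part of pdf-lib; the parameter exists only so the library can be swapped out in tests. This is not the code used in the comment above.

```javascript
// Merge several PDFs (Buffers or Uint8Arrays) into a single document.
// By default the PDFDocument class is loaded lazily from `pdf-lib`;
// passing a second argument replaces it (assumption: used for testing only).
async function mergeWithPdfLib(inputs, { PDFDocument } = require('pdf-lib')) {
  const merged = await PDFDocument.create();
  for (const bytes of inputs) {
    const src = await PDFDocument.load(bytes);
    // copyPages returns page copies that belong to `merged` but are not yet added.
    const pages = await merged.copyPages(src, src.getPageIndices());
    for (const page of pages) merged.addPage(page);
  }
  // save() resolves to a Uint8Array; wrap it so it can be passed to fs.writeFileSync.
  return Buffer.from(await merged.save());
}
```

A call could look like `fs.writeFileSync(outPath, await mergeWithPdfLib([fs.readFileSync(a), fs.readFileSync(b)]))`, where `outPath`, `a` and `b` are placeholder paths.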
File works for me with the previous fix – not sure if it is a specific constellation on how it is added to the file.
This error was added as part of the fix to prevent
Sounds like a good decision to me. I've also added a note about the current maintenance status to the README. I myself moved most of my uses of pdfjs to a simple HTML to PDF via headless Chrome (I don't have the use-case of adding other PDFs anymore). |
For the moment I'm OK with my workaround above using 2 pdfjs versions at a time, as the PDFs are only generated once every several months. I understand your priorities. Thanks for your help anyway @rkusa |
We are currently using the latest version of PDFJS (2.5.3), but we are still encountering the error: 'Name must start with a leading slash' and 'TypeError: Tried to write reference with null object id' for some PDFs. We have attempted to resolve this issue by re-saving the PDFs and appending them to the main PDF using PDFJS, but unfortunately, this solution does not seem to be effective. |
I'm currently merging around 1500 PDFs and tried to find the defective one, but I cannot catch errors produced by this.end() in asBuffer().
While errors get caught here:
The node process quits with an unhandled error here:
I'm pretty sure the reason for the uncaught error is this line:
pdfjs/lib/document.js
Line 636 in 3374d1f
It should probably be:
Interesting side fact:
PDFs throwing errors like "Invalid xref object at 54524" or "Name must start with a leading slash, found: 0" are single-page PDFs previously extracted by pdfjs from other multi-page PDFs. Extracting worked, but merging again failed.
I could get rid of the "Invalid xref object" error by extracting with asBuffer(), writeFileSync, and binary encoding instead of pipe and a stream, but the one PDF with "Name must start with a leading slash, found: 0" drives me crazy.