Printing a corrupted (scanned) PDF
My printer is ... special. From its web interface, you can scan a document and get a PDF, but you can't print it!
It generates "nature-friendly" PDFs! Only blank pages come out of the printer.
THINK BEFORE YOU PRINT: Please consider the environment before printing this email.
Nothing works: not printing from Evince, not pdf2ps, not Evince > Print to File followed by printing the result, not convert.
But Evince does print some useful information:
Syntax Error (5404808): Illegal character '>'
Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: premature end of data segment
The JPEG images contained in the PDF are corrupted. For some reason, Evince can display them on screen, but it can't translate them to PostScript for the printer... There is probably a PDF library underneath, used to convert PDFs into other formats, that doesn't handle invalid images.
Fortunately, Poppler's pdfimages doesn't rely on that "broken" library, and it can extract all the images from a PDF!
When you try to export the images in JPEG format (option -j), it still doesn't work, since it just copies the invalid images out of the PDF as they are. Eye of GNOME can't display them and explains why:
Error interpreting JPEG image file (Maximum supported image dimension is 65500 pixels)
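If you want to triage the files extracted with -j before opening them one by one, here is a minimal sketch of a sanity check. It only verifies that a file starts with the JPEG start-of-image marker (ff d8) and ends with the end-of-image marker (ff d9). The function name jpeg_markers_ok is made up for this example, and the check is an assumption about what "premature end" may look like on disk: it would not catch the dimension problem reported above, and some valid JPEGs carry trailing bytes after the end marker.

```shell
# Hypothetical helper: returns 0 if the file begins with the JPEG SOI marker
# (ff d8) and ends with the EOI marker (ff d9), non-zero otherwise.
jpeg_markers_ok() {
    # Dump the first and last two bytes as hex, without spaces or newlines
    first=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    last=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "ffd8" ] && [ "$last" = "ffd9" ]
}
```

For example, running it over every extracted file with "for f in *.jpg; do jpeg_markers_ok "$f" || echo "suspicious: $f"; done" lists the files that look truncated.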
However, pdfimages can also export PPM images (portable pixmap file format), and those are not invalid! Yay! :-)
pdfimages $PDF $PREFIX
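With ImageMagick's convert, you can then rebuild your PDF:
convert-pdf() {
    # NOTE: $1 must be a file name in the current directory (no path)
    PDF=$1
    PREFIX=convert-
    TMP=$(mktemp -d)
    WD=$(pwd)
    cp "$PDF" "$TMP"
    # keep the original as a backup
    mv "$PDF" "$PDF.bak"
    cd "$TMP"
    pdfimages "$PDF" "$PREFIX"
    # convert ppm to jpg, that saves a lot of space!
    for i in "$PREFIX"*.ppm; do
        convert "$i" "$(basename "$i" .ppm).jpg"
    done
    # assemble all the JPEGs into a new PDF, one image per page
    convert "$PREFIX"*.jpg "$PDF"
    mv "$PDF" "$WD"
    cd "$WD"
    rm -rf "$TMP"
}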
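If you would rather not touch the original file or change directory, the same idea can be sketched as a variant that writes to a separate output file. This is only a sketch under a few assumptions: the name rebuild_pdf is made up, it relies on pdfimages naming its output PREFIX-NNN.ext, and it lets convert do the JPEG compression directly with -compress jpeg instead of going through intermediate .jpg files.

```shell
# Hypothetical variant: rebuild_pdf INPUT.pdf OUTPUT.pdf
# Leaves INPUT.pdf untouched and works entirely inside a temporary directory.
rebuild_pdf() {
    src=$1
    dst=$2
    tmp=$(mktemp -d) || return 1

    # Extract the embedded images (PPM/PBM by default) into the temp dir
    if ! pdfimages "$src" "$tmp/img"; then
        rm -rf "$tmp"
        return 1
    fi

    # Collect the extracted files; the glob stays unexpanded if there are none
    set -- "$tmp"/img-*
    if [ ! -e "$1" ]; then
        echo "no images found in $src" >&2
        rm -rf "$tmp"
        return 1
    fi

    # One image per page, stored JPEG-compressed to keep the PDF small
    convert "$@" -compress jpeg "$dst"
    status=$?

    rm -rf "$tmp"
    return $status
}
```

Calling rebuild_pdf scan.pdf scan-fixed.pdf leaves scan.pdf in place, so there is no need for a .bak copy.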