pdf generation - What makes some pdf files much smaller than others? -
i have number of pdf textbooks, , of them upwards of 400 megabytes 1000 pages while others (which similar in quality) 10 megabytes 1500 pages!! thought might image quality, images similar in quality. next, took @ text when zoomed in, , saw larger books have rasterized text while smaller files looked had vector text. it?
if so, how can start making pdf files in vector format? possible scan document / use ocr recognize text, , somehow convert rasterized text vector format? also, can convert rasterized texts vector format?
cheers, evans
check command on sample each 2 different pdf types:
pdfimages -list -f 1 -l 10 the.pdf
(your version of pdf images should recent one, poppler variant.) gives list of images first 10 pages. lists image dimensions (width, height) in pixels, image size (in bytes) , respective compression.) if can bear it, can run:
pdfimages -list the.pdf
this gives list of all images all pages.
i bet larger 1 has more images listed.
pdfs scans vs. pdfs "born digital" ?
also run:
pdffonts -f 1 -l 10 the.pdf
and
pdffonts the.pdf
my guess this: large pdf types not list fonts. means, pages of these pdfs originate scanned papers.
the smaller ones "born digital"...
Comments
Post a Comment