pdf generation - What makes some pdf files much smaller than others? -


i have number of pdf textbooks, , of them upwards of 400 megabytes 1000 pages while others (which similar in quality) 10 megabytes 1500 pages!! thought might image quality, images similar in quality. next, took @ text when zoomed in, , saw larger books have rasterized text while smaller files looked had vector text. it?

if so, how can start making pdf files in vector format? possible scan document / use ocr recognize text, , somehow convert rasterized text vector format? also, can convert rasterized texts vector format?

cheers, evans

check command on sample each 2 different pdf types:

 pdfimages -list -f 1 -l 10 the.pdf 

(your version of pdf images should recent one, poppler variant.) gives list of images first 10 pages. lists image dimensions (width, height) in pixels, image size (in bytes) , respective compression.) if can bear it, can run:

 pdfimages -list the.pdf 

this gives list of all images all pages.

i bet larger 1 has more images listed.

pdfs scans vs. pdfs "born digital" ?

also run:

 pdffonts -f 1 -l 10 the.pdf 

and

 pdffonts the.pdf 

my guess this: large pdf types not list fonts. means, pages of these pdfs originate scanned papers.

the smaller ones "born digital"...


Comments

Popular posts from this blog

ruby on rails - RuntimeError: Circular dependency detected while autoloading constant - ActiveAdmin.register Role -

c++ - OpenMP unpredictable overhead -

javascript - Wordpress slider, not displayed 100% width -