PDFspy

PDFspy is the ultimate “get info” utility for your PDF documents. It can extract a comprehensive list of attributes from a PDF file into an XML-based format.

New features and enhancements include:

  • Support for PDF 1.7/ISO 32000 (Acrobat 9, X, DC)
  • Element now shows CMYK separations that are actually used by text and vector elements
  • New element that shows the number of shading objects in the PDF file
  • Restored writing output to stdout when the -o option is not used; the -quiet option is recommended when writing to stdout
  • Fixed calculation of page labels
  • Improved text extraction algorithm
  • Calculates color simulation values for ICCBased, Separation and DeviceN colorspaces
  • Improved Unicode, ISO Latin and AdobePDF character set support

Some examples of the many types of information PDFspy can extract:

  • Page information (count, size, boxes)
  • Font usage (name, type, embedding & subset status, use of Unicode)
  • Colorspaces used (alternates, separation names, index bases)
  • Images (size, resolution, compression, colorspace)
  • Use of transparency, smooth shadings and patterns
  • Presence (or absence) of hidden text and optional content/layers
  • Hyperlinks (size, location and destination)
  • Annotations (size, location, type, contents, colors)
  • PDF/X compliance (including output intent details)
  • Metadata (info dictionary & XMP)
  • Security and Encryption settings

Example uses:

  • Asset management system: extract page count, metadata, font & image information
  • Document management: identify text-only or image-only documents, extract comments
  • Preflight: extract information about colorspaces, compression & font types
  • Developers: easily examine the structure of complex PDF documents
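
Because the report is XML-based, a script can take a first look at its structure before relying on any particular element. The following is a minimal Python sketch, not part of PDFspy: it assumes the report is well-formed XML, and the element names in the sample (report, page, font) are hypothetical placeholders rather than PDFspy's actual schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_report(xml_text: str) -> Counter:
    """Count how often each element name appears in an XML report."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and every descendant in document order.
    return Counter(element.tag for element in root.iter())


# Hypothetical sample input; real PDFspy element names may differ.
sample = """<report>
  <page number="1"/>
  <page number="2"/>
  <font name="Helvetica"/>
</report>"""

counts = summarize_report(sample)
for tag, count in sorted(counts.items()):
    print(f"{tag}: {count}")
```

To use it on real output, the XML text written to stdout (or to the file given with -o) could be read into a string and passed to summarize_report; the resulting tally shows which elements a given PDF actually produced.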