Why are PDFs from Word so Large? - File Size Issues

October 8, 2015

File Size Discrepancies Between .docx and .pdf Documents

It's a common assumption that text-based documents saved in .docx and .pdf formats should exhibit comparable file sizes. However, this isn't consistently observed in practice.

A recent inquiry posed to the SuperUser community sheds light on the reasons behind significant variations in file sizes between these two popular document types.

Understanding the Differences

While both formats ultimately represent textual content, the methods they employ for storage and encoding differ substantially.

.docx files follow the Office Open XML standard: each file is a ZIP-compressed package of XML parts that holds the text, formatting instructions, and metadata. By default, fonts are referenced by name rather than stored in the file, which keeps text-only documents small.

Conversely, .pdf files, designed for document preservation, prioritize consistent rendering across platforms. This can involve embedding fonts and flattening graphical elements, both of which increase file size.
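
Because a .docx file is an ordinary ZIP archive, its contents can be listed with standard tools. The following is a minimal sketch using Python's built-in zipfile module; the function name docx_parts and the file name small.docx are illustrative assumptions, not part of the original discussion.

```python
import zipfile


def docx_parts(path):
    """Return (name, compressed_size, uncompressed_size) for each part of a .docx package.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in package.infolist()
        ]


def print_parts(path):
    """Print one line per part so compressed and raw sizes can be compared."""
    for name, compressed, raw in docx_parts(path):
        print(f"{name}: {compressed} bytes compressed, {raw} bytes uncompressed")
```

Calling print_parts("small.docx") on a text-only document would typically list parts such as word/document.xml; whether any font data appears in the package depends on the document's settings.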

SuperUser's Explanation

The SuperUser Q&A forum, a segment of the Stack Exchange network, provided a detailed response to the user's question.

The discussion highlighted that the complexity of the document, including images, fonts, and formatting, plays a crucial role in determining the final file size for both formats.

Furthermore, the specific PDF creation process and compression settings utilized can significantly influence the resulting file size.

Image Attribution

The boxing gloves clip-art featured in the original post was sourced from Clker.com.

In short, the gap in file size between .docx and .pdf documents does not indicate a difference in content; it follows from how each format stores and encodes the document.

The Inquiry Regarding PDF File Size

A SuperUser user, Borek, has raised a question concerning the unexpectedly large file sizes of PDF documents created from Microsoft Word. The core of the issue centers on a significant discrepancy in file size between the original Word document and its PDF equivalent.

The Problem Explained

Borek created a basic Microsoft Word document, limited to a single sentence:

  • This is a small document.

Saving this document in both .docx and .pdf formats yielded the following results:

  • .docx file size: 12 kB
  • .pdf file size: 89 kB

The PDF is roughly seven times the size of the .docx file, which is a problem when the documents are predominantly text. Borek observes that his documents stay compact in .docx format but grow considerably when converted to PDF.

Questioning the Efficiency of PDF

Borek asks whether the PDF format itself is inherently inefficient, or whether Microsoft Word simply uses a suboptimal output algorithm when it creates the PDF.

PDF Output Settings

It's important to note that the Microsoft Office installation used by Borek is configured to prioritize the creation of the smallest possible PDF files. Despite this setting, the file size remains significantly larger than the original .docx document.

The central question remains: why do PDF files generated by Microsoft Word tend to be so large, even when optimized for minimal file size?

Understanding Large PDF File Sizes from Microsoft Word

A SuperUser community member, rene, provides insight into why PDFs created by Microsoft Word are often surprisingly large.

Examining the PDF Structure

According to rene, opening the PDF file in a text editor such as Notepad++ reveals the underlying cause, because the editor exposes the file's internal object structure.

The screenshots in the original answer show an embedded object that is referenced near the end of the file by a /FontFile2 entry, the key a PDF font descriptor uses to point to an embedded TrueType font program.
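
The same inspection can be done programmatically instead of by eye. The sketch below searches the raw bytes of a PDF for /FontFile, /FontFile2, and /FontFile3 references and for /BaseFont names. It assumes the font dictionaries are stored uncompressed, as they were in the file rene examined; PDFs that pack their dictionaries into compressed object streams would need a real PDF parser. The function name scan_pdf_fonts is an illustrative choice.

```python
import re
import sys
from pathlib import Path

# "/FontFile2 12 0 R" style references to an embedded font program.
FONT_FILE_REF = re.compile(rb"/(FontFile[23]?)\s+(\d+)\s+(\d+)\s+R")
# "/BaseFont /ABCDEE+Calibri" style font names.
BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def scan_pdf_fonts(data: bytes) -> dict:
    """Return embedded-font references and base font names found in raw PDF bytes."""
    embedded = [
        (m.group(1).decode("ascii"), int(m.group(2)))
        for m in FONT_FILE_REF.finditer(data)
    ]
    names = sorted({m.group(1).decode("latin-1") for m in BASE_FONT.finditer(data)})
    return {"embedded": embedded, "base_fonts": names}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_fonts.py document.pdf
    result = scan_pdf_fonts(Path(sys.argv[1]).read_bytes())
    for key, obj_num in result["embedded"]:
        print(f"{key} -> object {obj_num}")
    print("Base fonts:", ", ".join(result["base_fonts"]) or "(none found)")
```

Each reported pair names the key and the number of the object that holds the font data.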

Font Embedding as the Primary Factor

Microsoft Word embeds the fonts utilized in a document directly into the resulting PDF file. This ensures the document remains self-contained and displays correctly on any system, regardless of installed fonts.
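
To see how much of a PDF is taken up by an embedded font, one can measure the stream of the object that a /FontFile2 entry points to. The following is a rough sketch under stated assumptions: the object is stored uncompressed at the top level of the file with generation number 0, and the stream is located by searching for the stream and endstream keywords rather than by parsing the file properly. The name stream_length is hypothetical.

```python
import re


def stream_length(data: bytes, obj_num: int):
    """Approximate the byte length of the stream in object `obj_num`, or None if absent."""
    header = re.search(rb"(?<!\d)%d\s+0\s+obj\b" % obj_num, data)
    if header is None:
        return None

    start = data.find(b"stream", header.end())
    obj_end = data.find(b"endobj", header.end())
    # No stream keyword, or the object closes before one appears.
    if start == -1 or (obj_end != -1 and obj_end < start):
        return None

    # The stream keyword is followed by CRLF or LF before the data begins.
    body = start + len(b"stream")
    if data[body:body + 2] == b"\r\n":
        body += 2
    elif data[body:body + 1] == b"\n":
        body += 1

    end = data.find(b"endstream", body)
    if end == -1:
        return None
    # Drop the end-of-line marker that precedes the endstream keyword.
    if data[end - 1:end] == b"\n":
        end -= 1
    if data[end - 1:end] == b"\r":
        end -= 1
    return max(end - body, 0)
```

Dividing the returned value by the total file size gives an estimate of the share of the PDF occupied by the font program.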

rene relied on a presentation from Adobe to interpret these PDF instructions.

Preventing Font Embedding

To reduce PDF file size, it's possible to avoid embedding fonts by using one of the 14 standard PDF typefaces, which PDF viewers are expected to provide themselves. The list below pairs common Windows fonts with their standard counterparts:

  • Times New Roman corresponds to Times (v3), available in regular, italic, bold, and bold italic variations.
  • Courier New is equivalent to Courier, offered in regular, oblique, bold, and bold oblique styles.
  • Arial maps to Helvetica (v3), with regular, oblique, bold, and bold oblique options.
  • Symbol is represented by Symbol.
  • Wingdings is equivalent to Zapf Dingbats.

According to the answer, using these standard fonts removes the need for embedding and therefore produces smaller files. (Source: Wikipedia)
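
The list above can be expressed as a small lookup table for checking which fonts in a document have a standard counterpart. This is only a sketch: the table copies the mapping from the list as given, the helper name fonts_likely_embedded is hypothetical, and whether Word actually skips embedding for a given font is not verified here.

```python
# Mapping copied from the list above: Windows font name -> standard PDF typeface.
STANDARD_COUNTERPARTS = {
    "Times New Roman": "Times",
    "Courier New": "Courier",
    "Arial": "Helvetica",
    "Symbol": "Symbol",
    "Wingdings": "Zapf Dingbats",
}


def fonts_likely_embedded(font_names):
    """Return the fonts that have no standard counterpart in the table above."""
    known = {name.casefold() for name in STANDARD_COUNTERPARTS}
    return [name for name in font_names if name.strip().casefold() not in known]
```

For example, passing the fonts used in a document returns only those that, by the reasoning above, would still need to be embedded.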

Further discussion, including contributions from other Stack Exchange users, can be found in the comments of the original SuperUser thread.
