PDF Research Adobe Acrobat Uses...

World Wide Web Applications...

  1. Selecting Graphics for Processing
  2. Selecting Text for Processing
  3. Linking HTML files to specific pages of a PDF file


HTML page creation and editing is a huge industry right now, an industry so profuse and dynamic that it is extremely hard to keep up with all of the format changes and opportunities.

Beyond publishing PDF documents on the web, Acrobat is also a good tool for helping HTML authors grab graphics and text from already published materials.

Some files, such as GIS files, defy clean conversion to bitmaps for publication. By having the authors save them as PostScript and then distilling them to PDF, designers can select and copy the graphics with very clean results.

In another very common example, suppose you have a Quark document with compositions of overlapping text, vector, and bitmap graphics. Screen captures don't do it justice because Quark displays imported graphics on screen at low resolution, so they look very pixelated. You could create EPS files, open and resave them in Illustrator, import them into Photoshop, and then resave them as JPEGs, but that is a lot of steps for a multipage document.

A method we use is to convert the entire document to PDF and then "harvest" the graphics as described below. Because you choose the zoom level when you copy, the resulting compositions have scalable resolution.

Required Components

  • Minimum system requirements
  • Acrobat Distiller® and Acrobat/Exchange®.
  • Image editor like Adobe Photoshop® for manipulating selected graphics.

1. Selecting Graphics for Processing

When you consider some of the characteristics of a PDF file, you can begin to see additional uses, particularly in the area of file format conversion.

  1. PDF is a universal format. You can make PDFs not only from source programs like Illustrator and Word, but also from layout and composition programs like PageMaker, Quark, and FrameMaker. The format is also cross-platform, so you are not limited to working on a single platform.
  2. PDF preserves the resolution of vector art. You can view text and vector art at up to 800% and still get a perfect rendition.
  3. Acrobat/Exchange has two tools for selecting and copying elements from a PDF file for use elsewhere: Select Text and Select Graphics.

One of my favorite uses for PDF files is to cannibalize distilled Quark compositions of text, vectors, and bitmaps for web page creation. Of course, the technique works just as well for anything you create as a PDF.

  1. Open your PDF document in Acrobat/Exchange.
  2. Zoom in on the area you wish to copy.* The zoom level dictates the resolution of your Photoshop bitmap.
  3. Choose "Select Graphics" from the Tools menu.
  4. Marquee your graphic composition (or select all), then choose Copy from the Edit menu.
  5. Switch to Photoshop and choose File/New. This creates a blank bitmap image the size of your selected graphic.
  6. Paste the selected graphic into the new window.
  7. In some versions of Photoshop, pasting creates a new layer. If so, you need to "flatten" the image (from the Layers palette) before saving in any format other than Photoshop's native format.
  8. You can then edit and resize to your heart's content and save to any format for web pages or other uses.

*One of the great things about this technique is that it is totally scalable. With 800% zoom capability, you can create very large blowups of small vector compositions for transfer to Photoshop.
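The footnote's rule of thumb can be made concrete with a little arithmetic. The Python sketch below assumes that at 100% zoom Acrobat draws one PDF point (1/72 inch) as one screen pixel, i.e. a 72 ppi display; that figure is an assumption, and on a display with a different resolution the numbers scale accordingly. The function names are hypothetical helpers, not part of any Adobe product.

```python
# Rough estimate of the bitmap you get when pasting an Acrobat
# "Select Graphics" copy into Photoshop at a given zoom level.
# ASSUMPTION: at 100% zoom one PDF point (1/72 inch) is drawn as one
# screen pixel (a 72 ppi display). Adjust SCREEN_PPI if yours differs.
SCREEN_PPI = 72


def effective_ppi(zoom_percent: float) -> float:
    """Pixels per inch of the original artwork captured at this zoom."""
    return SCREEN_PPI * zoom_percent / 100


def copied_pixel_size(width_in: float, height_in: float,
                      zoom_percent: float) -> tuple[int, int]:
    """Approximate (width, height) in pixels of a graphic whose printed
    size is given in inches, copied at the given zoom percentage."""
    ppi = effective_ppi(zoom_percent)
    return round(width_in * ppi), round(height_in * ppi)


if __name__ == "__main__":
    # A 2 x 1.5 inch logo copied at three zoom levels
    for zoom in (100, 200, 800):
        print(f"{zoom}%: {copied_pixel_size(2, 1.5, zoom)} px "
              f"at {effective_ppi(zoom):.0f} ppi")
```

Under this assumption, a 2-inch-wide graphic copied at 800% comes across about 1,152 pixels wide (roughly 576 ppi), which leaves plenty of room to scale it down cleanly for a web page.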

2. Selecting Text for Processing

The good news about PDF conversion of text documents is that it retains the size and style of the text, the layout position of the text, and the kerning position of letters.

Unfortunately, these strong points are also its weak points, depending upon your use of the documents:

  1. Regarding style of text, if the original has bold text, distilling will sometimes produce a manufactured bold version of that copy (that is, plain text plotted four times around the original position to create a bold look). Also, text with a "Shadowed" style treatment will sometimes distill as two lines of plain text to create the shadow look.
  2. The layout position of text means that each horizontal line of text is treated as a separate layout element. Text that was word processed as a paragraph with word wrapping now has hard returns at the end of each line.
  3. The kerning position of letters can affect how text is laid out in the PDF file. For instance, simple kerning or tracking instructions within a line of text in Quark will distill into a PDF with each letter separated onto its own baseline, making it virtually uneditable within Acrobat/Exchange (or Illustrator, for that matter).
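If you copy text whose hard returns were preserved, you can rejoin the lines yourself before reusing it. The following Python sketch treats each single line break as a space and each blank line as a paragraph break. It assumes the copied text separates paragraphs with blank lines, which depends on how the source was laid out; `unwrap_lines` is a hypothetical helper, not an Acrobat feature.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    ASSUMPTION: paragraphs in the copied text are separated by at least
    one blank line; every other line break is a layout artifact.
    """
    # Split into paragraphs on blank lines (whitespace-only lines count too)
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Strip each line and glue the non-empty ones together with spaces
        lines = [line.strip() for line in para.splitlines()]
        joined.append(" ".join(line for line in lines if line))
    return "\n\n".join(joined)


if __name__ == "__main__":
    copied = ("Text that was word\nprocessed as a paragraph\n"
              "now has hard returns.\n\nA second paragraph.")
    print(unwrap_lines(copied))
```

The sketch deliberately leaves words hyphenated at line ends alone, because telling a line-break hyphen from a real one would need a dictionary. It also cannot repair the one-letter-per-baseline problem described in item 3.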

Still, the Select Text tool (under the Tools menu in Acrobat/Exchange) has its uses, and a number of plug-ins are now available to help the user "harvest" text from a PDF file.

What You Do...

  1. Open your PDF document in Acrobat/Exchange.
  2. Choose "Select Text" from the Tools menu.
  3. Click and drag over text to select it, then copy it. TIP: If you want a vertical selection of text, such as a column in a table, hold down the Option key (Macintosh) or Control key (Windows) while click-dragging.
  4. Open your text editor and paste.

New with version 4.0 (Windows only)

The Table/Formatted Text Selection tool has a preference checkbox (File/Preferences/Table/Formatted Text.../General) labeled "Preserve Line Breaks." When you uncheck this box, line breaks are NOT preserved, which makes it easier to copy text as paragraphs. For this setting to take effect, you MUST use the Table/Formatted Text Selection tool to marquee the text block.

For other "harvesting" approaches for dealing with text in a PDF file, check out some of the "plug-ins" currently available in our Third Party section (notably Redwing from BCL Computing).

3. Linking HTML files to specific pages of a PDF file

New with Acrobat 4:

If your URL says

http://...foo.pdf#page=17

Acrobat 4.0 will open the document on page 17. Previous versions of Acrobat will open the document on page 1 (or wherever the document's default open action specifies). If the document is *already* open in Acrobat when such a link is followed, the document gains focus but does not change pages. This feature was a late addition and missed the documentation cutoff.
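For pages that generate many such links, a small helper keeps the syntax consistent. This Python sketch builds Acrobat 4.0 style `#page=N` URLs and HTML anchors; the function names and the example.com address are hypothetical, and older Acrobat versions will still ignore the fragment and open on page 1.

```python
from html import escape


def pdf_page_url(pdf_url: str, page: int) -> str:
    """Return pdf_url with an Acrobat 4.0 '#page=N' open parameter."""
    if page < 1:
        raise ValueError("PDF page numbers start at 1")
    base = pdf_url.split("#", 1)[0]  # drop any fragment already present
    return f"{base}#page={page}"


def pdf_page_link(pdf_url: str, page: int, label: str) -> str:
    """Return an HTML anchor that asks Acrobat to open the PDF at `page`."""
    # Escape the URL for use inside a quoted attribute, and the label as text
    href = escape(pdf_page_url(pdf_url, page), quote=True)
    return f'<a href="{href}">{escape(label)}</a>'


if __name__ == "__main__":
    # prints <a href="http://www.example.com/manual.pdf#page=17">Chapter 3</a>
    print(pdf_page_link("http://www.example.com/manual.pdf", 17, "Chapter 3"))
```

Stripping any existing fragment first means the helper can safely be called on a URL that already points at a different page.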

Pre-4.x versions of Acrobat:

From Gordon Kent's website on Internet Publishing with Acrobat...

Phil Smith of the University of Nottingham has a script for Acrobat Distiller 3.0 which enables HTML links to open a specific page of a PDF file. The link syntax would be something like:

http://www.mysite.com/test.pdf#Page23

Note: The script, which must be copied to Distiller's startup directory, works only with Acrobat Distiller 3.0. Because it takes effect when a file is distilled, you may need to re-distill many files to get this functionality.


a production of Performance Graphics
©1998 The Miller De Wulf Corporation