Tuesday, May 30, 2017

Ideas and problems with yellow pin notes on PDF documents

The idea is quite simple: pin those little yellow notes onto digital PDF documents in an easy-to-use way. You can do this with Adobe Reader, but there are some problems:

  • It is not easy to use - e.g. you have to use "save as" and cannot overwrite the existing document.
  • Adobe Reader is not available on every system.
  • There is no way to integrate the Reader functions into the System Concept DMS product.

The System Concept DMS software has been able to place notes in an easy-to-use way for about 2 years. Until now, however, there was no way to remove or edit the notes.

The SC DMS feature uses Apache PDFBox and draws a note in 3 steps:

  1. yellow box (addRect + fill)
  2. text (beginText + showText + endText)
  3. border (addRect + stroke)
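
At the level of the PDF content stream, these three steps boil down to a handful of operators. The following is a minimal sketch in plain Java (no PDFBox involved) that assembles them as text. The class name NoteOperators, the font resource name /F1, the colors and the text offsets are assumptions for illustration and are not taken from the SC DMS code.

```java
/** Sketch: the raw PDF operators behind the three drawing steps of a note. */
class NoteOperators {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Builds the operators for one single-line note at (x, y) with size w x h. */
    static String build(int x, int y, int w, int h, String text) {
        String rect = x + " " + y + " " + w + " " + h + " re\n";
        StringBuilder ops = new StringBuilder();

        // Step 1: yellow box (addRect + fill)
        ops.append("1 1 0 rg\n").append(rect).append("f\n");

        // Step 2: text (beginText + showText + endText)
        ops.append("BT\n")
           .append("0 0 0 rg\n")      // black text
           .append("/F1 11 Tf\n")     // assumed font resource name and size
           .append(x + 4).append(' ').append(y + h - 14).append(" Td\n")
           .append('(').append(escape(text)).append(") Tj\n")
           .append("ET\n");

        // Step 3: border (addRect + stroke)
        ops.append("0 0 0 RG\n").append(rect).append("S\n");
        return ops.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(100, 200, 300, 100, "Call (Peter)"));
    }
}
```

In PDFBox the same operators come from PDPageContentStream: addRect writes "re", fill writes "f", beginText and endText write "BT" and "ET", showText writes "Tj" and stroke writes "S". Nothing in this sequence ties the three steps together, which is why a note is hard to find again later.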

The user interface provides a two-step assistant to enter the text and choose a position for the note.

PDF document with a nice yellow note pinned


Make it removable

There was customer feedback that it would be great if notes were at least removable. This is not a simple task, since a note consists of a number of drawing operations which are not connected in any way within the PDF.
My solution uses PDF comments (lines beginning with '%') to identify the content streams which contain removable objects like notes.

So far so good. It turned out, however, that a certain page function of PDFBox merges the content streams of a page. This resulted in an empty page when the user removed a note.
The reason was that the note META comment was still in the page, but all content had been put into one single stream, so removing the marked stream removed everything.
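
The marker idea and its failure mode can be sketched in plain Java. Content streams are modelled as simple strings here; the marker text "% SCDMS:Note", the class name and the helper methods are hypothetical, since the real comment format is not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: removing note streams by marker comment, before and after a merge. */
class NoteMarkerDemo {

    // Hypothetical marker comment; the real SC DMS comment text may differ.
    static final String MARKER = "% SCDMS:Note";

    /** A stream counts as a note stream if any of its lines starts with the marker. */
    static boolean isNoteStream(String stream) {
        for (String line : stream.split("\n")) {
            if (line.startsWith(MARKER)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps only the streams that do not carry the marker. */
    static List<String> removeNotes(List<String> streams) {
        List<String> kept = new ArrayList<>();
        for (String stream : streams) {
            if (!isNoteStream(stream)) {
                kept.add(stream);
            }
        }
        return kept;
    }

    /** Models the merge: all streams of a page end up in one single stream. */
    static List<String> merge(List<String> streams) {
        List<String> merged = new ArrayList<>();
        merged.add(String.join("\n", streams));
        return merged;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
                "BT /F1 12 Tf 72 720 Td (Invoice) Tj ET",          // original page content
                MARKER + "\n1 1 0 rg 100 200 300 100 re f");       // note stream

        System.out.println(removeNotes(page).size());         // 1: page content survives
        System.out.println(removeNotes(merge(page)).size());  // 0: the page is empty
    }
}
```

As long as the note lives in its own stream, dropping that stream is safe. Once everything is merged, the marker is found in the only remaining stream and the whole page content goes with it.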

Use annotations

I tried to rewrite the note feature to make use of PDF annotations. Some reverse engineering showed that Adobe Reader stores its notes as annotations.

Apache PDFBox is able to manage annotations, too:


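import java.io.File;
import java.util.Calendar;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationMarkup;

// doc is a PDDocument that has already been loaded (PDFBox 2.x API)
PDPage page = doc.getPage(0);

List<PDAnnotation> annotations = page.getAnnotations();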
 
PDAnnotationMarkup freeTextMark = new PDAnnotationMarkup();
freeTextMark.setAnnotationName("SCDMS:Note:Peter Pinnau");

freeTextMark.getCOSObject().setName(COSName.SUBTYPE,
   PDAnnotationMarkup.SUB_TYPE_FREETEXT);

freeTextMark.setCreationDate(Calendar.getInstance());
// 4 = Print flag: the annotation is printed together with the page
freeTextMark.setAnnotationFlags(4);
   
// Yellow color for background
PDColor yellow = new PDColor(new float[] { 1, 1, 0 }, PDDeviceRGB.INSTANCE);
freeTextMark.setColor(yellow);
  
// Position for the annotation
PDRectangle position = new PDRectangle(); 
   
position.setLowerLeftX(100);
position.setLowerLeftY(200);
position.setUpperRightX(400);
position.setUpperRightY(500);
freeTextMark.setRectangle(position);
   
// Set some data
freeTextMark.setTitlePopup("Peter Pinnau");
freeTextMark.setContents("This is the text\nENTER1\nENTER2");
freeTextMark.setPrinted(true);
freeTextMark.setInvisible(false);
   
// Default appearance string: color black, "Helv" font, 11 point
freeTextMark.getCOSObject().setString(COSName.DA, "0 0 0 rg /Helv 11 Tf");
   
// Add the annotation
annotations.add(freeTextMark);  
  
// Save the document
doc.save(new File("..."));


The above code places a nice multi-line yellow note in the PDF. It is visible and editable in Adobe Reader. It is also visible in the PDF viewer shipped with Ubuntu.
But it is NOT visible in Mozilla's PDF.JS viewer. Unfortunately SCDMS uses PDF.JS to view PDF documents.

I found out that neither Apache PDFBox nor PDF.JS implements a so-called default appearance for annotations. Since the annotation has no appearance stream, it is not visible.

Adobe Reader creates a default appearance and displays the annotation correctly. Once the PDF has been saved from Adobe Reader, the annotations also become visible in PDF.JS.

There are two open issues concerning that:

PDF.JS:
https://github.com/mozilla/pdf.js/issues/6810

PDFBox:
https://issues.apache.org/jira/browse/PDFBOX-2019


The best way to solve this problem for SCDMS would of course be to add a correct appearance stream when generating the annotation.
Unfortunately this goes deep into PDF internals, so I hope that PDFBOX-2019 will be resolved in the near future.
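
To give an idea of what is missing: an appearance is a small content stream of its own, drawn relative to the annotation rectangle. The sketch below assembles such a stream as text in plain Java for a 300 x 300 note with three lines, reusing the default appearance string from the annotation code. The class name, the padding, the leading and the line-break handling are assumptions; this is not what Adobe Reader or PDFBox actually generate.

```java
/** Sketch: content of a simple appearance stream for a multi-line yellow note. */
class AppearanceSketch {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds the stream content. Coordinates are relative to the lower left
     * corner of the annotation rectangle, so the box starts at 0 0.
     */
    static String build(int width, int height, String defaultAppearance,
                        int fontSize, String contents) {
        int leading = fontSize + 2;   // assumed line spacing
        StringBuilder ops = new StringBuilder();

        // Yellow background covering the whole rectangle
        ops.append("1 1 0 rg\n")
           .append("0 0 ").append(width).append(' ').append(height).append(" re\n")
           .append("f\n");

        // Text: color and font come from the DA string, one Tj per line
        ops.append("BT\n")
           .append(defaultAppearance).append('\n')
           .append(leading).append(" TL\n")
           .append("2 ").append(height - fontSize - 2).append(" Td\n");
        String[] lines = contents.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                ops.append("T*\n");   // move down by one leading
            }
            ops.append('(').append(escape(lines[i])).append(") Tj\n");
        }
        ops.append("ET\n");
        return ops.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(300, 300, "0 0 0 rg /Helv 11 Tf", 11,
                "This is the text\nENTER1\nENTER2"));
    }
}
```

A real implementation would still have to wrap this content in a form XObject with a /BBox and a font resource for /Helv, and reference it from the annotation's /AP dictionary under /N. Word wrapping and clipping are left out entirely.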

For now I have switched back to the old implementation and found another way to do the above-mentioned page operations, so that the empty-page problem is solved in this particular case.

The content streams are merged by the following call (page is a page with content from an existing document):

PDDocument.importPage(PDPage page)

I now use:

PDDocument.addPage(PDPage page)

and the content streams are no longer merged.