This Java sample shows how to search a text label within a PDF document and then remove the text following that label. This is done by adding a redaction annotation and burning it which remove any content below the redaction annotation. This sample is using Qoppa’s PDF library jPDProcess
In the sample code below, we’re searching for the label “Phone Number” to redact all phone number contained in a PDF document. The same process could be applied to redact Social Security Numbers (SSN), Patient Record Numbers or other confidential information.
// Load the document PDFDocument pdfDoc = new PDFDocument ("C:\\myfolder\\input.pdf", null); // this is my search label that comes before the text to be redacted String searchLabel = "Phone Number"; // Loop through all pages in the document boolean foundLabel = false; for (int pageix = 0; pageix < pdfDoc.getPageCount(); ++pageix) { // Search for the label text Vector labelInstances = pdfDoc.getPage(pageix).findText(searchLabel, false, false); // Add annotations after the instances of the label if (labelInstances != null && labelInstances.size() > 0) { foundLabel = true; for (TextPosition tp : labelInstances) { Rectangle2D labelBounds = tp.getEnclosingShape().getBounds2D(); Rectangle2D.Double eraseBounds = new Rectangle2D.Double(labelBounds.getX() + labelBounds.getWidth() + 1, labelBounds.getY() - 2, 2 * 72, labelBounds.getHeight() + 4); Redaction redact = pdfDoc.getAnnotationFactory().createRedaction("Redaction"); redact.setRectangle(eraseBounds); redact.setInternalColor(Color.black); pdfDoc.getPage(pageix).addAnnotation(redact); } } } // output whether the search label was found or not System.out.println("Search Label found " + foundLabel); // save doc with redaction pdfDoc.saveDocument ("C:\\myfolder\\output_redact.pdf"); |
Download Full Java Sample SearchAndRedact.java
This is a screenshot of the output PDF: