Here is a sample Java program that finds all social security numbers (SSNs) in a PDF document using a regular expression and permanently redacts them. Each SSN found is first covered with a redaction annotation. When the redaction annotations are applied, or “burnt in”, the underlying text is removed from the PDF content, leaving just a black rectangle where the SSN used to be. This sample code uses jPDFProcess, Qoppa’s PDF redaction SDK.

Note: Make sure you use the regular expression corresponding to the format of the social security numbers present in your documents. In the sample code below, we are matching the following pattern: “123-12-1234”.
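
Before running a pattern over real documents, it can help to try it on a few plain strings with java.util.regex. The following is a minimal sketch, not part of jPDFProcess: the class name SsnPatternDemo and the method findSsns are hypothetical. It also assumes a variation on the pattern, replacing the ^ and $ anchors with \b word boundaries so that SSNs embedded in longer text are found. Whether findTextUsingRegex needs that change is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SsnPatternDemo {
    // Same SSN rules as the sample pattern, but with \b word boundaries
    // instead of ^ and $ (an assumption, so matches can occur mid-text).
    static final Pattern SSN = Pattern.compile(
            "\\b(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}\\b");

    // Returns every substring of text that matches the SSN pattern, in order.
    static List<String> findSsns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SSN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The second number is rejected because its first group is 000.
        System.out.println(findSsns("Employee SSN: 123-12-1234, invalid: 000-12-1234"));
    }
}
```

If your documents use another format, such as spaces instead of dashes, adjust the separators in the pattern and re-run the check on a few representative strings.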

 
// Open the PDF document
PDFDocument pdfDoc = new PDFDocument("input.pdf", null);
 
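// Regular expression for a valid SSN in the form 123-12-1234:
// first group not 000, 666 or 9xx; second group not 00; third group not 0000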
String redactSSN = "^(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$";
 
// per page: search text, create redaction annotations, then apply
for (int i = 0; i < pdfDoc.getPageCount(); i++)
{
    PDFPage pdfPage = pdfDoc.getPage(i);

    // Search for the text
    List<TextPosition> searchResults = pdfPage.findTextUsingRegex(redactSSN);

    // Create redaction annotations
    for (TextPosition textPos : searchResults)
    {
        Redaction redact = pdfDoc.getAnnotationFactory().createRedaction("Redaction sample", textPos.getPDFQuadrilaterals());
        pdfPage.addAnnotation(redact);
    }

    // Apply ("burn in") all redaction annotations on the page
    pdfPage.applyRedactionAnnotations();
}
// save the redacted PDF document
pdfDoc.saveDocument("output.pdf");

Download Full Java Sample Search & Redact SSN

Note: This sample uses jPDFProcess v2021R1, to be released in August 2021. For previous versions, look at the method findTextWithContextUsingRegEx instead. Contact us with any questions.