
John Main


15 Apr '18

Extracting Data from a PDF Form in Laravel

One of my projects features a survey / audit system, and I was tasked with giving it an offline facility: users download fillable PDF forms, complete them, then re-upload them for processing.

For the form generation and download I built a wrapper around TCPDF, which I may blog about one day, but it was an extremely involved process and today is not that day!

I implemented the import and processing routine using PDFTK (specifically this PDFTK library, aliased as ‘PDFTK’ in my app configuration), and it ended up being nicely self-contained:

/**
 * Read fields from a filled PDF form
 * @param string $path Path to PDF
 * @return array Field values keyed by field name; empty if no fields could be read
 */
public static function read($path) {
    $pdf = new \PDFTK($path, ['_command' => '/usr/bin/pdftk']);
    // Form buttons and placeholder values that should not be imported
    $ignore_fields = ['Validate', 'Submit', 'Reset', 'Print'];
    $ignore_values = ['Select...', 'Off'];
    if($data = $pdf->getDataFields()) {
        // Flatten the field data into a FieldName => value map
        $values = collect($data->__toArray())->mapWithKeys(function($data_field) use ($ignore_fields, $ignore_values) {
            if(in_array($data_field['FieldName'], $ignore_fields))
                return [];
            if(isset($data_field['FieldValue'])) {
                $value = $data_field['FieldValue'];
                // Unselected dropdowns and unticked boxes count as empty
                if($data_field['FieldType'] != 'Text' && in_array($data_field['FieldValue'], $ignore_values))
                    $value = '';
            }
            else
                $value = '';
            return [$data_field['FieldName'] => $value];
        });
        // Regroup names such as "answers[12]" into nested arrays, as PHP does for form input
        $parsed_values = [];
        foreach($values as $key => $value) {
            if(preg_match('/(.*)\[([0-9\_]+)\]/', $key, $matches)) {
                if(!isset($parsed_values[$matches[1]]))
                    $parsed_values[$matches[1]] = [];
                $parsed_values[$matches[1]][$matches[2]] = $value;
            }
            else
                $parsed_values[$key] = $value;
        }
        return $parsed_values;
    }
    // No fields could be read, so return an empty array as documented
    return [];
}
Pass any filled PDF form through that function and it will return an array in the same format as you would get from any normal form submission.
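
To make the "same format as a normal form submission" claim concrete, here is a minimal standalone sketch of the same filtering and regrouping steps in plain PHP, without PDFTK or Laravel. The sample field names and values (site_name, answers[12] and so on) and the helper name normalise_fields are invented for illustration; only the FieldName / FieldType / FieldValue keys and the ignore lists come from the function above.

```php
<?php
// Hypothetical sample data, shaped like the per-field arrays the function above reads.
$fields = [
    ['FieldType' => 'Text',   'FieldName' => 'site_name',   'FieldValue' => 'Warehouse 4'],
    ['FieldType' => 'Choice', 'FieldName' => 'answers[12]', 'FieldValue' => 'Select...'],
    ['FieldType' => 'Button', 'FieldName' => 'answers[13]', 'FieldValue' => 'Yes'],
    ['FieldType' => 'Text',   'FieldName' => 'notes[13_1]'], // left blank: no FieldValue key
    ['FieldType' => 'Button', 'FieldName' => 'Submit'],      // form control: ignored
];

// Illustrative helper (not part of any library): filter and regroup the fields.
function normalise_fields(array $fields): array
{
    $ignore_fields = ['Validate', 'Submit', 'Reset', 'Print'];
    $ignore_values = ['Select...', 'Off'];
    $parsed = [];

    foreach ($fields as $field) {
        // Skip the form's own buttons.
        if (in_array($field['FieldName'], $ignore_fields, true)) {
            continue;
        }

        // Missing values and placeholder values on non-text fields become ''.
        $value = $field['FieldValue'] ?? '';
        if ($field['FieldType'] !== 'Text' && in_array($value, $ignore_values, true)) {
            $value = '';
        }

        // "answers[12]" becomes $parsed['answers'][12]; other names stay flat.
        if (preg_match('/(.*)\[([0-9_]+)\]/', $field['FieldName'], $m)) {
            $parsed[$m[1]][$m[2]] = $value;
        } else {
            $parsed[$field['FieldName']] = $value;
        }
    }

    return $parsed;
}

$result = normalise_fields($fields);
print_r($result);
```

With this sample input, $result holds 'site_name' => 'Warehouse 4', an 'answers' array with keys 12 => '' and 13 => 'Yes', and a 'notes' array with key '13_1' => '', which is the nested shape a posted HTML form with the same input names would produce.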
