15 Apr '18
Extracting Data from a PDF Form in Laravel
One of my projects features a survey / audit system, and I was tasked with giving it an offline facility: users download fillable PDF forms, complete them, and re-upload them for processing.
For the form generation and download I built a wrapper around TCPDF, which I may blog about one day, but it was an extremely involved process and today is not that day!
I implemented the import and processing routine using PDFTK (specifically this PDFTK library, aliased as ‘PDFRead’ in my app configuration), and it ended up being nicely self-contained:
```php
/**
 * Read fields from PDF file
 *
 * @param string $path Path to PDF
 * @return array
 */
public static function read($path)
{
    $pdf = new \PDFRead($path, ['_command' => '/usr/bin/pdftk']);

    // Form buttons that carry no data
    $ignore_fields = ['Validate', 'Submit', 'Reset', 'Print'];
    // Values treated as empty on non-text fields
    $ignore_values = ['Select...', 'Off'];

    if ($data = $pdf->getDataFields()) {
        $values = collect($data->__toArray())->mapWithKeys(function ($data_field) use ($ignore_fields, $ignore_values) {
            // Returning an empty array drops the field from the collection
            if (in_array($data_field['FieldName'], $ignore_fields)) {
                return [];
            }

            if (isset($data_field['FieldValue'])) {
                $value = $data_field['FieldValue'];
                if ($data_field['FieldType'] != 'Text' && in_array($data_field['FieldValue'], $ignore_values)) {
                    $value = '';
                }
            } else {
                $value = '';
            }

            return [$data_field['FieldName'] => $value];
        });

        // Fold names like "field[12]" into nested arrays, as PHP does for form input
        $parsed_values = [];
        foreach ($values as $key => $value) {
            if (preg_match('/(.*)\[([0-9\_]+)\]/', $key, $matches)) {
                if (!isset($parsed_values[$matches[1]])) {
                    $parsed_values[$matches[1]] = [];
                }
                $parsed_values[$matches[1]][$matches[2]] = $value;
            } else {
                $parsed_values[$key] = $value;
            }
        }

        return $parsed_values;
    }

    // No fields could be read; return an empty array to honour the @return type
    return [];
}
```

Pass any filled PDF form through that function and it will return an array in the same format as you would get from any normal form submission.
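To see what that output looks like without having pdftk installed, here is a rough stand-alone sketch of the same post-processing in plain PHP. The function name `fieldsToFormArray` and the sample field names and values are made up for illustration; the only thing taken from the routine above is the shape of each field (`FieldType`, `FieldName`, and an optional `FieldValue`).

```php
<?php

/**
 * Hypothetical stand-alone version of the post-processing in read().
 * Takes the field list as a plain array so it runs without pdftk or Laravel.
 */
function fieldsToFormArray(array $fields): array
{
    $ignoreFields = ['Validate', 'Submit', 'Reset', 'Print'];
    $ignoreValues = ['Select...', 'Off'];
    $parsed = [];

    foreach ($fields as $field) {
        // Skip the form's own buttons
        if (in_array($field['FieldName'], $ignoreFields, true)) {
            continue;
        }

        // A missing value, or a placeholder on a non-text field, becomes ''
        $value = $field['FieldValue'] ?? '';
        if ($field['FieldType'] !== 'Text' && in_array($value, $ignoreValues, true)) {
            $value = '';
        }

        // "answers[2]" becomes $parsed['answers'][2]
        if (preg_match('/(.*)\[([0-9_]+)\]/', $field['FieldName'], $m)) {
            $parsed[$m[1]][$m[2]] = $value;
        } else {
            $parsed[$field['FieldName']] = $value;
        }
    }

    return $parsed;
}

// Made-up sample data in the shape the routine above reads
$fields = [
    ['FieldType' => 'Text',   'FieldName' => 'site_name',  'FieldValue' => 'Depot A'],
    ['FieldType' => 'Choice', 'FieldName' => 'answers[1]', 'FieldValue' => 'Select...'],
    ['FieldType' => 'Button', 'FieldName' => 'answers[2]', 'FieldValue' => 'Yes'],
    ['FieldType' => 'Text',   'FieldName' => 'notes[4_1]'],
    ['FieldType' => 'Button', 'FieldName' => 'Submit'],
];

$result = fieldsToFormArray($fields);
print_r($result);
```

With this sample input, `site_name` comes through untouched, `answers` becomes an array keyed by 1 and 2 (with the untouched dropdown blanked out), `notes` gets a `4_1` key holding an empty string, and the Submit button is dropped, which is the same structure a posted HTML form with `answers[1]`-style input names would produce.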