15
Apr '18

Extracting Data from a PDF Form in Laravel

One of my projects features a survey / audit system for which I was tasked to provide an offline facility by allowing the download of fillable PDF forms, and then the subsequent re-upload and processing of them.

For the form generation and download I built a wrapper around TCPDF, which I may blog about one day, but it was an extremely involved process and today is not that day!

The import and processing routine I implemented using TCPDF (specifically this PDFTK library aliased as ‘PDFTK’ in my app configuration), and ended up being nicely self-contained:

/*
 * Read fields from PDF file
 * @param string $path Path to PDF
 * @return array
 */
public static function read($path) {
    $pdf = new \PDFRead($path, ['_command' => '/usr/bin/pdftk']);

    $ignore_fields = ['Validate', 'Submit', 'Reset', 'Print'];
    $ignore_values = ['Select...', 'Off'];

    if($data = $pdf->getDataFields()) {
        $values = collect($data->__toArray())->mapWithKeys(function($data_field) use ($ignore_fields, $ignore_values) {
            if(in_array($data_field['FieldName'], $ignore_fields))
                return [];

            if(isset($data_field['FieldValue'])) {
                $value = $data_field['FieldValue'];

                if($data_field['FieldType'] != 'Text' && in_array($data_field['FieldValue'], $ignore_values))
                    $value = '';
            }
            else
                $value = '';

            return [$data_field['FieldName'] => $value];
        });

        $parsed_values = [];

        foreach($values as $key => $value) {
            if(preg_match('/(.*)\[([0-9\_]+)\]/', $key, $matches)) {
                
                if(!isset($parsed_values[$matches[1]]))
                    $parsed_values[$matches[1]] = [];

                $parsed_values[$matches[1]][$matches[2]] = $value;
            }
            else
                $parsed_values[$key] = $value;
        }

        return $parsed_values;
    }
}

Pass any filled PDF form through that function and it will return an array in the same format as you would get from any normal form submission.

Leave a Reply