How We Helped a Global FMCG Brand Process 100,000+ Quality Documents with AI

Here’s how we helped a global FMCG brand process over 100,000 product labels for compliance using AI, without losing accuracy.

When a global manufacturer of fast-moving consumer goods came to us, the problem wasn’t a lack of data. It was too much data, squeezed into the wrong shape.

Every batch of product that left the client’s manufacturing facilities came with a product label PDF covering ingredient lists, nutritional information, regulatory compliance data, and signed certifications. Multiplied across multiple production lines and years of operations, the total exceeded 100,000 documents. Each one needed to be reviewed, have its key fields extracted, and be validated against regulatory standards before it could be archived or entered into final compliance systems.

This used to be done manually by a regulatory and compliance team that read each PDF file, copied the values into spreadsheets or record systems, and checked for errors. It worked, but it didn’t scale. Each new product line, each audit cycle, and each new market added more documents to a queue that was already very long.

The brief from the client was straightforward: automate the extraction and validation of these product labels using AI document processing, without compromising the accuracy on which global compliance processes depend.

Why this is not a simple OCR problem

On paper, the solution seems straightforward: run the PDF files through an AI extraction pipeline, check the confidence level, and then approve. We’ve seen this assumption before, and it doesn’t survive contact with real manufacturing documents.

Fast-moving consumer goods labeling documents are messy in ways that common OCR tools were not designed for. Some are clean printed label proofs. Many are not. Regulatory officers write corrections by hand in the margins. Text is crossed out and rewritten when wording or compliance requirements change. Stamps overlap printed text. Some documents are scans of carbon-copy forms that are difficult to read. A pipeline that only handles clean printed text would miss a meaningful portion of this client’s real-world documents, and in a compliance-driven industry like fast-moving consumer goods, a “meaningful portion” is not a number you can ignore.

So, before writing a line of extraction logic, we spent time with the client’s regulatory team understanding what these labels actually look like in practice, not just what they look like in the best case.

The document-level confidence score trap

The first version of any AI document pipeline tends to use a single confidence score for each document: if, for example, the model is 90% confident overall, approve it; if not, send it to a human.

This approach failed us early on, and it’s worth explaining why. A document can have an overall confidence of 95% while one specific field, for example a regulatory authorization number or an important allergen warning, is wrong. The overall score is an average across every field on the page. A few easy, clear fields can raise the average and hide the one field that is difficult to read, which is often the field that actually matters for certification.
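To make the averaging effect concrete, here is a minimal sketch with made-up numbers. The field names, scores, and the 0.90 threshold are illustrative assumptions, not values from the project.

```python
from statistics import mean

# Hypothetical per-field confidence scores for one label (illustrative only).
field_confidence = {
    "product_name": 0.99,
    "net_weight": 0.99,
    "ingredient_list": 0.98,
    "nutrition_table": 0.98,
    "batch_number": 0.99,
    "allergen_warning": 0.77,  # the one field that is hard to read
}

DOCUMENT_THRESHOLD = 0.90  # the "approve if 90% confident overall" rule

# Document-level score: a plain average over all fields.
document_score = mean(field_confidence.values())
weakest_field = min(field_confidence, key=field_confidence.get)

# The document clears the threshold even though one critical field does not.
approved_by_document_rule = document_score >= DOCUMENT_THRESHOLD

print(f"document score: {document_score:.2f}")
print(f"approved by document-level rule: {approved_by_document_rule}")
print(f"weakest field: {weakest_field} ({field_confidence[weakest_field]:.2f})")
```

In this example the document averages roughly 0.95 and is approved, while the allergen warning sits well below the threshold and is never seen by a reviewer.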

We moved to field-level confidence scoring instead. Each individual field, not the document as a whole, gets its own confidence score. Only fields that fall below the specified threshold are routed to a human reviewer. Everything else is approved automatically.

This single change had a huge impact. Reviewers no longer opened entire documents to double-check fields that the AI had already extracted correctly. Instead, they looked at the two or three fields, out of twenty or so, that the system wasn’t sure about. This is the difference between a reviewer re-reading an entire document and a reviewer spending fifteen seconds on a flagged field.
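The routing rule described above can be sketched in a few lines. This is a simplified illustration, not the production implementation: the `ExtractedField` type, the `route_fields` helper, and the 0.90 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction model


# Hypothetical threshold; in practice it would be tuned during validation.
FIELD_THRESHOLD = 0.90


def route_fields(fields, threshold=FIELD_THRESHOLD):
    """Split fields into auto-approved ones and ones that need human review."""
    approved, needs_review = [], []
    for field in fields:
        if field.confidence < threshold:
            needs_review.append(field)
        else:
            approved.append(field)
    return approved, needs_review


fields = [
    ExtractedField("product_name", "Example Product", 0.99),
    ExtractedField("net_weight", "250 g", 0.98),
    ExtractedField("allergen_warning", "Contains milk", 0.77),
]
approved, needs_review = route_fields(fields)
print([f.name for f in needs_review])
```

Only the allergen warning lands in the review queue here; the other two fields are approved without a reviewer opening the document. A natural extension, again an assumption rather than something the project is known to have done, would be a stricter threshold for compliance-critical fields.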

Handling handwriting, stamps, and cross-outs

Product label proofs in fast-moving consumer goods environments are working documents, not pristine originals. Regulatory officers cross out the initial value and write the corrected value next to it. Stamps such as “Approval” or “Retest” are applied and partially obscure the field below them. Scan quality varies widely depending on the facility that produced the document.

We built specific handling for each of these patterns rather than treating them as OCR noise to be averaged out. One detail surprised us during testing: in several cases, the AI correctly identified the crossed-out value and chose not to extract it, treating it as canceled. That is the correct behavior, but it means the validation workflow needs a clear path for “correction made here” rather than treating every cross-out as an extraction failure. We added a separate review flag specifically for documents with visible corrections, so reviewers can quickly confirm that the AI captured the corrected value instead of the original value.
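One way to keep corrections separate from genuine extraction failures is to give each review case its own flag. The sketch below is an assumption about how such a flag could be modeled; `ReviewFlag`, `FieldResult`, and the `has_crossed_out_text` signal are hypothetical names, not part of any specific library or of the client’s system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewFlag(Enum):
    CORRECTION_PRESENT = "correction_present"  # reviewer confirms the corrected value
    EXTRACTION_FAILED = "extraction_failed"    # nothing usable was extracted
    LOW_CONFIDENCE = "low_confidence"          # value extracted, but the model is unsure


@dataclass
class FieldResult:
    name: str
    value: Optional[str]
    confidence: float
    has_crossed_out_text: bool = False  # hypothetical signal from the extraction step


def review_flag(field: FieldResult, threshold: float = 0.90) -> Optional[ReviewFlag]:
    """Return the reason a field needs review, or None if it can be auto-approved."""
    # A visible correction always gets its own flag, checked first so that
    # a cross-out is never reported as an extraction failure.
    if field.has_crossed_out_text:
        return ReviewFlag.CORRECTION_PRESENT
    if field.value is None:
        return ReviewFlag.EXTRACTION_FAILED
    if field.confidence < threshold:
        return ReviewFlag.LOW_CONFIDENCE
    return None


corrected = FieldResult("net_weight", "250 g", 0.96, has_crossed_out_text=True)
clean = FieldResult("batch_number", "B-1001", 0.98)
smudged = FieldResult("allergen_warning", "Contains milk", 0.77)

for field in (corrected, clean, smudged):
    print(field.name, review_flag(field))
```

Checking the correction signal before anything else means a high-confidence field with a cross-out still reaches a reviewer, who only has to confirm that the corrected value was the one captured.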

Keeping humans in the loop, intentionally

None of this was about removing the regulatory team from the process. It was about changing what they spent their time on.

Before the pipeline went live, each workflow was validated by the client’s domain experts against the existing manual process. We tested the AI output against documents that had already been manually reviewed, comparing field by field, until extraction accuracy held consistently across different label types, facilities, and scan qualities. This validation phase took real time, and we treated it as non-negotiable: a pipeline that looks accurate in a demo and a pipeline that is reliable enough to trust in a global compliance certification are not the same thing.
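A field-by-field comparison of this kind can be sketched as follows. The record layout, the `normalize` rule, and the sample values are assumptions made for illustration; the real comparison criteria would come from the client’s domain experts.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(value.split()).casefold()


def field_accuracy_by_group(records, group_key):
    """Return {group: fraction of fields where the AI output matches the manual review}."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        group = record[group_key]
        for name, manual_value in record["manual"].items():
            totals[group] += 1
            ai_value = record["ai"].get(name)
            if ai_value is not None and normalize(ai_value) == normalize(manual_value):
                matches[group] += 1
    return {group: matches[group] / totals[group] for group in totals}


# Hypothetical sample: two documents already reviewed by hand.
records = [
    {
        "facility": "plant_a",
        "manual": {"batch_number": "B-1001", "net_weight": "250 g"},
        "ai": {"batch_number": "B-1001", "net_weight": "250 g"},
    },
    {
        "facility": "plant_b",
        "manual": {"batch_number": "B-2002", "net_weight": "500 g"},
        "ai": {"batch_number": "B-2O02", "net_weight": "500  g"},  # letter O misread for zero
    },
]

accuracy = field_accuracy_by_group(records, "facility")
print(accuracy)
```

Grouping by facility, label type, or scan quality is what exposes a segment where accuracy lags, which a single overall accuracy figure would hide in the same way a document-level score hides a weak field.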

Once live, the reviewer’s role changed. Instead of reading each document from start to finish, reviewers worked from a queue of flagged fields, fields that the AI wasn’t really confident about, as well as documents that contained visible corrections or unusual formatting. The skilled people on the team were doing the same kind of judgment work they had always done, just applied to a much smaller and more relevant set of cases.

The result: an 80% reduction in manual review effort

More than 100,000 documents went through the pipeline.
Extraction accuracy was higher than 90% at the field level, validated against the client’s manual criteria.
Manual review effort fell by approximately 80%, since reviewers were only looking at flagged fields and flagged documents rather than at each page.
Processing time, which had been a constant bottleneck for the compliance team, was significantly reduced.
The backlog that had been growing with each new audit cycle stopped growing.

The most important number to the client was not a percentage. Their quality team, the people who had spent years building the judgment to know when a value didn’t seem right, were finally spending that judgment on the documents that actually needed it.

What we would tell anyone considering AI document processing

If there is one lesson from this project worth repeating, it is this: AI document processing in a regulated or quality-sensitive industry is not a model selection problem. It’s a workflow design problem. The extraction model is one part of a system that also needs field-level confidence scoring, a clear escalation path for ambiguous cases like handwriting and corrections, and a validation phase rigorous enough that the people who rely on it actually trust it.

This combination is what turns “we’ve run AI on our documents” into a process that your compliance team will actually support.

If you’re struggling with a backlog of documents, whether quality reports, compliance logs, lab results, or anything similar, and manual review has become the bottleneck, we’d love to talk about what a workflow like this might look like for your data.