Even in the Era of Big Data, Document Accuracy Matters

October 26, 2012


“Documentation accuracy isn’t exactly the most goose bump-inducing topic,” writes Eric Johnson. “But it may well be one of the more critical aspects of a shipment.” [“Document Drudgery,” American Shipper, August 2012] Johnson notes that “incorrect bills of lading or other documentation can lead to over- or under-billing, customs holds, surcharges, and payment delays. Any of those can force a shipper to have to increase inventory to account for stock that’s held up.” Obviously, none of those are outcomes that a business wants to experience.


In this age of electronic documents, one would imagine that technology could help organizations ensure that their documents are more accurate — and one would be correct. We’ve all filled out online registration forms, for example, that only allow you to input the number of characters that a particular format requires (like a telephone number, ZIP code, serial number, or product key). Such aids, however, don’t guarantee that the correct information will be entered even if it is in the correct format. But as Johnson notes, “Getting the cargo data entered correctly the first time is the surest way for the shipper, freight forwarder and carrier to cut costs. Liner carriers are increasingly looking at things like documentation accuracy as ways to reduce unneeded expense in an environment where rates are barely covering the costs of operations.”
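The gap between a well-formed entry and a correct one is easy to see in code. Below is a minimal sketch of the kind of format validation an online form performs; the function name and the five-digit ZIP code rule are illustrative, not taken from any carrier’s system. Note the last case: a typo that happens to match the format passes the check.

```python
import re

# Format-level validation, like an online form field: it checks the
# *shape* of the input, not whether the value is actually correct.
ZIP_CODE = re.compile(r"^\d{5}$")  # five digits, U.S. ZIP code format

def is_well_formed_zip(value: str) -> bool:
    """Return True if the value matches the five-digit ZIP code format."""
    return bool(ZIP_CODE.match(value))

print(is_well_formed_zip("90210"))  # True  -- correct and well-formed
print(is_well_formed_zip("9021"))   # False -- too short; the form rejects it
print(is_well_formed_zip("90120"))  # True  -- a typo that still fits the format
```

The third case is exactly the failure mode Johnson describes: validation catches malformed input, but a plausible-looking wrong value sails through.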


Johnson discusses the efforts of MOL (America) to increase its document accuracy rate from 98.5 percent to 99.5 percent. The fact that a company would make a concerted extra effort to gain a slight improvement in a metric that is already high says a lot about the importance MOL places on document accuracy. Does your company come anywhere near that standard? Johnson writes, “The errors MOL encounters typically come down to something as simple as typos during the data-entry process, whether it’s from the shipper, forwarder, or carrier side.” Johnson notes that MOL isn’t the only company that works hard to ensure document accuracy. He writes:

“Another liner carrier, OOCL, said typical errors revolve around cargo weight, piece count, consignee or notify party addresses, ocean freight payment party (i.e., prepaid or collect), and which party is responsible for the charges. ‘Close to 50 percent of our bills of lading have some change or amendment due to either shipping instruction error or changes made by the shipper,’ OOCL spokesman Frankie Lau said. ‘But OOCL bill of lading accuracy is 99.9 percent as an end product released to the customer.’ Both MOL and OOCL tackle the issue of accuracy through data audits.”
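The data audits both carriers rely on can be as simple as automated rule checks that flag suspect records for human review. The sketch below illustrates the idea; the field names and rules are invented for this example and do not reflect either carrier’s actual audit process.

```python
# A minimal rule-based audit of a bill-of-lading record: each rule
# appends a human-readable problem description for reviewers.
REQUIRED = ("consignee_address", "piece_count", "cargo_weight_kg", "payment_party")

def audit_bol(bol: dict) -> list[str]:
    """Return a list of problems found in one bill-of-lading record."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in bol]
    if bol.get("piece_count", 1) <= 0:
        problems.append("piece count must be positive")
    return problems

record = {"consignee_address": "12 Dock Rd", "piece_count": 0, "cargo_weight_kg": 800}
print(audit_bol(record))
# ['missing field: payment_party', 'piece count must be positive']
```

Checks like these catch missing or implausible values, which is why OOCL can release 99.9 percent accurate bills even when nearly half arrive needing amendment.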

Stephen Ryan, vice president of customer service in North America for MOL, told Johnson, “Shipping instructions come in [a] lot of different formats, so we do a lot of cutting and pasting so there’s no deviation from what shippers have entered. There are some systems limitations, so if we have to retype some things, there will be some typing errors. It could be something as simple as spacing.” Undoubtedly, technologies will be developed in the future that will automatically cut and paste data from one document into another to avoid typing errors; but even that kind of technology cannot guarantee that the copied information was entered correctly in the first place.
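The “copy, don’t retype” idea Ryan describes can be sketched as a simple field mapping: values are carried over programmatically, so no transcription typos are introduced. The field names below are illustrative, not any carrier’s actual schema, and the sketch also shows the limit Ryan runs into: a source error is copied faithfully.

```python
# Map fields from a shipping instruction onto a bill-of-lading record,
# copying each value verbatim instead of retyping it.
SHIPPING_TO_BOL = {
    "shipper_name": "shipper",
    "consignee_name": "consignee",
    "gross_weight_kg": "cargo_weight_kg",
}

def build_bol(instruction: dict) -> dict:
    """Copy mapped fields verbatim from shipping instructions into a bill of lading."""
    return {bol_field: instruction[src_field]
            for src_field, bol_field in SHIPPING_TO_BOL.items()}

instruction = {"shipper_name": "Acme Exports", "consignee_name": "Widget Imports",
               "gross_weight_kg": 12500}
print(build_bol(instruction))
# If "Acme Exports" was mistyped at the source, the mistake is copied unchanged.
```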


Johnson’s article focuses on the maritime shipping industry; but it is not the only transportation sector that has documentation challenges. In another American Shipper article, Eric Kulisch and Chris Gillis note, “The average [airfreight] shipment generates up to 30 different paper documents, according to industry experts.” [“E-freight’s slow ascent,” 8 April 2011] Electronic exchange of those documents would dramatically improve accuracy, but they report, “the air cargo industry has been slow to move towards automated documentation exchanges between air carriers, customers, ground-handling agents, truckers and customs authorities.” When documents are exchanged electronically, the benefits are significant. They explain:

“When documents arrive ahead of the cargo, customs clearance and airline processing time can be cut by an average of 24 hours. Delays associated with lost documents are eliminated and accuracy is improved because there is no need to rekey data into various information technology systems along the way.”

Nathan Tableman, Vice President of Technology for UBM Global Trade, claims that “the primary source of this divergent data are Bills of Lading (BOL), the formal documents that contain the routing, parties involved and contents of all maritime shipments that enter and exit different nations.” [“Is Your Supply Chain Intelligent Enough?” Supply & Demand Chain Executive, 4 September 2012] But he goes on to point out that divergent BOL data is only one of the challenges that organizations face when it comes to dealing with documents. He continues:

“With these divergent data sets, a variety of national languages and variations in things as simple as the spelling of a port or name of a supplier, combining this information to create intelligence can be very time consuming and costly. Raw data is like any raw material in that the quality varies over time. This also presents a large scale challenge: ‘How do you clean up the data to a level that is perfect without having to read through every one of millions of data points a day?’ To conquer this challenge requires innovative ways to clean, structure and map data points. Without such technology—and at times manual input—data would be unusable.”

I agree completely with Tableman’s point that, without technology, cleaning big data to make it usable would be impossible. Even the best technology, however, can’t provide perfect results. H. S. Baird, T. M. Breuel, K. Popat, P. Sarkar, and D. P. Lopresti, researchers at Xerox’s PARC, state, “No existing document-image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents.” [“Assuring high-accuracy document understanding: retargeting, scaling up, and adapting,” Symposium on Document Image Understanding Technology (SDIUT ’03), 9-11 April 2003] They go on to describe how their research uses artificial intelligence to train systems to recognize inputs so that accuracy can be increased. They write:

“Research at PARC has focused for more than ten years on relieving this critical bottleneck to automatic analysis of the contents of paper-based documents, FAXes, etc. PARC has made significant progress — documented in dozens of publications and patents, and embodied in experimental software tools — towards this goal: we possess ‘document image decoding’ (DID) technology that achieves high accuracy on images of documents printed in a potentially wide variety of writing systems and typefaces, unusual page layout styles, and severely degraded image quality. Our principal method of attack has been ‘retargeting’: that is, our technology is designed to be trainable, i.e., customized to the characteristics of individual documents or sets of similar documents. In recent years we have reduced the effort of manual DID training significantly. … We propose ‘scaling up’ the DID methodology by massively parallel recognition using ensembles of automatically pre-trained DID decoders: this promises to reduce further the need for document-specific training. We have also made recent progress towards ‘adapting’, in which recognizers, without any manual training, adjust their models to fit the document at hand: this offers hope that manual training can someday be reduced to zero.”

Clearly, systems that use artificial intelligence will play a much larger role in helping ensure document accuracy in the years ahead. Efforts like those going on at PARC will go a long way towards addressing the many challenges identified by Johnson, Kulisch, Gillis, and Tableman. Another thing that would help to reduce errors, even without expensive artificial intelligence systems, would be standardization. Johnson reports:

“Brian Conrad, executive administrator of the Westbound Transpacific Stabilization Agreement, said his organization is working with its shipper advisory council to explore possible documentation and EDI standards among various ocean carrier booking portal providers, to help reduce booking and documentation errors. He said 70 percent of WTSA carrier documents (which cover U.S. export shipments to Asia) are still being produced manually.”

In the electronic age, generating 70 percent of shipping documents manually seems remarkably high. Conrad attributes this to “a lack of standardization.” Johnson reports that “trade software developer GT Nexus told American Shipper that data entry is the most common root cause of errors, but that missing data from smaller ports and facilities is also significant. For data flowing through its system, the errors tend to come from shipper partners, not from the shippers themselves.” Charles Babbage, who designed the first programmable computing machine, once wrote, “On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” If supply chain professionals want to get the right answers from their automated logistics systems, getting the right data into the system is the most important thing they can do.
