Our Danish client spends upwards of a billion euros per year on various goods and services. Their procurement department is responsible for controlling and managing these costs. This relies heavily on knowing which expense category each invoice line belongs to. The larger the spend in a category, the stronger the negotiating position in the purchasing process. Manually classifying seven million invoice lines each year is a gigantic, expensive and tedious effort.

With a mixed team of Danish and Dutch consultants, Valcon delivered a prototype solution that automatically classifies invoice lines for a target category with high accuracy. In doing so, we opened up considerable potential for cost savings, higher quality and a stronger negotiating position.

The prototype was built within seven weeks using Scrum, client workshops, real data, cloud infrastructure and open-source tooling. We configured a dedicated Microsoft Azure platform that let the client control the data and let us control the code. We built a modular, scalable software application in Python and used it to train a machine learning model that automatically identified 99% of the invoice lines belonging to the target category.

The labelled data delivered by the client had an intrinsic bias towards known characteristics, simply because those characteristics were used to label the data. Biased data generally leads to poor model performance on unseen data. Thus, to overcome this bias, we employed proportionate random sampling of labelled and unlabelled data. After this process, we curated the unlabelled data by going through three iterations of consecutive human and machine classification.
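The sampling step above can be illustrated with a minimal sketch, assuming the strata are simply "labelled" and "unlabelled" invoice lines. The function name `proportionate_sample`, the record layout and the counts are hypothetical and chosen for illustration only; this is not the project's actual code.

```python
import random
from collections import Counter


def proportionate_sample(records, stratum_of, sample_size, seed=0):
    """Draw a random sample in which each stratum keeps roughly its population share."""
    rng = random.Random(seed)

    # Group the records by stratum (here: labelled vs. unlabelled).
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)

    total = len(records)
    sample = []
    for members in strata.values():
        # Each stratum contributes in proportion to its size (rounded).
        quota = round(sample_size * len(members) / total)
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical data: 200 labelled and 800 unlabelled invoice lines.
lines = [{"id": i, "labelled": i < 200} for i in range(1000)]
sample = proportionate_sample(lines, lambda r: r["labelled"], sample_size=100)
counts = Counter(r["labelled"] for r in sample)
```

Because the quotas follow the population shares, the sample keeps the 20/80 split between labelled and unlabelled lines instead of over-representing the lines that were easy to label.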

The machine learning model was trained on this new, unbiased sample, which allowed us to generalise the model and measure its performance on unseen data (a.k.a. a double-blind test). To guide the training in the right direction, we fed the model historical information about suppliers, purchasers and invoice-line descriptions, along with the procurement category of each line.
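As a rough illustration of how such inputs can be presented to a model, the sketch below turns one invoice line into a sparse set of binary features. The field names (`supplier`, `purchaser`, `description`, `category`), the example values and the function `invoice_line_features` are assumptions made for this example; the actual feature engineering used in the prototype is not described here.

```python
import re


def invoice_line_features(line):
    """Turn one invoice line into a sparse bag of binary features."""
    # Categorical fields become one indicator feature each.
    features = {
        f"supplier={line['supplier']}": 1,
        f"purchaser={line['purchaser']}": 1,
    }
    # The free-text description is split into lower-case word tokens.
    for token in re.findall(r"[a-z0-9]+", line["description"].lower()):
        features[f"word={token}"] = 1
    return features


# Hypothetical invoice line; the category is the label the model learns to predict.
example = {
    "supplier": "Acme A/S",
    "purchaser": "Facilities",
    "description": "Office chairs, 12 pcs",
    "category": "Furniture",
}
features = invoice_line_features(example)
label = example["category"]
```

Keeping the label separate from the features matters: the category is what the model predicts, so it must never leak into the inputs.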

To establish confidence in the high model performance, we plotted the learning curve, which shows how performance grows as training examples are gradually added. The shape of the curve indicated a healthy balance between model flexibility and dataset size. To further convince the client that our result was solid, we presented the 100 dominant features on which the model bases its decisions. The business considered these features highly intuitive.
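The idea behind a learning curve can be sketched in a few lines: train on progressively larger subsets and score each model on the same held-out data. The example below uses synthetic two-dimensional points and a toy nearest-centroid classifier, both stand-ins chosen to keep the sketch self-contained; they are not the data or the model used in the prototype.

```python
import random


def make_data(n, rng):
    """Synthetic two-class points; labels alternate so every prefix contains both classes."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 3.0 * label  # class 0 around (0, 0), class 1 around (3, 3)
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


def fit_centroids(train):
    """Toy nearest-centroid 'model': the mean point of each class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid matches their label."""
    correct = 0
    for (x, y), label in test:
        predicted = min(
            centroids,
            key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
        )
        correct += predicted == label
    return correct / len(test)


rng = random.Random(42)
train, test = make_data(400, rng), make_data(200, rng)

# Train on growing prefixes of the training set, always scoring on the same test set.
sizes = [4, 16, 64, 256, 400]
curve = [(n, accuracy(fit_centroids(train[:n]), test)) for n in sizes]
for n, score in curve:
    print(f"{n:>4} training examples -> accuracy {score:.2f}")
```

A curve that flattens as examples are added suggests that more data alone would add little, whereas a curve that is still rising suggests the model could benefit from a larger dataset.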

We are currently exploring the next steps together with a very enthusiastic client. The prototype has proven that invoice line classification can be automated. The mission ahead of us now is to scale the solution to all procurement categories and integrate it seamlessly into the business.

Would you like to know how your organisation can benefit from using an AI solution? Reach out to us.