Research Internship at SAP Machine Learning

From machine learning research to real-world systems handling over €200 million in annual sales pipeline.

Every Computer Science undergraduate at NTU has to do a semester-long Professional Internship Attachment as part of their degree requirement. In January 2018, I had the pleasure of doing my internship with the Cash Application team at SAP Leonardo Machine Learning, Singapore, as a Research Intern.

I feel very lucky to have gotten the chance to build unique software products that only SAP has the capabilities to build, while working with a very talented and brilliant team. I learnt how Machine Learning and Artificial Intelligence are taken out of the research lab and put into production. I was also able to genuinely connect with my teammates, in both a professional and a personal sense. By the end of the internship, my direct work and contributions had led to the delivery of a major feature in our product and resulted in three patent applications. I was also awarded the Professional Internship Book Prize by NTU for the best performance among all interns from the Computer Science department. I could not have dreamt of a better internship!


About SAP and Cash Application

Started in Germany in 1972, SAP (Systems Applications and Products) is one of the world’s biggest software companies today. It is the industry leader in enterprise software for managing business operations and customer relations. The biggest organizations, governments and companies in the world run on SAP’s Enterprise Resource Planning (ERP) systems.

SAP’s Machine Learning platform, called SAP Leonardo, offers SAP ERP users advanced analytics services based on the latest technologies such as Machine Learning, Blockchain, IoT, and other buzzwords. The Cash Application team’s work revolved around building automated machine learning solutions in the fields of accounting and finance within SAP Leonardo. Today, CashApp is SAP’s flagship ML product and handles over € 200 Million in annual sales pipeline for global companies.

The SAP Leonardo product portfolio

The main ‘use-case’ being developed was the automation of generic, low-level accounting tasks. In every company, the accountants’ main job is to receive bank statements of payments made by their customers, and match these to open invoices for the orders placed by these customers. SAP had been building the software interface for accountants to do this repetitive task and had the trust of its customers to share their financial records and data with it. Hence, it was in a unique position to build tools that automate this process and free up time for accountants or help downsize workforces.
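To make the matching task concrete, here is a toy sketch in Python. It is not how CashApp works: the `Invoice` and `Payment` classes, the `match_payment` function and the rule itself (invoice number found in the payment memo, amounts equal within a small tolerance) are all made up for illustration. Real bank statements are far messier, which is exactly why a learned model is useful.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    number: str    # hypothetical invoice identifier
    customer: str
    amount: float


@dataclass
class Payment:
    memo: str      # free-text reference line from the bank statement
    amount: float


def match_payment(payment, open_invoices, tolerance=0.01):
    """Return open invoices whose number appears in the memo
    and whose amount equals the payment amount within `tolerance`."""
    memo = payment.memo.upper()
    return [
        inv for inv in open_invoices
        if inv.number.upper() in memo
        and abs(inv.amount - payment.amount) <= tolerance
    ]


# Illustrative data only.
invoices = [
    Invoice("INV-1001", "Acme", 250.00),
    Invoice("INV-1002", "Globex", 99.50),
]
payment = Payment("PAYMENT REF inv-1002 GLOBEX", 99.50)
matches = match_payment(payment, invoices)
```

A hand-written rule like this breaks as soon as a customer mistypes the invoice number, pays several invoices at once, or pays a slightly different amount.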

No other company in the world had both the financial data as well as the engineering skills required to build products such as CashApp. I felt very passionate about working on this problem as soon as I realized how unique an opportunity this was!


Onboarding into the team

On my first day, I was welcomed by my to-be supervisor, Sean Saito, who had been working as an ML Developer at SAP after finishing his Bachelor’s degree a year ago. I shook hands with all my teammates for the first time and was introduced to the big Directors and Managers at the office. (I later learnt that Sean was the youngest person in the office and the only Bachelor hire doing ML at SAP at that time. In the whole world. Ever.)

All teams at SAP Leonardo were designed to be self-sufficient in terms of both development and sales. Mini startups of sorts, who share knowledge and office space. Over the next few days, I was onboarded by each sub-team within CashApp: The Machine Learning team, the Back-end team, the Front-end team, and the Business Development team.

I remember feeling excited to start working, and slightly out of my depth at the same time. I had studied some of the concepts that were part of CashApp’s system architecture at NTU but was not confident about my knowledge. I was also extremely intimidated by how large and complicated the production codebase seemed!

At the end of my first week, Sean helped me narrow down a list of topics I could tackle at my internship: (1) How to optimize/speed up the current bank statement to invoice matching pipeline, and (2) How to convert this pipeline to a fully deep learning-based approach. (The current approach used non-deep learning methods.)


Pipeline optimizations and speed

I implemented new techniques to handle character string comparisons and operations, which were the main bottleneck for the CashApp pipeline. By extending Python libraries with code in (considerably faster) C, I was able to speed up our machine learning models by over 40 times without any decrease in performance. This helped me get familiar with the entire codebase, as I was required to constantly benchmark the pipeline’s speed and performance metrics.
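As a rough illustration of why string comparison can dominate running time, here is a pure-Python edit distance together with a small timing helper. This is only a sketch: I am using Levenshtein distance as a stand-in for the kind of string operation involved, not describing the actual CashApp code, and the names `levenshtein` and `benchmark` are invented for this example. The idea is to check correctness and record a baseline time before swapping in a C-backed implementation with the same signature.

```python
import timeit


def levenshtein(a: str, b: str) -> int:
    """Pure-Python edit distance using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def benchmark(func, pairs, repeats=3):
    """Best wall-clock time (seconds) for running `func` over all pairs."""
    return min(timeit.repeat(
        lambda: [func(a, b) for a, b in pairs],
        number=1,
        repeat=repeats,
    ))


# Illustrative input: repeated comparisons of two short reference strings.
pairs = [("INVOICE 2018-001234", "INV 2018 1234")] * 1000
baseline_seconds = benchmark(levenshtein, pairs)
```

Running the same `benchmark` call on a compiled replacement gives a like-for-like speed comparison, while comparing the two functions' outputs on the same pairs confirms that results are unchanged.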

I learnt a lot about low level programming optimization and the importance of good engineering, even for the smallest parts of a large system. This work was completed in the first month of my internship.


Transition to a Deep Learning pipeline

Working on a deep learning solution to the bank statement to invoice matching problem was my longest project: the Machine Learning team started working on it from my second month at CashApp, and drew it to a conclusion on the final day of my internship by submitting three patent applications related to it.

In general, we were building neural networks for understanding and reasoning about extremely structured, tabular financial data without resorting to feature engineering. Such data is typically handled by more traditional machine learning models such as decision trees. Hence, there was not much academic research that we could leverage for our models. Drawing heavily from research in natural language processing and person re-identification/facial recognition, my teammates and I brainstormed many complicated ideas for our use case.

The CashApp ML Pipeline

Over the course of three-and-a-half months, we were able to take the most basic version of our idea from the whiteboards and integrate it into the full product released to customers. We started by dividing the project into three sub-parts (I was leading the development of one of these) which we treated as individual projects. Once the three were at a stage where we felt confident in the components, we integrated them together to build a completely new pipeline for the problem.

Through a combination of rigorous tests and benchmarks for each component, we ensured that the new system was performing better than our previous approach in terms of speed and performance. We also tested its ability to scale up to millions of data points that CashApp was currently handling and were pleasantly surprised by how efficiently neural networks allowed us to do so.

Once the Team Manager and Director were convinced by our new benchmarks, we decided to go ahead with implementation in a production environment. In the final two weeks of my internship, I finally got to write production code for my component and integrate it into the new version of the system, which was scheduled for release to customers in less than a month.


The deep learning pipeline was the coolest feature of this new release: we were able to improve system accuracy by 8% and reduce computation time by 99% compared to the previous approach. Since our ideas were rather novel, we were also encouraged to write three patent applications, one for each of the three parts of the pipeline.

Through the course of formulating and implementing the new solution, I learnt all about the ins and outs of taking machine learning research into production. I felt very proud of my work and that I was able to make a meaningful contribution to a real software product which is relied upon by thousands of companies around the world.


Genuine Connections

Technical contributions aside, I would like to make some personal reflections. I felt very lucky to be part of a great team: people were bonded by a passion for building a great product and the desire and skill to work towards it. Being part of the team, staying late, figuring things out together: all of it made me feel integrated into something substantial and much bigger than myself.

I was genuinely excited to go to work every Monday morning!


Trust in My Work

I wanted to perform as well as I could during the internship, but I never expected to do anything worth writing patents about. I cannot help but feel overwhelmed, satisfied and extremely lucky to have gelled so well with my teammates and my work. It was a big deal for me that my team put as much trust as they did in my work and in me.

I’m very thankful to the CashApp team, the leadership at SAP and my university for giving me this opportunity. I am also grateful to Sean for being the best supervisor, mentor and friend I could hope for throughout my internship. :)

CashApp team!
