Customer Matching with Oracle AI Vector Search

Customer matching, i.e., identifying the same person across multiple systems that share no common unique identifier, is a challenging, non-trivial task. Depending on how pronounced the following factors are in the respective systems, it can be very complex and only partially successful:

Poorly designed applications and lack of consistency in data models
Changing customer information (change of address, new phone number, name change, etc.)
Poor data quality (typos, incorrect or incomplete information)

Customer matching is a central capability for any company because it helps achieve, among others, the following goals:

Creation of a 360° view of customers
Improvement of data quality by eliminating duplicates and correcting customer information
Compliance with regulatory requirements
Fraud prevention

Challenge

To implement customer matching, complex transformation algorithms and processes are usually developed based on existing customer data such as first name, last name, telephone number or residential address. These rules try to compensate for human error and data quality issues. However, such rule-based approaches often produce many false positives when the rules are too lenient, or miss genuine matches when they are made too strict in order to avoid false positives.
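
To make this trade-off concrete, here is a minimal Python sketch of a single such rule: names are normalized (accents stripped, case and punctuation removed) and compared with the standard library's difflib.SequenceMatcher. The similarity measure, the example names and both thresholds are hypothetical illustrations, not rules from a real matching project; production rule sets usually combine several attributes.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Typical transformation rule: strip accents, lowercase, keep letters only."""
    # NFKD splits "ü" into "u" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents.lower() if c.isalpha())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rule_based_match(a: str, b: str, threshold: float) -> bool:
    """Flag a match when the normalized names are at least `threshold` similar."""
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical pairs: the first is a typo of the same person,
    # the second may well be a different person.
    pairs = [("Anna Schmidt", "Ana Schmitt"), ("Anna Schmidt", "Hanna Schmid")]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: {name_similarity(a, b):.2f}",
              "lenient:", rule_based_match(a, b, 0.8),
              "strict:", rule_based_match(a, b, 0.95))
```

With these inputs, "Ana Schmitt" scores about 0.86 and "Hanna Schmid" about 0.91. A lenient threshold of 0.8 accepts both, a strict one of 0.95 rejects both, and no threshold accepts only the typo, because string similarity alone cannot tell whether "Hanna Schmid" is the same person.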

How could AI solve this problem much more easily? The systems involved often contain copies of ID cards, passports or other identification documents, possibly in different forms but belonging to the same person. With Oracle AI Vector Search, numerical representations (vectors) can be generated from the text and images in these documents. The distance between the vectors is then calculated, and only pairs that lie very close to each other are flagged as potential matches.
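
The matching step can be illustrated with a minimal pure-Python sketch. It assumes cosine distance (1 minus cosine similarity, which is what Oracle's VECTOR_DISTANCE computes for the COSINE metric) and a hypothetical cut-off `max_distance`. It compares every source vector with every target vector, which is fine for illustration; in the database the same comparison would be done in SQL with VECTOR_DISTANCE. The three-dimensional vectors and record IDs are made up; real CLIP embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def candidate_matches(
    source: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    max_distance: float,
) -> List[Tuple[str, str, float]]:
    """Return (source_id, target_id, distance) for all pairs within max_distance, nearest first."""
    pairs = []
    for s_id, s_vec in source.items():
        for t_id, t_vec in target.items():
            d = cosine_distance(s_vec, t_vec)
            if d <= max_distance:
                pairs.append((s_id, t_id, d))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical toy embeddings of ID documents from two systems.
source = {"crm-1": [1.0, 0.0, 0.0], "crm-2": [0.0, 1.0, 0.0]}
target = {"erp-7": [0.9, 0.1, 0.0], "erp-8": [0.0, 0.0, 1.0]}
print(candidate_matches(source, target, max_distance=0.1))
```

Only crm-1 and erp-7 end up as a candidate pair here. In practice the cut-off would have to be calibrated on pairs whose match status is known; set too high, it reintroduces the false-positive problem of rule-based matching.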

Approach and customer benefits

Starting with Oracle Database version 23.4, an ONNX (Open Neural Network Exchange) engine is available directly within the database. This engine makes it possible to generate vector embeddings, i.e., numerical representations of ID documents, for example, directly in the database. Since we work with sensitive customer data, this is particularly valuable: the data never leaves the database, which protects its security and confidentiality.

In our case, we chose OpenAI's CLIP multimodal embedding model because it provides separate embedding pipelines for text and images that map both into the same vector space. The model is imported into the database with a simple call to the DBMS_VECTOR.LOAD_ONNX_MODEL procedure. All ID document images from the source and target systems are then transformed into vectors using this model.
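
The following Python sketch shows how the load-and-embed step could be scripted with the python-oracledb driver. It assumes the CLIP image encoder has been exported to its own ONNX file and copied into a database directory object, and that a hypothetical table ID_DOCUMENTS holds each scanned document in a BLOB column SCAN next to a VECTOR column IMG_VEC. The directory, file, model, table and column names are all hypothetical; the metadata JSON depends on how the model was exported, and whether VECTOR_EMBEDDING accepts a BLOB image directly depends on the database release. Treat both as assumptions to verify against the DBMS_VECTOR documentation.

```python
import re

# Assumption: `conn` is a python-oracledb connection (or any DB-API object with
# the same cursor()/commit() interface). All object names are hypothetical.

# Loads an ONNX file from a database directory object as an in-database model.
LOAD_SQL = """
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(:p_dir, :p_file, :p_model, JSON(:p_meta));
END;"""

# Hypothetical table ID_DOCUMENTS(doc_id, scan BLOB, img_vec VECTOR).
# Assumption: this database release accepts a BLOB image in VECTOR_EMBEDDING.
EMBED_SQL = ("UPDATE id_documents "
             "SET img_vec = VECTOR_EMBEDDING({model} USING scan AS data) "
             "WHERE img_vec IS NULL")

# Model names are identifiers, not values, so they cannot be bind variables.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,127}")


def load_onnx_model(conn, directory: str, file_name: str,
                    model_name: str, metadata_json: str) -> None:
    """Import one ONNX file (e.g. the CLIP image encoder) into the database."""
    with conn.cursor() as cur:
        cur.execute(LOAD_SQL, p_dir=directory, p_file=file_name,
                    p_model=model_name, p_meta=metadata_json)


def embed_missing_images(conn, model_name: str) -> int:
    """Vectorize every document image that has no vector yet; returns rows updated."""
    if not _IDENTIFIER.fullmatch(model_name):
        raise ValueError(f"invalid model name: {model_name!r}")
    with conn.cursor() as cur:
        cur.execute(EMBED_SQL.format(model=model_name))
        updated = cur.rowcount
    conn.commit()
    return updated
```

A hypothetical call sequence would be `load_onnx_model(conn, "MODEL_DIR", "clip_img.onnx", "CLIP_IMG", metadata_json)` followed by `embed_missing_images(conn, "CLIP_IMG")`. Because the model name has to be formatted into the SQL text, the sketch validates it with a regular expression first.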

What sumIT delivers

sumIT helps you select the right LLM for your AI use case, implement chunking and embedding of your data, and build a user-friendly interface or application for similarity searches.

For customer matching, we let users search for customers either by text (e.g., first and last name) or by another image (e.g., a copy of an ID card from the source system). The results are presented visually and the search runs at the touch of a button, so potential matches can be identified quickly and intuitively.
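
Both search modes map to one SQL query each. The Python sketch below (again using python-oracledb) assumes that target-system ID documents were embedded with the CLIP image model into a hypothetical table TARGET_DOCUMENTS, that source-system documents are stored the same way in a hypothetical SOURCE_DOCUMENTS table, and that the CLIP text encoder was loaded under a hypothetical model name. A text query is embedded on the fly with the text encoder; an image query reuses the vector already stored for the source document, so no image has to be sent from the client. Because CLIP places text and images in the same vector space, both kinds of query can be compared against the same image vectors.

```python
import re

# Assumptions: `conn` is a python-oracledb connection; SOURCE_DOCUMENTS and
# TARGET_DOCUMENTS (doc_id, img_vec VECTOR) are hypothetical tables whose
# vectors come from the CLIP image model; the text-model name is hypothetical.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,127}")

# Text search: embed the search string with the CLIP text encoder in the database.
TEXT_SEARCH_SQL = """
SELECT t.doc_id,
       VECTOR_DISTANCE(t.img_vec,
                       VECTOR_EMBEDDING({model} USING :search_text AS data),
                       COSINE) AS dist
  FROM target_documents t
 ORDER BY dist
 FETCH FIRST :k ROWS ONLY"""

# Image search: reuse the vector already stored for a source-system document.
IMAGE_SEARCH_SQL = """
SELECT t.doc_id,
       VECTOR_DISTANCE(t.img_vec, s.img_vec, COSINE) AS dist
  FROM target_documents t
  JOIN source_documents s ON s.doc_id = :source_doc_id
 ORDER BY dist
 FETCH FIRST :k ROWS ONLY"""


def search_by_text(conn, text_model: str, text: str, k: int = 5):
    """Return up to k (doc_id, distance) rows, nearest first."""
    if not _IDENTIFIER.fullmatch(text_model):
        raise ValueError(f"invalid model name: {text_model!r}")
    with conn.cursor() as cur:
        cur.execute(TEXT_SEARCH_SQL.format(model=text_model), search_text=text, k=k)
        return cur.fetchall()


def search_by_image(conn, source_doc_id: int, k: int = 5):
    """Return up to k (doc_id, distance) rows for the given source document."""
    with conn.cursor() as cur:
        cur.execute(IMAGE_SEARCH_SQL, source_doc_id=source_doc_id, k=k)
        return cur.fetchall()
```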

Tools and technologies

Oracle Database 23ai or 26ai
Oracle Autonomous Database
Oracle AI Vector Search
Small embedding models imported into the database, or REST API calls to larger LLMs for embedding data
Oracle APEX