FINDING DUPLICATE MASTER DATA

The Approach

The pivoting and the reverse pivoting features of SAP MDM Import manager are useful to reorganize records. They allow us to view data both in terms of the traditional ‘values in columns’ and its converse: ‘column names associated with values’.

Applying pivot operations to data and then subsequently its reverse, while preserving key information for each record can lead to interesting results. Essentially, these dual operations constitute different ways to view the original record. We will see those shortly.

3 clip_image002

The Scenario

Let’s start at a point where all the data has already been loaded into MDM and the process for finding duplicates can commence. The repository in question is a vendor database.

First, we need to finalize the combination of fields and appropriate weight to each combination that can be used to search duplicates.

Fields for comparison

This document takes the same approach as the SAP ‘How-to’ document for this purpose. Additionally, we will create a lookup table (‘Duplicate Key Sequence’) which can store the combination of these fields which can be used to find duplicates. The table contents are displayed. This will help us view data in a more meaningful manner.

Fig 1.1 – Weights and Candidate Search Combinations

The Upload File

A file needs to be created with the following fields

1. A unique key for each of the records, say an Auto id field, ‘Vendor id’.

2. All fields in the fig 1.1

These fields can be syndicated out of MDM in the form of a file. As we will need to manipulate the file contents and create pivot tables, XML or Microsoft Access files are preferable.

Xml Schema / MS Access Design

Here’s the graphical view of the XML schema and the design of the MS Access alternative.

clip_image004clip_image006clip_image006[1]clip_image006[2]clip_image006[3]clip_image006[4]clip_image006[5]clip_image006[6]clip_image006[7]clip_image006[8]clip_image006[9]clip_image006[10]clip_image006[11]clip_image006[12]clip_image006[13]clip_image006[14]

Field

Weight

Vat Registration Number

34

Street 1

15

Name 1

30

Postal Code

21

Fig 1.2 – The Source Schema

Either file format will do. Let’s take the XML alternative further.

clip_image008clip_image010

The Target Repository

We’ll add a new multi-valued qualified field to the vendor repository which will help us view duplicates in a meaningful manner.

The new field added is ‘Duplicate Criteria’ as shown below.

Fig 1.3 – Introducing Field DuplicateCriteria

The new qualified lookup table ‘DuplicateCriteria’ is structured as follows.

Fig 1.4 – Storing Duplicate Citations

The ‘Name’ field stores the candidate field combination for which the ‘Criteria’ and ‘Score’ are applicable.

The ‘Name’ field links to the lookup table ‘Duplicate Key Sequence’ previously introduced. The criteria field will store the values for which the duplicate citation is applicable. The ‘Score’ field is filled by the weight information laid out previously. This is a way we can store multiple citations of duplicates.

The Importing process – Pre Mapping preparation

Here’s how the XML file looks when imported by MDM.

Fig 1.5 – Import File Preview

clip_image012clip_image014clip_image016

Let’s take the case of finding duplicates on the candidate combination Vat Registration Number and City

Postal Code fields.

Pivoting

Let’s now pivot this table on the two fields mentioned above. This requires that ‘vendor’ be selected as the source repository. Please note that the candidates have been combined as a part of the pivoting process.

Here’s how the pivot table looks.

Fig 1.5 – Pivoting

Complete the pivot table creation process by pressing the ‘OK’ button. Here is a snapshot of the data in the pivot table from the ‘Source Preview Pane’

.

Fig 1.6 – The Pivot Table

Note that each combination of city postal code and Vat registration number have now become columns, while vendor id continues to remain the unique key value per row. It is already seen that sorting by each column and looking for rows where the ‘Name1’ value is not empty (or NULL) would give you duplicates. Let’s go on to reverse piloting this data and observe outputs.

Reverse Pivoting

Choose the Vendor <Pivot> table as the source repository. Select all fields except vendor ID and create the reverse pilot. Note that the fields selected are all combination values of city postal code and Vat registration

numbers.

clip_image018clip_image020

Here’s how the reverse pivot table looks.

Fig 1.7 – Reverse Pivoting

Complete the process of creating the reverse pivot table by pressing the ‘OK’ button. Select this reverse-pivot table as the new source repository. Now we are ready to go on to the mapping process.

clip_image022

The Importing process – Mapping Steps

We have seen that by pivoting and reverse pivoting the same table, we have been able to combine the candidate fields for duplicate searches into a column. The mapping process essentially will transfer this data into the repository for viewing.

To refresh, we need to set the reverse-pivot table of vendors as source and the vendor repository as destination. Let’s go to the ‘Map Fields/Values’ tab.

Adding fields and performing a join

We need to introduce new fields via addition and join actions at this point. All the fields from the original source vendor repository, except the ‘vendor id’ field need to be looked up by doing a join on the vendor repository. They are displayed as below.

Fig 1.8 – Manipulating Source Structure

Two fields need to be added manually.

Key Sequence: This stores the key sequence for which duplicates are being searched: VAT Registration, City Postal Code. Use the values in the table ‘Duplicate Key Sequence’ previously created as a reference for setting values in this field.

Score: The score as per the table of weights. In this case, it would be 55. The value of these fields is set by value conversion. Here’s a peek at the maps.

clip_image024

Fig 1.9 – Mapping

The 4 ‘looked up’ fields are the basis for matching records as we will see next.

Matching records

In the ‘Match Records’ tab, all the four fields available (the four looked up fields), are matched as a combination. Update all mapped fields for Match Class ‘Single’ as below.

Fig 1.10 – Matching

In case you see multiple matches cited for some records in the match class column, those records have a very high chance of being duplicates. You may choose to go a step back and, and update

the score to 100 and import the multiple matched records separately.

Almost There…

Execute import and check the records in the Data Manager. This activity can also be performed in a batch

process if the maps are available beforehand. Multiple import runs will not wipe out information updated in the qualified fields. Instead new qualified records will be created, preserving duplicate information of all the import runs.

clip_image026clip_image028clip_image036clip_image068[1]

Viewing Data

Start SAP MDM Data Manager and select the vendor repository. Search repository by the Search Parameters using the qualified lookup table ‘DuplicateCriteria’ as displayed. Use additional free form search criteria to narrow down to the duplicates.

Fig 1.11 – Searching Duplicates

Merge records as required.

clip_image070clip_image072

Related Content

The following discussion forum thread on SDN is related to this article:

https://www.sdn.sap.com/irj/sdn/forum?forumID=55

Disclaimer and Liability Notice

This document may discuss sample coding or other information that does not include SAP official interfaces and therefore is not supported by SAP. Changes made based on this information are not supported and can be overwritten during an upgrade. SAP will not be held liable for any damages caused by using or misusing the information, code or methods suggested in this document, and anyone using these methods does so at his/her own risk. SAP offers no guarantees and assumes no responsibility or liability of any type with respect to the content of this technical article or code sample, including any liability resulting from incompatibility between the content within this document and the materials and services offered by SAP. You agree that you will not hold, or seek to hold, SAP responsible or liable with respect to the content of this document.

SAP Developer Network Latest Updates