FINDING DUPLICATE MASTER DATA

The Approach

The pivoting and the reverse pivoting features of SAP MDM Import manager are useful to reorganize records. They allow us to view data both in terms of the traditional ‘values in columns’ and its converse: ‘column names associated with values’.

Applying pivot operations to data and then subsequently its reverse, while preserving key information for each record can lead to interesting results. Essentially, these dual operations constitute different ways to view the original record. We will see those shortly.

3 clip_image002

The Scenario

Let’s start at a point where all the data has already been loaded into MDM and the process for finding duplicates can commence. The repository in question is a vendor database.

First, we need to finalize the combination of fields and appropriate weight to each combination that can be used to search duplicates.

Fields for comparison

This document takes the same approach as the SAP ‘How-to’ document for this purpose. Additionally, we will create a lookup table (‘Duplicate Key Sequence’) which can store the combination of these fields which can be used to find duplicates. The table contents are displayed. This will help us view data in a more meaningful manner.

Fig 1.1 – Weights and Candidate Search Combinations

The Upload File

A file needs to be created with the following fields

1. A unique key for each of the records, say an Auto id field, ‘Vendor id’.

2. All fields in the fig 1.1

These fields can be syndicated out of MDM in the form of a file. As we will need to manipulate the file contents and create pivot tables, XML or Microsoft Access files are preferable.

Xml Schema / MS Access Design

Here’s the graphical view of the XML schema and the design of the MS Access alternative.

clip_image004clip_image006clip_image006[1]clip_image006[2]clip_image006[3]clip_image006[4]clip_image006[5]clip_image006[6]clip_image006[7]clip_image006[8]clip_image006[9]clip_image006[10]clip_image006[11]clip_image006[12]clip_image006[13]clip_image006[14]

Field

Weight

Vat Registration Number

34

Street 1

15

Name 1

30

Postal Code

21

Fig 1.2 – The Source Schema

Either file format will do. Let’s take the XML alternative further.

clip_image008clip_image010

The Target Repository

We’ll add a new multi-valued qualified field to the vendor repository which will help us view duplicates in a meaningful manner.

The new field added is ‘Duplicate Criteria’ as shown below.

Fig 1.3 – Introducing Field DuplicateCriteria

The new qualified lookup table ‘DuplicateCriteria’ is structured as follows.

Fig 1.4 – Storing Duplicate Citations

The ‘Name’ field stores the candidate field combination for which the ‘Criteria’ and ‘Score’ are applicable.

The ‘Name’ field links to the lookup table ‘Duplicate Key Sequence’ previously introduced. The criteria field will store the values for which the duplicate citation is applicable. The ‘Score’ field is filled by the weight information laid out previously. This is a way we can store multiple citations of duplicates.

The Importing process – Pre Mapping preparation

Here’s how the XML file looks when imported by MDM.

Fig 1.5 – Import File Preview

clip_image012clip_image014clip_image016

Let’s take the case of finding duplicates on the candidate combination Vat Registration Number and City

Postal Code fields.

Pivoting

Let’s now pivot this table on the two fields mentioned above. This requires that ‘vendor’ be selected as the source repository. Please note that the candidates have been combined as a part of the pivoting process.

Here’s how the pivot table looks.

Fig 1.5 – Pivoting

Complete the pivot table creation process by pressing the ‘OK’ button. Here is a snapshot of the data in the pivot table from the ‘Source Preview Pane’

.

Fig 1.6 – The Pivot Table

Note that each combination of city postal code and Vat registration number have now become columns, while vendor id continues to remain the unique key value per row. It is already seen that sorting by each column and looking for rows where the ‘Name1’ value is not empty (or NULL) would give you duplicates. Let’s go on to reverse piloting this data and observe outputs.

Reverse Pivoting

Choose the Vendor <Pivot> table as the source repository. Select all fields except vendor ID and create the reverse pilot. Note that the fields selected are all combination values of city postal code and Vat registration

numbers.

clip_image018clip_image020

Here’s how the reverse pivot table looks.

Fig 1.7 – Reverse Pivoting

Complete the process of creating the reverse pivot table by pressing the ‘OK’ button. Select this reverse-pivot table as the new source repository. Now we are ready to go on to the mapping process.

clip_image022

The Importing process – Mapping Steps

We have seen that by pivoting and reverse pivoting the same table, we have been able to combine the candidate fields for duplicate searches into a column. The mapping process essentially will transfer this data into the repository for viewing.

To refresh, we need to set the reverse-pivot table of vendors as source and the vendor repository as destination. Let’s go to the ‘Map Fields/Values’ tab.

Adding fields and performing a join

We need to introduce new fields via addition and join actions at this point. All the fields from the original source vendor repository, except the ‘vendor id’ field need to be looked up by doing a join on the vendor repository. They are displayed as below.

Fig 1.8 – Manipulating Source Structure

Two fields need to be added manually.

Key Sequence: This stores the key sequence for which duplicates are being searched: VAT Registration, City Postal Code. Use the values in the table ‘Duplicate Key Sequence’ previously created as a reference for setting values in this field.

Score: The score as per the table of weights. In this case, it would be 55. The value of these fields is set by value conversion. Here’s a peek at the maps.

clip_image024

Fig 1.9 – Mapping

The 4 ‘looked up’ fields are the basis for matching records as we will see next.

Matching records

In the ‘Match Records’ tab, all the four fields available (the four looked up fields), are matched as a combination. Update all mapped fields for Match Class ‘Single’ as below.

Fig 1.10 – Matching

In case you see multiple matches cited for some records in the match class column, those records have a very high chance of being duplicates. You may choose to go a step back and, and update

the score to 100 and import the multiple matched records separately.

Almost There…

Execute import and check the records in the Data Manager. This activity can also be performed in a batch

process if the maps are available beforehand. Multiple import runs will not wipe out information updated in the qualified fields. Instead new qualified records will be created, preserving duplicate information of all the import runs.

clip_image026clip_image028clip_image036clip_image068[1]

Viewing Data

Start SAP MDM Data Manager and select the vendor repository. Search repository by the Search Parameters using the qualified lookup table ‘DuplicateCriteria’ as displayed. Use additional free form search criteria to narrow down to the duplicates.

Fig 1.11 – Searching Duplicates

Merge records as required.

clip_image070clip_image072

Related Content

The following discussion forum thread on SDN is related to this article:

https://www.sdn.sap.com/irj/sdn/forum?forumID=55

Disclaimer and Liability Notice

This document may discuss sample coding or other information that does not include SAP official interfaces and therefore is not supported by SAP. Changes made based on this information are not supported and can be overwritten during an upgrade. SAP will not be held liable for any damages caused by using or misusing the information, code or methods suggested in this document, and anyone using these methods does so at his/her own risk. SAP offers no guarantees and assumes no responsibility or liability of any type with respect to the content of this technical article or code sample, including any liability resulting from incompatibility between the content within this document and the materials and services offered by SAP. You agree that you will not hold, or seek to hold, SAP responsible or liable with respect to the content of this document.

R/3-XI-MDM (Outbound Scenario)

This article will show you how to integrate SAP R/3 and Master Data Management (MDM) using PI in short framework. We will be discussing Outbound process i.e. Outbound to the SAP R/3
clip_image001

Figure Outbound to R/3

MDM Configuration Steps:
Step One-->Create Client System
Follow the same steps to create client system. Also we can use the same client system.
Step Two-->Create Port for Client System
Follow the same steps to create inbound port.
The Port Name is SIEBEL_IB_CUS_SIEBELCUS01, Code SIEBEL_IB_CUS_SIEBELCUS01, and Type as Inbound. See the figures below.
clip_image002

Figure Port for Inbound

MDM Server Side Configuration
In the SAP MDM Server, Look at the server folder structure for the Inbound folder which we have created in the MDM Console.
clip_image003

Figure MDM Server Folder

FTP Server Side Configuration
Step Three-->FTP Server Configuration
Configure/Specify the file folder path for pulling the file in FTP Server. This is screen refers the FTP server configuration (WS_FTP server)
clip_image004

Figure FTP Server

SAP XI / PI Side Configuration
Step Four --> SAP PI Receiver Communication Channel
clip_image005

Figure Receiver CC

Outbound Process Flow:
clip_image006

Figure Outbound Process Flow

>>>Here, I will be discussing Outbound process, the above figure shows the entire process
Process step in R/3
In this IDoc triggers to outbound port.
Process step in PI
The IDoc Communication Channel picks up in the PI System and starts the PI Process. In the PI, transformation and conversion happens and sends to the File Adaptor. The file adaptor pushes the file to the specified path in MDM server folder.
Process step in MDM
The Import Manager picks up the file which the file placed in server folder and imports the file to MDM repository main table.

New MDM Matching

The MDM matching functionality exposed in the MDM Data Manager supports interactive, strategy-based matching allowing identification of duplicate master data records. Coupled with the existing MDM merging functionality, MDM provides powerful capabilities for the deduplication process in a single environment.

In this blog, you will learn how easy it is to define matching rules, to compose matching strategies, and to run these matching strategies as an essential part of the deduplication process.

MDM Matching is Record-Centric

What this means is that each record has its own matching group of zero or more potential matches. When an MDM matching strategy is applied to a set of selected records, the strategy is applied individually to each selected record, and the potential matches are placed in the matching group for each record.

MDM Matching Occurs in a New Mode

clip_image007

Matching mode allows you to define matching strategies (design time), apply these strategies to a set of records, and view the matching results (runtime).

  • Interactive matching in matching mode is like searching: fully interactive, ad hoc, and without any form of persistence.
  • Automated matching occurs as part of the MDM Data Management Workflow, using a match/merge step that applies a matching strategy to the records in the workflow job (either on the records in the job or all records in the repository). Matching results are persistent during the execution of the workflow.

Matching mode is very similar to record mode and features the familiar three-pane layout including
(1) Search pane on the left
(2) Records pane on the top-right containing a grid with the search results and
(3) Matching pane on the bottom-right containing the following tabs:

clip_image008

The Matches tab shows the matching group of potential matches for the selected record:

clip_image009

The Merge tab contains an embedded version of the merge dialog for record mode, allowing you to merge the marked records in the matching group:

clip_image010

The Matches and Merge tabs together comprise the Runtime of matching mode.

Define a Matching Strategy

A matching strategy, the core of MDM matching, can be applied to selected records, to search results or to all records in the repository.

The Transformation tab (incl. substitution and synonym lists) allows you to create a virtual field that is a transformed version of a field in the repository, so that you can apply rules to the transformed values rather than to the original source values.

clip_image011

The Rules tab (with which you can specify a score for the success, failure and undefined conditions) allows you to define matching rules that are for comparing field values of record pairs. The matching functions include Equals (exact match) and Token Equals (match on one or more tokens). The set of functions will increase over time, making MDM matching highly extensible.

clip_image012

The Strategies tab allows you to define matching strategies consisting of matching rules. Each matching strategy is defined by the following properties: a name, lower & upper threshold, and one or more matching rules. The Min and Max Score is automatically calculated by MDM based on the selected rules.

clip_image013

The Design Time of matching mode comprises the Transformations, Rules and Strategies tabs.

Apply a Matching Strategy

You can use a menu command to execute a matching strategy on any subset of selected records against the subset (Selected vs. Selected), against the current search results (Selected vs. Results) or against the entire set of records in the repository (Selected vs. All):

clip_image014

You can also match the search results against search results (Results vs. Results) or against all records in the repository (Results vs. All).

After executing the matching, you can review the matching results by selecting a record in the Record pane and investigating potential matches for the selected record in the Matches tab. You can see the scores for every matching rule, the sum for each record, and the match class. Based on this information and the record data, you can decide if two records are identical.

If you have identified duplicates, you can merge them on the Merge tab, which provides the Merge dialog known from record mode.

clip_image015

New MDM Match/Merge Workflow Step

An sample workflow could look like the following:

clip_image016

The properties of the match/merge step include Name, Description, Matching Strategy and Match Records Scope (Records vs. Records or Records vs. All) as well as generic workflow step information like assigned users or roles.

The Workflows tab in Matching mode is the same as the Workflows tab in record mode. There you can find all the workflows that require attention by the user. Whereas Record mode lists all of the workflows available for a particular user, matching mode lists only those workflows which processed the match/merge step and are waiting for user interaction.

clip_image017

This is an initial preview of MDM matching, Details will follow ;-)

SAP Developer Network Latest Updates