Configuring Duplicate Search for MDG-M through SAP HANA
When SAP HANA is used as the database for SAP MDG, we can perform searches and duplicate checks on master data residing in the HANA database. An SAP HANA search provider is delivered to enable these features.
SAP MDG 7.0 introduces an additional way to run duplicate checks on master data. It uses the strength of SAP HANA and proves to be quite flexible and easy to configure.
Compared with the traditional de-duplication approach based on Enterprise Search, the SAP HANA-based duplicate check offers the following advantages:
1) Improved fuzziness and accuracy
2) The concept of Search Rank is introduced, which gives a much clearer picture of the duplicates present in the database.
3) Rule sets created in SAP HANA studio provide better and more predictable matches
4) Low TCO as it rules out the possibility of an extra installation of TREX or a third party search solution
5) Using HANA based duplicate check avoids the usage/integration of external Data Quality tools such as SAP BODS
Some Important Terms:
1) Duplicate Check:
This function validates the data that you enter and allows you to control the creation of duplicate data records. When you provide data to create a new change request, the system compares it with the data already in the system. If the data you have entered matches one or more existing records, the system warns you that you are about to create a duplicate. For example, when you create a new material, you enter the Material Description, Material Type and Material Group. The system first compares the data from these fields with the existing Material records in the database, and the duplicate check then identifies any records that are potential duplicates of the record you are creating.
Each potential duplicate is given a score indicating the probability of it being a duplicate of the new record. You can choose to proceed with creating the new record or, if you agree that by continuing, you would create a duplicate, you can begin to work directly with the existing record – effectively canceling the creation of a new record.
In business scenarios, it is sometimes necessary to create duplicate data. For example, a bank customer may have a personal account and a commercial account, for which the data would include common elements.
2) Match profile:
You can specify a match profile to control which attributes the system compares to identify duplicates. For example, to compare name and address details, you can specify that the system considers the name fields, house number, street, city, postcode, and country of each record. You can specify that a field is mandatory for duplicate check. During a duplicate check, all fields that you specify as mandatory must contain a value for the check to be performed. You can also assign a relative weight to each field indicating the importance of that field in identifying duplicates. The system can then prioritize certain attributes for the purposes of the comparison. When the system has completed a duplicate check, it presents a score for each potential duplicate. This score is calculated based on the relative weights and indicates the probability that the new record is a duplicate. For example, two addresses with identical postcodes could be considered more likely to be duplicates than two addresses in which the values for country are identical.
You can define the sequence in which the system displays the attributes compared for the duplicate check. To do this, you enter a number for each attribute indicating its position in the order: 1 indicates that an attribute is the first to be displayed, 2 indicates the second, and so on. If you do not want to define a sequence, you can enter the same value (1) for each attribute.
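The interplay of mandatory fields, relative weights and display sequence described above can be illustrated with a small Python sketch. This is only a model of the idea, not how MDG or SAP HANA computes scores: the `ProfileField` class, the `match_score` and `display_order` functions, the field name DESCRIPTION and all weights are hypothetical, and the string similarity comes from Python's `difflib` rather than from the HANA fuzzy search.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ProfileField:
    """One attribute of a (hypothetical) match profile."""
    name: str                # attribute to compare
    weight: float            # relative importance in the score
    mandatory: bool = False  # check is skipped if this field is empty
    sequence: int = 1        # display position in the result list


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; difflib stands in for fuzzy search."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_score(profile, new_record, existing_record):
    """Return a weighted score in [0, 100], or None if the check cannot run."""
    # All mandatory fields must contain a value for the check to be performed.
    if any(f.mandatory and not new_record.get(f.name) for f in profile):
        return None
    # Only fields that are filled in the new record take part in the comparison.
    compared = [f for f in profile if new_record.get(f.name)]
    total_weight = sum(f.weight for f in compared)
    if total_weight == 0:
        return None
    weighted = sum(
        f.weight * similarity(new_record[f.name], existing_record.get(f.name, ""))
        for f in compared
    )
    return 100 * weighted / total_weight


def display_order(profile):
    """Attribute names sorted by their sequence number (ties keep entry order)."""
    return [f.name for f in sorted(profile, key=lambda f: f.sequence)]


# Illustrative profile: weights and flags are assumptions, not SAP defaults.
PROFILE = [
    ProfileField("DESCRIPTION", weight=3, mandatory=True, sequence=1),
    ProfileField("MTART", weight=2, mandatory=True, sequence=2),
    ProfileField("MATKL", weight=1, sequence=3),
]
```

With this profile, a difference in the heavily weighted DESCRIPTION lowers the score more than a difference in MATKL, and a new record without a Material Type is not checked at all because that field is marked as mandatory.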
3) Threshold:
You can specify a threshold for duplicate scores. The system deems as potential duplicates, only those records with a score in excess of this threshold and displays these records to the user.
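As a rough illustration of the threshold, the following Python sketch keeps only the records whose score exceeds it and lists the best match first. The function name `potential_duplicates`, the threshold of 80 and the material numbers GK-205 and GK-311 are made up for this example; it is not the MDG implementation.

```python
def potential_duplicates(scored_records, threshold):
    """Keep (record_id, score) pairs whose score is strictly above the threshold."""
    hits = [(record_id, score) for record_id, score in scored_records if score > threshold]
    # Show the most likely duplicate first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical scores returned by a duplicate check.
candidates = [("GK-102", 81.99), ("GK-205", 64.50), ("GK-311", 92.10)]

for record_id, score in potential_duplicates(candidates, threshold=80):
    print(record_id, score)
```

Here GK-205 is not shown to the user because its score is below the threshold. A record whose score is exactly equal to the threshold is also dropped, since only scores in excess of the threshold count.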
Prerequisites:
1) SAP HANA-based search is configured and working in the system, as described in the help guide provided by SAP: http://help.sap.com/erp_mdg_addon70/helpdata/en/72/93f8516599a060e10000000a44176d/content.htm
2) After the Search view is generated, its name appears in the “Included Search help” field under MDGIMG->General Settings->Data Quality and Search->Search and Duplicate Check->Define Search Applications
3) The Match profile ID for Duplicate Check is also created by default when the Search view is generated, under MDGIMG->General Settings->Data Quality and Search->Search and Duplicate Check->Define Search Applications->Match Profile
4) You have configured the duplicate check in Customizing for Master Data Governance under General Settings->Data Quality and Search->Search and Duplicate Check->Configure Duplicate Check for Entity Types.
Scenario Configured:
In this blog, we will configure the duplicate check functionality for Material Master Data residing in the HANA database.
Here, we will create a match profile in which the duplicate check runs only on fields that were included when the Search View was generated. The fields in scope for the duplicate check are Material Type (MTART), Material Group (MATKL), Base Unit of Measure (MEINS), Industry Sector (LABOR) etc. We will also define a threshold limit, on the basis of which the potential duplicates in the Material Master are fetched.
Steps for configuring SAP HANA-MDG duplicate check:
1) Create a user-defined Match profile ID for Duplicate Check:
After you have generated the Search View in the Create Search View step, you can use it to configure the match profile for duplicate checks.
1) Run transaction MDGIMG.
2) Navigate to
3) Select the row with the Search Mode HA (HANA) and click on ‘Allocation of Search help to Search Applications’. Here we will see the Search View name displayed in the field “Included Search Help”.
4) Double-click on Match Profile. Create a new Match profile named “Match Profile Material Basic” in addition to the Match Profile that was created by default.
Click on “New Entries” and enter the values as shown:
On completion, save the Match profile.
5) Then select the newly created Match profile and click on Relevant Fields tab:
Enter the Resolved Attributes that are to be considered for the Duplicate Check and mark some of them as Mandatory. The duplicate scores will be calculated on the basis of these fields.
6) Then click on “Save” button to save the entries added.
2) Configure Duplicate Check for Entity Types:
Run the Customizing activity for Master Data Governance under General Settings->Data Quality and Search->Search and Duplicate Check->Configure Duplicate Check for Entity Types and enter the newly created match profile ID in the table as below.
Click on “Save” button to save the entries.
Testing:
1) Open NWBC and select the Role “SAP_MDGM_MENU_04”
2) Click on Create material.
3) Click on Continue and enter the data for mandatory fields as shown below:
4) After providing data for all mandatory fields, Click on “Check” button to check for duplicate data:
5) On clicking the “Check” button, we get the below screen:
We get a match score of 81.99, which shows that the new material we are creating is a potential duplicate of the already existing material number GK-102.
So, the user is left with two options in order to proceed further:
a) Press “Continue” button to continue with the creation of new material OR
b) Press “Switch to Duplicate” button to abandon the creation of the new material
If option a) is chosen, then a new material will be created in the system that matches the existing material with a score of approximately 82.
If option b) is considered, then this new material creation will be stopped here and a probable duplicate will not be created.
This is how we can configure the duplicate check functionality for MDG-M using the HANA database.
Recommendation from SAP to improve De-Duplication:
1) Creation of Rule sets while generating the search view, which provides an option to set more parameters in HANA studio in order to get more accurate results.
2) Usage of the “Consider Non Matching Tokens” parameter in the Rule sets can help to lower the score, and we get a more meaningful duplicate check if we only want results that consider additional information as a relevant differentiation.
3) Consideration of more fields for the duplicate check, which provides a more accurate score for finding duplicate materials without the creation of rule sets.
4) SAP strongly recommends using SAP HANA as the database for the ERP system and using the HANA-based search and duplicate check functionality for all the domains.
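To see why recommendation 2) lowers the score, here is a deliberately simplified Python sketch of a token-based comparison. It is not the formula that SAP HANA uses: the function `token_score` and its flag `consider_non_matching` are hypothetical and only model the idea that extra words in one of the texts count as a difference.

```python
def token_score(search_text, stored_text, consider_non_matching):
    """Score in [0, 100] based on shared words (simplified model, not HANA's formula)."""
    query = set(search_text.upper().split())
    stored = set(stored_text.upper().split())
    if not query:
        return 0.0
    matched = query & stored
    if consider_non_matching:
        # Extra tokens on either side enlarge the denominator and lower the score.
        denominator = len(query | stored)
    else:
        # Only the tokens of the search text are counted.
        denominator = len(query)
    return 100 * len(matched) / denominator


print(token_score("steel bolt", "steel bolt M8 zinc plated", False))  # 100.0
print(token_score("steel bolt", "steel bolt M8 zinc plated", True))   # 40.0
```

In this model, the additional words "M8 zinc plated" are ignored in the first call and treated as a relevant differentiation in the second, so the second record is much less likely to be reported as a duplicate.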