Combining HANA and Hadoop data with SAP Data Lifecycle Management Tool
Sr. SAP Basis/BI & Software Developer
Nowadays to achieve good results from analytics, you should analyze both your enterprise and historical data. SAP HANA is a great database for your enterprise system but for historical data, Hadoop is the most popular option to handle Big Data. Hadoop is also very cost efficient because you can run it on commodity hardware.
In this tutorial we will explore:
- How to relocate our historical data from enterprise HANA to Hadoop
- How to combine them together, and
- How to view the full data from HANA studio
In order to relocate the data from HANA to Hadoop we are going to use SAP HANA Data Lifecycle Management tool, which is part of the SAP HANA Data Warehousing Foundation (DWF).
The Data Lifecycle Manager (DLM) is an SAP HANA XS-based tool that helps to relocate less frequently used data in SAP HANA to the storage destinations such as the SAP HANA Dynamic Tiering option, Hadoop, or SAP IQ.
First, the DLM tool relocates the data and creates a virtual table in HANA studio for the historical data located in Hadoop. Then it combines with enterprise data located in the HANA table and finally, it creates the view for combined data.
- Create remote source from Hana studio for Hadoop
- Build Storage destination in DLM
- Build Lifecycle Profile in DLM
- Append data relocation rules
- Start data relocation
- View combined data in HANA Studio
Create a remote source from HANA Studio for Hadoop
We will use SAP Smart Data access to create remote source for Hadoop. Use the code bellow in your HANA studio’s SQL editor, fill the parameters best on your Hadoop cluster, and execute it.
CREATE REMOTE SOURCE “<Remote Source Name>”
CONFIGURATION ‘port=7860;ssl_mode=disabled; server=<Hadoop ip>;’
WITH CREDENTIAL TYPE ‘PASSWORD’
(ex <Remote Source Name> – CLAB_HADOOP)
After execution of the script you will see remote source for your Hadoop system in “Provisioning”-“Remote Sources” folder.
Execute the commands below in HANA Studio’s SQL editor to notify your index server and xsengine that you’ve created this remote source which will be used for data aging.
ALTER SYSTEM ALTER CONFIGURATION (‘indexserver.ini’, ‘SYSTEM’)
SET (‘data_aging’, ‘spark_remote_source’) = ‘<Remote Source Name>’ WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION (‘xsengine.ini’, ‘SYSTEM’)
SET(‘data_aging’,’spark_remot_source’) = <Remote Source Name>’ WITH RECONFIGURE;
Build a Storage Destination
To create new storage destination click on “+” button in “MANAGE STORAGE DESTINATIONS” tab
Enter the destination name , select SPARK SQL as Storage destination type, enter schema name, and click “save”
After saving the configurations click the “Test Connection” button to test the connection
After testing, activate the connection by clicking the “Activate” button
Build a life-cycle profile
To create a new life-cycle profile, click on “+” button in “MANAGE” LYFECYCLE PROFILES” tab
Enter name and storage destination type details and check Trigger type to schedule the execution or execute it manually.
In our case the source is HANA table (SALES_DATA_3) inside the COMERIT_DEMO1 schema.
The nominal key will be used to split data into two parts – Historical and Enterprise.
Select “Storage Destination” in Destination Attributes tab
Append data relocation rules
Open Rule Editor to set rules for data relocation.
Append the rule and click on validate syntax and it will show total and affected rows.
Total Records is the records count in Hana table and Affected records is the records that will be relocated.
This rule will relocate the data form SALES_DATA_3 table where TRANS_ID is less than 8000.
Click “Save” and then click “Activate”
Start data relocation
Now we will run the data relocation process. You can run it immediately or on a scheduled basis. For now, we will run it immediately but you can change “Trigger type” to scheduled in order to run it periodically.
Now click “Run” – “Trigger Manually”
After job execution, the Data Distribution graphs will be updated.
Source graph shows the data in HANA DB
clab_hadoop graph shows the data in Hadoop cluster.
Finally, the union view of historical and enterprise data will be created in HANA and you can view it from HANA studio.