Combining HANA and Hadoop data with SAP Data Lifecycle Management Tool

Posted on November 17, 2017 under Blog

By:
Arman Avetisyan
Sr. SAP Basis/BI & Software Developer
aavetisyan@comerit.com

Nowadays, to get good results from analytics you should analyze both your enterprise data and your historical data. SAP HANA is a great database for your enterprise system, while for historical data Hadoop is the most popular option for handling Big Data. Hadoop is also very cost efficient because you can run it on commodity hardware.

In this tutorial we will explore:

  • How to relocate our historical data from enterprise HANA to Hadoop
  • How to combine the two, and
  • How to view the full data set from HANA Studio

In order to relocate the data from HANA to Hadoop we are going to use the SAP HANA Data Lifecycle Management tool, which is part of the SAP HANA Data Warehousing Foundation (DWF).

The Data Lifecycle Manager (DLM) is an SAP HANA XS-based tool that helps relocate less frequently used data in SAP HANA to storage destinations such as the SAP HANA Dynamic Tiering option, Hadoop, or SAP IQ.

First, the DLM tool relocates the data and creates a virtual table in HANA for the historical data now located in Hadoop. It then combines that virtual table with the enterprise data remaining in the HANA table and, finally, creates a view over the combined data.

Union_of_HANA_and_hadoop_tables

 

Procedure:

  • Create a remote source from HANA Studio for Hadoop
  • Build a storage destination in DLM
  • Build a lifecycle profile in DLM
  • Append data relocation rules
  • Start data relocation
  • View combined data in HANA Studio

Create a remote source from HANA Studio for Hadoop

We will use SAP Smart Data Access to create a remote source for Hadoop. Use the code below in your HANA Studio's SQL editor, fill in the parameters based on your Hadoop cluster, and execute it.

CREATE REMOTE SOURCE "<Remote Source Name>"
ADAPTER "sparksql"
CONFIGURATION 'port=7860;ssl_mode=disabled;server=<Hadoop ip>;'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=<username>;password=<password>';

(example: <Remote Source Name> = CLAB_HADOOP)

After executing the script, you will see a remote source for your Hadoop system under the "Provisioning" > "Remote Sources" folder.

remot_source_for_hadoop
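You can also verify the remote source from SQL. The query below reads the SYS.REMOTE_SOURCES system view; it is only a quick check and not part of the DLM setup itself:

-- List the remote source we just created
SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME
FROM "SYS"."REMOTE_SOURCES"
WHERE REMOTE_SOURCE_NAME = '<Remote Source Name>';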

 

 

Execute the commands below in HANA Studio's SQL editor to let your indexserver and xsengine know that this remote source will be used for data aging.

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('xsengine.ini', 'SYSTEM')
SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;
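To confirm that both parameters were written, you can read them back from the M_INIFILE_CONTENTS monitoring view (an optional verification step, not required by DLM):

-- Show the data_aging parameter in both ini files
SELECT FILE_NAME, SECTION, KEY, VALUE
FROM "SYS"."M_INIFILE_CONTENTS"
WHERE SECTION = 'data_aging' AND KEY = 'spark_remote_source';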

Build a Storage Destination

To create a new storage destination, click the "+" button on the "MANAGE STORAGE DESTINATIONS" tab.

create_manage_storage_destionation_in_DLM

 

Enter the destination name, select SPARK SQL as the storage destination type, enter the schema name, and click "Save".

fill_storage_information

 

After saving the configuration, click the "Test Connection" button to test the connection.

test_connection

 

After testing, activate the connection by clicking the "Activate" button.

activate_connection

 

 

Build a Lifecycle Profile

To create a new lifecycle profile, click the "+" button on the "MANAGE LIFECYCLE PROFILES" tab.

create_new_lifecycle_profile

 

Enter the name and the storage destination type details, and set the Trigger type to either schedule the execution or execute it manually.

In our case the source is the HANA table (SALES_DATA_3) inside the COMERIT_DEMO1 schema.

The nominal key will be used to split the data into two parts: historical and enterprise.

lifecycle_profile_parameters
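If you are unsure where along the nominal key the split should go, a quick look at its range helps. The query below is only an illustrative check and assumes TRANS_ID is the column chosen as the nominal key:

-- Inspect the range of the nominal key before defining the relocation rule
SELECT MIN(TRANS_ID) AS MIN_TRANS_ID,
       MAX(TRANS_ID) AS MAX_TRANS_ID,
       COUNT(*)      AS TOTAL_RECORDS
FROM "COMERIT_DEMO1"."SALES_DATA_3";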

 

Select the "Storage Destination" in the Destination Attributes tab.

select_storage_destionation_type

 

Append data relocation rules

Open the Rule Editor to set rules for data relocation.

Append the rule and click "Validate Syntax"; the editor will show the total and affected records.

Total Records is the number of records in the HANA table, and Affected Records is the number of records that will be relocated.

This rule will relocate the data from the SALES_DATA_3 table where TRANS_ID is less than 8000.

append_data_relocation_rools
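To sanity-check the numbers the Rule Editor reports, you can approximate them with a query like the one below (a sketch that assumes the same TRANS_ID < 8000 predicate as the rule):

-- Total records in the HANA table vs. records the rule would relocate
SELECT COUNT(*) AS TOTAL_RECORDS,
       SUM(CASE WHEN TRANS_ID < 8000 THEN 1 ELSE 0 END) AS AFFECTED_RECORDS
FROM "COMERIT_DEMO1"."SALES_DATA_3";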

 

Click "Save" and then click "Activate".

Start data relocation

Now we will run the data relocation process. You can run it immediately or on a schedule. For now, we will run it immediately, but you can change the "Trigger type" to Scheduled in order to run it periodically.

Now click "Run" > "Trigger Manually".

run_data_relocation

 

After job execution, the Data Distribution graphs will be updated.

The Source graph shows the data in the HANA DB, and the clab_hadoop graph shows the data in the Hadoop cluster.

data_distribution_graph

 

Finally, a union view of the historical and enterprise data is created in HANA, and you can view it from HANA Studio.

union_view
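Conceptually, the generated view is a union of the in-memory HANA table and the virtual table that points to Hadoop. The sketch below only illustrates the idea; the names SALES_DATA_3_EXT and SALES_DATA_3_UNION are placeholders, since DLM generates its own object names, which you can look up under the COMERIT_DEMO1 schema in HANA Studio:

-- Illustrative only: union of enterprise data in HANA and historical data in Hadoop
CREATE VIEW "COMERIT_DEMO1"."SALES_DATA_3_UNION" AS
SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3"       -- enterprise data still in HANA
UNION ALL
SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3_EXT";  -- virtual table for historical data in Hadoop

-- Query the combined data
SELECT COUNT(*) FROM "COMERIT_DEMO1"."SALES_DATA_3_UNION";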
