Combining HANA and Hadoop data with SAP Data Lifecycle Management Tool

Arman Avetisyan
Sr. SAP Basis/BI & Software Developer

To get meaningful results from analytics today, you should analyze both your enterprise data and your historical data. SAP HANA is a great database for your enterprise system, while for historical data Hadoop is the most popular option for handling Big Data. Hadoop is also very cost-efficient because you can run it on commodity hardware.

In this tutorial we will explore:

  • How to relocate our historical data from enterprise HANA to Hadoop
  • How to combine the two, and
  • How to view the full data from HANA studio

In order to relocate the data from HANA to Hadoop, we are going to use the SAP HANA Data Lifecycle Management tool, which is part of the SAP HANA Data Warehousing Foundation (DWF).

The Data Lifecycle Manager (DLM) is an SAP HANA XS-based tool that helps relocate less frequently used SAP HANA data to storage destinations such as the SAP HANA Dynamic Tiering option, Hadoop, or SAP IQ.

First, the DLM tool relocates the data and creates a virtual table in HANA for the historical data now located in Hadoop. It then combines that virtual table with the enterprise data in the HANA table and, finally, creates a view over the combined data.
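Under the hood, the result is similar to the following hand-written sketch. This is illustrative only: DLM generates these objects automatically, and the object names (SALES_DATA_3_EXT, SALES_DATA_3_ALL) and the `<database>`/`<schema>` placeholders are assumptions for the sake of the example.

```sql
-- Illustrative only: DLM generates equivalent objects automatically.
-- A virtual table pointing at the relocated data in Hadoop:
CREATE VIRTUAL TABLE "COMERIT_DEMO1"."SALES_DATA_3_EXT"
  AT "CLAB_HADOOP"."<database>"."<schema>"."SALES_DATA_3";

-- A union view combining enterprise (HANA) and historical (Hadoop) data:
CREATE VIEW "COMERIT_DEMO1"."SALES_DATA_3_ALL" AS
  SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3"
  UNION ALL
  SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3_EXT";
```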




The process consists of the following steps:

  • Create a remote source from HANA Studio for Hadoop
  • Build Storage destination in DLM
  • Build Lifecycle Profile in DLM
  • Append data relocation rules
  • Start data relocation
  • View combined data in HANA Studio

Create a remote source from HANA Studio for Hadoop

We will use SAP Smart Data Access to create a remote source for Hadoop. Run the code below in your HANA Studio SQL editor, filling in the parameters based on your Hadoop cluster.

CREATE REMOTE SOURCE "<Remote Source Name>"
  ADAPTER "sparksql"
  CONFIGURATION 'port=7860;ssl_mode=disabled;server=<Hadoop ip>'
  WITH CREDENTIAL TYPE 'PASSWORD'
  USING 'user=<username>;password=<password>';

(for example, <Remote Source Name> = CLAB_HADOOP)

After executing the script you will see the remote source for your Hadoop system under the “Provisioning” – “Remote Sources” folder.
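To double-check from SQL, you can query the SYS.REMOTE_SOURCES system view (the name CLAB_HADOOP below is the example remote source name used in this tutorial):

```sql
-- Optional check: confirm the remote source is registered
SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME
  FROM SYS.REMOTE_SOURCES
 WHERE REMOTE_SOURCE_NAME = 'CLAB_HADOOP';
```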




Execute the commands below in HANA Studio’s SQL editor to tell your index server and xsengine that this remote source will be used for data aging.


ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'system')
  SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('xsengine.ini', 'system')
  SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;
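You can verify the parameter took effect by querying the M_INIFILE_CONTENTS monitoring view, which exposes the current ini-file configuration:

```sql
-- Optional check: the parameter should appear for both ini files
SELECT FILE_NAME, SECTION, KEY, VALUE
  FROM M_INIFILE_CONTENTS
 WHERE SECTION = 'data_aging'
   AND KEY = 'spark_remote_source';
```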

Build a Storage Destination

To create a new storage destination, click the “+” button in the “MANAGE STORAGE DESTINATIONS” tab.



Enter the destination name, select SPARK SQL as the storage destination type, enter the schema name, and click “Save”.



After saving the configuration, click the “Test Connection” button to test the connection.



After testing, activate the connection by clicking the “Activate” button




Build a Lifecycle Profile

To create a new lifecycle profile, click the “+” button in the “MANAGE LIFECYCLE PROFILES” tab.



Enter the name and storage destination type details, and set the Trigger type to either schedule the execution or run it manually.

In our case, the source is the HANA table SALES_DATA_3 in the COMERIT_DEMO1 schema.

The nominal key will be used to split the data into two parts: historical and enterprise.



Select “Storage Destination” in Destination Attributes tab



Append data relocation rules

Open Rule Editor to set rules for data relocation.

Append the rule and click “Validate Syntax”; the tool will show the total and affected row counts.

Total Records is the number of records in the HANA table, and Affected Records is the number of records that will be relocated.

This rule will relocate the data from the SALES_DATA_3 table where TRANS_ID is less than 8000.
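Conceptually, the two counters correspond to simple counts on the source table. The sketch below uses the table and rule from this example:

```sql
-- Total Records: all rows in the source table
SELECT COUNT(*) AS TOTAL_RECORDS
  FROM "COMERIT_DEMO1"."SALES_DATA_3";

-- Affected Records: rows matching the relocation rule
SELECT COUNT(*) AS AFFECTED_RECORDS
  FROM "COMERIT_DEMO1"."SALES_DATA_3"
 WHERE "TRANS_ID" < 8000;
```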



Click “Save” and then click “Activate”

Start data relocation

Now we will run the data relocation process. It can run immediately or on a schedule; for now we will run it immediately, but you can change the “Trigger type” to scheduled in order to run it periodically.

Now click “Run” – “Trigger Manually”.



After job execution, the Data Distribution graphs will be updated.

The Source graph shows the data remaining in the HANA database.

The clab_hadoop graph shows the data relocated to the Hadoop cluster.



Finally, a union view of the historical and enterprise data is created in HANA, and you can query it from HANA Studio.
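Querying the generated view looks like any other SELECT. Note that the view name below is hypothetical; DLM derives the actual name from the lifecycle profile, so check the generated objects in your schema.

```sql
-- Hypothetical view name: DLM generates the actual one.
-- Rows with TRANS_ID < 8000 are served from Hadoop via the virtual table;
-- the remaining rows come from the in-memory HANA table.
SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3_ALL"
 ORDER BY "TRANS_ID";
```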