Friday, August 10, 2012
SAP HANA
Storage and Compression Techniques
SAP's HANA is a technology that processes massive amounts of real-time data in main memory to provide results from analyses and transactions.
It is an IMDB (In-Memory Database), a next-generation database technology.
"High Performance Analytic Appliance"
HANA defines its advantage through the following properties:
- BIG DATA (Ever growing DATA)
- Mobile (Real time)
- Cloud (On Demand)
- Groundbreaking yet not disruptive to existing landscapes.
To process massive amounts of data, the HANA DB has to be able to hold that much data without compromising on performance. To meet this requirement, very efficient compression techniques have been put in place. Note that HANA DB applies several of these techniques in succession, re-compressing already compressed data to keep the compression efficiency high.
Another point to note is that these compression techniques were chosen so that they work on both column-based and row-based stored data.
Data compression has a wide variety of uses in IMDB (In-Memory Database) technology and keeps the IMCE (In-Memory Computing Engine) supplied with data to avoid processor idle time.
With the help of compression, the whole database is kept in main memory (RAM), which is the basis of HANA. Compression techniques can provide a compression ratio of up to 20:1, i.e. 20 GB of data can fit into 1 GB of main memory.
Compressed data can be loaded into the CPU cache faster. This is because the limiting factor is the data transport between memory and CPU cache, and so the performance gain will exceed the additional computing time needed for decompression. Compression can speed up operations such as scans and aggregations if the operator is aware of the compression.
The compression techniques used are:
- Run Length Encoding
- Cluster Encoding
- Dictionary Encoding
They are explained below:
- Run Length Encoding: This algorithm consists of replacing large sequences of repeating data with only one item of this data followed by a counter showing how many times this item is repeated. The original column is replaced with a two-column list. The first column contains the values and the second column contains the counts of consecutive occurrences.
Note that the original column can be reconstructed from this two-column list. This algorithm is particularly useful for data in which the same value repeats over many consecutive rows.
- Cluster Encoding: Cluster Encoding works by searching for multiple occurrences of the same sequence of values within the original column. The compressed column consists of a two-column list, with the first column containing the elements of a particular sequence and the second column containing the row numbers where the sequence starts in the original column. Note that this algorithm replaces strings of characters in memory.
- Dictionary Encoding: With dictionary encoding, the columns are stored as sequences of bit-coded integers. That means that a check for equality can be executed on the integers (for example during scans or join operations). This is much faster than comparing, for example, string values. This technique leads to high compression rates, e.g. for country codes or customer numbers.
Table columns that contain only a comparatively small number of distinct values can be compressed effectively by enumerating the distinct values and storing only their numbers.
This technique requires an additional table, the dictionary, to be maintained. Its first column contains the original values and its second column contains the numbers representing those values.
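To make two of the techniques above more concrete, here is a minimal Python sketch of dictionary encoding followed by run length encoding of the resulting integer codes. It only illustrates the ideas described in this post and is not HANA's actual implementation; the function names (`rle_encode`, `rle_decode`, `dict_encode`, `count_equal`) and the sample `countries` column are made up for this example, and assigning codes in sorted order is an assumption of the sketch.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Rebuild the original column from (value, count) pairs."""
    return [value for value, count in pairs for _ in range(count)]


def dict_encode(column):
    """Return (dictionary, codes): distinct values and one integer per row.

    Codes are assigned in sorted order of the distinct values
    (an assumption of this sketch).
    """
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def count_equal(dictionary, codes, wanted):
    """Equality scan on the encoded column: compares integers, not strings."""
    if wanted not in dictionary:
        return 0
    target = dictionary.index(wanted)
    return sum(1 for code in codes if code == target)


# Hypothetical sample column of country codes.
countries = ["DE", "DE", "DE", "US", "US", "IN", "IN", "IN", "IN", "DE"]

dictionary, codes = dict_encode(countries)
runs = rle_encode(codes)  # second pass: run length encode the integer codes

print(dictionary)  # ['DE', 'IN', 'US']
print(codes)       # [0, 0, 0, 2, 2, 1, 1, 1, 1, 0]
print(runs)        # [(0, 3), (2, 2), (1, 4), (0, 1)]
print(count_equal(dictionary, codes, "IN"))  # 4
```

Decoding works in the reverse order: `rle_decode(runs)` gives back the integer codes, and looking each code up in `dictionary` gives back the original strings. Cluster encoding is left out of the sketch.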
Labels:
SAP HANA Compression
Location:
Manhattan, NY 10007, USA
Friday, August 3, 2012
SAP HANA Demystified: SAP HANA - BIG DATA
SAP HANA Demystified: SAP HANA - BIG DATA: SAP HANA - DATABASE Future Redefined SAP HANA - It is the next generation SAP's IMDB (In Memory Database) Technology. An In- Memory App...
Thursday, August 2, 2012
SAP HANA - BIG DATA
SAP HANA - DATABASE Future Redefined
SAP HANA is SAP's next-generation IMDB (In-Memory Database) technology: an in-memory appliance that combines software and hardware components to provide the power to crunch Big Data challenges.
SAP has partnered with major hardware vendors across the globe to come up with this groundbreaking yet non-disruptive technology. These include IBM, HP, Dell, Fujitsu, NEC, Cisco, Hitachi, etc. These certified hardware partners ensure that the technology is harnessed to its maximum potential with their high-quality hardware. To leverage the full power of this technology, one needs to get in touch with these vendors to get hold of an SAP-certified HANA Box.
The HANA Box has the SAP HANA Database fully deployed on optimized hardware, so that you can get started as soon as you receive it. The HANA Box runs on the SUSE Linux Enterprise Server (SLES) 11 SP1 operating system, which comes with the box, fully deployed and ready to use. The HANA Box is well equipped to process massive volumes of transactional data in real time and so provide clear insight into the performance of various business initiatives. The HANA Box also includes HANA Studio, which gives the user capabilities for data modelling, deploying security policies, and data and life cycle management.
Real-time business analytics is a by-product of this technology. Business owners across the globe can expect real-time analytics and speedy growth to help them achieve their breakthrough targets.
The HANA Database is a hybrid in-memory database that draws its power from combining row-based, column-based and object-based technology to provide an optimal performance environment. Queries run much faster in this database than in traditional databases. Big Data is the challenge the world faces today, with data growing at enormous speed. SAP HANA leverages current hardware to solve this problem.
SAP HANA has a number of reasons behind its success:
1. Falling cost of main memory (RAM) chips. Main memory is not as expensive as it used to be. Currently HANA is equipped with 8 TB RAM chips (supporting 40 TB of data).
2. With the advent of multi-core processors, processing power has grown manyfold. Hard drive technology is not well equipped to keep up with this. We currently have 80-core processor chips available. 128 cores on a single chip are in the pipeline right now; Intel has plans to come out with them by next year.
3. Multi-engine query processing environments.
4. HANA has the ability to combine row-based and column-based storage of data to optimize performance.
5. HANA DB is equipped with powerful compression techniques. It uses a combination of them to achieve compression on the order of 20:1 (check my next post).
6. HANA DB is 100% ACID Compliant.
(A - Atomicity, C- Consistency, I- Isolation, D- Durability)
Check this Link for more Info: http://www.sap.com/uk/hana/index.epx