OlapDB

High performance Online Analytical Processing Data Warehouse

OlapDB is a high-performance OLAP (Online Analytical Processing) data warehouse built on Cube pre-computation and HBase. It provides efficient real-time analysis and fast queries over very large, fast-growing data sets.

Welcome to OlapDB, the fastest and most efficient solution for multidimensional data analysis and business intelligence.

OlapDB is a high-performance analytical data warehouse built on Hadoop and HBase that helps you store, analyze and share large-scale data quickly, flexibly and securely. It uses a multidimensional data model (the data cube) and a pre-computation mechanism to manage and query large data sets, and supports rich aggregation operations such as summarization, filtering, sorting, grouping and calculation to meet complex analysis needs.

OlapDB integrates easily into an existing data architecture and provides management and monitoring functions such as data warehouse management, user management, model management and data expiration management. A dedicated HBase coprocessor optimizes reading, writing and deleting aggregated data, which reduces direct HBase reads and writes and greatly improves the efficiency of data storage and processing.

OlapDB suits a range of big data scenarios, such as business intelligence, data mining, log analysis and online advertising. It also provides accurate SQL queries based on Apache Calcite and extends them with fact data updates, fact data deletion, queries on index dimensions and time series analysis, which makes it suitable for real-time analysis of dynamic data, time series data, historical data and trajectory data.

Why choose OlapDB

OlapDB has three core capabilities that make your data analysis simpler, faster and less costly.

Best Productivity

Real-time Processing of Large-Scale Data

OlapDB supports ultra-large-scale data processing: it can store PB-level data and scales its processing capacity horizontally. It uses the distributed processing capabilities of the HBase cluster to perform real-time analysis and Cube construction on hundreds of millions of records per second. Data offsetting provides data deletion and update. To reduce write pressure, OlapDB writes StoreFiles directly and uses bulk load for block merging. Expired segment data is cleared directly through minor compaction, and partitions are never merged or split, which gets the best performance out of HBase.

Ultra Fast Query Speed

Sub-Second Query of Trillions of Data

OlapDB uses Cube pre-computation, so query results come back in sub-second time even on data sets with trillions of records. It provides standard SQL query capabilities through JDBC or a RESTful API, and integrates seamlessly with front-end BI tools such as Tableau and Superset. OlapDB builds only the Cuboids that are actually needed, which avoids a combinatorial explosion of dimension combinations and allows Cube models with more dimensions. An aggregation threshold balances block size against the number of blocks, so a single model can handle trillions of aggregated records.

Easy To Use

Flexible and Powerful Data Management Capabilities

OlapDB provides a web management console for viewing and managing data warehouses, projects and models in real time. Native data statistics and verification mechanisms ensure that data is processed correctly. Expired data can be deleted based on the time dimension, which reduces invalid data. A Cuboid approval function prevents inefficient Cuboids from being generated and avoids excessive data growth. An aggregation threshold balances block size against the number of blocks, so that data processing and storage run efficiently and concurrently on the cluster.


Powerful features

OlapDB is an efficient, concise, flexible and easy-to-use analytical data warehouse that meets the large-scale data analysis needs of different industries and scenarios. Compared with other data warehouses, OlapDB has the following advantages and features.

Ultra-large-scale data

OlapDB uses HBase as its data storage layer, which can store PB-level data and supports horizontal expansion of processing capacity. HBase is a distributed, scalable, highly available and high-performance NoSQL database, suitable for storing massive amounts of structured or semi-structured data. Built on HBase, OlapDB can analyze and process trillions of records.

Extremely simple data format

OlapDB accepts fact data as JSON text and stores the dimension column combination of aggregated data as a JSON array in text form, which keeps the data format concise. JSON is a lightweight data exchange format that is easy to read and write. Combined with the ZSTD compression algorithm, the JSON text format achieves a high compression ratio and high performance.

No dimension table design

OlapDB needs no dimension tables. It implements the dimension index and the time index through a text-format RowKey. The RowKey is the unique identifier of each row in HBase and can be used for range scans and filters. By making dimension values and time values part of the RowKey, OlapDB filters queries quickly.
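To illustrate why placing dimension and time values in a text RowKey makes range scans cheap, here is a minimal sketch. The key layout `<indexValue>|<yyyyMMdd>|<otherDimensions>` and the class `RowKeySketch` are assumptions made for illustration only, not OlapDB's actual format, and a `TreeMap` stands in for HBase's lexicographically ordered rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical layout: <indexValue>|<yyyyMMdd>|<otherDimensions>.
// This is NOT OlapDB's documented RowKey format, only an illustration.
final class RowKeySketch {
    static final char SEP = '|';

    // Builds a text row key from an index dimension value, a day and the remaining dimensions.
    static String rowKey(String indexValue, String day, String otherDims) {
        return indexValue + SEP + day + SEP + otherDims;
    }

    // Returns all keys for one index value whose day lies in [fromDay, toDay].
    // The sorted map plays the role of HBase's ordered key space.
    static List<String> scan(TreeMap<String, Long> table, String indexValue,
                             String fromDay, String toDay) {
        // Start key (inclusive): first possible key of fromDay.
        String start = indexValue + SEP + fromDay + SEP;
        // Stop key (exclusive): the character right after SEP sorts above every key of toDay.
        String stop = indexValue + SEP + toDay + (char) (SEP + 1);
        return new ArrayList<>(table.subMap(start, true, stop, false).keySet());
    }
}
```

Because the index value and the day form the key prefix, a query for one index value and a time range touches only one contiguous slice of the sorted keys rather than the whole table.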

Optimize write efficiency

OlapDB reduces write pressure by writing StoreFiles directly and using bulk load for block merging. A StoreFile is the file format HBase uses to store data, and bulk load is HBase's method for importing data in batches. This bypasses several steps of the normal HBase write path and improves write efficiency.

Support high-dimensional model

OlapDB builds only the Cuboids that are actually needed, which avoids a combinatorial explosion of dimension combinations and allows Cube models with more dimensions. A Cuboid is the unit in which OlapDB stores pre-calculated aggregated data, similar to a star model or snowflake model in a data warehouse. OlapDB generates a basic Cuboid list from dimension groupings and supports joint, hierarchical and forced dimensions, which improves Cuboid reuse and reduces the number of Cuboids.
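The scale of the dimension combination explosion can be shown with a short sketch. It assumes, for illustration only, that a Cuboid is identified by the set of dimensions it groups by and encodes that set as a bitmask; the class `CuboidCountSketch` and this encoding are hypothetical and say nothing about OlapDB's internal representation.

```java
import java.util.List;

// Illustrative only: a cuboid is modeled as a subset of the model's dimensions,
// encoded as a bitmask. OlapDB's real encoding is not documented here.
final class CuboidCountSketch {
    // A full cube materializes every subset of the dimensions: 2^n cuboids (n < 63 here).
    static long fullCubeSize(int dimensions) {
        return 1L << dimensions;
    }

    // Encodes one cuboid as a bitmask over the model's ordered dimension list.
    static long mask(List<String> modelDims, String... cuboidDims) {
        long m = 0L;
        for (String d : cuboidDims) {
            int i = modelDims.indexOf(d);
            if (i < 0) {
                throw new IllegalArgumentException("unknown dimension: " + d);
            }
            m |= 1L << i;
        }
        return m;
    }
}
```

With 20 dimensions, `fullCubeSize(20)` is 1,048,576 Cuboids, while a workload that only ever queries a few dozen dimension combinations needs only those few dozen masks to be built.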

Support time dimension and index dimension

An OlapDB Cube model can specify a time dimension and an index dimension. When the model is built, the storage of aggregated data is optimized for these two dimensions. Queries that filter on the time dimension and the index dimension scan far less data and return results quickly. OlapDB can also set a data expiration time based on the time dimension and clean up expired data automatically.

Support ad hoc query

When an ad hoc query needs a Cuboid that is not in the pre-built list, OlapDB answers it from the closest existing Cuboid and adds the new Cuboid to the pre-built list. A Cuboid approval function prevents inefficient Cuboids from being generated and avoids excessive data growth.
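One plausible reading of "closest existing Cuboid" is the smallest pre-built Cuboid that still contains every dimension the query needs. The sketch below implements that reading under stated assumptions: Cuboids are bitmasks over the model's dimensions, and the class `ClosestCuboidSketch` and its selection rule are illustrative, not OlapDB's documented algorithm.

```java
import java.util.Collection;
import java.util.OptionalLong;

// Illustrative only: picks the pre-built cuboid with the fewest dimensions
// that still covers all dimensions required by the query.
final class ClosestCuboidSketch {
    static OptionalLong closest(Collection<Long> built, long query) {
        OptionalLong best = OptionalLong.empty();
        for (long candidate : built) {
            // A candidate can answer the query only if it contains every queried dimension.
            boolean covers = (candidate & query) == query;
            if (covers && (!best.isPresent()
                    || Long.bitCount(candidate) < Long.bitCount(best.getAsLong()))) {
                best = OptionalLong.of(candidate);
            }
        }
        return best; // empty when no built cuboid covers the query
    }
}
```

Fewer extra dimensions in the chosen Cuboid means fewer rows to re-aggregate at query time, which is why the smallest covering Cuboid is a natural choice.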

Support data update and delete

OlapDB supports data deletion and update through data offsetting. Data offsetting uses the OLAP_OPERATION field in fact data to mark a record as a delete or update operation. This makes analysis and statistics over changing data possible.
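The idea behind offsetting can be sketched for an additive measure such as a sum: a deletion is submitted as a fact that cancels the original contribution, and an update is a deletion followed by a new insertion. The source names only the OLAP_OPERATION field, so the operation values `INSERT` and `DELETE`, the class `OffsetSketch` and the aggregation logic below are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows how an offsetting fact cancels an earlier one in a SUM aggregate.
final class OffsetSketch {
    // Hypothetical values for the OLAP_OPERATION field; the real values are not documented here.
    enum Operation { INSERT, DELETE }

    static final class Fact {
        final String dimension;
        final long amount;
        final Operation operation;

        Fact(String dimension, long amount, Operation operation) {
            this.dimension = dimension;
            this.amount = amount;
            this.operation = operation;
        }
    }

    // Aggregates SUM(amount) per dimension; DELETE facts contribute with a negative sign.
    static Map<String, Long> sumByDimension(List<Fact> facts) {
        Map<String, Long> sums = new HashMap<>();
        for (Fact f : facts) {
            long signed = f.operation == Operation.DELETE ? -f.amount : f.amount;
            sums.merge(f.dimension, signed, Long::sum);
        }
        return sums;
    }
}
```

Updating an amount from 40 to 55 is then expressed as a `DELETE` fact for 40 followed by an `INSERT` fact for 55. This simple sign flip only works for additive measures such as sums and counts.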

Clean up invalid data

OlapDB uses minor compaction to delete expired aggregated segment data directly, which avoids partition merges and splits and improves HBase storage efficiency. Minor compaction is an HBase operation that merges or cleans up small files. This saves the space and processing time that invalid data would otherwise consume.

Efficient real-time processing

OlapDB uses multiple workspaces to process submitted data in parallel, handling billions of records per second in real time. A workspace isolates segments so that their data cannot pollute one another, and corresponds to regions in the aggregated data table. This keeps data storage clean and ordered and achieves high concurrency and high throughput.

Data integrity check

OlapDB provides native data statistics and verification mechanisms to ensure data processing integrity. It records the number of source aggregated records and generated aggregated records in each segment and verifies them during processing, which guards against write failures and read failures.

Cluster load balancing

OlapDB provides an aggregation threshold, a parameter that controls the block size and the number of blocks of each model. With it, a single model can balance trillions of aggregated records. The storage and computation of aggregated data are distributed evenly across the HBase cluster, which improves concurrency.

Multiple integration methods

OlapDB provides standard SQL query capabilities through JDBC or a RESTful API, and integrates easily with front-end BI tools such as Tableau and Superset. SQL is a common database query language for operations such as adding, deleting, modifying, querying, aggregating, sorting and grouping data, so users can analyze and visualize data with the languages and tools they already know.

Data management platform

OlapDB provides a web management console for viewing and managing data warehouses, projects and models in real time. Users can create, delete and edit projects and models; view statistics for a data warehouse, project or model, such as data volume, number of blocks and query count; and follow data processing and task load in real time, including submission time, processing time and processing status.

OlapDB system diagram

Service & Components

An OlapDB data warehouse deployment consists of the following components: a management service, a build service, one or more query services, a web management console, and a dedicated HBase coprocessor.

Management Service

Monitors data warehouse storage distribution and cluster load in real time, and creates and schedules tasks that run in a distributed manner on the HBase cluster.

Build Service

Periodically scans the Fact table, discovers a model's incremental data and performs an incremental build on the model. It also provides a REST interface through which users can submit incremental data directly.

Query Service

Queries aggregated data and provides standard SQL query capabilities through JDBC or a RESTful API.

HBase Coprocessor

OlapDB provides dedicated coprocessors that are deployed on the HBase cluster, so that data processing tasks run in a distributed manner on the cluster.

Web Management Console

OlapDB provides a web management console that can view and manage information such as data warehouse, project, model, etc. in real time.

Download and Run

Five steps to complete OlapDB deployment and start data analysis

# Download the OlapDB HBase coprocessor
wget https://www.olapdb.com/downloads/olap-observer.jar

# Download the OlapDB services
wget https://www.olapdb.com/downloads/olap-master.jar
wget https://www.olapdb.com/downloads/olap-builder.jar
wget https://www.olapdb.com/downloads/olap-query.jar

# Start the OlapDB services
nohup java -jar olap-master.jar --hbase.zkQuorum=quorumServer &
nohup java -jar olap-builder.jar --hbase.zkQuorum=quorumServer &
nohup java -jar olap-query.jar --hbase.zkQuorum=quorumServer &

# Download the OlapDB Web Management Console and start nginx
yum install nginx
wget https://www.olapdb.com/downloads/olap-manager-web.html.tar.gz
tar -zxvf olap-manager-web.html.tar.gz
cp -r html /usr/share/nginx/
systemctl restart nginx

Step 1

Download and start OlapDB services

OlapDB is based on HBase 2.x, so install HBase first if you do not already have it.

  • Download the OlapDB HBase coprocessor and move it to a directory that HBase can access, such as an HDFS directory. You will need to specify the path of this jar file when you initialize the OlapDB data warehouse.
  • Download and start the OlapDB services: the master service, the builder service and the query service.
  • Download and start the OlapDB Web Management Console. (The web management console must be deployed on the same server as the master service.)

Note: quorumServer is the HBase connection address.

Open a browser and visit http://nginxServer:80 to reach the web management console.

Step 2

Initialize OlapDB Data Warehouse

The first time you open the web management console, OlapDB prompts you to initialize the data warehouse. This includes setting the cluster owner, unique identification code, compression algorithm, number of data partitions and HBase coprocessor.

Step 3

Create Project and Data Analysis Cube Model

OlapDB can manage multiple projects, and each project can contain multiple cube data models. First add a project, then create a cube model and set its detailed parameters, including dimensions, metrics, fact name, index dimension and data expiration time.

Step 4

Push Data To OlapDB

With the Java SDK provided by OlapDB, you can push data to the data warehouse in several ways: large batch submission, small batch submission and submission through the REST interface. Large batch submission is the most efficient. With multiple servers submitting in parallel, OlapDB can process tens of millions of records per second in real time.

String cubeName = ... ;
List<JSONObject> records = .....;

// Serialize each fact record to its JSON text form
List<String> facts = new ArrayList<>();
for (JSONObject record : records) {
    facts.add(record.toString());
}

// Submit the fact data to OlapDB
Olap.submitSegment(cubeName, facts);

Step 5

Integrate BI Tools to show Analysis Result

You can query and analyze OlapDB data in several ways, such as querying directly through JDBC or connecting a front-end BI tool such as Superset or Tableau to display and analyze the results.
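A direct JDBC query could look like the following minimal sketch. It uses only the standard `java.sql` API. The class `JdbcQuerySketch` is hypothetical, and the JDBC URL, driver, table and column names are all left to the caller because the source does not specify them; the OlapDB JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain JDBC aggregation query. Nothing here is OlapDB-specific.
final class JdbcQuerySketch {
    // Builds a simple GROUP BY query. Table and column names are supplied by the caller
    // and must come from trusted configuration, since they are concatenated into the SQL.
    static String totalsQuery(String table, String dimension, String measure) {
        return "SELECT " + dimension + ", SUM(" + measure + ") AS total FROM " + table
                + " GROUP BY " + dimension;
    }

    // Runs the query and collects (dimension value -> total) pairs.
    // jdbcUrl is whatever URL the OlapDB JDBC driver expects; it is not documented here.
    static Map<String, Long> totals(String jdbcUrl, String sql) throws SQLException {
        Map<String, Long> result = new LinkedHashMap<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                result.put(rs.getString(1), rs.getLong(2));
            }
        }
        return result;
    }
}
```

A BI tool such as Superset or Tableau issues queries of the same shape through its own JDBC or REST connector, so a query that works here is a reasonable first check before wiring up dashboards.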


Free Trial License

KEEP IN TOUCH WITH US

If you have any questions, suggestions, or needs regarding our software or services, please feel free to contact us at any time. We attach great importance to your feedback and will provide you with a professional and satisfactory response as soon as possible. You can contact us through the following methods:

Room 4-804, Lane 626, Chifeng Road
Hongkou District, Shanghai, China

Contact Phone:
(+86)18918221910

info@olapdb.com
michaltina@hotmail.com

WeChat: 18918221910
QQ: 1170743
