The Indexima Data Hub

High performance, AI Powered, simple and secure

BECOME A DATA-DRIVEN COMPANY

The challenges of Big Data

Who hasn’t suffered from slow exchanges with IT teams when trying to obtain one or more data extracts to refine an analysis? Who hasn’t struggled to understand why sales of a particular product fell in Germany last month?

To meet the demands of Data Analytics teams and guarantee them acceptable access times, IT teams are forced to provide fragmentary “extracts” of giant databases.
Data Analytics tools cannot access the data quickly and easily enough to exploit large volumes.
Users end up narrowing their field of analysis to a few data dimensions, while billions of records are held in today’s information systems.
In the age of Big Data, being DATA-DRIVEN means being able to put all the available data in the hands of users.

The Indexima Data Hub

The world's best solution for a Data-Driven company

INDEXIMA has developed the DATA HUB to provide IT teams with the missing “layer” in a Data-Driven analytics landscape, alongside Data Analytics solutions (such as Tableau®, Excel®, Talend, Qlik®, Power BI®, DATAIKU® or MicroStrategy®). The DATA HUB makes it possible to query all Big Data directly at the source, on volumes of tens of billions of rows, in just a few milliseconds.

The DATA HUB is based on three main components: HyperIndex, DataSpace and K-Store.

How to speed up performance by a factor of 1,000
HyperIndexes are multidimensional, distributed and persistent “in-memory” indexes. They include pre-aggregations, which removes the need to pre-compile OLAP cubes. HyperIndex can answer queries in milliseconds regardless of the volume of data. Complex queries with “distinct count” or “top N” return instant, exact answers without approximation thanks to a revolutionary pre-aggregation system.
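The idea of answering “distinct count” and “top N” exactly from pre-aggregates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the (country, product) key and the function names are hypothetical, and the sketch says nothing about how HyperIndex is actually implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, product, customer_id, amount).
ROWS = [
    ("DE", "bike", "c1", 100.0),
    ("DE", "bike", "c2", 50.0),
    ("DE", "bike", "c1", 25.0),
    ("FR", "bike", "c3", 80.0),
    ("FR", "helmet", "c3", 20.0),
]

def build_index(rows):
    """Pre-aggregate once: one entry per (country, product) key."""
    index = defaultdict(lambda: {"total": 0.0, "customers": set()})
    for country, product, customer, amount in rows:
        entry = index[(country, product)]
        entry["total"] += amount
        entry["customers"].add(customer)  # exact set, so no approximation
    return index

def total_and_distinct(index, country):
    """Answer SUM and COUNT(DISTINCT customer) for one country from the index only."""
    total, customers = 0.0, set()
    for (c, _product), entry in index.items():
        if c == country:
            total += entry["total"]
            customers |= entry["customers"]
    return total, len(customers)

def top_n_products(index, n):
    """Top N products by total amount, computed from the pre-aggregates."""
    totals = defaultdict(float)
    for (_country, product), entry in index.items():
        totals[product] += entry["total"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

INDEX = build_index(ROWS)
```

Once the index is built, queries touch one entry per key instead of every row, which is why the cost no longer depends on the number of raw rows in this toy model.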

DataSpace is a single, secure access point to all indexed data. Thanks to our AI algorithms, it creates HyperIndex automatically, which avoids building cubes, data extracts or datamarts. User queries are continuously analyzed to automatically propose the HyperIndex best suited to the actual use of the data. Machine Learning and AI algorithms adapt and improve performance over time.
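To make the notion of proposing indexes from observed queries concrete, here is a deliberately simple Python sketch. It replaces the AI and Machine Learning algorithms mentioned above with a plain frequency count, and the query log format (one list of grouped columns per query) is an assumption made for the example.

```python
from collections import Counter

# Hypothetical query log: the columns each user query grouped or filtered on.
QUERY_LOG = [
    ["country", "product"],
    ["product", "country"],
    ["country"],
    ["country", "product"],
    ["month"],
    ["country"],
]

def propose_indexes(query_log, top=2):
    """Return the column combinations that are queried most often."""
    usage = Counter(frozenset(cols) for cols in query_log)
    return [sorted(cols) for cols, _count in usage.most_common(top)]

PROPOSALS = propose_indexes(QUERY_LOG)
```

With this log, the combination of country and product is used three times and is proposed first, followed by country alone.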

The K-Store is a new, fully indexable, column-oriented storage format optimized for S3 or HDFS to access detail data. Indexima can therefore respond efficiently to any type of request, whether a global aggregation over all the data or a specific query on detail data.

Comparison with Spark SQL "in-memory"

Example of the response time of a dashboard in production at Mappy with Tableau Software, where each user’s click generates 8 simultaneous SQL queries.

Response time per simultaneous request

Request      Spark SQL in-memory   Indexima
Request 1    28                    0.9
Request 2    26                    0.9
Request 3    24                    0.7
Request 4    24                    0.8
Request 5    21                    0.1
Request 6    10                    0.1
Request 7    15                    0.1
Request 8    19                    0.1
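The ratio between the two engines can be read off the figures above with a little arithmetic. The short Python sketch below only divides the published numbers; it assumes both series use the same unit, which the source does not state.

```python
# Response times per request, copied from the comparison above.
SPARK = [28, 26, 24, 24, 21, 10, 15, 19]
INDEXIMA = [0.9, 0.9, 0.7, 0.8, 0.1, 0.1, 0.1, 0.1]

def speedups(slow, fast):
    """Per-request ratio of the slow time to the fast time, rounded to 0.1."""
    return [round(s / f, 1) for s, f in zip(slow, fast)]

RATIOS = speedups(SPARK, INDEXIMA)

# Ratio of the summed times across the 8 requests.
OVERALL = round(sum(SPARK) / sum(INDEXIMA), 1)
```

On these figures the per-request ratio ranges from roughly 29 to 210, and the ratio of the summed times is about 45.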

The problem is the OLAP cubes …

Without the INDEXIMA Data Hub

Many Data Analytics solutions require OLAP cubes to be built prior to any analysis. First, this contradicts the agile nature of analyst and BI work, which consists of moving freely through the data in order to understand it. Second, in a Big Data environment, compiling these cubes can take several days, which is a strong constraint for users, who cannot work with all the data when building their analysis tables and reports.

With the INDEXIMA Data Hub

INDEXIMA proposes a new approach to querying Big Data that gets rid of OLAP cubes by combining several techniques in the same tool: multidimensional (multi-column) indexes, in-memory pre-aggregations and a distributed column-oriented engine. This makes it possible to answer the majority of requests in milliseconds, whatever the volume of data. For “count distinct” and “top N”, the indexes can include aggregates that guarantee very fast answers while returning an exact value without approximation. When a query involves unindexed columns, the column-oriented engine accesses the data on disk and uses the indexed columns to load only the useful blocks.
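The last point, loading only the useful blocks of an unindexed column, can be illustrated with a toy column store in Python. Everything here is an assumption made for the example: the block size, the column names and the value-to-block index are hypothetical and do not describe the real storage layout.

```python
BLOCK_SIZE = 2  # hypothetical and tiny, for illustration only

COUNTRY = ["DE", "DE", "FR", "FR", "DE", "IT"]  # indexed column
COMMENT = ["a", "b", "c", "d", "e", "f"]        # unindexed column

def split_blocks(values, size):
    """Cut a column into fixed-size blocks, standing in for blocks on disk."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def build_block_index(values, size):
    """Map each value of the indexed column to the block numbers holding it."""
    index = {}
    for row, value in enumerate(values):
        index.setdefault(value, set()).add(row // size)
    return index

COMMENT_BLOCKS = split_blocks(COMMENT, BLOCK_SIZE)
COUNTRY_INDEX = build_block_index(COUNTRY, BLOCK_SIZE)

def comments_for(country):
    """Read only the blocks of the unindexed column that can contain matches."""
    wanted = sorted(COUNTRY_INDEX.get(country, ()))
    result = []
    for block_no in wanted:
        block = COMMENT_BLOCKS[block_no]  # the only simulated disk read
        start = block_no * BLOCK_SIZE
        for offset, value in enumerate(block):
            if COUNTRY[start + offset] == country:
                result.append(value)
    return wanted, result
```

For example, a query on "FR" touches a single block out of three, while a query on "DE" touches two and skips the block that holds only French rows.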

With INDEXIMA, all the power of dataviz tools can be exploited to analyze big data.

Towards a DATA-DRIVEN company

The benefits for the IT department: performance, flexible data access and budget savings

Scalability and performance from 1 gigabyte to several petabytes of data

With the Data Hub, the gain in analytics performance is visible from a few million rows of data. The scalability of HyperIndex makes it possible to answer queries from analytical solutions in milliseconds, on volumes of up to several petabytes.

Processing streaming data

The Data Hub can index data arriving as a continuous flow (streaming), so that results are refreshed in real time.
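As a rough picture of what indexing a continuous flow means, the Python sketch below updates a small aggregate as each event arrives instead of rebuilding it. The event format (key, amount) and the function name are hypothetical, and this is not a description of the Data Hub's streaming engine.

```python
from collections import defaultdict

def stream_totals(events):
    """Yield the refreshed per-key totals after each incoming event."""
    totals = defaultdict(float)
    for key, amount in events:
        totals[key] += amount  # update only the affected entry
        yield dict(totals)     # snapshot that a query would see at this point

# Hypothetical stream of (country, amount) events.
EVENTS = [("DE", 10.0), ("FR", 5.0), ("DE", 2.5)]
SNAPSHOTS = list(stream_totals(EVENTS))
```

Each snapshot reflects every event received so far, so a dashboard reading the latest one always sees up-to-date totals in this simplified model.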

Savings on data servers

HyperIndex technology scales on Big Data and absorbs more than 3,000 queries per second for our customers. This results in savings of up to 90% on analytic execution servers. INDEXIMA supports the SQL queries generated by the main dataviz tools on the market, including grouping, joins and subqueries.

Flexibility for BI teams

Artificial Intelligence automates the creation of HyperIndex, so it is not necessary to create data extracts or to compile cubes beforehand. It is also possible to add indexes on demand, according to usage. Data can be deleted and updated transactionally without a full reindexing.

Centralized security access

The Data Hub includes fine-grained management of user roles and the definition of access rights at the table, column and row level. From the workstation, automatic filtering of data per user guarantees that each user accesses only the data the company authorizes them to see.
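Column-level and row-level rights can be pictured with the following minimal Python sketch. The role names, the policy structure and the sample table are all hypothetical; the actual way roles and rights are declared in the Data Hub is not shown in this document.

```python
# Hypothetical policies: per role, the visible columns and a row predicate.
POLICIES = {
    "sales_de": {
        "columns": {"country", "product", "amount"},
        "row_filter": lambda row: row["country"] == "DE",
    },
    "auditor": {
        "columns": {"country", "amount"},
        "row_filter": lambda row: True,
    },
}

TABLE = [
    {"country": "DE", "product": "bike", "amount": 100, "margin": 12},
    {"country": "FR", "product": "bike", "amount": 80, "margin": 9},
]

def read_table(role, table):
    """Return only the rows and columns the given role is allowed to see."""
    policy = POLICIES.get(role)
    if policy is None:
        # Table-level right: an unknown role gets no access at all.
        raise PermissionError(f"role {role!r} has no access to this table")
    return [
        {col: val for col, val in row.items() if col in policy["columns"]}
        for row in table
        if policy["row_filter"](row)
    ]
```

Because the filter is applied on the server side of this toy model, the same query returns different results for different roles without any change on the user's side.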

The benefits of INDEXIMA for the operational and business teams: Data Centric

You work directly on all of your data sources. This is the end of data extracts.

Analysis tools connected directly to Big Data are usually too slow to make the data usable. IT teams need to prepare data “extracts” to reach acceptable response times. The analyses then become limited to crossing only a few axes, and the marketing and business teams become dependent on IT for each analysis.

The Data Hub works with all the analytical solutions on the market

The Data Hub connects instantly to all visualization and analysis products such as Tableau®, Excel®, QlikView®, Talend®, Dataiku® or MicroStrategy®. Your existing reports and graphs do not require any modification, and you do not need to change your habits. You simply gain the ability to analyze a lot more data independently.

How to use INDEXIMA: on-premises or in the cloud

1. On a Hadoop cluster

As a YARN application, INDEXIMA is automatically deployed by Hadoop on multiple nodes in the cluster.
Support for Kerberos-secured clusters and for Cloudera, Hortonworks and MapR distributions.
Reads Hive / Impala tables in CSV, ORC, Parquet, JSON and flat file formats.
The INDEXIMA ODBC and JDBC drivers are the same as for Hive (HiveServer2), so they support LDAP, Active Directory and Kerberos authentication.
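Since the drivers are the same as for Hive, a BI tool would presumably be pointed at INDEXIMA with a HiveServer2-style JDBC URL. The Python helper below just assembles such a URL; the host name is a placeholder, port 10000 is HiveServer2's usual default rather than a documented INDEXIMA setting, and the Kerberos principal is an example value.

```python
def hive2_jdbc_url(host, port=10000, database="default", principal=None):
    """Build a HiveServer2-style JDBC URL (assumed to apply to INDEXIMA too)."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # On Kerberos-secured clusters the server principal is appended.
        url += f";principal={principal}"
    return url

# Hypothetical host and principal, for illustration only.
PLAIN_URL = hive2_jdbc_url("indexima.example.com")
KERBEROS_URL = hive2_jdbc_url(
    "indexima.example.com", principal="hive/_HOST@EXAMPLE.COM"
)
```

The resulting string is what would be pasted into the connection settings of a JDBC-capable tool; the actual host, port and authentication options depend on the deployment.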

2. Standalone

INDEXIMA can be deployed in "standalone" mode without installing any other components.
Linux and Windows support.

3. In the Cloud

INDEXIMA is easily deployed in public clouds. It is natively designed to exploit the elasticity of cloud architectures and is especially optimized for direct storage on S3.
INDEXIMA is available on the AWS Marketplace.
For Azure and Google Cloud, please contact us.