FAQ

Have a question about the data sector, or about Indexima's solution?
The FAQ definitely holds the answer for you.

Data & Analytics

What is the difference between a data lake and a data warehouse?

The data lake and the data warehouse are two types of data storage.

Their main distinction lies in the structure of the data they contain. A data lake generally stores raw, untransformed data. A data warehouse, on the other hand, stores transformed and cleaned data.

A data lake and a data warehouse are also distinguished by the nature of the data they contain. The raw data of a data lake is data whose purpose is still undetermined. The transformed data in a data warehouse has already been used for a specific purpose within the company.

Another difference between a data lake and a data warehouse is that they are not intended for the same users: the raw data of a data lake requires the expertise of a data scientist to be understood and used, while the structured data of a data warehouse is accessible to non-specialists.

Finally, the data lake and the data warehouse differ in their accessibility and ease of use. Because a data lake imposes no structure on the data it stores, it is easier to access and modify. A data warehouse, by contrast, is more rigid to manipulate.

What are the options for storing my data for my analytical use cases?

Your data can be stored in a data store, usually a data lake or a data warehouse.

These data stores can be hosted either in the Cloud or on-premises, on physical servers.

What is a Cloud data lake?

A Cloud data lake is a data lake (a solution for storing data) that operates entirely in the Cloud.

Amazon S3, Microsoft Azure Data Lake Storage and Google Cloud Storage are examples of storage services used for Cloud data lakes.

What is the purpose of a data visualization tool?

Data visualization (or DataViz) allows you to synthesize and give meaning to raw data through simple, clear and understandable visual representations. These representations can take the form of graphs, pie charts, timelines, infographics, diagrams etc.

DataViz tools are the software that makes this synthesis possible. They are used directly by business users who want to analyze and understand the data, with the aim of turning it into a decision-making tool.

Among the best-known DataViz tools are Tableau, Qlik, Power BI, MicroStrategy and Excel.

Indexima is compatible with all DataViz tools.

What is ETL?

Extract, Transform, Load (ETL) is a type of software that collects raw data from various sources, restructures it and loads it into a data warehouse.

ETL is composed of three steps:

  • Extraction: the collection of data from one or more sources.
  • Transformation: the reformatting and transformation of the data to make it compatible with the data warehouse.
  • Loading: the transfer of the transformed data into the data warehouse.

ETL tools support structured and unstructured data, from both on-premises and Cloud sources. They are scalable, flexible and secure platforms that allow large volumes of data to be ingested and enriched in real time.
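As an illustration only, the three steps can be sketched in a few lines of Python. The sample data, the column names and the `orders` table are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from files, APIs or source databases.
RAW_CSV = """order_id,product,price_eur
1, Widget ,19.90
2,gadget,5.00
3,Widget,
"""


def extract(raw_text):
    """Extraction: read the rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transformation: clean product names, convert types, drop rows with no price."""
    cleaned = []
    for row in rows:
        price = row["price_eur"].strip()
        if not price:
            continue  # incomplete row: not loaded into the warehouse
        cleaned.append((int(row["order_id"]), row["product"].strip().lower(), float(price)))
    return cleaned


def load(rows, conn):
    """Loading: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, product TEXT, price_eur REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, product, price_eur FROM orders ORDER BY order_id"
).fetchall()
```

Here `loaded` contains only the two complete orders, with product names normalized to lowercase.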

What does “data engineering” mean?

Data engineering is the field of data science that focuses on the practical applications of data collection and analysis.

The data engineer is responsible for the creation and maintenance of the company’s data environment. He/she works on the design, creation and improvement of the infrastructure that enables data access and management.

The data engineer works upstream of the data analyst: it is the data engineer who prepares the data and makes it accessible and exploitable for the data analyst.

Who are the main Cloud platform providers?

The leaders of the Cloud platform market are Amazon with AWS, Google with Google Cloud Platform, Microsoft with Microsoft Azure and Oracle with Oracle Cloud Infrastructure.

What is the SQL language?

SQL (Structured Query Language) is the standard computer language for querying and manipulating data in relational databases.

The queries sent by DataViz tools to databases are in SQL language.

Any software within an enterprise’s data architecture needs to speak SQL in order to communicate with other platforms, software, tools and databases. SQL is the language used by Indexima.
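To give an idea of what such a query looks like, here is a small sketch of the kind of aggregation a dashboard typically sends. The `sales` table and its columns are hypothetical, and SQLite is used only so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 100.0),
        ("north", "gadget", 50.0),
        ("south", "widget", 70.0),
    ],
)

# A typical analytical query: one total per region.
dashboard_query = """
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY region
"""
totals = conn.execute(dashboard_query).fetchall()
```

The result holds one row per region, which is the shape a chart or table widget usually expects.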

The Indexima Technology

How do HyperIndexes work?

Indexima HyperIndexes are subsets of the targeted data. They are pre-calculated and aggregated by Indexima, depending on how the data is used. They are completely transparent to users.

In concrete terms, Indexima detects how DataViz tools query the data, which allows it to anticipate queries and precalculate their results. Indexima then aggregates the data needed to answer these queries: this is how HyperIndexes are created. By aggregating the data, Indexima greatly reduces query response time. It is no longer necessary to read the entire target data before providing an answer; reading the aggregated and pre-calculated HyperIndex is sufficient.

For example, suppose your users often cross-reference product data with prices and compute sums or averages. Indexima can pre-calculate these frequently used aggregations and store them in a ready-to-use HyperIndex, so that the query is answered in a few milliseconds.
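The general idea of answering from a pre-computed aggregate can be sketched as follows. This is an analogy using a plain summary table, not Indexima's actual implementation; the `sales` and `sales_by_product` tables and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("widget", 10.0), ("widget", 14.0), ("gadget", 5.0), ("gadget", 7.0), ("gadget", 9.0)],
)

# Pre-calculate once: one small row per product instead of one row per sale.
conn.execute(
    """
    CREATE TABLE sales_by_product AS
    SELECT product, SUM(price) AS price_sum, COUNT(*) AS row_count
    FROM sales
    GROUP BY product
    """
)


def average_price_full_scan(product):
    """Reads every matching row of the detailed table."""
    return conn.execute(
        "SELECT AVG(price) FROM sales WHERE product = ?", (product,)
    ).fetchone()[0]


def average_price_from_aggregate(product):
    """Reads a single pre-aggregated row and derives the average from it."""
    price_sum, row_count = conn.execute(
        "SELECT price_sum, row_count FROM sales_by_product WHERE product = ?", (product,)
    ).fetchone()
    return price_sum / row_count
```

Both functions return the same value, but the second one reads a single row regardless of how many sales exist. Storing the sum and the count (rather than the average itself) keeps the aggregate reusable for both sums and averages.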

What is the difference between a datamart, a cube and Indexima HyperIndexes?

Unlike HyperIndexes, a datamart is not necessarily an aggregation. It can simply be a partial view of the data. Sometimes, a datamart implies large volumes, which do not favor a fast response time to queries, unlike HyperIndexes. The datamart segments a portion of the target data to meet a business need, regardless of its size; the HyperIndex seeks to be as small and as aggregated as possible to achieve this, for performance reasons.

A cube and a HyperIndex work in relatively similar ways: both are aggregations of the data built to speed up query responses. The main difference lies in how they are created. HyperIndexes are automatically and instantly created by Indexima thanks to our AI, whereas cubes require the intervention of a data engineer, which slows down their creation considerably (up to several weeks). Moreover, thanks to Machine Learning algorithms, HyperIndexes are self-learning: they continuously optimize themselves. If a HyperIndex that was created in the past is no longer useful, it is automatically deleted. This guarantees that the layer of HyperIndexes is always optimized.

How does Indexima connect to my data sources?

Indexima is able to connect to almost all available data sources. The connection is simple, performed in a few clicks through the Indexima interface, according to the authentication methods of each data source provider.

Indexima has two connection modes: the data ingestion mode and the external table mode.

The ingestion mode consists of Indexima replicating the data from the underlying source in column-oriented format.

The external table mode means that Indexima does not replicate the data. Indexima uses the data source directly and only creates HyperIndexes from the data stored in the underlying source. This method not only avoids copying all of the data, but also combines the scanning power of the source (as is the case with Snowflake) with Indexima's high performance on BI queries. Moreover, the external table mode lets you take advantage of the scalability of the source when it is in the Cloud.

The external table mode is available on four platforms: Microsoft Azure Synapse, Amazon Redshift, Snowflake and Google BigQuery.

Data synchronization can be done in two ways: on a schedule or through auto-synchronization. You can set up scheduled synchronization via your data preparation tools. Depending on the use case, synchronization usually runs once a day.

How does Indexima connect to my data visualization tools?

Indexima supports the SQL language and the Hive communication mode. Indexima can therefore communicate with all data visualization tools that have an embedded Hive connector. This is the case with solutions such as Tableau, MicroStrategy, Qlik and Power BI.

How do you respond to queries if you don't copy the data?

The external table connection mode allows Indexima to respond to queries without copying the data. When the data visualization tool sends a query to Indexima, Indexima can delegate it to the underlying database, acting as an intermediary between the data visualization tool and the database. As queries accumulate, Indexima recognizes similarities in the traffic it handles. It then copies a highly aggregated version of the data (up to 2% of the total data) and loads it into memory: this is how a HyperIndex is created. From that point on, Indexima no longer acts as an intermediary for those queries: the HyperIndex answers them more quickly than the underlying database would, which improves BI performance.
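The delegate-then-accelerate flow described above can be pictured with a toy model. This is a sketch under simplifying assumptions and not Indexima's real engine: the `QueryRouter` class, its `threshold` parameter and the sample rows are all hypothetical, and data refresh is not handled.

```python
from collections import Counter


class QueryRouter:
    """Toy model: delegate queries to the source, then serve repeated ones from memory."""

    def __init__(self, source_rows, threshold=2):
        self.source_rows = source_rows  # stand-in for the underlying database
        self.threshold = threshold      # hypothetical: repeats seen before an aggregate is kept
        self.seen = Counter()           # how often each query pattern was delegated
        self.aggregates = {}            # group-by column -> {value: summed amount}
        self.source_scans = 0           # number of times the source had to be read

    def _scan_source(self, group_by):
        """Delegated path: read every source row to compute the totals."""
        self.source_scans += 1
        totals = {}
        for row in self.source_rows:
            key = row[group_by]
            totals[key] = totals.get(key, 0) + row["amount"]
        return totals

    def sum_amount_by(self, group_by):
        if group_by in self.aggregates:
            return self.aggregates[group_by]   # answered from the in-memory aggregate
        self.seen[group_by] += 1
        result = self._scan_source(group_by)   # delegated to the source
        if self.seen[group_by] >= self.threshold:
            self.aggregates[group_by] = result  # pattern is recurring: keep the small aggregate
        return result


rows = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 70},
    {"region": "north", "amount": 50},
]
router = QueryRouter(rows)
first = router.sum_amount_by("region")   # delegated to the source
second = router.sum_amount_by("region")  # delegated again, aggregate kept afterwards
third = router.sum_amount_by("region")   # served from memory, source untouched
```

After the third call, the source has been scanned only twice, and every later call with the same pattern is answered from memory.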

What is the impact of HyperIndexes on a data warehouse in the Cloud?

The creation of HyperIndexes greatly reduces the cost of using the database. The database is queried far less often (or not at all) because Indexima answers the queries in its place. This lets you either make a net saving if the database is billed on demand, or save money by decommissioning servers that are no longer needed now that Indexima handles most of the queries. For example, if you are using Snowflake, you will be able to move from an L to an XS warehouse size thanks to Indexima, greatly reducing your costs with this provider. Generally speaking, Indexima can handle between 90 and 95% of your data traffic dedicated to visualization.
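A rough back-of-the-envelope calculation shows the order of magnitude involved in such a resize. Every figure below is an assumption chosen for illustration: the per-size credit rates follow the usual doubling between warehouse sizes, while the running hours and the price per credit are hypothetical. The cost of Indexima itself is not included.

```python
# Assumed credit consumption per hour for each warehouse size (doubling at each step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}


def monthly_cost(size, hours_per_month, price_per_credit):
    """Cost of keeping a warehouse of the given size running for the given hours."""
    return CREDITS_PER_HOUR[size] * hours_per_month * price_per_credit


HOURS_PER_MONTH = 200    # hypothetical running time
PRICE_PER_CREDIT = 3.0   # hypothetical price per credit

cost_before = monthly_cost("L", HOURS_PER_MONTH, PRICE_PER_CREDIT)
cost_after = monthly_cost("XS", HOURS_PER_MONTH, PRICE_PER_CREDIT)
saving_ratio = 1 - cost_after / cost_before
```

Under these assumptions, the warehouse bill drops from 4,800 to 600 per month, a reduction of 87.5%. Actual savings depend on your contract, your usage pattern and how much traffic can be offloaded.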

What are the main use cases for Indexima?

Indexima is optimized for modern BI contexts, where high-level aggregations are performed with an overview of many KPIs and metrics. Indexima mainly addresses three use cases:

  • Infrastructure costs are low and performance is good, but the cost in business time is particularly high: data preparation and cube creation deliver good performance, but creating and maintaining them takes a lot of time. Thanks to its automatic indexing engine, Indexima maintains the same level of performance while greatly accelerating the process and freeing up the teams’ time.
  • Performance is not up to par, with very slow responses to queries: Indexima accelerates BI queries by a factor of 1,000, drastically reducing query response time and dashboard refresh time.
  • Performance is satisfactory, but the costs of the current architecture are very high: companies often reach satisfactory performance by increasing the power of their data warehouse on demand, but this comes at a price (up to several tens of thousands of euros). Since Indexima does not rely on scanning the full data for every query, less power is needed to deliver the same results, which allows a drastic reduction in the cost of Cloud data warehouses.

Is Indexima optimized for my industry?

Indexima is suitable for all business sectors. The main condition is that the company processes large volumes of data (a hundred million rows or more) and performs analytics on that data.

How is Indexima used by data engineers?

Data engineers are in charge of preparing and making data available. Indexima is a tool with which they can centralize and connect the company’s data sources. Then, they monitor Indexima and make sure that the queries are optimized as agreed.

What kind of queries can Indexima handle?

Indexima is optimized to respond to analytical queries and thus responds instantly to the analytical tools that query it.

What language does Indexima use?

Indexima uses the SQL language. The communication protocol is the same as Hive. Thus, we are compatible with all data visualization tools that accept a Hive driver.

Does Indexima do ETL (Extract Transform Load)?

No, Indexima does not do ETL. Indexima is a complementary tool to ETL.

When doing ETL, some data engineers extract the data, then transform and reload it several times in a row, in order to aggregate the data and make it faster to use. Indexima’s HyperIndexes can replace these multiple transformations and reloads.

Do I need any training to use Indexima?

No formal training is required, apart from an onboarding phase following the purchase of the solution. You will also have already attended a two-day PoC during which Indexima is demonstrated in your infrastructure. In addition, a documentation site is available to you, and our teams are always available to help.

Do you have documentation?

Yes, documentation on Indexima’s technology is available here. Do not hesitate to contact us for any questions.

Quick Start & Best Practices

Can I access a trial version of Indexima?

You can try Indexima for 14 days. To do so, you can make a request to our teams and choose between two options: test the Indexima SaaS offer available on AWS, or install Indexima directly on your computer.

Do you have a SaaS offer?

Indexima is available both on-premises and as a SaaS. Indexima’s SaaS offering is 100% managed, and allows you to accelerate your analytical use cases with ease. Learn more here.

I am interested in Indexima, how can I contact you?

You can contact us directly by email, or by phone at +33 9 72 20 08 23. Our teams will answer you quickly.

Do you carry out PoCs (Proof of Concept)?

To demonstrate Indexima’s efficiency, a two-day PoC can be performed directly within your data architecture. This PoC takes place after an initial discussion with you about your analytics needs. For more information or if you are interested, please contact our teams.