When we think about Green IT, we first think about governance. How can we implement a relevant and innovative environmental strategy in our organization? Then we think about hardware. How can we reduce the ecological impact of our devices, of our data centers? We also think about the IT infrastructure, including the cloud. But do we think about our data platform? We should. Here are some actions that you can take to make your data platform greener.
According to McKinsey, several factors can increase the ecological impact of a data platform, among them redundant communications, computation and storage. These issues can be addressed by limiting the number of services involved in each task and by merging microservices that solve similar problems, which cuts down on redundant communications.
To make a data platform greener, part of the work must go into examining the data itself. It is paramount to review data sources carefully and make sure that they match the business needs. Once the right datasets are chosen, it is important to pay attention to their size. As Amazon explains, it is good practice to compress, filter and aggregate the data before ingestion, so that ingestion consumes as few resources as possible. Once the data is ready, an event-driven serverless architecture can be used for the ingestion, so that resources are only used when they are needed. Now that we are done with the ingestion, let’s go through the compute.
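The filter, aggregate and compress steps can be sketched as follows. This is only a minimal illustration using the Python standard library: the record fields (`day`, `country`, `amount`) and the function name `prepare_for_ingestion` are hypothetical, and a real pipeline would rely on whatever formats and tools the platform already uses.

```python
import gzip
import json
from collections import defaultdict


def prepare_for_ingestion(records, wanted_country):
    """Filter, aggregate, then compress raw records before ingestion."""
    # Filter: keep only the rows the business case actually needs.
    kept = [r for r in records if r["country"] == wanted_country]

    # Aggregate: one total per day instead of one row per event.
    totals = defaultdict(float)
    for r in kept:
        totals[r["day"]] += r["amount"]
    aggregated = [{"day": d, "total": t} for d, t in sorted(totals.items())]

    # Compress: gzip the JSON payload that would be sent to the platform.
    payload = gzip.compress(json.dumps(aggregated).encode("utf-8"))
    return aggregated, payload


# Hypothetical sample data, for illustration only.
raw = [
    {"day": "2023-01-01", "country": "FR", "amount": 10.0},
    {"day": "2023-01-01", "country": "FR", "amount": 5.0},
    {"day": "2023-01-01", "country": "US", "amount": 7.0},
    {"day": "2023-01-02", "country": "FR", "amount": 2.5},
]
aggregated, payload = prepare_for_ingestion(raw, "FR")
```

Here four raw events become two aggregated rows before anything is ingested. On realistic volumes, the compressed payload is also what keeps transfer and storage small.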
Data partitioning and bucketing help reduce the volume of data scanned by each query, which in turn reduces the resources consumed. Batch data processing can reduce them further, if it matches your goals. You can also consider demand shifting, which means moving the compute to regions or hours where the carbon intensity is lower, or demand shaping, which means adjusting the demand itself to match the available supply. Finally, caching minimizes data movement.
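To make the effect of partitioning concrete, here is a toy sketch that counts how many rows a query has to read with and without partitions. It is an in-memory illustration under simple assumptions (rows as dictionaries, one partition per `day` value, hypothetical function names), not how any particular query engine implements partition pruning.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def query_full_scan(rows, day):
    """Read every row; return the matches and the number of rows read."""
    matches = [r for r in rows if r["day"] == day]
    return matches, len(rows)


def query_partitioned(partitions, day):
    """Read only the partition for `day`; other partitions are never touched."""
    part = partitions.get(day, [])
    return list(part), len(part)


# Hypothetical dataset: 10 days with 100 rows each.
rows = [
    {"day": f"2023-01-{d:02d}", "value": v}
    for d in range(1, 11)
    for v in range(100)
]
partitions = partition_by(rows, "day")

full_matches, full_scanned = query_full_scan(rows, "2023-01-03")
part_matches, part_scanned = query_partitioned(partitions, "2023-01-03")
# Both queries return the same 100 rows, but the full scan reads 1000 rows
# while the partitioned query reads only 100.
```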
Ingestion and processing aren’t the only things to focus on when you want to make your data platform sustainable. You must also think about storage. As we said earlier, you can start by compressing your data. You also need to take the time to think about backups, which can occupy storage space needlessly. Create only the backups you need and purge them regularly or automatically to free up storage space. Likewise, continuously collecting logs can fill up your storage with data that is never used. Lastly, it is important to choose wisely where the data is stored. If you have a data lake, keep your data in an online tier if it is frequently accessed or modified, and move it to an offline tier if it isn’t.
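A retention and tiering policy of this kind can be expressed in a few lines. The sketch below is illustrative only: the 30-day and 90-day thresholds, the tier names and the function names are assumptions, and in practice these rules are usually configured through the storage provider's own lifecycle policies.

```python
from datetime import date, timedelta


def choose_tier(last_accessed, today, hot_days=30):
    """Return "online" for recently accessed data, "offline" otherwise."""
    age_days = (today - last_accessed).days
    return "online" if age_days <= hot_days else "offline"


def backups_to_purge(backup_dates, today, retention_days=90):
    """Return the backups that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]


# Hypothetical dates, for illustration only.
today = date(2023, 6, 1)
tier_recent = choose_tier(date(2023, 5, 20), today)  # accessed 12 days ago
tier_old = choose_tier(date(2023, 1, 1), today)      # accessed 151 days ago
expired = backups_to_purge(
    [date(2023, 1, 15), date(2023, 4, 1), date(2023, 5, 30)],
    today,
)
```

With these assumed thresholds, the recently accessed dataset stays online, the older one moves offline, and only the January backup falls outside the retention window.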
Finally, it is essential to put KPIs in place to measure the progress made on the ecological impact of the data platform and detect opportunities for improvement.
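As one possible example of such a KPI, estimated emissions can be approximated as the energy consumed multiplied by the carbon intensity of the electricity used, and then tracked from one period to the next. The figures below are made up for illustration, and the function names are hypothetical. Real values would come from your cloud provider's reporting or your own measurements.

```python
def estimated_emissions_g(energy_kwh, intensity_g_per_kwh):
    """Estimate emissions in grams of CO2-equivalent: energy x carbon intensity."""
    return energy_kwh * intensity_g_per_kwh


def relative_change(before, after):
    """Relative change between two periods (negative means a reduction)."""
    return (after - before) / before


# Hypothetical monthly figures for the data platform.
before = estimated_emissions_g(energy_kwh=120.0, intensity_g_per_kwh=50.0)
after = estimated_emissions_g(energy_kwh=90.0, intensity_g_per_kwh=50.0)
change = relative_change(before, after)  # -0.25, i.e. a 25% reduction
```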
More broadly, the greening of the data platform is increasingly supported by FinOps teams, who act as a bridge between financial control and IT departments. Their mission is to meet business needs at the lowest possible cost. By working to reduce data platform costs, they contribute to optimizing resources and practices, which leads to better energy performance.
As for the architecture of the data platform, a codebase built on pure functions and a limited number of abstraction layers can reduce the volume of computation required. The company can also add a service that cleans, validates and aggregates the data, which reduces the risk of redundant tasks and storage.
At Indexima, that’s our mantra: reducing data scans as much as possible and shortening query processing time in order to decrease cloud data warehouse consumption and, with it, the ecological impact of our clients’ data platforms.