Steven Astorino

Powering your Business Intelligence with a Lakehouse

Defining Business Intelligence

In essence, Business Intelligence (BI) tools are software that ingests business data and presents it in user-friendly views, such as reports, dashboards, charts, and graphs. BI tools enable business users to access different types of data: historical and current, third-party and in-house, as well as semi-structured data and unstructured data such as social media. Users can analyze this information to gain insights into how a business is performing.

BI offers a way for people to examine data to understand trends and derive insights. Organizations can use the insights gained from BI and data analysis to improve business decisions, identify problems or issues, spot market trends, and find new revenue or business opportunities.

BI platforms traditionally rely on data warehouses for their baseline information. A data warehouse aggregates data from multiple data sources into one central system to support business analytics and reporting. BI software queries the warehouse and presents the results to the user in the form of reports, charts, and maps.
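To make the warehouse pattern above concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a data warehouse; the two source lists, the sales table, and the function names are all hypothetical and chosen only for illustration. It shows rows from multiple sources being aggregated into one central table and a BI-style query producing a report.

```python
import sqlite3

# Hypothetical rows from two separate source systems (illustrative data only).
online_orders = [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 45.5)]
store_orders = [("EMEA", 30.0), ("AMER", 210.0)]


def build_warehouse(sources):
    """Load rows from every source into one central in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (source TEXT, region TEXT, amount REAL)")
    for source_name, rows in sources.items():
        conn.executemany(
            "INSERT INTO sales (source, region, amount) VALUES (?, ?, ?)",
            [(source_name, region, amount) for region, amount in rows],
        )
    conn.commit()
    return conn


def sales_by_region(conn):
    """The kind of aggregate query a BI tool might issue for a report."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


warehouse = build_warehouse({"online": online_orders, "store": store_orders})
report = sales_by_region(warehouse)
for region, total in report:
    print(f"{region}: {total:.2f}")
```

A real BI platform would connect to a production warehouse and render the result as a chart or dashboard, but the flow is the same: many sources, one central store, one query per view.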

Some newer BI solutions can extract and ingest raw data directly using technology such as Hadoop, but warehouses have traditionally been the primary data source of choice in most cases.

Benefits of Business Intelligence

BI gives organizations the ability to ask questions in plain language and get answers they can understand. They can base decisions on what their business data is telling them, whether it relates to production, supply chain, customers, or market trends. Why are sales dropping in this region? Where do we have excess inventory? What are customers saying on social media? BI helps answer such critical questions as these.

BI provides past and current insights into a business. This is achieved via an array of technologies and practices that range from descriptive analytics and reporting to data mining, forecasting, and predictive analytics. By providing an accurate picture of the business at a specific point in time, BI enables an organization to design a business strategy based on factual data.

BI helps organizations become data-driven enterprises, improve performance, and gain competitive advantage. Organizations can:

Business Intelligence Best Practices

Organizations benefit when they can fully assess operations and processes, understand their customers, gauge the market, and drive improvement. They need the right tools to aggregate business information from anywhere, analyze it, discover patterns, and find solutions.

The best BI software supports this decision-making process by:

Advanced BI and analytics systems may also integrate AI and ML to automate and streamline complex tasks. These capabilities further accelerate the ability of enterprises to analyze their data and gain insights at a deep level.

Consider, for example, how IBM Cognos Analytics brings together data analysis and visual tools to support map creation for reports. The system uses AI to automatically identify geographical information. It can then refine visualizations by adding geospatial mapping of the entire globe, an individual neighborhood, or anything in between.

The Lakehouse: Unifying the Best of Two Worlds

Vendors have attempted to combine the best of two data worlds, data lakes and data warehouses, into a new technology: the lakehouse. This architecture is designed to provide the flexibility and cost-effectiveness of a data lake with the performance and structure of a data warehouse. The lakehouse lets organizations store data from the exploding number of new sources at low cost and leverage built-in data management and governance capabilities, so that they can power both BI and high-performance ML workloads efficiently and effectively.

Data lakes represent a way to store massive amounts of varied data using cheap commodity hardware and storage via Hadoop, HDFS, or Hive. Organizations recognize that large volumes of unstructured data, even if not suitable for data warehouses, also contain great value, if that value can be extracted. Unfortunately, the data in a lake can be of poor quality, and the tools and analytical engines used to analyze it might not be as performant as those used for data warehouses. Many data lakes became more like data swamps, with data becoming stale, difficult to maintain, and therefore untrustworthy.

Data warehouses enabled organizations to look back at historical data, starting with maybe a six-month window, then a year, then longer, as processing and computing power became more accessible and affordable. Over time, as volumes continued to grow, it became more challenging to store and retain all the data in a data warehouse. Additionally, organizations may have only a few years’ worth of data or only a small slice of the operational data currently stored in their warehouses.

The need to scale compute and storage presents two different sets of needs across data lakes and data warehouses. 

Data lakes and data warehouses each provide their own set of capabilities. When combined, scaling and governance can become key challenges, as data lakes and warehouses are designed for different purposes. The market evolved toward cloud-based data warehouses, which offer separation of compute and storage. Technologies such as Red Hat OpenShift, Red Hat Ceph Storage, and Amazon S3, along with modern warehouse engines, help solve the problem, making storage and compute inexpensive, readily available, and easier to manage and scale. Compute and storage need to be elastic, able to scale on demand so that organizations are charged only for what they have used over the billable period.
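The pay-for-what-you-use idea can be sketched with a little arithmetic. The Python below is a minimal illustration only: the function name, the rates, and the usage figures are all hypothetical and do not reflect any vendor's actual pricing. It shows that when compute and storage are billed separately, a quiet month costs less even though the amount of stored data stays the same.

```python
def usage_cost(compute_hours, storage_gb_months, compute_rate, storage_rate):
    """Bill compute and storage independently, based only on actual usage."""
    return compute_hours * compute_rate + storage_gb_months * storage_rate


# Hypothetical rates: 2.00 per compute hour, 0.25 per GB-month of storage.
COMPUTE_RATE = 2.0
STORAGE_RATE = 0.25

# Same 1,000 GB stored in both months; only the compute usage differs.
busy_month = usage_cost(100, 1000, COMPUTE_RATE, STORAGE_RATE)
quiet_month = usage_cost(10, 1000, COMPUTE_RATE, STORAGE_RATE)

print(f"Busy month:  {busy_month:.2f}")
print(f"Quiet month: {quiet_month:.2f}")
```

In a tightly coupled system, by contrast, growing storage typically means provisioning more compute as well, whether or not it is needed.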

A lakehouse, as shown in Figure 1, attempts to bridge these worlds by combining the best of both into one architecture. That said, these first-generation lakehouses have constraints that limit their ability to address cost and complexity challenges, such as these:

Figure 1: Lakehouses try to combine the best of data warehouses and data lakes

The Impact of the Lakehouse on BI

Combining the best features, capabilities, cost/performance characteristics, and other attributes of an enterprise data warehouse and a data lake, a lakehouse offering such as watsonx.data can make BI more effective. A lakehouse enables a BI solution to access more of an enterprise’s trusted and valued data so that AI systems can discover and reveal deeper business insights and opportunities.

How a lakehouse is built determines its effectiveness. An effective lakehouse architecture is shown in Figure 2 and should offer the key capabilities of:

These core capabilities are offered by watsonx.data.

Figure 2: An effective lakehouse architecture for BI

Summary

In closing, lakehouses that offer the capabilities that form part of an integrated AI and data system such as watsonx give organizations of any size the potential to leverage deep insights through their AI-based business intelligence solutions, delivered as a consumable service with which anyone should be able to interact. While this blog post skims the surface of lakehouses and BI, a more in-depth read is available in my new book “The Lakehouse Effect – A New Era for Data Insights and AI”, which is available for download at no cost.
