Defining Business Intelligence
In essence, Business Intelligence (BI) tools are software that ingests business data and presents it in user-friendly views such as reports, dashboards, charts, and graphs. BI tools enable business users to access different types of data: historical and current, third-party and in-house, as well as semi-structured and unstructured data such as social media content. Users can analyze this information to gain insight into how the business is performing.
BI offers a way for people to examine data to understand trends and derive insights. Organizations can use the insights gained from BI and data analysis to improve business decisions, identify problems or issues, spot market trends, and find new revenue or business opportunities.
BI platforms traditionally rely on data warehouses for their baseline information. A data warehouse aggregates data from multiple data sources into one central system to support business analytics and reporting. BI software queries the warehouse and presents the results to the user in the form of reports, charts, and maps.
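To make that pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse; the table name and columns are invented, but the shape is the same: a BI tool issues an aggregate query and renders the result as a report or chart.

```python
import sqlite3

# Stand-in for a data warehouse: an in-memory SQLite database
# holding a hypothetical fact table of sales transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", "Q1", 120.0), ("EMEA", "Q2", 95.0),
     ("APAC", "Q1", 80.0), ("APAC", "Q2", 110.0)],
)

# The kind of aggregate query a BI tool runs behind a dashboard:
# total sales per region, ready to be rendered as a chart.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(f"{region}: {total}")
```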
Some newer BI solutions can extract and ingest raw data directly using technology such as Hadoop, but data warehouses remain the primary data source in most cases.
Benefits of Business Intelligence
BI gives organizations the ability to ask questions in plain language and get answers they can understand. They can base decisions on what their business data is telling them, whether it relates to production, supply chain, customers, or market trends. Why are sales dropping in this region? Where do we have excess inventory? What are customers saying on social media? BI helps answer critical questions like these.
BI provides insight into a business's past and present. This is achieved via an array of technologies and practices ranging from descriptive analytics and reporting to data mining, forecasting, and predictive analytics. By providing an accurate picture of the business at a specific point in time, BI enables an organization to design its strategy based on factual data.
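The distinction is easy to see in miniature. The sketch below (with invented numbers) contrasts descriptive analytics, which summarizes what already happened, with a deliberately naive forecast of what may happen next; real BI platforms use far richer predictive models.

```python
# Monthly revenue figures (hypothetical data).
revenue = [100, 104, 110, 108, 115, 121]

# Descriptive analytics: summarize what has already happened.
print("total:", sum(revenue))
print("average:", sum(revenue) / len(revenue))

# A naive predictive step: project the next month from the
# average month-over-month change. Real forecasting engines
# are far more sophisticated than this.
deltas = [b - a for a, b in zip(revenue, revenue[1:])]
forecast = revenue[-1] + sum(deltas) / len(deltas)
print("naive next-month forecast:", forecast)
```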
BI helps organizations become data-driven enterprises, improve performance, and gain competitive advantage. Organizations can:
- Improve ROI by understanding the business and intelligently allocating resources to meet strategic objectives.
- Understand customer behavior, preferences, and trends, and use the insights to better target prospects or tailor products to changing market needs.
- Monitor business operations, resolve problems, or make improvements on an ongoing basis, fueled by data insights.
- Improve supply-chain management by monitoring activity up and down the line and communicating results to partners and suppliers.
Business Intelligence Best Practices
Organizations benefit when they can fully assess operations and processes, understand their customers, gauge the market, and drive improvement. They need the right tools to aggregate business information from anywhere, analyze it, discover patterns, and find solutions.
The best BI software supports this decision-making process by:
- Connecting to a wide variety of data systems and data sets, including databases and spreadsheets
- Providing deep analysis, thereby helping users uncover hidden relationships and patterns in their data
- Presenting answers in informative and compelling data visualizations like reports, maps, charts, and graphs
- Enabling side-by-side comparisons of data under different scenarios
- Providing drill-down, drill-up, and drill-through features that let users investigate data at different levels of detail (illustrated in the sketch after this list)
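As promised above, here is a minimal sketch of what drill-down looks like under the hood, using pandas and an invented fact table: the same measure is aggregated at a coarse level, then re-aggregated at a finer one when the user drills in.

```python
import pandas as pd

# Hypothetical sales fact table.
sales = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "EMEA", "APAC", "APAC"],
    "country": ["DE",   "DE",   "FR",   "JP",   "JP"],
    "amount":  [50.0,   70.0,   95.0,   80.0,   30.0],
})

# Drill-up view: totals at the region level.
print(sales.groupby("region")["amount"].sum())

# Drill-down view: the user clicks a region and sees the
# same measure broken out by country.
print(sales.groupby(["region", "country"])["amount"].sum())
```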
Advanced BI and analytics systems may also integrate AI and ML to automate and streamline complex tasks. These capabilities further accelerate the ability of enterprises to analyze their data and gain insights at a deep level.
Consider, for example, how IBM Cognos Analytics brings together data analysis and visual tools to support map creation for reports. The system uses AI to automatically identify geographical information. It can then refine visualizations by adding geospatial mapping of the entire globe, an individual neighborhood, or anything in between.
The Lakehouse: Unifying the Best of Two Worlds
Vendors have attempted to combine the best of two data worlds, the data lake and the data warehouse, into a new technology: the lakehouse. This architecture is designed to provide the flexibility and cost-effectiveness of a data lake with the performance and structure of a data warehouse. It lets organizations store data from the exploding number of new sources at low cost, while built-in data management and governance capabilities allow them to power both BI and high-performance ML workloads efficiently and effectively.
Data lakes represent a way to store massive amounts of varied data on cheap commodity hardware and storage via Hadoop, HDFS, or Hive. Organizations recognize that large volumes of unstructured data, even if not suitable for data warehouses, also contain great value, if that value can be extracted. Unfortunately, data in a lake can be of poor quality, and the tools and analytical engines used to analyze its content are often not as performant as those used for data warehouses. Many data lakes became more like data swamps, with data growing stale, difficult to maintain, and therefore untrustworthy.
Data warehouses enabled organizations to look back at historical data, starting with perhaps a six-month window, then a year, then longer, as processing and computing power became more accessible and affordable. Over time, as volumes continued to grow, it became more challenging to store and retain all of that data in a warehouse. As a result, many organizations today keep only a few years' worth of data, or only a small slice of their operational data, in their warehouses.
Data lakes and data warehouses thus present two different sets of needs for scaling compute and storage.
Each provides its own set of capabilities, but because the two are designed for different purposes, combining them makes scaling and governance key challenges. The market has evolved toward cloud-based data warehouses, which separate compute from storage. Technologies such as Red Hat OpenShift, Red Hat Ceph Storage, and Amazon S3, together with modern warehouse engines, help solve the problem by making storage and compute inexpensive, readily available, and easier to manage and scale. Compute and storage need to be elastic, scaling on demand so that organizations are charged only for what they use over the billable period.
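The sketch below illustrates that separation using pyarrow (my choice of tooling, not something the architecture mandates): the "storage" side is just Parquet files in an open format, written to a local path here but equally addressable as an s3:// URI in production, while the "compute" side is an independent scan layer that can be scaled and billed separately. File path and column names are hypothetical.

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# "Storage": cheap files in an open format. A local file here;
# in production this would live in an object store such as S3.
table = pa.table({"region": ["EMEA", "APAC", "EMEA"],
                  "amount": [120.0, 80.0, 95.0]})
pq.write_table(table, "sales.parquet")

# "Compute": a scan layer that is independent of where the
# bytes live, so it can scale (and be paid for) on its own.
dataset = ds.dataset("sales.parquet", format="parquet")
emea = dataset.to_table(filter=ds.field("region") == "EMEA")
print(emea.to_pydict())
```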
A lakehouse, as shown in Figure 1, attempts to bridge these worlds by combining the best of both into one architecture. That said, these first-generation lakehouses have constraints that limit their ability to address cost and complexity challenges, such as these:
- A single query engine supports only limited workloads, typically just BI and ML.
- Lakehouses are typically deployed only in the cloud, with no support for hybrid multi-cloud deployments.
- Lakehouses offer minimal governance and metadata capabilities to deploy across an entire ecosystem.
Figure 1: Lakehouses try to combine the best of data warehouses and data lakes
The Impact of the Lakehouse on BI
By combining the best features, capabilities, cost/performance characteristics, and other attributes of an enterprise data warehouse and a data lake, a lakehouse offering such as watsonx.data can make BI more effective. A lakehouse gives a BI solution access to more of an enterprise's trusted and valued data, so that AI systems can discover and reveal deeper business insights and opportunities.
How a lakehouse is built determines its effectiveness. An effective lakehouse architecture, shown in Figure 2, should offer these key capabilities:
- Scaling BI across all data with multiple high-performance query engines optimized for different workloads, for example Presto, Spark, Db2, and Netezza (a connection sketch follows this list)
- Enabling data sharing between these different engines
- Offering shared common data storage across data lake and data warehouse functions, avoiding unnecessary time-consuming ETL/ELT jobs
- Eradicating unnecessary data duplication and replication
- Leveraging an open and flexible architecture built on open source without vendor lock-in
- Deploying across hybrid cloud environments (on-premises, private, public clouds) on multiple hyperscalers
- Offering a wide range of prebuilt integrations incorporating data-fabric capabilities
- Offering organizations the flexibility to start their lakehouse implementations standalone and later expand to a bigger integrated AI and data platform
- Providing global governance and security across all data in the hybrid multi-cloud enterprise leveraging data-fabric capabilities
- Being extensible through APIs, strong value-add partner ecosystems, accelerators, and third-party solutions
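As referenced in the first bullet, from a BI developer's point of view the multi-engine story often surfaces as an ordinary SQL connection to one engine, here Presto/Trino via the open-source trino Python client, over storage and metadata that other engines can share. Host, catalog, schema, and table names below are hypothetical placeholders.

```python
from trino.dbapi import connect  # open-source Trino/Presto client

# Hypothetical connection details. In a lakehouse, the tables this
# engine queries can also be read by Spark or other engines,
# because the underlying storage and metadata are shared.
conn = connect(
    host="lakehouse.example.com",
    port=8080,
    user="analyst",
    catalog="iceberg",   # a shared, open table format
    schema="sales",
)

cur = conn.cursor()
cur.execute("SELECT region, SUM(amount) FROM transactions GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)
```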
These core capabilities are offered by watsonx.data.
Figure 2: An effective lakehouse architecture for BI
Summary
In closing, lakehouses that offer these capabilities as part of an integrated AI and data system such as watsonx give organizations of any size the potential to leverage deep insights through AI-based business intelligence, delivered as a consumable service that anyone can interact with. While this blog post only skims the surface of lakehouses and BI, a more in-depth read is available in my new book, “The Lakehouse Effect – A New Era for Data Insights and AI”, which is available for download at no cost.

