I’ve just released the second printing of a book I co-authored, titled “Artificial Intelligence – Evolution and Revolution”. Here’s an extract from chapter 8, which you might find interesting, on the value of taking a hybrid multicloud platform approach to implementing data- and AI-related projects.
If you enjoy the read, a complimentary copy of the book is available to download at the end of this blog post.
“Delivering a Hybrid Multicloud Data and AI Platform”
So far, we have discussed the value, as well as many aspects, challenges, and capabilities of ML and AI. But what good is something of value if it remains out of reach for the vast majority or is limited to just a select few highly skilled people in the data science community?
To that end, the industry needs to make the aforementioned technologies more accessible and consumable. Some vendors publish APIs to a range of ML and AI services, but that alone still assumes a level of technical ability that might be out of reach for many. APIs are just one small aspect of the overall data science experience. While some people may like to build a vehicle from a kit or individual components, the vast majority of the public prefers to buy a ready-to-drive vehicle that meets their long-term needs to take them on their many journeys.
The Best Performers Are Data-Driven
While many organizations are struggling with the challenges of data complexity, some organizations are finding success as they embrace a modern data strategy. Data-savvy organizations are more likely to leverage data in a manner that informs decision-making and to strategically address unmet needs with new data-driven business models. When you provide organization-wide access to previously siloed data, configure governance policies, and address data-quality concerns, you are ready to make large strategic AI investments that can ultimately lead to outperforming revenue targets and thereby increasing profitability.
The IBM Institute for Business Value (IBV) conducts regular surveys of organizations to identify market outperformers and looks for patterns that set them apart. The 20th edition of the C-Suite study was published in 2020 and draws input from over 13,000 respondents across multiple C-suite roles, industries, and countries. In this most recent edition of the study, companies are categorized based on their ability to create value from data and the degree to which they have integrated their data and business strategy. Identified as “torchbearers,” 9% of companies surveyed have shown the most leadership in this area. There are some striking numbers in this study about these “torchbearer” companies:
- They are 88% more likely to make data-driven decisions to advance their corporate strategies.
- They are 112% more likely to find gaps and fill them with data-driven business models.
- They are 300% more likely to enable the free sharing of data across silos and different business functions.
- They are 149% more likely to make large strategic investments in AI technologies.
- And most importantly, they are 178% more likely to outperform others in their industry in the areas of revenue and profitability.
Source: IBM Institute for Business Value study of 13,000 C-suite leaders.
The bottom line: You must outperform your competitors or risk being outperformed by them.
What IBM has learned from countless AI projects is that every step of the journey is critical. AI is not magic; it requires a thoughtful and well-architected approach. For example, the majority of AI failures are due to problems in data preparation and data organization, not the AI models themselves. Success with AI models depends on achieving success first with how you collect and organize data.
The AI Ladder, shown in Figure 8.1, represents a prescriptive approach to help customers overcome data challenges and accelerate their journey to AI, no matter their starting point. It enables them to simplify and automate how an organization turns data into insights by unifying the collection, organization, and analysis of data, regardless of where it lives. By climbing the ladder to AI, enterprises can build a governed, efficient, agile, and future-proof approach to AI.
The AI Ladder has four steps (often referred to as “rungs”):
Figure 8.1: Four steps of the AI Ladder
- Collect: Make data simple and accessible
Collect data of every type, regardless of where it lives, enabling flexibility in the face of ever-changing data sources. Note that “collect” does not mean putting all data in one place. In fact, quite the opposite: it means virtualizing the data, allowing access wherever it lives as if it were consolidated.
- Organize: Create a business-ready analytics foundation
Organize collected data into a trusted, business-ready foundation with built-in governance, protection, and compliance.
- Analyze: Build and scale AI with trust and transparency
Analyze data in automated ways and benefit from AI models that empower teams to gain new insights and make better, smarter decisions.
- Infuse: Operationalize AI throughout the business
Infuse AI throughout the business (across multiple departments and within various processes), drawing on predictions, automation, and optimization.
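The “collect” rung is easiest to see with a concrete, if simplified, example. The sketch below is not Cloud Pak for Data’s actual API; it just illustrates the idea of virtualization using two physically separate SQLite databases queried through one connection, as if the data were consolidated:

```python
# Illustrative sketch only: "collect" as virtualization. Two independent
# databases stay where they are; a single connection joins across them.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
sales_db = os.path.join(tmp, "sales.db")   # hypothetical sales system
crm_db = os.path.join(tmp, "crm.db")       # hypothetical CRM system

# Two independent data sources, each living in its own place.
with sqlite3.connect(sales_db) as con:
    con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 75.5)])
with sqlite3.connect(crm_db) as con:
    con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# One "virtual" connection joins across both sources without moving the data.
con = sqlite3.connect(sales_db)
con.execute(f"ATTACH DATABASE '{crm_db}' AS crm")
rows = con.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN crm.customers c ON c.id = o.customer_id"
).fetchall()
print(sorted(rows))
con.close()
```

A real data virtualization layer does this across warehouses, lakes, and streaming sources rather than two local files, but the design point is the same: the query is written as if the data were in one spot.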
These steps can be further broken down into a set of key capabilities, shown in Figure 8.2.
Figure 8.2: AI Ladder capabilities
Supporting the AI Ladder is the concept of modernization, which is how customers can simplify and automate how they turn data into insights by unifying the collection, organization, and analysis of data, regardless of where it lives, within a secure hybrid cloud platform.
The following priorities are built into the IBM technologies that support the AI Ladder:
- Simplicity: Different kinds of users can leverage tools that support their skill levels and goals, from “no code” to “low code” to programmatic.
- Integration: As users go from one rung of the ladder to the next, the transitions are seamless.
- Automation: The most common and important tasks have intelligence baked into them so that users focus on innovation rather than repetitive tasks.
Reducing Complexity with a Data Fabric
Enterprises face all sorts of complexity when implementing their use cases with current approaches. These use cases include providing a 360-degree view of the data, master data management, regulatory compliance, operational analytics, business intelligence, and data science, to name a few.
As infrastructures grow, enterprises often face higher compliance, security, and governance risks. This can result in complexity and a high level of effort to enforce policies and perform stewardship. Complex infrastructures can lead to higher costs of integrating data and stitching data pipelines across multiple platforms and tools. In turn, this can increase reliance on IT, making collaboration more challenging and possibly slowing time to value, when business-led self-service analytics, insights, and the democratization of data could instead deliver greater business agility.
What’s needed is a new design or approach that provides an abstraction layer to share and use data, with data and AI governance, across a hybrid cloud landscape, without a massive pendulum swing to having everything decentralized. It’s a balance between what needs to be logically or physically decentralized and what needs to be centralized. For example, an enterprise can have multiple catalogs, but there can be only one source of truth for the global catalog.
A data fabric is a data management architecture that helps optimize access to distributed data and intelligently curate and orchestrate it for self-service delivery to data consumers. Some of a data fabric’s key capabilities are listed below:
- Architected to help elevate the value of enterprise data by providing users with access to the right data just in time, regardless of where or how it is stored.
- Architecture agnostic to data environments, data processes, data use and geography, while integrating core data management capabilities.
- Automates data discovery, governance, and consumption, delivering business-ready data for analytics and AI.
- Helps business users and data scientists access trusted data faster for their applications, analytics, AI and machine learning models, and business process automation, helping to improve decision making and drive digital transformation.
- Helps technical teams simplify data management and governance in complex hybrid and multicloud data landscapes while significantly reducing costs and risk.
The data fabric approach should enable organizations to better manage, govern, and use data to balance agility, speed, SLAs, and trust. Trust covers deep enforcement of governance, security, and compliance. There is also the total cost of ownership and performance (TCO/P), which covers integration costs, egress costs, bandwidth costs, processing costs versus performance, and so on. A data fabric can offer these benefits with orders of magnitude less complexity than is often seen across many enterprise infrastructures.
IBM Cloud Pak for Data: A Hybrid Cloud Data and AI Platform
IBM Cloud Pak for Data embodies everything you have just read about in a unified Enterprise Insight Platform (EIP) that runs on multiple vendors’ clouds and infrastructures. EIP is a term used by industry analysts and consultants as a category for describing integrated sets of data management, analytics, and development tools.
The first core tenet of Cloud Pak for Data is that you can run it anywhere, co-located with your infrastructure investments. This means you can deploy Cloud Pak for Data on the platforms of the major cloud vendors, as well as on the IBM Cloud. You can also deploy it on premises if you are developing a hybrid cloud approach. Finally, on the IBM Cloud, you can subscribe to Cloud Pak for Data as a Service if you need a fully managed option where you pay only for what you use. In short, Cloud Pak for Data gives organizations the deployment flexibility to run anywhere.
Figure 8.3: Cloud Pak for Data
Cloud Pak for Data is built on the foundation of Red Hat OpenShift. This provides the flexibility for customers to scale across any infrastructure using the leading open-source steward: Red Hat. Red Hat OpenShift is a Kubernetes-based platform that allows IBM to deploy software through a container-based model, delivering greater agility, control, and portability.
IBM’s Cloud Pak offerings all share a common control plane, which makes administration and integration of diverse services easy.
Cloud Pak for Data includes a set of preintegrated data services that allow you to collect information from any repository, such as databases, data lakes, and data warehouses. The design point here is for customers to leave the data in all the places where it already resides, while to its users the enterprise data appears to be in one spot.
Once all of an enterprise’s data has been connected, industry-leading data organization services can be deployed that allow for the development of an enterprise data catalog. This capability enables a “shop for data” type of experience and enforces governance across all data sources, thereby enabling data consumers to have a single place to go for all their data needs.
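The “shop for data” idea can be sketched in a few lines. The toy catalog below is not the product’s API; the asset names, locations, and clearance tags are invented for illustration. It shows the two properties described above: one place to search, and governance enforced at the catalog rather than at each individual source:

```python
# Toy "shop for data" catalog (illustrative only; all names are hypothetical).
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    name: str
    location: str       # where the data actually lives
    owner: str
    tags: frozenset     # governance classifications on the asset

CATALOG = [
    Asset("orders", "db2://sales-prod/ORDERS", "sales-ops", frozenset({"finance"})),
    Asset("customers", "s3://crm-lake/customers/", "crm-team", frozenset({"pii"})),
    Asset("web_clicks", "hive://lake/clickstream", "marketing", frozenset()),
]

def shop_for_data(keyword, user_clearances):
    """Return matching assets the user is entitled to see. Governance is
    enforced once, at the catalog, not separately at every source."""
    return [
        a for a in CATALOG
        if keyword in a.name and a.tags <= user_clearances
    ]

# An analyst with finance clearance but no PII clearance finds "orders"
# but never sees the PII-tagged "customers" asset.
print([a.name for a in shop_for_data("o", frozenset({"finance"}))])
```

A real enterprise catalog adds lineage, data-quality scores, and business glossary terms on top of this, but the shape of the interaction is the same: consumers search metadata in one place and receive only what policy allows.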
With your enterprise data connected and cataloged, Cloud Pak for Data presents a wide variety of data analysis tools out of the box. For example, there is a wealth of data science capabilities that cater to all skill levels (meaning no-code, low-code, and all code). Users can quickly grab data from the catalog and instantly start working toward generating insights in a common workflow built around the “project” concept.
For additional capabilities, a large set of extended services is available for Cloud Pak for Data that presents more-specialized data management and analytics capabilities. These range from powerful IBM solutions, like Planning Analytics with Watson, to solutions from IBM Partners that offer business ontology creation, open-source databases, and more.
Automation: The Key to Agility
Cloud Pak for Data takes automation to the next level. Watson Query capabilities allow you to leave your data where it resides and connect to all structured or unstructured data sources in your enterprise without data movement. Building on that data collection, AutoCatalog and AutoPrivacy supercharge data discovery and ensure enforcement of governance policies across many sources and users. On top of this, AutoAI makes it easy for data analysts and data scientists to generate new models in a fast, low-code manner with an award-winning graphical interface and design. Figure 8.4 summarizes this.
Figure 8.4: Automation capabilities within Cloud Pak for Data
Let’s dive a little deeper into these automation capabilities.
- Watson Query: A high-performance, universal query engine that simplifies the data landscape by enabling clients to use the same query across disparate data sources, including data warehouses, data lakes, and streaming data, thereby saving time and resources that would typically go into moving data and maintaining multiple query engines. In conjunction with the platform’s existing data virtualization capabilities, Watson Query empowers users to easily query data across hybrid, multi-cloud, and multi-vendor environments. Watson Query includes preintegrated data governance capabilities; thus data consumers are assured of the quality and validity of the data.
- AutoCatalog: Automates how data is discovered and classified to maintain a real-time catalog of data assets and their relationships across disparate data landscapes. A critical capability of the intelligent data fabric within the platform, AutoCatalog helps overcome the challenges faced by managing a complex hybrid and multi-cloud enterprise data landscape and helps ensure that data consumers can easily find and access the right data, at the right time, regardless of location.
- AutoPrivacy: Employs AI to intelligently automate the identification, monitoring, and enforcement of policies on sensitive data across the organization. AutoPrivacy is a key aspect of the universal data privacy framework available within IBM Cloud Pak for Data. Spanning the entire data and AI lifecycle, this framework allows business leaders to provide the self-service access data consumers need without sacrificing security or compliance. Build a better strategy for governance, risk, and compliance by eliminating compliance “blind spots” and minimizing risk.
- AutoAI: Automates data preparation, model development, and feature engineering to train and deploy top-performing models in minutes. Simplify AI lifecycle management to build models faster, accelerate deployment, and open up AI to broader skill sets.
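To make the automated-model-selection idea concrete, here is a minimal, hedged sketch of the general pattern behind tools like AutoAI, not IBM’s implementation: fit several candidate models, score each on a hold-out split, and keep the best performer automatically. The candidates and data are deliberately tiny:

```python
# Illustrative auto-model-selection loop (not AutoAI itself).

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_select(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the winner."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    scores = {name: mse(m, *holdout) for name, m in fitted.items()}
    best = min(scores, key=scores.get)
    return best, fitted[best], scores

train = ([1, 2, 3, 4], [5, 8, 11, 14])   # synthetic data: y = 3x + 2
holdout = ([5, 6], [17, 20])
best, model, scores = auto_select({"mean": fit_mean, "linear": fit_linear},
                                  train, holdout)
print(best, model(10))  # the linear candidate wins on the hold-out split
```

Real AutoML systems search over far richer spaces (algorithms, feature transformations, hyperparameters) and automate data preparation too, but the core loop, generate candidates, evaluate on held-out data, promote the best, is the same.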
Your Data and AI: How and Where You Want It
IBM’s open information architecture for AI is built upon Cloud Pak for Data on Red Hat OpenShift, built for a hybrid cloud world. What does this mean? In one word: flexibility. To further explain, consider the following:
- If your organization is in a place where you need to manage as little IT as possible, you can consume Cloud Pak for Data entirely through an as-a-service model by subscribing to the integrated family of data services on the IBM Cloud.
- If your organization needs the flexibility and control of running the data infrastructure in your own data center, or on IaaS (Infrastructure as a Service) from your preferred cloud vendor, you can deploy OpenShift and then Cloud Pak for Data on your local or cloud estate.
- If high performance and total control are needed, you can choose the Cloud Pak for Data System, which is a hyper-converged infrastructure (an optimized appliance) that combines compute, storage, and network services that are optimized for OpenShift and data and AI workloads.
Regardless of the form factor and the degree of management control needed, Cloud Pak for Data provides cloud-native data management services that modernize how businesses collect, organize, and analyze data and then infuse AI throughout their organizations.
In summary, Cloud Pak for Data is designed to provide a unified, integrated user experience to collect, organize, and analyze data and infuse AI throughout the enterprise. Much of the complexities of managing and orchestrating data and other artifacts can be abstracted through the data fabric architectural approach. Think of the data fabric as the “magic” that can help make more of an organization’s data, applications, and services ready for AI by automating and augmenting a lot of the steps that would otherwise have to be undertaken by large groups of architects, administrators, and data scientists.
Having read this extract, if you are interested in reading the whole book, you can download it at no cost here.