At IBM THINK in May, IBM watsonx was previewed to much fanfare. I was present and witnessed firsthand the buzz and excitement from our customers as they learned about our new AI and data platform.
IBM watsonx is a new AI and data platform that empowers enterprises to scale and accelerate the impact of AI across the business by leveraging data wherever it resides. IBM software products are embedding watsonx capabilities across digital labor, IT automation, security, sustainability, and application modernization to help unlock new levels of business value for clients.
IBM watsonx puts into the hands of businesses an AI development studio with access to IBM-curated and trained foundation models and open-source models, a data store that enables the gathering and cleansing of training and tuning data, and a toolkit for AI governance. Together, these provide a seamless end-to-end AI workflow that makes AI easier to adapt and scale.
The platform consists of three product sets (watsonx.ai, watsonx.data, and watsonx.governance) that help address these needs, as shown in the figure below.
IBM watsonx – Scaling and accelerating the impact of AI with trusted data.
On July 7, 2023, IBM watsonx became generally available. Here, I highlight the watsonx.ai and watsonx.data components of the AI and data platform.
IBM watsonx.ai is a next-generation enterprise studio for AI builders to train, test, tune, and deploy both traditional machine learning and new generative AI capabilities powered by foundation models, all through an open and intuitive user interface.
The AI studio provides a range of foundation models, training and tuning tools, and cost-effective infrastructure that facilitate the entire data and AI lifecycle, from data preparation to model development, deployment, and monitoring.
The studio also includes a foundation model library that gives users easy access to IBM-curated and trained foundation models. IBM's foundation models are trained on a large, curated set of enterprise data, backed by a robust filtering and cleansing process and auditable data lineage. These models are being trained not just on language but on a variety of modalities, including code, time-series data, tabular data, geospatial data, IT events data, and more.
The watsonx.ai studio builds upon Hugging Face’s open-source libraries and offers thousands of Hugging Face open models and datasets. This is part of IBM’s commitment to deliver an open ecosystem approach that allows organizations to leverage the best models and architecture for their unique business needs.
For many years, organizations have tried to manage a combination of on-premises and cloud-native data warehouses and bespoke data lakes, a mix that is common in enterprise architectures today, and juggling cost, siloed data, and data governance has been a constant challenge. A lakehouse attempts to combine the perceived cost benefits of a data lake with the data structure and data management capabilities of a data warehouse. That said, many first-generation lakehouses have constraints that limit their ability to address cost and complexity challenges. These often include:
- Single query engines set up to support limited workloads – typically just business intelligence or machine learning
- Cloud-only deployment, with no support for hybrid multicloud deployments
- Minimal governance and metadata capabilities to deploy across your entire ecosystem
IBM watsonx.data is designed to address these limitations as a fit-for-purpose data store built on an open lakehouse architecture optimized for governed data and AI workloads, supported by querying, governance, and open data formats to access and share data. A high-level architecture is shown below.
The solution is designed to manage workloads both on premises and across hybrid multicloud environments, leveraging internal and external datasets.
Through workload optimization, an organization using this solution can expect to significantly reduce data warehouse costs. Savings may vary depending on configurations, workloads, and vendors.
IBM watsonx.data allows users to access robust data through a single point of entry while applying multiple fit-for-purpose query engines to uncover valuable insights.
It also provides built-in governance tools, automation, and integrations with an organization’s existing databases and tools to simplify set-up and user experience.
IBM watsonx.data high-level architecture
Some of the key capabilities of watsonx.data are:
- Scales for business intelligence across all data with multiple high-performance query engines optimized for different workloads (for example, Presto, Spark, Db2, and Netezza).
- Shares data between these different engines.
- Uses shared common data storage across data lake and data warehouse functions, avoiding unnecessary, time-consuming ETL/ELT jobs.
- Reduces unnecessary data duplication and replication.
- Provides consistent governance, security and user experience across hybrid multiclouds.
- Leverages an open and flexible architecture built on open source without vendor lock-in.
- Can be deployed across hybrid cloud environments (on premises, private, public clouds) on multiple hyperscalers.
- Offers a wide range of prebuilt integration capabilities incorporating IBM data fabric capabilities.
- Offers organizations the flexibility to start their lakehouse implementation standalone and later expand to the IBM Cloud Pak for Data platform configuration.
- Provides global governance across all data in the enterprise by leveraging IBM data fabric capabilities.
- Is extensible through APIs, a value-add partner ecosystem, accelerators, and third-party solutions.
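To make the "multiple engines over one shared copy of the data" idea concrete, here is a minimal toy sketch in plain Python. It is not the watsonx.data API: `SharedTable`, `Catalog`, `bi_revenue_by_region`, and `ml_features` are hypothetical names invented for illustration, and a real deployment would have engines such as Presto or Spark reading open data formats from shared storage rather than a Python list. The point it shows is that a BI-style workload and an ML-style workload can both resolve the same table through a single catalog entry point without an ETL copy in between.

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical shared storage: a single copy of the data that every engine reads.
@dataclass
class SharedTable:
    name: str
    rows: list = field(default_factory=list)


# Hypothetical catalog acting as the single point of entry to shared tables.
class Catalog:
    def __init__(self):
        self._tables = {}

    def register(self, table):
        self._tables[table.name] = table

    def get(self, name):
        # Returns the same object every time; nothing is copied or replicated.
        return self._tables[name]


# Hypothetical "BI engine": aggregates revenue per region.
def bi_revenue_by_region(catalog, table_name):
    totals = {}
    for row in catalog.get(table_name).rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


# Hypothetical "ML engine": derives a normalized revenue feature per customer.
def ml_features(catalog, table_name):
    rows = catalog.get(table_name).rows
    avg = mean(r["revenue"] for r in rows)
    return [(r["customer"], r["revenue"] / avg) for r in rows]


catalog = Catalog()
catalog.register(SharedTable("sales", [
    {"customer": "a", "region": "EU", "revenue": 100.0},
    {"customer": "b", "region": "US", "revenue": 300.0},
    {"customer": "c", "region": "EU", "revenue": 200.0},
]))

print(bi_revenue_by_region(catalog, "sales"))  # {'EU': 300.0, 'US': 300.0}
print(ml_features(catalog, "sales"))           # [('a', 0.5), ('b', 1.5), ('c', 1.0)]
```

Both functions read the same `SharedTable` object, which is the toy analogue of avoiding the duplication and replication that separate warehouse and lake copies would require.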
I believe that IBM watsonx represents a major leap forward in simplifying how organizations consume AI, helping them accelerate their AI journeys and become AI value creators.
For more information, check out these two IBM blogs: