IBM Introduces a Reference Architecture for On-Premises AI

This week IBM announced an AI infrastructure Reference Architecture for on-premises AI deployments. The architecture promises to address the challenges organizations face when experimenting with AI PoCs, growing into multi-tenant production systems, and then expanding to enterprise scale while integrating with the organization’s existing IT infrastructure.

The reference architecture includes, according to IBM, a set of integrated software tools built on optimized, accelerated hardware to enable organizations to jump start AI and Deep Learning projects, speed time to model accuracy, and provide enterprise-grade security, interoperability, and support. IBM’s graphic above should give you the general picture.

Specifically, IBM’s AI reference architecture should support iterative, multi-stage, data-driven processes or workflows that entail specialized knowledge, skills, and, usually, a new compute and storage infrastructure. Still, these projects have many attributes that are familiar to traditional CIOs and IT departments.

The first of these is that the results are only as good as the data going into the model, and model development depends on having a lot of data in the format expected by the deep learning framework. Surprised? You have been hearing this for decades as GIGO (Garbage In, Garbage Out). The AI process is also iterative: it repeatedly loops through data sets and tunings to develop more accurate models, then compares the model’s results on new data with the original business or technical requirements to refine the approach. In this sense, the AI reference model is no different from IT 101, an intro course for wannabe IT folks.
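The iterative loop described above can be sketched in a few lines of Python. This is a toy illustration only, not anything from IBM’s reference architecture: the "model" is a single threshold, and the names make_dataset, accuracy, and tune, along with the 0.99 accuracy target, are made up for the example.

```python
def make_dataset(n=200):
    """Toy labeled data: x runs from 0 to just under 10, label is 1 when x > 6."""
    return [(i / 20, 1 if i / 20 > 6.0 else 0) for i in range(n)]


def accuracy(threshold, dataset):
    """Fraction of records a one-parameter threshold model classifies correctly."""
    correct = sum(1 for x, label in dataset if (1 if x > threshold else 0) == label)
    return correct / len(dataset)


def tune(dataset, target_accuracy, candidates):
    """Try each candidate tuning in turn and stop once the requirement is met."""
    best_threshold, best_acc = None, -1.0
    history = []
    for threshold in candidates:
        acc = accuracy(threshold, dataset)
        history.append((threshold, acc))
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
        if best_acc >= target_accuracy:  # compare against the original requirement
            break
    return best_threshold, best_acc, history


if __name__ == "__main__":
    data = make_dataset()
    threshold, acc, history = tune(data, 0.99, [float(t) for t in range(11)])
    print(f"stopped after {len(history)} iterations: threshold={threshold}, accuracy={acc}")
```

Real projects loop over far more than one parameter, but the shape is the same: train or tune, measure, compare against the requirement, repeat.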

But AI doesn’t stay simplistic for long. As the reference architecture puts it, AI is a sophisticated, complex process that requires specialized software and infrastructure. That’s where IBM’s PowerAI Platform comes in. Most organizations start with small pilot projects bound to a few systems and data sets but grow from there.

As projects grow beyond the first test systems, however, it is time to build up an appropriate storage and networking infrastructure. This will allow the project to sustain growth and eventually support a larger organization.

The trickiest part of AI, and the part that takes inspired genius to conceive, test, and train, is the model. The accuracy and quality of a trained AI model are directly affected by the quality and quantity of data used for training. The data scientist needs to understand the problem they are trying to solve and then find the data needed to build a model that solves the problem.

Data for AI is separated into a few broad sets: the data used to train and test the models, the data that is analyzed by the models, and the archived data that may be reused. This data can come from many different sources, such as traditional organizational data from ERP systems, databases, data lakes, sensors, collaborators and partners, public data, mobile apps, social media, and legacy data. It may be structured or unstructured and stored in many formats, such as file, block, object, Hadoop Distributed File System (HDFS), or something else.
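As a small illustration of the first of those sets, here is a minimal Python sketch of dividing records into training and test data. The function name split_train_test, the 20% test fraction, and the fixed seed are assumptions made for the example, not something IBM specifies.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_train_test(range(100))
    print(len(train), "training records,", len(test), "test records")
```

Holding the test records out of training is what lets you judge whether the model works on data it has not seen.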

Many AI projects begin as a big data problem. Regardless of how it starts, a large volume of data is needed, and it inevitably needs preparation, transformation, and manipulation. But it doesn’t stop there.

AI models require the training data to be in a specific format; each model has its own, usually different, format. Invariably the initial data is nowhere near those formats. Preparing the data is often one of the largest organizational challenges, not only in complexity but also in the amount of time it takes to transform the data into a format that can be analyzed. Many data scientists, notes IBM, claim that over 80% of their time is spent in this phase and only 20% on the actual process of data science. Data transformation and preparation is typically a highly manual, serial set of steps: identifying and connecting to data sources, extracting to a staging server, tagging the data, and using tools and scripts to manipulate the data. Hadoop is often a significant source of this raw data, and Spark typically provides the analytics and transformation engines used along with advanced AI data matching and traditional SQL scripts.
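To make that serial extract, clean, and transform sequence concrete, here is a minimal Python sketch using only the standard library. Everything in it is hypothetical: the customer fields, the sample rows, and the choice of a one-hot region feature are invented for illustration, and a real pipeline at scale would more likely run on something like Spark.

```python
import csv
import io

# Hypothetical raw extract with stray whitespace, mixed case, and a missing value.
RAW_CSV = """customer_id,region,monthly_spend,churned
 101 ,east, 42.50 ,yes
102,WEST,30,no
103,East,17.00,YES
104,north,,no
"""


def extract(text):
    """Step 1: read the raw rows from the staged source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(row):
    """Step 2: normalize fields; return None for rows missing a required value."""
    spend = row["monthly_spend"].strip()
    if not spend:
        return None
    return {
        "customer_id": int(row["customer_id"].strip()),
        "region": row["region"].strip().lower(),
        "monthly_spend": float(spend),
        "churned": row["churned"].strip().lower() == "yes",
    }


def to_training_example(record, regions):
    """Step 3: convert a record to the numeric (features, label) form a model expects."""
    one_hot = [1.0 if record["region"] == r else 0.0 for r in regions]
    return [record["monthly_spend"]] + one_hot, 1 if record["churned"] else 0


def prepare(text):
    """Run the steps in order, dropping rows that fail cleaning."""
    cleaned = [r for r in map(clean, extract(text)) if r is not None]
    regions = sorted({r["region"] for r in cleaned})
    return [to_training_example(r, regions) for r in cleaned]


if __name__ == "__main__":
    for features, label in prepare(RAW_CSV):
        print(features, label)
```

Even in this tiny case most of the code is cleanup rather than modeling, which is the point behind the 80% figure.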

There are two other considerations in this phase: 1) data storage and access and 2) the speed of execution. For this (don’t be shocked) IBM recommends Spectrum Scale to provide multi-protocol support with a native HDFS connector, which can centralize and analyze data in place rather than wasting time copying and moving data. But you may have your preferred platform.

IBM’s reference architecture provides a place to start. A skilled IT group will eventually tweak IBM’s reference architecture, making it their own.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog.
