Posts Tagged ‘Big Data’

IBM teams with Cloudera and Hortonworks 

July 11, 2019

Dancing Dinosaur has a friend on the West coast who finally left IBM after years of complaining, swearing never to return, and has been happily working at Cloudera ever since. IBM and Cloudera this week announced a strategic partnership to develop joint go-to-market programs designed to bring advanced data and AI solutions to more organizations across the expansive Apache Hadoop ecosystem.

Graphic representing a single solution for big data analytics

Deploy a single solution for big data

The agreement builds on the long-standing relationship between IBM and Hortonworks, which merged with Cloudera this past January to create integrated solutions for data science and data management. The new agreement builds on the integrated solutions and extends them to include the Cloudera platform. “This should stop the big-data-is-dead thinking that has been cropping up,” he says, putting his best positive spin on the situation.

Unfortunately, my West Coast buddy may be back at IBM sooner than he thinks. With IBM finalizing its $34 billion Red Hat acquisition yesterday, it would be a comparatively small additional outlay to simply buy the merged Cloudera/Hortonworks and own the whole big data-cloud capabilities block outright.

As IBM sees it, the companies have partnered to offer an industry-leading, enterprise-grade Hadoop distribution plus an ecosystem of integrated products and services – all designed to help organizations achieve faster analytic results at scale. As a part of this partnership, IBM promises to:

  • Resell and support Cloudera products
  • Sell and support Hortonworks products under a multi-year contract
  • Provide migration assistance to future unified Cloudera/Hortonworks products
  • Deliver the benefits of the combined IBM and Cloudera collaboration and investment in the open source community, along with a commitment to better support analytics initiatives from the edge to AI.

IBM also will resell the Cloudera Enterprise Data Hub, Cloudera DataFlow, and Cloudera Data Science Workbench. In response, Cloudera will begin to resell IBM’s Watson Studio and BigSQL.

“By teaming more strategically with IBM we can accelerate data-driven decision making for our joint enterprise customers who want a hybrid and multi-cloud data management solution with common security and governance,” said Scott Andress, Cloudera’s Vice President of Global Channels and Alliances in the announcement. 

Cloudera enables organizations to transform complex data into clear and actionable insights. It delivers an enterprise data cloud for any data, anywhere, from the edge to AI. One obvious question: how long until IBM wants to include Cloudera as part of its own hybrid cloud? 

But IBM isn’t stopping here. It also just announced new storage solutions across AI and big data, modern data protection, hybrid multicloud, and more. These innovations will allow organizations to leverage more heterogeneous data sources and data types for deeper insights from AI and analytics, expand their ability to consolidate rapidly expanding data on IBM’s object storage, and extend modern data protection to support more workloads in hybrid cloud environments.

The key is IBM Spectrum Discover, metadata management software that provides data insight for petabyte-scale unstructured storage. The software connects to IBM Cloud Object Storage and IBM Spectrum Scale, enabling it to rapidly ingest, consolidate, and index metadata for billions of files and objects. It provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. Combining that with Cloudera and Hortonworks on IBM’s hybrid cloud should give you a powerful data analytics solution.
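
The pattern here, scan storage once, extract per-object metadata into an index, then answer questions from the index without touching the data itself, is easy to see in miniature. The sketch below is purely illustrative; the function names and record fields are invented and bear no relation to Spectrum Discover's actual schema or APIs.

```python
# Toy sketch of the metadata-cataloging pattern: walk storage, collect
# per-object metadata into an index, then query the index, not the data.
import os
import tempfile

def index_metadata(root):
    """Walk a file tree and build a list of metadata records."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            index.append({
                "path": path,
                "extension": os.path.splitext(name)[1],
                "bytes": stat.st_size,
            })
    return index

def bytes_by_extension(index):
    """Aggregate capacity by file type -- a typical data-steward query."""
    totals = {}
    for record in index:
        ext = record["extension"]
        totals[ext] = totals.get(ext, 0) + record["bytes"]
    return totals

# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name, payload in [("a.csv", b"x" * 100), ("b.csv", b"x" * 50), ("c.log", b"x" * 10)]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(payload)
    idx = index_metadata(root)
    print(sorted(bytes_by_extension(idx).items()))  # [('.csv', 150), ('.log', 10)]
```

The point of the design is the same at any scale: once metadata lives in an index, classification and capacity questions become cheap lookups instead of full scans of the underlying objects.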

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com. 

 

Syncsort Drives IBMi Security with AI

May 2, 2019

The technology security landscape looks increasingly dangerous. Much of the uncertainty revolves around AI, whose impact on security is not yet fully clear. The hope, of course, is that AI will make security more efficient and effective. However, the security bad actors can also jump on AI to advance their own schemes. Like a cyber version of the nuclear arms race, this has been an ongoing battle for decades. The industry has to cooperate, specifically by sharing information, and hope the good guys can stay a step ahead.

In the meantime, vendors like IBM and, most recently, Syncsort have been stepping up to the latest challenges. Syncsort, for example, earlier this month launched its Assure Security to address the increasing sophistication of cyber attacks and expanding data privacy regulations. In surprising ways, it turns out, data privacy and AI are closely related in the AI security battle.

Syncsort, a leader in Big Iron-to-Big Data software, announced Assure Security, which combines access control, data privacy, compliance monitoring, and risk assessment into a single product. Together, these capabilities help security officers, IBMi administrators, and Db2 administrators address critical security challenges and comply with new regulations meant to safeguard and protect the privacy of data.

And it clearly is coming at the right time. According to Privacy Rights Clearinghouse, a non-profit corporation with a mission to advocate for data privacy, there were 828 reported security incidents in 2018, resulting in the exposure of over 1.37 billion records of sensitive data. As regulations to help protect consumer and business data become stricter and more numerous, organizations must build more robust data governance and security programs to keep the data from being exploited by bad actors for nefarious purposes. The industry already has scrambled to comply with GDPR and the New York Department of Financial Services cybersecurity regulations, and it now must prepare for the GDPR-like California Consumer Privacy Act, which takes effect January 1, 2020.

In its own survey, Syncsort found security is the number one priority among IT pros with IBMi systems. “Given the increasing sophistication of cyber attacks, it’s not surprising 41 percent of respondents reported their company experienced a security breach and 20 percent more were unsure if they even had been breached,” said David Hodgson, CPO, Syncsort. The company’s new Assure Security product leverages its wealth of IBMi security technology and expertise to help organizations address their highest-priority challenges. This includes protecting against vulnerabilities introduced by new, open-source methods of connecting to IBMi systems, adopting new cloud services, and complying with expanded government regulations.

Of course, IBM hasn’t been sleeping through this. The company continues to push various permutations of Watson to tackle the AI security challenge. For example, IBM leverages AI to gather insights and use reasoning to identify relationships between threats, such as malicious files, suspicious IP addresses, or even insiders. This analysis takes seconds or minutes, allowing security analysts to respond to threats up to 60 times faster.

It also relies on AI to eliminate time-consuming research tasks and provides curated analysis of risks, which reduces the amount of time security analysts require to make the critical decisions and launch an orchestrated response to counter each threat. The result, which IBM refers to as cognitive security, combines the strengths of artificial intelligence and human intelligence.

Cognitive AI, in effect, learns with each interaction to proactively detect and analyze threats, providing actionable insights that help security analysts make informed decisions. Such cognitive security, let’s hope, combines the strengths of artificial intelligence with human judgment.

Syncsort’s Assure Security specifically brings together the best-in-class IBMi security capabilities the company has acquired into an all-in-one solution, with the flexibility for customers to license individual modules. The resulting product includes:

  • Assure Compliance Monitoring quickly identifies security and compliance issues with real-time alerts and reports on IBMi system activity and database changes.
  • Assure Access Control provides control of access to IBMi systems and their data through a varied bundle of capabilities.
  • Assure Data Privacy protects IBMi data at-rest and in-motion from unauthorized access and theft through a combination of NIST-certified encryption, tokenization, masking, and secure file transfer capabilities.
  • Assure Security Risk Assessment examines over a dozen categories of security values, open ports, power users, and more to address vulnerabilities.
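
To make the data-privacy bullet concrete, here is a toy sketch of two of the techniques Assure Data Privacy bundles, masking and tokenization. This is purely illustrative and is not the product's implementation; the function and class names are invented, and real tokenization vaults add encryption, access control, and durable storage.

```python
# Illustrative only: masking hides all but the last four digits for display;
# tokenization replaces the real value with a random surrogate, keeping the
# mapping in a protected vault so only authorized code can reverse it.
import secrets

def mask(card_number: str) -> str:
    """Show only the last four digits, e.g. for reports and screens."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

class TokenVault:
    """Swap sensitive values for random tokens; only the vault reverses them."""
    def __init__(self):
        self._by_token = {}

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)  # random surrogate, reveals nothing
        self._by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]

print(mask("4111111111111111"))  # ************1111

vault = TokenVault()
t = vault.tokenize("4111111111111111")
assert t != "4111111111111111"               # token leaks nothing
assert vault.detokenize(t) == "4111111111111111"  # vault can recover it
```

The design distinction matters for compliance: masking is one-way and safe for any audience, while tokenization is reversible but only through the vault, which becomes the single component that must be locked down.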

It probably won’t surprise anyone, but the AI security situation is not going to be cleared up soon. Expect to see a steady stream of headlines around security hits and misses over the next few years. Just hope it will get easier to separate the good guys from the bad actors and that the lessons will be clear.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com.

IBM Joins with Harley-Davidson for LiveWire

March 1, 2019

I should have written this piece 40 years ago as a young man fresh out of grad school. Then I was dying for a 1200cc Harley Davidson motorcycle. My mother was dead set against it—she wouldn’t even allow me to play tackle football and has since been vindicated (You win on that, mom.). My father, too, was opposed and wouldn’t help pay for it. I had to settle for a puny little motor scooter that offered zero excitement.

In the decades since I graduated, Harley’s fortunes have plummeted as the HOG (Harley Owners Group) community aged out and few youngsters have picked up the slack. The 1200cc bike I once lusted after probably is now too heavy for me to handle. So, what is Harley to do? Redefine its classic American brand with an electric model, LiveWire.

Courtesy: Harley Davidson, IBM

With LiveWire, Harley expects to remake the motorcycle as a cloud-connected machine and promises to deliver new products for fresh motorcycle segments, broaden engagement with the brand, and strengthen the H-D dealer network. It also boldly proclaimed that Harley-Davidson will lead the electrification of motorcycling.

According to the company, Harley’s LiveWire will leverage H-D Connect, a service (available in select markets) built on the IBM Cloud. This will enable it to deliver new mobility and concierge services today and leverage an expanding use of IBM AI, analytics, and IoT to enhance and evolve the rider’s experience. In order to capture this next generation of bikers, Harley is working with IBM to transform the everyday experience of riding through the latest technologies and features IBM can deliver via the cloud.

Would DancingDinosaur, an aging Harley enthusiast, plunk down the thousands it would take to buy one of these? Since I rarely use my smartphone to do anything more than check email and news, I am probably not a likely prospect for LiveWire.

Will LiveWire save Harley? Maybe; it depends on what the promised services will actually deliver. Already, I can access a wide variety of services through my car but, other than Waze, I rarely use any of those.

According to the joint IBM-Harley announcement, a fully cellular-connected electric motorcycle needed a partner that could deliver mobility solutions that would meet riders’ changing expectations, as well as enhance security. With IBM, Harley hopes to strike a balance between using data to create both intelligent and personal experiences while maintaining privacy and security, said Marc McAllister, Harley-Davidson VP Product Planning and Portfolio in the announcement.

So, based on this description, are you ready to jump to LiveWire? You probably need more details. So far, IBM and Harley have identified only three:

  1. Powering The Ride: LiveWire riders will be able to check bike vitals at any time and from any location. Available information includes range, battery health, and charge level. Motorcycle status features will also support the needs of the electric bike, such as the location of charging stations, and riders can see their motorcycle’s current map location. Identifying charging stations could be useful.
  2. Powering Security: An alert will be sent to the owner’s phone if the motorcycle has been bumped, tampered with, or moved. GPS-enabled stolen-vehicle assistance will provide peace of mind that the motorcycle’s location can be tracked. (Requires law enforcement assistance. Available in select markets.)
  3. Powering Convenience: Reminders about upcoming motorcycle service requirements and other care notifications will be provided. In addition, riders will receive automated service reminders as well as safety or recall notifications.

“The next generation of Harley-Davidson riders will demand a more engaged and personalized customer experience,” said Venkatesh Iyer, Vice President, North America IoT and Connected Solutions, Global Business Services, IBM. Introducing enhanced capabilities, he continues, via the IBM Cloud will not only enable new services immediately, but will also provide a roadmap for the journey ahead. (Huh?)

As much as DancingDinosaur aches for Harley to come roaring back with a story that will win the hearts of the HOG users who haven’t already drifted away, Harley will need more than the usual buzzwords, trivial apps, and cloud hype.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com.

Are Quantum Computers Even Feasible?

November 29, 2018

IBM has toned down its enthusiasm for quantum computing. Even last spring it was already backing off a bit at Think 2018. Now the company believes that quantum computing will augment classical computing to potentially open doors that it once thought would remain locked indefinitely.

First IBM Q computation center

With its Bristlecone announcement, Google trumped IBM with 72 qubits. But debating a few dozen qubits more or less may prove irrelevant. A number of quantum physics researchers have recently published papers suggesting that useful quantum computing may be decades away.

Mikhail Dyakonov makes that case in a piece titled “The Case Against Quantum Computing,” which appeared last month in IEEE Spectrum. Dyakonov does research in theoretical physics at the Charles Coulomb Laboratory at the University of Montpellier, in France.

As Dyakonov explains: In quantum computing, the classical two-state circuit element (the transistor) is replaced by a quantum element called a quantum bit, or qubit. Like the conventional bit, it also has two basic states. But you already know this because DancingDinosaur covered it here and several times since.

But this is what you might not know: With the quantum bit, those two states aren’t the only ones possible. That’s because the spin state of an electron is described as a quantum-mechanical wave function. And that function involves two complex numbers, α and β (called quantum amplitudes), which, being complex numbers, have real parts and imaginary parts. Those complex numbers, α and β, each have a certain magnitude, and, according to the rules of quantum mechanics, their squared magnitudes must add up to 1.

Dyakonov continues: In contrast to a classical bit a qubit can be in any of a continuum of possible states, as defined by the values of the quantum amplitudes α and β. This property is often described by the statement that a qubit can exist simultaneously in both of its ↑ and ↓ states. Yes, quantum mechanics often defies intuition.
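
Dyakonov's description is plain quantum mechanics and can be checked numerically. The sketch below is a minimal illustration of the math only, not of any vendor's hardware; the helper name is invented.

```python
# A qubit is a pair of complex amplitudes (alpha, beta) whose squared
# magnitudes sum to 1; those squared magnitudes are the probabilities of
# measuring the up or down state.
import math

def is_normalized(alpha: complex, beta: complex, tol: float = 1e-9) -> bool:
    """Check the rule |alpha|^2 + |beta|^2 = 1."""
    return abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1.0) < tol

# The two classical-looking states: all amplitude on up, or all on down.
assert is_normalized(1, 0)
assert is_normalized(0, 1)

# An equal superposition -- the "simultaneously up and down" state.
h = 1 / math.sqrt(2)
assert is_normalized(h, h)

# Amplitudes are complex, so they carry phase as well as magnitude.
alpha, beta = complex(0.6, 0.0), complex(0.0, 0.8)
assert is_normalized(alpha, beta)
print(abs(alpha) ** 2, abs(beta) ** 2)  # measurement probabilities, ~0.36 and ~0.64
```

Because alpha and beta range over a continuum, a single qubit already has infinitely many valid states, which is exactly the contrast with a classical bit that the article is drawing.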

So while IBM, Google, and other computer providers quibble about 50 qubits or 72 or even 500 qubits, to Dyakonov this is ridiculous. The real number of qubits needed will be astronomical, as he explains: Experts estimate that the number of qubits needed for a useful quantum computer, one that could compete with your laptop in solving certain kinds of interesting problems, is between 1,000 and 100,000. So the number of continuous parameters describing the state of such a useful quantum computer at any given moment must be at least 2^1,000, which is to say about 10^300. That’s a very big number indeed; much greater than the number of subatomic particles in the observable universe.

Just in case you missed the math, he repeats: A useful quantum computer [will] need to process a set of continuous parameters that is larger than the number of subatomic particles in the observable universe.
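
The arithmetic is easy to verify: an n-qubit system is described by 2^n complex amplitudes, so at the low end of Dyakonov's 1,000-qubit estimate the state already involves 2^1,000 parameters, on the order of 10^301, dwarfing the roughly 10^80 subatomic particles commonly estimated for the observable universe.

```python
# Checking the exponent: count the decimal digits of 2**1000.
n_qubits = 1000
n_amplitudes = 2 ** n_qubits  # one complex amplitude per basis state

digits = len(str(n_amplitudes))
print(digits)  # 302, so 2**1000 is on the order of 10**301
```

The exact exponent (301 vs. the article's rounder 10^300) doesn't change the argument; either way the parameter count outruns anything physical by hundreds of orders of magnitude.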

Before you run out to invest in a quantum computer with the most qubits you can buy, you would be better served joining IBM’s Q Experience and experimenting with it on IBM’s nickel. Let them wrestle with the issues Dyakonov brings up.

Then, Dyakonov concludes: I believe that such experimental research is beneficial and may lead to a better understanding of complicated quantum systems. I’m skeptical that these efforts will ever result in a practical quantum computer. Such a computer would have to be able to manipulate—on a microscopic level and with enormous precision—a physical system characterized by an unimaginably huge set of parameters, each of which can take on a continuous range of values. Could we ever learn to control the more than 10^300 continuously variable parameters defining the quantum state of such a system? My answer is simple. No, never.

I hope my high school science teacher who enthusiastically introduced me to quantum physics has long since retired or, more likely, passed on. Meanwhile, DancingDinosaur expects to revisit quantum regularly in the coming months or even years.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com.

GAO Blames Z for Government Inefficiency

October 19, 2018

Check out the GAO report from May 2016 here. The Feds spent more than 75 percent of the total amount budgeted for information technology (IT) for fiscal year 2015 on operations and maintenance (O&M). In a related report, the IRS reported it used assembly language code and COBOL, both developed in the 1950s, for IMF and IDRS. Unfortunately, the GAO uses the one word “mainframe” to lump outdated UNISYS mainframes together with the modern, supported, and actively developed IBM Z mainframes, notes Ross Mauri, IBM general manager, Z systems.

Mainframes-mobile in the cloud courtesy of Compuware

The GAO repeatedly used “mainframe” to refer to outdated UNISYS mainframes alongside the latest advanced IBM Z mainframes. COBOL, too, maintains active skills and training programs at many institutions and receives investment across many industries. In addition to COBOL, the IBM z14 also runs Java, Swift, Go, Python, and other open languages to enable modern application enhancement and development. Does the GAO know that?

In a recent report, the GAO recommends moving to supported modern hardware. IBM agrees. The Z, however, does not expose mainframe investments to a rise in procurement and operating costs, nor to skilled staff issues, Mauri continued.

Three investments the GAO reviewed in operations and maintenance clearly appear to be legacy investments facing significant risks due to their reliance on obsolete programming languages, outdated hardware, and a shortage of staff with critical skills. For example, the IRS reported that it used assembly language code and COBOL (both developed in the 1950s) for IMF and IDRS. What are these bureaucrats smoking?

The GAO also seems confused about the Z and the cloud. IBM Cloud Private is designed to run on Linux-based Z systems to take full advantage of the cloud through open containers while retaining the inherent benefits of Z hardware—security, availability, scalability, reliability; all the -ilities enterprises have long relied on the Z for. The GAO seems unaware that the Z’s automatic pervasive encryption immediately encrypts everything at rest or in transit. Furthermore, the GAO routinely treats COBOL as a deficiency while ISVs and other signatories of the Open Letter consider it a modern, optimized, and actively supported programming language.

The GAO apparently isn’t even aware of IBM Cloud Private. IBM Cloud Private is compatible with leading IT systems manufacturers and has been optimized for IBM Z. All that you need to get started with the cloud is the starter kit available for IBM OpenPOWER LC (Linux) servers, enterprise Power Systems, and Hyperconverged Systems powered by Nutanix. You don’t even need a Z; just buy a low cost OpenPOWER LC (Linux) server online and configure it as desired.

Here is part of the letter that Compuware sent to the GAO, Federal CIOs, and members of Congress. It’s endorsed by several dozen members of the IT industry. The full letter is here:

In light of a June 2018 GAO report to the Internal Revenue Service suggesting the agency’s mainframe- and COBOL-based systems present significant risks to tax processing, we the mainframe IT community—developers, scholars, influencers and inventors—urge the IRS and other federal agencies to:

  • Reinvest in and modernize the mainframe platform and the mission-critical applications which many have long relied upon.
  • Prudently consider the financial risks and opportunity costs associated with rewriting and replacing proven, highly dependable mainframe applications, for which no “off-the-shelf” replacement exists.
  • Understand the security and performance requirements of these mainframe applications and data and the risk of migrating to platforms that were never designed to meet such requirements.

The Compuware letter goes on to state: In 2018, the mainframe is still the world’s most reliable, performant and securable platform, providing the lowest cost high-transaction system of record. Regarding COBOL it notes that since 2017 IBM z14 supports COBOL V6.2, which is optimized bi-monthly.

Finally, about attracting new COBOL workers: COBOL is as easy to work with as any other language. In fact, open source Zowe has demonstrated appeal to young techies, providing solutions for development and operations teams to securely manage, control, script, and develop on the mainframe like any other cloud platform. What don’t they get?

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com.

Can IBM find a place for Watson?

September 7, 2018

After beating two human Jeopardy champions in 2011, IBM’s Watson has been hard pressed to come up with a comparable winning streak. Initially IBM appeared to expect its largest customers to buy richly configured Power servers to run Watson on prem. When it didn’t get enough takers, the company moved Watson to the cloud, where companies could lease it for major knowledge-driven projects. When that didn’t catch on, IBM started to lease Watson’s capabilities by the drink, promising to solve problems in onesies and twosies.

Jeopardy champs lose to Watson

Today Watson is promising to speed AI success through IBM’s Watson Knowledge Catalog. As IBM puts it: IBM Watson Knowledge Catalog powers intelligent, self-service discovery of data, models, and more; activating them for artificial intelligence, machine learning, and deep learning. Access, curate, categorize and share data, knowledge assets, and their relationships, wherever they reside.

DancingDinosaur has no doubt that Watson is stunning technology and has been rooting for its success since that first Jeopardy round seven years ago. Over that time, Watson and IBM have become a case study in how not to price, package, and market powerful yet expensive technology. The Watson Knowledge Catalog is yet another pricing and packaging experiment.

Based on the latest information online, Watson Knowledge Catalog is priced according to the number of provisioned catalogs and discovery connections. There are two plans available: Lite and Professional. The Lite plan allows 1 catalog and 5 free discovery connections while the Professional plan provides unlimited numbers of both. Huh? This statement begs for clarification, and there probably is a lot of information and fine print required to answer the numerous questions the description raises, but life is too short for DancingDinosaur to rummage around on the Watson Knowledge Catalog site to look for answers. Doesn’t this seem like something Watson itself should be able to clarify with a single click?

But no, that is too easy. Instead IBM takes the high road, which DancingDinosaur calls the education track. Notes Jay Limburn, Senior Technical Staff Member and IBM Offering Manager: there are two main challenges that might impede you from realizing the true value of your data and slow your journey to adopting artificial intelligence (AI). They are 1) inefficient data management and 2) finding the right tools for all data users.

Actually, the issues start even earlier. In attempting AI most established organizations start at a disadvantage, notes IBM. For example:

  • Most enterprises do not know what and where their data is
  • Data science and compliance teams are handicapped by the lack of data accessibility
  • Enterprises with legacy data are even more handicapped than digitally savvy startups
  • AI projects will expose problems with limited data and poor quality; many will fail for that reason alone.
  • The need to differentiate through monetization increases in importance with AI

These are not new. People have been whining about this since the most rudimentary data mining attempts were made decades ago. If there is a surprise it is that they have not been resolved by now.

Or maybe they finally have with the IBM Watson Knowledge Catalog. As IBM puts it, the company will deliver what promises to be the ultimate data catalog, one that actually illuminates data:

  • Knows what data your enterprise has
  • Where it resides
  • Where it came from
  • What it means
  • Provides quick access to it
  • Ensures protection of use
  • Exploits machine learning for intelligence and automation
  • Enables data scientists, data engineers, stewards, and business analysts
  • Is embeddable everywhere for free, with premium features available in paid editions

OK, after seven years Watson may be poised to deliver, and it has little to do with Jeopardy but rather with a rapidly growing data catalog market. According to a Research and Markets report, the data catalog market is expected to grow from $210 million in 2017 to $620 million by 2022. How many sales of the Professional version would it take to get IBM a leading share?

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog. See more of his work at technologywriter.com and here.

Travelport and IBM launch industry AI travel platform

August 24, 2018

Uh oh, if you have been a little sloppy with travel expenses, it’s time to clean up your travel act before AI starts monitoring your reimbursed travel. IBM and Travelport are teaming up to offer the industry’s first AI-based travel platform to intelligently manage corporate travel spend while leveraging IBM Watson capabilities to unlock previously unavailable data insights.

As IBM explains it, the new travel platform will be delivered via the IBM Cloud and exploits IBM Watson capabilities to intelligently track, manage, predict and analyze travel costs to fundamentally change how companies manage and optimize their travel programs. Typically, each work group submits its own travel expenses and reconciliation and reimbursement can be handled by different groups.

With annual global business travel spend estimated to reach a record $1.2 trillion this year, as projected by the Global Business Travel Association, corporate travel managers need new ways to reduce costs. That requires consolidating and normalizing all the information. Currently, for businesses to get a full picture of travel patterns, a travel manager might have to sift through data silos from travel agencies, cards, expense systems, and suppliers for end-to-end visibility of spend and compliance across all travel subcategories. This, however, is usually undertaken from an historical view rather than in real time, which is one reason why reimbursement can take so long. As an independent contractor, DancingDinosaur generally has to submit travel expenses at the end of the project and wait forever for payment.

IBM continues: The new platform, dubbed Travel Manager, features advanced artificial intelligence, and provides cognitive computing and predictive data analytics using what-if type scenarios, while integrating with travel and expense data to help travel management teams, procurement category managers, business units, finance, and human resource departments optimize their travel program, control spend, and enhance the end-traveler experience. Maybe they will even squeeze independent contractors into the workflow.

The special sauce in all of this results from how IBM combines data with Travelport, a travel commerce platform on its own, to produce IBM Travel Manager as an AI platform that oversees corporate travel expenses. In the process, IBM Travel Manager gives users complete, unified access to previously siloed information, which, when combined with travel data from the Travelport global distribution system (GDS), can then be used to create real-time predictive analytics recommending how, say, adjustments in travel booking behavior patterns can positively impact a company’s travel budget.
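
The unglamorous first step behind that unified access is schema normalization: each silo reports spend in its own shape, and nothing can be analyzed until the records share one. The sketch below is a toy version of that step; the feed names and fields are invented for illustration and have nothing to do with Travel Manager's or Travelport's actual data formats.

```python
# Toy consolidation: three silos, three schemas, one normalized record shape.
agency_feed = [{"traveler": "kim", "fare_usd": 400.0}]
card_feed = [{"cardholder": "kim", "amount": 120.5, "currency": "USD"}]
expense_feed = [{"employee": "kim", "total": 75.25}]

def normalize():
    """Map each silo's fields onto one common schema."""
    records = []
    for r in agency_feed:
        records.append({"traveler": r["traveler"], "usd": r["fare_usd"], "source": "agency"})
    for r in card_feed:
        records.append({"traveler": r["cardholder"], "usd": r["amount"], "source": "card"})
    for r in expense_feed:
        records.append({"traveler": r["employee"], "usd": r["total"], "source": "expense"})
    return records

def spend_by_traveler(records):
    """Once normalized, cross-silo analytics is a simple aggregation."""
    totals = {}
    for r in records:
        totals[r["traveler"]] = totals.get(r["traveler"], 0.0) + r["usd"]
    return totals

print(spend_by_traveler(normalize()))  # {'kim': 595.75}
```

Only after this step can the predictive layer do anything interesting; the analytics are only as complete as the set of silos the normalizer covers, which is why the end-to-end visibility claim matters.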

Travelport, itself, is a heavyweight in the travel industry. It relies on technology to make the experience of buying and managing travel better. Through its travel commerce platform it provides distribution, technology, payment and other capabilities for the $7 trillion global travel and tourism industry. The platform facilitates travel commerce by connecting the world’s leading travel providers with online and offline travel buyers in a proprietary (B2B) travel marketplace.

The company helps with all aspects of the travel supply chain from airline merchandising, hotel content and distribution, mobile commerce to B2B payments. Last year its platform processed over $83 billion of travel spend, helping its customers maximize the value of every trip.

IBM Travel Manager combines and normalizes data from diverse sources, allowing for more robust insights and benchmarking than other reporting solutions. It also taps AI to unlock previously unavailable insights from multiple internal and external data sources. The product is expected to be commercially available to customers through both IBM and Travelport.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog. See more of his work at technologywriter.com and here.

IBM Introduces a Reference Architecture for On-Premise AI

June 22, 2018

This week IBM announced an AI infrastructure Reference Architecture for on-premises AI deployments. The architecture promises to address the challenges organizations face experimenting with AI PoCs, growing into multi-tenant production systems, and then expanding to enterprise scale while integrating into an organization’s existing IT infrastructure.

The reference architecture includes, according to IBM, a set of integrated software tools built on optimized, accelerated hardware for the purpose of enabling organizations to jump-start AI and deep learning projects, speed time to model accuracy, and provide enterprise-grade security, interoperability, and support. IBM’s graphic above should give you the general picture.

Specifically, IBM’s AI reference architecture should support iterative, multi-stage, data-driven processes or workflows that entail specialized knowledge, skills, and, usually, a new compute and storage infrastructure. Still, these projects have many attributes that are familiar to traditional CIOs and IT departments.

The first of these is that results are only as good as the data going in, and model development depends on having a lot of data in the format expected by the deep learning framework. Surprised? You have been hearing this for decades as GIGO (garbage in, garbage out). The AI process also is iterative: repeatedly looping through data sets and tunings to develop more accurate models, then comparing new data in the model against the original business or technical requirements to refine the approach. In this sense, the AI reference model is no different from IT 101, an intro course for wannabe IT folks.
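That iterative loop can be sketched in a few lines of Python; the training function, accuracy numbers, and tuning rule here are hypothetical placeholders standing in for a real framework, not part of any IBM product.

```python
# Hypothetical sketch of the iterative AI workflow: train, measure
# accuracy, tune, and repeat until requirements are met.

def train_and_evaluate(learning_rate):
    # Placeholder: a real project would train a model here and
    # return its accuracy on held-out data.
    return min(0.99, 0.70 + learning_rate * 10)

target_accuracy = 0.90   # the original business/technical requirement
learning_rate = 0.001
accuracy = 0.0
iterations = 0

while accuracy < target_accuracy and iterations < 100:
    accuracy = train_and_evaluate(learning_rate)
    learning_rate *= 1.5          # tune a hyperparameter between passes
    iterations += 1

print(f"reached {accuracy:.2f} accuracy after {iterations} iterations")
```

The structure (loop until the requirement is met, with a cap on passes) is the point; a real project swaps in genuine training and evaluation.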

But AI doesn’t stay simplistic for long. As the reference architecture puts it, AI is a sophisticated, complex process that requires specialized software and infrastructure. That’s where IBM’s PowerAI Platform comes in. Most organizations start with small pilot projects bound to a few systems and data sets but grow from there.

As projects grow beyond the first test systems, however, it is time to bulk up an appropriate storage and networking infrastructure. This will allow it to sustain growth and eventually support a larger organization.

The trickiest part of AI and the part that takes inspired genius to conceive, test, and train is the model. The accuracy and quality of a trained AI model are directly affected by the quality and quantity of data used for training. The data scientist needs to understand the problem they are trying to solve and then find the data needed to build a model that solves the problem.

Data for AI is separated into a few broad sets: the data used to train and test the models, the data analyzed by the models, and the archived data that may be reused. This data can come from many different sources such as traditional organizational data from ERP systems, databases, data lakes, sensors, collaborators and partners, public data, mobile apps, social media, and legacy data. It may be structured or unstructured, in many formats such as file, block, object, Hadoop Distributed File System (HDFS), or something else.
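Partitioning incoming records into those broad sets is easy to sketch; the record layout and the 80/20 split below are illustrative assumptions, not a prescription from the reference architecture.

```python
import random

# Hypothetical sketch: partition source records into the broad sets
# described above: training data, test data, and an archive for reuse.
random.seed(42)
records = [{"id": i, "value": random.random()} for i in range(1000)]

random.shuffle(records)              # avoid ordering bias in the split
n_train = int(len(records) * 0.8)    # 80% to train the model
train_set = records[:n_train]
test_set = records[n_train:]         # held out to test the model
archive = list(records)              # retained copy that may be reused

print(len(train_set), len(test_set), len(archive))
```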

Many AI projects begin as a big data problem. Regardless of how it starts, a large volume of data is needed, and it inevitably needs preparation, transformation, and manipulation. But it doesn’t stop there.

AI models require the training data to be in a specific format; each model has its own, usually different, format. Invariably the initial data is nowhere near those formats. Preparing the data is often one of the largest organizational challenges, not only in complexity but also in the amount of time it takes to transform the data into a format that can be analyzed. Many data scientists, notes IBM, claim that over 80% of their time is spent in this phase and only 20% on the actual process of data science. Data transformation and preparation is typically a highly manual, serial set of steps: identifying and connecting to data sources, extracting to a staging server, tagging the data, and using tools and scripts to manipulate it. Hadoop is often a significant source of this raw data, and Spark typically provides the analytics and transformation engines used along with advanced AI data matching and traditional SQL scripts.
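The manual prep steps just listed (connect to a source, extract to staging, tag, transform) can be sketched in plain Python; a real pipeline at scale would more likely use Spark, and every field name and value here is hypothetical.

```python
import csv
import io

# Hypothetical sketch of the manual prep steps: extract raw records,
# tag them, and transform into the flat, typed format a training
# framework expects. Field names and values are illustrative.

raw_csv = """order_id,region,amount
1001,EMEA, 250.00
1002,AMER,1200.50
1003,,75.25
"""

def extract(source):
    # "Extract to a staging server": parse the raw feed into dicts.
    return list(csv.DictReader(io.StringIO(source)))

def tag(record):
    # "Tag the data": label records so downstream steps can filter.
    record["complete"] = bool(record["region"].strip())
    return record

def transform(record):
    # "Transform into the expected format": clean types and whitespace.
    return {
        "order_id": int(record["order_id"]),
        "region": record["region"].strip() or "UNKNOWN",
        "amount": float(record["amount"]),
    }

staged = [tag(r) for r in extract(raw_csv)]
prepared = [transform(r) for r in staged]
print(prepared)
```

Even in this toy form, the serial extract-tag-transform shape is visible; multiply it across dozens of sources and formats and the 80% figure stops being surprising.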

There are two other considerations in this phase: 1) data storage and access, and 2) the speed of execution. For this (don’t be shocked) IBM recommends Spectrum Scale to provide multi-protocol support with a native HDFS connector, which can centralize and analyze data in place rather than wasting time copying and moving data. But you may have your preferred platform.

IBM’s reference architecture provides a place to start. A skilled IT group will eventually tweak IBM’s reference architecture, making it their own.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog. See more of his work at technologywriter.com and here.

Is Your Enterprise Ready for AI?

May 11, 2018

According to IBM’s gospel of AI, “we are in the midst of a global transformation and it is touching every aspect of our world, our lives, and our businesses.” IBM has been preaching this gospel for the past year or longer, but most of its clients haven’t fully jumped aboard. “For most of our clients, AI will be a journey. This is demonstrated by the fact that most organizations are still in the early phases of AI adoption.”

AC922 with NVIDIA Tesla V100 and Enhanced NVLink GPUs

The company’s latest announcements earlier this week focus POWER9 squarely on AI. Said Tim Burke, Engineering Vice President, Cloud and Operating System Infrastructure, at Red Hat: “POWER9-based servers, running Red Hat’s leading open technologies, offer a more stable and performance-optimized foundation for machine learning and AI frameworks, which is required for production deployments… including PowerAI, IBM’s software platform for deep learning with IBM Power Systems that includes popular frameworks like TensorFlow and Caffe, as the first commercially supported AI software offering for [the Red Hat] platform.”

IBM insists this is not just about POWER9, and they may have a point; GPUs and other assist processors are taking on more importance as companies try to emulate the hyperscalers in their efforts to drive server efficiency while boosting power in the wake of the decline of Moore’s Law. “GPUs are at the foundation of major advances in AI and deep learning around the world,” said Paresh Kharya, group product marketing manager of Accelerated Computing at NVIDIA. [Through] “the tight integration of IBM POWER9 processors and NVIDIA V100 GPUs made possible by NVIDIA NVLink, enterprises can experience incredible increases in performance for compute-intensive workloads.”

To create an AI-optimized infrastructure, IBM announced the latest additions to its POWER9 lineup, the IBM Power Systems LC922 and LC921, characterized by IBM as balanced servers offering both compute capabilities and up to 120 terabytes of data storage, with NVMe for rapid access to vast amounts of data. IBM included HDD in the announcement, but any serious AI workload will choke without ample SSD.

The announcement also included an updated version of the AC922 server, which now features the recently announced 32GB NVIDIA V100 GPUs and larger system memory, enabling bigger deep learning models and more accurate AI workloads.

IBM has characterized the new LC922 and LC921 servers, with their POWER9 processors, as data-intensive, AI-intensive systems. The AC922, which arrived last fall, was designed for what IBM calls the post-CPU era. It was the first to embed PCI-Express 4.0, next-generation NVIDIA NVLink, and OpenCAPI, three interface accelerators that together can move data 9.5x faster than PCIe 3.0-based x86 systems. The AC922 was designed to drive demonstrable performance improvements across popular AI frameworks such as TensorFlow and Caffe.

In the post-CPU era, where Moore’s Law no longer rules, you need to pay as much attention to the GPU and other assist processors as to the CPU itself, maybe even more so. For example, the coherence and high speed of NVLink enable hash tables, a data structure critical for fast analytics, to work across CPU and GPU. As IBM noted at the introduction of the new machines this week, hash tables are a fundamental data structure for analytics over large datasets, and they need large memory: small GPU memory limits hash table size and analytic performance. CPU-GPU NVLink2 solves two key problems. Its large memory reach and high speed allow the full hash table to be stored in CPU memory while pieces are transferred to the GPU for fast operations, and its coherence ensures that new inserts in CPU memory get updated in GPU memory. Without coherence, modifications to data in CPU memory do not get updated in GPU memory.
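A quick back-of-envelope sketch shows why GPU memory alone caps hash table size; every number below (bytes per entry, usable fraction, memory sizes) is an illustrative assumption, not an IBM specification.

```python
# Back-of-envelope sketch: how many hash table entries fit in GPU
# memory versus CPU memory. All sizes are illustrative assumptions.

BYTES_PER_ENTRY = 32          # assumed key + value + bucket overhead
gpu_memory_gb = 32            # e.g. a 32GB V100
cpu_memory_gb = 2048          # e.g. 2TB of system RAM

def max_entries(memory_gb, usable_fraction=0.8):
    # Leave some headroom for the runtime and other allocations.
    usable_bytes = memory_gb * 1024**3 * usable_fraction
    return int(usable_bytes // BYTES_PER_ENTRY)

gpu_entries = max_entries(gpu_memory_gb)
cpu_entries = max_entries(cpu_memory_gb)
print(f"GPU-resident table: ~{gpu_entries / 1e9:.1f}B entries")
print(f"CPU-resident table: ~{cpu_entries / 1e9:.1f}B entries")
```

Under these assumptions the CPU-resident table holds roughly 64x more entries, which is the gap a coherent, high-speed CPU-GPU link lets analytics exploit.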

IBM has started referring to the LC922 and LC921 as big data crushers. The LC921 brings two POWER9 sockets in a 1U form factor; for I/O it comes with both PCIe 4.0 and CAPI 2.0, and it offers up to 40 cores (160 threads) and 2TB of RAM, which is ideal for environments requiring dense computing.

The LC922 is considerably bigger. It offers balanced compute capabilities delivered by the POWER9 processor, up to 120TB of storage capacity, the same advanced I/O through PCIe 4.0/CAPI 2.0, and up to 44 cores (176 threads) with 2TB of RAM. The list price, IBM notes, is roughly 30 percent less.

If your organization is not thinking about AI, it is probably in the minority, according to IDC:

  • 31 percent of organizations are in [AI] discovery/evaluation
  • 22 percent of organizations plan to implement AI in next 1-2 years
  • 22 percent of organizations are running AI trials
  • 4 percent of organizations have already deployed AI

Underpinning both servers is the IBM POWER9 CPU. The POWER9 delivers nearly 5.6x the CPU-to-GPU bandwidth of x86 systems, which can improve deep learning training times by nearly 4x. Even today companies struggle to cobble together the different pieces and make them work. IBM learned that lesson and now offers a unified AI infrastructure in PowerAI and POWER9 that you can use today.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

IBM Introduces Skinny Z Systems

April 13, 2018

Early this week IBM unveiled two miniaturized mainframe models, dubbed skinny mainframes, it said are easier to deploy in a public or private cloud facility than their more traditional, much bulkier predecessors. Relying on all their design tricks, IBM engineers managed to pack each machine into a standard 19-inch rack with space to spare, which can be used for additional components.

Z14 LinuxONE Rockhopper II, 19-inch rack

The first new mainframe introduced this week is the Z14 model ZR1, in a 19-inch rack. You can expect subsequent models to increment the model numbering. The second new machine, the LinuxONE Rockhopper II, also comes in a 19-inch rack.

In the past, about a year after IBM introduced a new mainframe, say the z10, it introduced what it called a Business Class (BC) version. The BC machines were less richly configured and less expandable, but they delivered comparable performance at lower capacity and a distinctly lower price.

In a Q&A analyst session, IBM insisted the new machines would be priced noticeably lower, as the BC-class machines of the past were. But these are not comparable to the old BC machines; instead, they are intended to attract a new group of users who face new challenges. As such, they come cloud-ready: the 19-inch industry-standard, single-frame design is intended for easy placement into existing cloud data centers, alongside other components, and into private cloud environments.

The company, said Ross Mauri, General Manager IBM Z, is targeting the new machines toward clients seeking robust security with pervasive encryption, cloud capabilities and powerful analytics through machine learning. Not only, he continued, does this increase security and capability in on-premises and hybrid cloud environments for clients, IBM will also deploy the new systems in IBM public cloud data centers as the company focuses on enhancing security and performance for increasingly intensive data loads.

In terms of security, the new machines will be hard to beat. IBM reports the new machines capable of processing over 850 million fully encrypted transactions a day on a single system. Along the same lines, the new mainframes do not require special space, cooling or energy. They do, however, still provide IBM’s pervasive encryption and Secure Service Container technology, which secures data serving at a massive scale.
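That headline throughput figure is easier to appreciate as a rate; a quick back-of-envelope calculation:

```python
# The 850 million fully encrypted transactions per day IBM cites,
# expressed as a sustained per-second rate on a single system.
transactions_per_day = 850_000_000
seconds_per_day = 24 * 60 * 60          # 86,400
rate = transactions_per_day / seconds_per_day
print(f"~{rate:,.0f} fully encrypted transactions per second")
```

That works out to roughly 9,800 encrypted transactions every second, around the clock.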

Ross continued: The new IBM Z and IBM LinuxONE offerings also bring significant increases in capacity, performance, memory and cache across nearly all aspects of the system. A complete system redesign delivers this capacity growth in 40 percent less space and is standardized to be deployed in any data center. The z14 ZR1 can be the foundation for an IBM Cloud Private solution, creating a data-center-in-a-box by co-locating storage, networking and other elements in the same physical frame as the mainframe server.  This is where you can utilize that extra space, which was included in the 19-inch rack.

The LinuxONE Rockhopper II can also accommodate a Docker-certified infrastructure for Docker EE, with integrated management and scale tested up to 330,000 Docker containers, allowing developers to build high-performance applications and embrace a microservices architecture.

The 19-inch rack, however, comes with tradeoffs, notes Timothy Green, writing in The Motley Fool. Yes, it takes up 40% less floor space than the full-size Z14, but it accommodates only 30 processor cores, far below the 170 cores supported by a full-size Z14, which fills a 24-inch rack. Both new systems can handle around 850 million fully encrypted transactions per day, a fraction of the Z14’s full capacity. But not every company needs the full performance and capacity of the traditional mainframe. For companies that don’t need the full power of a Z14, notes Green, or that have previously balked at the high price or massive footprint of full mainframe systems, these smaller mainframes may be just what it takes to bring them to the Z. Now IBM needs to come through with the advantageous pricing it insisted it would offer.

The new skinny mainframes are just the latest in IBM’s continuing efforts to keep the mainframe relevant. It began over a decade ago with porting Linux to the mainframe. It continued with Hadoop, blockchain, and containers. Machine learning and deep learning are coming right along. The only question for DancingDinosaur is when IBM engineers will figure out how to put quantum computing on the Z and squeeze it into customers’ public or private cloud environments.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog. See more of his work at technologywriter.com and here.
