Posts Tagged ‘hadoop’

Industrial Strength SDS for the Cloud

June 12, 2014

The hottest thing in storage today is software defined storage (SDS). Every storage vendor is jumping on the SDS bandwagon.

The presentation titled Industrial-Strength SDS for the Cloud, by Sven Oehme, IBM Senior Research Scientist, drew a packed audience at Edge 2014 and touched on many of the sexiest acronyms in IBM’s storage portfolio. These included not just GPFS but also GSS (also called GPFS Storage Server), GNR, and LROC (local read-only cache), and the session even worked in Linear Tape File System (LTFS).

The session promised to outline the customer problems SDS solves and show how to deploy it in large-scale OpenStack environments with IBM GPFS. Industrial strength generally refers to large-scale, highly secure, highly available, multi-platform environments.

The abstract explained that the session would show how GPFS enables resilient, robust, reliable storage deployed on low-cost, industry-standard hardware, delivering limitless scalability, high performance, and automatic policy-based storage tiering from flash to disk to tape, further lowering costs. It also promised to provide examples of how GPFS provides a single, unified, scale-out data plane for cloud developers across multiple data centers worldwide. GPFS unifies OpenStack VM images, block devices, objects, and files with support for Nova, Cinder, Swift, and Glance (OpenStack components), along with POSIX interfaces for integrating legacy applications. C’mon, if you have even a bit of IT geekiness, doesn’t that sound tantalizing?

One disclaimer before jumping into the details: despite having written white papers on SDS and cloud, your blogger can only hope to approximate the rich context provided at the session.

Let’s start with the simple stuff: the expectations and requirements for cloud storage:

  • Elasticity, within and across sites
  • Secure isolation between tenants
  • Non-disruptive operations
  • No degradation from failing parts (components inevitably fail at scale)
  • Different tiers for different workloads
  • Converged platform to handle boot volumes as well as file/object workload
  • Locality awareness and acceleration for exceptional performance
  • Multiple forms of data protection

Of course, affordable hardware and maintenance are expected, as are quota/usage and workload accounting.

Things start getting serious with IBM’s General Parallel File System (GPFS). This is what IBMers really mean when they refer to Elastic Storage: a single namespace provided across individual storage resources, platforms, and operating systems. Add in different classes of storage devices (fast or slow disk, SSD, flash, even LTFS tape), storage pools, and policies to control data placement, and you’ve got the ability to do storage tiering. You can even geographically distribute the data through IBM’s Active Cloud Engine, initially a SONAS capability sometimes referred to as Active File Manager. Now users can access data by the same name regardless of where it is located, and since the system keeps distributed copies of the latest data, it can handle a temporary loss of connectivity between sites.
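
To make policy-based tiering concrete, here is a minimal sketch of the decision logic in Python. GPFS actually expresses placement and migration rules in its own SQL-like policy language; the pool names, thresholds, and the file-heat score below are illustrative assumptions, not GPFS syntax.

    from dataclasses import dataclass

    @dataclass
    class FileInfo:
        path: str
        size_gb: float
        days_since_access: int
        heat: float   # access-frequency score, higher = hotter (hypothetical attribute)

    def choose_pool(f: FileInfo) -> str:
        """Illustrative tiering policy: hot data on flash, cold data to tape, the rest on disk."""
        if f.heat > 0.8 and f.size_gb < 100:
            return "flash_pool"        # latency-sensitive, frequently read data
        if f.days_since_access > 180:
            return "ltfs_tape_pool"    # cold data migrates out to LTFS tape
        return "nearline_disk_pool"    # everything else lands on capacity disk

    # Example: a 250-day-old, rarely read log file gets sent to tape.
    print(choose_pool(FileInfo("/gpfs/logs/app.log", 12.0, 250, 0.05)))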

To protect the data, add in declustered software RAID, aka GNR or even GSS (GPFS Storage Server). The beauty of this is that it reduces the space overhead of replication through declustered parity (80% vs. 33% utilization) while delivering extremely fast rebuild. In the process you can remove hardware storage controllers from the picture by doing the migration and RAID management in software on your commodity servers.
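
The utilization gap follows from simple arithmetic. Three-way replication stores every byte three times, while a declustered parity layout of roughly 8 data strips plus 2 parity strips (the exact stripe width here is an assumption) adds only two extra strips per eight:

    \[
      \text{3-way replication: } \frac{1}{3} \approx 33\% \text{ usable},
      \qquad
      \text{8+2 parity: } \frac{8}{8+2} = 80\% \text{ usable}.
    \]

Rebuild is faster for a related reason: the strips of a failed disk are scattered across every disk in the declustered array, so all of the surviving disks share the rebuild reads and writes instead of the handful of disks in a single traditional RAID group doing all the work.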

[Image: dino industrial sds 1]

In the above graphic, focus on everything below the elongated blue triangle. Since it is all being done in software, you can add an Object API for object storage. Throw in encryption software. Want Hadoop? Add that too. The power of SDS. Sweet.

The architecture Oehme lays out utilizes generic servers with direct-attached switched JBOD (SBOD). It also makes ample use of LROC, which provides a large read cache that benefits many workloads, including SPECsfs, VMware, OpenStack, other virtualization, and database workloads.

A key element in Oehme’s SDS for the cloud is OpenStack. From a storage standpoint, OpenStack Cinder, which provides access to block storage as if it were local, enables the efficient sharing of data between services. Cinder supports advanced features such as snapshots, cloning, and backup. On the back end, Cinder supports Linux servers with iSCSI and LVM; storage controllers; shared filesystems like GPFS, NFS, and GlusterFS; and more.
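
To make Cinder’s role concrete, here is a minimal sketch using the python-cinderclient library to create, snapshot, and clone a volume. The credentials and names are placeholders, and which backend (GPFS, LVM, or something else) actually provisions the volume is decided by the cloud’s Cinder configuration, not by this client code.

    from cinderclient import client

    # Connect to the Cinder v2 API; endpoint and credentials are placeholders.
    cinder = client.Client('2', 'demo_user', 'demo_password',
                           'demo_project', 'http://controller:5000/v2.0')

    # Create a 10 GB block volume; the configured backend provisions it.
    vol = cinder.volumes.create(size=10, name='demo-vol')

    # Take a point-in-time snapshot of the volume.
    snap = cinder.volume_snapshots.create(vol.id, name='demo-snap')

    # Clone the volume: a new volume sourced from the existing one.
    clone = cinder.volumes.create(size=10, source_volid=vol.id, name='demo-clone')

    print(vol.id, snap.id, clone.id)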

Since Oehme’s goal is to produce industrial-strength SDS for the cloud, it needs to protect data. Data protection is delivered through backups, snapshots, cloning, replication, file-level encryption, and declustered RAID, which spans all disks in the declustered array and results in faster RAID rebuild (because there are more disks available to participate in the rebuild).

The result is highly virtualized, industrial-strength SDS for deployment in the cloud. Can you bear one more small image that promises to put this all together? I’ll try to leave it as big as will fit. Notice it includes a lot of OpenStack components connecting storage elements. Here it is.

[Image: dino industrial sds 2]

DancingDinosaur is Alan Radding. Follow DancingDinosaur on Twitter @mainframeblog

Learn more about Alan Radding at technologywriter.com

Happy 50th System z

April 11, 2014

IBM threw a delightful anniversary party for the mainframe in NYC last Tuesday, April 8. You can watch video from the event here.

About 500 people showed up to meet the next generation of mainframers, the top winners of the global Master the Mainframe competition. First place went to Yong-Sian Shih, Taiwan; followed by Rijnard van Tonder, South Africa; and Philipp Egli, United Kingdom. Wouldn’t be surprised if these and the other finalists at the event had job offers before they walked out of the room.

The System z may be built on 50-year old technology but IBM is rapidly driving the mainframe forward into the future. It had a slew of new announcements ready to go at the anniversary event itself and more will be rolling out in the coming months. Check out all the doings around the Mainframe50 anniversary here.

IBM started the new announcements almost immediately with Hadoop on the System z. Called zDoop, the industry’s first commercial Hadoop for Linux on System z puts MapReduce big data analytics directly on the z. IBM also announced flash for the mainframe, consisting of the latest generation of flash storage on the IBM DS8870, which promises to speed time to insight with up to 30x the performance of HDD. Put the two together and the System z should become a potent big data analytics workhorse.

But there was even more. Mobile is hot, and the mainframe is ready to play in the mobile arena too. Here the problem z shops experience is cost containment: mainframe shops are seeing their costs rise as they integrate new mobile applications, because many mobile activities use mainframe resources but don’t generate immediate income.

The IBM System z Solution for Mobile Computing addresses this with new pricing for mobile workloads on z/OS, reducing the cost of growing mobile transaction volumes that can cause a spike in software charges. The new pricing provides up to a 60% reduction in the processor capacity reported for mobile activity, which can help normalize the rate of transaction growth that generates software charges. The upshot: much of the mobile traffic volume won’t increase your software overhead.
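
The arithmetic behind that discount is roughly as follows; this is a simplified sketch, and the MSU figures and the adjustment formula are illustrative assumptions rather than IBM’s published mobile pricing mechanics.

    def adjusted_peak_msu(total_peak_msu: float, mobile_msu: float,
                          mobile_discount: float = 0.60) -> float:
        """Subtract 60% of the capacity attributed to mobile work from the reported peak."""
        return total_peak_msu - mobile_discount * mobile_msu

    # Example: a 1,000 MSU monthly peak, of which 200 MSU is driven by mobile transactions.
    print(adjusted_peak_msu(1000, 200))   # 880.0 MSU reported for software charges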

And IBM kept rolling out the new announcements:

  • Continuous Integration for System z – Compresses the application delivery cycle from months to weeks or days. Beyond this, IBM suggested upcoming initiatives to deliver full DevOps capabilities for the z
  • New version of IBM CICS Transaction Server – Delivers enhanced mobile and cloud support for CICS, able to handle more than 1 billion transactions per day
  • IBM WebSphere Liberty z/OS Connect – Rapid and secure enablement of web, cloud, and mobile access to z/OS assets
  • IBM Security zSecure SSE – Helps prevent malicious computer attacks with enhanced security intelligence and compliance reporting that delivers security events to QRadar SIEM for integrated enterprise-wide security intelligence dashboarding

Jeff Frey, an IBM Fellow and the former CTO of System z, observed that “this architecture was invented 50 years ago, but it is not an old platform.”  It has evolved over those decades and continues to evolve. For example, Frey expects the z to accommodate 22nm chips and a significant increase in the number of cores per chip. He also expects vector technology, double-precision floating point and integer capabilities, and FPGAs to be built in. In addition, he expects the z to include next-generation virtualization technology for the cloud to support software defined environments.

“This is a modern platform,” Frey emphasized. Other IBMers hinted at even more to come, including ongoing research to move beyond silicon to maintain the steady price/performance gains the computing industry has enjoyed over the past several decades.

Finally, IBM took the anniversary event to introduce a number of what IBM calls first-in-the-enterprise z customers. (DancingDinosaur thinks of them as mainframe virgins.)  One is Steel ORCA, a managed service provider putting together what it calls the first full-service digital utility center. Based in Princeton, NJ, the company says Phase 1 will offer connections of less than a millisecond to/from New York and Philadelphia. The base design is 300 watts per square foot and can handle ultra-high density configurations. Behind the operation is a zEC12. Originally the company planned to use an x86 system, but the costs were too high. “We could cut those costs in half with the z,” said Dave Crocker, Steel ORCA chairman.

Although the Mainframe50 anniversary event has passed, there will be Mainframe50 events and announcements throughout the rest of the year.  Again, you can follow the action here.

Coming up next for DancingDinosaur is Edge2014, a big infrastructure innovation conference. Next week DancingDinosaur will look at a few more of the most interesting sessions, and there are plenty. There still is time to register. Please come—you’ll find DancingDinosaur in the bloggers lounge, at program sessions, and at the Sheryl Crow concert.

Follow DancingDinosaur on Twitter, @mainframeblog

 

Enterprise 2013 Details System z and Power Technology and New Capabilities

October 25, 2013

IBM announced a lot of goodies for z and Power users at Enterprise 2013 wrapping up in Orlando today. There were no blockbuster announcements, like a new z machine—we’re probably 12-18 months away from that and even then the first will likely focus on Power8—but it brought a slew of announcements nonetheless. For a full rundown on what was announced click here.

Cloud and analytics—not surprisingly—loom large. For example, Hadoop and a variety of other capabilities have been newly cobbled together, integrated, optimized, and presented as new big data offerings or as new cloud solutions.  This was exemplified by a new Cognos offering for CFOs needing to create, analyze and manage sophisticated financial plans that can provide greater visibility into enterprise profitability or the lack thereof.

Another announcement featured a new IBM Entry Cloud Configuration for SAP on zEnterprise. This is a cloud-enablement offering combining high-performance technology and services to automate, standardize and accelerate day-to-day SAP operations for reduced operational costs and increased ROI. Services also were big at the conference.

Kicking off the event was a dive into data center economics by Steve Mills, Senior Vice President & Group Executive, IBM Software & Systems. Part of the challenge of optimizing IT economics, he noted, was that the IT environment is cumulative. Enterprises keep picking up more systems, hardware and software, as new needs arise but nothing goes away or gets rationalized in any meaningful way.

Between 2000 and 2010, Mills noted, servers had grown at a 6x rate while storage grew at a 69x rate. Virtual machines, meanwhile, were multiplying at the rate of 42% per year. Does anyone see a potential problem here?

Mills’ suggestion: virtualize and consolidate. Specifically, large servers are better for consolidation. His argument goes like this: most workloads experience variance in demand, but when you consolidate workloads with variance on a virtualized server, the combined demand varies proportionally less due to statistical multiplexing (which fits workloads into the gaps created by the variances). Furthermore, the more workloads you consolidate, the smaller the variance of the sum relative to its mean. His conclusion: bigger servers with capacity to run more workloads can be driven to higher average utilization levels without violating service level agreements, thereby reducing the cost per workload. Finally, the larger the shared processor pool, the more statistical benefit you get.
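
A one-line statistics refresher shows why. Assume n independent workloads, each with the same mean demand and the same variability (that equal-workload assumption is mine, to keep the algebra simple). The consolidated demand has a mean that grows with n but a standard deviation that grows only with the square root of n, so the variability relative to the capacity you must provision shrinks as n grows:

    \[
      \frac{\sigma_{\mathrm{sum}}}{\mu_{\mathrm{sum}}}
        = \frac{\sigma \sqrt{n}}{n \mu}
        = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\mu}
    \]

where μ and σ are each workload’s mean and standard deviation. Consolidate 16 such workloads and the relative variability drops by a factor of four, which is what lets a big shared server run at higher average utilization without busting service level agreements.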

On the basis of statistical multiplexing, the zEnterprise and the Power 795 are ideal choices for this. Depending on your workloads, just load up the host server, a System z or a big Power box, with as many cores as you can afford and consolidate as many workloads as practical.

Mills’ other cost savings tips: use flash to avoid the cost and complexity of disk storage. Also, eliminate duplicate applications—the fewer you run, the lower the cost. In short, elimination is the clearest path to saving money in the data center.

To illustrate the point, Jim Tussing from Nationwide described how the company virtualized and consolidated 60% of its 10,500 servers onto a few mainframes and saved $46 million over five years. It also allowed the company to delay the need for an additional data center by four years.

See, if DancingDinosaur was an actual data center manager it could have justified attendance at the entire conference based on the economic tips from just one of the opening keynotes and spent the rest of the conference playing golf. Of course, DancingDinosaur doesn’t play golf so it sat in numerous program sessions instead, which you will hear more about in coming weeks.

You can follow DancingDinosaur on twitter, @mainframeblog

Big Data as a Game Changing Technology at IBM Edge 2013

June 11, 2013

If you ever doubted that big data was going to become important, there should be no doubt anymore. Recent headlines about the government capturing and analyzing massive amounts of daily phone call data should convince you. That this report was shortly followed by more reports of the government tapping the big online data websites like Google, Yahoo, and such for even more data should alert you to three things:

1—There is a massive amount of data out there that can be collected and analyzed.

2—Companies are amassing incredible volumes of data in the normal course of serving people who readily and knowingly give their data to these organizations. (This blogger is one of those tens of millions.)

3—The tools and capabilities are mature enough for someone to sort through that data and connect the dots to deliver meaningful insights.

Particularly with regard to the last point, this blogger thought the industry was still five years away from generating meaningful results from that amount of data coming in at that velocity. Sure, marketers have been sorting and correlating large amounts of data for years, but it was mostly structured data and nowhere near this much. BTW, your blogger has been writing about big data for some time.

If the news reports weren’t enough, it became clear at Edge 2013 that big data analytics is happening and that companies like Constant Contact and many others are succeeding at it now. It also is clear that there is sufficient commercial off-the-shelf computing power and analytics tooling, from companies like IBM, to sort through massive amounts of data and make sense of it fast.

Another interesting point came up in one of the many discussions touching on big data. Every person’s personal data footprint is as unique as a fingerprint or other biometrics. We all visit different websites, interact with social media, and use our credit and debit cards in highly individual ways. Again, marketers have sensed this at some level for years, but they haven’t yet really honed it down to the actual individual on a mass scale, although there is no technical reason one couldn’t.

Subsequent blogs will take up other topics from Edge 2013, such as software defined everything.

Although there were over a dozen sessions on System z topics, the mainframe did not have a big presence at the conference. However, Enterprise Systems 2013 was being promoted at IBM Edge. It will take place Oct. 21-25 in Orlando, FL, and will combine the System z and Power Systems Technical Universities along with a new executive-focused Enterprise Systems event. It will include new announcements, peeks into trends and directions, over 500 expert technical sessions across 10 tracks, and a comprehensive solution center.

IBM Technical Edge 2013 Tackles Flash – Big Data – Cloud & More

June 3, 2013

IBM Edge 2013 kicks off in just one week, on 6/10, and runs through 6/14. There is still time to register. This blogger will be there through 6/13. You can follow me on Twitter for conference updates @Writer1225. I’ll be using hashtag #IBMEdge to post live Twitter comments from the conference. As noted here previously, I’ll buy a drink for the first two people who come up to me and say they read DancingDinosaur. How’s that for motivation!

The previous post looked at the Executive track. Now let’s take a glimpse at the technical track, which ranges considerably wider, beyond the System z to IBM’s other platforms, flash, big data, cloud, virtualization, and more.

Here’s a sample of the flash sessions:

Assessing the World of Flash looks at the key competitors, chief innovators, followers, and leaders. You’ll quickly find that not all flash solutions are the same and see why IBM’s flash strategy stands at the forefront of this new and strategic technology.

There are many ways to deploy flash. This session examines Where to Put Flash in the Data Center.  It will focus particularly on the new IBM FlashSystem products and other technologies from IBM’s Texas Memory Systems acquisition. However, both storage-based and server-based flash technologies will be covered with an eye toward determining what works best for client performance needs.

The session on IBM’s Flash Storage Future will take a look at how IBM is leveraging its Texas Memory Systems acquisition and other IBM technologies to deliver a flash portfolio that will play a major role across not only IBM’s storage products but its overall solution portfolio and its roadmap moving forward.

The flash sessions also will look at how Banco Azteca, Thomson Reuters, and Sprint are deploying and benefiting from flash.

In the big data track, the Future of Analytics Infrastructure looks interesting. Although most organizations understand the value of business analytics, many don’t understand how the infrastructure choices they make will impact the success or failure of their analytics projects. The session will identify the key requirements of any analytical environment: agility, scalability, versatility, compliance, cost-effectiveness, and partner-readiness; and show how they can be met within a single, future-ready analytics infrastructure that supports both current and future analytics strategies.

Big data looms large at the conference. A session titled Hadoop…It’s Not Just about Internal Storage explores how the Hadoop MapReduce approach is evolving from server-internal disks to external storage. Initially, Hadoop provided massively scalable, distributed file storage and analytic capabilities. New thinking, however, has emerged that looks at a tiered approach for implementing the Hadoop framework with external storage. Understanding the workload architectural considerations is important as companies begin to integrate analytic workloads to drive higher business value. The session will review the workload considerations to show why an architectural approach makes sense, offer tips and techniques, and share information about IBM’s latest offerings in this space.

An Overview of IBM’s Big Data Strategy details the company’s industrial-strength big data platform to address the full spectrum of big data business opportunities. This session is ideal for those who are just getting started with big data.

And no conference today can skip the cloud. IBM Edge 2013 offers a rich cloud track. For instance, Building the Cloud Enabled Data Center explains how to get maximum value out of an existing virtualized environment through self-service delivery and optimization along with virtualization optimization capabilities. It also describes how to enable business and infrastructure agility with workload optimized clouds that provide orchestration across the entire data center and accelerate application updates to respond faster to stakeholder demands and competitive threats. Finally it looks at how an open and extensible cloud delivery platform can fully automate application deployment and lifecycle management by integrating compute, network, storage, and server automation.

A pair of sessions focus on IBM Cloud Storage Architectures and Understanding IBM’s Cloud Options. The first session looks at several cloud use cases, such as storage and systems management. The other session looks at IBM SmartCloud Entry, SmartCloud Provisioning, and Service Delivery Manager. The session promises to be an excellent introduction for the cloud technical expert who wants a quick overview of what IBM has to offer in cloud software and the specific value propositions for its various offerings, along with their architectural features and technical requirements.

A particularly interesting session will examine Desktop Cloud through Virtual Desktop Infrastructure and Mobile Computing. The corporate desktop has long been a costly and frustrating challenge complicated even more by mobile access. The combination of the cloud and Virtual Desktop Infrastructure (VDI) provides a way for companies to connect end users to a virtual server environment that can grow as needed while mitigating the issues that have frustrated desktop computing, such as software upgrades and patching.

There is much more in the technical track. All the main IBM platforms are featured, including PureFlex Systems, the IBM BladeCenter, IBM’s Enterprise X-Architecture, the IBM XIV storage system, and, for DancingDinosaur readers, sessions on the DS8000.

Have you registered for IBM Edge 2013 yet?  There still is time. As noted above, find me in the Social Media Lounge at the conference and in the sessions.  You can follow me on Twitter for conference updates @Writer1225.  I’ll be using hashtag #IBMEdge to post live Twitter comments from the conference. I’ll buy a drink for the first two people who come up to me and say they read DancingDinosaur.  How much more motivation do you need?

Next Generation zEnterprise Developers

April 19, 2013

Mainframe development keeps getting more complicated.  The latest complication can be seen in Doug Balog’s reference to mobile and social business on the zEnterprise, reported by DancingDinosaur here a few weeks ago. That is what the next generation of z developers face.

Forget talk about shortages of System z talent due to the retirement of mainframe veterans. The bigger complication comes from the need for non-traditional mainframe development skills required to take advantage of mobile and social business as well as other recent areas of interest such as big data and analytics. These areas entail combining new skills like JSON, Atom, REST, Hadoop, Java, SOA, Linux, and hybrid computing with traditional mainframe development skills like CICS, COBOL, z/VM, SQL, VSAM, and IMS. This combination is next to impossible to find in one individual. Even assembling a coherent team encompassing all those skills presents a serious challenge.

The mainframe industry has been scrambling to address this in various ways. CA Technologies added GUIs to its various tools, and BMC has similarly modernized its management and DB2 tools. IBM, of course, has been steadily bolstering the Rational RDz tool set. RDz is an Eclipse-based software IDE for z/OS. RDz streamlines and refactors z/OS development processes into structured analysis, editing, and testing operations with modern GUI tools, wizards, and menus that, IBM notes, are perfect for new-to-the-mainframe twenty- and thirty-something developers, the next generation of z developers.

Compuware brings its mainframe workbench, described as a modernized interactive developer environment that introduces a new graphical user interface for managing mainframe application development activities.  The interactive toolset addresses every phase of the application lifecycle.

Most recently, Micro Focus announced the release of its new Enterprise Developer for IBM zEnterprise.  The product enables customers to optimize all aspects of mainframe application delivery and promises to drive down costs, increase productivity, and accelerate innovation. Specifically, it enables both on- and off-mainframe development, the latter without consuming mainframe resources, to provide a flexible approach to the delivery of new business functions. In addition, it allows full and flexible customization of the IDE to support unique development processes and provides deep integration into mainframe configuration management and tooling for a more comprehensive development environment. It also boasts of improved application quality with measurable improvement in delivery times.  These capabilities together promise faster developer adoption.

Said Greg Lotko, Vice President and Business Line Executive, IBM System z, about the new Micro Focus offering: “We are continually working with our technology partners to help our clients maximize the value in their IBM mainframes, and this latest innovation from Micro Focus is a great example of that commitment.”

Behind all of this development innovation is an industry effort to cultivate the next generation of mainframe developers. Using a combination of trusted technology (COBOL and mainframe) and new innovation (zEnterprise, hybrid computing, expert systems, and Eclipse), these new developers, having been raised on GUIs, mobile, and social, can leverage what they learned growing up to build the multi-platform, multi-device mainframe applications that organizations will need going forward.

As these people come on board as mainframe-enabled developers, organizations will have more confidence in continuing to invest in their mainframe software assets, which currently amount to an estimated 200-300 billion lines of source code and may even be growing as mainframes are added in developing markets, considered a growth market by IBM. It only makes sense to leverage this proven code base rather than try to replace it.

This was confirmed in a CA Technologies survey of mainframe users a year ago, which found that 1) the mainframe is playing an increasingly strategic role in managing the evolving needs of the enterprise; 2) the machine is viewed as an enabler of innovation as big data and cloud computing transform the face of enterprise IT—now add mobile; and 3) companies are seeking candidates with cross-disciplinary skill sets to fill critical mainframe workforce needs in the new enterprise IT thinking.

Similarly, a recent study by the Standish Group showed that 70 percent of CIOs saw their organizations’ mainframes as having a central and strategic role in their overall business success.  Using the new tools noted above organizations can maximize the value of the mainframe asset and cultivate the next generation mainframe developers.

IBM Big Data Innovations Heading to System z

April 4, 2013

Earlier this week IBM announced new technologies intended to help companies and governments tackle Big Data by making it simpler, faster and more economical to analyze massive amounts of data. Its latest innovations, IBM suggested, would drive reporting and analytics results as much as 25 times faster.

The biggest of IBM’s innovations is BLU Acceleration, targeted initially for DB2. It combines a number of techniques to dramatically improve analytical performance and simplify administration. A second innovation, referred to as the enhanced Big Data Platform, improves the use and performance of the InfoSphere BigInsights and InfoSphere Streams products. Finally, it announced the new IBM PureData System for Hadoop, designed to make it easier and faster to deploy Hadoop in the enterprise.

BLU Acceleration is the most innovative of the announcements, probably a bona fide industry first, although others, notably Oracle, are scrambling to do something similar. BLU Acceleration enables much faster access to information by extending the capabilities of in-memory systems. It loads data into RAM instead of leaving it on hard disk for faster performance, and it dynamically moves unused data back to storage. It even works, according to IBM, when data sets exceed the size of memory.

Another innovation included in BLU Acceleration is data skipping, which allows the system to skip over irrelevant data that doesn’t need to be analyzed, such as duplicate information. Other innovations include the ability to analyze data in parallel across different processors; the ability to analyze data transparently to the application, without the need to develop a separate layer of data modeling; and actionable compression, where data no longer has to be decompressed to be analyzed because the data order has been preserved.   Finally, it leverages parallel vector processing, which enables multi-core and SIMD (Single Instruction Multiple Data) parallelism.
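
Data skipping is easy to visualize with a toy example: keep a tiny synopsis (say, the minimum and maximum value) for each block of a column and consult it before touching the block. The sketch below is a generic illustration of the idea, not BLU Acceleration’s actual data structures.

    from typing import List, Tuple

    Block = Tuple[int, int, List[int]]   # (min_value, max_value, values)

    def build_blocks(values: List[int], block_size: int = 4) -> List[Block]:
        """Split a column into blocks and record a min/max synopsis for each."""
        blocks = []
        for i in range(0, len(values), block_size):
            chunk = values[i:i + block_size]
            blocks.append((min(chunk), max(chunk), chunk))
        return blocks

    def scan_greater_than(blocks: List[Block], threshold: int) -> List[int]:
        """Evaluate 'value > threshold', skipping blocks whose synopsis rules them out."""
        hits, skipped = [], 0
        for lo, hi, chunk in blocks:
            if hi <= threshold:          # nothing in this block can qualify: skip it unread
                skipped += 1
                continue
            hits.extend(v for v in chunk if v > threshold)
        print(f"skipped {skipped} of {len(blocks)} blocks")
        return hits

    blocks = build_blocks([3, 7, 2, 9, 55, 61, 58, 70, 4, 1, 8, 6])
    print(scan_greater_than(blocks, 50))   # only the middle block is actually scanned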

During testing, IBM reported, some queries in a typical analytics workload ran more than 1000x faster when using the combined innovations of BLU Acceleration. It also resulted in 10x storage space savings during beta tests. BLU acceleration will be used first in DB2 10.5 and Informix 12.1 TimeSeries for reporting and analytics. It will be extended for other data workloads and to other products in the future.

BLU Acceleration promises to be as easy to use as load-and-go. BLU tables coexist with traditional row tables, using the same schema, storage, and memory. You can query any combination of row and BLU (columnar) tables, and IBM assures easy conversion of conventional tables to BLU tables.
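
For the curious, here is a minimal sketch of what that coexistence looks like from Python, assuming the ibm_db driver and DB2 10.5’s ORGANIZE BY COLUMN clause; the connection string, table names, and the row-organized store_dim table are made-up examples.

    import ibm_db

    # Connection values are placeholders for a DB2 10.5 database.
    conn = ibm_db.connect(
        "DATABASE=SAMPLE;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;UID=user;PWD=secret",
        "", "")

    # A column-organized (BLU) table created alongside existing row tables.
    ibm_db.exec_immediate(conn, """
        CREATE TABLE sales_fact (
            sale_date DATE, store_id INT, amount DECIMAL(12,2)
        ) ORGANIZE BY COLUMN
    """)

    # One query can mix BLU and row tables; store_dim is assumed to be row-organized.
    stmt = ibm_db.exec_immediate(conn, """
        SELECT d.region, SUM(f.amount) AS total
        FROM sales_fact f JOIN store_dim d ON f.store_id = d.store_id
        GROUP BY d.region
    """)
    row = ibm_db.fetch_tuple(stmt)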

DancingDinosaur likes seeing the System z included as an integral part of the BLU Acceleration program.  The z has been a DB2 workhorse and apparently will continue to be as organizations move into the emerging era of big data analytics. On top of its vast processing power and capacity, the z brings its unmatched quality of service.

Specifically, IBM has called out the z for:

  • InfoSphere BigInsights via the zEnterprise zBX for data exploration and online archiving
  • IDAA (in-memory Netezza technology) for reporting and analytics as well as operational analytics
  • DB2 for SQL and NoSQL transactions with enhanced Hadoop integration in DB2 11 (beta)
  • IMS for highest performance transactions with enhanced Hadoop integration  in IMS 13 (beta)

Of course, the zEnterprise is a full player in hybrid computing through the zBX so zEnterprise shops have a few options to tap when they want to leverage BLU Accelerator and IBM’s other big data innovations.

Finally, IBM announced the new IBM PureData System for Hadoop, which should simplify and streamline the deployment of Hadoop in the enterprise. Hadoop has become the de facto open systems approach to organizing and analyzing vast amounts of unstructured as well as structured data, such as posts to social media sites, digital pictures and videos, online transaction records, and cell phone location data. The problem with Hadoop is that it is not intuitive for conventional relational DBMS staff and IT. Vendors everywhere are scrambling to overlay a familiar SQL approach on Hadoop’s map/reduce method.

The new IBM PureData System for Hadoop promises to reduce from weeks to minutes the ramp-up time organizations need to adopt enterprise-class Hadoop technology with powerful, easy-to-use analytic tools and visualization for both business analysts and data scientists. It also provides enhanced big data tools for management, monitoring, development, and integration with many more enterprise systems.  The product represents the next step forward in IBM’s overall strategy to deliver a family of systems with built-in expertise that leverages its decades of experience in reducing the cost and complexity associated with information technology.

IBM Technical Computing Tackles Big Data

October 26, 2012

IBM Technical Computing, also referred to as high performance computing (HPC), bolstered its Platform Computing Symphony product for big data mainly by adding enterprise-ready InfoSphere BigInsights Hadoop capabilities. The Platform Symphony product now includes Apache Hadoop, map/reduce and indexing capabilities, application accelerators, and development tools. IBM’s recommended approach to simplifying and accelerating big data analytics entails the integration of Platform Symphony, General Parallel File System (GPFS), Intelligent Cluster, and DCS3700 storage.

This is not to say that IBM is leaving the traditional supercomputing and HPC market. Its Sequoia supercomputer recently topped the industry by delivering over 16 petaflops of performance.  Earlier this year it also unveiled the new LRZ SuperMUC system, built with IBM System x iDataPlex direct water cooled dx360 M4 servers encompassing more than 150,000 cores to provide a peak performance of up to three petaflops.  SuperMUC, run by Germany’s Bavarian Academy of Science’s Leibniz Supercomputing Centre, will be used to explore the frontiers of medicine, astrophysics, quantum chromodynamics, and other scientific disciplines.

But IBM is intent on broadening the scope of HPC by pushing it into mainstream business. With technical computing no longer just about supercomputers, the company wants to extend technical computing to diverse industries. It already has a large presence in the petroleum, life sciences, financial services, automotive, aerospace, defense, and electronics industries for compute-intensive workloads. Now it is looking for new areas where a business can exploit technical computing for competitive gain. Business analytics and big data are the first candidates that come to mind.

When it comes to big data, the Platform Symphony product already has posted some serious Hadoop benchmark results:

  • Terasort, a big data benchmark that tests the efficiency of MapReduce clusters in handling very large datasets—Platform Symphony used 10x fewer cores
  • SWIM, a benchmark developed at UC Berkeley that simulates real-world workload patterns on Hadoop clusters—Platform Symphony ran 6x faster
  • Sleep, a standard measure to compare core scheduling efficiency of MapReduce workloads—Platform Symphony came out 60x faster.

Technical computing at IBM involves System x, Power, System i, and PureFlex—just about everything except z. And it probably could run on the z too through x or p blades in the zBX.

Earlier this month IBM announced a number of technical computing enhancements including a high-performance, low-latency big data platform encompassing IBM’s Intelligent Cluster, Platform Symphony, IBM GPFS, and System Storage DCS3700. Specifically for Platform Symphony is a new low latency Hadoop multi-cluster capability that scales to 100,000 cores per application and shared memory logic for better big data application performance.

Traditionally, HPC customers coded their own software to handle the nearly mind-boggling complexity of the problems they were trying to solve. To expand technical computing to mainstream business, IBM has lined up a set of ISVs to provide packaged applications covering CAE, Life Science, EDA, and more. These include Rogue Wave, ScaleMP, Ansys, Altair, Accelrys, Cadence, Synopsys, and others.

IBM also introduced the new Flex System HPC Starter Configuration, a hybrid system that can handle both POWER7 and System x. The starter config includes the Flex Enterprise Chassis, an InfiniBand (IB) chassis switch, a POWER7 compute node, and an IB expansion card for Power or x86 nodes. Platform Computing software handles workload management and optimizes resources. IBM describes it as a high density, price/performance offering but hasn’t publicly provided any pricing. Still, it should speed time to HPC.

As technical computing goes mainstream it will increasingly focus on big data and Hadoop.  Compute-intensive, scientific-oriented companies already do HPC. The newcomers want to use big data techniques to identify fraud, reduce customer churn, make sense of customer sentiment, and similar activities associated with big data. Today that calls for Hadoop which has become the de facto standard for big data, although that may change going forward as a growing set of alternatives to Hadoop gain traction.

Hadoop—new possibilities for the z196

November 8, 2010

Even before IBM introduced the zEnterprise as a hybrid mainframe it was thinking about Hadoop on the mainframe as part of its Blue Cloud initiative in 2007. That would include Xen and PowerVM virtualized Linux operating system images and Hadoop parallel workload scheduling.

Although Blue Cloud wasn’t specifically a mainframe initiative, even in 2007 the mainframe running Linux and z/VM could act as a Hadoop platform. More recently IBM turned to Hadoop for its InfoSphere BigInsights,  an analytics platform built on top of the Apache Hadoop open framework for storing, managing and gaining insights from Internet-scale data.

Hadoop uses a programming model and software framework called Map/Reduce for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. A second piece is the Hadoop Distributed File System (HDFS).

As Apache explains it, HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. When a program initiates a data search, each node looks at its data and processes it as instructed.
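
The model is easiest to see in code. The toy word count below runs the map and reduce phases in a single Python process just to show the shape of the computation; on a real Hadoop cluster the framework runs many mapper instances in parallel against HDFS blocks and handles the sort-and-shuffle between the two phases.

    from itertools import groupby

    def mapper(lines):
        """Map phase: emit a (word, 1) pair for every word in the input."""
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    def reducer(pairs):
        """Reduce phase: sum the counts for each word (pairs must arrive grouped by key)."""
        for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        text = ["the mainframe runs Hadoop", "Hadoop runs on Linux on the mainframe"]
        for word, count in reducer(mapper(text)):
            print(word, count)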

Hadoop needs some form of grid computing. By 2010, grid computing had morphed into cloud computing. At IOD a few weeks back, IBM introduced a development test cloud for trying out Hadoop applications that, according to IBM, fit well with cloud principles and technology.

With the z196 the possibilities of Hadoop for mainframe shops become much more interesting. To begin, the z196 has the resource capacity to run hundreds, if not thousands, of virtualized Linux servers, each with designated storage capacity. That lays the foundation for a Hadoop platform.

Through a flexible, fault-resistant, Hadoop-based infrastructure, organizations can run different types of workloads. Analytics of massive amounts of data has emerged as a primary enterprise workload for Hadoop on the mainframe.

For analytics, the z196 can run Cognos on Linux on z. Better yet, you could attach the zBX extension cabinet and populate it with IBM Smart Analytics Optimizer cards running against Hadoop data sets. Or use some other POWER-based analytics running on POWER cards inside the zBX.

For z196 shops, the plan would be to use the machine as the platform for a private cloud that captured, stored, managed, and analyzed massive amounts of data generated from web applications, meters and sensors, POS systems, clickstreams, and such using Hadoop. Already, one midsize z196 user has deployed the machine to serve images and video to online shoppers. It is not too big a stretch to imagine tapping Hadoop capabilities to do more with the data.

Whether any of this happens depends on pricing. To begin, IBM has to reduce the cost of a z196 Hadoop environment because today companies are building these out of the cheapest commodity components.

The necessary cost reductions probably won’t happen until late 2011, when IBM plans to introduce Solution Edition discounts for the z196 comparable to the z10 Enterprise Linux Solution Edition discounts last year. It also will depend on how the pricing shakes out for the various zBX blades. To date IBM has talked a good Hadoop game but has given no hints of any readiness to cut pricing enough in the future to make it practical on the z196.

 

IBM tools up z196 for BI analytics

November 1, 2010

Last week was a business intelligence (BI) love fest at IBM’s Information On Demand (IOD) conference in Las Vegas. DB2 for z/OS was featured, as was Cognos 10 and Smart Analytics. IBM is pushing BI heavily for the zEnterprise; note that one of the first blades coming for the zEnterprise is a Smart Analytics Optimizer blade to pop into the zBX extension cabinet. But IOD went beyond that; even enterprise content management, InfoSphere, and IMS got to share the spotlight.

The overall theme of the conference seemed more about business analytics for the masses, as IBM addressed extending BI and analytics to social networking, collaboration and mobile workforces.  For sure, this is not traditional BI for the enterprise and certainly not for the mainframe, at least not yet.

At IOD, IBM introduced Cognos 10, which sports a new user interface. The goal is to combine social collaboration and analytics for the purpose of delivering real-time intelligence both online and through mobile devices such as iPads, iPhones, and BlackBerry devices.

DB2 also got a boost with DB2 10, which promises to deliver a 40% performance improvement. IBM InfoSphere Server got a remake through what IBM describes as new software that redefines how an organization handles data behind the scenes and provides faster and more accurate integration of diverse forms of data.

More interesting was the promised technology preview of Hadoop-based big data analytics software running on the IBM Test Development cloud. Hadoop, a project of the Apache Software Foundation, has potential in large enterprises, especially for private clouds.

Hadoop is a distributed computing platform for processing extremely large amounts of data.  To date, it has not been an enterprise data center or mainframe game, but as private clouds proliferate and need to scale to extreme levels, don’t be surprised to see Hadoop in the mainframe data center, maybe running on thousands of virtualized Linux machines.

IOD even encompassed unstructured data. Here IBM is aiming to address content-centric processes and manage unstructured content, such as scanned images, electronic documents, web pages, video, email and text messaging. IBM expects 650% enterprise data growth over the next five years. Of this data 80% will be unstructured, generated from forms, web content, chat transcripts, and such.

Of specific interest at IOD to System z shops was IBM ECM for z. ECM typically is handled on distributed platforms.  At IOD IBM was touting nine ECM products for the z, including Content Integrator for z/OS, Content Manager for z/OS, DB2 ImagePlus for z/OS, and more.

BI is one thing, analytics is another. In the past organizations ran analytics, especially real time analytics, on processors like POWER that are optimized for compute-intensive processing. That meant doing the analytics processing on Power Systems machines. Going forward, z196 shops will be able to run the IBM Smart Analytics Optimizer on POWER7 blades within the new extension cabinet to streamline the analytics process. Now all that’s missing are the actual blades and z extension cabinet to ship.

 

 

