IBM Research Cracks Code on Accelerating Key Machine Learning Algorithms
Note: There is an update to Snap ML, which is covered in this blog.
Deep learning is well known to be very amenable to GPU acceleration. Accelerating “traditional” machine learning methods like logistic regression, linear regression, and support vector machines with GPUs at scale has, however, been challenging. Today I am very proud to share a major breakthrough that IBM Research has made in this critical area.
A team out of our Zurich IBM Research lab beat a previous performance benchmark set by Google for a machine learning workload by 46 times. The research team trained a logistic regression classifier to predict clicks on advertisements using a terabyte-scale data set of online advertising click-through data, containing 4.2 billion training examples and 1 million features. The IBM run took 91.5 seconds versus a Google run on the same data set that took 70 minutes (see the arXiv paper for technical details).

The IBM team used four IBM Power Systems AC922 POWER9 servers, each with two POWER9 CPUs and four NVIDIA Tesla V100 GPUs. Google used 60 worker Intel x86 CPU-only machines, and 29 parameter machines in Google Cloud. So, we went from ~90 machines used by Google and 70 minutes, down to 4 POWER9 servers with NVIDIA GPUs, and just 91.5 seconds.
We built a new machine learning library called “Snap ML” that includes this logistic regression algorithm, along with GPU-accelerated versions of linear regression and support vector machine algorithms, plus supporting functions for scaling to multiple servers and smart memory management.
Broad Business Use Cases of Logistic and Linear Regression
What’s important to emphasize here is that we aren’t just presenting a benchmark result — we are dramatically accelerating some of the most relevant machine learning problems in the industry. In fact, Kaggle surveyed over 16,000 data science and machine learning professionals and found that logistic regression was the most common tool of their trade, used by over 60% of them in their work.

Logistic regression helps solve binary classification problems, like business questions which have a yes/no answer or which involve picking between two choices. For example, “Will this user click on the ad which appears in front of them?” or “Is this transaction fraudulent or not fraudulent?”, or “Is this email spam or not spam?”
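To make the binary-classification idea concrete, here is a minimal, self-contained sketch of logistic regression trained by gradient descent on a tiny made-up "click / no click" data set. This is purely illustrative (plain Python, not Snap ML, and the data is invented for the example):

```python
import math

# Toy binary classification: predict "click" (1) or "no click" (0)
# from two features. A hypothetical stand-in for the ad-click task.
data = [
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
    ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1),
]

w = [0.0, 0.0]  # model weights
b = 0.0         # bias
lr = 1.0        # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x):
    # Probability of the positive class ("click")
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Batch gradient descent on the logistic (cross-entropy) loss.
for _ in range(500):
    gw = [0.0, 0.0]
    gb = 0.0
    for x, y in data:
        err = predict_proba(x) - y  # dLoss/dz for the logistic loss
        for i in range(2):
            gw[i] += err * x[i]
        gb += err
    for i in range(2):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

# A high-feature example should score as a likely "click".
print(predict_proba([0.85, 0.85]) > 0.5)
```

The yes/no answer falls out of thresholding the predicted probability at 0.5; at terabyte scale, it is exactly this kind of training loop (in a vastly more optimized form) that Snap ML accelerates.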
But, that’s not all. The Snap ML library also offers GPU-acceleration and distributed-computing speedups to two additional popular machine learning algorithms — support vector machines and linear regression. These two algorithms help solve an even broader set of industry problems, pushing into use cases in forecasting sales or other events, evaluation of trends, and prediction of consumer behavior.
Making Machine Learning a “Snap”
Snap ML is built on a set of primitives for generalized linear models, optimized for GPU acceleration, called libGLM, as shown in the figure below. Snap ML also distributes the ML model training across multiple servers using either Apache Spark or the MPI message-passing interface common in high-performance computing. Future work includes adding a distributed hyper-parameter optimizer.

Data scientists can use Snap ML as a standalone library with Python, with Scikit-learn, or with Apache Spark. We are also working on integration with TensorFlow.
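For readers unfamiliar with that workflow: the scikit-learn estimator pattern (construct, `fit`, `score`/`predict`) is the interface Snap ML is designed to work alongside. The sketch below shows that pattern using scikit-learn itself on a synthetic data set; the exact Snap ML import path is not covered in this post, so no Snap ML-specific API is shown:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for real workloads.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The familiar construct -> fit -> score workflow.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```

A library that follows this estimator convention can be swapped into existing pipelines with minimal code changes, which is the point of offering Snap ML through these familiar interfaces.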
3 Major Breakthroughs Fuel Snap ML Innovations

We co-optimized the software and hardware to build the Snap ML library. There are three major breakthroughs that enable the performance benefits in Snap ML:
1. GPU Acceleration: The IBM Research team built specialized solvers that take advantage of the massively parallel architecture of GPUs to get the biggest benefit in these algorithms.
2. Dynamic Memory Management: Snap ML gets a 3.5 times speedup from the high-speed next-generation NVIDIA NVLink connection between the POWER9 CPU and the NVIDIA Tesla V100 GPUs, which lets it move the large data set from system memory to GPU memory much faster. A dynamic data transfer algorithm runs on the CPU to determine which data to move to the GPU next.
3. Efficient Cluster Scaling: We built Snap ML as a data-parallel framework, which enables us to scale out and train with massive datasets by distributing the data across multiple servers.
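The data-parallel pattern behind point 3 can be sketched in a few lines: each worker holds a shard of the data, computes a partial gradient for a shared model, and the partial results are combined into one update. This is a minimal single-process illustration of the general idea only; Snap ML's actual solvers and communication layer (Spark or MPI) are far more sophisticated:

```python
def partial_gradient(shard, w):
    # Least-squares gradient on one shard for a 1-D linear model y = w*x:
    # sum over (x, y) of (w*x - y) * x, plus the shard's example count.
    g = 0.0
    for x, y in shard:
        g += (w * x - y) * x
    return g, len(shard)

# Full dataset for the line y = 2x, split across four "workers".
shards = [[(float(i), 2.0 * i)] for i in range(1, 5)]

w = 0.0
for _ in range(200):
    # In a real cluster these run concurrently and are combined
    # with a reduction (e.g. an MPI all-reduce); here we just loop.
    results = [partial_gradient(s, w) for s in shards]
    total_g = sum(g for g, _ in results)
    n = sum(c for _, c in results)
    w -= 0.05 * total_g / n

print(round(w, 2))  # converges toward 2.0, the true slope
```

Because each worker only ever touches its own shard, adding servers lets the framework train on data sets far larger than any single machine's memory.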
Snap ML will be more broadly available later this year as a technology preview in our Watson ML CE (aka PowerAI) machine and deep learning software distribution. We are currently looking for a few lead clients who want to work with us to take advantage of Snap ML.
If the massive acceleration of any of these three algorithms — logistic regression, linear regression, and SVMs — has a large impact on your work, please leave a comment below and let me know. The new version of Snap ML now also accelerates decision trees and random forests (link).
Learn More about Watson ML CE (PowerAI)
Watson ML CE (PowerAI) is a software suite based on open-source AI frameworks such as TensorFlow and PyTorch. Learn more in these blogs: