Snap ML: 2x to 40x Faster Machine Learning than Scikit-Learn

Mar 14, 2019

Last year, we announced Snap ML, a Python-based framework designed for high-performance machine learning. Snap ML is bundled with the WML Community Edition, or WML CE (aka PowerAI), software distribution, which is available for free on Power systems.

Take advantage of this 7-day Free Remote Trial of Snap ML and PowerAI Vision.

Snap ML is up to 46x Faster when using GPUs vs Scikit-learn & TensorFlow

The first release of Snap ML enabled GPU-acceleration of generalized linear models (GLMs) and also enabled scaling these models to multiple GPUs and multiple servers. GLMs are popular machine learning algorithms, which include logistic regression, linear regression, ridge and lasso regression, and support vector machines (SVMs).
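To make the term concrete, here is a minimal sketch of logistic regression, one of the GLMs listed above, trained with plain batch gradient descent in pure Python. It is not Snap ML code and does not use a GPU. The function names, the toy data, the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic link function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            err = pred - yi  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one example."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy, hand-made data (an assumption): the label is 1 when the first feature is large.
X = [[0.0, 1.0], [0.2, 0.8], [0.4, 0.9], [0.6, 0.1], [0.8, 0.3], [1.0, 0.2]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

The inner loops over examples and features are the kind of arithmetic that maps naturally onto parallel hardware, which is one reason GLM training is a good candidate for GPU acceleration.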

Our previous results showed that logistic regression in Snap ML running on GPUs is up to 46 times faster than in other machine learning frameworks that rely on CPUs alone.

Our latest version of Snap ML adds high-performance implementations of decision trees and random forests. Our implementation of these algorithms takes advantage of multiple CPU cores and threads (but so far does not take advantage of GPUs). Click here for documentation on Snap ML.

Snap ML 2 to 4x Faster than Scikit-Learn for Decision Trees and Random Forests

Our implementation of decision trees and random forests in Snap ML is between 2 and 4 times faster than the implementations in scikit-learn, which is the most popular machine learning software library.

This comparison pits a single Power9 CPU against a single latest-generation Intel x86 CPU. The speedup comes primarily from our implementation of these algorithms being designed for high performance.
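For readers who want to run a comparison of this kind themselves, a speedup figure is simply the ratio of two wall-clock training times. The sketch below is a generic timing harness built on the Python standard library. It is not the benchmark used for the numbers above, and the two workload functions are placeholders standing in for "train the same model with library A" and "train it with library B".

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Run fn several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Placeholder workloads (assumptions): swap in real training calls on identical data.
def slow_training():
    sum(i * i for i in range(200_000))


def fast_training():
    sum(i * i for i in range(50_000))


ratio = speedup(median_runtime(slow_training), median_runtime(fast_training))
print(f"speedup: {ratio:.1f}x")
```

Taking the median over several repeats reduces the influence of one-off effects such as cache warm-up. For a fair comparison, both runs should use the same dataset and hyperparameters.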

The linear models in Snap ML like logistic regression, linear regression, and SVMs are accelerated by GPUs (read our earlier Snap ML blog to learn more).

3x Speedup due to NVLink between Power CPUs & NVIDIA GPUs

We also compared running a linear regression model in Snap ML on a Power9-based AC922 server to running it on an x86-based server, both of which have the latest NVIDIA V100 GPU accelerator. We found that the Power9 AC922 server is 3 times faster than the x86 server.

This speedup is due to the unique NVLink high-speed connection that is embedded in the Power9 CPU and the NVIDIA V100 GPU, which provides up to 5 times faster data transfer between the CPU and GPU compared to x86-based systems. The slow PCI-e interface in servers with x86 CPUs and NVIDIA GPUs becomes a bottleneck for machine learning and deep learning training jobs.
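As a back-of-the-envelope check, Amdahl's law relates a faster CPU-GPU link to the end-to-end speedup. The sketch below is an illustration, not a measurement. It assumes the only difference between the two servers is a 5x faster transfer, and asks what share of the x86 run time would have to be spent on transfers to produce a 3x overall gain. The function names are made up for this example.

```python
def overall_speedup(transfer_fraction, link_speedup):
    """Amdahl's law: only the transfer portion of the run time gets faster."""
    return 1.0 / ((1.0 - transfer_fraction) + transfer_fraction / link_speedup)


def required_transfer_fraction(target_speedup, link_speedup):
    """Invert Amdahl's law to find the transfer share needed for a target speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / link_speedup)


# Assumed inputs taken from the text: 5x faster link, 3x faster end to end.
fraction = required_transfer_fraction(target_speedup=3.0, link_speedup=5.0)
print(f"{fraction:.2f}")                        # 0.83
print(f"{overall_speedup(fraction, 5.0):.2f}")  # 3.00
```

Under that simplified model, a 3x gain from a 5x faster link corresponds to a job that spends roughly five sixths of its time moving data between CPU and GPU on the slower system, which is consistent with the link being the bottleneck.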

Machine Learning Methods Still Rule the Roost

A recent survey by the data science community website Kaggle.com found that the three most popular AI algorithms used by data scientists for “real” problems are logistic regression, decision trees, and random forests. All three of these are now accelerated by the IBM Snap ML high-performance machine learning framework.

Data scientists continue to use machine learning algorithms more often than deep learning algorithms because:

1. Fast training: These machine learning methods train in minutes or hours on datasets with billions of examples and features, compared to the days of training that most deep learning models take.

2. Explainable AI: Linear models are easily interpretable since they explicitly assign an importance to each feature. Tree models are interpretable as they explicitly illustrate the path to a decision.

3. Less data required: Machine learning methods can produce fairly accurate models from a small amount of data. Deep learning models require a very large amount of data to reach reasonable accuracy.

Interpretability or explainable AI is particularly important in industries like finance and healthcare where the reasons behind decisions have to be easily explainable to regulators and consumers. Imagine not being able to tell someone why their home loan was rejected!
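The explainability point can be made concrete with a small pure-Python sketch. The loan features, weights, and tree thresholds below are invented for illustration and do not come from Snap ML or any real model. The aim is only to show what "an importance per feature" and "the path to a decision" look like in practice.

```python
# Hypothetical linear model: one weight per (already normalized) feature.
linear_weights = {"income": 1.8, "debt_ratio": -2.5, "years_employed": 0.6}
linear_bias = -0.4


def explain_linear(applicant):
    """Return the score and each feature's contribution, largest magnitude first."""
    contributions = {
        name: weight * applicant[name] for name, weight in linear_weights.items()
    }
    score = linear_bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical decision tree stored as nested dicts; leaves hold the decision.
tree = {
    "feature": "debt_ratio",
    "threshold": 0.5,
    "left": {  # debt_ratio <= 0.5
        "feature": "income",
        "threshold": 0.3,
        "left": {"decision": "reject"},
        "right": {"decision": "approve"},
    },
    "right": {"decision": "reject"},  # debt_ratio > 0.5
}


def explain_tree(node, applicant):
    """Walk the tree and record every test that led to the final decision."""
    path = []
    while "decision" not in node:
        name, threshold = node["feature"], node["threshold"]
        if applicant[name] <= threshold:
            path.append(f"{name} <= {threshold}")
            node = node["left"]
        else:
            path.append(f"{name} > {threshold}")
            node = node["right"]
    return node["decision"], path


applicant = {"income": 0.4, "debt_ratio": 0.7, "years_employed": 0.5}
score, ranked = explain_linear(applicant)
decision, path = explain_tree(tree, applicant)
print(ranked[0][0], decision, path)
```

For this made-up applicant, the linear model's largest contribution comes from debt_ratio, and the tree's recorded path is the single test "debt_ratio > 0.5" leading to a rejection. Either output can be handed to a regulator or a customer as the reason for the decision.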

Try Snap ML today on Power AI DevBox or AI Starter Kit

You can try all our software including Snap ML for free by taking advantage of this 7-day Free Remote Trial of Snap ML and PowerAI Vision.

I am also delighted to announce that we are making it even easier for companies to get started with AI, by introducing two things:

- Power AI DevBox by Raptor: A Power9 and NVIDIA GPU based desktop PC by Raptor Systems designed as a developer system for AI applications. Developers can use the 30-day free software licenses for our Auto-AI vision software, called PowerAI Vision. This software is also available for free via our academic program. You can buy a Raptor Developer box here.

- IBM AI Starter Kit: A set of GPU-accelerated AC922 Power9 servers with the WML Accelerator software pre-installed. This includes all the major open-source AI software frameworks like TensorFlow, PyTorch, Keras, et cetera, and the management software for managing multiple data scientists and the AI training tasks that they run.

These two simple and inexpensive AI hardware and software infrastructure options enable developers, data scientists, and enterprises to easily get the right hardware systems and software tools that they need to start developing their first AI application.

The auto-AI PowerAI Vision software can enable students and researchers in biology, physics, and chemistry at universities to easily build AI models for images and videos, without any knowledge of machine learning or deep learning.

I hope these advances in machine learning and the ease of access to systems and software will enable many more data scientists and enterprises to build their first application powered by AI.

Please leave a comment and let us know how PowerAI Vision or Snap ML will help you in your next AI project.