Programming, Technical, Uncategorized

Installing scikit-learn; Python Data Mining Library

Update: The instructions of this post are for Python 2.7. If you are using Python 3, the process is simplified. The instructions are here:

Starting with a Python 3.6 environment.

Assumptions (What I expect to already be installed):

  1. Install numpy
  2. Install scipy
  3. Install sklearn
pip install numpy
pip install scipy
pip install sklearn

Test installation by opening a python interpreter and importing sklearn:

python
import sklearn

If it successfully imports (no errors), then sklearn is installed correctly.

Introduction

Scikit-learn is a great data mining library for Python. It provides a powerful array of tools to classify, cluster, reduce, select, and so much more. I first encountered scikit-learn when I was developing prototypes for my first business venture. I wanted to use something that was easy and powerful. Scikit-learn was just that tool.

The only problem with scikit-learn is that it builds off of some powerful-yet-finicky libraries, and you will need to install those libraries, NumPy and SciPy, before you can proceed with installing scikit-learn.

To a novice, this can be a frustrating task since the order of installation matters and many Google searches will only produce unhelpful and long-winded responses. Thus, my motivation to set the record straight and provide a quick tutorial on how to install scikit-learn — mostly on Windows, but I have provided links and notes on both Linux and Mac installations as well.

In the process of this tutorial, you will install (or already have) the following — in this order:

NOTE: I have provided the links unlabeled above because, like all tech/installation tutorials, over time they become obsolete. By providing the links as they are, it is my hope that even if new versions come out, you will be able to use this tutorial to find the resources you need.

Step 1: Install Python

If you do not already have Python, install it now at the address provied above (https://www.python.org/downloads/). I will be using Python 2.7 for this tutorial.

The installer for python is quick and good. Once installed, we will need to check to see if Python is available on the command line. Open a terminal by searching for ‘cmd’ or running C:\Windows\System32\cmd.exe. I would recommend creating a shortcut if you are doing this a lot.

in the command line, enter:

python --version

something similar to “Python 2.7.6” should display. That shows that python is working and accessible from the cmd line.

Step 2: Install NumPy

NumPy is a powerful library for Python that contains advanced numerical capabilities.

Install NumPy by downloading the correct installer using the link provided above (http://sourceforge.net/projects/numpy/files/NumPy/1.10.2/) then run the installer.

NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!

Step 3: Install SciPy

Download the SciPy installer using the link provided above (http://sourceforge.net/projects/scipy/files/scipy/0.16.1/) and run it.

NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!

Step 4: Install Pip

Pip is a package manager specifically for Python. It comes in handy so much that I highly recommend that you install it to help manage python packages.

Go to the link provided above (https://pip.pypa.io/en/stable/installing/).

The easiest way to install pip on Windows is by using the ‘get_pip.py’ script and then running it in your command line:

python get_pip.py

If you are on Linux you can use apt-get (or whatever package manager you have):

sudo apt-get install python-pip

Step 5: Install scikit-learn

NOTE: More information on installing scikit-learn at the link provided above (http://scikit-learn.org/stable/install.html)

On Windows: use pip to install scikit-learn:

pip install scikit-learn

On Linux: Use the package manager or follow the build instructions at http://www.bogotobogo.com/python/scikit-learn/scikit-learn_install.php

Step 6: Test Installation

Now we must see if everything installed correctly. Open up a command line terminal and type:

python

This will open a python interpreter. You will know this because there will be some text and three chevrons, “>>>”, prompting input. Type:

import sklearn

If nothing happens and another prompt appears scikit-learn has been installed correctly.

If an error occurs, there might have been a mis-step in the process. Go back through the tutorial to see if any steps were missed or follow the error message that was given.

Standard