Update: The instructions in the main body of this post are for Python 2.7. If you are using Python 3, the process is simplified. The Python 3 instructions are here, starting with a Python 3.6 environment:
Assumptions (What I expect to already be installed):
- Python 3.6 installed
- Pip installed (If it is not already installed, download and install pip: https://pip.pypa.io/en/stable/installing/)
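With those in place, install the three packages with pip:
- numpy
- scipy
- scikit-learn (this is the package name on PyPI; it is imported as sklearn)
pip install numpy
pip install scipy
pip install scikit-learn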
Test the installation by opening a Python interpreter and importing sklearn:
python
import sklearn
If it successfully imports (no errors), then sklearn is installed correctly.
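The same check can be scripted so that it reports on all three packages at once. The following is a minimal sketch, assuming Python 3; the helper name installed_version is made up for illustration and is not part of any of these libraries.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Note the import name is "sklearn", not "scikit-learn".
    for name in ("numpy", "scipy", "sklearn"):
        version = installed_version(name)
        print("{}: {}".format(name, version if version else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED is the one to revisit with pip.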
Introduction
Scikit-learn is a great data mining library for Python. It provides a powerful array of tools to classify, cluster, reduce, select, and so much more. I first encountered scikit-learn when I was developing prototypes for my first business venture. I wanted to use something that was easy and powerful. Scikit-learn was just that tool.
The only problem with scikit-learn is that it builds on some powerful-yet-finicky libraries, and you will need to install those libraries, NumPy and SciPy, before you can proceed with installing scikit-learn.
To a novice, this can be a frustrating task since the order of installation matters and many Google searches will only produce unhelpful and long-winded responses. That is my motivation for setting the record straight with a quick tutorial on how to install scikit-learn, mostly on Windows, though I have provided links and notes for Linux and Mac installations as well.
Over the course of this tutorial, you will install (or already have) the following, in this order:
- Python 2.7+ (https://www.python.org/downloads/)
- NumPy 1.6.1+ (http://sourceforge.net/projects/numpy/files/NumPy/1.10.2/)
- SciPy 0.9+ (http://sourceforge.net/projects/scipy/files/scipy/0.16.1/)
- Pip (https://pip.pypa.io/en/stable/installing/)
- scikit-learn (http://scikit-learn.org/stable/install.html)
NOTE: I have provided the links unlabeled above because, like all tech/installation tutorials, over time they become obsolete. By providing the links as they are, it is my hope that even if new versions come out, you will be able to use this tutorial to find the resources you need.
Step 1: Install Python
If you do not already have Python, install it now from the address provided above (https://www.python.org/downloads/). I will be using Python 2.7 for this tutorial.
The Python installer is quick and straightforward. Once it finishes, check that Python is available on the command line. Open a terminal by searching for ‘cmd’ or running C:\Windows\System32\cmd.exe. I would recommend creating a shortcut if you are doing this a lot.
In the command line, enter:
python --version
Something similar to “Python 2.7.6” should display. That shows that Python is working and accessible from the command line.
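If you would rather check the version from inside Python, a short script can compare the running interpreter against the 2.7 minimum listed earlier. This is only a sketch; the function name meets_minimum and its parameters are assumptions made for the example.

```python
import sys


def meets_minimum(minimum=(2, 7), current=None):
    """Return True if `current` (default: the running interpreter) is at least `minimum`."""
    if current is None:
        current = tuple(sys.version_info[:3])
    # Tuples compare element by element, so (2, 7, 6) >= (2, 7) is True.
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python {}.{}.{}".format(*sys.version_info[:3]))
    print("Meets the 2.7 minimum: {}".format(meets_minimum()))
```

Save it to a file and run it with the same python command you tested above, so that it reports on the interpreter you will actually be installing packages for.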
Step 2: Install NumPy
NumPy is a powerful library for Python that contains advanced numerical capabilities.
Install NumPy by downloading the correct installer using the link provided above (http://sourceforge.net/projects/numpy/files/NumPy/1.10.2/) then run the installer.
NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!
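If you are unsure which installer matches your setup, Python can report the details itself. The sketch below uses only the standard library; the function name describe_environment is hypothetical. Note that it reports the bitness of the Python build, which can be 32-bit even on a 64-bit OS, and it is the Python build that the installer has to match.

```python
import platform
import struct
import sys


def describe_environment():
    """Collect the details needed to choose a matching installer."""
    return {
        "os": platform.system(),  # e.g. "Windows", "Linux", "Darwin"
        "python_bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "python_version": "{}.{}".format(*sys.version_info[:2]),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```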
Step 3: Install SciPy
Download the SciPy installer using the link provided above (http://sourceforge.net/projects/scipy/files/scipy/0.16.1/) and run it.
NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!
Step 4: Install Pip
Pip is a package manager specifically for Python. It comes in handy so often that I highly recommend installing it to help manage Python packages.
Go to the link provided above (https://pip.pypa.io/en/stable/installing/).
The easiest way to install pip on Windows is to download the ‘get-pip.py’ script from that page and then run it from your command line:
python get-pip.py
If you are on Linux you can use apt-get (or whatever package manager you have):
sudo apt-get install python-pip
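To confirm that pip ended up attached to the interpreter you are using, you can ask that interpreter to run pip as a module. This is a minimal sketch assuming Python 3.3 or later (for subprocess.DEVNULL); the helper name pip_version is made up for the example.

```python
import subprocess
import sys


def pip_version():
    """Return the output of `python -m pip --version`, or None if pip is unavailable."""
    try:
        output = subprocess.check_output(
            [sys.executable, "-m", "pip", "--version"],
            stderr=subprocess.DEVNULL,  # hide the "No module named pip" message
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    print(pip_version() or "pip is not available for this interpreter")
```

When pip is present, the output also shows which Python installation it belongs to, which helps if you have more than one Python on the machine.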
Step 5: Install scikit-learn
NOTE: More information on installing scikit-learn at the link provided above (http://scikit-learn.org/stable/install.html)
On Windows: use pip to install scikit-learn:
pip install scikit-learn
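If the pip command on your PATH belongs to a different Python than the one you run, the package can land in the wrong place. One way around that, sketched here under the assumption that pip is already installed for the running interpreter, is to call pip through that interpreter. The function names are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs `package` for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


if __name__ == "__main__":
    # Only print the command here; call pip_install("scikit-learn") to run it.
    print(" ".join(pip_install_command("scikit-learn")))
```

The printed command is equivalent to typing python -m pip install scikit-learn at the prompt.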
On Linux: Use the package manager or follow the build instructions at http://www.bogotobogo.com/python/scikit-learn/scikit-learn_install.php
Step 6: Test Installation
Now we must see if everything installed correctly. Open up a command line terminal and type:
python
This will open a python interpreter. You will know this because there will be some text and three chevrons, “>>>”, prompting input. Type:
import sklearn
If nothing happens and another prompt appears, scikit-learn has been installed correctly.
If an error occurs, there might have been a misstep in the process. Go back through the tutorial to see if any steps were missed, or follow the error message that was given.
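A clean import shows that the package is found, but not that it actually works. For a slightly stronger check, you can fit a tiny model. The sketch below is an illustration rather than part of the original steps: the choice of DecisionTreeClassifier and the two-point toy dataset are assumptions made for the example, and the function name smoke_test is made up.

```python
def smoke_test():
    """Fit a tiny classifier; return True on success, None if sklearn is missing."""
    try:
        from sklearn.tree import DecisionTreeClassifier
    except ImportError:
        return None
    # Two trivially separable points: feature 0 -> class 0, feature 1 -> class 1.
    features = [[0], [1]]
    labels = [0, 1]
    model = DecisionTreeClassifier(random_state=0).fit(features, labels)
    # The tree should reproduce the labels of the points it was trained on.
    return list(model.predict(features)) == labels


if __name__ == "__main__":
    result = smoke_test()
    if result is None:
        print("scikit-learn could not be imported")
    else:
        print("scikit-learn fit and predict worked: {}".format(result))
```

If this runs without errors, NumPy, SciPy, and scikit-learn are all working together.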