Installing scikit-learn; Python Data Mining Library

Update: The instructions of this post are for Python 2.7. If you are using Python 3, the process is simplified. The instructions are here:

Starting with a Python 3.6 environment.

Assumptions (What I expect to already be installed):

  1. Install numpy: pip install numpy
  2. Install scipy: pip install scipy
  3. Install scikit-learn: pip install scikit-learn

Test installation by opening a python interpreter and importing sklearn:
import sklearn

If it successfully imports (no errors), then sklearn is installed correctly.


Scikit-learn is a great data mining library for Python. It provides a powerful array of tools to classify, cluster, reduce, select, and so much more. I first encountered scikit-learn when I was developing prototypes for my first business venture. I wanted to use something that was easy and powerful. Scikit-learn was just that tool.

The only problem with scikit-learn is that it builds off of some powerful-yet-finicky libraries, and you will need to install those libraries, NumPy and SciPy, before you can proceed with installing scikit-learn.

To a novice, this can be a frustrating task since the order of installation matters and many Google searches will only produce unhelpful and long-winded responses. Thus, my motivation to set the record straight and provide a quick tutorial on how to install scikit-learn — mostly on Windows, but I have provided links and notes on both Linux and Mac installations as well.

In the process of this tutorial, you will install (or already have) the following — in this order:

NOTE: I have provided the links unlabeled above because, like all tech/installation tutorials, over time they become obsolete. By providing the links as they are, it is my hope that even if new versions come out, you will be able to use this tutorial to find the resources you need.

Step 1: Install Python

If you do not already have Python, install it now from the address provided above. I will be using Python 2.7 for this tutorial.

The installer for python is quick and good. Once installed, we will need to check to see if Python is available on the command line. Open a terminal by searching for ‘cmd’ or running C:\Windows\System32\cmd.exe. I would recommend creating a shortcut if you are doing this a lot.

In the command line, enter:

python --version

Something similar to “Python 2.7.6” should display. That shows that Python is working and accessible from the command line.

Step 2: Install NumPy

NumPy is a powerful library for Python that contains advanced numerical capabilities.

Install NumPy by downloading the correct installer using the link provided above, then run the installer.

NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!

Step 3: Install SciPy

Download the SciPy installer using the link provided above and run it.

NOTE: There are a few installers based on your OS version AND the version of Python you have. It is important that you find the right installer for your OS and Python version!

Step 4: Install Pip

Pip is a package manager specifically for Python. It comes in handy so much that I highly recommend that you install it to help manage python packages.

Go to the link provided above.

The easiest way to install pip on Windows is by using the ‘’ script and then running it in your command line:


If you are on Linux you can use apt-get (or whatever package manager you have):

sudo apt-get install python-pip

Step 5: Install scikit-learn

NOTE: More information on installing scikit-learn is available at the link provided above.

On Windows: use pip to install scikit-learn:

pip install scikit-learn

On Linux: Use the package manager or follow the build instructions at

Step 6: Test Installation

Now we must see if everything installed correctly. Open up a command line terminal and type:

python


This will open a python interpreter. You will know this because there will be some text and three chevrons, “>>>”, prompting input. Type:

import sklearn

If nothing happens and another prompt appears, scikit-learn has been installed correctly.

If an error occurs, there might have been a mis-step in the process. Go back through the tutorial to see if any steps were missed or follow the error message that was given.
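If you would rather check all three libraries in one go, a small script along these lines can help. This is only a sketch: the function name installed_versions is made up for this example, and it assumes each library exposes a __version__ attribute (NumPy, SciPy and scikit-learn do, but not every package does).

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The import failed, so treat the package as missing
            versions[name] = None
        else:
            # Fall back to "unknown" for modules without a __version__ attribute
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Note: scikit-learn is imported under the name "sklearn"
    for name, version in installed_versions(["numpy", "scipy", "sklearn"]).items():
        print(name, "->", version if version else "NOT INSTALLED")
```

Any library reported as not installed is the step to revisit.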


An Experiment on PasteBin

A while ago I was browsing the public pastes on PasteBin and I came across a few e-mail/password dumps from either malware or some hacker trying to make a name for himself.

As I perused the information, I was shocked to find usernames, emails, passwords, social security numbers, credit card numbers, and more in these dumps. I reported the posts, as credit card info and SSNs are nothing to trifle with, but the thought lingered as to why they were public in the first place. There must be a way to automate the process of reporting these posts, I thought. Usernames, and especially passwords, have a distinctive signature: at least one upper-case letter, at least one lower-case letter, at least one digit, and at least 8 characters in length.

How many words in the English language have that particular combination?

This question inevitably led to an experiment.

The parameters were quite simple: How accurately can I identify a password that is surrounded by junk text in a post?
This is actually harder than it seems, as we can’t simply assume that the posts will be in English, or that they will be in a human language at all (they could be code). This presented an interesting problem to work with, and I started development of a framework to solve it.

The solution

The system to test my question is quite simple. It includes a web page scraper and an analysis engine.

The scraper is simple enough and goes to pastebin’s public post archive and pulls all of the links to “pastes” contained therein. It then grabs only the paste text from each page and adds them to a list. This list is sent to the analysis engine.

The analysis engine uses a spam filter-like merit score to help identify interesting pastes and discard pastes that do not have anything interesting in them.

It uses a series of filters to affect the merit score:

The first one is simple password identification. It uses a master list of popular passwords and searches each paste for them. If a password from the list is found, the post’s merit score is increased.

The second filter is keyword identification. This is similar to the password identification but it includes words and phrases that are not passwords but might signal a paste that is more likely to have passwords in it. These keywords are held in a dictionary that also stores the associated merit value (positive or negative).

The third filter applies the basic password rules:

  • Must have at least one capital letter
  • Must have at least one lower-case letter
  • Must have at least one digit
  • Must be at least 8 characters long
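The four rules above can be sketched in a few lines of Python. This is an illustration only, not the code used in the project: the function names are made up for this example, and splitting the paste on whitespace is a simplifying assumption.

```python
def looks_like_password(token):
    """Return True if the token satisfies all four basic password rules."""
    return (
        len(token) >= 8
        and any(c.isupper() for c in token)
        and any(c.islower() for c in token)
        and any(c.isdigit() for c in token)
    )


def candidate_passwords(text):
    """Split the text on whitespace and keep the tokens that pass the rules."""
    return [token for token in text.split() if looks_like_password(token)]
```

For example, candidate_passwords("junk Passw0rd1 more hunter2") keeps only "Passw0rd1", since "hunter2" is too short and has no upper-case letter.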

The analysis engine then returns a list of all of the links sorted by “most-likely to have a password” — Highest probability at the top.
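To make the scoring and sorting concrete, here is a rough sketch of how a merit score could be computed and used to rank pastes. It is an assumption-laden illustration, not the project's actual implementation: the names merit_score and rank_pastes, the one-point bonus per known password, and the plain substring matching are all invented for this example.

```python
def merit_score(text, known_passwords, keyword_weights):
    """Score a paste: +1 per known password found, plus each matching keyword's weight."""
    lowered = text.lower()
    score = 0
    for password in known_passwords:
        if password.lower() in lowered:
            score += 1
    for keyword, weight in keyword_weights.items():
        # Weights may be negative for keywords that suggest an uninteresting paste
        if keyword.lower() in lowered:
            score += weight
    return score


def rank_pastes(pastes, known_passwords, keyword_weights):
    """Return the paste links sorted from highest to lowest merit score.

    'pastes' is assumed to be a dict that maps each link to its paste text.
    """
    return sorted(
        pastes,
        key=lambda link: merit_score(pastes[link], known_passwords, keyword_weights),
        reverse=True,
    )
```

A rule-based check like the third filter could feed into the same score by adding a bonus for each token that passes the rules.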

Results and Conclusion

I initially found that the basic filters I had created were getting fewer false positives than the basic password filter (#3) but still wouldn’t get promising results. The accuracy of the identification would have to be improved before I attempted any sort of automation for reporting. So I have open-sourced the software and made it available on pip (as it is written in Python):

The project is called “Pastebin Password Scraper” or PBPWScraper:

Here is the PBPWScraper Github

You can use pip to install the latest release version of the library by entering:

pip install PBPWScraper

It was an interesting experiment and it is fun to tweak the filters to improve certain aspects of the analysis. I will continue to work on the system and see if I am able to decrease its false-positive count enough to warrant an automated reporting module.

Privacy: A How-To


With the leak of classified NSA documents and their entailing revelations, Edward Snowden has become a household name. He single-handedly caused millions of people to rethink their electronic lives – and their assumptions of privacy. Now, those people (and businesses) are scrambling to find solutions to a problem that, a number of months ago, they either didn’t know existed or chose to remain blissfully unaware of.

There have been numerous blog posts and documents about enhancing your systems to increase privacy protection, and I thought that I would summarize many of them from the perspective of someone who works in the industry. The sections of this article are organized in order of complexity (and tinfoil hattiness). The easiest and most basic measures will be in section 1 while the most complex and restrictive measures will be in the last.

Before we begin, it is important to talk a bit about expected threats and mitigations. A threat, in this case, is anything that is considered an opponent to your security and privacy. Mitigations are simply the measures you take to deal with a threat satisfactorily: hopefully completely, but not always. It is important to figure out what kind of threat you are dealing with and take the appropriate actions to mitigate it.

For example, mitigations that stop basic malware and bots from getting your information may not be as effective against, say, a skilled and motivated attacker – such as an NSA operative, or hacker, or cleverly-designed system.

Honestly, if the NSA really wanted your information, it is unlikely that you could mitigate that threat. The NSA is an enormous government agency that is well-funded and extremely motivated. They employ intelligent and educated people who do this for a living. The goal is to raise the difficulty of tracking you just enough to exceed the minimum level of effort that their automated systems will spend by default. Automated systems include bots and malware, along with other classified technologies, that gather information automatically – with no human in the loop. These threats we can mitigate.

Now that we have that out of the way, let’s dive in.

[Disclaimer]: These suggestions are a combination of sources (listed at the end) and my own. As such, this information is not fully my original content and I did not create it. I am simply listing it here for your convenience. Sources are cited as to the origin of suggestions.

Section 1: Basic Measures

Tin Foil Hat Level: “I read an article once about privacy and it scared me. I need a list of things I may, or may not, do.”

Threats: Basic email scams, scraping bots, potential job prospects, your mom

Be careful about what websites you go to and what you download. This includes e-mails and popups. If you don’t know it, don’t click it. Also, don’t post anything that you wouldn’t want exposed. There is an old saying: “Once it’s on the internet, it’s forever”. This includes social media websites. Even if their terms of use say that they won’t use it, what is to stop them from changing those terms later on?

Don’t post identifying information if you don’t have to. In fact, don’t provide any information that isn’t needed. So you want to sign up for a music website? Why do they require you to include your mother’s maiden name, age, location, phone number, and birthdate? This includes mobile apps!

Google yourself. See what comes up. Try Bing or other search engines. If something comes up that you don’t like, try to take it offline and add new content with the same keywords that you used to find the offending item. It takes time. There are professionals that do this.

And lastly, don’t share passwords and account information with anyone!

No, that prince from Nigeria doesn’t need your account info to deposit millions in cash. No, you won’t win a free trip to Hawaii if you click that link. No, you shouldn’t look at that attachment from a person you’ve never heard of before – from an email address you’ve never seen before. If the deal looks too good to be true, it almost always is. Sorry.

Now that wasn’t too hard! This works decently if your information isn’t on the internet already. Unfortunately, if you want to protect any information that is already online, this may not help.

Section 2: Novice Measures

Tin Foil Hat Level: “I read this article about privacy and the NSA and I need some help to protect my information! …Only if it’s not too intrusive though.”

Threats: most bots, scams, most malware, viruses, basic hacking attempts, account username/password attacks

OK, so you are already doing the basic measures but still don’t feel safe. Fair enough. There are lots of threats out there that can easily get past those mitigations if your information is already online. Let’s take it to the next level.

If you haven’t already, install antivirus software, malware protection, and cleaning tools.

For Windows, I use Spybot Search and Destroy 1.6.2 (or Malwarebytes), CCleaner, and Microsoft Security Essentials (or Windows Defender). Spybot does not prevent malware from getting on your computer; it simply removes it once it is on there. CCleaner cleans up your temporary files, including cookies. MS Security Essentials is an integrated system that “guards against viruses, spyware, and other malicious software. It provides real-time protection for your home or small business PCs”. Really, any antivirus software will be good, but you can look at reviews to see which one best suits your type of usage.

The key here is to layer. Defense in depth. MS Security Essentials may not get everything so you need Spybot or some other mitigation.

Update often. Honestly, you should be doing this already. This is a security tip, but security and privacy are inherently linked as preventing a breach in one helps prevent breaches in the other. This includes (for Windows) Windows Update and any software that you have installed (Java, Flash, browsers, etc).

Make sure you have a firewall. Windows has one built in. At least use that one.

Create strong passwords. Yeah the website asks for minimum 8 characters, but really, computers are wicked-fast. Brute-forcing passwords is getting easier. And there’s no reason not to make stronger passwords including longer strings of characters, numbers, capitals, etc. Also, stop using the same password for all of your accounts. If someone hacks one account, they get the keys to all accounts. Bad news.
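To see why length and a larger character set matter, it helps to count how many passwords an attacker would have to try in the worst case. The sketch below is simple arithmetic only; the guessing rate in it is a made-up placeholder, not a measurement of any real attacker or hardware.

```python
def keyspace(alphabet_size, length):
    """Number of possible passwords of exactly this length over this alphabet."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size, length, guesses_per_second):
    """Time needed to try every possible password at the given guessing rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


if __name__ == "__main__":
    HYPOTHETICAL_RATE = 1e9  # guesses per second; an assumption for illustration
    examples = [
        ("8 lower-case letters", 26, 8),
        ("12 characters from 94 printable ASCII symbols", 94, 12),
    ]
    for label, alphabet_size, length in examples:
        seconds = worst_case_seconds(alphabet_size, length, HYPOTHETICAL_RATE)
        print(label + ":", keyspace(alphabet_size, length), "combinations,",
              seconds, "seconds at the assumed rate")
```

Each extra character multiplies the search space by the size of the alphabet, which is why a longer password helps more than any single substitution trick.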

Configure your browsers to delete history and cookies on close. This prevents a lot of cookies from hanging around for no reason after you’re done with them. While you’re at it, take a look at the security and privacy settings in your browser. Make sure that things are not being tracked and that add-ons can’t be installed without your consent.

Install a well-reputed security app on your smartphone. Malware for mobile devices is on the rise and you don’t want to get caught up in it.

Try to use HTTPS as much as possible (the address will show https:// instead of http://), and learn what a certificate is, what it is used for in HTTPS, and why it is important. Avoid accepting less-than-reputable certificates.

Start reducing the amount of information you provide to social media sites such as Facebook, Twitter, Pinterest, Google Plus, etc. Does that information really need to be on there? Here’s a question: why is Facebook worth so much if it provides a free service? How about this: why does Google give you so much for free (e-mail, documents, social media, etc.) without charging anything? Fun fact: Google is an advertising company. A note about Google: “You are not their customer, you are their product”.

Section 3: Intermediate Measures

Tin Foil Hat Level: “The NSA is out there and I need to protect myself!”

Threats: bots, scams, malware, viruses, hacking attempts, account username/password attacks, XSS, Session Hijacking

Start installing browser add-ons!

Install “HTTPS Everywhere”, which forces HTTPS sessions with all websites that you go to. What does this do? HTTPS is the protocol for secure communication over the internet. HTTPS ensures that attackers can’t listen in on your communications over the internet.

Install NoScript to your browser. NoScript will default-deny all scripts from running until you allow them. This can be very annoying at first, but once you have allowed the “elements” from the sites that you usually go to, it’s not that bad – Just make sure to check the icon if a movie isn’t playing or a page doesn’t load correctly. Also, you get to see what, exactly, is run behind the scenes on all of your favourite websites!

Install “AdBlock Plus” to your browser. This – you guessed it – blocks ads. Ads can be the vehicle that delivers malware. Don’t let them near you.

Install “Self-Destructing Cookies” to your browser. This add-on removes cookies as soon as they are not required.

Install the “Disconnect” add-on to your browser and to your phone. “Disconnect lets you visualize & block the invisible websites that track you”.

Install the “Better Privacy” add-on to your browser. “Remove or manage a new and uncommon kind of cookies, better known as LSO’s. The BetterPrivacy safeguard offers various ways to handle Flash-cookies set by Google, YouTube, Ebay and others…”

Your web browser is the window to the internet. It can be a benefit as well as a curse. These add-ons mitigate much of that “curse” aspect.

Section 4: Advanced and Restrictive Measures

Tin Foil Hat Level: “The NSA is just the tip of the iceberg, man! They’re watching everything! Nobody’s safe!!!” Also, people compliment you on the size of your tinfoil hat. You are the tinfoil-hattiest!

Threats: bots, scams, malware, viruses, hacking attempts, account username/password attacks, XSS, session hijacking, motivated attackers, attackers who may be able to gain physical access to your computer

These measures will require technical skills, and they will restrict what you can do online significantly, but they will provide the best defense of your privacy in comparison to the previous measures suggested.

The Phone:

Install ‘Replicant’ or ‘CyanogenMod’ on your phone. These are replacement operating systems for your phone. They will give you far better control of what information is sent to ‘the outside’.

Install SecDroid (for Android). This app controls what apps can use the internet.

Use F-Droid instead of the Google Play Store. The goal is to avoid Google products.

Look into making a custom case/“glove” for your phone that blocks out electronic signals.

Use Chromium (Open-source browser – is not Google Chrome), or Mozilla Firefox – with the add-ons suggested above.

The Computer:

Ditch Windows and Mac altogether. Go Linux: Ubuntu (a Linux operating system) is a great alternative. There may be a bit of a learning curve, but it is not as bad as you may think! There are plenty of distributions of Linux to suit your needs.

Encrypt your hard drive. Look into TrueCrypt or other similar tools. Encryption ensures that, even if attackers get your physical computer, they cannot access your files without your password.

Look into using VPNs (Virtual Private Networks) such as those provided by “Private Internet Access” (PIA), and see if they are right for you.

Look into “The Onion Router” (TOR). See if it is right for you.

Use Chromium (Open-source browser – is not Google Chrome), or Mozilla Firefox – with the add-ons suggested above.

Wrapping It Up

Many of these suggestions are extreme, and the list is far from complete. These are simply a great place to start no matter the size of your tinfoil hat.

I won’t judge.


Helpful hints about privacy from Microsoft:

What is information and internet privacy?:

Microsoft Security Essentials:

Detailed discussion about advanced mitigations for privacy:

“HTTPS Everywhere” browser addon:





Private Internet Access:



The Onion Router (TOR):



Git: List all branches, who made them, and when they were created

I ran into an interesting problem today that took a bit longer to figure out, and so I will post it here in the hopes that others will find the solution faster than I did.

My problem: I am inspecting a rather large and complex piece of software and need to look at its history – more specifically, I need to figure out when certain branches were created and who created them.

Now, I could have just hacked together a throw-away solution, but that’s not who I am. As I was looking, I found that there didn’t seem to be much on this – aside from the Stack Overflow questions.

So I whipped up a little script that gets all of the branches in the git repo and lists the details of their first commit. These details include the date, and creator – which is what I needed. (Bash script)

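#!/bin/bash
# Author: Caleb Shortt
# Lists every branch along with the commit where it forked from master.
# Sources:
# Stack Overflow: Question: "What is the easiest/fastest way to find out when
# a git branch was created"

# Ask git for clean branch names, one per line. (The output of 'git branch -a'
# marks the current branch with '*', which the shell expands to file names
# when the loop variable is left unquoted.)
readarray -t branch_list < <(git for-each-ref --format='%(refname:short)' refs/heads refs/remotes)

for branch in "${branch_list[@]}"
do
    echo '---------------------------------------------'
    echo "Branch: $branch"
    echo '---------------------------------------------'
    git show --summary "$(git merge-base master "$branch")"
done

The first version of this script looped over the raw output of git branch -a and had a little hiccup: it also printed the details of whatever was in the directory. The culprit is the '*' that git branch prints next to the current branch. Left unquoted, the shell expands it to every file name in the current directory. Asking git for-each-ref for the branch names and quoting the loop variable, as above, avoids the problem.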
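If you prefer Python over Bash, the same idea can be sketched with the subprocess module. The interesting part is cleaning up the text that git branch -a prints, since it puts a '*' in front of the current branch and includes symbolic refs such as "remotes/origin/HEAD -> origin/master". The function names here are made up for this example, and fork_point_summary assumes git is on your PATH and that a branch called master exists.

```python
import subprocess


def parse_branch_names(branch_output):
    """Turn the text printed by 'git branch -a' into a list of clean branch names."""
    names = []
    for line in branch_output.splitlines():
        line = line.strip()
        if not line or " -> " in line:
            # Skip blank lines and symbolic refs like 'remotes/origin/HEAD -> origin/master'
            continue
        if line[0] in "*+":
            # '*' marks the current branch, '+' one checked out in another worktree
            line = line[1:].strip()
        if line.startswith("("):
            # Skip entries like '(HEAD detached at ...)', which are not branches
            continue
        names.append(line)
    return names


def fork_point_summary(branch, base="master"):
    """Return 'git show --summary' output for the commit where branch and base diverged."""
    fork = subprocess.check_output(
        ["git", "merge-base", base, branch], text=True
    ).strip()
    return subprocess.check_output(["git", "show", "--summary", fork], text=True)
```

Inside a repository, you could feed parse_branch_names the output of subprocess.check_output(["git", "branch", "-a"], text=True) and call fork_point_summary on each name.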

Auto-generate HTML using a tree in Python

Here’s something I’ve been working on and thought was interesting.

I needed to dynamically generate HTML with varying degrees of nesting and attributes in Python. All I found were a few Stack Overflow questions on generating HTML – and some blogs talking about hard-coding the tags.

This is far from what I needed, so I started to look into making my own – besides, it looked fun!

I started by looking at what I actually wanted:

  • Needs to be easily extensible
  • Needs to be as loosely coupled as possible
  • Needs to be structured in a way that makes sense

With these requirements in mind, I elected to go with the tree approach. HTML works much like a hierarchy: the <html> tag is the root node, the <head> and <body> nodes are children of <html>, and so on. Obviously it would be an n-ary tree, as a node can have lots of children – or none.

Also, I would have to set up a specific tree traversal that would execute commands pre and post traversal – Namely printing out the beginning and ending tags.
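In miniature, that pre/post pattern might look like the sketch below. It is deliberately stripped down and is not the class design used in this post: each node is just a (tag, text, children) tuple, and render is a name made up for this example.

```python
def render(node):
    """Render a (tag, text, children) tuple as an HTML string."""
    tag, text, children = node
    # Pre-visit step: emit the opening tag and the node's own text
    output = "<" + tag + ">" + text
    # Visit every child, in order
    for child in children:
        output += render(child)
    # Post-visit step: emit the closing tag once all children are done
    return output + "</" + tag + ">"


page = ("html", "", [
    ("head", "", [("title", "Hello", [])]),
    ("body", "", [("p", "Some text", [])]),
])
```

Calling render(page) produces a single string with every opening tag emitted before its children and every closing tag after them.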

This is what I came up with for the tree structure:

class DefTreeNode(object):
    """
        A Protocol Definition Tree Node.
        It will contain a single payload (ex: a tag in HTML),
        a list of child nodes, a label, and what to do prefix
        and postfix during traversal.

        The payload is of type 'Generic_HTML_Tag'

        The label is a unique string that identifies the node.
        Both the label and the payload are required to initialize
        a node.
    """

    def __init__(self, label, payload, contents=""):
        self.children = []
        self.payload = payload
        self.label = label
        self.contents = contents

    def addChild(self, child):
        # Attach the child so that the traversal can reach it
        if child:
            self.children.append(child)
            return True
        return False

    def setPayload(self, payload):
        if payload:
            self.payload = payload
            return True
        return False

    def setLabel(self, label):
        self.label = label

    def setContents(self, contents):
        self.contents = contents

    def getChildren(self):
        return self.children

    def getPayload(self):
        return self.payload

    def getLabel(self):
        return self.label

    def getPrefix(self):
        # The opening tag comes from the payload
        if self.payload:
            return self.payload.getPrefix()
        else:
            return None

    def getPostfix(self):
        # The closing tag comes from the payload
        if self.payload:
            return self.payload.getPostfix()
        else:
            return None

    def getContents(self):
        return self.contents

Note that the payload is not a string but (as the class descriptor comment says) is of type ‘Generic_HTML_Tag’. We will get to that in a second. Let’s finish the tree structure first.

Now that I have a node to work with, let’s make the tree. The tree class will contain the traversal code and a “find node” function, and it will hold the root node for the structure:

class DefinitionTree(object):
    def __init__(self, node):
        self.root = node

    def getRoot(self):
        return self.root

    def findNode(self, label):
        return self.recursive_findNode(label, self.root)

    def traverse(self):
        if self.root:
            return self.recursive_traverse(self.root)
        else:
            return ""

    def recursive_traverse(self, node, construction=""):
        """
            Traverse the tree.
            This algorithm will run a pre-order traversal. When
            the algorithm encounters a new node it immediately
            calls its 'prefix' function, it then appends the
            node's contents, and traverses the node's children.
            After visiting the node's children, its 'postfix'
            function is called and the traversal for this node
            is complete.

            Returns a complete construction of the nodes' prefix,
            content, and postfix in a pre-order traversal.
        """
        construction = construction + node.getPrefix() + node.getContents()

        for child in node.getChildren():
            construction = self.recursive_traverse(child, construction)

        return construction + node.getPostfix()

    def recursive_findNode(self, label, node):
        """
            Executes a search of the tree to find the node
            with the specified label. This algorithm finds the first
            label that matches the search and is case insensitive.

            Returns the node with the specified label or None.
        """
        if node is None or node.getLabel().lower() == label.lower():
            return node

        for child in node.getChildren():
            # Use a separate name so the current node is not overwritten
            found = self.recursive_findNode(label, child)
            if found is not None:
                return found

        return None

Now, let's create the payload for the nodes. This will be done by creating a file to store our 'html tag classes':


class Generic_HTML_Tag(object):
    """
        A Generic HTML tag class
        This class should not be called directly, but contains
        the information needed to create HTML tag subclasses
    """

    def __init__(self):
        self.prefix = ""
        self.postfix = ""
        self.indent = 0
        self.TAB = " "

    def getPrefix(self):
        return self.prefix

    def getPostfix(self):
        return self.postfix

    def setPrefix(self, prefix):
        self.prefix = prefix

    def setPostFix(self, postfix):
        self.postfix = postfix

class HTML_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        # Set up the shared fields first, then build the tag strings
        super(HTML_tag, self).__init__()
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<html>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</html>"

class HEAD_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        super(HEAD_tag, self).__init__()
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<head>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</head>"

class TITLE_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        super(TITLE_tag, self).__init__()
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<title>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</title>"

class BODY_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        super(BODY_tag, self).__init__()
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<body>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</body>"

class P_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        super(P_tag, self).__init__()
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<p>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</p>"


Great! Obviously this is quite basic and incomplete, but the idea is there. The generateHTMLPrefix() and generateHTMLPostfix() functions are the modifiers. You can add extra parameters and such here without breaking your code elsewhere. Also, you can add additional logic to, say, only add a specific attribute if the tag has it.

Ex: only a few tags have an onload attribute
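The onload case could be handled along these lines. Treat it as a sketch under a few assumptions: BodyTagWithOnload and its onload parameter are hypothetical names invented for this example, GenericTag is a trimmed stand-in for the generic tag class so that the snippet runs on its own, and html.escape requires Python 3.

```python
from html import escape


class GenericTag(object):
    """Trimmed stand-in for the generic tag class, repeated so this runs alone."""

    def __init__(self):
        self.prefix = ""
        self.postfix = ""
        self.indent = 0
        self.TAB = " "

    def getPrefix(self):
        return self.prefix

    def getPostfix(self):
        return self.postfix


class BodyTagWithOnload(GenericTag):
    """Hypothetical body tag that only emits 'onload' when a value is given."""

    def __init__(self, indent_level=0, onload=None):
        super(BodyTagWithOnload, self).__init__()
        self.indent = indent_level
        self.onload = onload
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        attributes = ""
        if self.onload is not None:
            # Escape the value so quotes in it cannot break the attribute
            attributes = ' onload="' + escape(self.onload, quote=True) + '"'
        self.prefix = self.indent * self.TAB + "<body" + attributes + ">"

    def generateHTMLPostfix(self):
        self.postfix = self.indent * self.TAB + "</body>"
```

Because the extra logic lives entirely inside the tag class, the tree and node code do not need to change when a tag gains a new attribute.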


Now that you have, let’s say, a file with your tree and node classes called HTMLTree.py and a file with your HTML markup classes called HTML_Tags.py, let’s bring it all together:

import HTML_Tags
from HTMLTree import DefTreeNode, DefinitionTree

if __name__ == "__main__":

    html = DefTreeNode("html_tag", HTML_Tags.HTML_tag())
    head = DefTreeNode("head_tag", HTML_Tags.HEAD_tag())

    head.addChild(DefTreeNode("title_tag", HTML_Tags.TITLE_tag(), "basic title here!"))

    body = DefTreeNode("body_tag", HTML_Tags.BODY_tag())
    body.addChild(DefTreeNode("p_tag", HTML_Tags.P_tag(), "paragraph here"))

    # Attach head and body to the root so that the traversal reaches them
    html.addChild(head)
    html.addChild(body)

    htmltree = DefinitionTree(html)

    searchlabel = "p_tag"
    print('Searching for a node labeled: ' + searchlabel)

    node = htmltree.findNode(searchlabel)
    if node:
        print("\n\n" + node.getPrefix() + node.getContents() + node.getPostfix())
    else:
        print('\n\nNode with label "' + searchlabel + '" not found!')

    print('\n\nTree Traversal:\n')

    print(htmltree.traverse())

The output should look similar to this:

Searching for a node labeled: p_tag

<p>paragraph here</p>

Tree Traversal:

<html><head><title>basic title here!</title></head><body><p>paragraph here</p></body></html>

Great! You dynamically generated HTML in an extensible way!

Startup Series: Episode 1


The distant memory that is my last post is not due to me forgetting about this beautiful little blog. I am in the process of starting a new company and, between that and my thesis work, I have not had much time to work on new posts. But alas, I am back – and with an exciting new series that chronicles my journey through the thrilling roller coaster that is the start-up experience!

I have some planned episodes (including this one) and have reserved room for additional episodes as time, and challenges, go by. This series will address issues, main concepts, and directions that I have encountered while running my start-up. Also, for those who are interested, my new company is called R-Gauge Metrics Inc.

Along the way I will try to provide some reading material that has helped me with my company. Maybe it will help with yours also.

All-righty then! Let’s get started!


Episode 1: Inception

Inception: The establishment or starting point of something; the beginning.

While cruising through life I have met (and I’m sure you have too) some people who were extra-keen on starting their own company. This is usually exciting for me and I love to see such motivated individuals strive towards their goals. However, upon further inquiry, I sometimes notice that the person just wants to start their own company for the sake of starting their own company. Their drive is strong and their intentions are good, but without a clear problem to solve, or product to sell, they more resemble the captain of a ship that leaves harbour without a destination: they are more likely to get lost and find trouble.

Introduce the idea. The idea is just that. It is your direction – your purpose. A business does not progress unless it has some goal to progress towards, and, no, making lots of money is not a satisfactory goal to launch a business (although we all hope that we will be able to make lots – we will need something more concrete).

For me, my idea started more than a year ago, and its form was very different than the form I eventually ran with. The original idea was quite simple. I thought many companies were not handling their online reputation well and I thought that I could help them with damage control and bad reputation mitigation. I cited the BP oil spill as an example to my friends while trying to convince them that this idea was the best thing ever:

“Just think. Some company, like BP, has a huge reputation drop due to them doing something – like spilling millions of gallons of oil into the ocean. Now whenever I Google ‘BP’ I get all that bad reputation on the front page: bad for business. What if I was able to provide consulting and help mitigate their online reputation damage?”

It sounded good. I started doing my market research and also looked into who else was working in this industry. Turns out that there were quite a few people who thought the same as I did.

My peers were skeptical. It seemed like a lot of manual work that would only succeed if I did all the work myself (not really automatable). Not scalable at all. I would only be able to take on as many clients as I could handle myself – or I would have to hire lots of people to handle the clients for me.

I decided more discussion and bouncing ideas off of people was in order. I did this for almost six months before the notion hit me: All these sites assume that you know what your reputation is like! How do you quantify it? If I know exactly what it is, I can compare reputations!

It took another six months to generate a rough prototype to see if this was actually possible. Turns out it was! I was on my way!


The key lesson I learned here was quite simple: really spend the time to understand the problem that you are trying to solve, and make sure that people care about solving that particular problem. Understand what your goals are and make sure to express the value of your solution in terms of your client. In other words, why should your client care? Also, note that the idea is malleable. It can change to fit what the customers actually want. It doesn’t help to make the best product in the world if nobody wants it.

How do you know if people will use a solution to that particular problem? Well, you could ask them. The simple solution is usually the easiest. In my case, I asked anyone who owned a company or was in mid-to-upper management of any company I could find. I asked them if they would be interested in a tool like mine and carefully listened to their responses.

Things were looking really good. I felt that I had a great product that was relevant to people in my target market. They seemed quite interested in it. It was time to look into this whole “starting a company” thing.


But how…

Showing Up

Richard Branson talks at length in his book “Like a Virgin” about various topics in the business world. He addresses issues brought up by aspiring entrepreneurs and seasoned veterans in their journey to provide great products and services.

One of the points Richard addresses is the importance of simply showing up. I remember reading that section and thinking that I would take this advice with a grain of salt. What if I am competing against some of the best in the world?

I was skeptical.

A few months passed and I received an email from a prominent financial institution; it detailed a contest in which Canadian postsecondary students could submit an essay on their vision of a responsible financial institution.

I was intrigued.

I started thinking. I am not a financial institution expert, nor am I well-versed in what makes them responsible. All I could do was think about my own convictions. What did I think a responsible financial institution looked like? It ended up looking like a simple essay with a list of suggestions – and it was. I was certain that I would not win, but I felt strongly about it.

I am surrounded by smart people all day. I would wager that most of them are far smarter than me, but I was the only one who entered the contest. All of them said something similar when I asked them if they would enter the competition: there would be people far smarter than them who would write something and win.

I definitely had those same thoughts, but instead of giving up before I had even written a single word, I figured that I would at least try. I showed up.

I wrote something that I felt strongly about. Why wouldn’t I show people?

I placed second in the Canada-wide contest.

New York Magazine published an article in February of 2011 that covered research on just this topic. The studies it referenced identified a link between children being told they were “smart” and their likelihood of trying something that did not come naturally to them. In general, they found that children who were constantly praised for their intelligence were more likely to quit when things didn’t come naturally.

I am extending this idea to include self-deprecating mentalities in adults who believe themselves to be intelligent.

Intelligent adults will assess the situation and gauge their ability to succeed based on their own perception of their capabilities. The difference between the study with the children and my extension into the adult realm is that the children actually try before their failure is realized. The adults encounter their difficulty before attempting anything. The result is the same: neither group completes the attempt.

This brings further reinforcement to the saying “you are your own worst enemy.”

I suppose I should be grateful for that mentality, as it allows others, such as myself, to try and succeed. I cannot help but wonder, though, what breakthroughs might have happened if those people had actually tried.


New York Magazine, How Not to Talk to Your Kids: The inverse power of praise

Like a Virgin: Secrets They Won’t Teach You In Business School, Richard Branson