Programming, Security, Privacy, Technical

A Study in Password Generation and Validation

For those who would rather just read the code: https://github.com/calebshortt/pwanalysis

[Updated  April 16, 2019] I have included further results in an update at the bottom of this post.

 

Introduction

A study.

On many a dreary evening, once the sun sets beyond the Pacific and leaves only the gold and pink flakes of cloud and sky, I take to my studies — in the literal sense. On my desktop I maintain a growing number of “Studies” that pique my interest and encourage further research. When time permits I dive into one and if such a study bears fruit it will find its way upon these pages.

This study continues my obsession with passwords.

Firstly, I must ask you a question: Have you ever read through a password dump?

I do not mean to ask if you perused the listing then used it in some attack, but did you truly “digest” the list of passwords one by one? Did you consider their meaning or guess on the context in which they were conjured within the mind of one unsuspecting human and in which immensely intimate feelings and thoughts often manifest themselves? For who chooses a password with the thought that this very string of characters, words, symbols, and what-have-you would one day be revealed in its entirety to the world?

“*Gasp!* Plaintext?”, you balk, “But this is 2019!”. Perhaps you should ask Facebook what year it is.

I have found that passwords often reveal much more about an individual than the crude challenge-answer mechanisms utilized for authentication. If a user’s digital identity (i.e. username, etc.) is their public-facing persona then their password is often their private form. It was through my inspection of these password lists that I encountered three thoughts:

  • “Password language” is a language all it’s own and it DOES have loosely established, unspoken, syntax rules.
  • The frequency distribution of characters in passwords could be quite different to a spoken language frequency distribution.
  • The relationship between the personality of a user and their password is non-random (sounds obvious)

Continue reading

Standard
Programming, Security, Privacy, Technical

ShillBot: A Study in Identifying Reddit Trolls Through Machine Learning

For those who would rather just read the code: https://github.com/calebshortt/shillbot

 

Introduction

We’ve all been there. You’re browsing Reddit and see a post that you’re passionate about. You click the comment box and reach for the keyboard — but hesitate. Reddit’s reputation precedes it. You type anyways and punch out your thoughts. Submit.

*Bliip*

A comment already? You click the icon and read the most disproportionately-voracious response to a comment about cats you have ever seen. What a jerk! But you’re not going to play that game, and view the author’s previous posts and comments. Through your review a trend of tactless comments and inflationary responses bubbles to the surface. They’re a troll. You promptly ignore the comment.

Continue reading

Standard
Programming, Technical, Uncategorized

Installing scikit-learn; Python Data Mining Library

Update: The instructions of this post are for Python 2.7. If you are using Python 3, the process is simplified. The instructions are here:

Starting with a Python 3.6 environment.

Assumptions (What I expect to already be installed):

  1. Install numpy: pip install numpy
  2. Install scipy: pip install scipy
  3. Install sklearn: pip install sklearn

Test installation by opening a python interpreter and importing sklearn:
python
import sklearn

If it successfully imports (no errors), then sklearn is installed correctly.

Continue reading

Standard
Programming, Technical

An Experiment on PasteBin

A while ago I was browsing the public pastes on PasteBin and I came across a few e-mail/password dumps from either malware or some hacker trying to make a name for himself.

As I perused the information, I was shocked to find usernames, emails, passwords, social security numbers, credit card numbers, and more in these dumps. I reported the posts as credit card info and SSNs are nothing to trifle with, but the thought lingered as to why they were public in the first place. There must be a way to automate the process of reporting these posts, I thought, usernames and especially passwords hold a very unique signature: at least one upper-case letter, at least one lower-case letter, at least one digit, and at least 8 characters long.

How many words in the english language have that particular combination?

Continue reading

Standard
Programming, Technical

Auto-generate HTML using a tree in Python

Here’s something I’ve been working on and thought was interesting.

I needed to dynamically generate HTML with varying degrees of nesting and attributes in Python. All I found was a few Stack Overflow questions on generating HTML Рand some blogs talking about hard-coding the tags.

This is far from what I needed, so I started to look into making my own – besides, it looked fun!

I started by looking at what I actually wanted:

  • Needs to be easily extensible
  • Needs to be structured in a way that makes sense

With these requirements in mind, I elected to go the tree approach. HTML works much like a hierarchy: since the <html> tag is the root node and the <head> and <body> nodes are children of <html> and so on. Obviously it would be an n-ary tree as a node can have lots of children – or none.

Also, I would have to set up a specific tree traversal that would execute commands pre and post traversal – Namely printing out the beginning and ending tags.

This is what I came up with for the tree structure:


class DefTreeNode(object):
    '''
        A Protocol Definition Tree Node.
        It will contain a single payload (ex: a tag in HTML),
        a list of child nodes, a label, and what to do prefix
        and postfix during traversal.

        The payload is of type 'Generic_HTML_Tag'

        The label is a unique string that identifies the node.
        Both the label and the payload are required to initialize
        a node.
    '''

    def __init__(self, label, payload, contents=""):
        self.children = []
        self.payload = payload
        self.label = label
        self.contents = contents

    def addChild(self, child):
        if child:
            self.children.append(child)
            return True
        return False

    def setPayload(self, payload):
        if payload:
            self.payload = payload
            return True
        return False

    def setLabel(self, label):
        self.label = label

    def setContents(self, contents):
        self.contents = contents

    def getChildren(self):
        return self.children

    def getPayload(self):
        return self.payload

    def getLabel(self):
        return self.label

    def getPrefix(self):
        if self.payload:
            return self.payload.getPrefix()
        else:
            return None

    def getPostfix(self):
        if self.payload:
            return self.payload.getPostfix()
        else:
            return None

    def getContents(self):
        return self.contents

Note that the payload is not a string but (as the class descriptor comment says) is of type ‘Generic_HTML_Tag’. We will get to that in a second. Let’s finish the tree structure first.

Now that I have a node to work with, let’s make the tree. The tree class will contain the traversal code and a “find node” function, and it will hold the root node for the structure:


class DefinitionTree(object):
    def __init__(self, node):
        self.root = node

    def getRoot(self):
        return self.root

    def findNode(self, label):
        return self.recursive_findNode(label, self.root)

    def traverse(self):
        if self.root:
            return self.recursive_traverse(self.root)
        else:
            return ""

    def recursive_traverse(self, node, construction=""):
        '''
            Traverse the tree.
            This algorithm will run a pre-order traversal. When
            the algorithm encounters a new node it immediately
            calls its 'prefix' function, it then appends the
            node's payload, and traverses the node's children.
            After visiting the node's children, its 'postfix'
            function is called and the traversal for this node
            it complete.

            Returns a complete construction of the nodes' prefix,
            content, and postfix in a pre-order traversal.
        '''

        if(node):
            construction = construction + node.getPrefix() + node.getContents()

            for child in node.getChildren():
                construction = self.recursive_traverse(child, construction)

            return construction + node.getPostfix()

    def recursive_findNode(self, label, node):
    '''
        Executes a search of the tree to find the node
        with the specified label. This algorithm finds the first
        label that matches the search and is case insensitive.

        Returns the node with the specified label or None.
    '''

        if(node is None or node.getLabel().lower() == label.lower()):
            return node

        for child in node.getChildren():
            node = self.recursive_findNode(label, child)
            if node is not None:
                return node


Now, let's create the payload for the nodes. This will be done by creating a file to store our 'html tag classes':

 


class Generic_HTML_Tag(object):
    '''
        A Generic HTML tag class
        This class should not be called directly, but contains 
        the information needed to create HTML tag subclasses
    '''

    def __init__(self):
        self.prefix = ""
        self.postfix = ""
        self.indent = 0
        self.TAB = " "

    def getPrefix(self):
        return self.prefix

    def getPostfix(self):
        return self.postfix

    def setPrefix(self, prefix):
        self.prefix = prefix

    def setPostFix(self, postfix):
        self.postfix = postfix


class HTML_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        Generic_HTML_Tag.__init__(self)
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<html>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</html>"

class HEAD_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        Generic_HTML_Tag.__init__(self)
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<head>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</head>"

class TITLE_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        Generic_HTML_Tag.__init__(self)
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<title>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</title>"

class BODY_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        Generic_HTML_Tag.__init__(self)
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<body>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</body>"

class P_tag(Generic_HTML_Tag):
    def __init__(self, indent_level=0):
        Generic_HTML_Tag.__init__(self)
        self.indent = indent_level
        self.generateHTMLPrefix()
        self.generateHTMLPostfix()

    def generateHTMLPrefix(self):
        self.prefix = self.indent*self.TAB + "<p>"

    def generateHTMLPostfix(self):
        self.postfix = self.indent*self.TAB + "</p>"

 

Great! Obviously this is quite basic and incomplete, but the idea is there. The ‘generateHTMLPrefix() / Postfix() functions are the modifiers. You can add extra parameters and such here without crashing your code elsewhere. Also, you can add additional logic to, say, only add a specific attribute if the tag has it.

Ex: only a few tags have an onload attribute

 

Now that you have, let’s say, a file with your tree and node called HTMLTree.py and a file with your html markup classes called HTML_Tags.py, let’s bring it all together:

import HTML_Tags
import HTMLTree


if __name__ == "__main__":

    html = DefTreeNode("html_tag", HTML_Tags.HTML_tag())
    head = DefTreeNode("head_tag", HTML_Tags.HEAD_tag())

    head.addChild(DefTreeNode("title_tag", HTML_Tags.TITLE_tag(), "basic title here!"))
    html.addChild(head)

    body = DefTreeNode("body_tag", HTML_Tags.BODY_tag())
    body.addChild(DefTreeNode("p_tag", HTML_Tags.P_tag(), "paragraph here"))
    html.addChild(body)

    htmltree = DefinitionTree(html)

    searchlabel = "p_tag"
    print 'Searching for a node labeled: ' + searchlabel

    node = htmltree.findNode(searchlabel)
    if(node):
        print "\n\n" + node.getPrefix() + node.getContents() + node.getPostfix()
    else:
        print '\n\nNode with label \"' + searchlabel + '\" not found!'

    print '\n\nTree Traversal:\n'

    print htmltree.traverse()

The output should look similar to this:

Searching for a node labeled: p_tag


<p>paragraph here</p>


Tree Traversal:

<html><head><title>basic title here!s_static</title></head><body><p>paragraph here</p></body></html>

Great! You dynamically generated HTML in an extensible way!

Standard