Here’s something I’ve been working on and thought was interesting.
I needed to dynamically generate HTML with varying degrees of nesting and attributes in Python. All I found was a few Stack Overflow questions on generating HTML – and some blogs talking about hard-coding the tags.
This is far from what I needed, so I started to look into making my own – besides, it looked fun!
I started by looking at what I actually wanted:
- Needs to be easily extensible
- Needs to be structured in a way that makes sense
With these requirements in mind, I elected to go the tree approach. HTML works much like a hierarchy: since the <html> tag is the root node and the <head> and <body> nodes are children of <html> and so on. Obviously it would be an n-ary tree as a node can have lots of children – or none.
Also, I would have to set up a specific tree traversal that would execute commands pre and post traversal – Namely printing out the beginning and ending tags.
This is what I came up with for the tree structure:
class DefTreeNode(object):
'''
A Protocol Definition Tree Node.
It will contain a single payload (ex: a tag in HTML),
a list of child nodes, a label, and what to do prefix
and postfix during traversal.
The payload is of type 'Generic_HTML_Tag'
The label is a unique string that identifies the node.
Both the label and the payload are required to initialize
a node.
'''
def __init__(self, label, payload, contents=""):
self.children = []
self.payload = payload
self.label = label
self.contents = contents
def addChild(self, child):
if child:
self.children.append(child)
return True
return False
def setPayload(self, payload):
if payload:
self.payload = payload
return True
return False
def setLabel(self, label):
self.label = label
def setContents(self, contents):
self.contents = contents
def getChildren(self):
return self.children
def getPayload(self):
return self.payload
def getLabel(self):
return self.label
def getPrefix(self):
if self.payload:
return self.payload.getPrefix()
else:
return None
def getPostfix(self):
if self.payload:
return self.payload.getPostfix()
else:
return None
def getContents(self):
return self.contents
Note that the payload is not a string but (as the class descriptor comment says) is of type ‘Generic_HTML_Tag’. We will get to that in a second. Let’s finish the tree structure first.
Now that I have a node to work with, let’s make the tree. The tree class will contain the traversal code and a “find node” function, and it will hold the root node for the structure:
class DefinitionTree(object):
def __init__(self, node):
self.root = node
def getRoot(self):
return self.root
def findNode(self, label):
return self.recursive_findNode(label, self.root)
def traverse(self):
if self.root:
return self.recursive_traverse(self.root)
else:
return ""
def recursive_traverse(self, node, construction=""):
'''
Traverse the tree.
This algorithm will run a pre-order traversal. When
the algorithm encounters a new node it immediately
calls its 'prefix' function, it then appends the
node's payload, and traverses the node's children.
After visiting the node's children, its 'postfix'
function is called and the traversal for this node
it complete.
Returns a complete construction of the nodes' prefix,
content, and postfix in a pre-order traversal.
'''
if(node):
construction = construction + node.getPrefix() + node.getContents()
for child in node.getChildren():
construction = self.recursive_traverse(child, construction)
return construction + node.getPostfix()
def recursive_findNode(self, label, node):
'''
Executes a search of the tree to find the node
with the specified label. This algorithm finds the first
label that matches the search and is case insensitive.
Returns the node with the specified label or None.
'''
if(node is None or node.getLabel().lower() == label.lower()):
return node
for child in node.getChildren():
node = self.recursive_findNode(label, child)
if node is not None:
return node
Now, let's create the payload for the nodes. This will be done by creating a file to store our 'html tag classes':
class Generic_HTML_Tag(object):
'''
A Generic HTML tag class
This class should not be called directly, but contains
the information needed to create HTML tag subclasses
'''
def __init__(self):
self.prefix = ""
self.postfix = ""
self.indent = 0
self.TAB = " "
def getPrefix(self):
return self.prefix
def getPostfix(self):
return self.postfix
def setPrefix(self, prefix):
self.prefix = prefix
def setPostFix(self, postfix):
self.postfix = postfix
class HTML_tag(Generic_HTML_Tag):
def __init__(self, indent_level=0):
Generic_HTML_Tag.__init__(self)
self.indent = indent_level
self.generateHTMLPrefix()
self.generateHTMLPostfix()
def generateHTMLPrefix(self):
self.prefix = self.indent*self.TAB + "<html>"
def generateHTMLPostfix(self):
self.postfix = self.indent*self.TAB + "</html>"
class HEAD_tag(Generic_HTML_Tag):
def __init__(self, indent_level=0):
Generic_HTML_Tag.__init__(self)
self.indent = indent_level
self.generateHTMLPrefix()
self.generateHTMLPostfix()
def generateHTMLPrefix(self):
self.prefix = self.indent*self.TAB + "<head>"
def generateHTMLPostfix(self):
self.postfix = self.indent*self.TAB + "</head>"
class TITLE_tag(Generic_HTML_Tag):
def __init__(self, indent_level=0):
Generic_HTML_Tag.__init__(self)
self.indent = indent_level
self.generateHTMLPrefix()
self.generateHTMLPostfix()
def generateHTMLPrefix(self):
self.prefix = self.indent*self.TAB + "<title>"
def generateHTMLPostfix(self):
self.postfix = self.indent*self.TAB + "</title>"
class BODY_tag(Generic_HTML_Tag):
def __init__(self, indent_level=0):
Generic_HTML_Tag.__init__(self)
self.indent = indent_level
self.generateHTMLPrefix()
self.generateHTMLPostfix()
def generateHTMLPrefix(self):
self.prefix = self.indent*self.TAB + "<body>"
def generateHTMLPostfix(self):
self.postfix = self.indent*self.TAB + "</body>"
class P_tag(Generic_HTML_Tag):
def __init__(self, indent_level=0):
Generic_HTML_Tag.__init__(self)
self.indent = indent_level
self.generateHTMLPrefix()
self.generateHTMLPostfix()
def generateHTMLPrefix(self):
self.prefix = self.indent*self.TAB + "<p>"
def generateHTMLPostfix(self):
self.postfix = self.indent*self.TAB + "</p>"
Great! Obviously this is quite basic and incomplete, but the idea is there. The ‘generateHTMLPrefix() / Postfix() functions are the modifiers. You can add extra parameters and such here without crashing your code elsewhere. Also, you can add additional logic to, say, only add a specific attribute if the tag has it.
Ex: only a few tags have an onload attribute
Now that you have, let’s say, a file with your tree and node called HTMLTree.py and a file with your html markup classes called HTML_Tags.py, let’s bring it all together:
import HTML_Tags
import HTMLTree
if __name__ == "__main__":
html = DefTreeNode("html_tag", HTML_Tags.HTML_tag())
head = DefTreeNode("head_tag", HTML_Tags.HEAD_tag())
head.addChild(DefTreeNode("title_tag", HTML_Tags.TITLE_tag(), "basic title here!"))
html.addChild(head)
body = DefTreeNode("body_tag", HTML_Tags.BODY_tag())
body.addChild(DefTreeNode("p_tag", HTML_Tags.P_tag(), "paragraph here"))
html.addChild(body)
htmltree = DefinitionTree(html)
searchlabel = "p_tag"
print 'Searching for a node labeled: ' + searchlabel
node = htmltree.findNode(searchlabel)
if(node):
print "\n\n" + node.getPrefix() + node.getContents() + node.getPostfix()
else:
print '\n\nNode with label \"' + searchlabel + '\" not found!'
print '\n\nTree Traversal:\n'
print htmltree.traverse()
The output should look similar to this:
Searching for a node labeled: p_tag
<p>paragraph here</p>
Tree Traversal:
<html><head><title>basic title here!s_static</title></head><body><p>paragraph here</p></body></html>
Great! You dynamically generated HTML in an extensible way!