Dictionary and Brute Force Attacks

Dictionary Attacks

A dictionary attack is a method of password cracking that involves entering “words from a rundown as passwords to access a system, account, or encrypted document.” (Ohri, 2021) A dictionary attack does not have to be used nefariously, since it can be used to regain access to your own system if you have lost your password, but it is commonly used to gain unauthorized access to a system, which is, of course, unethical. In these situations there is the person attempting the attack, called the attacker, and the person whose machine or credentials the attack is being leveraged against, called the victim. These attacks can be done offline, in which case the speed of the attack depends on the processing power of the attacker’s machine, or online, in which case the speed depends on either the processing power of the attacker’s machine or the limit on how often a password can be submitted to the victim machine. The determining speed will be whichever is slower.

The maximum number of attempts an attacker may have to make is simply the length of the dictionary. The maximum possible time an attack may take is the length of the dictionary divided by the number of attempts that can be made over a given time period.
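
The bound described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample rates are hypothetical and chosen to show that the slower of the two sides sets the pace.

```python
from typing import Optional


def effective_rate(attacker_rate: float, victim_limit: Optional[float] = None) -> float:
    """Attempts per minute that can actually be made; the slower side decides."""
    if victim_limit is None:
        # Offline attack: only the attacker's hardware matters.
        return attacker_rate
    return min(attacker_rate, victim_limit)


def worst_case_minutes(dictionary_size: int, attempts_per_minute: float) -> float:
    """Upper bound on attack time: every entry is tried before a match."""
    return dictionary_size / attempts_per_minute


# Hypothetical numbers: a 10,000-word dictionary against a login
# that accepts 5 attempts per minute.
rate = effective_rate(attacker_rate=6000, victim_limit=5)
print(worst_case_minutes(10_000, rate))  # 2000.0
```

With these assumed numbers the victim-side limit dominates, so a faster attacking machine would not shorten the worst case.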

Brute Force Attacks

While a dictionary attack is indeed a type of brute force attack, a brute force attack is not necessarily a dictionary attack. What I mean by this is, a brute force attack is a methodology of password cracking in which an attacker tries different combinations of characters to attempt guessing a password or hash. A pure brute force attack uses “every possible combination of numbers [and] alphabets to guess the passwords.” (Tweak Library Team, 2020) A dictionary attack doesn’t use every possible combination of characters, but instead focuses the chosen passwords to a list of likely words or phrases.

To give an example of a pure brute force attack, let’s imagine there is a three-character passphrase that is composed only of numerical values. Let’s imagine the passphrase is 723. A brute force attack would start with trying 000, then 001, then 002, all the way up to 723. You can find the maximum possible number of attempts a brute force attack will take by taking the total number of possible characters (in this case the digits 0-9, for a total of ten possible characters) and raising that number to the power of x, where x is the number of places these characters can occupy. In this case there are three places, for a total of 10³ = 1,000 possible combinations. You can find the maximum possible time by dividing this value by the number of attempts an attacker can make during a given time period, then multiplying by the length of that period. If, in our previous example, the attacker can make 5 attempts every minute, then the longest amount of time it can possibly take to find the password is 1,000 × 1 minute / 5 = 200 minutes. Because the number of possible combinations grows exponentially with the length of the password, a brute force attack should usually be a last resort. While the algorithm an attacker runs can be fine-tuned to a specific password requirement (in our example, we didn’t need to try any alphabet characters because the password only took numerical characters), this exponential growth can cause brute force attacks to take quite a long time to find a password. The upside is that, in theory, a brute force attack can find any password, so long as there are no limits to the number of attempts an attacker can make on the victim machine.
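
The arithmetic in this example can be checked with a short Python sketch. The function names are my own, and the enumeration simply walks the digit combinations in order, the same way the example counts up from 000.

```python
from itertools import product
import string


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible combinations: alphabet size raised to the length."""
    return alphabet_size ** length


def attempts_until_found(secret: str, alphabet: str):
    """Count guesses made, in order, until the secret is reached."""
    guesses = product(alphabet, repeat=len(secret))
    for count, candidate in enumerate(guesses, start=1):
        if "".join(candidate) == secret:
            return count
    return None  # the secret uses characters outside the alphabet


total = keyspace(10, 3)
print(total)                                        # 1000
print(total / 5)                                    # 200.0 minutes at 5 attempts per minute
print(attempts_until_found("723", string.digits))   # 724
```

The last line shows that 723 is reached on the 724th guess, because counting starts at 000.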

Dictionary Attack Procedure

Before launching a dictionary attack, an attacker will want to organize their dictionary. To begin with, an attacker should look into the infrastructure of the machine or software they will be attacking. What an attacker should be looking for are password requirements and constraints, since these help define what should and should not be included in the dictionary. For example, if a password is required to be at least five characters long and is constrained to only alphanumeric characters, the attacker should make sure the dictionary keeps alphanumeric passwords of five or more characters, while removing any password with fewer than five characters and any password that contains symbols. This tailors the password dictionary as finely as possible, to reduce the maximum possible time of the attack.
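
As a rough sketch of this filtering step, the following Python applies the example policy (at least five characters, alphanumeric only) to a small, made-up word list. The function name and the sample words are hypothetical.

```python
def fits_policy(candidate: str, min_length: int = 5) -> bool:
    """Keep only ASCII alphanumeric candidates of at least min_length characters."""
    return (
        len(candidate) >= min_length
        and candidate.isascii()
        and candidate.isalnum()
    )


words = ["cat", "hunter2", "p@ssword", "12345", "!!!!!!", "dragon"]
filtered = [word for word in words if fits_policy(word)]
print(filtered)  # ['hunter2', '12345', 'dragon']
```

Half of this sample list is discarded before any attempt is made, which directly lowers the worst-case time.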

The next step before launching the attack is to put the most likely passwords up top. If your dictionary runs from top to bottom, then by placing the most likely passwords at the top, you are more likely to have a faster total time of attack.

Once the attack begins, the program should grab a password from the attacker’s dictionary and attempt to use it on the victim’s machine. If the passwords are hashed on the victim’s machine, then the attacker should hash the dictionary value before attempting it. In an example infrastructure that hashes a password three times, the attacker will take their dictionary value, iteratively hash it three times using the same hash algorithm, then see if the password matches the one stored on the machine.
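
The three-round example can be sketched as follows. I am assuming SHA-256 and assuming each round hashes the raw digest of the previous round; a real system may use a different algorithm or chain the rounds differently, so treat this as an illustration only.

```python
import hashlib


def iterated_hash(password: str, rounds: int = 3) -> str:
    """Hash the password, then hash the result again, for `rounds` rounds."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def find_match(dictionary, stored_hash: str, rounds: int = 3):
    """Return the first dictionary entry whose iterated hash equals stored_hash."""
    for candidate in dictionary:
        if iterated_hash(candidate, rounds) == stored_hash:
            return candidate
    return None


# Stand-in for the value stored on the machine.
stored = iterated_hash("dragon")
print(find_match(["123456", "password", "dragon"], stored))  # dragon
```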

Considerations

Once the attack begins, all an attacker can do is wait. The program should systematically go through the dictionary and attempt each password. The program may need a check that limits it to a certain number of passwords every minute or hour, both to avoid obvious detection by a security team or program and to keep the system from locking up. Additionally, if performing an online attack, the attacker should sync their attack speed with the speed at which a website or system can process an attempted login. If the algorithm doesn’t wait for each attempt to finish on the online machine, the program may think it is attempting every password when it is really only trying one password every so often and skipping all the in-between ones. For example, if a system can only process one login every minute, and the program submits one password every second without checking that the previous login attempt has finished, then it will only try one out of every sixty passwords.
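
The one-in-sixty figure can be reproduced with a small simulation. This is a toy model with no network access: I am assuming the target silently drops any submission that arrives while it is still processing the previous one, and the names are hypothetical.

```python
def processed_without_waiting(passwords, submit_interval=1, service_time=60):
    """Return the passwords a busy system actually processes (times in seconds)."""
    processed = []
    busy_until = 0
    for index, password in enumerate(passwords):
        now = index * submit_interval
        if now >= busy_until:
            # The system is free, so this attempt is really tested.
            processed.append(password)
            busy_until = now + service_time
        # Otherwise the attempt is dropped, although the program believes it ran.
    return processed


candidates = [f"pw{i}" for i in range(180)]
tested = processed_without_waiting(candidates)
print(tested)                                # ['pw0', 'pw60', 'pw120']
print(len(tested), "of", len(candidates))    # 3 of 180
```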

If a victim's machines use salting or hashing, then the attacker must account for this. Using multiple hashing algorithms can mean having to hash the dictionary value multiple times before seeing if it is included in a list of passwords. If passwords are salted, then an attacker must try all dictionary words for each salt value. This can increase the amount of time it takes to fully attempt each dictionary value.
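
To illustrate why salting multiplies the work, here is a minimal sketch. It assumes the salt is simply prepended to the password before a single SHA-256 hash, which is a simplification; the names and numbers are hypothetical.

```python
import hashlib


def salted_hash(salt: bytes, password: str) -> str:
    """Hash the salt followed by the password (assumed scheme)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def hashes_needed(dictionary_size: int, distinct_salts: int) -> int:
    """Every dictionary word has to be hashed once per distinct salt."""
    return dictionary_size * distinct_salts


# The same password stored under two salts produces two unrelated hashes,
# so a hash computed for one salt cannot be reused for the other.
print(salted_hash(b"salt-A", "dragon") == salted_hash(b"salt-B", "dragon"))  # False
print(hashes_needed(10_000, 500))  # 5000000
```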

References

Dictionary attack: A beginner's guide in 5 easy points. Jigsaw Academy. (2021, January 4). Retrieved April 3, 2022, from https://www.jigsawacademy.com/blogs/cyber-security/dictionary-attack/

Tweak Library Team. (2020, August 26). Difference between brute force & dictionary attack. Tweak Library. Retrieved April 3, 2022, from https://tweaklibrary.com/difference-between-brute-force-dictionary-attack/

Open-Source Digital Forensics Software

This paper will discuss four important open-source tools used in digital forensics and incident response investigations. The tools covered will be: Nmap/Zenmap, NetworkMiner, Dumpzilla, and Ophcrack.

Nmap/Zenmap

Nmap (and its GUI counterpart, Zenmap) are active network enumeration tools. They allow you to send specially crafted packets to network devices and listen for the responses. Based on these responses, the tools will determine basic network enumeration information, like OS guessing, open ports, and even open services. Nmap is the CLI version, and Zenmap is the GUI version. I personally prefer command line, because I believe it is faster, but many people are more comfortable on a GUI than in CLI. When testing Nmap on Google’s DNS server, it comes back with the open ports and some service fingerprinting to let us know more about the machines we are enumerating. I was able to see that the DNS server has port 53 and 443 up, however other ports may have been blocked by the firewall from replying to my nmap scan. This tool can be used in investigations to help enumerate a network, but keep in mind that it can often be noisy and bring unwanted attention to yourself.

I used the option: -sCV to specify that I want default scripts to run and I want to do service enumeration. There are a lot of options you can use, some of which are more aggressive and will get you more information, others that are slow and stealthy to avoid detection.

Zenmap is the GUI counterpart to nmap, performing the same functionality only in a graphical interface. The interface is simple to use, and uses common parlance for the types of scans you want to do, while also showing you the command that nmap will run in the backend. This isn’t a bad way to learn different kinds of scans and how they interact. The output will look something like this:

NetworkMiner

Sometimes, active enumeration is not an option. Actively touching a network can negatively affect network performance and sometimes even bring down machines. This is where NetworkMiner comes in. NetworkMiner is a passive network sniffer used to detect OS, hostname, and other features of network machines. After running it on my machine and surfing the web for a short while, these were my results:

Where Wireshark focuses on network traffic, making that the forefront of what a user sees, NetworkMiner seems to focus more on enumeration. When I look at my results, they are grouped by the IP addresses that interacted with or on my network. I expanded the IP of my device and saw that it guessed Windows, which is correct. It probably does this based on the TTL of the packets. Although I would not use this tool for real-world network diagnosis, it may be another tool in your arsenal for enumerating a network and putting together a more detailed network map.
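
To show what a TTL-based guess could look like, here is a rule-of-thumb sketch. It is not NetworkMiner’s actual logic, which I have not verified. It only relies on the commonly cited default starting TTLs (64 for Linux and other Unix-like systems, 128 for Windows, 255 for some network devices) and on the fact that each hop lowers the value by one.

```python
def guess_os_from_ttl(observed_ttl: int) -> str:
    """Map an observed TTL to the nearest common initial TTL at or above it."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    defaults = (
        (64, "Linux/Unix-like"),
        (128, "Windows"),
        (255, "network device or other"),
    )
    for initial_ttl, label in defaults:
        if observed_ttl <= initial_ttl:
            return label


print(guess_os_from_ttl(118))  # Windows
print(guess_os_from_ttl(57))   # Linux/Unix-like
```

A guess like this is easy to fool, since the starting TTL is configurable, so it should only be treated as a hint.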

Dumpzilla

Dumpzilla is a tool used to extract browsing information from the Firefox browser (along with other, lesser-used browsers). This is an incredibly useful, and dangerous, tool if it can get on a victim machine. I ran the tool and got results from an older Firefox profile I used in 2021:

It is kind of scary seeing that a tool can extract exactly what I put into a search bar on my browser, but thinking from a red team perspective, it can prove extremely useful during a penetration test when trying to find possible sites to use for a watering hole attack. In terms of incident response, this tool allows an investigator to quickly gather browsing data from a host machine. A script can be written to download the tool onto the machine, run the tool on all Firefox profiles on the machine, transfer the results back to the investigator, and then remove the tool from the machine.
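
To give a feel for where this kind of data comes from, here is a small sketch. It is not Dumpzilla’s code. It assumes that Firefox keeps history in a SQLite table named moz_places whose last_visit_date column holds microseconds since the Unix epoch, and it runs against an in-memory stand-in table with made-up rows rather than a real profile.

```python
import sqlite3
from datetime import datetime, timezone


def recent_history(conn, limit=10):
    """Return (url, title, visit time) tuples, newest first."""
    rows = conn.execute(
        "SELECT url, title, last_visit_date FROM moz_places "
        "WHERE last_visit_date IS NOT NULL "
        "ORDER BY last_visit_date DESC LIMIT ?",
        (limit,),
    )
    return [
        (url, title, datetime.fromtimestamp(stamp / 1_000_000, tz=timezone.utc))
        for url, title, stamp in rows
    ]


# Stand-in database with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (url TEXT, title TEXT, last_visit_date INTEGER)")
conn.executemany(
    "INSERT INTO moz_places VALUES (?, ?, ?)",
    [
        ("https://example.org/", "Example", 1_640_995_200_000_000),
        ("https://example.com/search?q=test", "Search", 1_641_081_600_000_000),
    ],
)
for url, title, when in recent_history(conn):
    print(when.isoformat(), url)
```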

Ophcrack

Ophcrack is a free Windows password cracker that uses rainbow tables to accomplish its goals. Rainbow tables are basically precomputed tables of hashed strings. These can be checked against password hashes to find a match. One downside of rainbow tables is that quite a large amount of storage is usually needed for them. When you download the program, you also need to download the tables you want. To emphasize my point about storage size, one table I saw was around 3 GB, which isn’t exactly small for a password cracker. After running the tool with the XP free fast rainbow table, it couldn’t crack the “test” password I provided it:
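
The precomputation idea can be sketched with a plain lookup table. Note the simplifications: a real rainbow table stores compressed hash chains rather than every hash, and Ophcrack targets Windows password hashes, while this sketch uses SHA-256 as a stand-in with a made-up word list.

```python
import hashlib


def build_table(candidates):
    """Precompute hash -> plaintext once so that later lookups are instant."""
    return {hashlib.sha256(word.encode("utf-8")).hexdigest(): word for word in candidates}


table = build_table(["123456", "password", "letmein"])

known = hashlib.sha256(b"letmein").hexdigest()
unknown = hashlib.sha256(b"test").hexdigest()
print(table.get(known))    # letmein
print(table.get(unknown))  # None
```

The second lookup fails for the same reason any precomputed table can fail: a password that was never part of the precomputation cannot be found.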

This is quite disappointing, but I believe with more rainbow tables and proper password hashes, this tool could be useful for getting into a computer.

Data Hiding Techniques

Data hiding is a method of doing exactly what the name suggests, hiding sensitive data. This can be used to watermark images, hide exploitative code, and transfer secret messages. This paper will cover 4 techniques used for data hiding, as well as the tools employed in these techniques.

First, we will cover steganography. Steganography is “The practice of hiding a secret message inside of… something that is not secret.” (Stanger, 2020) One of the core focuses of steganography is to “focus on the imperceptivity of both the hidden data and the act of data embedding.” (Shi et al., 2016) In other words, not only is it important to successfully conceal data in another cover medium, but the integrity of that medium must not be noticeably deteriorated. If you choose to hide data within an image, a human must not be able to notice that the image quality has decreased. This technique is commonly used with images to embed an invisible watermark in them. If somebody were to steal the image, this could be proven by looking at where this watermark data is hidden.

There is an online steganography tool (https://stylesuxx.github.io/steganography/) that can be used to freely hide messages within images.

As you can see above, it is as easy as selecting an image and typing a message. The two images (before and after steganography) are below, and there is no noticeable difference. We can send this photograph to somebody else, who can then use the decoding function to view our original message.
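
One common way to hide a message like this is least-significant-bit (LSB) embedding. The sketch below is an assumption on my part about the general technique, not the online tool’s exact algorithm, and it works on a plain sequence of bytes standing in for pixel values.

```python
def embed(cover: bytes, message: bytes) -> bytearray:
    """Write each message bit into the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return stego


def extract(stego: bytes, length: int) -> bytes:
    """Rebuild `length` message bytes from the lowest bits of the stego data."""
    message = bytearray()
    for byte_index in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[byte_index * 8 + bit_index] & 1)
        message.append(value)
    return bytes(message)


cover = bytes(range(256))  # stand-in for pixel values
stego = embed(cover, b"hi")
print(extract(stego, 2))                                # b'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))    # 1
```

No byte changes by more than 1, which is why the before and after images look identical to a human.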

Another method of data hiding is code obfuscation. When you write malicious code, you probably do not want it to be reverse engineered, especially by the blue team. By obfuscating your code, you make it incredibly difficult to understand, often by adding an overwhelming amount of redundancy. For example, instead of writing a function that prints out “Hello World”, you can write a function that calls out to some random internet page with the word “Hello” in the HTML code, rips that word out of the HTML and stores it in a variable, then displays that variable followed by the word “World.” Once run, both functions would accomplish the same task, but one of them is more difficult to understand, especially for a human reader. There is a free online JavaScript obfuscation tool that can employ this very method (https://obfuscator.io/). Below, we can see the stark difference between a simple Hello World script, and that same code once it has been obfuscated.
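
As a toy illustration of the idea, here are two functions with the same result. This is written in Python rather than JavaScript and is far simpler than what obfuscator.io produces; the XOR key and the names are arbitrary choices of mine.

```python
def greet_plain() -> str:
    return "Hello World"


def greet_obfuscated() -> str:
    # The same string, stored as bytes XORed with 0x2A and decoded at run time.
    _0x1 = [0x62, 0x4F, 0x46, 0x46, 0x45, 0x0A, 0x7D, 0x45, 0x58, 0x46, 0x4E]
    _0x2 = 0x2A
    return "".join(chr(_0x3 ^ _0x2) for _0x3 in _0x1)


print(greet_plain())
print(greet_obfuscated())
print(greet_plain() == greet_obfuscated())  # True
```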

The third method of data hiding is bit shifting. This is when you shift the bits of data to make readable material look like gibberish. You can perform this on almost any medium, as to get your original data back you only need to shift the bits back to their original position. This can be used with code to make it appear as a binary file or some other oddly-formatted file. Because the computer analyzes the hex values of the file, you likely will not be able to run a bit-shifted piece of code until you reverse the bit-shift process. The online tool Dcode (https://www.dcode.fr/circular-bit-shift) will allow us to bit-shift a message and make it literally unreadable.
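
A circular bit shift can be sketched as follows. I am assuming the rotation is applied to each byte separately; the dCode tool may rotate across the whole bit string instead, so the outputs are not guaranteed to match.

```python
def rotate_left(data: bytes, count: int) -> bytes:
    """Rotate the bits of every byte to the left by `count` positions."""
    count %= 8
    return bytes(((byte << count) | (byte >> (8 - count))) & 0xFF for byte in data)


def rotate_right(data: bytes, count: int) -> bytes:
    """Undo rotate_left by rotating the remaining distance around the byte."""
    return rotate_left(data, 8 - (count % 8))


message = b"meet at noon"
shifted = rotate_left(message, 3)
print(shifted != message)                   # True
print(rotate_right(shifted, 3) == message)  # True
```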

The last technique covered will be hiding data in bad blocks. When a computer looks for places to store information, it knows not to look at blocks marked as bad. These are “blocks [that] have (supposedly) gone bad.” (Verhasselt, 2009) These locations will not be looked at by the file system. If we tell the filesystem which blocks are bad (even if they aren’t), we can hide data there and the filesystem will never check it. This technique is a bit outdated, so it is more useful on older machines, but a lot of valuable infrastructure runs on dated machines. Creating and using bad blocks is a straightforward, but fairly technical, process. If you want to see an in-depth example of this technique, see the following blog post: https://davidverhasselt.com/hide-data-in-bad-blocks/

References

dCode. (2022). Circular bit shift. Online Decoder, Encoder, Solver, Translator. Retrieved February 14, 2022, from https://www.dcode.fr/circular-bit-shift

Kachalov, T. (n.d.). JavaScript obfuscator tool. JavaScript Obfuscator Tool. Retrieved February 13, 2022, from https://obfuscator.io/

Shi, Y.-Q., Li, X., Zhang, X., Wu, H.-T., & Ma, B. (2016). Reversible data hiding: Advances in the past two decades. IEEE Access, 4, 3210–3237. https://doi.org/10.1109/access.2016.2573308

Stanger, J. (2020, July 6). The ancient practice of steganography: What is it, how is it used and why do cybersecurity pros need to understand it. Default. Retrieved February 13, 2022, from https://www.comptia.org/blog/what-is-steganography

stylesuxx@gmail.com. (n.d.). Steganography Online. Steganography online. Retrieved February 14, 2022, from https://stylesuxx.github.io/steganography/

Verhasselt, D. (2009, April 22). Hide data in bad blocks. Retrieved February 14, 2022, from https://davidverhasselt.com/hide-data-in-bad-blocks/

Analyzing a USB Image with Autopsy

The Sleuth Kit is “a collection of command line tools and a C library that allows you to analyze disk images and recover files from them.” (sleuthkit.org) Autopsy is the GUI frontend that runs on TSK (The Sleuth Kit) backend. In a nutshell, Autopsy allows you to do digital forensics investigations on device images.

When you start the program, you can create a new case. Cases are the individual investigation that you, or your organization, are currently a part of. An investigation will usually have a name, contact information of the investigators working on it, and a central location for the investigation files to sit. Once you create a new case, you can select the Host to be auto-generated, choose the “Disk Image or VM File” Data Source Type, provide the path to your USB image, then load your image into Autopsy, where it will begin analyzing it.

Now that the image file is loaded into Autopsy, you may start exploring it. You can look into each part of the USB image, including FAT files, and see the results in either text or hex. This is brilliant for being able to find data hidden in the parts of the USB drive that a normal file explorer would not allow you to see. An example of this is orphan files. These orphan files are files that are deleted and no longer in the parent folder. Autopsy will automatically analyze the image when you provide it and locate any orphan files. A practical example of when this may be useful would be if the person you are investigating deletes files but doesn’t go any further than that to hide their traces. Autopsy will locate most or all of the deleted remnants of these files and still allow you to view them.

A common method of anti-forensics is steganography. This is where you hide data within data, such as hiding text information within an image file. Because Autopsy has a Hex examiner, you can look through the hex or plain-text of a file to find information that may be hidden within it. Autopsy can also be configured to use Google or Bing to translate text, meaning that Autopsy can be used when investigating nation-state events.

Autopsy’s Application tab will display a file in what appears to be its native state. For example, if you find a JPEG file on the USB drive, you can view the image as is, instead of just analyzing hex. It will also display videos and HTML files. You can also view the file’s metadata easily, which is very useful for finding timestamps.

When I loaded up a peer’s USB drive image onto Autopsy, I went exploring into the file structure. I found images, which I could view as the image and as hex. The headers of the text results allowed me to verify it was a JPEG file. Additionally, I found an MP3 file that I could actually play on my machine! Listening to a jazz song while analyzing a device image isn’t a bad combination.
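
Checking a file header against known signatures, as I did by eye for the JPEG, can be sketched like this. The table only lists a few well-known signatures, and the function name is my own.

```python
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"ID3": "MP3 (ID3v2 tag)",
}


def identify(header: bytes) -> str:
    """Return the file type whose signature matches the start of the data."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # JPEG
print(identify(b"ID3\x04\x00"))                   # MP3 (ID3v2 tag)
print(identify(b"plain text"))                    # unknown
```

A check like this is useful because a file extension can be renamed, while the first bytes of the file are harder to disguise by accident.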

One last thing I will touch on: when you explore the file system, you can view change times and access times for each file. During an investigation, this may be important for attribution reasons. If you know that a crime occurred during a particular time window, the access and change times let you tie that event to whoever was using the device during that window.
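
The time-window check can be sketched as follows. Note that this reads timestamps from a live file through the operating system, whereas Autopsy reads them from the image; the temporary file, the dates, and the function names are all made up for the demonstration.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path):
    """Return the access, modify, and change times of a file as UTC datetimes."""
    info = os.stat(path)

    def to_utc(stamp):
        return datetime.fromtimestamp(stamp, tz=timezone.utc)

    return {
        "accessed": to_utc(info.st_atime),
        "modified": to_utc(info.st_mtime),
        "changed": to_utc(info.st_ctime),  # meaning differs between Unix and Windows
    }


def in_window(moment, start, end):
    """True when the timestamp falls inside the window of interest."""
    return start <= moment <= end


# Demonstration file with a known, manually set timestamp.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
stamp = datetime(2022, 2, 1, 12, 0, tzinfo=timezone.utc).timestamp()
os.utime(demo_path, (stamp, stamp))

window_start = datetime(2022, 2, 1, 11, 0, tzinfo=timezone.utc)
window_end = datetime(2022, 2, 1, 13, 0, tzinfo=timezone.utc)
modified = file_times(demo_path)["modified"]
print(in_window(modified, window_start, window_end))  # True
os.remove(demo_path)
```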

References

Open Source Digital Forensics. The Sleuth Kit (TSK) & Autopsy: Open Source Digital Forensics Tools. (n.d.). Retrieved February 5, 2022, from https://www.sleuthkit.org/

Device Imaging with dd

In this tutorial, I will explain what dd is and how to use it to create an image of a device. Let’s start with explaining what an image is, and why you may want to create one. An image is just a “comprehensive duplicate of electronic media such as a hard-disk drive.” (Goldstein, 2019) Comprehensive duplicate means that it copies a device exactly, bit-for-bit, whether that device is a USB drive, hard drive, floppy disk, etc. Images are used in virtualization when you want to run an operating system from a predetermined state, and in digital forensics investigations to ensure that all work is done on an image, so as to not accidentally taint the integrity of an original piece of evidence.

The dd command in Linux (*nix) is “a command-line utility for Unix and Unix-like operating systems whose primary purpose is to convert and copy files.” (GeeksforGeeks, 2019) In other words, dd is a program that can be run in a Linux or Unix terminal that can easily make an image of a device medium, such as a USB drive. dd is built into *nix machines so you don’t have to worry about installing it.

Let’s begin by looking at the manual file and learning the syntax. You can read the man file of most binaries by invoking man <binary>. For dd, you can read it with man dd.

We can see that dd is used to convert and copy files, and you can manually adjust some of the operands. The general syntax is dd [OPERAND]. The only operands we will worry about for this tutorial are if (the input file, or infile), of (the output file, or outfile), and status. The infile is the device you want to make an image of, the outfile is where you want this image to be saved, and status controls how much information is reported to the user.

When you want to create an image of a USB drive, you must first locate the drive using lsblk. This command “lists information about all available devices.” (Broz & Zak, 2021)

We can see a couple devices, but we will focus on sdb. This is the second disk drive, and will likely be the usb-stick you want to image. To double check that this is the drive you want to image, you can look at the MOUNTPOINT for sdb1 and see that it is mounted at /media/ubuntu/003B-5B4A. Just list the files in that location using ls and make sure it is the usb-stick you want to image.

Next, we will invoke the dd command. The syntax we will use will be as follows:

sudo dd if=/dev/sdb of=~/Desktop/imageOfUSB status=progress

Above, we specify that the infile, or the device we want to image, is the USB stick. The outfile saves the image on our current user’s desktop as imageOfUSB, and the status operand states that we want to see the progress of the copy. The sudo at the beginning means that we will run the command as superuser, which is typically required to read directly from a device such as /dev/sdb. When the program completes the copy, the output should look like this:

We can use hashes to ensure that both the image and device are exactly the same, bit-for-bit. A hash takes any input and outputs a string of alphanumeric characters. These characters are found with a very complex algorithm. It ensures that if any data changes, even if one bit is flipped, the output of the hash algorithm will look vastly different. We can see below that the image and the device have the same hash results:
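
The comparison can also be scripted. The sketch below uses SHA-256 from Python’s hashlib and reads in chunks so that a large image does not have to fit in memory. The demonstration runs on two small temporary files standing in for the device and the image; reading a real device such as /dev/sdb would need superuser rights.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file or device in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both paths hash to the same value."""
    return sha256_of(path_a) == sha256_of(path_b)


with tempfile.TemporaryDirectory() as tmp:
    device = os.path.join(tmp, "device.bin")
    image = os.path.join(tmp, "image.bin")
    for path in (device, image):
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 4096 + b"evidence")
    print(same_contents(device, image))  # True

    # Flip a single bit in the copy and compare again.
    with open(image, "r+b") as handle:
        handle.write(b"\x01")
    print(same_contents(device, image))  # False
```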

Now, we will put the image on an empty usb stick. To ensure it is empty, I will explain how to format (clean) a usb-stick. First, open the Disks utility on Linux. The screen should look something like this:

Next, select the usb stick. Click the cogwheels on the volume you want to format, then click Format Partition. You can name the volume, but make sure you select Erase to erase everything on that usb stick. Click Next then click Format. It shouldn’t take too long for small drives, but if you have anything over 1GB, it will likely take a few minutes. Anything over 10GB may take an hour or longer on a virtual machine. Anything over 1TB may take a day. Once it is formatted, we can run essentially the reverse of our previous command to put the image onto the usb drive.

sudo dd if=~/Desktop/imageOfUSB of=/dev/sdb status=progress

This may take a lot longer than the previous run, since usb write speeds are usually slower than the read speeds.

That is really all there is to making images and putting images onto devices.

References

Broz, M., & Zak, K. (2021, October 27). lsblk(8) -- Linux manual page. LSBLK(8) - linux manual page. Retrieved January 29, 2022, from https://man7.org/linux/man-pages/man8/lsblk.8.html

GeeksforGeeks. (2019, May 15). 'DD' command in linux. GeeksforGeeks. Retrieved January 29, 2022, from https://www.geeksforgeeks.org/dd-command-linux/#:~:text=dd%20is%20a%20command%2Dline,system%20just%20like%20normal%20files.

Goldstein, S. (2019, September 24). Two key differences between digital forensic imaging and digital forensic clone and how they can affect your legal case.: News: Capsicum: Digital Forensics, investigations, cyber security. CAPSICUM. Retrieved January 29, 2022, from https://capsicumgroup.com/2-key-differences-between-digital-forensic-imaging-and-digital-forensic-clone-and-how-they-can-affect-your-legal-case/#:~:text=A%20Forensic%20Image%20is%20a,as%20a%20hard%2Ddisk%20drive.&text=This%20exact%20duplicate%20of%20the,for%20analysis%20and%20evidence%20preservation.

Introduction to Linux Commands

While Windows may “dominate the game on home computers,” (Galov, 2021) the Linux operating system has over 95% of the market share for the world’s top 1 million web servers (Vaughan-Nichols, 2015). Linux is a popular operating system most often used in servers, and as the OS for Android phones. Like any OS, Linux has a command line interface, or CLI, that is usually used by system administrators. A CLI works by inputting commands and arguments. For Linux, the syntax of this input is command [arg1] [arg2].... This paper will go over some of the most important commands to know when using Linux. I will stay away from some of the more powerful commands, and will stick to beginner-friendly commands to get you more comfortable with Linux CLI.

Let’s begin by grouping and summarizing the commands I will explain in this paper. First, we have the get-out-of-jail-free commands. These include help, and man. These commands will assist users in figuring out how specific commands work, or just present a shortened list of commands that can be used at the present moment in the OS. Next is the traversal commands: pwd, cd. These commands are used to traverse your file directory. Finally, I will touch on file manipulation commands, including mkdir, rmdir, touch, vim/nano, and cat.

Let’s start with help. While help isn’t always available on every distribution of Linux, if you find yourself in a place with no idea what command to use, it is a good idea to try typing either help or ?. These commands on their own will sometimes give you a list of some basic commands you can use within that workspace. This is good if you are pentesting and you gain access to a device and have no idea what commands can be used. Additionally, sometimes using help [command] will give you a shortened description of how to use the command. The man command is arguably one of the most important commands, because it will reference the manual page for a given program or command. Whether you completely forget how to use a command, or want to figure out how to use a specific feature of that command, typing man [command] is always a good bet. Be sure to understand that man doesn’t have entries for every single command possible. If you install something from GitHub, it may not have a manual entry.

Once you know what commands you can use and how to use them, you may want to explore the system directory. This is where the following commands come in handy. First is pwd. This command stands for print working directory. It essentially just prints the full filename of your current working directory, or where you are in the filesystem. One thing to note is that everything is a file in Linux. Directories (the equivalent of folders in Windows OS) are files, just like how a text file is a file. That’s why the pwd command will print the full filename of where you are. If you ever find yourself lost in the filesystem, use this command to regain your bearings. After you know where you are, you can navigate to another directory with cd. This stands for change directory. You can navigate with an explicit path or an implicit one, more commonly called absolute and relative paths. To navigate explicitly, you must type the full filename and location of the directory, from the highest level all the way down to the directory itself. This may look like /home/kali/Desktop/Important Documents (on the command line, a path containing a space must be quoted or escaped). It is called explicit because you are explicitly referencing the directory you want to travel to, with no regard for your current location. Implicit traversal uses your current location to find where you want to go. You can reference the parent directory with ../ and the current directory with ./. If you want to travel to a directory called “Dir” that sits inside the directory two levels above your current location, you can do so with cd ../../Dir.
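
How a path given to cd resolves against the current directory can be mimicked in Python with posixpath. This sketch only manipulates the text of the path: it does not touch the filesystem or follow symbolic links, and the starting directory is just an example.

```python
import posixpath


def resolve(current_directory: str, target: str) -> str:
    """Combine the current directory with a cd-style target and tidy the result."""
    return posixpath.normpath(posixpath.join(current_directory, target))


here = "/home/kali/Desktop"
print(resolve(here, "../../Dir"))  # /home/Dir
print(resolve(here, "./notes"))    # /home/kali/Desktop/notes
print(resolve(here, "/etc"))       # /etc
```

The last line shows that a path starting from the top level ignores the current location entirely.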

Now, I expect that you have found the directory you wished to visit. How can you create a new directory to store files in? Of course, with the mkdir command, which stands for make directories. This command does exactly that: it makes a directory either in your current location, or at the given explicit/implicit location. It only creates the directory if one by that name and in that location does not already exist. You can also set the mode for the created directory, which affects who can read, write, and execute files within the directory. Further, you can remove directories with rmdir, but they must be empty. There are certain flags for the rm command that allow you to recursively delete a non-empty directory, but I will not be writing about that in this paper.

You can create empty files with the touch command. The official use of this command is to change file timestamps, but the average Linux user will use it to create empty files. To edit a file on a Linux command line, you will likely use either Vim or Nano. These are two text editors that come preinstalled on most versions of Linux. To run either, simply type vim [filename] or nano [filename]. This will launch the editor and allow you to edit the provided file. If the file does not exist, these programs will open an empty file that will be saved under the provided filename once you save. Nano is considered a lot more beginner friendly, as Vim is known for having a pretty steep learning curve. However, the skill ceiling for Vim is a lot higher, and becoming proficient in its keybindings will allow you to edit files incredibly fast.

The last command I will write about is cat. The official use is to concatenate files, and print to the command line, but like the touch command it is typically used to display the contents of various types of files. The syntax is cat [filename]. Running this command will display all the contents of the file. Be cautious when using this on non-text files or compressed/encrypted files. It may display characters that can confuse your shell, and then you may need to launch a new shell to get it to work properly again.
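
For readers who think in code, the file commands above map roughly onto Python’s pathlib. This is only an analogy to show what each command does, run inside a throwaway temporary directory; the names are hypothetical.

```python
import tempfile
from pathlib import Path


def demo() -> str:
    """Walk through mkdir, touch, an edit, cat, and rmdir in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        workdir = Path(root) / "notes"
        workdir.mkdir()                 # like: mkdir notes
        memo = workdir / "memo.txt"
        memo.touch()                    # like: touch notes/memo.txt
        memo.write_text("hello\n")      # like saving the file from an editor
        contents = memo.read_text()     # like: cat notes/memo.txt
        memo.unlink()                   # the directory must be empty before rmdir
        workdir.rmdir()                 # like: rmdir notes
        return contents


print(demo(), end="")  # hello
```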

References

Galov, N. (2021, August 9). 111+ mind-boggling linux statistics and facts for 2021 - linux rocks! HostingTribunal. Retrieved January 22, 2022, from https://hostingtribunal.com/blog/linux-statistics/#gref

Vaughan-Nichols, S. (2015, October 15). Can the internet exist without linux? ZDNet. Retrieved January 22, 2022, from https://www.zdnet.com/article/can-the-internet-exist-without-linux/

Live Analysis

This essay was written for Professor Leinecker's Digital Forensics I course.

Live analysis, or live response, is the examination of a system while it is still running, with a focus on its volatile memory, such as RAM or cache memory. The main problem with live analysis is that volatility: the information will be deleted or altered if the device loses power. So why do live analysis at all? Sometimes it is the only option. If you have a warrant that limits you to live response, or if you need to capture volatile memory to observe internet activity, for example, you can only perform a live response. Of course, if you already have (or have seized) a device, you can pull a full image from it and analyze that, but you won’t be able to view much of the volatile memory because it is gone once the device is powered off. Imaging a hard drive is great for analysis, but the image won’t contain any of the volatile memory that may be needed as evidence to support a conviction.

What kind of information might be found before a computer is shut down? This can be anything from user data to the applications or programs currently running. Basically, it is the temporary memory of a computer. Web cache and browsing information might be held in volatile memory too, revealing somebody’s web activity. Session cookies are an example of web-based volatile data: they are typically kept only for as long as the browser remains open and are lost once it is closed. Sometimes, programs or applications will store passwords or their hashes in memory. If you can pull the strings from RAM, you may be able to crack the hashes and recover the cleartext passwords.

The risks of doing live response can be great, depending on the system, the time you have to do it, and the tools you use. If you are performing live analysis on a production server, for example, you have to be very careful not to crash the system. Some tools might use too many resources or might not work well with the configuration of the server, and cause a hard crash. This is bad, because crashing a production system may give the company leverage to hold you liable for losses incurred during the outage; for a big company these losses may be great. Some tools also don’t perform pure live analysis, meaning they may alter parts of memory in the process of collection. This in turn could make any evidence gathered moot, because altered evidence has no integrity and is oftentimes not admissible in court, or sometimes even in private investigations. Another consideration is data availability and integrity prior to collection. Some people may set up traps that automatically shut down the computer if it detects live analysis being done on it. Also, many security-conscious people encrypt their hard drives and keep their computer powered off whenever they are not using it. They do this because the non-volatile memory is encrypted, so once the machine is shut down and the volatile memory is gone, there is little left for an investigator to read.

Wireshark is an example of a live analysis tool. It can be set up on a machine to capture network traffic between that machine and other endpoints. Magnet RAM Capture is a tool that does what its name implies: it captures volatile RAM. RAM Capturer does the same.

Data Hiding

This essay was originally written 25 Sept 2021 for Professor Leinecker's Digital Forensics I course.

Data hiding, or steganography, is the practice of hiding data within other data. The data that is hidden is called the payload. This can be text, images, videos, any data. The data that it is hidden in is called the carrier. Again, this can be any type of data. Oftentimes, data is hidden within slack space or free space. It can also be hidden by replacing carrier data values with payload data values.

Data hiding has been practiced for centuries. A long time ago, data hiding would be done with invisible ink, or by using wax to cover stones engraved with a secret message. In the present age, data hiding is done electronically. You can deconstruct any file you want to hide into bits, shove those bits somewhere they won’t be seen, and then reconstruct the file at a later time. This technique is so effective because it is invisible to the human eye. Data hidden in an image or audio file will usually not be detectable by looking or listening; it has to be forensically analyzed.

To find a file hidden inside another file, you should begin by looking for the file signature, or header. This is a string of hex values that every file of that type begins with. For a JPEG, it is FF D8 FF. Every JPEG image in the JFIF format begins with these three bytes (six hex digits). Make those values the start of your block, and have it end at the file trailer (the last bytes in every file of that type; for a JPEG, FF D9). Extract those bytes to their own file and you now have the original file that was hidden.
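The header-to-trailer search described above can be sketched in a few lines of Python. This is a minimal illustration, not a full carving tool: the function name carve_jpeg is made up for this example, and it assumes the first FF D9 after the header really is the end of the image. Real JPEGs can contain an embedded thumbnail with its own FF D9, which would cut the carved file short.

```python
from typing import Optional

JPEG_HEADER = bytes.fromhex("FFD8FF")  # signature every JPEG starts with
JPEG_TRAILER = bytes.fromhex("FFD9")   # end-of-image marker


def carve_jpeg(blob: bytes) -> Optional[bytes]:
    """Return the first header-to-trailer JPEG block in blob, or None."""
    start = blob.find(JPEG_HEADER)
    if start == -1:
        return None  # no JPEG signature anywhere in the carrier
    # Look for the trailer only after the header we just found.
    end = blob.find(JPEG_TRAILER, start + len(JPEG_HEADER))
    if end == -1:
        return None  # header without a trailer: truncated or not a JPEG
    # Include the two trailer bytes in the carved block.
    return blob[start:end + len(JPEG_TRAILER)]
```

To use it, read the carrier file in binary mode, pass the bytes to carve_jpeg, and write the result to a new file with a .jpg extension.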

Finding text hidden in a file can be more difficult because of encryption. If you were to hide human-readable text inside a file, you could easily find it by extracting strings of consecutive human-readable characters from the file. This works because most of the bytes in a binary file do not decode to human-readable characters (meaning letters, digits, and punctuation.) For example, you may have a string of text that reads: “ÀYŒ€c¸+k·£‘»zzŠñït”. However, if you have a string of consecutive characters that reads: “This is a code”, you can assume that it was inserted there on purpose. If the text inserted into a file is encrypted into a format that combines readable and non-readable characters, it would be very difficult to distinguish it from the original file’s data.
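As a rough sketch of that string extraction, the Python below pulls out runs of printable ASCII from raw bytes, much like the Unix strings utility does. The function name extract_strings and the minimum run length of 4 are assumptions chosen for this example; a shorter minimum produces more noise, a longer one may miss short messages.

```python
import re
from typing import List


def extract_strings(blob: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII characters."""
    # 0x20 (space) through 0x7e (~) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(blob)]
```

Short accidental runs of printable bytes are filtered out by the length threshold, while an inserted sentence survives intact.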

One example of replacing carrier values with payload values is hiding data within image color values. An image has data in the form of bytes defining its RGB values. The least significant bit of each byte often makes no noticeable difference in the color or quality of an image. This is because it changes a color channel by at most 1 out of 255, less than half of a percent of its range. The human eye struggles with perceiving differences that small. To perform the data hiding, you would look at two bit streams: one being the bits of data you wish to hide, called the payload bits, the other being the least significant bits of the RGB values in an image, called the carrier bits. You simply replace the carrier bits with the payload bits. After this is done, the image should look no different to the human eye, but you have hidden your data within the image. To view your data, you would go through the RGB values in order, extract the least significant bit of each, and concatenate the bits. When you have finished, your payload bit stream will be exactly the same as before. The only problem with this method is that if you wanted to return the carrier image to its original, unchanged form, you couldn’t unless you had saved the bits that were replaced.
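The bit replacement described above might look like the following in Python. This is a sketch under a few assumptions: the carrier is already a flat sequence of decoded RGB channel bytes (reading and writing an actual image format is left out), the payload length is known to whoever extracts it, and the function names embed_lsb and extract_lsb are invented for this example.

```python
def embed_lsb(carrier: bytes, payload: bytes) -> bytes:
    """Overwrite the least significant bit of each carrier byte with payload bits."""
    # Break the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for payload")
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it to the payload bit
    return bytes(out)


def extract_lsb(carrier: bytes, payload_len: int) -> bytes:
    """Rebuild payload_len bytes from the low bits of the carrier."""
    out = bytearray()
    for i in range(payload_len):
        value = 0
        for channel in carrier[i * 8:(i + 1) * 8]:
            value = (value << 1) | (channel & 1)  # append the next low bit
        out.append(value)
    return bytes(out)
```

Each payload byte consumes eight carrier bytes, so a carrier of N channel values can hold at most N / 8 bytes of payload. No carrier byte changes by more than 1.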

Another method of data hiding is using slack space. Slack space exists when only a portion of a cluster is used. The remainder is simply unused space and you can add whatever data to it you wish, making it the perfect carrier for payloads. The only caveat is that if that space gets overwritten (through use of the file system), you may lose your payload bits. This method therefore works best on media that are rarely written to (archive file systems, backup file systems, etc.)
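To get a feel for how much room slack space offers, the arithmetic can be sketched as below. The function name slack_bytes is made up for this example, and the 4096-byte default cluster size is an assumption; the real value depends on how the file system was formatted.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Return the unused bytes in the last cluster a file occupies."""
    remainder = file_size % cluster_size
    # A file that fills its last cluster exactly leaves no slack.
    return 0 if remainder == 0 else cluster_size - remainder
```

For instance, a 5,000-byte file on 4,096-byte clusters occupies two clusters (8,192 bytes), leaving 3,192 bytes of slack in the second one.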

Anti-Forensics: 3 Tools

This essay was originally written 03 Oct 2021 for Professor Leinecker's Digital Forensics I course.

xxUSBSentinel (https://github.com/thereisnotime/xxUSBSentinel)

You can download the executable for this tool from github.com. Its goal is to “make recovering your encryption drive keys almost impossible.” I had trouble understanding this tool at first so I did more research. When a computer is shut down, there is a chance that a key storage utility will lose the encryption key for an encrypted drive, requiring the user to input it again [1]. Knowing this, the tool’s purpose is to trigger that key loss quickly. You launch the program, plug in and unplug your USB stick, and then the next time you put your USB stick in, you can arm the tool to listen for the disconnect message - after which it will shut down your computer.

Let’s say you have an encrypted USB stick on your computer. You want the key for this drive to be forgotten should investigators look at your computer. This utility makes it so that when the user pulls their USB stick from the computer, the encryption key is forgotten and must be entered again upon the next use of the drive. An investigator can get around this via social engineering, by accessing the computer while the USB stick is still inserted.

Metadata-Remover (https://github.com/Anish-M-code/Metadata-Remover)

This tool is simple - it removes identifiable metadata from images and videos. Its purpose is to protect your anonymity when posting media. It is a CLI tool written in C and Python3. To use it, you can just download the latest release, install it to a directory, and run the exe. This launches a command line. Drag your image to the “images” folder in the tool directory, then enter the name of the image. This scrubs it.

The importance of this tool is that it protects your anonymity. People can use image and video metadata to identify your location and other personally identifiable information. A forensic investigator can thwart these efforts by gaining access to the original file. The difficulty is that the tool is meant to be used before uploading files online, so an investigator would need access to the uploader’s machine, where a copy of the original may still exist.

ForensicsF***er (https://github.com/NoahGWood/FileChanger)

This tool modifies timestamps after a pre-determined length of time passes. Basically, a user can run this program to modify file times on an ext4 filesystem (Linux only). The tool also has a really cool feature called self-destruct mode that will delete the Python script after use. This tool is more of a proof-of-concept than an actually usable tool (it only works on Linux and uses pre-determined files), but the POC could be built into a more user-friendly tool.
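To show how little effort timestamp tampering takes in general, here is a minimal Python sketch using the standard os.utime call. It is not taken from the tool above and is not meant to represent how that tool works; the function name set_timestamps is invented for this example. Note that os.utime changes only the access and modification times. The inode change time (ctime) is updated by the kernel to the moment of the call, which is one trace an investigator can look for.

```python
import os


def set_timestamps(path: str, when: float) -> None:
    """Set the access and modification times of path to the epoch time when."""
    # The tuple is (atime, mtime), both in seconds since the Unix epoch.
    os.utime(path, (when, when))
```

Calling set_timestamps on a file with the value 946684800 would make it appear to have last been read and modified at midnight UTC on 1 January 2000.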

The real-world use of this tool is to undermine data integrity. If you have evidence, and all the timestamps are from a date after the investigation started, this could ruin the integrity of the investigation (at least from an outside perspective) and force evidence to be thrown out. A forensic investigator could prevent this by removing the program before it goes off, or by imaging the file system multiple times and working only on the images that have not been modified by the program.

Sources:

[1] Afonin, O. (2021, September 23). Forensic implications of sleep, hybrid sleep, hibernation, and fast startup in Windows 10. ElcomSoft blog. Retrieved October 3, 2021, from https://blog.elcomsoft.com/2021/09/forensic-implications-of-sleep-hybrid-sleep-hibernation-and-fast-startup-in-windows-10/.

Getting Started with WinHex

This post was originally written 27 August 2021 for Professor Leinecker's Digital Forensics I course.

This paper will discuss my experience with WinHex. I do not own the product yet, but I have read up on the software and watched videos of it in action. To begin, it seems that when you open a file or examine a drive, you can view its raw hex. I am not too sure when this would come in handy, but my guess would be in defeating steganography efforts. What seems really cool, however, is the ability to edit the hex values. This would allow you to change the contents of a file at such a low level that you may be able to obfuscate the original data or alter it in a non-human-readable way, though this is just speculation.

I am curious about how the software is able to recover deleted data. I do not know enough yet to make much more than an educated guess on how the process works, but I would imagine it may look at recently modified sections of memory and somehow reverse engineer the memory to its original state (maybe factoring in the time since modification, or performing an inverse function of some sort, i.e. if deleting data puts the bits through a deletion algorithm, put them through the inverse of said algorithm.)

I would like to see if the software can be used in Capture The Flag challenges. This is a hobby of mine, and if WinHex could assist in file-recovery focused CTF challenges, or steganography-focused ones, then it would give me a leg up on the competition. Again, not knowing much about the software makes it difficult to speculate, but I would imagine that WinHex will eventually either be bundled into a larger digital forensics suite, or be replaced by a free, open-source alternative.