Skip to content

Latest commit

 

History

History
682 lines (528 loc) · 27.5 KB

forensics.adoc

File metadata and controls

682 lines (528 loc) · 27.5 KB

Forensics

Luke Jones


What is Forensics?

In general, computer science professionals refer to "Digital Forensics" as "Forensics", for simplicity’s sake. Digital Forensics is the field in cybersecurity that tries to gather and understand evidence after an incident, which can be crime, to determine how it happened. This not only helps law enforcement when pledging someone innocent or guilty, but also to understand how to improve security in a system that was successfully attacked. Digital Forensics focuses on gathering evidence present in computer devices that hold information electronically. It is a branch of Forensic Science, which can also investigate any type of crime even if there is not computer media involved.

How to search for strings and filenames

We will begin by learning how to search for information in a file system. Go to the picoCTF webshell at:

Once you are connected, open up this problem in a separate tab:

Download the problem file in your webshell by right-clicking the link in the problem description and selecting Copy Address or Copy Link. Then download it by typing in `wget ` and pasting the address after 'wget', space. Your command should look something like this, but is likely to not be exactly the same:

$ wget https://jupiter.challenges.picoctf.org/static/495d43ee4a2b9f345a4307d053b4d88d/file

You need to copy and paste your own link for the file.

Great! So now you should have the challenge file saved on your webshell as file. Now what?

As a reflex, you should always use the program file on new files that CTF challenges give you. The next command is kind of confusing, because the first word references the program file and the second word references the file named file, but run this command and see what it tells you:

$ file file

If done properly, it should tell you:

file: ASCII text, with very long lines

This tells us the file is plain text, but has unusually long lines. Since it is plain text, we can use cat to see what it contains.

$ cat file

Running this command will show that the file is mostly made up of gibberish. If this were a cryptography challenge, decoding the gibberish might be what needs to happen, but this is a 100 point general skills question, so I doubt that’s what needs to happen here. What is the challenge author pushing us towards? There’s only one hint and it is a grep tutorial. What is grep?

Grep is a Linux utility, so we can learn about it by bringing up its man page:

$ man grep

The first line of the man page says:

grep, egrep, fgrep, rgrep - print lines that match patterns

This is perfect! We want to search through gibberish to find the flag. But how do we specify the pattern to search for and the file to search in? For this, I recommend the grep tutorial in the hint, not the man page. (Man pages tend to be highly technical and can be confusing to novices)

One of the first examples in the grep tutorial uses the following command:

$ egrep 'mellon' mysampledata.txt

'mellon' is what is being searched for and it is being searched for in 'mysampledata.txt' What if we searched for 'picoCTF' in 'file'? That command would look like:

$ egrep 'picoCTF' file

This should get the flag for you and print it on your screen.

Let’s consider another challenge:

Download the zip file into your webshell like you did for the previous challenge. As before, use file on it right away to have an idea of what you’re dealing with:

$ file files.zip

You should see the following output:

files.zip: Zip archive data, at least v1.0 to extract, compression method=store

To see more of this challenge, all we have to do is unzip the archive:

$ unzip files.zip

You’ll see a lot of output, but you can ignore that for now. List the contents of your current directory to see the new directory called 'files'. Try exploring that a bit with cd and ls, remember that you’re looking for a file called 'uber-secret.txt'.

It may be hard to find 'uber-secret.txt' without the help of a tool. This problem is called 'First Find' and our last problem was called 'First Grep'. Is there a tool called 'find' in Linux? See if there is a manpage:

man find

There is! The first line reads:

find - search for files in a directory hierarchy

This sounds perfect. Exit the manual by pressing 'q'. As mentioned before, manpages are quite technical and can be overwhelming to try and read when you are first starting out. Let’s find some simpler examples by Googling. My Google query was find file linux command. I felt the need to specify linux command because find is such a generic word. My top Google result was this:

I especially liked this result because I know plesk is not a commercialized site. Scroll down to the first example under Basic Examples.

find . -name thisfile.txt

This command means: starting in the current directory (which is what ., dot means), look in this directory and all subdirectories for the file named 'thisfile.txt'. We can slightly modify this example to fit our needs for the challenge.

Make sure you are in the 'files' directory for this command. If you unzipped the archive in your home directory, you can use the following command to get back to the 'files' directory:

$ cd ~/files

Once you’re in the files directory, use this command:

$ find . -name uber-secret.txt

If you were in the 'files' directory when you ran this command, you should get the following output:

./adequate_books/more_books/.secret/deeper_secrets/deepest_secrets/uber-secret.txt

This is the path to the file that was found. We’re going to get into the same directory as this file by following the directories listed in this file path. We know that '.' is our current directory, so our first cd is to 'adequate_books'. Remember to use the Tab key to autocomplete unambiguous file and directory names. To explain what I mean by 'unambiguous' here’s a relevant example of an ambiguous file name in our current context:

$ cd a

If you press the Tab key after only typing 'a' it won’t autocomplete because there are two directories that start with 'a', 'acceptable_books' and 'adequate_books'. The shell doesn’t know which one you want. To get Tab to autocomplete type the following unambiguous directory name and then strike tab:

$ cd ad

When you press tab, it becomes:

$ cd adequate_books/

One last note on tab completion. When there is an ambiguous file name that doesn’t tab complete to something, you can hit the tab key again to see the list of files that could be completed with your given prefix. The other possibility is that there are zero matches on your given prefix, in which case nothing is printed when you hit tab a second time.

So now we are in 'adequate_books', what’s next? From our found file above, 'more_books' is after 'adequate_books', so we cd accordingly:

$ cd more_books/

For this directory, observe the difference between ls -l and ls -al. You’ll see that an additional directory is shown when the '-a' flag is given. This flag means 'show all (including hidden files and directories)'. In Linux, any file or directory starting with '.' is considered hidden and will only be shown in specific circumstances.

$ cd .secret/
$ cd deeper_secrets/
$ cd deepest_secrets/

All of these cd commands could be combined into a single command, but I’ve broken them up here for clarity and exposition. List the contents of 'deepest_secrets':

$ ls -al

To see the contents of the file, use cat:

$ cat uber-secret.txt

There’s the flag for this challenge!

Try this slightly more difficult challenge with your new found skills:

Disk analysis

One of the most fundamental skills of a forensics analyst is inspecting and deeply understanding disks. These can be actual hardware or dumps of disks captured in files. There are a few really good GUI tools out there for not just disk analysis, but whole management of digital evidence for cases. Our disk analysis problems will not require any licenses to proprietary software. Some people like to use Autopsy which is a GUI frontend to the tools we will demonstrate how to use in this section. We will use the individual Sleuthkit tools so that you learn a little more than from a GUI that abstracts away some of the details. Disks are all about the details.

Sleuthkit Intro presentation

We will be considering disk images exclusively, due to the difficulty of sending real hard drives through the Internet at the time of this writing! Try this picoGym problem, which presents the first step in analyzing disk images:

This problem should be pretty approachable given what you’ve done leading up to this point, namely downloading individual challenge files and using command line utilities. Something new in this challenge is using netcat or nc. For this challenge, nc is used to access a checker program. This program will check your answer to the challenge and give you the flag if it is correct. For this challenge, the invocation of nc (what you type to run it) is given and is straightforward, but I will explain it for the sake of clarity. Here’s my given nc invocation: nc saturn.picoctf.net 52279 The last number might be different for you, that’s expected. We’ll go through what each part of this program call means:

  • nc This, of course, is the name of the program we are running. Netcat, or 'nc' as this system calls it. Sometimes the program name will be the full 'netcat' variety, but on the webshell, it is 'nc'.

  • saturn.picoctf.net This is the name of the computer we’re connecting to. This is a challenge server that picoCTF runs.

  • 52279 This is the number of the port we’re connecting to for the challenge. This will probably be different for your challenge.

So go ahead and solve your first Sleuthkit problem on the picoGym and learn the tool, mmls, which we will use for subsequent problems.

Sleuthkit Apprentice walkthrough

Here’s the next challenge in that short series:

This challenge requires mmls as a first step to use other Sleuthkit tools, but now is the time for some true forensic background.

A disk image is a huge dump of many numbers. But these numbers have an invisible structure to them that gives them much more meaning. Navigating this invisible structure manually is tedious and deeply difficult, but the Sleuthkit tools handle this invisible structure for us. To begin using the Sleuthkit tools we must understand some of the layers that apply to disk images. The four main layers are: media, block, inode, and filename.

  • Media: the media layer tools all are prepended with 'mm' and operate on the disk image with little guidance from the analyst. mmls is a media layer tool that gives us the partition table of the image and key information for delving into the other layers. Media is the lowest level, providing key information to access the deeper layers, but not shedding much light on the data contained in the image.

  • Block: the block layer is the second lowest level of the four layers considered here. Block layer tools are prepended with 'blk' in the Sleuthkit. blkcat is a block layer tool that outputs the contents of a single block. The block layer is the numbers of the disk image broken into equal-sized chunks. A single file is likely to contain multiple blocks.

  • Inode: the inode layer is the bookkeeping layer of a disk image. It’s like the table of contents, with the chapter numbers being like the inodes, and the pages like the blocks of a file. Inode layer tools are prepended with 'i'. icat is an inode layer tool that outputs a single file based on its inode number.

  • Filename: the filename layer is one layer that most any user of a computer actually sees and interacts with. This is the layer with which we will start our exploration of the Sleuthkit in the current challenge. Interacting with the filename layer will look a lot like using the shell normally. Filename layer tools are prepended by 'f'. fls lists the files on an image starting at the root. This is what we will use for our exploration of the disk image.

First off, download the challenge file:

$ wget https://artifacts.picoctf.net/c/331/disk.flag.img.gz

Next, decompress the challenge file:

$ gunzip disk.flag.img.gz

Dump the partition table of the disk image. We want to find the offset to the main partition:

$ mmls disk.flag.img

DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

      Slot      Start        End          Length       Description
000:  Meta      0000000000   0000000000   0000000001   Primary Table (#0)
001:  -------   0000000000   0000002047   0000002048   Unallocated
002:  000:000   0000002048   0000206847   0000204800   Linux (0x83)
003:  000:001   0000206848   0000360447   0000153600   Linux Swap / Solaris x86 (0x82)
004:  000:002   0000360448   0000614399   0000253952   Linux (0x83)

It would seem that the fourth partition is the main partition, because it is the largest and has an uneven length. That’s a bit of a guess, but it’s for sure either partition labeled 'Linux (0x83)'. Copy the 'Start' value to your clipboard of the fourth partition. Let’s look at the root of this partition by supplying the 'Start' value to the offset option in fls:

$ fls -o 360448 disk.flag.img

d/d 11: lost+found
d/d 12: boot
d/d 1985:       etc
d/d 1986:       proc
d/d 1987:       dev
d/d 1988:       tmp
d/d 1989:       lib
d/d 1990:       var
d/d 3969:       usr
d/d 3970:       bin
d/d 1991:       sbin
d/d 451:        home
d/d 1992:       media
d/d 1993:       mnt
d/d 1994:       opt
d/d 1995:       root
d/d 1996:       run
d/d 1997:       srv
d/d 1998:       sys
d/d 2358:       swap
V/V 31745:      $OrphanFiles

This looks like the main partition because it has many of the standard linux root directories, like 'home', 'usr', 'root', etc. Remember that fls is part of the filename layer Sleuthkit tools. You can think of fls as standing for 'filename list'. Here, it’s listed all the top-level directories in the disk image.

This next part requires some forensic intuition. A lot of these directories are system-generated and maintained. Let’s focus on the directories that have a lot of potential user influence like root and home. But first, let’s take a step back and print the help information for fls:

$ fls

fls will print some succinct help information if ran with no arguments. This is true for many command line tools and programs, but is not universal.

$ fls
Missing image name
usage: fls [-adDFlhpruvV] [-f fstype] [-i imgtype] [-b dev_sector_size] [-m dir/] [-o imgoffset] [-z ZONE] [-s seconds] image [images] [inode]
        If [inode] is not given, the root directory is used
        -a: Display "." and ".." entries
        -d: Display deleted entries only
        -D: Display only directories
        -F: Display only files
        -l: Display long version (like ls -l)
        -i imgtype: Format of image file (use '-i list' for supported types)
        -b dev_sector_size: The size (in bytes) of the device sectors
        -f fstype: File system type (use '-f list' for supported types)
        -m: Display output in mactime input format with
              dir/ as the actual mount point of the image
        -h: Include MD5 checksum hash in mactime output
        -o imgoffset: Offset into image file (in sectors)
        -p: Display full path for each file
        -r: Recurse on directory entries
        -u: Display undeleted entries only
        -v: verbose output to stderr
        -V: Print version
        -z: Time zone of original machine (i.e. EST5EDT or GMT) (only useful with -l)
        -s seconds: Time skew of original machine (in seconds) (only useful with -l & -m)

The first line after our fls invocation with no arguments is an error message, saying that we failed to include a mandatory argument, the image name. However, fls uses the opportunity to educate us on how to properly invoke it. All arguments in square brackets, i.e. '[' and ']', are optional. Anything not in square brackets is mandatory. After the invocation is a helpful note saying 'If [inode] is not given, the root directory is used'. This is how we first used fls. We supplied no inode and the root directory was printed. But now, we want to look at specific directories so we will need their inodes. Helpfully, fls actually prints those along with file and directory names. It’s the number on the line with each name, if we look back to our listing of '$ fls -o 360448 disk.flag.img' we can find the inode number for /home which is 451. Let’s add that to our fls call:

$ fls -o 360448 disk.flag.img 451
$

This actually seems to do nothing. It’s not actually doing nothing, there just are no results. /home is an empty folder in the disk image. Let’s try another directory, /root. Go back and get the inode number and plug it into fls:

$ fls -o 360448 disk.flag.img 1995
r/r 2363:       .ash_history
d/d 3981:       my_folder

This directory has a file, called .ash_history and a directory named my_folder. Let’s see what is in 'my_folder'. Use the inode number like before:

$ fls -o 360448 disk.flag.img 3981
r/r * 2082(realloc):    flag.txt
r/r 2371:       flag.uni.txt

Bingo! Now with the inode number of 'flag.uni.txt' we can print the file using icat:

$ icat -o 360448 disk.flag.img 2371
picoCTF{by73_5urf3r_adac6cb4}

Please be aware that your flag will likely have a different suffix.

Now, it’s good to go back and address what the other file in 'my_folder' was. Its name is flag.txt, why can’t we icat that file? In short, because the file has been deleted and the inode has even been reassigned to a different file. You can try using icat on the 2082 inode, but it is part of an unrelated file somewhere on the system.

If you want to continue to learn about Sleuthkit tools, try this problem:

If you want to use what you know to dive even deeper into a disk, try this problem:

If you get stuck, try reading writeups of the challenges. Just google search 'Writeup, [challenge name], picoCTF'. There’s going to be various levels of quality and depth in writeups, so don’t feel like you have to stick with the first one you look at.

Packet analysis

Another important field of forensics is packet or network analysis. This field of forensics conerns itself with understanding what has happened on a network through the examination of captured packets. This will require the use of a GUI tool called 'Wireshark', which means you cannot use the webshell to complete this problem. The webshell can be used to complete many introductory problems, but more advanced problems sometimes need a GUI tool to be solved in an efficient manner. Consider this an exercise in installing and using GUI tools. Knowing how to do this will help you greatly in the future.

Installing Wireshark

On your computer, download Wireshark from their site:

You must download the version corresponding to your operating system. It should be a straightforward process, however, if you have any issue or doubt, you can Google plenty of good documentation about Wireshark.

If you’re using a Chromebook you will need administrator privileges to enable Linux mode on the device. With Linux mode enabled, you can install Wireshark through apt-get and run it with the Linux terminal.

Packet Primer walkthrough

Consider this picoCTF challenge:

Download the packet capture and open it in Wireshark. It should look like this once you open it. Google how to open a packet capture in Wireshark if you can’t figure it out by exploring the menus of the tool.

for 1 open wireshark
Figure 1. Packet Primer opened in Wireshark

Packet analysis is all about filtering, even for this packet capture that is tiny. Most packet captures are going to have thousands if not tens of thousands of packets. This capture has only 9 because it is an introductory problem. You could manually inspect each packet and that wouldn’t be a bad strategy, but we want to approach this problem more technically, because it is just setting us up for future problems that have thousands of packets.

So, we know that the flag is unlikely to be in the ARP messages as these are just messages relating IP addresses and hardware addresses. To filter out ARP messages, add !arp to your filter in Wireshark:

Note
'ARP' stands for Address Resolution Protocol and these messages are common in every network capture as it is needed to connect a hardware address to an IP address.
for 2 arp filter
Figure 2. Packet filter to discard ARP messages

Of the remaining 5 packets, the first 3 are the TCP handshake and so they can be ignored. Of the remaining 2 packets, let’s look at the one that has the PSH flag set, which means there is data for the application in the packet:

Note
The TCP handshake, also known as the 'three-way handshake' can be identified by the flags in the packets. First 'SYN' from host A, the 'SYN, ACK' from host B, then finally, 'ACK' from host A. 'SYN' stands for synchronization, and 'ACK' stands for acknowledgement. Both parties synchronize and acknowledge.
for 3 psh packet
Figure 3. Packet with PSH flag set

When you click on packet 4, you should see the flag in the packet bytes pane, you may have to scroll down to see it all:

for 4 packet flag
Figure 4. Flag in packet bytes pane

Remember, your flag might be different than mine. It would be good to notice that there was something different about the packet with the flag from the beginning. It has a protocol of 'S101', and it’s the only one. Such glaring oddities should always be examined. Sometimes, the only clue in a packet analysis problem is a small difference between the flag packet and the rest of the thousands of packets. A good strategy is to filter as many packets as you can, then look for oddities. I should note also that there is not always a 'flag packet'. Sometimes a flag can span across multiple packets, just like packet payloads can span across multiple packets.

Note
'S101' is an uncommon protocol. The packet isn’t really speaking S101, it is just using the preferred port of the protocol, port 9000.

Leave your packet capture open if you can. We are going to use it to illustrate concepts introduced in the next section.

Network Layers

We’ll now cover some background to deepen your understanding of packets and networks. The networks we commonly use today, are broken down into different layers. This design by layers assigns responsibilities to each layer to accomplish something. It is good to have a design by layers for several reasons. For example, if network engineers want to make a change in one of the layers, the impact on the other layers is minimized. Another example, is that if you are a programmer and want to connect your application with a server, you do not necessarily need to care if the user is using wifi or ethernet cable, or how the user is connecting to the internet. Your application can simply trust other layers are going to take care of that and your application will have a successful connection. These are the layers, viewed in a top down approach.

  1. Application layer: Responsible for handling data traffic between applications. HTTP belongs to this layer; HTTP protocol is commonly used to obtain Web Pages. In the Packets Primer capture, click the fourth packet. This packet’s application layer is called 'Data' in the middle pane. Click the arrow to expand the view of the layer. There’s not much in this display because the application data is just the flag. Other layers will break down all the fields of a layer, showing the value for each one in the packet.

    for 5 app layer
    Figure 5. Application layer expanded
  2. Transport layer: Responsible for providing several connections on the same host, that means that you can have several applications on the same device and each of them can have a different connection even if it is just one device. It also defines functionalities for reliable transport. Two protocols are used on this layer. TCP (Transport Control Protocol). You use this protocol when you need to have reliable transport, this makes sure that if a piece of information was missing while being transfertransferred it is resent. HTTP from the Application layer, runs on top of TCP, because when you visit a Web Page you want to have every part of it accurately. On the other hand, when you don’t need reliable transport, but you want faster transport that does not resend parts that were missing, UDP (User Datagram Protocol) is used. An example when UDP is needed is for voice communication. When you are talking if a little part of the audio is missing, you do not want it to appear later in the communication because that would confuse the listener. The listener can still understand what you are saying if the part missing is small enough. Since UDP has no controls for transport, it is faster than TCP. This layer assigns a port to each connection, and that is how it tells the difference between connections in the same computer, because of the port.

  3. Network layer: It provides devices with an address in the network called the IP (Internet Protocol) address, and routes information through different routers. It provides mapping between all the computers connected to the internet. When you connect to a network in some specific place, an IP is assigned to your device.

  4. Data link layer: It provides communication between devices that are connected directly. Examples of protocols in the data link layer are Ethernet or WiFi. You generally use WiFi to send messages to your router directly without any other devices in between. Each device has a physical address in wifi or ethernet, known as the mac address. The mac address is used for this layer. This is not an address like the IP that can change depending on the network you are connected to. The mac address is assigned to the hardware of your network card when it is manufactured.

  5. Physical layer: This handles electrical pulses on the wire that represent bits.