My Octopress Blog


Setting Up a Malware Lab.

Setting up your malware test environment correctly is very important. This is my 2 cents on the matter.

There are 2 main options in my opinion:
1) A basic, portable lab.
2) A fully professional lab.

The essential components:
1) Easily restorable. Definitely.
2) Correct tools.
3) Upgradable/Manageable.
4) Isolated!

My setup (which I’m going to use here) is what I would consider basic, say in comparison to FireEye’s lab or perhaps the People’s Liberation Army lab.

Install a Windows 7 VM on a host. Personally I use VMware. I find it solid. (Keep the patch level of the machine low, so that samples targeting older vulnerabilities can still run.)

Ensure the VM is isolated BUT that you can connect to the Web when needed. This covers point 3 - easily manageable!


Sysinternals Suite



Static Analysis
PE Browse
PE Studio

Change Analysis


Reverse Engineering Tools
•Skill and patience! Comes with time.

Take a snapshot when finished and you’re golden. Next we’ll discuss Static vs Dynamic Analysis. All good analysts should at a minimum be able to perform Static Analysis! Full stop.

Introduction to Analysing PDFs

PDF files - one of the most popular file formats!

PDF stands for Portable Document Format. PDF files contain all the graphics, fonts and text for a document, as well as the logic information to display them. The most common PDF reader is Adobe Acrobat Reader.

An interesting fact for your next pub outing - PDF readers are the second most popular software in the world. Can you guess what number one is? Attackers know that PDF reader software is installed on a potential victim’s machine. But they wouldn’t try to exploit this, would they... Attackers sending malicious PDF documents is common, very very common in fact. So being in InfoSec you should at least know the basics, and not rely solely on your sandbox(es) x 10.

Exploits for PDFs are very popular - check out Crimepack.

A deeper look at the format can be found at Didier Stevens’ blog.

For now let’s get to work. We’ll be using two tools, PDF-ID and PDF-Parser, both written by Didier Stevens.

PDF-ID is not a PDF parser. It scans through a PDF looking for PDF keywords and shows how many times each appears in the file, which helps to initially triage PDF documents.
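To make the idea concrete, here is a minimal sketch of that kind of keyword counting. It is not PDF-ID’s actual code: the keyword list, the count_keywords name and the sample bytes are my own assumptions for illustration, and the real tool also copes with tricks such as hex-escaped names, which this sketch ignores.

```python
import re

# A small, illustrative subset of keywords worth counting during triage.
KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream",
            b"/Page", b"/JS", b"/JavaScript", b"/OpenAction", b"/AA",
            b"/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count whole-token occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Bare words need a lookbehind so "obj" is not counted inside "endobj".
        # Names start with "/" and may directly follow another name, so they
        # get no lookbehind.
        prefix = rb"" if kw.startswith(b"/") else rb"(?<![A-Za-z])"
        # The lookahead stops "/Page" from matching inside "/Pages".
        pattern = prefix + re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


# Hypothetical pure-ASCII fragment, made up for this example.
sample = (b"%PDF-1.1\n"
          b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
          b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n")

is_pdf = sample.startswith(b"%PDF-")  # the header check mentioned in the text
counts = count_keywords(sample)
for name, n in counts.items():
    print(f"{name:15} {n}")
```

A raw byte search like this only sees keywords that sit outside compressed streams, which is one reason to treat it as a first triage step rather than a verdict.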

The PDF-Parser tool allows you to parse the physical and logical structure of a PDF file.

EX1) Analysis of PDF2.pdf - a PDF with Hello World.

Results of running PDF-ID on PDF2.pdf.

PDF-ID identified it as a PDF from the header. The file contains 6 objects. Most PDF files will contain some binary data but this one has been designed to be pure ASCII. As you can see the code is easy to read and it describes a series of objects.

Going further with PDF-Parser. Results of running PDF-parser on PDF2.pdf.

To parse out individual objects in PDF files we run the following command:

-o 5 specifies object 5.

-c forces the contents to be shown.

Now it returns only the stream I want.

But we get back binary data, rather than the simple ASCII we would expect, right? This is because of the FlateDecode filter: the stream is compressed with zlib (deflate), which is very common in normal PDF files. A smaller PDF is faster to download.

Luckily PDF-Parser has a flag to deal with this: -f passes the stream through its filters, so we get the decompressed text back.
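A quick way to convince yourself of what FlateDecode does is to round-trip some text through Python’s zlib module. This is only a sketch: the stream content and the flate_decode name are made up for the example, and a real stream may carry extra parameters (predictors, chained filters) that this ignores.

```python
import zlib


def flate_decode(stream_bytes: bytes) -> bytes:
    """Undo FlateDecode: the stream body is plain zlib-compressed data."""
    return zlib.decompress(stream_bytes)


# Hypothetical page content, similar in spirit to a "Hello World" PDF.
text = b"BT /F1 24 Tf 100 700 Td (Hello World) Tj ET"

compressed = zlib.compress(text)     # what ends up stored between stream/endstream
restored = flate_decode(compressed)  # what a viewer sees after applying the filter

print(compressed)  # binary data, not readable ASCII
print(restored)    # the original text again
```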

EX2) Analysis of java.pdf. PDFID output.

Same as the previous example: still one page, etc. However, there is JavaScript present. A user opening this PDF normally will not notice it, as you will see.

We use the -s flag of PDF-Parser to figure out which object it is in.

JavaScript appears in Object 7. The search feature in PDF-Parser is NOT case sensitive. /JS holds the actual JavaScript to run. The JS uses the app object, which represents the PDF viewer application. It calls the alert method with some parameters. The title and the message parameters are set, as well as the objects to use.

As mentioned - if you open this PDF the JavaScript does not run. JavaScript in a PDF is event based: something needs to trigger the JS to run. There is nothing in this PDF to ‘tell’ it to run, such as an /OpenAction entry.
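The distinction between "JavaScript is present" and "JavaScript will actually run" can be sketched as a small check. This is a rough illustration under my own assumptions: the javascript_status name, the returned strings and the sample objects are invented here, and I only look for /OpenAction and /AA (additional actions) as triggers in the raw bytes.

```python
import re

SCRIPT_KEYS = (b"/JS", b"/JavaScript")
TRIGGER_KEYS = (b"/OpenAction", b"/AA")


def _has(data: bytes, key: bytes) -> bool:
    # The lookahead avoids matching a longer name that merely starts with key.
    return re.search(re.escape(key) + rb"(?![A-Za-z0-9])", data) is not None


def javascript_status(data: bytes) -> str:
    """Say whether JavaScript exists and whether something could fire it."""
    has_js = any(_has(data, k) for k in SCRIPT_KEYS)
    has_trigger = any(_has(data, k) for k in TRIGGER_KEYS)
    if not has_js:
        return "no JavaScript found"
    if has_trigger:
        return "JavaScript present, automatic trigger found"
    return "JavaScript present, no automatic trigger found"


# Hypothetical objects, made up for this example.
js_only = (b"7 0 obj\n<< /S /JavaScript "
           b"/JS (app.alert({cMsg: 'Hello', cTitle: 'Test'});) >>\nendobj\n")
with_trigger = (b"1 0 obj\n<< /Type /Catalog /OpenAction 7 0 R >>\nendobj\n"
                + js_only)

print(javascript_status(js_only))
print(javascript_status(with_trigger))
```

The first sample matches the situation described above: the script is there, but nothing in the document asks the reader to execute it.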

EX3) Another example involving a PE file. PDF-ID output.

PDF-ID reports one embedded file, i.e. a file embedded in the PDF. That embedded file is essentially added as an attachment. Searching for it reveals it is in Object 8.

Using -d to dump the file.

At this stage we can straight up run or sandbox it, or reverse it if you know how. We’ll be covering this later.
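Before running anything, it is worth confirming that the dumped file really is a PE. The sketch below checks the MZ magic and then follows the offset stored at 0x3C to the PE signature. The looks_like_pe name and the hand-built test bytes are assumptions for illustration; on a real dump you would read the bytes from whatever filename you gave to -d.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check for the DOS 'MZ' magic and the 'PE\\0\\0' signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Hand-built minimal header: MZ, padding up to 0x3C, offset 0x40, PE signature.
fake_pe = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
not_pe = b"%PDF-1.1\n" + b"\x00" * 100

print(looks_like_pe(fake_pe))
print(looks_like_pe(not_pe))
```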

Hope this was informative :).

When time allows I’ll post another analysis focusing on things like heap spray attacks and obfuscation.

Check out Joe Sandbox and peepdf

Edit: Joe’s Sandbox used to be free!