

How to test data analysis in virtual environments

Security teams need to practice and test data analysis, but it can be challenging to do in small environments. Expert Frank Siemons explains some ways to make it work anyway.

The world of IT security revolves around data. Generating, collecting, storing and analyzing data is at the heart of security logs. But these massive data sets place a lot of stress on storage and processing resources.

In a professional production environment, the data collection and analysis infrastructure should be well organized and there should be enough resources to allow for a reliable and well-performing setup.

However, a security professional often needs to practice and test data analysis and manipulation in a much smaller environment, whether because of budget constraints or the need for greater flexibility. Fortunately, this is still possible with a low budget and a limited amount of time, so a test data analysis lab is worth setting up even if the business is small. Let's look at some ways to build and run one efficiently.


To keep expenses low and configuration flexibility high, use as much virtualized hardware as possible. VMware ESXi is a good candidate for a data analysis test lab, but other products such as Parallels, VirtualBox or Hyper-V would also work. For this example, let's use VMware. After registering for a free license, the hypervisor can be installed on relatively inexpensive hardware. Check the list of supported CPUs before buying anything, though, or it could get expensive. On top of the hypervisor, many different virtual hosts can be installed and linked together via virtual networks. These can include, for instance, Linux- and Unix-based proxy servers, IDS and firewall appliances, and Splunk and syslog servers.

Creating a VM for testing in VMware

Proxy servers

A proxy server provides great visibility into network traffic and handles that traffic all the way up to the application layer of the OSI model. It shows HTTP- and FTP-specific communication, for instance. There are many free and open source options available, such as ClearOS, CacheGuard and pfSense. pfSense, which is based on FreeBSD, is one of the easiest to manage, and it offers many advanced features, such as Squid Proxy, SSL inspection, antivirus and VPN. Remote Syslog is a selectable option for the firewall logs, but to export the proxy logs as well, the following line needs to be added to the Custom Options field within the "proxy server" settings: access_log syslog:LOG_LOCAL4. This should be followed by a restart. For SSL Inspection, which can provide HTTPS proxy logging, the Squid 3 Proxy package needs to be installed via the Web UI.
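Once proxy logs are flowing, it helps to know what a single entry contains before it reaches an index. The following is a minimal Python sketch, assuming the proxy writes Squid's default native access log format (timestamp, elapsed time, client, result code/status, size, method, URL, user, hierarchy/peer, content type) and that any syslog header has already been stripped. The SquidEntry class and parse_squid_line function are illustrative names, not part of pfSense or Squid.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SquidEntry:
    timestamp: datetime   # request completion time (UTC)
    elapsed_ms: int       # time the proxy spent on the request
    client: str           # client IP address
    result_code: str      # Squid result code, e.g. TCP_MISS
    http_status: int      # HTTP status returned to the client
    size_bytes: int       # bytes delivered to the client
    method: str           # HTTP method
    url: str              # requested URL


def parse_squid_line(line: str) -> SquidEntry:
    """Parse one line in Squid's native access log format (assumed layout)."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")
    # The fourth field combines the Squid result code and the HTTP status.
    result_code, status = fields[3].split("/", 1)
    return SquidEntry(
        timestamp=datetime.fromtimestamp(float(fields[0]), tz=timezone.utc),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(status),
        size_bytes=int(fields[4]),
        method=fields[5],
        url=fields[6],
    )


# Hypothetical sample line using documentation-only addresses.
SAMPLE = (
    "1457000000.123     45 192.0.2.10 TCP_MISS/200 5120 GET "
    "http://example.com/index.html - HIER_DIRECT/198.51.100.7 text/html"
)
entry = parse_squid_line(SAMPLE)
```

A parser like this is mainly useful for sanity-checking field positions before writing field extractions in the log platform itself.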

Remote logging options.


Splunk Light is a great starting point for centralized logging and for running security data analysis tests. It is free after registration and can be easily installed on an Ubuntu Server, for instance. Because the host itself is virtualized, the entire server comes at no cost. Another option is Splunk Enterprise Free. This starts off as a 60-day trial that allows up to 500 MB of data to be indexed per day, and it provides the option to change to a perpetual Splunk Enterprise Free license after those 60 days.
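Before pointing a busy proxy at the indexer, a quick back-of-the-envelope check shows whether the lab is likely to stay under the 500 MB daily allowance. This is a rough Python sketch; the event rates and average event size are hypothetical inputs, and it assumes 1 MB means 1,024 x 1,024 bytes, which may not match how the license is actually metered.

```python
SECONDS_PER_DAY = 86_400
# Assumption: the 500 MB allowance is counted in binary megabytes.
DAILY_LIMIT_BYTES = 500 * 1024 * 1024


def daily_volume_bytes(events_per_second: float, avg_event_bytes: int) -> float:
    """Estimate how many bytes of log data are produced in one day."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY


def fits_daily_limit(events_per_second: float, avg_event_bytes: int) -> bool:
    """Return True if the estimated daily volume stays within the allowance."""
    return daily_volume_bytes(events_per_second, avg_event_bytes) <= DAILY_LIMIT_BYTES


# Hypothetical lab: 20 events per second at roughly 250 bytes each.
estimate = daily_volume_bytes(20, 250)
```

With these example numbers the estimate is 432,000,000 bytes per day, which fits; at 30 events per second of the same size it would not.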

Companion article

See Infosec Institute's accompanying article on Configuring a Test Lab for Data Analysis

A Splunk universal forwarder is needed to collect the logs (in this case, the pfSense Remote Syslog feed) and forward them into a Splunk index. This forwarder can be installed on the same host as the Splunk Light Search Head (the Web UI) or on a separate host. Configuration and customization of the data feeds and the listening ports of the Splunk universal forwarders can all be done via the Splunk Web Interface.
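After a listening port has been configured, one simple way to confirm that events actually arrive is to send a test message and search for it. The sketch below uses Python's standard logging.handlers.SysLogHandler to send a single UDP syslog message on the local4 facility, matching the facility used for the proxy logs. It assumes the listener accepts UDP syslog; the send_test_event name, host and port are placeholders to adapt to the lab.

```python
import logging
import logging.handlers


def send_test_event(host: str, port: int, message: str) -> None:
    """Send one syslog message (facility local4, severity info) over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL4,
    )
    logger = logging.getLogger("lab.syslog.test")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test message out of other handlers
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Detach and close so repeated calls do not stack up handlers.
        logger.removeHandler(handler)
        handler.close()


# Example call (hypothetical address and port of the lab's syslog listener):
# send_test_event("192.0.2.20", 514, "lab connectivity test")
```

Because UDP is connectionless, the call succeeds even if nothing is listening, so the real check is whether the message shows up in a search on the receiving side.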

Splunk Web interface


Data mining and data science are hot topics at the moment. For anyone who wants to delve deeper into this subject, there is the option to run a Hadoop test server. Normally Hadoop runs in a cluster configuration, but this can be reduced to a single-node pseudo cluster as well. This single server will then contain both the datanode and the namenode. To exchange data between the Splunk and Hadoop servers, Hadoop Connect can be downloaded and installed. This requires Splunk Enterprise Free and will not work on Splunk Light because of the app support requirement within Splunk. Installation manuals and videos are available on the Splunk site.

Using Hadoop Connect


In the end, security is all about data. Once this test lab has been set up, usable data needs to be generated. This requires some activity within the virtual network. That activity could be a matter of running the home LAN through the proxy server, setting up a honeypot on the perimeter or running simulated attacks within the network. Because it is built from virtual components, the test lab will be flexible enough to adapt to the required scenarios, and it will be able to generate many types of data ready for analysis.
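Before real traffic is available, synthetic events can be enough to test searches and dashboards. The following Python sketch generates repeatable, made-up proxy log lines in a layout resembling Squid's native access log format. Every value in it (client addresses, URLs, outcome mix) is invented for illustration, and the synthetic_squid_lines name is hypothetical.

```python
import random
from typing import Iterator

# Invented values using documentation-only address ranges and domains.
CLIENTS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
URLS = ["http://example.com/", "http://example.org/login", "http://example.net/file.zip"]
# (result code, HTTP status, hierarchy field) for allowed and blocked requests.
OUTCOMES = [
    ("TCP_MISS", 200, "HIER_DIRECT/198.51.100.7"),
    ("TCP_DENIED", 403, "HIER_NONE/-"),
]


def synthetic_squid_lines(count: int, start_epoch: int, seed: int = 0) -> Iterator[str]:
    """Yield `count` fake proxy log lines, roughly one per second."""
    rng = random.Random(seed)  # seeded so the same data can be regenerated
    for i in range(count):
        code, status, hierarchy = rng.choice(OUTCOMES)
        timestamp = start_epoch + i + rng.random()
        elapsed_ms = rng.randint(1, 500)
        size = rng.randint(200, 50_000)
        yield (
            f"{timestamp:.3f} {elapsed_ms:6d} {rng.choice(CLIENTS)} "
            f"{code}/{status} {size} GET {rng.choice(URLS)} - {hierarchy} text/html"
        )


lines = list(synthetic_squid_lines(5, 1457000000, seed=1))
```

Synthetic data only exercises the pipeline; it says nothing about how real attacks look, so it complements rather than replaces the traffic sources described above.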

