Log management requirements
Before delving too deeply into the challenges around cloud-based log management, let's consider some of the challenges generally facing log management. Log management itself is an ambiguous term, so, for the purposes of this article, it’s defined as a service to collect, normalize, store and search log data. Those four functional capabilities represent a lot of complexity, and a lot of value to the organization if implemented correctly. However, if implemented insecurely, log management can introduce substantial risk and liability by centralizing vast amounts of sensitive data: trade secrets, passwords, user information, customer records and regulated personal data.
Three considerations shape the functional capabilities you need, whether in-house or in the cloud: business continuity planning, how long the data must be retained, and the business justification (i.e., the purpose) for using log management. Do you need log management to be the secure repository for all of your data, or do you want it to enable functional log search?
To collect the data, we must have a way to extract and centralize the data from disparate data sources. These sources may be in the form of syslog from a network device, text files from an application, system and security logs from a server, or even tables within a database. The log management infrastructure must be able to handle these sources, supporting a wide array of common products, as well as custom sources.
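As a small illustration of handling one such source, the sketch below splits the priority header off a syslog line. In the syslog format, the leading `<PRI>` value encodes the facility multiplied by 8 plus the severity. The function name `split_priority` and the returned field names are assumptions made for this example, and a real collector would go on to parse the timestamp, host and message body as well.

```python
import re

# Matches the leading "<PRI>" header of a syslog line and captures the rest.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def split_priority(line):
    """Split a syslog line into facility, severity and the remaining text.

    Hypothetical helper for illustration; it does not parse the timestamp,
    hostname or message body.
    """
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not start with a syslog <PRI> header")
    pri = int(match.group(1))
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(pri, 8)
    return {"facility": facility, "severity": severity, "message": match.group(2)}


parsed = split_priority("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
```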
Next, a log management system must be able to, in some fashion, normalize the data. This places the data in a standardized format and then extracts, or otherwise associates, portions of that data with consistently named fields. Examples of named fields include “source IP address” and “user.” A major advantage of this normalization is the consistent search across the data of multiple platforms. For example, firewalls from two separate vendors may label the same information differently. Normalization abstracts the information into a common taxonomy so the searches within the log management service do not have to directly compensate for these differences.
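The idea can be sketched in a few lines. The vendor names, the vendor-specific field labels and the `normalize` function below are all hypothetical, chosen only to show two firewalls labeling the same information differently and a single search working across both once the fields are mapped to common names.

```python
# Hypothetical vendor-specific field names mapped to a common taxonomy.
FIELD_MAP = {
    "vendor_a": {"src": "source_ip", "usr": "user", "act": "action"},
    "vendor_b": {
        "SourceAddress": "source_ip",
        "UserName": "user",
        "Disposition": "action",
    },
}


def normalize(vendor, record):
    """Rename a vendor's fields to the common names; keep unmapped fields as-is."""
    mapping = FIELD_MAP[vendor]
    normalized = {"vendor": vendor}
    for key, value in record.items():
        normalized[mapping.get(key, key)] = value
    return normalized


events = [
    normalize("vendor_a", {"src": "10.0.0.5", "usr": "alice", "act": "deny"}),
    normalize(
        "vendor_b",
        {"SourceAddress": "10.0.0.5", "UserName": "bob", "Disposition": "allow"},
    ),
]

# One search covers both vendors because the field name is now consistent.
matches = [event for event in events if event["source_ip"] == "10.0.0.5"]
```

Without the mapping, the same search would need one condition per vendor label, and a new condition for every product added later.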
Third, the log management system must store the data in a way that is secure, tamper resistant, and retained for an adequate amount of time. Depending on your needs, this must include a way to retrieve records in bulk; for example, to provide data responsive to litigation concerns. If your data is sensitive, encryption of that data at rest should be a consideration.
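One common way to make stored records tamper-evident is to chain keyed hashes, so that each record's tag also covers the tag before it. The sketch below is an illustration of that general technique using Python's standard `hmac` module, not a description of how any particular product works; the function names and the sample key are assumptions.

```python
import hashlib
import hmac


def chain_records(records, key):
    """Return (record, tag) pairs; each tag covers the record and the previous tag."""
    chained = []
    previous_tag = b""
    for record in records:
        message = previous_tag + record.encode("utf-8")
        tag = hmac.new(key, message, hashlib.sha256).digest()
        chained.append((record, tag))
        previous_tag = tag
    return chained


def verify_chain(chained, key):
    """Recompute every tag in order; any edited or removed record breaks the chain."""
    previous_tag = b""
    for record, tag in chained:
        message = previous_tag + record.encode("utf-8")
        expected = hmac.new(key, message, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        previous_tag = tag
    return True


key = b"example-key-held-by-the-log-owner"  # assumption: kept away from the log store
stored = chain_records(["login alice", "sudo alice", "logout alice"], key)
```

This detects edits and deletions in the middle of the chain. It does not detect records trimmed from the end, so a real design would also need to record the latest tag somewhere the storage administrators cannot change.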
Fourth, the log service must provide a useful search, reporting and extraction interface. Your organization will obtain little value from centralized data that cannot be used effectively because, for example, the search interface is too limited.
Cloud-based log management
Now that we have a simple description of log management and understand the factors that complicate it, let’s consider how these change when moving to a cloud computing provider. Our first functional capability was the collection of data from disparate log sources. That task must still happen and is essentially unchanged when moving to the cloud, except that encryption during transport becomes more important. Just as with a commercial off-the-shelf (COTS) vendor, you must still know your major log sources and verify that the vendor’s products support them.
The second factor, normalization, is also unchanged: The cloud service must properly normalize the data if you want to be able to gain the most value out of your logs. It is critical, with both COTS and cloud-based offerings, to verify that the data within the supported products is actually normalized properly.
The third functional factor, storage, is an area where the cloud solutions differ from the in-house COTS solutions. When designing the in-house solution, you must ensure you have adequate hardware in terms of processing power, storage capacity and storage speed. The compression level of the log management software is important for estimating the disk capacity you’ll need to support your data retention requirements. With cloud-based solutions, those concerns are essentially removed; your service charge is probably based on the data volume and/or the number of devices sending data, but you don’t need to plan, implement and support that hardware. Not only does the cloud-based solution remove that responsibility from you, but it also removes the up-front capital expenditures, replacing them with ongoing operational expenses that typically work better from a budgeting perspective.
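For the in-house case, the capacity estimate is simple arithmetic: daily raw volume times retention days, divided by the compression ratio. The figures below (20 GB per day, one year of retention, 10:1 compression) are made-up inputs for illustration only, and the function name is an assumption. Real planning would also account for indexes, replication and growth.

```python
def estimate_storage_gb(daily_raw_gb, retention_days, compression_ratio):
    """Estimate compressed disk capacity needed for a retention period.

    compression_ratio is raw size divided by stored size (10 means 10:1).
    Index overhead, replication and growth are deliberately left out.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_raw_gb * retention_days / compression_ratio


# Hypothetical inputs: 20 GB/day of raw logs, kept 365 days, compressed 10:1.
needed_gb = estimate_storage_gb(20, 365, 10)  # 730.0 GB
```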
The security of sensitive data at rest, even when on servers in a secure facility you operate, is always a consideration. A server can be compromised and the data extracted, a server admin (with legitimate authorized access) can copy or tamper with the data, or any number of other issues (accidental or intentional) can cause that data to be exposed. Encryption of that data at the application and disk level, preferably such that server administrators do not have the ability to decrypt it, can substantially reduce that exposure risk. When the data moves to the cloud, those controls are beyond your ability to enact and verify. You need to determine how, if at all, the cloud computing provider secures the data and prevents it from being tampered with or accidentally intermingled with that of another cloud customer. When asked about data security, vendors will often focus on encryption of the data in transit and the security of the data center(s) where the servers are housed. Neither of these controls will make it more difficult for an attacker who has compromised one or more servers to obtain your data.
The possibility of a rogue administrator taking your data is very real and should be discussed with your cloud computing provider. By rogue administrator, I mean one with authorized access to the log management system, but without a legitimate need to access your specific data. This risk is the same when the data is stored in-house. However, in that case, you have more insight into the administrators you hire (or contract) and can audit their activities. Does the cloud vendor monitor the activities of the system administrators who interact with customer-facing systems? On behalf of a client of mine, I recently asked the technical account manager of a managed security service provider (MSSP) with cloud-based log management services how they track which of their employees access my client’s data. The MSSP’s answer pertained to the security of the data centers that the MSSP’s employees use and how it prevents unauthorized persons from accessing client networks. While that is an important consideration, it neither prevents nor detects a rogue (or bored) MSSP employee sorting through my client’s very sensitive logs. If the data is encrypted at the application level, and keyed to you, then the data will not be viewable by anyone outside your company even if they gain access to the cloud vendor’s servers.
Standard data lifecycle questions apply to cloud computing providers as they maintain their systems and your data within your data retention period. How does the vendor sanitize data on decommissioned hardware? How do they secure any backup media?
Another complicating factor for cloud-based log management is the production of large amounts of historical data. If, for example, you must produce three months' worth of log data from 10 servers, will you be able to do so? How will the vendor produce and provide it to you? Imagine a scenario where your company needs to provide log data for litigation purposes: will you be able to? This is less important if the log management service is not intended to be the record of retention, as you’d be able to collect the data from the various individual sources.
Our last functional area is search: How useful and flexible is the vendor’s search functionality? Do you have the ability to search for text across all log entries? Can you search for specific things, such as every occurrence of a specified user name? Can your search results be aggregated and charted? Will the log management platform allow you to search for all of the viruses shown in your antivirus logs and sort them by the number of computers infected by each virus? Do you need flexible search or not? These functional questions should all be guided by the underlying business need driving log management.
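To make the antivirus question concrete, here is a minimal sketch of that aggregation over already-normalized entries. The sample log entries, the field names `host` and `virus`, and the function name are all hypothetical; the point is only the shape of the query a flexible search interface should be able to express.

```python
from collections import defaultdict

# Hypothetical normalized antivirus log entries.
av_logs = [
    {"host": "pc-01", "virus": "Conficker"},
    {"host": "pc-02", "virus": "Conficker"},
    {"host": "pc-01", "virus": "Conficker"},  # repeat detection on the same host
    {"host": "pc-03", "virus": "Zeus"},
]


def infected_hosts_per_virus(entries):
    """Count distinct infected hosts per virus, most widespread first."""
    hosts_by_virus = defaultdict(set)
    for entry in entries:
        hosts_by_virus[entry["virus"]].add(entry["host"])
    counts = ((virus, len(hosts)) for virus, hosts in hosts_by_virus.items())
    # Sort by descending host count, then by name for a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


report = infected_hosts_per_virus(av_logs)
```

Counting distinct hosts rather than raw detections matters here: the repeated detection on pc-01 would otherwise inflate the result for that virus.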
Log management consists of two core areas: collecting logs and making that data useful. You need to understand why you want log management in order to determine collection and usage needs. Your solution requirements should be based on your business needs, and those will be identical for in-house and cloud-based solutions.
About the author:
Tom Chmielarski is a Senior Consultant at GlassHouse Technologies.
This was first published in January 2011