Security

Keys to Developing a Sound Data Loss Prevention Strategy

Agencies should select appropriate data identification strategies and build an effective monitoring environment.

Mike Chapple is associate teaching professor of IT, analytics and operations at the University of Notre Dame.

No matter what your agency’s mission, you no doubt have some critical data in your possession. Inadvertently releasing that information could lead to serious problems. How, then, are you ensuring that you maintain control over your agency’s sensitive data?

Data loss prevention (DLP) systems help keep tabs on an organization’s sensitive data by building an inventory and then maintaining control over the flow of information both inside and outside of the network. There has been a rapid adoption of DLP over the past few years, and now most organizations either already have or are considering deployment of a DLP system.

What Does DLP Technology Do?

There are three critical roles that DLP products play in the enterprise. First, they help build an inventory of the sensitive information in an environment. It’s common for organizations to have only vague ideas about where this data resides, if they even have a clear definition of what constitutes sensitive data.

Many organizations go to great lengths to protect their centralized stores of sensitive information, such as that stored in an ERP system or on enterprise databases, only to be shocked when they discover that users are copying that data into shadow systems stored on notebook computers. DLP can help identify those unsanctioned sensitive data repositories and either eradicate them or implement effective controls around them.

The second role that DLP plays is to monitor the flow of sensitive information throughout an organization. DLP products can tag sensitive information and then document how that data is transferred across networks and between systems. This can help identify business processes that work with sensitive information and implement appropriate security controls.

Finally, DLP allows organizations to proactively block the use of sensitive information that does not meet its security policies. When the DLP system identifies a flow of sensitive data, it can optionally terminate the network connection, redact the sensitive data or apply additional security controls, such as encryption. These actions take place in real time, preventing an unintended leak of sensitive data.

Identifying Sensitive Data

There are two major techniques that DLP systems use to identify sensitive information: pattern matching and document tagging. These are commonly used in parallel to create the most effective approaches to DLP.

With pattern matching approaches, the DLP system uses regular expressions to identify information that might be sensitive. This technique is typically used to identify sensitive numbers that follow a regular pattern, such as Social Security numbers (using the format xxx-xx-xxxx) and credit card numbers (using the format xxxx-xxxx-xxxx-xxxx).

Pattern matching is a good way to identify this type of information, but it is highly prone to false positives. For example, if an unformatted nine-digit number is on a system, a pure pattern-matching system has no way of telling whether it is a Social Security number, a dollar amount or a CUSIP number used to identify financial securities.

For this reason, many DLP products add contextual information to their pattern matching algorithms. In fact, this is one of the major ways that DLP manufacturers differentiate their products. Examples of ways that DLP can use context to improve the accuracy of pattern matching include:

Prioritizing formatted data over unformatted data (for example, the hyphens in 123-45-6789 make it much more likely to be a Social Security number than 123456789);
Looking at the header row in spreadsheets to determine whether it contains clues to the field’s nature;
Using field-specific knowledge to eliminate false positives. (For example, credit card numbers contain a checksum digit calculated using the openly available Luhn algorithm; sixteen-digit numbers that fail the Luhn check are not valid credit card numbers and may be ignored. Similarly, there are no valid Social Security numbers with “00” in the middle position.)

Pattern matching might also be used to search for specific keywords in documents. For example, an agency with a rigorous classification policy might configure its DLP to monitor for documents leaving the organization with the word “Confidential” in the header.

With document tagging, security administrators build an inventory of specific documents that contain sensitive information. The DLP system then has two possible approaches for detecting the attempted sharing of those documents outside of the organization. In the first approach — document fingerprinting — the system computes a cryptographic hash of the file and then compares that digital fingerprint with the fingerprints of all documents leaving the organization. In the second approach, the system stores the entire content of the sensitive files and then watches for data leaving the organization matching those patterns.

The Endpoint and the Network

There are two major environments monitored by DLP products: endpoints and networks. Many organizations start with one approach or the other and then eventually expand their DLP implementation to include both environments.

Host-based DLP solutions target the weakest link in the security chain: the notebooks, desktops and servers that serve as endpoints. These products can help identify those unknown data stores that contain sensitive information through the use of an agent-based approach. In these cases, a software agent residing on an organization’s endpoints takes an inventory of sensitive information and monitors for policy violations. Most host-based DLP products also provide users with the ability to digitally “shred” unwanted data and securely encrypt sensitive data that they do not wish to delete.

Network-based DLP products sit at the perimeter of an organization’s network and scan all outbound network traffic for potential policy violations. These systems don’t have the ability to build an inventory of sensitive information, but they do provide a last line of defense capable of stopping the flow of sensitive information before it leaves a network.

Adopting a data loss prevention strategy for an organization requires selecting appropriate data identification strategies and then building a monitoring environment that can effectively watch for identified data that is being used in violation of an organization’s security policies.