Key Ways to Improve DLP Coverage and Accuracy

Published 03/18/2022

Written by Amit Kandpal, Director - Customer Success at Netskope

In this blog series, we’ve been examining key questions for cloud DLP transformation. Make sure to also check out Part 1, Part 2, and Part 3. In this final part, let’s look at the available detection options, ordered from the broadest coverage to the highest precision (that is, the fewest false positives).

File Types: This is one of the broadest options available, with limited false positives. However, these controls can usually be bypassed easily through simple manipulation of the target files, such as changing a file's extension.
Keyword/Phrase Dictionaries: Most solutions provide templates that can be used to configure policies for common use cases like GDPR, PII, PCI, PHI, source code, etc. The challenge is always tuning these policies to a manageable level of severity and false positives, since the dictionaries are very broad by definition (e.g., names and medical conditions).
Regular Expressions and Patterns: Regular expressions allow more precise detection of sensitive content and can be a powerful tool for tuning policies based on an analysis of false positives.
Classification Labels: Most DLP tools can read existing classification labels and apply new ones based on metadata and content scans. Classification labels are a great way to increase accuracy, but initiatives to implement data classification across an organization can be a lengthy and frustrating process.
Data Identifiers: These can be a powerful tool for detecting data based on specific content with high accuracy (e.g., NIE, TIE, DNI, credit card numbers, and many more), and they are available in the most advanced DLP solutions.
Contextual Policies: As discussed in earlier posts in this series, contextual policies are a great way to reduce the risk surface. Device Classification, for example, allows you to define rules that function like posture checks and then evaluate devices against these rules. The rules vary based on the OS platform they apply to. Once evaluated, devices are classified as “Managed” or “Unmanaged,” which allows granular control of activities.
Machine Learning-based Classification: AI/ML classification can be a very effective option for use cases that do not lend themselves well to traditional methods. Examples include sensitive M&A documents, tax forms, source code, desktop screenshots, passports, and IDs. Some advanced tools also let you train your own classifiers.
Fingerprinting/Exact Data Match/OCR: DLP fingerprinting enables you to protect unstructured confidential information by generating a unique DNA, or fingerprint, for each sensitive file. Exact Data Match can be used to protect structured data. You can feed the system a document or database, and it will be able to detect the original, fragments of it, and even variations of it. OCR works by examining images, either on their own (PNG, JPEG, GIF, BMP) or embedded inside documents (PDF, Office files, archives), and extracting their text so it can be inspected.
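To make the File Types bypass concrete, here is a minimal Python sketch, not taken from any DLP product. It compares an extension-based check with a "true file type" check that reads the file's leading magic bytes. The `BLOCKED_TYPES` set and all function names are hypothetical, and only a handful of well-known signatures are included.

```python
from pathlib import Path

# Hypothetical policy: file types the organization wants to control.
BLOCKED_TYPES = {"pdf", "zip"}

# A few well-known "magic number" signatures (the leading bytes of a file).
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # also the container format of .docx/.xlsx/.pptx
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def type_by_extension(filename: str) -> str:
    """Naive check: trust whatever extension the file name carries."""
    return Path(filename).suffix.lower().lstrip(".")


def type_by_content(data: bytes) -> str:
    """True-file-type check: inspect the leading bytes and ignore the name."""
    for signature, file_type in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return file_type
    return "unknown"


def is_blocked(filename: str, data: bytes) -> tuple[bool, bool]:
    """Return (blocked_by_extension, blocked_by_content)."""
    return (
        type_by_extension(filename) in BLOCKED_TYPES,
        type_by_content(data) in BLOCKED_TYPES,
    )


if __name__ == "__main__":
    pdf_bytes = b"%PDF-1.7\n..."
    # Renaming report.pdf to report.txt defeats only the extension check.
    print(is_blocked("report.txt", pdf_bytes))  # (False, True)
```

Encrypting the file would defeat the content check as well. That is one reason to layer file-type controls with the content-based options further down the list.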
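For Keyword/Phrase Dictionaries, much of the tuning comes down to thresholds. The sketch below is a hypothetical Python illustration, not any vendor's template. `MEDICAL_TERMS` stands in for a PHI dictionary (single-word terms only, to keep it short), and the `min_unique`/`min_total` thresholds are assumed tuning knobs.

```python
import re
from collections import Counter

# Hypothetical dictionary standing in for a vendor's PHI template;
# single-word terms only, to keep the sketch short.
MEDICAL_TERMS = {"diagnosis", "diabetes", "oncology", "prescription", "mri"}


def dictionary_hits(text: str, terms: set[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of dictionary terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(word for word in words if word in terms)


def matches_policy(
    text: str, terms: set[str], min_unique: int = 2, min_total: int = 3
) -> bool:
    """Alert only when several distinct terms and several total hits appear.

    min_unique and min_total are the tuning knobs: raising them trades
    coverage for fewer false positives on text that mentions one term
    in passing.
    """
    hits = dictionary_hits(text, terms)
    return len(hits) >= min_unique and sum(hits.values()) >= min_total


if __name__ == "__main__":
    print(matches_policy("Cafeteria adds a diabetes-friendly menu", MEDICAL_TERMS))  # False
    print(matches_policy("Diagnosis: diabetes. Prescription renewed after MRI.", MEDICAL_TERMS))  # True
```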
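As an example of tuning a Regular Expressions policy after reviewing false positives, the sketch below tightens a naive US Social Security number pattern. The SSN use case and both patterns are illustrative assumptions rather than anything from a specific product. The exclusions reflect number ranges that are never issued as SSNs.

```python
import re

# Naive pattern: any ddd-dd-dddd run, even inside longer strings.
NAIVE_SSN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Tuned pattern: word boundaries stop matches inside longer tokens, and the
# negative lookaheads skip ranges never issued as SSNs (area 000, 666 or
# 900-999; group 00; serial 0000).
TUNED_SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

if __name__ == "__main__":
    sample = "SSN 123-45-6789, test value 000-12-3456, order ORD-123-45-67890"
    print(NAIVE_SSN.findall(sample))
    # ['123-45-6789', '000-12-3456', '123-45-6789']
    print(TUNED_SSN.findall(sample))
    # ['123-45-6789']
```

Each exclusion is the kind of rule a false-positive review tends to produce: the test value and the order number both fire the naive pattern but not the tuned one.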
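For Classification Labels, a policy can treat the label as one more input alongside the content scan. The label names, `SANCTIONED_APPS`, and the `decide` function below are hypothetical. This is a minimal Python sketch of one reasonable design, not a description of how any particular DLP tool evaluates labels.

```python
from enum import IntEnum
from typing import Optional


class Label(IntEnum):
    """Hypothetical label taxonomy, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical list of sanctioned destinations.
SANCTIONED_APPS = {"corporate-onedrive", "corporate-box"}


def decide(label: Optional[Label], destination: str, content_flagged: bool) -> str:
    """Block sensitive data leaving for unsanctioned destinations.

    A label raises the effective sensitivity but never lowers it, so an
    unlabeled file (common while a classification rollout is underway) or a
    mislabeled one still falls back on the content-scan verdict.
    """
    effective = label if label is not None else Label.PUBLIC
    if content_flagged:
        effective = max(effective, Label.CONFIDENTIAL)
    if effective >= Label.CONFIDENTIAL and destination not in SANCTIONED_APPS:
        return "block"
    return "allow"


if __name__ == "__main__":
    print(decide(Label.CONFIDENTIAL, "personal-dropbox", content_flagged=False))  # block
    print(decide(None, "personal-dropbox", content_flagged=True))  # block
```

The label catches confidential files that the content scan alone would miss. Letting the label only raise sensitivity keeps an incomplete labeling effort from weakening existing controls.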
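Data Identifiers typically gain their accuracy by pairing a pattern with a validation step. The Python sketch below assumes that approach for two of the identifiers mentioned: credit card numbers, validated with the Luhn checksum, and Spanish DNI numbers, validated with their mod-23 check letter. The regular expressions and function names are illustrative, not a vendor's identifier definitions.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

# Spanish DNI: eight digits followed by a check letter.
DNI_PATTERN = re.compile(r"\b(\d{8})([A-Z])\b")
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def luhn_valid(number: str) -> bool:
    """Luhn mod-10 checksum used by payment card numbers."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def dni_valid(digits: str, letter: str) -> bool:
    """The DNI check letter is the number modulo 23, looked up in DNI_LETTERS."""
    return DNI_LETTERS[int(digits) % 23] == letter


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Report only pattern matches that also pass their validation step."""
    cards = [
        m.group()
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(re.sub(r"[ -]", "", m.group()))
    ]
    dnis = [m.group() for m in DNI_PATTERN.finditer(text) if dni_valid(*m.groups())]
    return {"credit_card": cards, "dni": dnis}


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, ref 4111111111111112, DNI 12345678Z, DNI 12345678A"
    print(find_identifiers(sample))
    # {'credit_card': ['4111 1111 1111 1111'], 'dni': ['12345678Z']}
```

The invalid reference number and DNI match the patterns but fail validation, which is exactly the class of false positive that a pattern alone would report.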
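The Device Classification flow described under Contextual Policies can be sketched as follows: per-OS posture rules, a Managed/Unmanaged verdict, then activity control. The rule names, the `ALLOWED_ACTIVITIES` mapping, and the choice to treat unknown platforms as unmanaged are all assumptions for illustration. Real products expose their own posture checks.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    os: str
    # Results of posture checks reported for the device (hypothetical names).
    checks: dict[str, bool] = field(default_factory=dict)


# Hypothetical posture rules; each OS platform gets its own rule set.
CLASSIFICATION_RULES = {
    "windows": ["domain_joined", "disk_encrypted", "edr_running"],
    "macos": ["mdm_enrolled", "filevault_enabled"],
    "ios": ["mdm_enrolled"],
}

# Hypothetical activity policy keyed on the classification result.
ALLOWED_ACTIVITIES = {
    "Managed": {"view", "upload", "download"},
    "Unmanaged": {"view"},
}


def classify(device: Device) -> str:
    """Managed only if every rule for the device's OS passes."""
    rules = CLASSIFICATION_RULES.get(device.os)
    if rules is None:
        return "Unmanaged"  # no rules for this platform: fail closed
    passed = all(device.checks.get(rule, False) for rule in rules)
    return "Managed" if passed else "Unmanaged"


def is_allowed(device: Device, activity: str) -> bool:
    """Look up the activity in the policy for the device's classification."""
    return activity in ALLOWED_ACTIVITIES[classify(device)]


if __name__ == "__main__":
    home_mac = Device("macos", {"mdm_enrolled": False, "filevault_enabled": True})
    print(classify(home_mac), is_allowed(home_mac, "download"))  # Unmanaged False
```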
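Here is a small illustration of what training your own classifier involves: a multinomial naive Bayes text classifier in plain Python. This is a deliberately simple stand-in. The training data is a hypothetical toy set, the model only handles text (not screenshots, passports, or other images), and production ML classifiers are far more sophisticated.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-cased words plus a few punctuation marks typical of code."""
    return re.findall(r"[a-z_]+|[{}();=]", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.token_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        for text, label in examples:
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.token_counts.items():
            # log P(label) + sum of log P(token | label) over known tokens
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, tiny training set; a real classifier needs far more data.
TRAINING_DATA = [
    ("def main(): return parse(args)", "source_code"),
    ("for (int i = 0; i < n; i++) { total += x; }", "source_code"),
    ("import os; print(os.getcwd())", "source_code"),
    ("Please find attached the agenda for Monday's team meeting", "other"),
    ("Lunch is at noon, let me know if you can join", "other"),
    ("The quarterly newsletter covers office events and holidays", "other"),
]

if __name__ == "__main__":
    model = NaiveBayesClassifier()
    model.train(TRAINING_DATA)
    print(model.predict("def helper(x): return x"))  # source_code
    print(model.predict("Agenda for the team lunch meeting"))  # other
```

The point is the workflow (label examples, train, predict) rather than the model: the same loop applies when a DLP tool lets you supply your own sample documents.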
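The ideas behind Fingerprinting and Exact Data Match can be sketched with hashing. This Python example assumes word-shingle fingerprints (hashes of every run of k consecutive words) as the fingerprinting technique and hashed, normalized cell values for EDM. It illustrates the general approach, not how any particular DLP engine implements either feature. All names are hypothetical, and the EDM part only matches single-token values such as email addresses or account IDs.

```python
import hashlib
import re


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text: str, k: int = 5) -> set[str]:
    """Fingerprint a protected document as hashes of every k-word run."""
    words = _words(text)
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(len(words) - k + 1)
    }


def containment(candidate: str, protected: set[str], k: int = 5) -> float:
    """Fraction of the candidate's k-word runs that also occur in the protected file."""
    shingles = fingerprint(candidate, k)
    if not shingles:
        return 0.0
    return len(shingles & protected) / len(shingles)


def _normalize(value: str) -> str:
    return value.strip().strip(".").lower()


def build_edm_index(rows: list[dict[str, str]], columns: list[str]) -> set[str]:
    """Exact Data Match index: hashes of normalized cell values, not cleartext."""
    return {
        hashlib.sha256(_normalize(row[col]).encode()).hexdigest()
        for row in rows
        for col in columns
    }


def edm_matches(text: str, index: set[str]) -> list[str]:
    """Tokens in the text whose normalized hash appears in the EDM index."""
    hits = []
    for token in re.findall(r"[\w.@-]+", text):
        value = _normalize(token)
        if hashlib.sha256(value.encode()).hexdigest() in index:
            hits.append(value)
    return hits


if __name__ == "__main__":
    memo = ("Project Falcon will acquire the target in the third quarter "
            "pending board approval and regulatory review")
    protected = fingerprint(memo)
    print(containment("acquire the target in the third quarter pending board approval", protected))  # 1.0
    print(containment("Project Falcon will ACQUIRE the target in Q3", protected))  # 0.75
    index = build_edm_index([{"email": "jane.doe@example.com", "account": "ACC-00417"}], ["email", "account"])
    print(edm_matches("Send it to Jane.Doe@example.com today.", index))  # ['jane.doe@example.com']
```

The fragment's shingles all come from the memo, so it scores 1.0. The reworded variation still shares three of its four shingles. Hashing the EDM values means the index itself does not hold the cleartext records.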