Automated Data Classification and Sensitivity Labeling Across Your Entire File Estate

The majority of enterprise data exposure incidents are not caused by sophisticated attacks. They are caused by employees who do not know which files are sensitive, and by systems that do not tell them. Automated classification solves this by removing human judgment from the initial categorization decision and applying consistent, policy-defined labels to every file across every storage system.

The Scale Problem Manual Classification Cannot Solve

Large organizations routinely accumulate tens of millions of files across SharePoint libraries, on-premises file servers, NAS systems, departmental shared drives, and archival storage. Asking users to manually classify these files produces three predictable outcomes: inconsistent labeling, chronic under-labeling, and user frustration that leads to classification being ignored entirely.

The retrospective challenge is even more acute. When a new compliance requirement takes effect (a data localization law, an updated industry standard, or an internal governance policy), the organization needs to classify its entire existing file estate against the new framework. Manual approaches are simply not viable at this scale.

FileOrbis addresses this with an automated classification engine that operates continuously and retrospectively across all connected storage systems.

FileOrbis Classification Engine: How It Works

Pattern-Based Content Detection

The FileOrbis classification engine scans file content for defined sensitive patterns. Detection operates on the actual content of documents, not just filenames or metadata, using a combination of regular expression matching, keyword proximity analysis, and structural recognition for common data formats.

Out-of-the-box detection patterns include:

Personally identifiable information across multiple national formats (Turkish TC Kimlik, German Ausweisnummer, UK National Insurance Number, US SSN equivalents), IBAN and SWIFT financial identifiers, credit card numbers using Luhn validation, health record identifiers, legal privilege markers, and contract confidentiality language.
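As an illustration of how regular expression matching and Luhn validation can work together for credit card detection, here is a minimal Python sketch. It is not the FileOrbis implementation: the pattern `CARD_CANDIDATE` and the functions `luhn_valid` and `find_card_numbers` are hypothetical names chosen for this example, and the 13 to 19 digit range is an assumption.

```python
import re

# Assumed candidate shape: 13 to 19 digits, optionally separated by
# single spaces or hyphens. Real detection rules may differ.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates in `text` that also pass Luhn validation."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        candidate = match.group(0)
        if luhn_valid(candidate):
            # Normalize by stripping the separators.
            hits.append(re.sub(r"[ -]", "", candidate))
    return hits
```

The regular expression alone would flag any long digit run, such as an order reference. The checksum step filters out most of those false positives before a label is applied.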

Custom classification dictionaries allow organizations to add:

Internal project codenames that should be treated as confidential
Proprietary technical terminology unique to your industry
Regulatory keywords specific to your operating jurisdiction
Customer or counterparty names that trigger handling requirements
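A custom dictionary of this kind can be thought of as a list of terms matched as whole words, regardless of case. The sketch below shows one way to do that in Python. The function names and the sample terms are hypothetical and only illustrate the idea.

```python
import re


def build_dictionary_matcher(terms: list[str]) -> re.Pattern:
    """Compile a case-insensitive, whole-word matcher for the given terms."""
    if not terms:
        raise ValueError("dictionary must contain at least one term")
    # Longest terms first so a longer phrase wins over its own prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds stop a term from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def match_terms(text: str, matcher: re.Pattern) -> set[str]:
    """Return the distinct dictionary terms found in `text`, lower-cased."""
    return {m.group(0).lower() for m in matcher.finditer(text)}


# Hypothetical dictionary: a project codename and a counterparty name.
matcher = build_dictionary_matcher(["Project Falcon", "Acme Holdings"])
found = match_terms("Draft terms for project falcon with ACME Holdings.", matcher)
```

Escaping each term with `re.escape` means codenames containing punctuation are matched literally rather than interpreted as pattern syntax.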

Metadata and Structural Classification

In addition to content scanning, FileOrbis classifies files based on structural signals that are reliable indicators of sensitivity without requiring content inspection:

Folder taxonomy: Files in designated folders — Legal, HR, Finance, Board — inherit classification rules appropriate to their context
File type: Spreadsheets, presentation decks, and database exports receive elevated default sensitivity levels pending content review
Creator and modifier identity: Files created or last modified by users in sensitive roles trigger classification review workflows
Age and last-access date: Aged files that have not been accessed in an extended period may require reclassification review under data minimization policies
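The four structural signals above can be sketched as a single rule function that looks only at file metadata. Everything in this Python example is an assumption made for illustration: the folder names come from the list above, but the extension set, the three-year staleness threshold, the `FileInfo` fields, and the signal names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath

SENSITIVE_FOLDERS = {"legal", "hr", "finance", "board"}
# Assumed extensions for spreadsheets, decks, and database exports.
ELEVATED_EXTENSIONS = {".xlsx", ".xls", ".csv", ".pptx", ".sql"}
STALE_AFTER = timedelta(days=3 * 365)  # assumed review threshold


@dataclass
class FileInfo:
    path: str
    modified_by_sensitive_role: bool
    last_accessed: datetime


def structural_signals(info: FileInfo, now: datetime) -> list[str]:
    """Return the metadata-based signals that apply to a file."""
    path = PurePosixPath(info.path)
    signals = []
    # Folder taxonomy: any parent folder with a designated name.
    if any(part.lower() in SENSITIVE_FOLDERS for part in path.parts[:-1]):
        signals.append("sensitive_folder")
    # File type: elevated default sensitivity pending content review.
    if path.suffix.lower() in ELEVATED_EXTENSIONS:
        signals.append("elevated_file_type")
    # Creator and modifier identity.
    if info.modified_by_sensitive_role:
        signals.append("sensitive_role_review")
    # Age and last-access date.
    if now - info.last_accessed > STALE_AFTER:
        signals.append("stale_reclassification_review")
    return signals
```

None of these checks opens the file, which is why signals like these can be evaluated quickly across a very large estate before any content scan runs.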

Microsoft Purview Integration

Organizations that have invested in Microsoft Purview sensitivity labels can integrate those labels directly with FileOrbis classification. Files already labeled in Purview retain their classification when accessed through FileOrbis, and policies defined in FileOrbis can be applied based on Purview label values.

Crucially, FileOrbis extends this framework to file stores that Purview cannot reach — on-premises Windows file servers, NAS devices, and legacy document repositories. The result is a consistent classification and governance envelope across the entire enterprise file estate, not just the M365 portion.

Continuous and Retrospective Scanning

FileOrbis classification does not operate only on new files. The engine runs:

On ingestion: Every new file uploaded to, created in, or synchronized with a connected storage system is classified at the moment of entry.

On modification: When a file’s content changes, its classification is re-evaluated to reflect the updated content.

On schedule: Periodic full-estate scans re-classify all files against the current policy framework, ensuring that files classified under older rules are updated when policy changes occur.

On demand: Administrators can trigger targeted scans — for a specific folder, department, or storage system — at any time.
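One way to reason about the ingestion, modification, and schedule triggers is as a single decision: does the stored classification still reflect the current content and the current policy? The Python sketch below models that decision with a content hash and a policy version number. Both mechanisms, along with the `ClassificationRecord` type and the function names, are assumptions for illustration and are not a description of how FileOrbis tracks changes.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationRecord:
    content_sha256: str   # digest of the content that was classified
    policy_version: int   # policy framework version used at the time
    label: str


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the file content."""
    return hashlib.sha256(data).hexdigest()


def reclassification_reason(
    record: Optional[ClassificationRecord],
    data: bytes,
    current_policy_version: int,
) -> Optional[str]:
    """Return why a file needs (re)classification, or None if it is current."""
    if record is None:
        return "ingestion"          # never classified before
    if record.content_sha256 != content_digest(data):
        return "modification"       # content changed since last scan
    if record.policy_version < current_policy_version:
        return "policy_change"      # classified under older rules
    return None
```

An on-demand scan would simply run the classifier for the selected folder or storage system without consulting this check.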

Classification-Driven Policy Enforcement

Classification labels in FileOrbis are not passive metadata — they are active policy triggers. Once a label is assigned:

Access control policies appropriate to the classification level are enforced automatically
External sharing restrictions corresponding to the classification apply without user configuration
DLP rules tied to the classification govern all movement of the file
Retention policies — how long the file is kept and when it is eligible for deletion — are applied based on the classification category
The classification label and its history are included in all audit records associated with the file

This creates a fully automated governance chain from content detection through policy enforcement, reducing the compliance burden on users and administrators while strengthening the actual protection applied to sensitive data.
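To make the idea of labels as active policy triggers concrete, the following Python sketch maps each label to a policy and derives access, sharing, and retention decisions from it. The label names, group names, retention periods, and the choice to fall back to the strictest policy for unknown labels are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_groups: frozenset
    external_sharing: bool
    retention_days: int


# Hypothetical label-to-policy table; real values come from your policies.
POLICIES = {
    "public": Policy(frozenset({"all-staff"}), True, 365),
    "internal": Policy(frozenset({"all-staff"}), False, 3 * 365),
    "confidential": Policy(frozenset({"legal", "finance"}), False, 7 * 365),
}
FALLBACK_LABEL = "confidential"  # assumption: unknown labels fail closed


def policy_for(label: str) -> Policy:
    """Look up the policy for a label, using the strictest one if unknown."""
    return POLICIES.get(label, POLICIES[FALLBACK_LABEL])


def can_access(label: str, user_groups: set) -> bool:
    """Allow access only if the user belongs to an allowed group."""
    return bool(policy_for(label).allowed_groups & user_groups)


def can_share_externally(label: str) -> bool:
    """Return whether files with this label may be shared externally."""
    return policy_for(label).external_sharing
```

Because every decision is derived from the label, changing a file's classification changes its access, sharing, and retention behavior in one step, with no per-file configuration.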

About FileOrbis

FileOrbis aims to manage the relationship between users and files within an institutional framework, and it is continuously developed to meet the file management and sharing needs of different industries and customers. In development since 2018, FileOrbis is still built with the same enthusiasm as on day one. It focuses on high security, rich integration, ease of use, and integrated management.