This content originally appeared on DEV Community and was authored by Ayush Kumar
Introduction
Data is the lifeblood of any organization. Whether you're dealing with sensitive customer information or financial records, safeguarding your data is non-negotiable.
Many organizations face challenges such as:
- How do you protect the data if you don’t know where it is?
- What level of protection is needed, given that some datasets require more protection than others?
Azure Synapse Analytics offers powerful features to help you address these challenges and protect the confidentiality, integrity, and availability of your data.
In this blog, we'll explore how to detect and classify sensitive data in your Synapse workspace, and the encryption capabilities built into Azure Synapse Analytics for protecting data at rest and in transit.
What is Data Discovery and Classification?
Imagine your company has massive amounts of information stored in its databases, and some of the columns need extra protection, such as those holding Social Security numbers or financial records. Finding this sensitive data manually is a time-consuming nightmare.
Here’s the good news: there’s a better way! Azure Synapse offers a feature called Data Discovery that automates this process.
How does Data Discovery work?
Think of Data Discovery as a super-powered scanner. It automatically scans your database, looking for columns whose patterns suggest sensitive information. Like a smart assistant, it identifies potentially sensitive data and proposes classifications for those columns.
Once the discovery process is complete, it provides classification recommendations based on a predefined set of patterns, keywords, and rules. You can review these recommendations and then apply sensitivity-classification labels to the appropriate columns. This process is known as Classification.
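To make the idea of rule-based recommendations concrete, here is a minimal Python sketch of how a scanner might match column names and sample values against a rule set. The rule set, the information-type and label names, and the `recommend` function are hypothetical illustrations; this is not the actual Azure classification engine or its built-in policy.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set (NOT Azure's built-in policy): each information type
# has name keywords, a value pattern, and the sensitivity label it maps to.
RULES = {
    "National ID": {
        "keywords": ("ssn", "social_security"),
        "value_pattern": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
        "label": "Highly Confidential",
    },
    "Credit Card": {
        "keywords": ("card", "ccn", "credit"),
        "value_pattern": re.compile(r"^\d{4}(-?\d{4}){3}$"),
        "label": "Highly Confidential",
    },
    "Contact Info": {
        "keywords": ("email", "phone"),
        "value_pattern": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
        "label": "Confidential",
    },
}


@dataclass(frozen=True)
class Recommendation:
    column: str
    information_type: str
    label: str


def recommend(columns: dict[str, list[str]]) -> list[Recommendation]:
    """Recommend a classification for each column whose name contains a rule
    keyword, or whose sample values all match a rule's value pattern."""
    recs = []
    for column, samples in columns.items():
        name = column.lower()
        for info_type, rule in RULES.items():
            name_hit = any(k in name for k in rule["keywords"])
            value_hit = bool(samples) and all(
                rule["value_pattern"].match(v) for v in samples
            )
            if name_hit or value_hit:
                recs.append(Recommendation(column, info_type, rule["label"]))
                break  # in this sketch, the first matching rule wins
    return recs


# Usage: "SSN" is flagged by its name, "ContactAddr" by its values.
table = {
    "CustomerName": ["Ana", "Raj"],
    "SSN": ["123-45-6789", "987-65-4321"],
    "ContactAddr": ["ana@example.com", "raj@example.org"],
}
for rec in recommend(table):
    print(rec)
```

The output is only a list of suggestions; as in Synapse, a person still reviews them before any label is applied.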
What happens after you apply sensitivity labels to columns?
Sensitivity-classification labels are metadata attributes that have been added to the SQL Server database engine. Once columns are labeled, your organization can use these labels to:
- implement fine-grained access controls, so that only authorized people with the necessary clearance can access sensitive data.
- mask sensitive data when it is accessed by users who lack the necessary permissions, so they see only anonymized versions of the data.
- monitor access to and modification of sensitive data (auditing access to sensitive data), so that any unusual or unauthorized activity can be flagged for investigation.
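The three uses above can be pictured together in a short Python sketch: a clearance check, a partial mask that keeps only the last four characters, and an audit entry for every read. The label ranking, the `User` type, and the `read_value` and `mask_partial` helpers are hypothetical. In Synapse these jobs are handled by database permissions, Dynamic Data Masking, and auditing rather than by application code like this.

```python
from dataclasses import dataclass

# Hypothetical ordering of sensitivity labels, from least to most sensitive.
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}


@dataclass(frozen=True)
class User:
    name: str
    clearance: str  # highest label this user may read unmasked


def mask_partial(value: str, keep_last: int = 4, pad: str = "X") -> str:
    """Replace all but the last `keep_last` characters with `pad`."""
    if len(value) <= keep_last:
        return pad * len(value)
    return pad * (len(value) - keep_last) + value[-keep_last:]


def read_value(user: User, column_label: str, value: str, audit_log: list[str]) -> str:
    """Return the value unmasked if the user's clearance covers the column's
    label, otherwise a masked copy. Every read is appended to audit_log."""
    allowed = LABEL_RANK[user.clearance] >= LABEL_RANK[column_label]
    audit_log.append(f"{user.name} read {column_label} data (unmasked={allowed})")
    return value if allowed else mask_partial(value)


# Usage: an analyst sees a masked SSN, a compliance officer sees the real one.
log: list[str] = []
analyst = User("analyst", "General")
officer = User("officer", "Highly Confidential")
print(read_value(analyst, "Highly Confidential", "123-45-6789", log))
print(read_value(officer, "Highly Confidential", "123-45-6789", log))
print(log)
```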
Steps for discovering, classifying, and labeling columns that contain sensitive data in your database
The classification includes two metadata attributes:
Labels: The main classification attributes, used to define the sensitivity level of the data stored in the column.
Information types: Attributes that provide more granular information about the type of data stored in the column.
Step 1 -> Choose an information protection policy based on your requirements
The SQL Information Protection policy is a built-in set of sensitivity labels and information types, with discovery logic, that is native to the SQL logical server. You can also customize the policy to suit your organization's needs; for more information, see Customize the SQL information protection policy in Microsoft Defender for Cloud (Preview).
Step 2 -> View and apply classification recommendations
The classification engine automatically scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications.
- To accept recommendations for specific columns, select the check box in the left column for each one, and then select Accept selected recommendations to apply them.
You can also classify columns manually, as an alternative or in addition to the recommendation-based classification.
To complete your classification, select Save in the Classification page.
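In the portal this step is all check boxes and a Save button, but the bookkeeping behind it is simple. The Python sketch below models it under stated assumptions: accepted recommendations are kept, manual classifications are added, and a manual entry for the same column overrides the recommendation. The `Classification` type and the `finalize` function are hypothetical and do not correspond to any Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    column: str            # fully qualified, e.g. "dbo.Customers.SSN"
    label: str
    information_type: str


def finalize(
    recommendations: list[Classification],
    accepted_columns: set[str],
    manual: list[Classification],
) -> dict[str, Classification]:
    """Keep only the recommendations whose column was accepted, then apply
    manual classifications; a manual entry replaces a recommendation for
    the same column. The result is what would be saved."""
    result = {r.column: r for r in recommendations if r.column in accepted_columns}
    for m in manual:
        result[m.column] = m
    return result


# Usage: accept one recommendation, skip another, and add a manual label.
recs = [
    Classification("dbo.Customers.SSN", "Highly Confidential", "National ID"),
    Classification("dbo.Customers.Notes", "Confidential", "Contact Info"),
]
manual = [Classification("dbo.Orders.Amount", "Confidential", "Financial")]
for column, c in finalize(recs, {"dbo.Customers.SSN"}, manual).items():
    print(column, "->", c.label, "/", c.information_type)
```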
Note: Another option for data discovery and classification is Microsoft Purview, a unified data governance solution that helps you manage and govern on-premises, multicloud, and software-as-a-service (SaaS) data. It can automate data discovery, lineage identification, and data classification, and by producing a unified map of data assets and their relationships, it makes data easily discoverable.
Data Encryption
Data encryption is a fundamental component of data security, ensuring that information is safeguarded both at rest and in transit. Azure Synapse takes on much of this responsibility for us, using robust encryption technologies to protect data.
Data at Rest
Azure offers various methods of encryption across its different services.
Azure Storage Encryption
By default, Azure Storage encrypts all data at rest using server-side encryption (SSE). It's enabled for all storage types (including ADLS Gen2) and cannot be disabled. SSE uses AES 256 to encrypt and decrypt data transparently. AES 256 stands for 256-bit Advanced Encryption Standard; it is one of the strongest block ciphers available and is FIPS 140-2 compliant.
I know these sound like hacking terms, but the platform itself manages the encryption keys, so we don't need to understand the details. SSE forms the first layer of data encryption, and it applies to both user and system databases, including the master database.
Note: For additional security, Azure offers the option of double encryption. Infrastructure encryption uses a platform-managed key in conjunction with the SSE key, encrypting data twice with two different encryption algorithms and keys. This provides an extra layer of protection, ensuring that data at rest is highly secure.
Double the Protection with Transparent Data Encryption (TDE)
For dedicated SQL pools, enabling Transparent Data Encryption (TDE) adds a second layer of data encryption. It performs real-time I/O encryption and decryption of database files, transaction log files, and backups at rest without requiring any changes to the application. By default, it uses AES 256.
By default, TDE protects the database encryption key (DEK) with a built-in server certificate managed by Azure. However, organizations can opt for Bring Your Own Key (BYOK), in which the key is stored securely in Azure Key Vault, giving them greater control over their encryption keys.
Data in transit
Data encryption in transit is equally crucial to protect sensitive information as it moves between clients and servers. Azure Synapse utilizes Transport Layer Security (TLS) to secure data in motion.
Azure Synapse dedicated SQL pools and serverless SQL pools use the Tabular Data Stream (TDS) protocol to communicate between the SQL pool endpoint and a client machine. TDS relies on Transport Layer Security (TLS) for channel encryption, so all data packets between the endpoint and the client machine are secured and encrypted. TLS encryption uses a server certificate signed by a Certificate Authority (CA) and managed by Microsoft. Azure Synapse supports data encryption in transit with TLS v1.2, using AES 256 encryption.
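On the client side, you can make sure the channel is actually encrypted and the server certificate is validated. The Python sketch below builds an ODBC connection string using the `Encrypt` and `TrustServerCertificate` keywords of Microsoft's ODBC Driver for SQL Server. It assumes the usual `<workspace>.sql.azuresynapse.net` (dedicated) and `<workspace>-ondemand.sql.azuresynapse.net` (serverless) endpoint names. The workspace and database names are hypothetical, and credentials are deliberately omitted. The separate `strict_tls_context` helper uses Python's standard `ssl` module to express the same TLS 1.2 minimum for Python code that opens TLS connections directly. It does not speak TDS; the ODBC driver negotiates TLS on its own.

```python
import ssl


def synapse_odbc_connection_string(workspace: str, database: str, serverless: bool = False) -> str:
    """Build an ODBC connection string that requires an encrypted channel and
    validates the CA-signed server certificate. Credentials are omitted."""
    host = f"{workspace}-ondemand" if serverless else workspace
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{host}.sql.azuresynapse.net,1433",
        "Database": database,
        "Encrypt": "yes",                # require TLS for the session
        "TrustServerCertificate": "no",  # do not skip certificate validation
        "Connection Timeout": "30",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def strict_tls_context() -> ssl.SSLContext:
    """A client TLS context that refuses protocol versions older than TLS 1.2
    and verifies certificates and host names (create_default_context's defaults)."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


# Usage with a hypothetical workspace "contoso-ws".
print(synapse_odbc_connection_string("contoso-ws", "salesdb"))
print(synapse_odbc_connection_string("contoso-ws", "master", serverless=True))
print(strict_tls_context().minimum_version)
```

Setting `TrustServerCertificate=no` matters as much as `Encrypt=yes`. Without certificate validation, traffic is encrypted, but the client cannot tell whether it is talking to the genuine endpoint.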