Press "Enter" to skip to content

Category: Security

Data Masking And Row-Level Filtering In Hadoop

Syed Mahmood and Srikanth Venkat discuss two security features in Apache Ranger:

Dynamic data masking via Apache Ranger enables security administrators to ensure that only authorized users can see the data they are permitted to see, while for other users or groups the same data is masked or anonymized to protect sensitive content. The process of dynamic data masking does not physically alter the data, or make a copy of it. The original sensitive data also does not leave the data store, but rather the data is obfuscated when presenting to the user. Apache Ranger 0.6 included with HDP 2.5, introduces a new type of authorization policy called “Masking Policy” that can used to define which specific data fields are masked and what are the rules for how to anonymization or pseudonymize the specific data. For example, a security administrator may choose to mask credit card numbers when displayed to customer service personnel, such that only last four digits are rendered in the form of XXXX-XXXX-XXXX-0123. The same would be true of sensitive data such as social security numbers or email addresses that are masked to be rendered in a different formats based on data masking rules.

This is part one of a two-part series; part two will dig into the technical details.  I have to wonder if Ranger’s dynamic data masking is as easy to circumvent as SQL Server’s.

Comments closed

Ambari And Active Directory

Jon Morisi documents his efforts in getting Ambari to play nicely with Active Directory over Kerberos:

You then need to trust the certificate on all the linux hosts
From the IBM article:

  1. Create ‘/etc/pki/ca-trust/source/anchors/activedirectory.pem’ and paste the certificate contents

  2. Trust CA cert: sudo update-ca-trust enable; sudo update-ca-trust extract; sudo update-ca-trust check

  3. Trust CA cert in Java:

  4. mycert=/etc/pki/ca-trust/source/anchors/activedirectory.pem sudo keytool -importcert -noprompt -storepass changeit -file ${mycert} -alias ad -keystore /etc/pki/java/cacerts

  5. And at last, please make sure every node on your cluster has access to the ad host.

LDAP support is a key part of setting up a production Hadoop cluster.

Comments closed


Kevin Hill diagnoses an SSPI error:

Apparently, the account was either locked out from our failed logon attempts, or had been disabled in Active Directory due to its age.  They do that sometimes.   Most likely the issue was locked.

We restarted the SQL Server (O/S restart) and that resolved it once the AD group unlocked it.

My assumption is that the lockout either blocked Kerberos authentication due to SPN no longer being valid, or the SPN itself got corrupted.  It was still there, just not working.   Verified its existence through running SetSPN -L with the account name.

This is on my top five list of least helpful error messages.  Even if it is literally true, it does not help you diagnose and correct the issue.  There are a number of potential causes and it’s up to you to troubleshoot each one (assuming you even know that it could be an issue) until it just works again.

Comments closed

Thoughts On Dynamic Data Masking

John Martin on Dynamic Data Masking:

For the rest of this blog post, I will be working with the following scenario:

  • I have an SSRS Server, hosting a number of reports that display information about my SQL Server estate. From performance metrics through to details of failed jobs and poorly performing queries. I want to add an additional layer of security, restricting who can see the names of servers, databases, and other internal infrastructure information. Permission to view these reports will be granted to both support teams and business users, with the business users not being permitted to see the sensitive data.

John is much more optimistic about this feature than I am.

Comments closed

Cloud Security

Kenneth Fisher provides musings based on an Azure security document:

It gives you a map on how to manage your security as you move into the cloud. Note: one of the main points is that your on premise security is equally as important and has to be managed with and as a part of your cloud security.

Now if you are like me and want more than just dry reading they also provide a link to a Microsoft Virtual Academy training course called Security in a Cloud-Enabled World that follows this roadmap and provides more detail and guidance.

Read the whole thing.

Comments closed

Checking For Credentials

Denny Cherry uses a try-catch block to figure out if you can authenticate automatically and prompts you otherwise:

Runbooks are very powerful tools which allow you to automate PowerShell commands which need to be run at different times.  One of the problems that I’ve run across when dealing with Azure Runbooks is that there is no way to use the same script on prem during testing and the same script when deploying. This is because of the way that authentication has to be handled when setting up a runbook.

The best way to handle authentication within a runbook is to store the authentication within the Azure Automation configuration as a stored credential.  The problem here is that you can’t use this credential while developing your runbook in the normal Powershell ISE.

This is a clever idea.

Comments closed

Vendors And Privileges

Dave Mason has a good post about onerous third-party software requirements:

If you’re not familiar with SQL Server, the “sysadmin” server role conveys the highest level of authorization available to a login. “db_owner” also conveys a high level of authorization. Both requirements are far more than what is necessary and violate the Principle of Least Privilege. While I strongly disagree with the install-time requirements, I can at least understand the argument: it’s a one-time activity. But elevated permissions at run time are inexcusable.

Most of the time, software companies publish that because they want to avoid the hassle of support calls when people don’t grant privileges correctly.  I’ve worked with one third-party vendor in the past who sent me the actual permissions requirements after I pestered them a bit, as I wasn’t going to let just anyone have sysadmin on my servers.  But that’s not a scalable approach and does nothing for the next guy who reads the documentation and just gives sysadmin away.

Comments closed

Table-Valued Parameters With Always Encrypted

Arvind Shyamsundar wants to use Table-Valued Parameters to load data in batches into an Always Encrypted table:

With this setup on the database side of things, we proceed to develop our client application to work around the TVP limitation. The key to doing this is to use the SqlBulkCopy class in .NET Framework 4.6 or above. This class ‘understands’ Always Encrypted and should need minimal rework on the developer front. The reason for the minimal rework is that this class actually accepts a DataTable as parameter, which is previously what the TVP was passed as. This is an important point, because it will help minimize the changes to the application.

Let’s get this working! The high-level steps are outlined below; there is a full code listing at the end of this blog post as well.

The upshot is that, at least as of today, Table-Valued Parameters are not supported with Always Encrypted.  Arvind does give an alternative, however, so click through for more information.

Comments closed

Write-Only Permissions

Kenneth Fisher looks at granting write permissions but no read permissions to a user:

Now wait, why are they getting a read error when trying to UPDATE or DELETE? Because of the WHERE clause. The WHERE requires reading the data to see if a row meets the required conditions.

It turns out that write-only permissions don’t really work the way you’d want, as typically you want to read data even if your final goal is to update or delete rows.

Comments closed

Securing Spark Shuffle

Cheng Xu uses Apache Commons Crypto to secure data when Spark shuffles off to disk:

The basic steps can be described as follows:

  1. When a Spark job starts, it will generate encryption keys and store them in the current user’s credentials, which are shared with all executors.

  2. When shuffle happens, the shuffle writer will first compress the plaintext if compression is enabled. Spark will use the randomly generated Initial Vector (IV) and keys obtained from the credentials to encrypt the plaintext by using CryptoOutputStream from Crypto.

  3. CryptoOutputStream will encrypt the shuffle data and write it to the disk as it arrives. The first 16 bytes of the encrypted output file are preserved to store the initial vector.

  4. For the read path, the first 16 bytes are used to initialize the IV, which is provided to CryptoInputStreamalong with the user’s credentials. The decrypted data is then provided to Spark’s shuffle mechanism for further processing.

Once you have things optimized, the performance hit is surprisingly small.

Comments closed