Press "Enter" to skip to content

Author: Kevin Feasel

Spark Security

Dave Wang discusses the Databricks Enterprise Security framework:

The Databricks just-in-time data platform takes a holistic approach to solving the enterprise security challenge by building all the facets of security — encryption, identity management, role-based access control, data governance, and compliance standards — natively into the data platform with DBES.

  • Encryption: Provides strong encryption at rest and inflight with best-in-class standards such as SSL and keys stored in AWS Key Management System (KMS).
  • Integrated Identity Management: Facilitates seamless integration with enterprise identity providers via SAML 2.0 and Active Directory.
  • Role-Based Access Control: Enables fine-grain management access to every component of the enterprise data infrastructure, including files, clusters, code, application deployments, dashboards, and reports.
  • Data Governance: Guarantees the ability to monitor and audit all actions taken in every aspect of the enterprise data infrastructure.
  • Compliance Standards: Achieves security compliance standards that exceed the high standards of FedRAMP as part of Databricks’ ongoing DBES strategy.

In short, DBES will provide holistic security in every aspect of the entire big d

As enterprises come to depend on technologies like Spark and Hadoop, they need to have techniques and technologies to ensure that data remains secure.  This is a sign of a maturing platform.

Comments closed

Automate Spark Jobs Using Oozie

Mike Grimes shows how to use Oozie to automate Hadoop and Spark jobs:

This problem is easy to solve, right? You can write scripts that run jobs in sequence, and use the output of one program as the input to another—no problem. But what if your workflow is complex and requires specific triggers, such as specific data volumes or resource constraints, or must meet strict SLAs? What if parts of your workflow don’t depend on each other and can be run in parallel?

Building your own infrastructure around this problem can seem like an attractive idea, but doing so can quickly become laborious. If, or rather when, those requirements change, modifying such a tool isn’t easy . And what if you need monitoring around these jobs? Monitoring requires another set of tools and headaches.

This is a pretty detailed look at the basics of Oozie.

Comments closed

MSTest Output

Richie Lee shows how to use TestContext.WriteLine to debug MSTest runs:

Unit testing can sometimes throw up unexpected issues that are entirely concerned with the unit tests and not the application under test. Recently I’ve been having issues with DateTime formats affecting the results of the unit tests: the format of the DateTime was different on the build server than it was on my local machine. The stored procedure could not be altered (because reasons*). The datetime formats on both SQL and Windows were identical to my machine, and running the test itself on the build box caused the test to go green, so clearly there was something funky going on in the build process. SO I needed to find a way of logging the values during the process to verify where the issue was. And, as luck would have it, MSTest has such a method to help you. First create a private instance of the TestContext object in the TestClass, then you create a public property.

This turns out to be really easy to do.

Comments closed

sp_help_revlogin

Chrissy LeMaire wants to deprecate sp_help_revlogin:

Now you’ve migrated the logins with their passwords, SIDs, and a few default properties. But you don’t have the logins’ server roles, server permission sets, database roles or database permission sets. So now you gotta find and use someone’s modified version of sp_help_revlogin, but you’re still left with manually executing the procedure against your source and destination servers.

Oh, and don’t forget different versions of SQL Server use different hashing algorithms, so you’ll need to use one of the many different versions of sp_help_revlogin if you have a mixed environment.

Let’s hope you only have one or two SQL Servers to migrate and not hundreds.

Chrissy has a couple of great Powershell cmdlets to help get rid of this procedure, as well as a nice explanation of each and Youtube videos should you be so inclined.  Definitely check it out, as well as her dbatools Powershell suite.

Comments closed

PCI Compliance

Denny Cherry reminds us that SQL Server 2005 is no longer PCI compliant:

If you are running a PCI compliant system on SQL Server 2005 you are going to fail your next audit. One of the audit requirements is that the vendors must support the version of your software which you are running on. As Microsoft no longer offers support for SQL Server 2005 that’s going to cause you to fail your next PCI audit.

Microsoft’s serious about sunsetting old versions of SQL Server, and at this point, there have been five versions of SQL Server released since 2005.

Comments closed

Troubleshooting R Installations

Ginger Grant walks through how to fix a couple issues you might run into while installing SQL Server R Services:

If you look at the code from the interactive window, you will notice that the error occurred with trying to run rxSummary. In both cases I didn’t get the error when I changed the compute context to SQL Server from local, but when I tried to run a function which runs on the server. In both cases the R tools where installed prior to installing SQL Server 2016. The Open Source R tools install to C:\Program Files\R\R-3.3.0 (your version number may be higher). The Microsoft R Open installs to C:\Program Files\Microsoft\MRO\R-3.2.5. To use the libraries needed for the RevoScaleR libraries included in R Server, the version of Microsoft R required is Microsoft RRE, which is installed here C:\Program Files\Microsoft\MRO-for-RRE\8.0. Unfortunately, SQL Server 2016 shipped with version 8.0.3 not 8.0.0. If you are getting data and using a local compute context, you will have no problems. However, when you want to change your compute context to run on SQL Server, you will get an error.

While I received a different error on the server than my laptop, the reason for both messages was the same. Neither computer was running version 8.0.0.3 of the R client tools. On the server I was able to fix the error without downloading a thing. After installing a stand-alone version of R Server from the SQL Server Installation Center, the error went away and I got results when trying to run rxSummary. Unfortunately, it was not possible for me to run R Server on my laptop, as R Server is disabled from within the Installation Center. I believe that is because I have SQL Server 2016 developer edition on a laptop, not on a server. I needed to do something else to make it work.

Click the link for the full story.

Comments closed

Direct Seeding Availability Group Replicas

Mike Fal is excited about direct seeding of Availability Groups in SQL Server 2016:

In SQL Server 2012 and 2014, creating an Availability Group could take a significant amount of work. One of the more tedious tasks is setting up your replica databases. This is because that you need to restore your database to your replica node in a state close enough to the primary to allow synchronization to happen. It can take several log backup restores to hit that magic window where you can join the database on the secondary node. Then, you get to do it again on the next replica!

Enter direct seeding in 2016. With this feature you no longer have to do any of the restores. You simply create your replicas with direct seeding enabled, then when you add a database to the AG, SQL Server will directly seed the database to your replica nodes. It’s surprisingly simple.

This sounds pretty interesting.

Comments closed

Compressing LOB Data

Andy Mallon wants to compress LOB data:

The COMPRESS() function takes an input of string or binary data, and applies the gzip algorithm to it. It returns a value of type varbinary(max). In essence, instead of storing string or binary data, you can gzip it up and store it in a varbinary(max) column. There’s also a DECOMPRESS() function for when you are reading the data and need to unzip it.

This costs some CPU, but gzip can save quite a bit of space. How much space, and whether it’s worth the CPU cost will vary depending on your data and workload. In this blog post, we’ll take a look at one table. We’ll look at the space savings we get out of using COMPRESS(), and we’ll look at the effort necessary to implement it.

Read on for Andy’s test and thoughts.

Comments closed

Plan Explorer Supports SSMS 2016

SQL Sentry Plan Explorer is now compatible with SQL Server 2016:

We’ve been busy modifying SQL Sentry Plan Explorer so it will work seamlessly with the changed add-in model in the new Management Studio. (As a bit of background, SSMS is now based on the Visual Studio 2015 shell, and some work is required to transition existing add-ins.)

I am happy to announce that, as of today, Plan Explorer 2.8 (build 10.1.94), with add-in support for SSMS 2016, is now available!

Huzzah.  That’s two plug-ins down and two more to go before I’m 100% ready for SSMS 2016.

Comments closed

Primary And Foreign Keys

Shane O’Neill discusses primary key and foreign key constraints:

If you check the two definitions for Primary key and Foreign key you’ll see that, even though they are both called keys, they serve two different purposes; namely identifying rows and enforcing links.

And those two purposes are not mutually exclusive!

A column/column combo that identifies a row can also be used to enforce a link back to another table (or itself, as shown above with Foreign keys).

The assumption, that if you were one then you couldn’t be the other, was incorrect. If your business rules call for it, don’t let a column being one type of key stop it from being the other.

Not included but still important are unique constraints.

Comments closed