Press "Enter" to skip to content

Author: Kevin Feasel

Create An SSIS Catalog

Andy Leonard shows how to create an SSIS catalog:

Check the “Enable CLR Integration” checkbox to enable the other controls on the form.

I recommend you also check the “Enable automatic execution of Integration Services stored procedure at SQL Server startup” checkbox. This feature causes a stored procedure to execute whenever SQL Server starts. The stored procedure will identify any SSIS packages in a running (or other “active”) status and mark them as “Ended Unexpectedly.” You want this. Trust me. (As my friend Kevin Boles (LinkedIn | @thesqlguru) says, “Push the trust me button and let’s move on,” paraphrased.)

You cannot alter the name of the SSIS Catalog database. It is SSISDB. And, as in Highlander, there can be only one SSIS Catalog per instance of SQL Server.

This post is full of helpful notes if you’ve never used the SSISDB database before.
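
Andy creates the catalog through the SSMS dialog, but you can sanity-check the prerequisites from T-SQL first. A minimal sketch, assuming you have permission to change server configuration:

    -- Check whether CLR Integration is enabled; the SSIS Catalog requires it.
    EXEC sp_configure 'clr enabled';
    GO

    -- Enable it if it is off (0 = disabled, 1 = enabled).
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    GO

    -- See whether an SSIS Catalog already exists on this instance.
    -- Remember: the database is always named SSISDB, and there can be only one.
    SELECT name FROM sys.databases WHERE name = N'SSISDB';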

Comments closed

Hexbin Scatterplot

Devin Knight continues his Power BI visuals series:

In this module you will learn how to use the Hexbin Scatterplot Power BI Custom Visual.  The Hexbin Scatterplot is a variation of the traditional Scatter Chart but instead of using bubble size it relies on color saturation and hexbins to show value distribution.  You should consider using this chart when you’re more interested in visualizing density instead of the individual points themselves.

This is worth checking out.

Comments closed

The Secret Lives Of Seeks

Rob Farley digs into what happens with a seek operation:

Let’s go back to our original query, looking for address types 2, 4, and 5 (which returns 2 rows), and think about what’s going on inside the seek.

I’m going to assume the Query Engine has already done the work to figure out that the Index Seek is the right operation, and that it has the page number of the index root handy.

At this point, it loads that page into memory, if it’s not already there. That’s the first read that gets counted in the execution of the seek. Then it locates the page number for the row it’s looking for, and reads that page in. That’s the second read.

But we often gloss over that ‘locates the page number’ bit.

The upshot is rather interesting:  in certain edge cases, an uglier query can be better than an easier-to-understand query.  If you do this, however, you definitely want to document it; otherwise, you’ll leave the next maintainer (which could be you!) confused.
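
For context, the query in question is shaped roughly like the sketch below (hypothetical table and column names, not Rob’s exact example); turning STATISTICS IO on is an easy way to watch the reads he is counting:

    -- Report logical reads so the cost of the seek is visible.
    SET STATISTICS IO ON;

    -- Assumes a hypothetical table with a nonclustered index keyed on AddressTypeID.
    -- The IN list typically becomes multiple seek predicates inside a single Index Seek
    -- operator, and each range repeats the root-to-leaf navigation Rob describes.
    SELECT AddressTypeID, Name
    FROM dbo.AddressType
    WHERE AddressTypeID IN (2, 4, 5);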

Comments closed

More On Direct Seeding AGs

Erik Darling discusses direct seeding of Availability Groups:

This isn’t in the GUI (yet?), so don’t open it up and expect magic mushrooms and smiley-face pills to pour out at you on a rainbow. If you want to use Direct Seeding, you’ll have to script things. But it’s pretty easy! If I can do it, anyone can.

I’m not going to go through setting up a Domain Controller or Clustering or installing SQL here. I assume you’re already lonely enough to know how to do all that.

The script itself is simple, though. I’m going to create my Availability Group for my three lovingly named test databases, and add a listener. The important part to notice is SEEDING_MODE = AUTOMATIC. This will create an Availability Group called SQLAG01, with one synchronous and one asynchronous Replica.

Mike Fal wrote about this as well.
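
For reference, the statement Erik describes looks roughly like the sketch below (hypothetical replica, database, and listener names; the piece that enables direct seeding is SEEDING_MODE = AUTOMATIC on each replica):

    CREATE AVAILABILITY GROUP [SQLAG01]
    FOR DATABASE [DB1], [DB2], [DB3]
    REPLICA ON
        N'SQLVM01' WITH (
            ENDPOINT_URL = N'TCP://SQLVM01.domain.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = AUTOMATIC,
            SEEDING_MODE = AUTOMATIC    -- direct seeding: no manual backup/copy/restore
        ),
        N'SQLVM02' WITH (
            ENDPOINT_URL = N'TCP://SQLVM02.domain.local:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        )
    LISTENER N'SQLAG01L' (WITH IP ((N'10.0.0.50', N'255.255.255.0')), PORT = 1433);
    GO

    -- On each secondary replica: join the group, then grant it permission
    -- to create the seeded copies of the databases.
    ALTER AVAILABILITY GROUP [SQLAG01] JOIN;
    ALTER AVAILABILITY GROUP [SQLAG01] GRANT CREATE ANY DATABASE;

Once the secondaries have joined and granted CREATE ANY DATABASE, SQL Server streams the databases over the endpoint itself, with no backup, copy, and restore step required.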

Comments closed

tidyr Update

Hadley Wickham has a new version of tidyr out:

I’m pleased to announce tidyr 0.5.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette.

Check out the latest version of tidyr; it’s one of the most useful data manipulation packages on the R platform.

Comments closed

Databricks Community Edition

Databricks has released an IDE for Spark:

We are excited to announce the General Availability (GA) of Databricks Community Edition (DCE). As a free version of the Databricks service, DCE enables everyone to learn and explore Apache Spark, by providing access to a simple, integrated development environment for data analysts, data scientists and engineers with high quality training materials and sample application notebooks.

Less than four months ago, at Spark Summit New York, we introduced Databricks Community Edition (DCE) beta. Its introduction generated tremendous interest with thousands of people requesting accounts. Today, we are delighted to report that more than 8,000 users have signed on to DCE, many of them using the service heavily. The top 10% active users are averaging over 6 hours per week, and are executing over 10,000 commands on average.

They also just started an introductory Spark course on EdX yesterday.  If you’re interested in Spark but haven’t had the time to learn it, this might be a good course to take.

Comments closed

Spark Security

Dave Wang discusses the Databricks Enterprise Security framework:

The Databricks just-in-time data platform takes a holistic approach to solving the enterprise security challenge by building all the facets of security — encryption, identity management, role-based access control, data governance, and compliance standards — natively into the data platform with DBES.

  • Encryption: Provides strong encryption at rest and in flight with best-in-class standards such as SSL and keys stored in AWS Key Management System (KMS).
  • Integrated Identity Management: Facilitates seamless integration with enterprise identity providers via SAML 2.0 and Active Directory.
  • Role-Based Access Control: Enables fine-grained management of access to every component of the enterprise data infrastructure, including files, clusters, code, application deployments, dashboards, and reports.
  • Data Governance: Guarantees the ability to monitor and audit all actions taken in every aspect of the enterprise data infrastructure.
  • Compliance Standards: Achieves security compliance standards that exceed the high standards of FedRAMP as part of Databricks’ ongoing DBES strategy.

In short, DBES will provide holistic security in every aspect of the entire big data platform.

As enterprises come to depend on technologies like Spark and Hadoop, they need to have techniques and technologies to ensure that data remains secure.  This is a sign of a maturing platform.

Comments closed

Automate Spark Jobs Using Oozie

Mike Grimes shows how to use Oozie to automate Hadoop and Spark jobs:

This problem is easy to solve, right? You can write scripts that run jobs in sequence, and use the output of one program as the input to another—no problem. But what if your workflow is complex and requires specific triggers, such as specific data volumes or resource constraints, or must meet strict SLAs? What if parts of your workflow don’t depend on each other and can be run in parallel?

Building your own infrastructure around this problem can seem like an attractive idea, but doing so can quickly become laborious. If, or rather when, those requirements change, modifying such a tool isn’t easy. And what if you need monitoring around these jobs? Monitoring requires another set of tools and headaches.

This is a pretty detailed look at the basics of Oozie.

Comments closed

MSTest Output

Richie Lee shows how to use TestContext.WriteLine to debug MSTest runs:

Unit testing can sometimes throw up unexpected issues that are entirely concerned with the unit tests and not the application under test. Recently I’ve been having issues with DateTime formats affecting the results of the unit tests: the format of the DateTime was different on the build server than it was on my local machine. The stored procedure could not be altered (because reasons*). The datetime formats on both SQL and Windows were identical to my machine, and running the test itself on the build box caused the test to go green, so clearly there was something funky going on in the build process. So I needed to find a way of logging the values during the process to verify where the issue was. And, as luck would have it, MSTest has such a method to help you. First create a private instance of the TestContext object in the TestClass, then you create a public property.

This turns out to be really easy to do.

Comments closed

sp_help_revlogin

Chrissy LeMaire wants to deprecate sp_help_revlogin:

Now you’ve migrated the logins with their passwords, SIDs, and a few default properties. But you don’t have the logins’ server roles, server permission sets, database roles or database permission sets. So now you gotta find and use someone’s modified version of sp_help_revlogin, but you’re still left with manually executing the procedure against your source and destination servers.

Oh, and don’t forget different versions of SQL Server use different hashing algorithms, so you’ll need to use one of the many different versions of sp_help_revlogin if you have a mixed environment.

Let’s hope you only have one or two SQL Servers to migrate and not hundreds.

Chrissy has a couple of great PowerShell cmdlets to help get rid of this procedure, as well as a nice explanation of each and YouTube videos should you be so inclined.  Definitely check it out, as well as her dbatools PowerShell suite.

Comments closed