Press "Enter" to skip to content

Author: Kevin Feasel

Tuning SQL Server Backups

Derik Hammer has a post on tuning SQL Server backups:

Finally, do not forget about your memory. To back up or restore a database, you have to load data pages into memory. We will talk more about memory below and how the internal buffer pool comes into play and can cause operating system paging or out-of-memory conditions.

Derik shows the various knobs and switches available, and I want to emphasize one thing:  optimizing backup statements involves testing different scenarios.  You can make good guesses as to the appropriate MAXTRANSFERSIZE or BUFFERCOUNT, but even then, test different combinations and find what works best for each database.
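
To make those knobs concrete, here is the general shape of a backup statement with them exposed; the database name, file paths, and specific values below are placeholders rather than recommendations, so test against your own databases:

-- Illustrative values only; tune MAXTRANSFERSIZE and BUFFERCOUNT per database.
BACKUP DATABASE [AdventureWorks]
TO DISK = N'D:\Backups\AdventureWorks_1.bak',
   DISK = N'D:\Backups\AdventureWorks_2.bak'  -- striping across files is another knob to test
WITH COMPRESSION,
     MAXTRANSFERSIZE = 4194304,  -- bytes per transfer: multiples of 64 KB, up to 4 MB
     BUFFERCOUNT = 50,           -- I/O buffers; memory used is about BUFFERCOUNT * MAXTRANSFERSIZE
     STATS = 10;                 -- report progress every 10 percent

Note the memory math in that second comment: 50 buffers at 4 MB apiece is 200 MB for this one backup, which ties back to Derik's warning above.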

FOR JSON WITHOUT_ARRAY_WRAPPER

Jovan Popovic introduces us to the WITHOUT_ARRAY_WRAPPER option:

This option enables you to remove the square brackets [ and ] that surround the JSON text generated by the FOR JSON clause. I will use the following example:

SELECT 2015 as year, 12 as month, 15 as day
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER

This query will return:

{ "year":2015, "month":12, "day":15 }

However, without this option, the following text would be returned:

[{ "year":2015, "month":12, "day":15 }]

Jovan also points out important changes to FOR JSON output between CTP 3.1 and CTP 3.2.

Restoring CDC-Enabled Databases

Mark Broadbent shows us how to restore databases with Change Data Capture enabled:

This automatically poses the question: how is it possible to restore a backup chain with CDC? On a database restore, in order to apply differential backups and transaction logs, the NORECOVERY clause is required to prevent SQL Server from performing database recovery.

If this option is required but is incompatible with KEEP_CDC, surely this means point-in-time restores are not possible for restores that require CDC tables to be retained?

Wrong!

The answer is a bit surprising, and my guess is that most database administrators are totally unaware of this restoration quirk.
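
Mark's post has the full answer, but the sequence at issue looks roughly like the sketch below; the file names and stop time are hypothetical, and the placement of KEEP_CDC reflects my reading of the quirk rather than anything official:

-- Restore the chain WITH NORECOVERY as usual; KEEP_CDC is not valid alongside NORECOVERY.
RESTORE DATABASE [Sales]
FROM DISK = N'D:\Backups\Sales_full.bak'
WITH NORECOVERY;

RESTORE LOG [Sales]
FROM DISK = N'D:\Backups\Sales_log1.trn'
WITH NORECOVERY;

-- Specify KEEP_CDC only on the statement that actually recovers the database.
RESTORE LOG [Sales]
FROM DISK = N'D:\Backups\Sales_log2.trn'
WITH STOPAT = N'2016-01-15T12:00:00', RECOVERY, KEEP_CDC;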

Learning R

Grant Fritchey is learning R:

Awesome. Fixed that algorithm problem, right?

Wrong.

That’s because algorithms are not the problem… well, not the only problem. The real problem is data preparation. A lot of the examples you’ll read online are very straightforward, with nice, neat data sets. That’s because they were carefully groomed and prepared. Here I am looking at the wooly, wild real data and I’m utterly lost in how to properly prepare this so that it’s appropriately set up as a continuous distribution (or a distribution at all). WOOF! The reason this is so hard is because I actually don’t understand the data fundamentals of the problem I’m trying to solve in exactly the way needed to solve the problem. More cogitation is necessary.

Just because you can write R code doesn’t mean you are a data scientist.  Grant has the right mindset, but this post is fair warning that R’s complexity isn’t so much in its being a DSL, but rather in the domain itself.

Type 2 SCDs With Biml

Meagan Longoria has a great post on Type 2 Slowly Changing Dimensions:

The most common mistake I see in SCD 2 packages, whether using the built-in transformation or creating your own data flow, is that people use OLEDB commands to perform updates one row at a time rather than writing updates to a staging table and performing a set-based update on all rows.  If your dimension is small, the performance from row-by-row updates may be acceptable, but the overhead associated with using a staging table and performing a set-based update will probably be negligible. So why not keep a consistent pattern for all type 2 dimensions and require no changes if the dimension grows?

Spot on.
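
To make the pattern concrete, the set-based update amounts to a single statement along these lines; the table and column names are hypothetical:

-- Expire the current row for each member with a new version in staging.
UPDATE d
SET    d.RowIsCurrent = 0,
       d.RowEndDate   = s.RowStartDate
FROM   dbo.DimCustomer AS d
       INNER JOIN staging.DimCustomerChanges AS s
           ON s.CustomerKey = d.CustomerKey
WHERE  d.RowIsCurrent = 1;

One statement touches every changed row at once, versus one OLEDB command execution per row.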

SQL Server And Mercurial

Joshua Feierman talks Mercurial:

While the choice of version control system is a very subjective and personal one, there are a number of reasons why I think Mercurial is an excellent choice, especially for someone just venturing into the world of version control. Here are a number of those reasons.

I prefer Mercurial to Git because I think the Mercurial commands are a bit easier to understand (although the proliferation of non-terrible Git GUIs has mitigated this somewhat).  Nevertheless, if you love Git, use Git; if you love Mercurial, use Mercurial; if you love SVN, use SVN; if you love Visual SourceSafe…use Mercurial or Git…

SSIS And Always Encrypted

Jakub Szymaszek links to two articles on using SSIS with an Always Encrypted database.

Using Always Encrypted:

The SQL Server 2016 Always Encrypted feature is currently only supported by the ADO.NET provider. It is not supported by the OLEDB provider, and therefore any OLEDB-provider-related transformation tasks such as Fuzzy Lookup will not support the Always Encrypted feature.

In the “Execute SQL Task”, parameter binding for some encrypted SQL types is not supported because of data type conversion limitations in Always Encrypted. The unsupported types are money, smallmoney, smalldatetime, uniqueidentifier, datetimeoffset, time, and date.

Lookup Transformations:

Add an ADO.NET source connected to the table "Customers" (please refer here for more detail on how to use an ADO.NET source to connect to an encrypted table).

Then create a cache connection manager "Customer Cache" and set the column information (shown in a screenshot in the original article).

Based on article #2, it looks like you can’t simply use a Lookup transformation on an Always Encrypted column; you need to pull the results into cache first and then query the cache.  That’s not exactly difficult, but if you have an encrypted column, make sure you’re not writing those columns out in plaintext because of the cache option you selected.
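
For reference, pointing the ADO.NET connection at an Always Encrypted database comes down to a single connection-string keyword; the server and database names in this sketch are placeholders:

Data Source=myServer;Initial Catalog=Clinic;Integrated Security=true;Column Encryption Setting=Enabled

With that setting enabled, the ADO.NET provider encrypts and decrypts parameters transparently, which is why it works where the OLEDB components do not.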

Optimizing Update Queries

Paul White has an article.  Read it:

The point is that there is an awful lot more going on inside SQL Server than is exposed in execution plans. Hopefully some of the details discussed in this rather long article will be interesting or even useful to some people.

It is good to have expectations of performance, and to know what plan shapes and properties are generally beneficial. That sort of experience and knowledge will serve you well for 99% or more of the queries you will ever be asked to tune. Sometimes, though, it is good to try something a little weird or unusual just to see what happens, and to validate those expectations.

Optimizing update queries seems trivial, but as Paul shows, we have a few more tools at our disposal than is apparent at first glance.

Identifying Blocked Processes

Priyanka Chouhan talks about identifying and handling blocked processes:

In order to maintain data integrity within the database, locks are used on resources like tables, rows, pages, etc. by any process that wishes to use them. This is done to ensure multiple processes don’t alter the same resources at one time, which would lead to data inconsistency. When a process wishes to lock a resource, it sends a request to the server and the server grants it. However, when a process requests a lock on a resource that has already been locked by another process, the request is denied. The requesting process is thus placed on “hold” until the resource it is requesting is released. In this situation, the requesting process is called a blocked process, and such a process could put a halt on other subsequent processes and activities scheduled on the server.

Identifying and releasing a blocked process thus requires the DBA team to check for blocking in the application database. Additionally, here are some other techniques that may be used to find out which processes are creating a block on the server:

My favorite method, not mentioned, is Adam Machanic’s sp_whoisactive.
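
For the do-it-yourself route, a query along these lines against the DMVs surfaces blocked sessions and their blockers; the column choices are illustrative:

-- Sessions currently blocked, with the session blocking them.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;

sp_whoisactive wraps this (and much more) into one call, e.g. EXEC dbo.sp_whoisactive @find_block_leaders = 1;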
