Curated SQL Posts

Power BI Security Features

James Serra takes us through different ways to secure your Power BI dashboards and reports:

Row-Level Security: With Row-level security (RLS) you are given the ability to publish a single report to your users but expose the data differently to each person. So instead of creating multiple copies of the same report in order to limit the data, you can just create one report that will only show the data the logged-in user is allowed to see. This is done with filters, which restrict data access at the row level, and you define filters within roles. For example, you might create a role called “United States” that filters the data in a table where Region = “United States”. You then add members (user, security group, or distribution list) who can only see data for the United States to the “United States” role (the assignment of members can only be done within the Power BI Service). If a user should not have access to a report, then just don’t include that person in any of the roles for that report, so they would always see a blank report.

Click through for several more options and links to additional resources.

Developing Shiny Apps in Databricks

Yifan Cao, Hossein Falaki, and Cyrielle Simeone announce something cool:

We are excited to announce that you can now develop and test Shiny applications in Databricks! Inside the RStudio Server hosted on Databricks clusters, you can now import the Shiny package and interactively develop Shiny applications. Once completed, you can publish the Shiny application to an external hosting service, while continuing to leverage Databricks to access data securely and at scale.

That’s really cool. Databricks dashboards are nice for simple stuff, but when you really need visualization power, having Shiny available is great.

R 3.6.3 Now Available

David Smith takes a look at R 3.6.3:

On February 29, R 3.6.3 was released and is now available for Windows, Linux and Mac systems. This update, codenamed “Holding the Windsock”, fixes a few minor bugs, and as a minor update maintains compatibility with scripts and packages written for prior versions of R 3.6.

February 29 is an auspicious date, because that was the day that R 1.0.0 was released to the world: February 29, 2000. In the video below from the CelebRation2020 conference marking the 20th anniversary of R, core member Peter Dalgaard reflects on the origins of R, and releases R 3.6.3 live on stage (at the 33-minute mark).

I’m holding out for R 4, though then I’ll have to wait to see when SQL Server will officially support it.

Adoption Patterns with Query Store

Erin Stellato has some thoughts on Query Store adoption:

Last fall we had a previous customer reach out for help after they had to wait 45 minutes for a database to come online after a server reboot. The database queries were blocked by QDS_LOADDB waits. There were three things in play here – the first was that they had CAPTURE_MODE set to ALL, and it should be AUTO. Second, they didn’t have trace flag 7752 enabled (the behavior of which is now the default in SQL Server 2019). And the third was that their Query Store was 100GB in size. The workload was fairly ad hoc, so these three things together caused the problem initially described. They implemented the TF, made multiple changes to the settings (set CAPTURE_MODE to AUTO, changed MAX_STORAGE_SIZE_MB to 10GB, decreased CLEANUP_POLICY to 3 days), and then Query Store was usable for them.

Read on for more examples.
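For reference, the settings changes Erin describes map to T-SQL roughly like the sketch below. The database name and exact values are illustrative rather than taken from the post, and trace flag 7752 normally needs to be a startup parameter (-T7752) to change load behavior when databases come online; that behavior is the default in SQL Server 2019.

-- Illustrative sketch: tame an oversized, ad hoc-heavy Query Store
ALTER DATABASE [YourDatabase]
SET QUERY_STORE
(
    QUERY_CAPTURE_MODE = AUTO,                         -- was ALL; skip trivial ad hoc queries
    MAX_STORAGE_SIZE_MB = 10240,                       -- cap Query Store at roughly 10 GB
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 3)  -- keep about 3 days of history
);

-- Enable asynchronous Query Store load for the running instance; add -T7752
-- as a startup parameter so it is in effect at startup as well.
DBCC TRACEON (7752, -1);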

Using Query Store to Replay Workloads

John Sterrett shows us an unorthodox use of Query Store:

Today, I wanted to talk about my least favorite part of replaying workloads. It’s having an extended event or server-side trace running during a workload replay only so we can compare the results at a query-level when the replay is finished. Now, this might seem like a trivial thing but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data and then slicing and dicing it for analysis.

Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.

Click through for the solution.
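To give a flavor of the approach (a rough sketch, not John's actual code), the Query Store catalog views let you aggregate per-query runtime stats for the replay window after the fact instead of collecting a trace during the replay. The time range below is a placeholder.

-- Sketch: query-level totals from Query Store for a given replay window
SELECT
    qt.query_sql_text,
    SUM(rs.count_executions)                            AS executions,
    SUM(rs.avg_duration * rs.count_executions) / 1000.0 AS total_duration_ms,  -- avg_duration is in microseconds
    SUM(rs.avg_logical_io_reads * rs.count_executions)  AS total_logical_reads
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_runtime_stats_interval AS i
    ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE i.start_time >= '2020-03-01 09:00'   -- replay start (placeholder)
  AND i.end_time   <= '2020-03-01 10:00'   -- replay end (placeholder)
GROUP BY qt.query_sql_text
ORDER BY total_duration_ms DESC;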

Finding Query Store’s Server Impact

Tracy Boggiano shows us how to track the performance impact of Query Store on an environment:

This month’s T-SQL Tuesday blogging party is brought to you, well, by me and I wanted to talk more about Query Store.  I did write a book on it but there is still more to know about it that is not in the book.  I am sure the rest of everyone’s posts will prove enlightening and provide valuable content for folks using or looking to implement Query Store.  Someone should have told Grant to hold off a week on his post about DROP / CREATE of procedures and what happens with plan forcing so it could officially be part of the party.

I frequently get asked while presenting about the impact of running Query Store on the instance and one thing that was not in the book was the performance counters that were added to help track just that.

You should probably buy a bunch of copies of Tracy’s book. Just in case.
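If you want to see those counters on your own instance, they show up in sys.dm_os_performance_counters; something like this (the exact counter names vary a bit by version) lists whatever your build exposes:

-- Sketch: list the Query Store performance counters on this instance
SELECT object_name,
       counter_name,
       instance_name,   -- typically one row per database
       cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Query Store%'
ORDER BY counter_name, instance_name;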

Query Store and Spinlocks

David Fowler takes a look at an issue with Query Store:

We moseyed on down to the server in question to take a look at it.  One thing stood out immediately, CPU was pegged out at 100% but SQL itself didn’t actually seem to be doing anything, transactions/second was on the floor. Unfortunately this happened a while back and I didn’t think to capture any graphs or metrics at the time so you’re just going to have to take my word for this.

After looking into a few different things, the mention of spinlock contention came up.  I’ll be honest here, actual spinlock contention is rare and is something that I’ve seen cause an issue only a handful of times so it’s something that I don’t generally get to until I’ve ruled out just about everything else.

David’s scenario was on an older patch level of SQL Server, and the issue was fixed in a later update. It’s a good reminder to keep those servers patched.
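If you suspect something similar, the cumulative spinlock statistics since the last restart are visible in sys.dm_os_spinlock_stats. A quick sketch (spinlock names differ across versions, hence the loose filter):

-- Sketch: check for heavy spinning on Query Store-related spinlocks
SELECT name,
       collisions,
       spins,
       spins_per_collision,
       backoffs
FROM sys.dm_os_spinlock_stats
WHERE name LIKE '%QUERY_STORE%'
ORDER BY spins DESC;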

Sqoop Scheduling and Security

Jon Morisi continues a series on Sqoop:

In previous articles, I’ve walked through using Sqoop to import data to HDFS.  I’ve also detailed how to perform full and incremental imports to Hive external and Hive managed tables.

In this article I’m going to show you how to automate execution of Sqoop jobs via Cron.

However, before we get to scheduling we need to address security.  In prior examples I’ve used -P to prompt the user for login credentials interactively.  With a scheduled job, this isn’t going to work.  Fortunately Sqoop provides us with the “password-alias” arg which allows us to pass in passwords stored in a protected keystore.

That particular keystore tie-in works quite smoothly in my experience.

Monitoring Data Quality on Streaming Data

Abraham Pabbathi and Greg Wood want to check data quality on Spark Streaming data:

While the emergence of streaming in the mainstream is a net positive, there is some baggage that comes along with this architecture. In particular, there has historically been a tradeoff: high-quality data, or high-velocity data? In reality, this is not a valid question; quality must be coupled to velocity for all practical means — to achieve high velocity, we need high quality data. After all, low quality at high velocity will require reprocessing, often in batch; low velocity at high quality, on the other hand, fails to meet the needs of many modern problems. As more companies adopt streaming as a lynchpin for their processing architectures, both velocity and quality must improve.

In this blog post, we’ll dive into one data management architecture that can be used to combat corrupt or bad data in streams by proactively monitoring and analyzing data as it arrives without causing bottlenecks.

This was one of the sticking points of the lambda architecture: new data could still be incomplete and possibly wrong, but until it reached the batch layer, you wouldn’t know that.

Two Query Store Stories

Mark Wilkinson gives us two separate takes on Query Store:

When the Query Data Store (QDS) feature was announced for SQL Server 2016, we were excited about the prospect of being able to have deep insight on any query running in our environment. I work for a company that deals heavily in the e-commerce space, and we have a large SQL Server footprint. Our environment is unique in that it is essentially a multi-tenant system, but all the tenants could have wildly different workloads. It’s really the kind of query execution scenario QDS was built for. We had the pleasure of working with the Microsoft SQLCAT team to get 2016 and QDS up and running in our production environment before it was GA.

In this post I’m going to share two stories about our QDS experience (from pre and post GA of the feature). One from the perspective of the Database Developer, and one from the Database Administrator. For the most part this is not a technical post full of queries and code samples. It’s just me talking about some things I have experienced using QDS in production.

CentralQDS, by the way, is really cool. Hopefully we’re able to show that to the world someday (and note how I say “we” even though I did absolutely nothing with it except for being in the same company as the people who developed it).
