Press "Enter" to skip to content

Day: June 17, 2016

Row-Level Security

Steve Jones talks about row-level security:

I’ve wanted this feature to be easy for a long time. In fact, I’ve implemented a similar system a few times in different applications, but it’s been a cumbersome feature to meet, plus each developer needs to understand how the system works for it to work well. Even in the case where we once used views to hide our RLS, it was a performance issue.

Microsoft has made things easier with their Row Level Security feature. This was actually released in Azure in 2015, but it’s now available in SQL Server 2016 for every on premise installation as well.

Essentially for each row, there is some data value that is checked to determine if a user has access. This doesn’t mean a join. This doesn’t mean you write a lot of code. The implementation is simple, and straightforward, and I like it.

The implementation is pretty easy, but I’m concerned about the performance.  At least from my early view, this adds a good bit of performance drag on your queries.  That’s not so bad in a data warehouse environment, but in a busy transactional system, I’m not sure it can keep up.  I’d be interested in hearing other thoughts, though.

Comments closed

Checked C

Microsoft has introduced Checked C:

Checked C is a modified version of C that addresses the issues that arise with pointers, C’s mechanism for accessing memory directly. The language provides several new kinds of pointer and array types that come with built-in safeguards. They’re distinct from the existing unsafe pointer types in C, so a programmer can use the new, checked pointer types for safety and revert back to the unsafe types if that’s ever required.

The new version also provides checked program scopes. These are blocks of the program code where bounds checking is turned on by default. If the programmer attempts to use an unchecked pointer within such a scope, the compiler will reject it. It’s also possible to specify that a whole program be checked by default, by way of a compiler #pragma directive

I wonder how much traction this language will get.  I can imagine most of the people still writing C code being innately anti-Microsoft and anti-external checks.

Comments closed

SSIS Parameterization

Slava Murygin shows how to use project parameters and expressions to make SSIS packages a bit more dynamic:

Being on SSIS presentation recently, I’ve realized that a lot of people, who are working with SSIS for years, still do not know what “Parameterizing” is and how to do it.

SSIS has been changed a lot in SQL Server 2012, where Microsoft announced “Project Deployment Model”. Since then you can deploy Project, and you can assign Parameters to that project, which can be passed to it for execution. Before that, developers used Configurations to supply values for internal variables and connections.

Adding parameters to packages grants you a huge level of flexibility when moving between environments or reusing components.

Comments closed

Comparing Query Plans

Kenneth Fisher wants to compare two query plans using SQL Server 2016:

One of the new/not so new features that I’ve been playing with off and on is the ability to compare two query plans. I’m still relatively new at performance tuning and I frequently like to compare the query plans from before and after any changes I make. Historically I’d bring up both plans next to each other and walk through them section by section. You can see how this method would be a bit of a pain, particularly for large plans or plans that have changed quite a bit. But SSMS 2016 can now help us out.

Lets say for example I’m working on the view [AdventureWorks2014].[HumanResources].[vEmployee]. I decide that the OPTION FAST will speed the overall performance up.

If you can’t get SQL Sentry Plan Explorer installed on your machine for whatever reason, this is a viable alternative.

Comments closed

The YARN Fair Scheduler

Justin Kestelyn discusses the Fair Scheduler in YARN:

Assume that we have a YARN cluster with total resources <memory: 800GB, vcores 200> with two queues: root.busy (weight=1.0) and root.sometimes_busy (weight 3.0).  There are generally four scenarios of interest:

 

  • Scenario A: The busy queue is full with applications, and sometimes_busy queue has a handful of running applications (say 10%, i.e. <memory: 80GB, vcores: 20>). Soon, a large number of applications are added to the sometimes_busy queue in a relatively short time window. All the new applications in sometimes_busy will be pending, and will become active as containers finish up in thebusy queue. If the tasks in the busy queue are fairly short-lived, then the applications in thesometimes_busy queue will not wait long to get containers assigned. However, if the tasks in the busyqueue take a long time to finish, the new applications in the sometimes_busy queue will stay pending for a long time. In either case, as the applications in the sometimes_busy queue become active, many of the running applications in the busy queue will take much longer to finish.

 

If you’re interested in a deeper dive into YARN, this is a good series to start with.

Comments closed

User-Defined Functions In Hive

Tim Spann talks about user-defined functions in Hive:

When you start using Hive you may miss some of the functions you are used to from Oracle, MySQL or elsewhere. Or you might just want a profanity filter. Whatever the case you can browse our list below for a large selection of UDF libraries. You can also use the pointers listed to write your own.

The Brickhouse Collection of UDFs from Klout includes functions for collapsing multiple rows into one, generating top K lists, a distributed cache, bloom counters, JSON functions, and HBase tools.

Coming from a SQL Server background, UDFs might be something you instinctively avoid (or at least that’s the case with me).  In practice, though, they’re a really good addition to the product.

Comments closed

Create An SSIS Catalog

Andy Leonard shows how to create an SSIS catalog:

Check the “Enable CLR Integration” checkbox to enable the other controls on the form.

I recommend you also check the “Enable automatic execution of Integration Services stored procedure at SQL Server startup” checkbox. This feature causes a stored procedure to execute whenever SQL Server starts. The stored procedure will identify any SSIS packages in a running (or other “active”) status and mark them as “Ended Unexpectedly.” You want this. Trust me. (As my friend Kevin Boles (LinkedIn | @thesqlguru) says, “Push the trust me button and let’s move on,” (paraphrased).

You cannot alter the name of the SSIS Catalog database. It is SSISDB. And, as in Highlander, there can be only one SSIS Catalog per instance of SQL Server.

This post is full of helpful notes if you’ve never used the SSISDB database before.

Comments closed

Hexbin Scatterplot

Devin Knight continues his Power BI visuals series:

In this module you will learn how to use the Hexbin Scatterplot Power BI Custom Visual.  The Hexbin Scatterplot is a variation of the traditional Scatter Chart but instead of using bubble size it relies on color saturation and hexbins to show value distribution.  You should consider using this chart when you’re more interested in visualizing density instead of individuals points themselves.

This is worth checking out.

Comments closed

The Secret Lives Of Seeks

Rob Farley digs into what happens with a seek operation:

Let’s go back to our original query, looking for address types 2, 4, and 5, (which returns 2 rows) and think about what’s going on inside the seek.

I’m going to assume the Query Engine has already done the work to figure out that the Index Seek is the right operation, and that it has the page number of the index root handy.

At this point, it loads that page into memory, if it’s not already there. That’s the first read that gets counted in the execution of the seek. Then it locates the page number for the row it’s looking for, and reads that page in. That’s the second read.

But we often gloss over that ‘locates the page number’ bit.

The upshot is rather interesting:  in certain edge cases, an uglier query can be better than an easier-to-understand query.  If you do this, however, you definitely want to document it; otherwise, you’ll leave the next maintainer (which could be you!) confused.

Comments closed

More On Direct Seeding AGs

Erik Darling discusses direct seeding of Availability Groups:

This isn’t in the GUI (yet?), so don’t open it up and expect magic mushrooms and smiley-face pills to pour out at you on a rainbow. If you want to use Direct Seeding, you’ll have to script things. But it’s pretty easy! If I can do it, anyone can.

I’m not going to go through setting up a Domain Controller or Clustering or installing SQL here. I assume you’re already lonely enough to know how to do all that.

The script itself is simple, though. I’m going to create my Availability Group for my three lovingly named test databases, and add a listener. The important part to notice is SEEDING_MODE = AUTOMATIC. This will create an Availability Group called SQLAG01, with one synchronous, and one asynchronous Replica.

Mike Fal wrote about this as well.

Comments closed