Press "Enter" to skip to content

Author: Kevin Feasel

SSIS And NUMA

SQL Sasquatch has some SSIS package issues stemming from a lack of NUMA awareness:

So the server had plenty of free RAM.  But NUMA node 1 was in a pinch.  And SSIS spooled its buffers to disk.  Doggone it.

I guess I’d figured that notifications were sent based on server-wide memory state.  But I guess maybe memory state on each NUMA node can lead to a memory notification?

The target SQL Server instance, a beefy one, was also on this physical server.  There’s 1.5 TB of RAM on the server.  🙂

It also looks like the easiest fix is something which was deprecated in Windows Server 2012 R2.

Comments closed

ElasticSearch Production Checklist

Darren Perucci has a checklist for spinning up a production ElasticSearch environment:

When you are looking for a node or a cluster, it is a good idea to have a name which describes what you will need to find and separates one from another.

The default cluster name of “elasticsearch ” could allow any nodes to join the cluster, even if this was not intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

There are a few things to think about in this post.

Comments closed

Row-Level Security

Steve Jones talks about row-level security:

I’ve wanted this feature to be easy for a long time. In fact, I’ve implemented a similar system a few times in different applications, but it’s been a cumbersome feature to meet, plus each developer needs to understand how the system works for it to work well. Even in the case where we once used views to hide our RLS, it was a performance issue.

Microsoft has made things easier with their Row Level Security feature. This was actually released in Azure in 2015, but it’s now available in SQL Server 2016 for every on premise installation as well.

Essentially for each row, there is some data value that is checked to determine if a user has access. This doesn’t mean a join. This doesn’t mean you write a lot of code. The implementation is simple, and straightforward, and I like it.

The implementation is pretty easy, but I’m concerned about the performance.  At least from my early view, this adds a good bit of performance drag on your queries.  That’s not so bad in a data warehouse environment, but in a busy transactional system, I’m not sure it can keep up.  I’d be interested in hearing other thoughts, though.

Comments closed

Checked C

Microsoft has introduced Checked C:

Checked C is a modified version of C that addresses the issues that arise with pointers, C’s mechanism for accessing memory directly. The language provides several new kinds of pointer and array types that come with built-in safeguards. They’re distinct from the existing unsafe pointer types in C, so a programmer can use the new, checked pointer types for safety and revert back to the unsafe types if that’s ever required.

The new version also provides checked program scopes. These are blocks of the program code where bounds checking is turned on by default. If the programmer attempts to use an unchecked pointer within such a scope, the compiler will reject it. It’s also possible to specify that a whole program be checked by default, by way of a compiler #pragma directive

I wonder how much traction this language will get.  I can imagine most of the people still writing C code being innately anti-Microsoft and anti-external checks.

Comments closed

SSIS Parameterization

Slava Murygin shows how to use project parameters and expressions to make SSIS packages a bit more dynamic:

Being on SSIS presentation recently, I’ve realized that a lot of people, who are working with SSIS for years, still do not know what “Parameterizing” is and how to do it.

SSIS has been changed a lot in SQL Server 2012, where Microsoft announced “Project Deployment Model”. Since then you can deploy Project, and you can assign Parameters to that project, which can be passed to it for execution. Before that, developers used Configurations to supply values for internal variables and connections.

Adding parameters to packages grants you a huge level of flexibility when moving between environments or reusing components.

Comments closed

Comparing Query Plans

Kenneth Fisher wants to compare two query plans using SQL Server 2016:

One of the new/not so new features that I’ve been playing with off and on is the ability to compare two query plans. I’m still relatively new at performance tuning and I frequently like to compare the query plans from before and after any changes I make. Historically I’d bring up both plans next to each other and walk through them section by section. You can see how this method would be a bit of a pain, particularly for large plans or plans that have changed quite a bit. But SSMS 2016 can now help us out.

Lets say for example I’m working on the view [AdventureWorks2014].[HumanResources].[vEmployee]. I decide that the OPTION FAST will speed the overall performance up.

If you can’t get SQL Sentry Plan Explorer installed on your machine for whatever reason, this is a viable alternative.

Comments closed

The YARN Fair Scheduler

Justin Kestelyn discusses the Fair Scheduler in YARN:

Assume that we have a YARN cluster with total resources <memory: 800GB, vcores 200> with two queues: root.busy (weight=1.0) and root.sometimes_busy (weight 3.0).  There are generally four scenarios of interest:

 

  • Scenario A: The busy queue is full with applications, and sometimes_busy queue has a handful of running applications (say 10%, i.e. <memory: 80GB, vcores: 20>). Soon, a large number of applications are added to the sometimes_busy queue in a relatively short time window. All the new applications in sometimes_busy will be pending, and will become active as containers finish up in thebusy queue. If the tasks in the busy queue are fairly short-lived, then the applications in thesometimes_busy queue will not wait long to get containers assigned. However, if the tasks in the busyqueue take a long time to finish, the new applications in the sometimes_busy queue will stay pending for a long time. In either case, as the applications in the sometimes_busy queue become active, many of the running applications in the busy queue will take much longer to finish.

 

If you’re interested in a deeper dive into YARN, this is a good series to start with.

Comments closed

User-Defined Functions In Hive

Tim Spann talks about user-defined functions in Hive:

When you start using Hive you may miss some of the functions you are used to from Oracle, MySQL or elsewhere. Or you might just want a profanity filter. Whatever the case you can browse our list below for a large selection of UDF libraries. You can also use the pointers listed to write your own.

The Brickhouse Collection of UDFs from Klout includes functions for collapsing multiple rows into one, generating top K lists, a distributed cache, bloom counters, JSON functions, and HBase tools.

Coming from a SQL Server background, UDFs might be something you instinctively avoid (or at least that’s the case with me).  In practice, though, they’re a really good addition to the product.

Comments closed

Create An SSIS Catalog

Andy Leonard shows how to create an SSIS catalog:

Check the “Enable CLR Integration” checkbox to enable the other controls on the form.

I recommend you also check the “Enable automatic execution of Integration Services stored procedure at SQL Server startup” checkbox. This feature causes a stored procedure to execute whenever SQL Server starts. The stored procedure will identify any SSIS packages in a running (or other “active”) status and mark them as “Ended Unexpectedly.” You want this. Trust me. (As my friend Kevin Boles (LinkedIn | @thesqlguru) says, “Push the trust me button and let’s move on,” (paraphrased).

You cannot alter the name of the SSIS Catalog database. It is SSISDB. And, as in Highlander, there can be only one SSIS Catalog per instance of SQL Server.

This post is full of helpful notes if you’ve never used the SSISDB database before.

Comments closed

Hexbin Scatterplot

Devin Knight continues his Power BI visuals series:

In this module you will learn how to use the Hexbin Scatterplot Power BI Custom Visual.  The Hexbin Scatterplot is a variation of the traditional Scatter Chart but instead of using bubble size it relies on color saturation and hexbins to show value distribution.  You should consider using this chart when you’re more interested in visualizing density instead of individuals points themselves.

This is worth checking out.

Comments closed