
Curated SQL Posts

Concept Help In PowerShell

Fred Weinmann gives us a reading list of PowerShell's built-in documentation:

Newcomers to PowerShell are often faced with a lot of things to learn, a lot of disorienting features and little in the way of obvious guidance. While there are quite a few good books out there by now, finding the right one is not always easy.
That said, PowerShell comes with a lot of built-in documentation:
Get-Help about_*
Will list a whole lot of articles explaining many a concept, but finding just what you need – especially before you know all the right words to search for – is a lot harder.

There’s a lot to this, and more seasoned developers can skip much of it, but if you’re just learning PowerShell, I think it’s a nice ordering of where to start.

Gapless Sequential Values

Gail Shaw shows how to build gapless sequential values:

To be clear, I don’t think this is a good idea. The identity column works well if a sequential series of numbers is needed. If the number sequence needs to span tables, then the sequence object is a good replacement.

But, there will always be some requirements that insist on gap-less sequences, or insist on not using identity (probably for ‘compatibility’ reasons), so let’s see how to do it properly.

For true surrogate keys, this is a bad idea because it is both unnecessary and overkill.  Where this becomes useful is cases where an auditor is expecting a proper sequence without any gaps, such as invoice numbers, check numbers, or purchase order numbers.  As Gail mentions, those are uncommon scenarios.
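
For reference, the shape of the pattern Gail walks through is below.  This is a minimal sketch rather than her code: the table and column names (dbo.InvoiceNumbers, LastNumber, dbo.Invoices) are hypothetical, and it assumes a single-row counter table.  The essentials are that the increment and the insert share one transaction, so a rollback also undoes the increment and no number is lost.

BEGIN TRANSACTION;

DECLARE @InvoiceNumber int;

-- Increment the counter and capture the new value in one atomic statement.
-- The exclusive lock taken here is held until COMMIT, serializing concurrent callers.
UPDATE dbo.InvoiceNumbers
SET @InvoiceNumber = LastNumber = LastNumber + 1;

INSERT INTO dbo.Invoices (InvoiceNumber, InvoiceDate)
VALUES (@InvoiceNumber, SYSDATETIME());

COMMIT TRANSACTION;

That serialization is exactly why the pattern doesn't belong on a general-purpose surrogate key: every insert queues behind the counter update.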

Adaptive Join Behavior

Grant Fritchey explains adaptive joins in SQL Server 2017:

Currently the adaptive join only works with columnstore indexes, but according to Microsoft, at some point, they will also work with rowstore. The concept is simple. For larger data sets, frequently (but not always, let’s not try to cover every possible caveat, it depends, right), a hash join is much faster than a loops join. For smaller data sets, frequently, a loops join is faster. Wouldn’t it be nice if we could change the join type, on the fly, so that the most effective join was used depending on the data in the query? Ta-da, enter the adaptive join.

First, the statistics are used at compile time for the tables we’re joining. Based on those statistics, a row target is set. Below that threshold, a loops join will be used. Above that threshold, a hash join. The way the row count is determined is that the operator will always build the hash table. With the hash table built and loaded, it will know how many rows it has. If it’s going to do a loops join, the hash table is tossed and a loops join commences. If the threshold has been passed on the row counts and it’s going to do a hash join, it already has the hash table built and proceeds to do a hash join. It’s easy to understand. Let’s see it in action.

Click through to see it in action.  It’s not a world-changer yet, but as it becomes available to rowstore queries (without the filtered, empty columnstore index trick), I think people will come to appreciate it.
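
If you want to try it yourself, a minimal sketch follows.  The schema is hypothetical: it assumes SQL Server 2017 at compatibility level 140 and a clustered columnstore index on dbo.Sales, with the actual execution plan captured.

-- The number of rows this predicate returns determines the join type at runtime.
SELECT s.ProductID, SUM(s.Quantity) AS TotalQty
FROM dbo.Sales AS s
JOIN dbo.Products AS p
    ON p.ProductID = s.ProductID
WHERE s.SaleDate = '2018-03-01'
GROUP BY s.ProductID;

In the actual plan, the Adaptive Join operator reports its threshold row count: at or above it, the already-built hash table is used; below it, the operator switches to nested loops.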

Don’t Use SMALLDATETIME

Randolph West argues against using the SMALLDATETIME data type:

But let’s say you don’t need that kind of accuracy and are happy with a granularity to the nearest minute. Maybe you’re storing time cards and don’t think it’s necessary to store seconds. As discussed in the Fundamentals series, you really want to choose the most appropriate datatype for your data.

Enter SMALLDATETIME, which rounds up or down to the nearest minute. The seconds value for any SMALLDATETIME is 00. Values of 29.999 seconds or higher are automatically rounded up to the nearest minute, while values of 29.998 seconds or lower are rounded down.

Read on to see Randolph’s explanation of why he recommends against using SMALLDATETIME.
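
The rounding rule is easy to verify for yourself:

-- 29.998 seconds rounds down to the minute; 29.999 seconds rounds up to the next one.
SELECT
    CAST('2018-03-08 10:15:29.998' AS smalldatetime) AS RoundedDown,  -- 2018-03-08 10:15:00
    CAST('2018-03-08 10:15:29.999' AS smalldatetime) AS RoundedUp;    -- 2018-03-08 10:16:00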

Building A Biml Project: Adding Tasks

Martin Schoombee continues his series on building a Biml project by adding SSIS tasks:

The <DirectInput> tag is pretty self-explanatory and executes the stored procedure (in this case). It could also have been a SQL query.

The <Results> element contains a <Result> child element for each field returned in our query. I found this a little confusing at first, but the Name property’s value should reflect the name of the field that is returned from your query or stored proc.

The VariableName property is then used to map the returned field’s value to the relevant variable, and note that we don’t use the “User::<variable>” notation that we see elsewhere in SSIS, but we refer to the scope of the variable with the “User.” prefix. I’m guessing that the change in syntax is because of some other dependencies and/or potential clashes between the Biml compiler and SSIS itself.
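
Putting those pieces together, the element Martin describes looks roughly like this.  This is my own sketch with invented names (the proc, connection, and variable are placeholders), so check his post for the working version:

<ExecuteSQL Name="Get Load Window" ConnectionName="Staging" ResultSet="SingleRow">
  <DirectInput>EXEC dbo.GetLoadWindow;</DirectInput>
  <Results>
    <!-- Name matches the field the proc returns; VariableName uses the "User." prefix -->
    <Result Name="LoadStartDate" VariableName="User.LoadStartDate" />
  </Results>
</ExecuteSQL>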

This has been a good introductory-level series so far; check it out if you haven’t already.

Launching A Sparklyr Cluster

David Smith shows how to launch a sparklyr cluster in Azure:

When you’re finished, shut down your cluster using the aztk spark cluster delete command. (While you can delete the nodes from the Pools view in the Azure portal, the command does some additional cleanup for you.) You’ll be charged for each node in the cluster at the usual VM rates for as long as the cluster is provisioned. (One cost-saving option is to use low-priority VMs for the nodes, for savings of up to 90% compared to the usual rates.)

That’s it! Once you get used to it, it’s all quick and easy; the longest part is waiting for the cluster to spin up in Step 5. This is just a summary; for the full details, see the guide SparklyR on Azure with AZTK.

It’ll take a bit more than five minutes to get started, but it is a good sight easier than building the servers yourself.

Flotilla: Containerized Task Execution

Akshay Wadia, et al., introduce a new open source tool:

Data scientists are not always equipped with the requisite engineering skills to deploy robust code to a production job execution and scheduling system. Yet, forcing reliance on data platform engineers will impede the scientists’ autonomy. If only there was another way.

Today we’re excited to introduce Flotilla, our latest open source project. Flotilla is a human friendly service for task execution. It allows you to focus on the work you’re doing rather than how to do it. In other words, Flotilla takes the struggle out of defining and running containerized jobs.

It looks like an interesting service.

dbatools Making Extended Events Easier

Chrissy LeMaire solves a bunch of common problems with Extended Events:

So why do people keep using traces? We compiled a list of reasons from Erin Stellato’s Why do YOU avoid Extended Events. And this list is LONG!

  • Traces are straightforward and less complex than Extended Events
    Totally seems that way!
  • Traces provide a consistent interface for mixed environments
    Whether you use SQL Server 7 or SQL Server 2017, the interface is pretty much the same.
  • Traces are quicker to set up
    Just open up Profiler, connect to a server, click a few times and you’re set.
  • “Extended Events are more efficient for the SQL Server engine, but not more efficient for the DBA”
    Love this quote.
  • People already have a library of Profiler templates
    Including me
  • Ignorance of XML / Querying all the generated XML is outrageous
    When I first saw what it takes to query Extended Events, I bailed immediately. I am not learning XPATH, ever.
  • Templates work remotely across all instances
    This is also true for Extended Events, but the commenter did not know that.
  • XEvents are persistent and must be stopped manually
    Once you close Profiler or restart SQL Server, all non-default traces will disappear. Extended Events will persist until you delete them.
  • Ability to import PerfMon data and look at Trace and PerfMon counter data at the same time
    Most people that use Profiler don’t seem to know about this feature but those who do LOVE it. You can read more at Brad McGehee’s Correlating SQL Server Profiler with Performance Monitor.

    Microsoft reportedly has no plans to provide this functionality.

  • Consistent user experience across SSAS and Database Engine
    Gotta take their word, I don’t use SSAS.
  • It’s easy to train others to use Profiler
    Imagine – if it’s easier to learn Profiler, it’ll be far easier to teach.
  • Traces can be easily replayed
    There are a number of Microsoft tools to replay traces, but none to replay Extended Events.
  • MS Premier Support still asks for traces
    Likely because they also have tools that they want to work across all supported versions, which still includes SQL Server 2008 R2.
  • xe_file_target_read_file is a CPU hog
    This wasn’t listed on Erin’s page but was told to me while I was performing my research.

Whewf! That’s a lot of compelling reasons not to make the switch. So let’s see how we can address each of them using PowerShell. All code listed here can be found at sqlps.io/xecode.

Chrissy & co can’t solve all of them, but they solve a majority.  Click through to see how.
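
To make the XML complaint in the list above concrete, this is roughly what it takes in plain T-SQL just to pull event names and timestamps out of the system_health ring buffer (a sketch; real diagnostic queries run much longer):

SELECT
    ev.evt.value('(@name)[1]', 'varchar(100)')      AS event_name,
    ev.evt.value('(@timestamp)[1]', 'datetime2(3)') AS event_time
FROM (
    SELECT CAST(t.target_data AS xml) AS target_xml
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
    WHERE s.name = 'system_health'
      AND t.target_name = 'ring_buffer'
) AS src
CROSS APPLY src.target_xml.nodes('/RingBufferTarget/event') AS ev(evt);

The dbatools commands Chrissy demonstrates hide all of that shredding from you.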

Temp Tables In Redshift

Derik Hammer has some notes on temporary tables in Amazon Redshift:

One difference between regular tables and temporary tables is how they are typically used. Temporary tables are session-scoped, which means that adding them into a process or report will probably cause them to be created multiple times. Temporary tables might be very similar to regular tables, but most regular tables are not rewritten every time they are queried.

The disk writes involved in populating the temporary table might be more expensive than the reads would be if you were to modify your query to fold the logic into one larger query. The frequency of the report or process will be a factor in how much of a performance hit you get by using the temporary tables. If you are using temporary tables to make debugging a procedure easier or to enhance readability, make sure you understand the IO cost of performing writes and then reading that data back into a subsequent query.

Read on for more.
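
For anyone who hasn't seen them, the syntax is simple; a hypothetical example (table and column names invented):

-- Dropped automatically at session end, but the populating write hits disk on every run.
CREATE TEMP TABLE recent_orders
DISTKEY (customer_id)
SORTKEY (order_date)
AS
SELECT customer_id, order_date, order_total
FROM orders
WHERE order_date >= DATEADD(day, -30, CURRENT_DATE);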

Query Store And CU4

Erin Stellato notes that if you ran SQL Server 2017 CU2 with Query Store, you want to upgrade and run a special script:

There is a Query Store fix in the latest Cumulative Updates of SQL Server 2017, and if you are keeping current with the latest cumulative updates, there’s a script you need to run if you’ve installed CU2 at any point.

First, if you are still on CU2, you might want to consider upgrading to CU4 (released February 17, 2018).

Second, if you previously had CU2 installed (and are now on CU3 or CU4), you want to make sure that you’ve run the script included with CU3 and CU4 release notes.

The script removes plans from Query Store that were captured when running CU2.

Read the whole thing and keep those servers patched.
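
The cleanup script itself comes from Microsoft's release notes, so I won't reproduce it here, but a quick way to see whether your Query Store still holds plans captured on an older build is to group by the engine version each plan recorded:

-- Run in each database with Query Store enabled; if any rows show the CU2 build,
-- run the script from the CU3/CU4 release notes against that database.
SELECT p.engine_version, COUNT(*) AS plan_count
FROM sys.query_store_plan AS p
GROUP BY p.engine_version
ORDER BY p.engine_version;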
