Press "Enter" to skip to content

Curated SQL Posts

Query Store And Multiple Plans Per Query

Kendra Little follows Betteridge’s Law:

Can I Force Multiple Plans for a Query in Query Store?

Nope.

At least, not right now.

I started thinking about this when I noticed that the sys.sp_query_store_unforce_plan requires you to specify both a @query_id and a @plan_id.

If there’s only ever one plan that can be forced for a query, why would I need to specify the @plan_id?

I’ve got no insider knowledge on this, I just started thinking about it.

Read on for Kendra’s thoughts.  Maybe we will get something like multiple plans for a single query in the future, though figuring out which forced plan would relate to which combination of parameters would get complex pretty fast.

Comments closed

Calculating The End Of The Month

Bob Pusateri gives us a few techniques for calculating the last day of a particular month:

Months are funny. Unlike other parts of a date, they vary in length:

  • The last second of a minute is always 59.
  • The last minute of a hour is always 59.
  • The last hour of a day is always 23.

But the last day of a month? Well that depends on what month it is. And the year matters too because a leap year means February gets an extra day.

Click through for several techniques, including the knuckle technique for advanced practitioners.  But what if I need to calculate the end of a lunar month?

Comments closed

Migrating Database Files

Jeff Mlakar gives us three methods for migrating database files from one location to another:

The database will be unavailable during this operation so we need to notify our end users. Consider the ramifications if an application is using the database – we might want to stop application services or take some other custom action during the move.

Plan ahead before starting the job. Know what you are going to do before doing it. If you can test your method against a lab or development database that will help too.

Sound advice and technique.  Click through to see those three methods.

Comments closed

Switching Partitions And Table Structure

Andrew Pruski demonstrates a gotcha when switching partitions between tables:

When working with partitioning the SWITCH operation has to be my favourite. The ability to move a large amount of data from one table to another as a META DATA ONLY operation is absolutely fantastic.

What’s also cool is that we can switch data into a non-partitioned table. Makes life a bit easier not having to manage two sets of partitions!

However, there is a bit of a gotcha when doing this. Let’s run through a quick demo.

Read on for more.

Comments closed

Microsoft ML Server 9.3 Released

Nagesh Pabbisetty announces Microsoft Machine Learning Server 9.3:

In ML Server 9.3, we have added support for SQL compute context in ML Server and in R Client running on Linux platforms, so data scientists who work on Linux workstations can directly use in-database analytics with SQL Server compute context. Additionally, the SQLRUtils package can now be used to package the R scripts into T-SQL stored procedures and run them from R environment on Linux clients.

An interesting scenario enabled by the addition of SQL Server Compute context in ML Server running on Linux is that organizations can now provide a browser-based interface for accessing SQL Server compute context with R Studio Server and ML Server running on a Linux machine connecting to SQL Server.

Since introducing revoscalepy library in the last release of ML Server and SQL Server 2017, we have shipped several additions and improvements in the Python APIs as part of CU releases of SQL Server 2017. We have added APIs like rx_create_col_info, rx_get_var_info etc. that make it easier to get column information, esp. with large number of columns. We added rx_serialize_model for easy model serialization. We have also improved performance when working with string data in different scenarios.

This also gets you up to R 3.4.3. H/T David Smith

Comments closed

Looping In Python And R

Dmitry Kisler has a quick comparison of looping speed in Python and R:

This post is about R versus Python in terms of the time they require to loop and generate pseudo-random numbers. To accomplish the task, the following steps were performed in Python and R (1) loop 100k times (ii is the loop index) (2) generate a random integer number out of the array of integers from 1 to the current loop index ii (ii+1 for Python) (3) output elapsed time at the probe loop steps: ii (ii+1 for Python) in [10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000]

The findings were mostly unsurprising to me, though there was one unexpected twist.

Comments closed

Starting A Data Science Project: Business Understanding

I continue my data science project series:

As you listen to these types of questions, your goal is to nail down a specific problem with a specific answer.  You want to narrow down the scope to something that your team can achieve, ideally something with a built-in measure for success.  For example, here are a few specific problems that we could go solve:

  • Find a model which predicts quarterly sales to within 5% no later than 30 days into the quarter.
  • Given a title and description for a product, tell me a listing category which Amazon will, with at least 90% confidence, consider valid for this product.
  • Determine the top three factors which most affect the number of years the first owner holds onto our mid-range sedan.

With a specific problem in mind, you can look for relevant data.  Of course, you’ll probably need to modify the scope of this problem over time as you gather new information, but this gives you a starting point for success.  Also, don’t expect something as clear-cut as the above early on; instead, people will hem and haw, not quite sure what they really want.  You can take a fuzzy goal into data acquisition, but as you acquire data, you will want to work with the champion to focus down to a targeted and valuable problem.

Read on for several references to big sacks of cash.  After becoming a manager, I’ve become much more attuned to the idea of receiving big sacks of cash.

Comments closed

Documenting Reporting Services Installations

Craig Porteous explains the types of things you should document in SQL Server Reporting Services:

If you’re using Kerberos authentication with Reporting Services you’ll at least have to update the rsReportServer.config file with the correct authentication mode. Beyond that you have SPNs on your SSRS domain Service account to consider. This may be managed by your domain administrators but it’s still a good plan to back these up or check that someone is backing these up. It is too easy to clear SPNs or constrained delegation settings accidentally.

Having a backup will also give you a good reference point for building new environments. You can of course easily see SPNs from the command line using this command:

SETSPN –L DOMAIN\SERVICEACCOUNT

What this boils down to is, if you lost your Reporting Services installation today, could you get things back to the way they were?

Comments closed

Concept Help In Powershell

Fred Weinmann gives us a reading list of Powershell built-in documentation:

Newcomers to PowerShell are often faced with a lot of things to learn, a lot of disorienting features and little in the way of obvious guidance. While there are quite a few good books out there by now, finding the right one is not always easy.
That said, PowerShell comes with a lot of built in documentation:
Get-Help about_*
Will list a whole lot of articles explaining many a concept, but finding just what you need – especially before you know all the right words to search for – is a lot harder.

There’s a lot to this and more seasoned developers can skip a lot of it, but if you’re just learning about Powershell, I think it’s a nice ordering of where to start learning.

Comments closed

Gapless Sequential Values

Gail Shaw shows how to build gapless sequential values:

To be clear, I don’t think this is a good idea. The identity column works well if a sequential series of numbers are needed. If the number sequence needs to that spans tables, then the sequence object is a good replacement.

But, there will always be some requirements that insist on gap-less sequences, or insist on not using identity (probably for ‘compatibility’ reasons), so let’s see how to do it properly.

For true surrogate keys, this is a bad idea because it is both unnecessary and overkill.  Where this becomes useful is cases where an auditor is expecting a proper sequence without any gaps, such as invoice numbers, check numbers, or purchase order numbers.  As Gail mentions, those are uncommon scenarios.

Comments closed