Curated SQL – Page 1280 – A Fine Slice Of SQL Server

In your database you keep track of services your company supports in a table called CompanyServices, and each service normally reports about once a minute that it’s online in a table called EventLog. The following code creates these tables and populates them with small sets of sample data:

[…]

The special islands task is to identify the availability periods (serviceid, starttime, endtime). One catch is that there’s no assurance that a service will report that it’s online exactly every minute; you’re supposed to tolerate an interval of up to, say, 66 seconds from the previous log entry and still consider it part of the same availability period (island). Beyond 66 seconds, the new log entry starts a new availability period. So, for the input sample data above, your solution is supposed to return the following result set (not necessarily in this order):

It’s a neat twist on an old problem.
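To make the 66-second rule concrete, here is a minimal sketch of one common way to attack it: flag each row that starts a new island, take a running sum of the flags, then group by it. This is not the solution from the linked article. It runs on SQLite through Python rather than on SQL Server, the column names (serviceid, logtime) and the use of integer seconds for logtime are assumptions, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (serviceid, logtime in seconds). Not from the article.
SAMPLE_ROWS = [
    (1, 0), (1, 60), (1, 125), (1, 300), (1, 361),
    (2, 10), (2, 77), (2, 137),
]

ISLANDS_QUERY = """
WITH flagged AS (
    -- isstart = 1 when there is no previous entry or the gap exceeds the tolerance
    SELECT serviceid, logtime,
           CASE WHEN logtime - LAG(logtime) OVER (
                         PARTITION BY serviceid ORDER BY logtime) <= ?
                THEN 0 ELSE 1 END AS isstart
    FROM EventLog
),
grouped AS (
    -- a running sum of the start flags gives each island its own number
    SELECT serviceid, logtime,
           SUM(isstart) OVER (
               PARTITION BY serviceid ORDER BY logtime
               ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
)
SELECT serviceid, MIN(logtime) AS starttime, MAX(logtime) AS endtime
FROM grouped
GROUP BY serviceid, grp
ORDER BY serviceid, starttime;
"""


def availability_periods(rows, tolerance_seconds=66):
    """Return (serviceid, starttime, endtime) tuples, one per availability period."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE EventLog (serviceid INTEGER NOT NULL, logtime INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO EventLog (serviceid, logtime) VALUES (?, ?)", rows
        )
        return conn.execute(ISLANDS_QUERY, (tolerance_seconds,)).fetchall()
    finally:
        conn.close()


for period in availability_periods(SAMPLE_ROWS):
    print(period)
```

The same LAG-then-running-SUM shape should carry over to T-SQL with a DATEDIFF on real datetime values, though that translation is untested here.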


Backing Up Query Store Data

Published 2018-09-18 by Kevin Feasel

Grant Fritchey explains that Query Store data gets backed up like regular data, but with a caveat:

The core of the answer is very simple. Query Store, like any other data written to a database, whether a system table or a user table, is a logged operation. So, when you back up the database, you’re backing up Query Store data. When you back up the logs, you’re also backing up Query Store data. A point in time will include all the data written to the Query Store at that point.

However, that’s the kicker. At what point was the Query Store information written to disk?

Read on to learn when, and what you can do about it if you prefer otherwise.


Switching To Managed Disks In Azure

Published 2018-09-18 by Kevin Feasel

Chris Seferlis walks us through an easy method to convert unmanaged disks to managed disks in Azure:

First off, why would you want a managed disk over an unmanaged one?

Greater scalability due to much higher IOPS and storage limits. There’s no longer the need to add additional storage accounts when you’re adding disk space, which has been a challenge for users that were using large virtual machines and required large storage space.
Better availability and reliability, which ensures that disks are now isolated from each other in different storage scale units.
Managed disks offer over 99.99% uptime, and are always stored with 3 replicas of the data.
More granular access control by employing role-based access control (RBAC) security. You have granular capability to assign access to various people in your organization.

Keep reading to learn how to switch.


Flint: Time Series With Spark

Published 2018-09-17 by Kevin Feasel

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark:

Time series analysis has two components: time series manipulation and time series modeling.

Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data cleaning and feature engineering. Typical functions in time series manipulation include:

Joining: joining two time-series datasets, usually by the time

Windowing: feature transformation based on a time window

Resampling: changing the frequency of the data

Filling in missing values or removing NA rows.

Time series modeling is the process of identifying patterns in time-series data and training models for prediction. It is a complex topic; it includes specific techniques such as ARIMA and autocorrelation, as well as all manner of general machine learning techniques (e.g., linear regression) applied to time series data.

Flint focuses on time series manipulation. In this blog post, we demonstrate Flint functionalities in time series manipulation and how it works with other libraries, e.g., Spark ML, for a simple time series modeling task.

Basho went all-in on a time-series product for Riak and it did not work out well for them. I’ll be curious to see if Flint has more staying power.
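For readers who have not worked with these manipulation steps, here is a small plain-Python sketch of three of them: joining by time, resampling with forward fill, and a time-window aggregate. It does not use Flint or Spark and says nothing about Flint's API; the function names and the sample series are hypothetical, and times are plain integers for simplicity.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Joining: for each (t, value) in left, attach the latest right value with time <= t."""
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        i = bisect_right(right_times, t)
        right_value = right[i - 1][1] if i > 0 else None
        joined.append((t, left_value, right_value))
    return joined


def resample_ffill(series, start, stop, step):
    """Resampling: project a sorted series onto a regular grid, carrying the last value forward."""
    times = [t for t, _ in series]
    resampled = []
    for t in range(start, stop, step):
        i = bisect_right(times, t)
        resampled.append((t, series[i - 1][1] if i > 0 else None))
    return resampled


def rolling_mean(series, window):
    """Windowing: mean of all values whose time falls in (t - window, t]."""
    result = []
    for t, _ in series:
        values = [v for u, v in series if t - window < u <= t]
        result.append((t, sum(values) / len(values)))
    return result


# Hypothetical sample series of (time, value) pairs, sorted by time.
prices = [(0, 10.0), (5, 11.0), (12, 12.0)]
volumes = [(1, 100), (6, 200)]

print(asof_join(prices, volumes))
print(resample_ffill(prices, 0, 15, 5))
print(rolling_mean(prices, 10))
```

Each function assumes its input is already sorted by time. A distributed library has to establish that ordering across partitions before any of these steps can run, which is where most of the real work lies.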


Against Multi-Cloud Models

Published 2018-09-17 by Kevin Feasel

Tyler Treat argues against companies looking at multi-cloud models:

A multi-cloud strategy looks great on paper, but it creates unneeded constraints and results in a wild-goose chase. For most, it ends up being a distraction, creating more problems than it solves and costing more money than it’s worth. I’m going to caveat that claim in just a bit because it’s a bold blanket statement, but bear with me. For now, just know that when I say “multi-cloud,” I’m referring to the idea of running the same services across vendors or designing applications in a way that allows them to move between providers effortlessly. I’m not speaking to the notion of leveraging the best parts of each cloud provider or using higher-level, value-added services across vendors.

Multi-cloud rears its head for a number of reasons, but they can largely be grouped into the following points: disaster recovery (DR), vendor lock-in, and pricing. I’m going to speak to each of these and then discuss where multi-cloud actually does come into play.

It’s an interesting article. I think that Tyler is right, but that companies should be capable of switching between cloud providers or even creating hybrid approaches should the need arise.


An Update To ssisUnit

Published 2018-09-17 by Kevin Feasel

Bartosz Ratajczyk has added some functionality to ssisUnit:

Second – you can get and set the properties of the project and its elements, such as overwriting project connection managers (I designed it with this particular need in mind). You can now point the connection string at a different server (or database): in the PropertyPath of the PropertyCommand use \Project\ConnectionManagers, write the name of the connection manager with the extension, and use one of the Properties. You can do it during the Test setup (or all tests setup), but not during the test suite setup, as ssisUnit is not aware of the project until it loads it into memory.

Good on Bartosz for resurrecting a stable but moribund project and adding some enhancements.


Ad Hoc Functions In T-SQL

Published 2018-09-17 by Kevin Feasel

Riley Major shows a couple of techniques for including ad hoc functions in T-SQL, namely Common Table Expressions and the APPLY operator:

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

This is one of my favorite uses of the APPLY operator, as it lets you think through a problem step-by-step while still allowing the optimizer to create a set-based solution for you.


Limiting Azure Administrator Data Access

Published 2018-09-17 by Kevin Feasel

Melissa Coates gives us a look at one aspect of Azure security:

Recently a customer expressed concern that an owner of an Azure resource group automatically gains access to the data within the services contained in the resource group. In this case, the customer was specifically referring to data in Azure Data Lake Storage Gen 1 but this concept applies to Azure Storage and other data-oriented services in Azure as well. The customer’s comment prompted me to look into available alternatives. This is by no means a detailed security post…rather, I’m trying to share a few nuggets of what I learned.

Worth the read. Much of the latest round of regulatory push seems to be in the realm of limiting high-access insiders (like DBAs) from accessing sensitive information, and this post aligns with that.


Forcing MAXDOP In Azure SQL DB

Published 2018-09-17 by Kevin Feasel

Arun Sirpal shows us that you can change MAXDOP in Azure SQL Database:

In this quick post I will show you my parallel plan and how I use MAXDOP = 1 to suppress parallel plan generation so the operation will be executed serially. (Disclaimer – I am not saying this is the right thing to do, merely using it as an example of tweaking this setting; to be honest, in 10 years I have changed MAXDOP to 1 twice.) I executed a query in Azure. You can see the classic operators such as gather streams and repartition streams.

This change will affect all queries hitting that database, so it’s a coarser tool than changing cost threshold for parallelism (not allowed) or setting MAXDOP per-query (allowed).


Using Table-Valued Parameters With sp_executesql

Published 2018-09-17 by Kevin Feasel

Kenneth Fisher shows how to include table-valued parameters in a dynamic SQL query:

Recently I did a presentation on dynamic SQL. In the presentation I pointed out the similarity of using sp_executesql to creating a stored procedure to do the same task. After the session I was asked: If that’s the case, can I pass a TVP (table valued parameter) into sp_executesql?

Awesome question! Let’s give it a shot.

Read on to see how to do this.


