
Curated SQL Posts

Alerting In Azure SQL Database

Arun Sirpal shows how to set up an alert for an Azure SQL Database:

I keep things simple and like to look at certain performance-based metrics, but before talking about what metrics are available, let’s step through an example.

For this post I want to set up an alert on CPU percentage utilised: when it is greater than 50% over the last 5 minutes, I would like to know about it. The first step is to navigate to your Azure SQL Database.
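
If you want to sanity-check that metric outside the portal, sys.dm_db_resource_stats keeps roughly an hour of 15-second samples inside an Azure SQL Database. A minimal sketch of my own, not part of Arun’s walkthrough:

```sql
-- Average CPU over the last 5 minutes, from inside the database itself
SELECT AVG(avg_cpu_percent) AS avg_cpu_last_5_min
FROM sys.dm_db_resource_stats
WHERE end_time >= DATEADD(MINUTE, -5, SYSUTCDATETIME());
```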

Click through for a screenshot-driven guided tour.


Indexing Tradeoffs

Jeff Schwartz continues his series on index tuning, including a section on what happens when you have too many indexes on a table:

Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added, the problem worsens and compilation times, i.e., processor times, increase to a point.

This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex, with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: cached query plan issues, e.g., parameter sniffing or plan reuse, were not problems in this case.

The only solution was to tear down the ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.
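
If you want to know whether any of your tables are drifting toward that 144-index mark, a quick count per table is a reasonable first look. A minimal sketch of my own, not from Jeff’s post:

```sql
-- Tables carrying a suspicious number of indexes
SELECT t.name            AS table_name,
       COUNT(i.index_id) AS index_count
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.type > 0                 -- skip the heap entry
GROUP BY t.name
HAVING COUNT(i.index_id) > 10    -- arbitrary threshold; adjust to taste
ORDER BY index_count DESC;
```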

Don’t be that company.


Connect(); Announcements, Including Azure Databricks

James Serra has a wrapup of Microsoft Connect(); announcements around the data platform space:

Microsoft Connect(); is a developer event from Nov 15-17, where plenty of announcements are made.  Here is a summary of the data platform related announcements:

  • Azure Databricks: In preview, this is a fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure. It delivers one-click setup, streamlined workflows, and an interactive workspace all integrated with Azure SQL Data Warehouse, Azure Storage, Azure Cosmos DB, Azure Active Directory, and Power BI.  More info

  • Azure Cosmos DB with Apache Cassandra API: In preview, this enables Cassandra developers to simply use the Cassandra API in Azure Cosmos DB and enjoy the benefits of Azure Cosmos DB with the familiarity of the Cassandra SDKs and tools, with no code changes to their application.  More info.  See all Cosmos DB announcements

  • Microsoft joins the MariaDB Foundation: Microsoft is a platinum sponsor – MariaDB is a community-developed fork of the MySQL relational database management system, and Microsoft will be actively contributing to MariaDB and the MariaDB community.  More info

Click through for more.  And if you want more info on Azure Databricks, Matei Zaharia and Peter Carlin have more information:

So how is Azure Databricks put together? At a high level, the service launches and manages worker nodes in each Azure customer’s subscription, letting customers leverage existing management tools within their account.

Specifically, when a customer launches a cluster via Databricks, a “Databricks appliance” is deployed as an Azure resource in the customer’s subscription. The customer specifies the types of VMs to use and how many, but Databricks manages all other aspects. In addition to this appliance, a managed resource group is deployed into the customer’s subscription that we populate with a VNet, a security group, and a storage account. These are concepts Azure users are familiar with. Once these services are ready, users can manage the Databricks cluster through the Azure Databricks UI or through features such as autoscaling. All metadata (such as scheduled jobs) is stored in an Azure Database with geo-replication for fault tolerance.

I’ve been a huge fan of the Databricks Community Edition.  We’ll see if there will be a Community Edition version for Azure as well.


Custom SQL Operations Studio Dashboard Widgets

Drew Furgiuele shows how easy it is to create a dashboard widget in SQL Operations Studio:

Before we go on, now’s your chance to name your widget. In my code above, I highlighted some of the changes I made. Also notice that this widget has a path to the query file we created; if this file doesn’t exist (or you can’t read from the path it exists on), it’ll stop working. Just a heads-up.

I named it something meaningful (but hey, do you), and I changed the vertical and horizontal size of the widget to be a little easier to read. Once you’re done, it’s time to add your widget to the dashboard.
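
For reference, the query file a widget points at is just an ordinary T-SQL script that returns a result set. A hypothetical example of such a file (the name and query are mine, not Drew’s):

```sql
-- Hypothetical widget query file, e.g. database-sizes.sql
-- One row per database with its total file size in MB
SELECT d.name                   AS database_name,
       SUM(mf.size) * 8 / 1024  AS size_mb   -- size is stored in 8 KB pages
FROM sys.databases AS d
JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
GROUP BY d.name
ORDER BY size_mb DESC;
```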

I think people are going to like this product once it matures a bit.  This kind of flexibility without having to drop into .NET is great for DBAs for whom C# is a little intimidating.


Service Broker Security

Colleen Morrow is back with a new item in her Service Broker series, this time on securing Service Broker implementations:

There are 2 types of security in Service Broker: dialog and transport. Dialog security establishes a secure, authenticated connection between Service Broker Services or dialog endpoints. Transport security establishes an authenticated network connection between SQL Server instances or Service Broker endpoints. Clear as mud, right? Don’t worry, these are easily mixed up by both novice and experienced Service Broker admins.

To illustrate, let’s go back to our taxes scenario. You’ve completed your forms, stamped your envelope and you’re ready to mail it in. You drop it in your nearest mailbox and what happens next? A postal worker will pick it up, it gets loaded into a truck and shipped between various sorting facilities (as you might have noticed, I have no clue how the USPS works) until it is finally delivered to the IRS via yet another postal worker.

Now, those postal workers all have the authority to transport your tax return from point to point. However, they do not have the authority to open up and read your return. That’s what transport security is. The IRS agent on the other end, though, he does have the authority to read your return. That’s dialog security.

It’s also worth noting that transport security is only needed in a distributed environment. Just like if the IRS agent lived with you, you wouldn’t need to go through the USPS. But that’s just weird.
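
To hang rough T-SQL on that distinction: transport security lives on the Service Broker endpoint, while dialog security is set up with a remote service binding. A hedged sketch with made-up names, not from Colleen’s post:

```sql
-- Transport security: authenticate the instance-to-instance connection
CREATE ENDPOINT BrokerEndpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 4022)
    FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TransportCert,
                        ENCRYPTION = REQUIRED);

-- Dialog security: authenticate and encrypt the conversation itself
CREATE REMOTE SERVICE BINDING TaxReturnBinding
    TO SERVICE 'TargetService'
    WITH USER = RemoteServiceUser, ANONYMOUS = OFF;
```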

This wraps up Colleen’s Service Broker series.  If you do find yourself interested in Service Broker, this is a great way to get your feet wet.


Building Dynamic Row Headers With ML Services

Dave Mason tries to get around his RESULT SETS limitation when using SQL Server Machine Learning Services:

The columns in the data frame clearly have names, but SQL Server isn’t using them. The data frame columns have types in R too (more on this in a moment). Now that makes me wonder about the data types for the data returned by SQL. How is that determined? If SQL isn’t using the column names, can I assume it isn’t making use of the R column types either?

For a point of reference, let’s run some more R code to show the column names and types. As before, the rvest package is used to scrape a web page, with each HTML <table> found becoming a data frame in the “tables” list (line 3). A data frame of table metadata is created by calling data.frame(). The first parameter is a vector of column names (line 4), the second parameter is a vector of column classes (line 5), and the third parameter causes the row “names” to be incrementing digits (line 6).
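
For context, the pattern Dave is wrestling with looks roughly like this: sp_execute_external_script with a WITH RESULT SETS clause, where the column names and types SQL Server uses come from the T-SQL declaration rather than from the R data frame. A hedged sketch of my own, not Dave’s code:

```sql
EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'
        # The data frame carries its own column names and classes...
        OutputDataSet <- data.frame(col_name  = c("a", "b"),
                                    col_class = c("character", "integer"),
                                    stringsAsFactors = FALSE)'
WITH RESULT SETS
(
    -- ...but SQL Server only uses the names and types declared here
    (ColumnName NVARCHAR(128), ColumnClass NVARCHAR(128))
);
```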

This is a work in progress as Dave continues his series.


Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark:

There are 3 interpreter modes available in Zeppelin.

1) Shared Mode

In Shared mode, a single SparkContext and a single Scala REPL are shared among all interpreters in the group, so every Note shares that SparkContext and Scala REPL. In this mode, if NoteA defines variable ‘a’, then NoteB is able not only to read variable ‘a’ but also to override it.

2) Scoped Mode

In Scoped mode, each Note has its own Scala REPL, so a variable defined in one Note cannot be read or overridden in another Note. However, a single SparkContext still serves all the interpreter groups: all jobs are submitted to this SparkContext, and the fair scheduler schedules them. This can be useful when a user does not want to share a Scala session but does want to keep a single Spark application and leverage its fair scheduler.

3) Isolated Mode

In Isolated mode, each Note has its own SparkContext and Scala REPL.

The default mode of the %spark interpreter is ‘Globally Shared’.

This is mostly a step-by-step on installing Zeppelin, but does go into some detail on how Zeppelin works.


Everyone’s Data Is Dirty

Chirag Shivalker hits the highlights on dirty data:

It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and that is going to stay the same in the coming years. They need to collect data in order to analyze it, and that data will not necessarily be 100% clean, pristine, or perfect in nature.

Nearly all companies face the challenge of dirty data in the form of a lot of duplicates, incorrect fields, and missing values. This happens due to an omnichannel data influx, followed by hundreds, if not thousands, of employees wrestling and torturing that data to derive professional outcomes and insights. Don’t forget that even the best data has a tendency to decay within a few weeks.
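
Duplicates and missing values, at least, are cheap to measure. A minimal T-SQL sketch against a hypothetical Customers table (the table and column names are mine):

```sql
-- Duplicate email addresses in a hypothetical Customers table
SELECT Email, COUNT(*) AS duplicate_count
FROM dbo.Customers
GROUP BY Email
HAVING COUNT(*) > 1;

-- Rows missing a phone number
SELECT COUNT(*) AS rows_missing_phone
FROM dbo.Customers
WHERE Phone IS NULL OR Phone = '';
```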

The saying goes that any analytics project is about 80% data cleansing and feature extraction.  I’d say that number’s probably closer to 90-95%, and dirty data is a big part of that.


Query Store Plan Forcing: You Can’t Always Get What You Want

Kendra Little shows an example where trying to force a Query Store plan results in an oddity:

This is not considered a “failure”

When I check the Query Store DMVs, force_failure_count is 0. The last_force_failure_reason_desc is NONE.

Query Store didn’t fail to apply the narrow plan. Instead, it’s just deciding not to give it to me, now that I’ve forced that plan.

Seems kinda like an adolescent, doesn’t it?
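
The columns Kendra mentions live in sys.query_store_plan; here is a quick way to eyeball forced plans and their failure columns (a minimal sketch, not her exact query):

```sql
-- Forced plans and whether any forcing attempt has failed
SELECT p.query_id,
       p.plan_id,
       p.force_failure_count,
       p.last_force_failure_reason_desc
FROM sys.query_store_plan AS p
WHERE p.is_forced_plan = 1;
```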

The answer remains a bit of a mystery, but read on to see how Kendra troubleshoots this.


What To Do With A Database In Source Control

Ed Elliott with Database Source Control 102:

This post is for a specific type of person if you are:

  • New to source control
  • Are getting started on your path to the continuous delivery nirvana
  • Have been able to get your database into some sort of source control system
  • Have more than one person checking in code to the database source
  • Are unsure what to do next

Then this post is for you!

This is a nice post with some next-steps for when you have a database in source control but aren’t quite sure what to do next.
