Press "Enter" to skip to content

Curated SQL Posts

Forcing Query Store Plans

Grant Fritchey wonders, if you force a plan using Query Store but the plan ages out of cache, do you still use the forced plan?

To start with, a small stored procedure that I use all the time for bad parameter sniffing demos that reliably gets different plans with different values due to statistics skew:

1
2
3
4
5
6
7
8
9
10
11
12
CREATE PROC dbo.spAddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;

If this procedure is called for a value of ‘London’ it gets a plan with a Merge Join. For most other value it gets a plan with a Loops Join. Here’s an example of the ‘London’ plan:

It’s a good question with a good answer.

Comments closed

Threat Modeling

Michael Howard discusses threat modeling in Azure:

Many conversations I have with customers go like this:

Customer: “We cannot deploy on Azure until we know that appropriate defenses are in place.”

Me: “I agree 100%, so let’s build a threat model for the proposed design and see what you need to do, and what Microsoft provides.”

A couple of days pass as we build and iterate on the threat model.

Now here’s when the customer has an “a-ha” moment. At the end of the process we have a list of defenses for each part of the architecture and we all agree that the defenses are correct and appropriate.

It’s at that point the customer realizes that they can deploy a cloud-based solution securely.

My tongue-in-cheek response is, of course a customer can deploy a cloud-based solution securely if they have Michael Howard walking them through it.  Michael does include some links on Azure security configuration and threat modeling resources so check those out.

Comments closed

Default Query Store Settings

Erin Stellato talks about default query store settings:

For each database where I enable Query Store, I’d consider the workload and then look at the settings.  I tend to think that the default value of 100MB for MAX_STORAGE_SIZE_MB is really low, and I would be inclined to bump up STALE_QUERY_THRESHOLD_DAYS from 30 to something a bit higher.  I’d also probably drop  DATA_FLUSH_INTERVAL_SECONDS to something lower than 900 seconds (15 minutes) if my storage can support it.  This setting determines how often Query Store data is flushed to disk.  If it’s every 15 minutes, then I could potentially lose 15 minutes of Query Store data if my server happened to crash before it could be written to disk.  I’d also think about changing INTERVAL_LENGTH_MINUTES to a value smaller than 60 if I wanted to aggregate my query data over a smaller amount of time.  Sometimes interesting events happen within a 60 minute time frame, and they get can lost when data is aggregated across that window.  However, aggregating more frequently means I’m adding processing overhead to the system – there’s a trade-off there to figure out.

In our environment at least, 100 MB of query store data would last, oh, a couple hours?  Definitely tweak your settings and keep an eye on them early on.

Comments closed

DBA Scripts Update

Rob Sewell has updated his DBA-Database project:

By making use of the dbo.InstanceList in my DBA database I am able to target instances, by SQL Version, OS Version, Environment, Data Centre, System, Client or any other variable I choose. An agent job that runs every night will automatically pick up the instances and the scripts that are marked as needing installing. This is great when people release updates to the above scripts allowing you to target the development environment and test before they get put onto live.

I talked to a lot of people in Hannover and they all suggested that I placed the scripts onto GitHub and after some how-to instructions from a few people (Thank you Luke) I spent the weekend updating and cleaning up the code and you can now find it on GitHub here

Check out his solution, especially if you do not already have an administrative database on your instances.

Comments closed

Reporting Services Accessibility

Andrew Notarian notes that Reporting Services reports still aren’t Section 508 compliant, even in 2016:

Last weekend I spent some time creating very basic tabular reports in SSRS 2016 to see how they handle accessibility, WCAG and Section 508 issues. It looks like things are largely unchanged since the first SQL 2008 Service Pack which introduced the AccessibleTablix property to the various render options (See this prior post). Now you will want to add the AccessibleTablix item to your HTML4, HTML5 and MHTML renderers. You will get tags linking the detail cells to the header but you will still be lacking a few of the items needed to pass a WCAG audit with flying colors (e.g. TH tags in the header row).

That’s a shame.

Comments closed

Exploring Spark

Adnan Masood has photos of slides from a Spark-related meetup:

Apache Spark is a general purpose cluster computing platform which extends map-reduce to support multiple computation types including but not limited to stream processing and interactive queries. Last week IBM’s Moktar Kandil presented at the Tampa Hadoop and Tampa Data Science Group Joint meetup on the topic of exploring Apache Spark.

Apache Spark for Azure HD-Insight

Following are some of the slides discussed in the meetup. To play with the ALS Recommendation engine notebook, please register at www.datascientistworkbench.com which is a free notebook for Apache Spark platform for educational purposes.

Check out the links.

Comments closed

Mapping German Postal Codes With R

Achim Rumberger shows how to map German postal codes using R:

Just at this time Ari published his webinar about getting shape files into R. Which also includes a introduction to shape files to get you going, if you are new to it, as I am. I remembered Ari from his mail course introducing his great R-package (choroplethr). By the way this is a terrible name, being a biologist by heart, I always type “chloroplethr”, as in “chlorophyll”, and this is not found by the R package manager. [Editor’s note: I agree!]

Next question, where do I get the shapefiles, describing Germany? A major search engine was of great help here. http://www.suche-postleitzahl.org/downloads?download=zuordnung_plz_ort.csv . Germany has some 8700 zip code areas, so expect some time for rendering the file, if you do on your computer. Right on this side one can also find a dataset which might act as a useful warm up practice to display statistical data in a geographical context. Other sources are https://datahub.io/de/dataset/postal-codes-de

This is really cool.

Comments closed

Early Metrics On Warehouse Performance

Sunil Agarwal shows some results from a sample workload indicating that SQL Server 2016 has improved two customers’ performance:

As part of SQL Server 2016 technology adoption program, during development, we work with many customers validating their production-like workload in a test environment and opportunistically take some of these workloads to production running on production-ready preview build.

In one such engagement, we worked with a customer in health industry who was running analytics workload on SYBASE IQ 15.4 system. Challenged by exponential data growth and the requirement for running analytics queries even faster for insights, the customer wanted to compare solutions from multiple vendors to see which analytical database could deliver the performance and features they need over the next 3-5 years. After extensive proof-of-concept projects, they concluded SQL Server 2016’s clustered columnstore delivered the best performance. The performance proof-of-concept tested the current database against Sybase IQ 16, MS SQL 2016, Oracle 12c, and SAP Hana using the central tables from the real-life data model filled with synthetic data in a cloud environment. MS SQL Server 2016 came out the clear winner. SAP Hana was second in performance, but also required much higher memory and displayed significant query performance outliers. Other contenders were out-performed by a factor of 2 or more.

Standard disclaimers apply:  your mileage may vary; we don’t get raw data; “all other things” are not necessarily equal.

Comments closed

DAX Time Zones With Power BI

Reza Rad shows a few ways to deal with date/time issues related to Power BI being in the cloud:

Power BI is a cloud service, and that means Power BI files are hosted somewhere. Some DAX functions such as Date/Time functions work on system date/time on the server their file is hosted on. So If you use DAX functions such as TODAY() or NOW() you will not get your local date/time, You will fetch server’s date/time. In this blog post I’ll explain methods of solving this issue, so you could use Power BI to resolve your specific time zone’s date and time. If you want to learn more about Power BI read Power BI online book; Power BI from Rookie to Rock Star.

This is your daily reminder that “the cloud” is just somebody else’s machine.

Comments closed

Kafka And MapR Streams

Ellen Friedman compares and contrasts Apache Kafka with MapR streams:

What’s the difference in MapR Streams and Kafka Streams?

This one’s easy: Different technologies for different purposes. There’s a difference between messagingtechnologies (Apache Kafka, MapR Streams) versus tools for processing streaming data (such as Apache Flink, Apache Spark Streaming, Apache Apex). Kafka Streams is a soon-to-be-released processing tool for simple transformations of streaming data. The more useful comparison is between its processing capabilities and those of more full-service stream processing technologies such as Spark Streaming or Flink.

Despite the similarity in names, Kafka Streams aims at a different purpose than MapR Streams. The latter was released in January 2016. MapR Streams is a stream messaging system that is integrated into the MapR Converged Platform. Using the Apache Kafka 0.9 API, MapR Streams provides a way to deliver messages from a range of data producer types (for instance IoT sensors, machine logs, clickstream data) to consumers that include but are not limited to real-time or near real-time processing applications.

This also includes an interesting discussion of how the same term, “broker,” can be used in two different products in the same general product space and mean two distinct things.

Comments closed