Press "Enter" to skip to content

Author: Kevin Feasel

Scaling Azure Data Warehouse

Vincent-Philippe Lauzon looks at how Azure Data Warehouse scales:

Which data gets stored in which database?

As long as you are doing simple select on one table and that your data is distributed evenly, you shouldn’t care, right?  The query will flow to the compute nodes, they will perform the query on each database and the result will be merged together by the control node.

But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data.  This is called Data Movement.  It is impossible to avoid in general but you should strive to minimize it to obtain better performance.

This is a look primarily at the underlying mechanics rather than testing a particular load.  Check it out.

Comments closed

Azure Data Lake ACLs

Saveen Reddy introduces file and folder level Access Control Lists for Azure Data Lake storage:

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS because they are POSIX-style ACLs and not Windows-style ACLs.  Before we five deep into the details on the ACL model, here are key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can rad files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.

  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.

This is an interesting development.

Comments closed

Reporting Services Cmdlets

Paul Turley discusses work within the community to get Reporting Services cmdlets:

We (along with Aaron Nelson, Data Platform MVP & Chrissy LeMaire, PowerShell MVP) are working with the SQL Server product teams to recommend the first set of CmdLets that we would like to see added to the PowerShell libraries.  Please help us by posting comments with your suggestions.  What are the most important SSRS-related tasks that you would like to automate using PS?  Give us your top five or so.

I’m glad to see Reporting Services get some Powershell love.

Comments closed

Table-Valued Parameter Performance

Dan Guzman shows that setting TVP column sizes correctly can have a major performance impact:

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

This is a must-read if you use TVPs in your environment.

Comments closed

Email Delay

Dave Mason looks at the Delay Between Responses option in SQL Agent alerts:

Fortunately, I had set up a SQL Agent Alert for errors with Severity Level 17, which emailed me and several coworkers to alert us to the problem. But this was unfortunate too. Every one of those alert occurrences sent an email. Sure, it’s nice to know when there’s a problem, but a thousand or more emails is most certainly overkill. After addressing the transaction log issue, the alert emails kept coming. This query told me there were still a few thousand unsent emails:

Getting a message that something is down is important.  Getting several thousand messages is counterproductive.

Comments closed

Scatter Charts

Reza Rad shows how to use a scatter chart in Power BI:

Scatter chart is a built-in chart in Power BI that you can show up to three measure with a categorization in it. Three measures can be visualized in position of X axis, Y axis, and size of bubbles for scatter chart. You can also set up a date field in play axis, and then scatter chart will animate how measure values are compared to each other in each point of a time. Let’s start building something simple with this chart and see how it is working in action. At the end of example you will see a summary chart as below;

This is primarily for viewing changes in groups of data over time.  You don’t want too many data points on the map or it gets too confusing.

Comments closed

Kafka On AWS

Alex Loddengaard explains a few things you should think about when deploying Apache Kafka to AWS:

Kafka has built-in fault tolerance by replicating partitions across a configurable number of brokers. However, when a broker fails and a new replacement broker is added, the replacement broker fetches all data the original broker previously stored from other brokers in the cluster that host the other replicas. Depending on your application, this could involve copying tens of gigabytes or terabytes of data. Fetching this data takes time and increases network traffic, which could impact the performance of the Kafka cluster for the period the data transfer is happening.

EBS volumes are persisted when an instance fails or is terminated. When an EC2 instance running a Kafka broker fails or is terminated, the broker’s on-disk partition replicas remain intact and can be mounted by a new EC2 instance. By using EBS, most of the replica data for the replacement broker will already be in the EBS volume and hence won’t need to be transferred over the network. Only data produced since the original broker failed or was terminated will need to be fetched across the network.

There are some good insights here; read the whole thing if you’re thinking about running Kafka.

Comments closed

Automating Patching?

Kendra Little takes on the question of whether patching should be automated on SQL Server instances:

I used to spend a lot of time doing patching, and I had plenty of times when:

  • Servers wouldn’t come back up after a reboot. Someone had to go into the iLo/Rib card and give them a firm shove

  • Shutdown took forever. SQL Server can be super slow to shut down! I understand this better after reading a recent post on the “SQL Server According to Bob” blog. Bob Dorr explains that when SQL Server shuts down, it waits for all administrator (sa) level commands to complete. So, if you’ve got any servers where jobs or applications are running as sa, well….  hope they finish up fast.

  • Patching accidentally interrupted something important. Some process was running from an app server, etc, that failed because patching rebooted the server, and it fired off alarms that had to be cleaned up.

  • Something failed during startup after reboot. A service started and failed, or a database wasn’t online.  (Figuring out “was that database offline before we started?” was the first step. Ugh.)

  • Miscommunication caused a problem on a cluster.  Whoops, you were working on node2 while I was working on node1? BAD TIMES.

This is a really good post.  Kendra’s done a lot more patching than I have, and she’s definitely though about it in more detail.  Me, I’m waiting for the day—which is very close for some companies—in which you don’t patch servers.  Instead, you spin up and down virtual apps and virtual servers which are fully patched.  It’s a lot harder to do with databases compared to app servers, but if you separate data from compute, your compute centers are interchangeable.  When a new OS patch comes out, you spin up new machines which have this patch installed, they take over for the old ones, and after a safe period, you can delete the old versions forever.  If there’s a failure, shut down the new version, spin back up the old version, and you’re back alive.

Comments closed

Traversing DAX Hierarchies

Meagan Longoria shows how to implement hierarchical slicing and filtering using DAX:

My friend and coworker Melissa Coates (aka @sqlchick) messaged me the other day to see if I could help with a DAX formula. She had a Power BI dashboard in which she needed a very particular interaction to occur. She had slicers for geographic attributes such as Region and Territory, in addition to a chart that showed the percent of the regional total that each product type represented.  The product type was in the fact/data table. Region and territory were in a dimension/lookup table and formed a hierarchy where a region was made up of one or more territories and each territory had only one region.

It’s rare to hear me say “MDX was easier” but in this case, MDX was easier…

Comments closed

SQL Saturday Attendee Distances

I have a long, long post on figuring out how far SQL Saturday attendees tend to drive:

Before I begin, allow me to perform the data science Airing of Grievances.  This is an important part of data analysis which most people gloss over, instead jumping right into the “clean up the dirty data” phase.  But no, let’s revel in its filth for just a few moments.

Despite my protestations and complaints, I think there are some reasonable conclusions.  If you need to look like you’re working for a couple of hours (or at least want to play around a bit with SQL and R), this is the post for you.

Comments closed