Forwarded Records

Tara Kizer looks at forwarded records on heaps:

Forwarded records are rows in a heap that have been moved from the original page to a new page, leaving behind a forwarding record pointer on the original page to point at the new page. This occurs when you update a column that increases the size of the column and can no longer fit on the page. UPDATEs can cause forwarded records if the updated data does not fit on the page. Forwarding pointers are used to keep track of where the data is.

The comments are also worth reading.  Except for the terrible puns.

Securing The Data Plane

Michael Schiebel gives an overview of security architecture inside a data lake:

Existing platform based Hadoop architectures make several implicit assumptions on how users interact with the platform such as developmental research versus production applications.  While this was perfectly good in a research mode, as we move to a modern data application architecture we need to bring back modern application concepts to the Hadoop ecosystem.  For example, existing Hadoop architectures tightly couple the user interface with the source of data.  This is done for good reasons that apply in a data discovery research context, but cause significant issues in developing and maintaining a production application.  We see this in some of the popular user interfaces such as Kibana, Banana, Grafana, etc.  Each user interface is directly tied to a specific type of data lake and imposes schema choices on that data.

Read the whole thing.  Also, “Securing the data plane” sounds like a terrible ’90s action film.

Multidimensional Processing Work

SQL Sasquatch looks at Perfmon counters to get an idea of what Analysis Services is doing when it processes a cube:

OK.  CPU utilization tracks pretty will with rows read/converted outside of the mystery range following 9:30.
But now 3 additional interesting timeperiods are evident, in the red boxes.  CPU utilization is much higher in the red boxes than would be predicted by rows read/converted.
What’s the CPU doing at about 8:05, 8:45, 9:30? (Not forgetting the original question about 9:40 to about 10:15.)
Maybe there’s other kinds of work I have included in these graphs?

It’s interesting to see how various metrics tie together (or, as the case may be, don’t).

Syncing LDAP With Ranger

Colm O hEigeartaigh shows how to load users and groups into Apache Ranger from LDAP:

For the purposes of this tutorial, we will use OpenDS as the LDAP server. It contains a domain called “dc=example,dc=com”, and 5 users (alice/bob/dave/oscar/victor) and 2 groups (employee/manager). Victor, Oscar and Bob are employees, Alice and Dave are managers. Here is a screenshot using Apache Directory Studio:

Colm’s scenario uses OpenDS, but you can integrate with Active Directory as well.

Scaling Azure Data Warehouse

Kevin Feasel



Vincent-Philippe Lauzon looks at how Azure Data Warehouse scales:

Which data gets stored in which database?

As long as you are doing simple select on one table and that your data is distributed evenly, you shouldn’t care, right?  The query will flow to the compute nodes, they will perform the query on each database and the result will be merged together by the control node.

But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data.  This is called Data Movement.  It is impossible to avoid in general but you should strive to minimize it to obtain better performance.

This is a look primarily at the underlying mechanics rather than testing a particular load.  Check it out.

Azure Data Lake ACLs

Saveen Reddy introduces file and folder level Access Control Lists for Azure Data Lake storage:

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS because they are POSIX-style ACLs and not Windows-style ACLs.  Before we five deep into the details on the ACL model, here are key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can rad files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.

  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.

This is an interesting development.

Reporting Services Cmdlets

Paul Turley discusses work within the community to get Reporting Services cmdlets:

We (along with Aaron Nelson, Data Platform MVP & Chrissy LeMaire, PowerShell MVP) are working with the SQL Server product teams to recommend the first set of CmdLets that we would like to see added to the PowerShell libraries.  Please help us by posting comments with your suggestions.  What are the most important SSRS-related tasks that you would like to automate using PS?  Give us your top five or so.

I’m glad to see Reporting Services get some Powershell love.

Table-Valued Parameter Performance

Dan Guzman shows that setting TVP column sizes correctly can have a major performance impact:

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

This is a must-read if you use TVPs in your environment.

Email Delay

Dave Mason looks at the Delay Between Responses option in SQL Agent alerts:

Fortunately, I had set up a SQL Agent Alert for errors with Severity Level 17, which emailed me and several coworkers to alert us to the problem. But this was unfortunate too. Every one of those alert occurrences sent an email. Sure, it’s nice to know when there’s a problem, but a thousand or more emails is most certainly overkill. After addressing the transaction log issue, the alert emails kept coming. This query told me there were still a few thousand unsent emails:

Getting a message that something is down is important.  Getting several thousand messages is counterproductive.

Scatter Charts

Reza Rad shows how to use a scatter chart in Power BI:

Scatter chart is a built-in chart in Power BI that you can show up to three measure with a categorization in it. Three measures can be visualized in position of X axis, Y axis, and size of bubbles for scatter chart. You can also set up a date field in play axis, and then scatter chart will animate how measure values are compared to each other in each point of a time. Let’s start building something simple with this chart and see how it is working in action. At the end of example you will see a summary chart as below;

This is primarily for viewing changes in groups of data over time.  You don’t want too many data points on the map or it gets too confusing.


August 2016
« Jul Sep »