Azure Data Lake ACLs

Saveen Reddy introduces file and folder level Access Control Lists for Azure Data Lake storage:

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS because they are POSIX-style ACLs and not Windows-style ACLs.  Before we five deep into the details on the ACL model, here are key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can rad files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.

  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.

This is an interesting development.

Reporting Services Cmdlets

Paul Turley discusses work within the community to get Reporting Services cmdlets:

We (along with Aaron Nelson, Data Platform MVP & Chrissy LeMaire, PowerShell MVP) are working with the SQL Server product teams to recommend the first set of CmdLets that we would like to see added to the PowerShell libraries.  Please help us by posting comments with your suggestions.  What are the most important SSRS-related tasks that you would like to automate using PS?  Give us your top five or so.

I’m glad to see Reporting Services get some Powershell love.

Table-Valued Parameter Performance

Dan Guzman shows that setting TVP column sizes correctly can have a major performance impact:

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

This is a must-read if you use TVPs in your environment.

Email Delay

Dave Mason looks at the Delay Between Responses option in SQL Agent alerts:

Fortunately, I had set up a SQL Agent Alert for errors with Severity Level 17, which emailed me and several coworkers to alert us to the problem. But this was unfortunate too. Every one of those alert occurrences sent an email. Sure, it’s nice to know when there’s a problem, but a thousand or more emails is most certainly overkill. After addressing the transaction log issue, the alert emails kept coming. This query told me there were still a few thousand unsent emails:

Getting a message that something is down is important.  Getting several thousand messages is counterproductive.

Scatter Charts

Reza Rad shows how to use a scatter chart in Power BI:

Scatter chart is a built-in chart in Power BI that you can show up to three measure with a categorization in it. Three measures can be visualized in position of X axis, Y axis, and size of bubbles for scatter chart. You can also set up a date field in play axis, and then scatter chart will animate how measure values are compared to each other in each point of a time. Let’s start building something simple with this chart and see how it is working in action. At the end of example you will see a summary chart as below;

This is primarily for viewing changes in groups of data over time.  You don’t want too many data points on the map or it gets too confusing.

Kafka On AWS

Kevin Feasel

2016-07-29

Hadoop

Alex Loddengaard explains a few things you should think about when deploying Apache Kafka to AWS:

Kafka has built-in fault tolerance by replicating partitions across a configurable number of brokers. However, when a broker fails and a new replacement broker is added, the replacement broker fetches all data the original broker previously stored from other brokers in the cluster that host the other replicas. Depending on your application, this could involve copying tens of gigabytes or terabytes of data. Fetching this data takes time and increases network traffic, which could impact the performance of the Kafka cluster for the period the data transfer is happening.

EBS volumes are persisted when an instance fails or is terminated. When an EC2 instance running a Kafka broker fails or is terminated, the broker’s on-disk partition replicas remain intact and can be mounted by a new EC2 instance. By using EBS, most of the replica data for the replacement broker will already be in the EBS volume and hence won’t need to be transferred over the network. Only data produced since the original broker failed or was terminated will need to be fetched across the network.

There are some good insights here; read the whole thing if you’re thinking about running Kafka.

Automating Patching?

Kendra Little takes on the question of whether patching should be automated on SQL Server instances:

I used to spend a lot of time doing patching, and I had plenty of times when:

  • Servers wouldn’t come back up after a reboot. Someone had to go into the iLo/Rib card and give them a firm shove

  • Shutdown took forever. SQL Server can be super slow to shut down! I understand this better after reading a recent post on the “SQL Server According to Bob” blog. Bob Dorr explains that when SQL Server shuts down, it waits for all administrator (sa) level commands to complete. So, if you’ve got any servers where jobs or applications are running as sa, well….  hope they finish up fast.

  • Patching accidentally interrupted something important. Some process was running from an app server, etc, that failed because patching rebooted the server, and it fired off alarms that had to be cleaned up.

  • Something failed during startup after reboot. A service started and failed, or a database wasn’t online.  (Figuring out “was that database offline before we started?” was the first step. Ugh.)

  • Miscommunication caused a problem on a cluster.  Whoops, you were working on node2 while I was working on node1? BAD TIMES.

This is a really good post.  Kendra’s done a lot more patching than I have, and she’s definitely though about it in more detail.  Me, I’m waiting for the day—which is very close for some companies—in which you don’t patch servers.  Instead, you spin up and down virtual apps and virtual servers which are fully patched.  It’s a lot harder to do with databases compared to app servers, but if you separate data from compute, your compute centers are interchangeable.  When a new OS patch comes out, you spin up new machines which have this patch installed, they take over for the old ones, and after a safe period, you can delete the old versions forever.  If there’s a failure, shut down the new version, spin back up the old version, and you’re back alive.

Traversing DAX Hierarchies

Meagan Longoria shows how to implement hierarchical slicing and filtering using DAX:

My friend and coworker Melissa Coates (aka @sqlchick) messaged me the other day to see if I could help with a DAX formula. She had a Power BI dashboard in which she needed a very particular interaction to occur. She had slicers for geographic attributes such as Region and Territory, in addition to a chart that showed the percent of the regional total that each product type represented.  The product type was in the fact/data table. Region and territory were in a dimension/lookup table and formed a hierarchy where a region was made up of one or more territories and each territory had only one region.

It’s rare to hear me say “MDX was easier” but in this case, MDX was easier…

SQL Saturday Attendee Distances

I have a long, long post on figuring out how far SQL Saturday attendees tend to drive:

Before I begin, allow me to perform the data science Airing of Grievances.  This is an important part of data analysis which most people gloss over, instead jumping right into the “clean up the dirty data” phase.  But no, let’s revel in its filth for just a few moments.

Despite my protestations and complaints, I think there are some reasonable conclusions.  If you need to look like you’re working for a couple of hours (or at least want to play around a bit with SQL and R), this is the post for you.

Powershell Step Type Bug

Derik Hammer notes that there is a bug in 2016 with SQL Agent jobs which have Powershell step types:

When executed through the SQL Agent, the SQLPS.exe mini-shell is called and the current working directory is switched to the SQLSERVER:\ provider. When you call a cmdlet that uses the FILESYSTEM provider under the context of the SQLSERVER provider the cmdlet will fail.

Derik has a Connect ticket open for this bug as well.

Categories

October 2018
MTWTFSS
« Sep  
1234567
891011121314
15161718192021
22232425262728
293031