Automating Azure Data Lake Storage ACLs

Shannon Lowder shows how to automate Azure Data Lake Storage access control lists:

Now that you have these, you can use a for each loop to set your permissions.

foreach ($ACL in $ACLs) { write-host "Grant $useremail " $ACL[1] " access to " $ACL[0]; Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1] Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1] -Default

Now, for each permission, we’ll set the ACL and the default.  Why set both?  Well, when folders are created under each of the target folders, you want to cascade those permissions down from parent to child, right?  Well, that’s what the Default ACL controls.  If you skip the second Set-AzureRMDataLakeStoreItemAclEntry, then new folders would not inherit the permissions of the containing folder and your users would be unable to access their files properly.

Read the whole thing.  Shannon also has one of the very few valid use cases for 3D pie charts.

Dealing With Word Tensors

Chris Moody continues his series on natural language processing:

Counting and tensor decompositions are elegant and straightforward techniques. But these methods are grossly underepresented in business contexts. In this post we factorized an example made up of word skipgrams occurring within documents to arrive at word and document vectors simultaneously. This kind of analysis is effective, simple, and yields powerful concepts.

Look to your own data, and before throwing black-box deep learning machines at them, try out tensor factorizations!

He has a set of animated GIFs to help with learning, though I do wish they were about 30% slower so you can take a moment to read each section before it jumps to the next bit.

Cosmos DB Cheat Sheet

Kevin Feasel



Melody Zacharias shows us a cheat sheet for Cosmos DB:

The Cosmos DB by Microsoft is their globally distributed, horizontally scalable, multi-model database service that is available through Azure.  Released in 2014, it is the ideal DB for globally distributed applications.  Formerly called DocumentDB Cosmos it now supports querying documents using SQL as a JSON query language.  As a Schema-free platform, it provides automatic indexing of JSON documents without requiring an explicit schema or creation of secondary indexes. For those of use not well versed in JSON, this query cheat-sheet, has come to our rescue.  It outlines common queries to retrieve information from 2 JSON documents.

Microsoft has put together this cheat-sheet to help you write your queries faster.  This quick reference is a single page PDF that you can print, or keep in a handy computer file.  This is version 4, so it just keeps getting better!

Click through for the link to the cheat sheet.

Phantom Reads

Arun Sirpal sees not-quite-there-yet transactions:

With Halloween around the corner what better topic to discuss than phantom reads. A phantom read occurs when rows have been inserted after a read operation and becomes visible in a follow-up read operation within the same transaction.

I will show you what this looks like with an example. Please note the code will not work for you because I dropped the foreign key constraint within the person.person table just so I could show you the example.

Arun mentions that the serializable isolation level prevents phantom reads.  So do the repeatable read and snapshot isolation levels.

Updating Multiple Tables In One Statement

Kevin Feasel



Denis Gobo tries different methods of updating two tables in one statement:

Every now and then you will get someone asking how to update two tables with one statement in SQL Server. The answer is usually, no that is not possible… the person then walks away muttering something about how MySQL allows it.

So I decided to try to see if I could update two tables with one statement. I decided to try a couple of different things

  • view

  • common table expression

  • indexed view

  • instead of trigger

The results are generally negative.  I get the benefit of updating multiple tables in a single query, particularly when you have fairly complex logic around determining which records to update, but if you just wrap things in a transaction, at least you get the benefits of atomicity.  And if you often have to update a bunch of different tables at the same time, there might be an issue with your data model.

Stress Testing SQL Server

Jes Borland shows how to use ostress to perform load testing against a SQL Server instance:

Ostress allows you to specify one file, or a folder that contains multiple files, to run. You can also specify a number of connections to be made to the database, to simulate multiple users or applications running the same query. Each connection can then run the file one or more times.

The next thing you’ll need is one or more .sql files that the tool will run.

To run a load test, you’ll open RML cmd prompt and enter your command.

Ostress isn’t as nice as a replayable trace for generating production loads, but it’s an easy method to stress test a server.


October 2017
« Sep Nov »