
Author: Kevin Feasel

AWS Data Lake

Nick Corbett announces that Amazon is rolling out their own data lake solution:

Separating storage from processing can also help to reduce the cost of your data lake. Until you choose to analyze your data, you need to pay only for S3 storage. This model also makes it easier to attribute costs to individual projects. With the correct tagging policy in place, you can allocate the costs to each of your analytical projects based on the infrastructure that they consume. In turn, this makes it easy to work out which projects provide most value to your organization.

The data lake stores metadata in both DynamoDB and Amazon ES. DynamoDB is used as the system of record. Each change of metadata that you make is saved, so you have a complete audit trail of how your package has changed over time. You can see this on the data lake console by choosing History in the package view.

Having a competitor in the data lake space is a good thing for us. Based on this intro post, though, it seems that Amazon and Microsoft are taking different approaches to the data lake: Microsoft wants you to stay in the data lake (e.g., writing U-SQL or Python statements to query it), whereas Amazon wants you to shop the data lake and check out specific S3 buckets and files for your own processing.


Python Support In Azure Data Lake

Saveen Reddy announces that Python is now a first-tier language in the Azure Data Lake:

This week, we are announcing even more support for Python. As of today, Python is a first-class language supported by our management SDKs. This enables you to develop applications or automate the Data Lake services. Check out our Getting Started articles, which now include many Python samples.

Saveen has a Jupyter notebook which demonstrates Python in Azure Data Lake Store.


Canary Tests

Kendra Little gives her thoughts on how to identify a good DBA team:

What I learned was not to judge a team by their SQL Server. Some configurations may look problematic, but make a lot more sense when I talk to the team and dig into problems they’re facing.

For instance, there’ve been many times when a team was facing a performance issue, and at first glance their SQL Server looked stupidly underprovisioned in terms of memory. Upon digging into the problem I found that adding more memory wouldn’t solve their particular problem. One size doesn’t usually fit all.

Read on for hints and thoughts.


Azure Functions

Steph Locke has taken a shine to Azure Functions:

Azure Functions take care of all the hosting, all the retry logic, all the parallelisation, all the authentication gubbins, all the monitoring for you. The only bits of code you really have to write are the important stuff – the code that implements the business process. This makes a coding project go from >500 lines to <50, and it should be better quality too! This is super handy for data integration, and I would recommend it over and above Data Factory, unless you need to do some Hadoop stuff and maybe not even then.

The wag in me says that with F#, you could take it from 50 lines to 10…  Read the whole thing.


Row-Level Security With Power BI

Callum Green shows how to use row-level security with Power BI Desktop:

In the June 2016 monthly Power BI release, Row Level Security (RLS) was introduced into Power BI desktop. This is great news for people using the application, especially as the configuration is stored within the Power BI model.  Previously, you had to create the security in the web environment, which could easily be overwritten when publishing multiple times from a desktop workbook.

In this blog, I will show you how to set up RLS in Power BI desktop and how to test it works. My example uses the AdventureWorksDW2014 database (download here), specifically applying permissions for a manager. Each manager will only be able to see data for the Sales Representatives that report to them.

This is different from the SQL Server 2016 feature of the same name, but the concept is the same.
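For comparison, the SQL Server 2016 feature is implemented in T-SQL with an inline predicate function bound to a table by a security policy. Here is a minimal sketch of that approach; the table, function, and column names are hypothetical:

    -- Inline table-valued function that grants access only when the row's
    -- (hypothetical) SalesRepUser column matches the current database user.
    CREATE FUNCTION dbo.fn_SalesRepPredicate (@SalesRepUser AS sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
        SELECT 1 AS AccessGranted
        WHERE @SalesRepUser = USER_NAME();
    GO

    -- Security policy binding the predicate to a hypothetical sales table,
    -- so each representative sees only his or her own rows.
    CREATE SECURITY POLICY dbo.SalesRepFilter
        ADD FILTER PREDICATE dbo.fn_SalesRepPredicate(SalesRepUser)
        ON dbo.FactResellerSales
    WITH (STATE = ON);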


SQL Server For Linux Tools

Sanjay Nagamangalam looks at different tools you can use to connect to SQL Server:

  • New SQL command line tools for Linux: We’ve created Linux-native versions of your favorite SQL command line tools such as sqlcmd, bcp, and sqlpackage, and also added the new mssql-conf tool that lets you configure various properties for the SQL Server instance on Linux (e.g., SA password, TCP port, and collation).

  • New versions of SSMS, SSDT and SQL PowerShell: We have released updated versions (v17.0 RC1) of our flagship SQL Server tools including SQL Server Management Studio (SSMS), Visual Studio SQL Server Data Tools (SSDT) and SQL PowerShell with support for the SQL Server v.Next on Windows and Linux.

They also have a plugin for Visual Studio Code, which can be helpful if you’re running on Linux.


OUTPUT With Identity Columns

Kenneth Fisher walks through a scenario trying to archive data using the OUTPUT clause, but where there is an identity column involved:

So what’s the problem? Well, to start with, by using the SELECT TOP 0 INTO Archive method of creating the archive table, you’ve created the archive table with an identity column. Get rid of that identity property (you still want the column) and you don’t have a problem. But the really cool part of this is that it reminded me that the OUTPUT clause has a field list.
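To make the scenario concrete, here is a minimal sketch with hypothetical table and column names, showing the identity trap and the field-list fix:

    -- SELECT TOP 0 ... INTO copies the IDENTITY property into the archive
    -- table, which blocks inserting explicit OrderID values later.
    SELECT TOP (0) * INTO dbo.OrdersArchive FROM dbo.Orders;
    DROP TABLE dbo.OrdersArchive;

    -- Instead, create the archive table without the identity property...
    CREATE TABLE dbo.OrdersArchive
    (
        OrderID int NOT NULL,
        CustomerName varchar(100) NOT NULL,
        OrderDate datetime2 NOT NULL
    );

    -- ...and let the OUTPUT clause's field list route the deleted rows.
    DELETE FROM dbo.Orders
    OUTPUT Deleted.OrderID, Deleted.CustomerName, Deleted.OrderDate
        INTO dbo.OrdersArchive (OrderID, CustomerName, OrderDate)
    WHERE OrderDate < DATEADD(YEAR, -1, SYSDATETIME());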

Also check out the comments for another solution.


Sharing Power Query Queries

Chris Webb shows how to use Azure Data Catalog to share queries from Power Query:

While I’m really happy to have this functionality back, and I think a lot of people will find it useful, there’s still a lot of room for improvement. Some thoughts:

  • This really needs to be extended to work with Power BI Desktop too. In fact, it’s such an obvious thing to do it must be happening soon…?

Given how quickly the Power BI team iterates, that’s probably the case.  Anyhow, read the whole thing.


Forcing Polybase External Pushdown

I have a post showing how to control predicate pushdown in Polybase:

As a reminder, in order to allow predicate pushdown to occur, we need to hit a Hadoop cluster; we can’t use predicate pushdown on other systems like Azure Blob Storage.  Second, we need to have a resource manager link set up in our external data source.  Third, we need to make sure that everything is configured correctly on the Polybase side.  But once you have those items in place, it’s possible to use the FORCE EXTERNALPUSHDOWN command like so:
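Since the “like so” above points at a code sample in the original post, here is a minimal sketch of both hints against a hypothetical external table (the full setup is in the post itself):

    -- Assumes an external table over a Hadoop cluster whose external data
    -- source has RESOURCE_MANAGER_LOCATION configured; names are hypothetical.
    SELECT f.FlightNum, f.ArrDelay
    FROM dbo.ExternalFlights f
    WHERE f.ArrDelay > 120
    OPTION (FORCE EXTERNALPUSHDOWN);

    -- The opposite hint prevents Polybase from creating a MapReduce job.
    SELECT f.FlightNum, f.ArrDelay
    FROM dbo.ExternalFlights f
    WHERE f.ArrDelay > 120
    OPTION (DISABLE EXTERNALPUSHDOWN);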

There’s also discussion of preventing MapReduce job creation as well as a pushdown-related error I had received in the past.


Memory-Optimized Transaction Logging

Raul Gonzalez looks at how transaction logging works with memory-optimized tables:

Now we need double the rows, because for each row we’ve said is deleted, we have to tell SQL Server that it was not actually deleted (a COMPENSATION record due to the ROLLBACK) in case of recovery (crash recovery or backup recovery). That’s so bad.

But not everything is lost yet :) let’s check how the In-Memory engine deals with this problem.

Memory-optimized tables are pretty neat.
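The logging difference stems from the fact that In-Memory OLTP writes log records only when a transaction commits, so rolled-back work never reaches the log and no compensation records are needed. If you want to experiment, here is a minimal durable memory-optimized table, assuming the database already has a MEMORY_OPTIMIZED_DATA filegroup; the names are hypothetical:

    -- SCHEMA_AND_DATA makes the table durable: changes are logged, but only
    -- at commit time. SCHEMA_ONLY would bypass the transaction log entirely.
    CREATE TABLE dbo.SessionState
    (
        SessionID int NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 131072),
        Payload varbinary(1024) NULL,
        LastTouch datetime2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);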
