Author: Kevin Feasel

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

Click through for the script.

Embedding Base-64 Encoded Images Into Power BI

Published 2018-01-03 by Kevin Feasel

Jason Thomas shows how to embed an image into Power BI without using an image URL:

And I completely understood his concerns as I had the same issue with some of the public-facing reports that I made, e.g., the US Election report that I made a year ago. The images for the candidates were sourced from Wikipedia, and the images for certain candidates like George Bush, Donald Trump, etc. are not displayed because the image URLs are no longer valid.

This is where you can use my workaround to embed the images within the report by converting the images into Base64.

It’s an interesting approach when you need to solve this problem.
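
As a rough illustration of the conversion step only, here is a minimal Python sketch that turns raw image bytes into a Base64 data URI. The function name `to_data_uri` and the stand-in bytes are assumptions made for this example; how the resulting string is wired into a Power BI report is covered in the linked post, not here.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Return a data URI that carries the image inline as Base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in data: the eight-byte signature that starts every PNG file.
# In practice you would read the bytes of a real image file instead.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(PNG_SIGNATURE)
print(uri)
```

Because the image travels inside the string itself, it keeps rendering even if the original URL goes away, at the cost of a larger text value per image.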

DAX’s EARLIER And EARLIEST Functions

Published 2018-01-03 by Kevin Feasel

Matt Allington contrasts the EARLIER function with the EARLIEST function in DAX:

The EARLIER function by default refers to the row context that immediately preceded the current row context being considered. In the examples used here there have only been 2 row contexts, the outer and the inner. Given there are only 2, when using EARLIER within the inner row context it is always referring to the outer row context. In this case, EARLIER and EARLIEST refer to the same outer row context. If there are more than 2 row contexts, the EARLIER function accesses the previous row context by default (starting from the inner and moving out) and the EARLIEST function refers to the outermost row context. But did you know that EARLIER has a second optional parameter?

Read the whole thing.
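
To make the nesting idea concrete, here is a small Python model of stacked row contexts. It is only an analogy, not DAX and not how the engine is implemented; the class `RowContexts` and its method names are invented for this sketch. The `number` argument mirrors the optional second parameter of EARLIER, which says how many contexts to step outward.

```python
class RowContexts:
    """A stack of row contexts; the last element is the innermost one."""

    def __init__(self):
        self._stack = []

    def push(self, row: dict) -> None:
        """Enter a new (inner) row context."""
        self._stack.append(row)

    def current(self, column: str):
        """Value of the column in the innermost row context."""
        return self._stack[-1][column]

    def earlier(self, column: str, number: int = 1):
        """Value of the column `number` contexts out from the innermost."""
        return self._stack[-1 - number][column]

    def earliest(self, column: str):
        """Value of the column in the outermost row context."""
        return self._stack[0][column]


contexts = RowContexts()
contexts.push({"Sales": 100})  # outermost row context
contexts.push({"Sales": 200})  # middle row context
contexts.push({"Sales": 300})  # innermost row context

one_out = contexts.earlier("Sales")      # steps out to the middle context
two_out = contexts.earlier("Sales", 2)   # steps out to the outermost context
outermost = contexts.earliest("Sales")   # always the outermost context
```

With only two contexts on the stack, `earlier` and `earliest` return the same value, which matches the observation in the excerpt.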

How The DBA Role Is Changing

Published 2018-01-02 by Kevin Feasel

Tom Smith spoke to 22 executives from 21 companies about how the role of Database Administrator is changing:

While developers don’t think they need them, DBAs are still needed for governance to make it easier to analyze data.
DBAs have gone from managing databases to being data engineers across multiple systems. They focus on how data moves from one database to another, the consumption of data, and tuning of the data. Management of the data process across the data landscape is critical until it is distributed and executed automatically.
DBAs have moved from being focused on individual products like SQL Server and Oracle to having to deal with bringing companies’ big data implementation to life.

There are a lot of points here. I agree with many, disagree with a few, and think that some of them are quite context-sensitive. But all are worth thinking about.

Data Lake Zones

Published 2018-01-02 by Kevin Feasel

Melissa Coates walks us through the different layers of a data lake:

As we are approaching the end of 2017, many people have resolutions or goals for the new year. How about a goal to get organized…in your data lake?

The most important aspect of organizing a data lake is optimal data retrieval.

Click through for a great visual showing the various zones in a data lake.

Scraping The PASS Budget

Published 2018-01-02 by Kevin Feasel

Steph Locke shows us how to scrape a PDF, specifically, the PASS operating budget:

With tabulizer, if the data is relatively well formatted in a PDF you can use tabulizer::extract_tables(). This gives you a bunch of data.frames which you can process. Unfortunately, in the case of the PASS budget with 22 pages of tables, including tables that span multiple pages, we’re not so lucky!

We need to fall back to tabulizer::extract_text() and do a lot of wrangling to reconstruct the tables.

Steph shows her work, so click through to see the scripts.
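
The linked scripts are in R. Purely to illustrate the kind of wrangling involved, here is a Python sketch that rebuilds table rows from extracted page text by splitting each line on runs of two or more spaces. The function `rows_from_text` and the sample text are made up for this example and are not taken from the PASS budget or from Steph's code.

```python
import re


def rows_from_text(page_text: str) -> list[list[str]]:
    """Split extracted page text into rows of cells.

    Cells are assumed to be separated by two or more whitespace
    characters, so single spaces inside a label are preserved.
    """
    rows = []
    for line in page_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line))
    return rows


# Hypothetical extracted text; the labels and numbers are placeholders.
sample = (
    "Category          2017      2018\n"
    "\n"
    "Example item A    1,000     1,200\n"
    "Example item B      500       450\n"
)
table = rows_from_text(sample)
```

Real PDF text is rarely this tidy: tables that span pages, wrapped labels, and missing cells all need extra handling, which is why the original post involves so much more work.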

More On Machine Learning Services

Published 2018-01-02 by Kevin Feasel

Ginger Grant continues her Machine Learning Services series with a couple more posts. First up is on memory allocation:

Enabling Machine Learning Services on SQL Server, which I discussed in a previous blog post, requires you to enable external scripts. Machine Learning Services are run as external processes to SQLPAL. This means that when you are running Python or R code you are running it outside of the managed processes of SQL Server and SQLPAL. This design means that the resources used to run Machine Learning Services will run outside of the resources allocated for SQL Server. If you are planning on using Machine Learning Services you will want to review the server memory options which you may have set for SQL Server. If you have set the max server memory, for example, if your server has 16 GB of RAM, and you have allocated 8 GB to SQL Server and you estimate that the operating system will use an additional 4 GB, that means that Machine Learning Services will have 4 GB remaining which it can use.

By design, Machine Learning Services will not starve out all of the memory for SQL Server because it doesn’t use it. This means DBAs do not have to worry about SQL Server processes not running because some R program is using all the memory, as it does not use the memory SQL Server has allocated. You do have to worry about the amount of memory allocated to Machine Learning Services as by default, using our previous example where there was 4 GB which Machine Learning Services can use, it will only use 20% of the available memory or 819 MB of memory. That is not a lot of memory. Most likely if you are doing a lot of Machine Learning Services work you will want to use more memory, which means you will want to change the default memory allocation for external services.
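
The arithmetic in that example is easy to check. The short Python sketch below reproduces it using the figures from the excerpt (16 GB total, 8 GB for SQL Server, an estimated 4 GB for the operating system, and a 20% default share); the function name `external_memory_mb` is invented for this illustration. Twenty percent of the remaining 4 GB comes to about 819 MB.

```python
def external_memory_mb(total_gb, sql_server_gb, os_gb, external_percent=20):
    """Memory, in MB, that external scripts get under the given split."""
    # Whatever SQL Server and the operating system do not claim is left over.
    remaining_gb = total_gb - sql_server_gb - os_gb
    remaining_mb = remaining_gb * 1024
    # Only a percentage of the leftover memory goes to external scripts.
    return remaining_mb * external_percent / 100


default_mb = external_memory_mb(total_gb=16, sql_server_gb=8, os_gb=4)
print(f"{default_mb:.1f} MB")
```

Raising the percentage in the same function shows how much headroom a larger allocation would buy; for instance, 50% of the same 4 GB is 2048 MB.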

Ginger also talks about the Launchpad service:

When calling external processes, internally SQL Server uses User IDs to call the Launchpad service, which is installed as part of Machine Learning Services and must be running for SQL Server to be able to execute code written in R or Python. The number of users is set by default. To change the number of users, open up SQL Server Configuration Manager by typing SQLServerManager14.msc at the run prompt. For some unknowable reason Microsoft decided to hide this application, which was previously available by looking at the installed programs on the server. Now for some reason they think everyone should memorize this obscure command. Once you have the SQL Server Configuration Manager open, right-click on the SQL Server Launchpad service and select the properties, which will show the window, as shown below. You will notice I am running an instance called SQLServer2017, which is listed in parentheses in the window name.

Both are worth reading.

In-Memory OLTP: When You’re Out Of Space

Published 2018-01-02 by Kevin Feasel

Ned Otter shows us what happens when you run out of disk space and you’re using memory-optimized objects:

In my lab, I’m running Windows Server 2012. Let’s use PowerShell to install the File Server Resource Manager, which will allow us to create a quota for the relevant folder:

Add-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools

After installing the Windows feature we can set the quota for the folder, but we shouldn’t enable it just yet, because first we have to verify the current size of the folder.

On my server, I created a quota of 1.5GB, and then enabled it.

Now let’s INSERT rows into the table, in batches of 1000, until we reach the limit (the INSERT script is listed in Part 2; I’m trying to keep this post from getting too long).

Click through to see what happens. It’s not exactly a swath of carnage, but it’s also something you really don’t want to happen.

Simulating Network Latency

Published 2018-01-02 by Kevin Feasel

John Paul Cook shows how to use WANem to simulate network latency in a Hyper-V environment:

Access WANem from either SQL Server virtual machine using a case sensitive URL that includes WANem’s IP address. In this example, the URL is http://99.99.99.99/WANem. Inside the SQL Server virtual machines, I set the browser’s start page to the WANem home page. Create a delay of 1000 msec and retest SQL Server to SQL Server connectivity.

It looks like a good way of proving out whether your setup can handle extreme latency before you build it for real.

Avoid Impersonation And The Trustworthy Flag

Published 2018-01-02 by Kevin Feasel

Solomon Rutzky explains how you can use module signing to avoid the security risks which come with impersonation and setting Trustworthy on:

Admittedly, using Cross-Database Ownership Chaining and/or Impersonation and/or TRUSTWORTHY are quicker and easier to implement than Module Signing. However, the relative simplicity in understanding and implementing these options comes at a cost: the security of your system.

Cross-DB Ownership Chaining:

security risk (can spoof User / DB-level)

db_ddladmin & db_owner users can create objects for other owners

Users with CREATE DATABASE permission can create new databases and attach existing databases

Impersonation:

If IMPERSONATE permission is required:

can be used any time

No granular control over permissions

Cross-DB operations need TRUSTWORTHY ON

Need to use ORIGINAL_LOGIN() for Auditing

Elevated permissions last until process / sub-process ends or REVERT

TRUSTWORTHY:

Bigger security risk

can also spoof Logins, such as “sa” !

If using SQLCLR Assemblies, no per-Assembly control of ability to be marked as either EXTERNAL_ACCESS or UNSAFE; all Assemblies are eligible to be marked as either of those elevated permission sets.

The common theme across all three areas is no control, within a Database, over who or what can make use of the feature / option, or when it can be used.

Read the whole thing.
