Kevin Feasel – Page 976

As you add more and more extensions to Azure Data Studio, keeping them up to date could present a challenge. Of course we will want to install all the extensions that we think might be helpful to us. Luckily Azure Data Studio has a nice way to let us know if there is an extension that is out of date.

Click through to see how.

Comments closed

Bulk Removing Reshare Permissions in Power BI

Published 2019-10-08 by Kevin Feasel

Gilbert Quevauvilliers takes us through bulk changing dataset permissions in Power BI:

Whilst this sometimes is what is required, I always suggest to people to remove this option and rather manage it by exception.
If you do leave this option enabled, it means if I had to share a report with Bob, Bob would then be able to reshare the report with someone else. This could potentially lead to other people gaining access to a report or data that they should not see.
Fortunately, I found a way to quickly remove the reshare option in the Power BI Service.

Click through to see how it’s done.

Comments closed

Figuring Dataflow Boundaries

Published 2019-10-08 by Kevin Feasel

Matthew Roche gives some advice on how large to make artifacts in Power BI dataflows:

This post started as a response to this question from Mark, who was commenting on last week’s data lineage post:
How would you decide how big or how small to make each artifact in the lineage, in terms of the amount of transformations taking place inside the artifact? In my case they would only be shared with 2-3 other users.
For instance I could go all out and have every step that would previously take place in a query editor result in a new link in the data lineage chain, but that would probably be overkill.
I agree that “one step per dataflow” would be overkill, but beyond that the answer is largely “it depends.”

Read on to see on which factors it depends.

Comments closed

Mitigating SQL Injection with SQL Server 2019

Published 2019-10-08 by Kevin Feasel

Grant Fritchey is sick of SQL injection:

Instead, let’s talk about some of the common vectors of SQL Injection. Obviously, building and executing strings is the biggest issue. Appropriate use of parameters will do more to the fix the problem than almost any other step. However, it’s also enhanced by bad code on the front-end which doesn’t appropriately clean the data, inappropriate error handling, bad security, bad data isolation, and more.
The keys to the attack are to get back a few bits of information, usually in error messages in the case of a normal attack, or, through the use of the WAITFOR command in a blind attack (for more detail, I’m talking about this stuff at the PASS Summit). Getting error messages with information about the database makes it easier for me to hack your system (if I was evil). Knowing that I have a SQL Injection vector through the WAITFOR command helps me target appropriate systems (if I was evil).

For the most part, SQL injection isn’t a SQL problem—it’s an application problem (save for the case when you generate dynamic SQL and concatenate in input parameters). SQL Server-based solutions will only do a little bit; fixing the app code is the best answer.

Comments closed

Installing Python Libraries on EMR Clusters with Notebooks

Published 2019-10-07 by Kevin Feasel

Parag Chaudhari shows how we can install Python libraries on existing ElasticMapReduce clusters using EMR Notebooks:

The notebook-scoped libraries discussed previously require your EMR cluster to have access to a PyPI repository. If you cannot connect your EMR cluster to a repository, use the Python libraries pre-packaged with EMR Notebooks to analyze and visualize your results locally within the notebook. Unlike the notebook-scoped libraries, these local libraries are only available to the Python kernel and are not available to the Spark environment on the cluster. To use these local libraries, export your results from your Spark driver on the cluster to your notebook and use the notebook magic to plot your results locally. Because you are using the notebook and not the cluster to analyze and render your plots, the dataset that you export to the notebook has to be small (recommend less than 100 MB).

Read the whole thing.

Comments closed

From Kafka to Pulsar

Published 2019-10-07 by Kevin Feasel

Avaro Santos Andres has arguments for migrating from Apache Kafka to Apache Pulsar:

Imagine you have thousands or millions of devices sending data to your data lake. This data must be managed with speed, security, and reliability. In addition, for legal reasons you must partition data by country, device, and city. These requirements seem reasonable, and in 2019, stream-processing platforms must be able to deal with them.
But how well do they? Kafka is not known to work well when there are thousands of topics and partitions even if the data is not massive. You can see how complicated it can be to try to solve performance challenges in these scenarios.

I like this sort of competition, as I know Kafka will step up their game as a result.

Comments closed

Dynamically Controlling Power Query Columns

Published 2019-10-07 by Kevin Feasel

Erik Svensen wants to display columns dynamically in Power Query:

This means that even though we might add new columns to the ProductsAttributes table – it will still only be Brand that is expanded and only that column.
The bolded arguments is 2 lists that contains the Column names to expand and the new names of the columns – the last argument is optional so we can actually skip that if we want the original names – https://docs.microsoft.com/en-us/powerquery-m/table-expandtablecolumn

Read on to see how to do this.

Comments closed

Ordered Clustered Columnstore Indexes in Azure SQL DW

Published 2019-10-07 by Kevin Feasel

Niko Neugebauer takes us through a new feature in preview for Azure SQL Data Warehouse:

After creating (or dropping and recreating a Clustered Columnstore Index we can specify the reserved word ORDER and then one or !!!MULTIPLE!!! columns. This looks like an extremely promising feature!
On Azure SQL Data Warehouse one can of course define table as a Columnstore and with that specification it is also possible to define an ORDER option with one or multiple columns.
For the syntax and basic functionality testing purposes on Azure SQL Data Warehouse, let us then create a table with a Clustered Columnstore Index, load some data and see if by recreating an Ordered Clustered Columnstore Index we can achieve some improvements.

Niko has a few hard-earned lessons from this post.

Comments closed

When Readers Block Writers

Published 2019-10-07 by Kevin Feasel

Erik Darling takes us through a scenario where readers can block writers for an extended amount of time:

To hold onto Shared locks, you’d need to use an isolation level escalation hint, like REPEATABLE READ.
I could do that here if I were a lazy cheater.
Instead, I’m going to show you a more common and interesting scenario.

This leaves the classes of non-lazy cheater and a lazy non-cheater (because non-lazy non-cheater sounds batty). Regardless of your answer, great post by Erik.

Comments closed

Deploying CUs to Multiple Instances with Powershell

Published 2019-10-07 by Kevin Feasel

Jeff Iannucci embraces the power of the shell:

This all started because we had some 14 new SQL Server 2017 instances that we were setting up, but we hadn’t yet applied the most recent cumulative update that we are using in our environment. I started using the Update-DbaInstance cmdlet in the script below to apply to one server, but then I looked at the list of outstanding requests and thought about something Buck Woody once told me.
“You don’t have time for that. You’re going to be dead soon.”
He’s a fantastic fellow, but we should all be grateful he didn’t become a physician.

Click through for the five-line script and an explanation of what each line does.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel

Keeping Track of ADS Extension Updates

Bulk Removing Reshare Permissions in Power BI

Figuring Dataflow Boundaries

Mitigating SQL Injection with SQL Server 2019

Installing Python Libraries on EMR Clusters with Notebooks

From Kafka to Pulsar

Dynamically Controlling Power Query Columns

Ordered Clustered Columnstore Indexes in Azure SQL DW

When Readers Block Writers

Deploying CUs to Multiple Instances with Powershell