Author: Kevin Feasel

SMO And Clear-Text Passwords

Published 2018-11-05 by Kevin Feasel

Cody Konior looks at a case where SMO can leak SQL authentication passwords:

SMO connects to SQL Server using the ADO.NET SQLClient library which has 13+ years of features which help mask the passwords you pass in for SQL Authentication. SMO bypasses some of those features to often leak the passwords in clear-text.

Even where it would normally be hidden.

Even where you use Persist Security Info introduced in 2005.

Even where you use System.Security.SecureString introduced in 2012.

Though thankfully not where you use System.Data.SqlClient.SqlCredential also introduced in 2012. However… there’s some caveats here too.

We’ll prove it through repeatable tests that can be used to track if Microsoft fix the problem or not.

Read the whole thing.

Comments closed

Automated Testing With Power Query

Published 2018-11-05 by Kevin Feasel

Fred Kaffenberger walks us through query failure with Power Query:

I loved Nar’s post on Automated Testing using DAX. I especially like the rule of always including controls so that business readers can share responsibility for data quality. For my part, I sometimes use hidden pages in Power BI reports to assure myself of data quality. I also set alerts on testing dashboards in the Power BI Service to notify me if something is not right. Sometimes, however, a more proactive approach is needed. So, we’ll be doing automated testing with Power Query.

If the query can’t connect to the data source, it will fail. When this happens, the report in Power BI Service is stale, but accurate. I’m fine with this. It can also happen that the query succeeds but is incomplete. In this case, the result is that the report is wrong. Why does this happen? It can happen because of an overtaxed transactional data source. The ERP or CRM or work order system just can’t deliver the amount of data. Maybe it’s linked SQL tables using ODBC. For whatever reason, the query succeeds, but data is missing. I’m NOT fine with this. The long-term solution is to move to a more reliable data source (data warehouse, anybody?). In the short run, refreshes must be stopped. Stale data is better than bad data.

Also check out the comments.

Comments closed

Protecting Hadoop Clusters From Malware

Published 2018-11-02 by Kevin Feasel

Michael Yoder and Suraj Acharya remind us that Hadoop clusters are made up of computers on a network, which means people will try to install malicious software:

Roughly two years ago there were a spate of attacks against the open source database solution MongoDB, as well as Hadoop. These attacks were ransomware: the attacker wiped or encrypted data and then demanded money to restore that data. Just like the recent attacks, the only Hadoop clusters affected were those that were directly connected to the internet and had no security features enabled. Cloudera published a blog post about this threat in January 2017. That blog post laid out how to ensure that your Hadoop cluster is not directly connected to the internet and encouraged the reader to enable Cloudera’s security and governance features.

That blog post has renewed relevance today with the advent of XBash and DemonBot.

The origin story of XBash and DemonBot illustrates how security researchers view the Hadoop ecosystem and the lifecycle of a vulnerability. Back in 2016 at the Hack.lu conference in Luxembourg, two security researchers gave a talk entitled Hadoop Safari: Hunting for Vulnerabilities. They described Hadoop and its security model and then suggested some “attacks” against clusters that had no security features enabled. These attacks are akin to breaking in to a house while the front door is wide open.

Their advice is simple, but simple is good here: it means you should be able to implement the advice without much trouble.

Comments closed

t-closeness And Data Anonymity

Published 2018-11-02 by Kevin Feasel

John Cook shares some thoughts about k-anonymity and t-closeness:

The idea of k-anonymity is that every database record appears at least k times. If you have a lot of records and few fields, your value of k could be high. But as you get more fields, it becomes more likely that a combination of fields is unique. If k = 1, then k-anonymity offers no anonymity.

Another problem with k-anonymity is that it doesn’t offer group privacy. A database could be k-anonymous but reveal information about a group if that group is homogeneous with respect to some field. That is, the method is subject to a homogeneity attack.

This is intended to be a “get you thinking” type of post, and John does have links to related posts which flesh things out a bit more.

Comments closed

Useful Powershell Aliases For Docker

Published 2018-11-02 by Kevin Feasel

Elton Stoneman shares a few useful aliases in Powershell for managing Docker containers:

Docker PowerShell Alias #2 – drmf

Removes all containers, whether they’re running or not. Useful when you want to reset your running containers and get back to zero:
function Remove-AllContainers {  
    docker container rm -f $(docker container ls -aq)
}
Set-Alias drmf  Remove-AllContainers  
Use with caution

Elton shares several more at the link and also includes a link to a Github gist with them all.

Comments closed

Azure ML Studio Supports R 3.4

Published 2018-11-02 by Kevin Feasel

David Smith notes that Azure ML Studio now supports R version 3.4:

With the Execute R Script module you can immediately use more than 650 R packages which come preinstalled in the Azure ML Studio environment. You can also use other R packages (including packages not on CRAN) and source in R scripts you develop elsewhere (as shown above), although this does require the time to install them in the Studio environment. You can even create custom ML Studio models encapsulating R code for others to use in the drag-and-drop environment.

If you’re new to Azure ML Studio, check out the Quickstart Tutorial for R to learn how use the Execute R Script module, and to check out what’s new in the latest update follow the link below.

Click through for more details.

Comments closed

Finding Who Changed Auto-Tuning Settings On Azure SQL DB

Published 2018-11-02 by Kevin Feasel

Arun Sirpal shows us the extended event to watch to learn who changed that auto-tuning setting:

It is said to be safe, reliable and proven using complex algorithms and built-in intelligence where it can do the following (see this link for more details: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning)

CREATE INDEX – identifies indexes that may improve performance of your workload, creates indexes, and automatically verifies that performance of queries has improved.

DROP INDEX – identifies redundant and duplicate indexes daily, except for unique indexes, and indexes that were not used for a long time (>90 days). Please note that at this time the option is not compatible with applications using partition switching and index hints.

FORCE LAST GOOD PLAN – identifies SQL queries using execution plan that is slower than the previous good plan, and queries using the last known good plan instead of the regressed plan.

Personally I don’t enable the option where it is allowed a “free-for-all” when creating/dropping indexes and forcing certain query plans. I like controlling the change, especially for production databases. To force this concept I wanted to use Extended Events to know when / if someone changed my settings for automatic tuning against my database.

Click through for the script.

Comments closed

Using Microsoft Flow To Find Power BI Data Sources In Use

Published 2018-11-02 by Kevin Feasel

Chris Webb continues his series on using Microsoft Flow to extend Power BI:

The problem with self-service BI is that you never quite know what your users are up to. For example, what data sources are they using? Are there hundreds of Excel files being used as data sources for reports that you don’t know about? If so, where are they? Could they and should they be replaced by a database or something else more robust? In this post I’ll show you how you can use Microsoft Flow and the Power BI REST API (see part 1 to find out how to create a Flow custom connector to call the Power BI API) to get the details of all the data sources used in all of the workspaces of your Power BI tenant.

I’ll admit that doing this turned out to be a bit trickier than I had expected. My plan was to use the GetDatasetsAsAdmin endpoint to get a list of all datasets, loop over each one and then call the (undocumented, but in the REST API’s Swagger file and therefore in my custom connector) GetDatsourcesAsAdmin endpoint to get the datasources in each dataset. Both these endpoints require administrative permissions to call, so I made sure my custom connector had the correct permissions (at least Tenant.Read.All – you can check this in the Azure Portal on the app you registered in Azure Active Directory) and I ran the Flow as a user with Power BI Admin permissions. But I kept getting 404 errors when requesting the data sources for certain datasets .

Chris explains why those 404s appear and what you can do about them.

Comments closed

Investigating UK Traffic With Principal Component Analysis

Published 2018-11-01 by Kevin Feasel

Michael Grogan shows us how to use Principal Component Analysis (PCA) to classify route segments in UK transportation data:

Specifically, let us assume that we wish to analyze traffic density for buses and coaches. The main thing we are interested in is the frequency of traffic across a particular route.

Let’s take an example. If buses cover 100 miles on a route that is 5 miles long within a certain timeframe, then the frequency will be greater than 100 miles covered on a route that is 10 miles long over the same time period.

Read on for an interesting example.

Comments closed

Checking Functional Dependencies In R Data Frames

Published 2018-11-01 by Kevin Feasel

John Mount shows us how to use the psagg function in wrapr to ensure that functional dependencies are valid:

Notice only grouping columns and columns passed through an aggregating calculation (such as max()) are passed through (the column zis not in the result). Now because y is a function of x no substantial aggregation is going on, we call this situation a “pseudo aggregation” and we have taught this before. This is also why we made the seemingly strange choice of keeping the variable name y (instead of picking a new name such as max_y), we expect the y values coming out to be the same as the one coming in- just with changes of length. Pseudo aggregation (using the projection y[[1]]) was also used in the solutions of the column indexing problem.

Our wrapr package now supplies a special case pseudo-aggregator (or in a mathematical sense: projection): psagg(). It works as follows.

In this post, John calls the act of grouping functional dependencies (where we can determine the value of y based on the value of x, for any number of columns in y or x) pseudo-aggregation.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel

Docker PowerShell Alias #2 – drmf

Docker PowerShell Alias #2 – `drmf`