Press "Enter" to skip to content

Author: Kevin Feasel

Community Localization For Cross-Platform Tools

Mona Nasr and Andy Gonzalez are looking for tool translation support:

The community has completed translations of the VS Code SQL Server extension for six languages: Brazilian Portuguese, French, Japanese, Italian, Russian, and Spanish.

We still need help with other languages. If you know anyone with language expertise, refer them to the Team Page.

Your contributions are valuable and will help us improve the product in your languages. We hope to continue working with the community in future projects.

Hit up the Team Page link to learn more about how to contribute.

Comments closed

Linear Regression In SQL

Phil Factor shows how to generate a quick linear regression using SQL, PowerShell, and Gnuplot:

It looks a bit like someone has fired a shotgun at a wall, but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here, so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It’s good to know that this is possible, but I’d switch to R or Python long before resorting to this.
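If you want to sanity-check the arithmetic outside the database, here is a rough sketch of the same least-squares calculation in Python (the sample values are made up for illustration; they are not Phil's data):

from math import sqrt

# Made-up sample points standing in for the scatterplot data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 3.4, 4.8, 5.1, 6.3]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope = sxy / sxx                      # covariance / variance of x
intercept = mean_y - slope * mean_x
correlation = sxy / sqrt(sxx * syy)    # Pearson's r

print(f"y = {intercept:.3f} + {slope:.3f}x, r = {correlation:.3f}")

The slope, intercept, and correlation all come from the same three aggregates, which is why the technique ports so directly between SQL and other languages.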

Comments closed

Power BI Row-Level Security

Steve Hughes has some resources on implementing row-level security in Power BI:

Row-level security is the ability to filter content based on a user’s role. There are two primary ways to implement row-level security in Power BI – through Power BI or using SSAS. Power BI has the ability in the desktop to create roles based on DAX filters which affect what users see in the various assets in Power BI.

In order for this to work, you will need to deploy to a Workspace where users only have read permissions. If the members of the group associated with the Workspace have edit permissions, row-level security in Power BI will be ignored.

Read on for more details as well as a set of how-to links.

Comments closed

Managing The Pace Of Change

Kellan Danielson and the rest of the Power Pivot Pro team discuss the pace of change in the data platform:

@djharshany I’ve found Pocket (https://getpocket.com/) really useful for saving items for later. I’m on a schedule as well – I save a lot of articles and then pore through them when I’m on an airplane or waiting in line somewhere. #productivityhack

I think this furious pace of technological development has made me much more aware 1) of the amount of noise out in the world that I’m safe ignoring and 2) of how we need to stay vigilant in producing content that cuts through the noise.

Given that these are people who specialize in the fastest-moving part of the Microsoft data platform, it’s worth getting their thoughts on the rapid pace of change.

Comments closed

Text Normalization With Spark

Engineers at Treselle Systems have put together a two-part series on text normalization using Apache Spark.  First, they walk through normalizing the text:

We have used the Spark shared variable “broadcast” to achieve distributed caching. Broadcast variables are useful when large datasets need to be cached in executors. “stopwords_en.txt” is not a large dataset, but we have used it in our use case to make use of that feature.

What are Broadcast Variables?
Broadcast variables in Apache Spark are a mechanism for sharing read-only variables across executors. Without broadcast variables, these variables would be shipped to each executor for every transformation and action, which can cause network overhead. However, with broadcast variables, they are shipped once to all executors and are cached for future reference.
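As a minimal sketch of that pattern (PySpark here; the input path and local SparkContext setup are illustrative rather than taken from the article):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Build the stopword set on the driver, then broadcast it once so each
# executor caches a read-only copy instead of receiving it with every task.
with open("stopwords_en.txt") as f:
    stopwords = set(line.strip().lower() for line in f if line.strip())

bc_stopwords = sc.broadcast(stopwords)

lines = sc.textFile("input_text.txt")
tokens = (lines.flatMap(lambda line: line.lower().split())
               .filter(lambda word: word not in bc_stopwords.value))

print(tokens.take(10))

The important bit is that the closure references bc_stopwords.value rather than the raw set, so only the lightweight broadcast handle travels with each task.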

From there, they dig into details on what the Spark engine did and why we see what we do:

Note: Stage 2 has both reduceByKey() and sortByKey() operations, and as indicated in the job summary, the saveAsTextFile() action triggered Job 2. Do you have any guess whether Stage 2 will be further divided into other stages in Job 2? The answer is yes. Job 2 is triggered by the saveAsTextFile() action, and its DAG clearly indicates the list of operations used before saveAsTextFile(). Because Stage 2 in Job 1 has both reduceByKey() and sortByKey(), and both operations can shuffle the data, Stage 2 from Job 1 is broken down into Stage 4 and Stage 5 in Job 2. There are three stages in this job, but Stage 3 is skipped; the reason is explained in the “What does ‘Skipped Stages’ mean in Spark?” section below.
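To make those stage boundaries concrete, here is a rough PySpark sketch of a pipeline with the same shape (word-count style; the paths are illustrative, not from the article). Each shuffling operation closes a stage, and the saveAsTextFile() action is what triggers the job:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

counts = (sc.textFile("input_text.txt")
            .flatMap(lambda line: line.lower().split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)   # shuffle: one stage boundary
            .sortByKey())                      # shuffle: another stage boundary

counts.saveAsTextFile("word_counts_out")       # action: triggers the job

Stages whose shuffle output already exists from an earlier job show up as skipped in the Spark UI, which is what the article's “Skipped Stages” section goes on to explain.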

There’s some good information here if you want to become more familiar with how Spark works.

Comments closed

SQL Server Migration With dbatools

Garry Bargsley gives an example of how quickly you can migrate a SQL Server instance:

So in just 1 minute and 34 seconds you have migrated all of the following from one server to another.

  • All SP_Configure settings

  • Any Custom Error Messages

  • Any SQL Credentials

  • All Linked Servers

  • Database Mail along with Configuration and Profiles and Accounts

  • All user objects in System Databases

  • All Backup Devices

  • Any System Triggers

  • All User Databases

  • All Logins

  • Any Data collectors

  • Any Security Audits

  • All Endpoints, Policy Management, Resource Governor, Extended Events

  • And Finally All SQL Server Agent Jobs, Schedules, Operators, Alerts

These are probably very small databases (as it was a test instance), but dbatools is quite impressive.

Comments closed

Tidyverse Updates

Hadley Wickham has two announcements.  First, for a slew of tidyverse packages:

Over the past couple of months there have been a bunch of smaller releases to packages in the tidyverse. This includes:

  • forcats 0.2.0, for working with factors.
  • readr 1.1.0, for reading flat-files from disk.
  • stringr 1.2.0, for manipulating strings.
  • tibble 1.3.0, a modern re-imagining of the data frame.

This blog post summarises the most important new features, and points to the full release notes where you can learn more.

Second, a new version of dplyr is coming:

dplyr 0.6.0 is a major release including over 100 bug fixes and improvements. There are three big changes that I want to touch on here:

  • Databases
  • Improved encoding support (particularly for CJK on Windows)
  • Tidyeval, a new framework for programming with dplyr

You can see a complete list of changes in the draft release notes.

You can already get a tech preview of the new dplyr if you’re interested in trying it out.

Comments closed

Data Classification In Power BI

Steve Hughes describes how Power BI data classification works:

Power BI Privacy Levels “specify an isolation level that defines the degree that one data source will be isolated from other data sources”. After working through some testing scenarios and trying to discover the real impact on data security, I was unable to effectively show how this might have any bearing on data security in Power BI. During one test I was shown a warning about using data from a website with data I had marked Organizational and Private. In all cases, I was able to merge the data in the query and in the relationships with no warning or filtering. All of the documentation makes the same statement, and most bloggers simply restate what is found in the Power BI documentation, so they were not helpful either. My takeaway after reviewing this for a significant amount of time is to not consider these settings when evaluating data security in Power BI. I welcome comments or additional references which actually demonstrate how this isolation works in practice. In most cases, we are using organizational data within our Power BI solutions, which will not be impacted by this setting, and we may find improved performance when disabling it.

As Steve notes, this is not really a security feature.  Instead, it’s intended to be more of a warning to users about which data is confidential and which is publicly sharable.

Comments closed