Kevin Feasel – Page 1018

Two Performance Tricks for Spark SQL

Published 2020-02-20 by Kevin Feasel

Divyansh Jain shares a couple of tips when optimizing Apache Spark code:

1. Avoid UDFs. But why?
Because internally, Catalyst cannot look inside a UDF: it treats it as a black box and does not optimize it, so you lose the optimizations it would otherwise apply. Instead, try using the Spark SQL API (its built-in functions) to develop your application.

Click through for a demo and for the second tip.
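To see why an opaque function blocks optimization, here is a toy model in Python. It is not Spark's implementation, and the class and function names are hypothetical. The scanner can skip whole partitions using min/max statistics when the predicate is a structured expression it can inspect, but it must read every partition when the predicate is an arbitrary callable. This is roughly why Catalyst cannot push a Python UDF predicate down to a source such as Parquet. In PySpark terms, the contrast is between filtering on `F.col("age") > 30` and filtering through a function wrapped with `pyspark.sql.functions.udf`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, int]

@dataclass(frozen=True)
class GreaterThan:
    """A predicate the toy 'optimizer' can inspect: column > value."""
    column: str
    value: int

    def matches(self, row: Row) -> bool:
        return row[self.column] > self.value

@dataclass
class Partition:
    rows: List[Row]
    stats: Dict[str, Tuple[int, int]]  # column -> (min, max) within this partition

Predicate = Union[GreaterThan, Callable[[Row], bool]]

def scan(partitions: List[Partition], predicate: Predicate) -> Tuple[List[Row], int]:
    """Return the matching rows and how many partitions were actually read."""
    matched: List[Row] = []
    read = 0
    for part in partitions:
        if isinstance(predicate, GreaterThan):
            _, col_max = part.stats[predicate.column]
            if col_max <= predicate.value:
                continue  # data skipping: no row in this partition can match
            test = predicate.matches
        else:
            test = predicate  # opaque callable: nothing to reason about, so read it all
        read += 1
        matched.extend(row for row in part.rows if test(row))
    return matched, read

def make_partition(ages: List[int]) -> Partition:
    return Partition(rows=[{"age": a} for a in ages], stats={"age": (min(ages), max(ages))})

partitions = [make_partition([10, 20]), make_partition([25, 28]), make_partition([40, 50])]

structured, structured_reads = scan(partitions, GreaterThan("age", 30))
opaque, opaque_reads = scan(partitions, lambda row: row["age"] > 30)
print(structured_reads, opaque_reads)  # 1 3
```

Both predicates return the same rows; the difference is only in how much data has to be read to find them.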


Digging Into Bar Charts

Published 2020-02-20 by Kevin Feasel

Alex Velez takes us through the humble bar chart:

Our eyes start at the base and scan towards the end of each bar. We measure the lengths relative to both the baseline and the other bars, so it’s a straightforward process to identify the smallest or the largest bar. We can also see the negative space between varying heights of bars to compare the incremental difference between them.
Not only are these graphs easy to read, but they are also widely recognized. Chances are, you’ve already encountered a standard horizontal or vertical bar chart. But bars come in many shapes and sizes. I’ll list below a few of the most common variations, with links to examples.

Click through for some good information on bar charts, including design tips.


Disaster Recovery for Your Workstation

Published 2020-02-20 by Kevin Feasel

Randolph West explains that disaster recovery isn’t just for your servers:

I just completed a chapter for another book where I spoke about the Recovery Point Objective (how much data you are prepared to lose) and Recovery Time Objective (how long you have to bring your environment up again) after a disaster, and while I never get tired of repeating myself, that’s SQL Server. What happens if your development environment — or workstation — experiences a catastrophic failure?
Or what if, say, you’re on a cruise ship in the middle of the ocean with Internet access and a phone (but no laptop) and your on-call person just died? (I’ll leave this as an exercise for the reader to decide if this really happened.)
The answer is, if we do a careful bit of planning using the same disaster recovery principles we already know, the impact could be minimal. Note that this post assumes that you have Internet access and are using Microsoft Windows as your environment.

Click through for some useful suggestions.
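The RPO/RTO arithmetic from the quote applies to a workstation as well. Here is a minimal Python sketch. It assumes a backup or sync job that runs at a fixed interval and a rebuild made of sequential steps. The step names and durations are hypothetical, not Randolph's numbers. The worst case for data loss is a failure just before the next backup runs, so the backup interval must not exceed the RPO, and the rebuild steps must add up to no more than the RTO.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Dict

@dataclass
class WorkstationPlan:
    backup_interval: timedelta  # how often files are backed up or synced off the machine
    rebuild_steps: Dict[str, timedelta] = field(default_factory=dict)  # run one after another

    def worst_case_data_loss(self) -> timedelta:
        # A failure just before the next backup loses everything since the previous one.
        return self.backup_interval

    def recovery_time(self) -> timedelta:
        # Sequential steps, so the total time is their sum.
        return sum(self.rebuild_steps.values(), timedelta())

    def meets(self, rpo: timedelta, rto: timedelta) -> bool:
        return self.worst_case_data_loss() <= rpo and self.recovery_time() <= rto

# Hypothetical plan: hourly cloud sync, then a rebuild on a spare machine.
plan = WorkstationPlan(
    backup_interval=timedelta(hours=1),
    rebuild_steps={
        "install OS and updates": timedelta(hours=2),
        "install tools from a scripted list": timedelta(hours=1),
        "restore synced files": timedelta(minutes=30),
    },
)
print(plan.recovery_time())                                        # 3:30:00
print(plan.meets(rpo=timedelta(hours=4), rto=timedelta(hours=4)))  # True
```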


Reading Azure DevOps Results in PowerShell

Published 2020-02-20 by Kevin Feasel

Mark Broadbent doesn’t let the lack of an official PowerShell module get in the way:

In my post Using Azure CLI to query Azure DevOps I explained how you can use the Azure CLI to query Azure DevOps to obtain useful information on builds, releases, and more. The solution required a certain level of skill with JMESPath to manipulate your result sets, which, as explained, can be a little confusing.
However, once you have a bare-bones result set, you will likely want to consume these results in a more user-friendly environment such as PowerShell so that you can build upon these data sets. I thought this would be an easy thing to do, but as you will see below it was anything but.

Read on for some thoughts and a sample script.
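Mark's solution is in PowerShell. As a language-neutral illustration of consuming CLI results outside JMESPath, here is a Python sketch. It assumes JSON shaped like the build records that `az pipelines runs list --output json` returns (fields such as `buildNumber`, `result`, and `definition.name`); the sample data is made up. The failed-run filter is roughly equivalent to the JMESPath query `[?result=='failed'].{name: definition.name, number: buildNumber}`.

```python
import json
from collections import Counter
from typing import List, Tuple

# Hypothetical sample of CLI output; real records carry many more fields.
SAMPLE_RUNS = """
[
  {"buildNumber": "20200220.1", "status": "completed", "result": "succeeded", "definition": {"name": "api-ci"}},
  {"buildNumber": "20200220.2", "status": "completed", "result": "failed", "definition": {"name": "api-ci"}},
  {"buildNumber": "20200220.3", "status": "inProgress", "result": null, "definition": {"name": "web-ci"}}
]
"""

def outcome_counts(runs_json: str) -> Counter:
    """Count runs by result, falling back to status for runs that have not finished."""
    runs = json.loads(runs_json)
    return Counter(run.get("result") or run.get("status") for run in runs)

def failed_runs(runs_json: str) -> List[Tuple[str, str]]:
    """Pipeline name and build number for every failed run."""
    runs = json.loads(runs_json)
    return [(run["definition"]["name"], run["buildNumber"])
            for run in runs if run.get("result") == "failed"]

print(outcome_counts(SAMPLE_RUNS))
print(failed_runs(SAMPLE_RUNS))  # [('api-ci', '20200220.2')]
```

Once the results are native objects, grouping and filtering use ordinary language features instead of a query string.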


Working with Unicode in PowerShell

Published 2020-02-20 by Kevin Feasel

Mark Wilkinson takes us through various problems when working with Unicode text in PowerShell on Windows:

This post is inspired by an odd situation I ran into in a project I’m working on. I need to pull specific revisions of files out of a git repository, save those files, and then execute the contents. This all worked fine until it didn’t. I received some complaints that Unicode characters in the files were getting mangled, and sure enough they were. But why? In this post I’ll explain what happened to me, and ways you can avoid it yourself.

Read on to learn how.
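Mark's post covers the PowerShell specifics, but the underlying failure is easy to reproduce in any language: bytes written as UTF-8 are decoded with a different code page. Here is a minimal Python sketch, assuming Windows-1252 (`cp1252`) as the wrong code page, a common default on Western-locale Windows systems. The function names are hypothetical.

```python
def mangle(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate reading UTF-8 bytes back with the wrong code page."""
    return text.encode("utf-8").decode(wrong_codec)

def repair(mangled: str, wrong_codec: str = "cp1252") -> str:
    """Undo the mistake: re-encode with the wrong code page, then decode as UTF-8."""
    return mangled.encode(wrong_codec).decode("utf-8")

original = "naïve café"
garbled = mangle(original)
print(garbled)                      # naÃ¯ve cafÃ©
print(repair(garbled) == original)  # True
```

Repairing after the fact is fragile, because a few byte values are undefined in cp1252 and other tools may silently drop or replace them. The more reliable fix is to decode explicitly as UTF-8 at the point where the bytes are read.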


Removing Ad Hoc Plans from the Query Store

Published 2020-02-20 by Kevin Feasel

Jeff Iannucci has a script which removes ad hoc plans from the Query Store:

Now, rather than being my usual rambling self, I want to be very direct here: this solution will NOT give you the same behavior as “optimize for ad hoc workloads.” That setting caches only a small stub of the query, without the plan, on the first execution, and then caches the full plan on the second execution.
That’s kinda like a surgeon with a scalpel. What is below is much more drastic. We’re going to break out a chainsaw for Query Store.

Chainsaw solutions to scalpel problems? Now you have my interest.
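The "scalpel" behavior described above can be modeled in a few lines. This is a toy Python model of the plan cache under "optimize for ad hoc workloads"; the class name and messages are hypothetical, and it ignores memory pressure, parameterization, and eviction. The first execution caches only a stub, the second caches the full plan, and later executions reuse it.

```python
class AdHocPlanCache:
    """Toy model of the 'optimize for ad hoc workloads' caching rule."""

    _STUB = object()  # marker: query seen once, plan not kept

    def __init__(self) -> None:
        self._entries = {}  # query text -> _STUB or a pretend compiled plan

    def execute(self, query_text: str) -> str:
        entry = self._entries.get(query_text)
        if entry is None:
            # First execution: compile and run, but keep only a small stub.
            self._entries[query_text] = self._STUB
            return "compiled; stub cached"
        if entry is self._STUB:
            # Second execution: compile again and now keep the full plan.
            self._entries[query_text] = f"plan({query_text})"
            return "compiled; full plan cached"
        return "reused cached plan"

    def cached_plans(self) -> int:
        return sum(1 for e in self._entries.values() if e is not self._STUB)

cache = AdHocPlanCache()
q = "SELECT * FROM dbo.Orders WHERE OrderID = 42"
print([cache.execute(q) for _ in range(3)])
# ['compiled; stub cached', 'compiled; full plan cached', 'reused cached plan']
```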


Finding the Physical Location of a Row

Published 2020-02-20 by Kevin Feasel

Max Vernon breaks out the internals toolbag:

Occasionally I’ve needed to determine the physical location of a row stored in SQL Server. The code in this post uses the undocumented feature %%PHYSLOC%%, which returns a binary value (displayed in hexadecimal) representing the location of each row returned in a SELECT statement. The system table-valued function sys.fn_PhysLocCracker is used to decode the binary value returned by %%PHYSLOC%% into the file_id, page_id, and slot_id for each row.

Read on for a demo. Unlike most demos of this sort, Max is using a partitioned table, so that’s something new.
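The decoding that `sys.fn_PhysLocCracker` performs can be sketched outside SQL Server. As we understand the function's definition, the 8-byte `%%PHYSLOC%%` value holds the page ID in the first four bytes, the file ID in the next two, and the slot ID in the last two, each stored byte-reversed (little-endian). Here is a minimal Python sketch under that assumption. The function names and the sample value are hypothetical, and the `(file:page:slot)` string mimics the format of `sys.fn_PhysLocFormatter`.

```python
import struct
from typing import NamedTuple

class RowLocation(NamedTuple):
    file_id: int
    page_id: int
    slot_id: int

def crack_physloc(physloc: bytes) -> RowLocation:
    """Decode an 8-byte %%PHYSLOC%% value into file, page, and slot IDs."""
    if len(physloc) != 8:
        raise ValueError("expected exactly 8 bytes")
    # '<' = little-endian; I = 4-byte page ID, H = 2-byte file ID, H = 2-byte slot ID
    page_id, file_id, slot_id = struct.unpack("<IHH", physloc)
    return RowLocation(file_id, page_id, slot_id)

def format_physloc(physloc: bytes) -> str:
    loc = crack_physloc(physloc)
    return f"({loc.file_id}:{loc.page_id}:{loc.slot_id})"

# Hypothetical value, as it would appear in a result grid: 0x5001000001000000
value = bytes.fromhex("5001000001000000")
print(crack_physloc(value))   # RowLocation(file_id=1, page_id=336, slot_id=0)
print(format_physloc(value))  # (1:336:0)
```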


Options for Read-Only Licensing with Power BI

Published 2020-02-20 by Kevin Feasel

Reza Rad explains that, depending on how much you’re willing to pay, there are ways of letting users view your dashboards for free:

In most of my presentations around the world, I still often get this question: “Is there a Read-Only license for Power BI?” It often starts with “I have some end users who are not building any reports, and I don’t want to pay for a Developer License for them.” I have written about licensing in Power BI previously; however, I believe that article does not explain it clearly enough, and there are still some questions around it. So here I am going to talk only about this: the Read-Only license for Power BI.

Read on for the answers. It’s not all terrible news, but at the very low end, the answer isn’t great.


Gartner’s Magic Quadrant for Data Science + ML

Published 2020-02-19 by Kevin Feasel

Adam Conway et al. review Gartner’s 2020 Data Science and Machine Learning Platforms Magic Quadrant:

Gartner has released its 2020 Data Science and Machine Learning Platforms Magic Quadrant, and we are excited to announce that Databricks has been recognized as a Leader.

Click through for the quadrant and explanation. It’s an interesting set of results, that’s for sure.


Using Sqoop to Import Data into HDFS

Published 2020-02-19 by Kevin Feasel

Jon Morisi has a primer on Sqoop:

In this article, I’ll walk through using Sqoop to import data to Hadoop (HDFS).
“Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.”

With respect to SQL Server, Sqoop has two good use cases: pulling data from SQL Server into HDFS, and pulling data from HDFS into a staging table in SQL Server.
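Both directions map onto Sqoop's `import` and `export` tools. Here is a minimal Python sketch that builds the command lines. The host, database, user, table names, and HDFS paths are hypothetical placeholders, and it assumes the Microsoft JDBC driver for SQL Server is on Sqoop's classpath. Passing an argument list to `subprocess.run`, rather than a shell string, avoids quoting problems with the semicolon in a SQL Server JDBC URL.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical connection details.
JDBC_URL = "jdbc:sqlserver://sqlhost:1433;databaseName=Sales"
COMMON = ["--connect", JDBC_URL, "--username", "etl_user",
          "--password-file", "/user/etl/.sqlserver.password"]

def sqoop_import(table: str, target_dir: str, mappers: int = 4,
                 split_by: Optional[str] = None) -> List[str]:
    """SQL Server table -> HDFS directory."""
    args = ["sqoop", "import", *COMMON,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(mappers)]
    if split_by:  # column Sqoop uses to divide the work among mappers
        args += ["--split-by", split_by]
    return args

def sqoop_export(staging_table: str, export_dir: str) -> List[str]:
    """HDFS directory -> SQL Server staging table (the table must already exist)."""
    return ["sqoop", "export", *COMMON,
            "--table", staging_table,
            "--export-dir", export_dir]

def run(args: List[str], dry_run: bool = True) -> None:
    print(shlex.join(args))  # shell-safe rendering for logs
    if not dry_run:
        subprocess.run(args, check=True)

run(sqoop_import("Orders", "/data/raw/orders", split_by="OrderID"))
run(sqoop_export("OrdersStaging", "/data/curated/orders"))
```

With `dry_run=True` the sketch only prints the commands, so it can be checked on a machine without Hadoop installed.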



Author: Kevin Feasel