Kevin Feasel – Page 1049

From pandas to Spark with koalas

Published 2019-05-24 by Kevin Feasel

Python is widely used programming language when it comes to Data science workloads and Python has way too many different libraries to back this fact. Most of the data scientists are familiar with Python and pandas mostly. But the main issue with Pandas is it works great for small and medium datasets but not so good on big-data workloads. The challenge now becomes to convert the existing pandas code to pyspark code. This is just not straight forward and has a lot of performance hits if python UDFs are used without much care.
Koalas tries to address the first problem ie lessen the friction of learning different APIs to port their existing Pandas code to Pyspark. With Koalas, we can just directly replace the existing pandas code with Koalas. As far as the performance goes, there are no numbers yet as it is still in the initial phase of the project. But this definitely looks promising though.

Read on for some initial thoughts on the product, including a few gotchas.

Comments closed

Storing Large Images in Power BI

Published 2019-05-24 by Kevin Feasel

Chris Webb shows us how to store a large image in Power BI:

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:
http://sqljason.com/2018/01/embedding-images-in-power-bi-using-base64.html
https://blog.gbrueckl.at/2018/01/storing-images-powerbi-analysis-services-data-models/
Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

I don’t understand why they make this so complicated. I have a Grafana dashboard widget that I show in a Power BI dashboard and have it scaled way down so it fits in under 32K. I appreciate Chris’s answer but that’s a lot of work to show an image.

Comments closed

Creating Containers and Volumes

Published 2019-05-24 by Kevin Feasel

Grant Fritchey continues a dive into containers. First up is running a Docker container:

Let’s break this down a bit so you know what you just did. The two ‘-e’ statements are setting environment variables. The first is accepting Microsoft’s end user license agreement, EULA. The second is setting the SA password. By default, we’re running a Developer Edition of SQL Server here. If you want to, you can change to a version that you have a specific license for using the MSSQL_PID environment variable. Documentation for that is located here.

Next is using volumes:

Now, let’s create a new container, but, let’s use the same volume:
docker run -e 'ACCEPT_EULA=Y' ` -e 'SA_PASSWORD=$cthulhu1988' ` -p 1450:1433 ` --name DockerDemo19 ` -v sqlvol:/var/opt/mssql ` -d mcr.microsoft.com/mssql/server:2019-CTP2.5-ubuntu
What happens next is marvelous.

It’s an exciting time to get into containers and if you’re feeling a little trepidatious, they’re containers—the worst thing you can do is mess one up and then you just blow it away and start over.

Comments closed

New Query Store Functionality in 2019

Published 2019-05-24 by Kevin Feasel

Erin Stellato is excited about SQL Server 2019 CTP 3.0:

Friends, CTP 3.0 dropped today, and it includes some changes for Query Store in SQL Server 2019! I am so excited!! I’ve downloaded it and have WideWorldImporters installed and have a lot of testing planned, but if you’re impatient, guess what? The documentation is already updated! If you check out the ALTER DATABASE SET page you will see that Query Store now has a new option for QUERY_CAPTURE_MODE: CUSTOM. For those of you with ad hoc workloads, this will help.

Read on to see how it can help.

Comments closed

Debugging DAX Calculations

Published 2019-05-24 by Kevin Feasel

Imke Feldmann has a debugger measure for DAX:

This is a measure that returns a text-value, showing the number of rows of the adjusted filter context table, the MIN and MAX value of the selected column as well as up to the first 10 values. Just place this measure beneath the CALCULATE-measure in question and try to find the error
Just have in mind, that this only works for standalone CALCULATE-functions and not for those who are nested in other functions (who modify the evaluation context).

This looks to be quite useful.

Comments closed

Quick Hits on Managed Instance Backup / Restore

Published 2019-05-24 by Kevin Feasel

Jovan Popovic has some pieces of advice for backing up and restoring databases on Azure SQL Managed Instances:

Managed Instance takes automatic backups (full backups every week, differential every 12 hours, and log backups every 5-10 min) that you can use to restore a database to some point of time in past within the retention period, restore accidentally deleted database. For more information, see Automated backups. Managed Instance also enables you to restore a database from a backup file placed on Azure Blob Storage, backup a database to Azure Blob Storage. Managed Instance currently don’t support backup retention longer than 35 days, but you can use backups to blob storage as an alternative.
If you are experiencing some issues with any backup or restore operation, the following troubleshooting steps might help you to identify the issue.

Click through for those hints.

Comments closed

Using Filtered Indexes

Published 2019-05-24 by Kevin Feasel

Monica Rathbun fills us in on filtered indexes:

What is a filtered index?
Simply it’s an index with a where clause. It is an optimized non clustered index that can be narrowed down in scope to better fit a subset of data. Example being date ranges, years, non NULLs or specific product types.

I wish filtered indexes were better than they are because they can solve some interesting problems but get stuck on parameterized queries.

Comments closed

Exploratory Data Analysis with inspectdf

Published 2019-05-23 by Kevin Feasel

Laura Ellis continues a dive into Exploratory Data Analysis, this time using the inspectdf package:

I like this package because it’s got a lot of functionality and it’s incredibly straightforward to use. In short, it allows you to understand and visualize column types, sizes, values, value imbalance & distributions as well as correlations. Better yet, you can run each of these features for an individual data frame, or compare the differences between two data frames.
I liked the inspectdf package so much that in this blog, I’m going to extend my previous EDA tutorial with an overview of the package.

There are some interesting functions which make EDA easier, so check it out.

Comments closed

MRAN Changes and a Survey

Published 2019-05-23 by Kevin Feasel

David Smith discusses potential changes to MRAN:

As CRAN has grown and changes to packages have become more frequent, maintaining MRAN is an increasingly resource-intensive process. We’re contemplating changes, like changing the frequency of snapshots, or thinning the archive of snapshots that haven’t been used. But before we do that we’d like to hear from the community first. Have you used MRAN snapshots? If so, how are you using them? How many different snapshots have you used, and how often do you change that up? Please leave your feedback at the survey link below by June 14, and we’ll use the feedback we gather in our decision-making process. Responses are anonymous, and we’ll summarize the responses in a future blog post. Thanks in advance!

Please take the survey as well. If you’ve used SQL Server Machine Learning Services (or SQL Server R Services), you’ve used MRAN.

Comments closed

Debugging a Pivot

Published 2019-05-23 by Kevin Feasel

Ed Elliott takes us through problems with the PIVOT statement:

If you have a PIVOT query and it isn’t returning the data you expect, what can you do to troubleshoot it? The thing to do is to break it down into the constituent parts. First, lets take a look at a query and see what we can do to help.

Click through for potential problems and their solutions.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel