Kevin Feasel – Page 877

There are so many properties in Spark that affect the way you can add jars to a Spark application. We understand it could be confusing and this post is aimed at giving you clarity on different options and when to use which option.

Read on for the options.

ML.Net and One-Class Matrix Factorization

Published 2021-01-07 by Kevin Feasel

Sergey Tihon notices a problem:

After reading all these 3 samples I realised that I do not fully understand what is Label column is used for. Later I came to a conclusion that all three samples most likely are incorrect and here is why.

Click through for a description of the problem as well as the answer.

Configuring a Linked Server to Oracle

Published 2021-01-07 by Kevin Feasel

Emanuele Meazzo needs to pull data from Oracle into SQL Server:

The most atrocious part of my search for glory was without doubt navigating all the packages to download and install for each component, between broken links and differences between the instructions and the actual content, it’s a mess.

It took a while, based on Emanuele’s tone. With SQL Server 2019, you can avoid some of this pain by using PolyBase. But for prior versions of SQL Server, your options are more limited.

From Azure Synapse Analytics to Power BI

Published 2021-01-07 by Kevin Feasel

Wolfgang Strasser has a project for us:

In today’s blog post I would like to build an end-to-end solution to combine data coming from different sources and stored in different form factors into a single Power BI data model using Azure Synapse Analytics.

Click through for the full demo.

Top N with Others in Power BI

Published 2021-01-07 by Kevin Feasel

Marco Russo and Alberto Ferrari cover a pain point in Power BI:

The VisibleProducts variable contains a list of products for the selection currently displayed in the visual. In the example, we have the top 3 products for each Product Category included in our report. The ranking that is returned is only up to the value selected in the TopN parameter – for this reason, we can use the result of Ranking by Sales to filter the visual, including only the products ranked in the 1-to-TopN Value range. We use a filter in the Power BI filter pane to accomplish this task.

This is a common enough pattern that I do wish Power BI made it easy.
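As a rough illustration of the pattern itself (not the DAX from the linked article), here is a minimal Python sketch: rank items by sales, keep the top N, and roll everything else into a single "Others" row. The function name and the sample figures are hypothetical.

```python
def top_n_with_others(sales, n):
    """Return the n best-selling items plus one 'Others' row for the rest.

    `sales` maps an item name to its sales amount (hypothetical data).
    """
    # Rank items from highest to lowest sales.
    ranked = sorted(sales.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    rest = ranked[n:]
    # Only add the 'Others' row when something actually falls outside the top n.
    if rest:
        top.append(("Others", sum(amount for _, amount in rest)))
    return top


if __name__ == "__main__":
    sample = {"A": 50, "B": 30, "C": 15, "D": 5}
    print(top_n_with_others(sample, 2))
```

In Power BI the hard part is getting the visual to treat "Others" as a row, which is what the article addresses; the sketch only shows the arithmetic.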

Choosing an ML Algorithm

Published 2021-01-06 by Kevin Feasel

Hui Li developed a flow for determining appropriate machine learning algorithms:

Since the cheat sheet is designed for beginner data scientists and analysts, we will make some simplified assumptions when talking about the algorithms.
The algorithms recommended here result from compiled feedback and tips from several data scientists and machine learning experts and developers. There are several issues on which we have not reached an agreement and for these issues we try to highlight the commonality and reconcile the difference.
Additional algorithms will be added in later as our library grows to encompass a more complete set of available methods.

Read the whole thing.

Basic Theory on Correlation Analysis, Using R

Published 2021-01-06 by Kevin Feasel

Petr Baranovskiy wants to take us through the key concepts of correlation analysis, starting with basic theory:

When I was learning statistics, I was surprised by how few learning materials I personally found to be clear and accessible. This might be just me, but I suspect I am not the only one who feels this way. Also, everyone’s brain works differently, and different people would prefer different explanations. So I hope that this will be useful for people like myself – social scientists and economists – who may need a simpler and more hands-on approach.
These series are based on my notes and summaries of what I personally consider some of the best textbooks and articles on basic stats, combined with the R code to illustrate the concepts and to give practical examples. Likely there are people out there whose cognitive processes are similar to mine, and who will hopefully find this series useful.

This is clear and well-written, so check it out even if you feel like you have a solid understanding of the topic.

Performance Impact of Foreign Keys with Non-Default ON UPDATE or ON DELETE

Published 2021-01-06 by Kevin Feasel

Hugo Kornelis continues a dive into foreign keys:

Welcome to part fifteen of the plansplaining series. In the three previous parts I looked at the operators and properties in an execution plan that check a modification doesn’t violate foreign key constraints. That part is done. But I’m not done with foreign keys yet.
We normally expect foreign keys to throw an error on violations. But that’s actually only the default option: they can also be set to be self-correcting. This is done using the ON UPDATE and ON DELETE clauses, which provide the user with several choices on how to handle child data that would become orphaned, and hence violate the constraint, as a result of a change in the parent table.

Read on to see how these operate in SQL Server.
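For a quick feel for the behavior Hugo describes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made so the example runs anywhere: the clause names match, but nothing here shows SQL Server execution plans. The table names and the helper function are hypothetical.

```python
import sqlite3


def child_rows_after_parent_delete(on_delete):
    """Delete a parent row and report what happens to its child row.

    `on_delete` is the referential action, e.g. "NO ACTION", "CASCADE",
    or "SET NULL". Returns "error" if the delete is rejected, otherwise
    the remaining rows of the child table.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite only enforces foreign keys when this pragma is on.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE child ("
            " id INTEGER PRIMARY KEY,"
            " parent_id INTEGER REFERENCES parent(id) ON DELETE " + on_delete + ")"
        )
        conn.execute("INSERT INTO parent (id) VALUES (1)")
        conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")
        try:
            conn.execute("DELETE FROM parent WHERE id = 1")
        except sqlite3.IntegrityError:
            # Default behavior: the violation is reported as an error.
            return "error"
        return conn.execute("SELECT id, parent_id FROM child").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for action in ("NO ACTION", "CASCADE", "SET NULL"):
        print(action, "->", child_rows_after_parent_delete(action))
```

With NO ACTION the delete fails, with CASCADE the child row is deleted along with its parent, and with SET NULL the child row survives with a NULL parent_id.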

Writing a Python Language Extension for ML Services

Published 2021-01-06 by Kevin Feasel

Niels Berglund shows how you can bring your own Python 3.9 runtime to SQL Server Machine Learning Services:

When I wrote we’d look at it in a future post I thought to myself; “how hard can it be?”. I had read the steps of how to build a Python language extension for Windows here, and it didn’t seem that hard: some Boost, CMake, compile, and Bob’s your uncle! Well, it turned out it was somewhat more complicated than what I anticipated. So, if you are interested – read on!

I was going to say that the steps seem a bit complicated but not overly terrible, though Niels’s conclusion leaves me wondering.

Contrasting Data Warehouses with Power BI Dataflows

Published 2021-01-06 by Kevin Feasel

Reza Rad makes a comparison:

Dataflow is the data transformation service in Power BI, and also some other Power Platform services. Data Warehouse is the cloud storage and also compute engine for data. I often get this question that: “Now that we have dataflow in Power BI, should we not use the Data warehouse? What are the differences? which is better? When to use what?” This article and video, explains answer to these questions.

I’m probably a bit lower on self-service BI compared to others. When I see something like Dataflows, it reminds me too much of a mess of Excel spreadsheets on shared drives. There’s a lot of relevant business knowledge embedded in those dispersed locations, and bringing it together becomes as much a forensic exercise as an architectural one.
