Press "Enter" to skip to content

Author: Kevin Feasel

Installing .NET Notebooks for PowerShell

Max Trinidad shows us how to install .NET Interactive on Linux:

In Windows, it just takes a few steps to set it up. For Linux, it takes a few extra steps, but it’s still quick enough to get you started.

For Windows, follow the instructions found on the .NET Interactive page on GitHub.

For Linux, for Ubuntu 18.04, follow the blog post “Ubuntu 18.04 Package Manager – Install .NET Core”.

Basically, on either operating system, you:

Install the .NET Core SDK
Install the ASP.NET Core runtime
Install the .NET Core runtime

Click through for the step-by-step instructions. Once you have it done, you get not only PowerShell but also F# and C#.

T-SQL Tuesday Life Hacks

Jess Pomfret wraps up a T-SQL Tuesday on life hacks:

A lot of people had more than one life hack so I recommend reading all of the posts linked to below. For this summary post I’ve tried to pick one or two hacks from each post and group them into logical buckets.

There are several hacks shared that I plan on integrating into my life, and I hope this post will serve as a good reference for us all going forward.

There were some interesting entries in here.

Reducing Visual Count to Improve Performance

Chris Webb explains that you might get better performance in Power BI with fewer visuals:

Before we go any further, I don’t want you to go and change your reports if you’re not going to get any benefit from doing so. Use Performance Analyzer (as shown here) to determine which visuals on your report are the cause of slow performance – there’s no point redesigning visuals that are fast anyway.

As a general rule, the more visuals you put on a report page, the slower it’s going to get. It’s logical if you think about it: the more visuals there are, the more queries have to be run against your dataset and the more work Power BI has to do to render the report. I know there is a tendency to try to pack as much information onto a page as possible, and this often happens when someone else has designed the report you’re trying to build, but you should always try to resist this. Splitting a single large page into multiple smaller pages, using slicers or filters to reduce the amount of data shown at any one time, and avoiding gigantic Excel-like tables are all good ideas.

It certainly doesn’t mean "get rid of all of your visuals"; after all, speed is only one part of the story. Read the whole thing.

Slow Record Cleanup

Jared Poche investigates a slow record deletion process:

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it.

Read on to understand why.
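
For a bit of context, the general shape of that kind of cleanup process looks something like the sketch below. The table and column names are hypothetical and the batching is simplified; Jared’s actual procedure, and the reason it slowed down, are in the post.

-- Sketch of the garbage collection pattern described above (hypothetical schema).
CREATE TABLE #KeysToDelete (OrderID INT PRIMARY KEY);

-- Step 1: capture the key values from the table central to the cleanup.
INSERT INTO #KeysToDelete (OrderID)
SELECT TOP (5000) o.OrderID
FROM dbo.Orders AS o
WHERE o.OrderDate < DATEADD(DAY, -90, GETUTCDATE());

-- Step 2: use the temp table to delete from the related tables first.
DELETE od
FROM dbo.OrderDetails AS od
JOIN #KeysToDelete AS k
  ON k.OrderID = od.OrderID;

-- Step 3: delete from the primary table last.
DELETE o
FROM dbo.Orders AS o
JOIN #KeysToDelete AS k
  ON k.OrderID = o.OrderID;

DROP TABLE #KeysToDelete;

Collecting the keys once and driving every delete off that temp table keeps the related-table deletes and the primary-table delete working from the same set of rows.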

Benchmarking Apache Hadoop Ozone

Istvan Fajth and Mukul Kumar Singh take us through a benchmarking test of Apache Hadoop Ozone:

Apache Hadoop Ozone was designed to address the scale limitation of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance of Ozone with HDFS, the de-facto big data file system. 

We chose a widely used benchmark, TPC-DS, for this test and a conventional Hadoop stack consisting of Hive, Tez, YARN, and HDFS side by side with Ozone. True to the current industry need for separation of compute and storage, which enables dense storage nodes and elastic compute, we run these tests with the datanodes and node managers segregated. The fundamental ambition of this endeavor, and the subsequent effort in optimizing the product, is to be comparable in terms of stability and performance to HDFS. To that end, we would like to call out the amazing amount of work put in by the community over the past several months towards this goal.

It’s interesting to watch the Hadoop community work through these sorts of challenges, where the hardware paradigm has differed quite a bit from when HDFS was created.

An Overview of Generative Adversarial Networks

Mohammad Waseem takes us through an overview of Generative Adversarial Networks:

Generative models are models that use an Unsupervised Learning approach. In a generative model, there are samples in the data, i.e. input variables X, but no output variable Y. We use only the input variables to train the generative model, and it recognizes patterns from those input variables to generate output that is not known in advance, based only on the training data.

In Supervised Learning, we are more aligned towards creating predictive models from the input variables; this type of modeling is known as discriminative modeling. In a classification problem, the model has to discriminate as to which class the example belongs to. On the other hand, unsupervised models are used to create or generate new examples in the input distribution.

To define generative models in layman’s terms, we can say that generative models are able to generate new examples from the sample that are not only similar to other examples but are indistinguishable from them as well.

Click through for the overview.
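
As a quick supplement to the excerpt, the adversarial setup Goodfellow et al. formalized is a minimax game between a generator G and a discriminator D; in the paper’s notation, the objective is

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

The discriminator is trained to tell real samples from generated ones, while the generator is trained to produce samples which fool the discriminator.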

Monitoring Power BI Data Gateway Performance

Olya Musokhranova shows how we can track throughput and utilization of the Power BI On-Premises Data Gateway:

In June 2019, Microsoft made Gateway Monitoring much easier via the addition of structured logs that can be enabled in the configuration file, and by providing a starter Gateway Performance Power BI report template to visualize the results.

These logs give full access to the information, minimizing the need to pull counter data from IT management systems, like the Microsoft System Center Operations Manager (SCOM); or run the Windows Performance Toolkit and dig through dust-covered gateway info and error log files; or pull refresh error messages from the Power BI Service. Enabling structured Gateway Monitoring logging doesn’t add any significant load to the server, so that’s another win.

Read on for a schema and inspiration on how to set up and configure your own dashboard.

NULL Handling Features not in T-SQL

Itzik Ben-Gan continues a series on NULL complexities:

When using the offset window functions LAG, LEAD, FIRST_VALUE and LAST_VALUE, sometimes you need to control the NULL treatment behavior. By default, these functions return the result of the requested expression in the requested position, irrespective of whether the result of the expression is an actual value or a NULL. However, sometimes you want to continue moving in the relevant direction (backward for LAG and LAST_VALUE, forward for LEAD and FIRST_VALUE), and return the first non-NULL value if present, and NULL otherwise. The standard gives you control over this behavior using a NULL treatment clause with the following syntax:

offset_function(<expression>) IGNORE NULLS | RESPECT NULLS OVER(<window specification>)

There are three good examples of functionality around handling NULL which the current implementation of T-SQL is missing.
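
To make the NULL treatment clause concrete, here is roughly what a query using it would look like if T-SQL supported the standard syntax. The table and columns are hypothetical, and this will not run on SQL Server today, which is exactly the gap being highlighted:

SELECT id, orderdate, qty,
       LAG(qty) IGNORE NULLS OVER(ORDER BY orderdate, id) AS prev_non_null_qty
FROM dbo.T1;

The intent is for prev_non_null_qty to return the most recent non-NULL qty value preceding the current row, rather than whatever happens to sit in the previous row.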

T-SQL Checker ADS Extension

Daniel Janik has a new Azure Data Studio extension for us:

I’ve created a sample ADS extension that checks T-SQL syntax in real time for potential bad practices.

Right now the extension is using regex, which isn’t the best for parsing SQL but it works for the few test cases I’ve added.

I’m hoping that the community can help evolve the project into something really cool; so, I’m asking for your help in making that happen.

Click through for a demo, and check out the GitHub repo.

Power BI Query Diagnostics

Paul Turley has a video covering Power BI query performance:

This post demonstrates how the order of steps added to a query can make a big performance difference and drastically affect the number of steps generated by the designer. I’ll demonstrate how to use the new Query Diagnostics tools to compare and understand query performance.

The Power Query Editor for Power BI simplifies data transformation processing by generating query steps for each action you perform in the query designer. This whiteboard diagram shows the high-level flow of information through a Power BI solution. Every query has a source (“SRC” in the diagram) followed by a connection. The query consists of a series of transformations (“XForm”) prior to populating a table in the data model.

Read on for a high-level explanation followed by a video which covers the Query Diagnostics feature.
