Today is Thanksgiving in the United States. To celebrate, Curated SQL will take today and tomorrow off. We’ll be back on Monday with more links to interesting blog posts from across the data platform space.
The role of a transformation in Spark is to create a new dataset from an existing one. Transformations are lazy, meaning they are computed only when an action requires a result to be returned to the driver program.
Transformations are not carried out right away; they execute only when we call an action. Two of the most common transformations are map() and filter().
The resulting RDD is always distinct from the parent RDD after the transformation. It can be smaller (e.g. filter(), distinct(), sample()), bigger (e.g. flatMap(), union(), cartesian()), or the same size (e.g. map()).
Read on to learn more about transformations, including examples of how each works. Even if you’re using the DataFrames API for Spark, it’s still important to understand that transformations are lazy.
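To see that laziness in action, here is a minimal PySpark sketch (the input path is a placeholder, not something from the original post); nothing actually runs until the count() action at the end:

```python
# Minimal sketch of lazy transformations in PySpark; the input path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()
rdd = spark.sparkContext.textFile("data/events.txt")  # placeholder path

# map() and filter() are transformations: they only build up the lineage graph.
lengths = rdd.map(lambda line: len(line))
long_lines = lengths.filter(lambda n: n > 80)

# Nothing has executed yet. count() is an action, so it triggers the whole
# pipeline and returns a result to the driver program.
print(long_lines.count())
```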
Abid Nazir Guroo looks at some endpoints:
Azure Synapse Analytics Representational State Transfer (REST) APIs are secure HTTP service endpoints that support creating and managing Azure Synapse resources using Azure Resource Manager and Azure Synapse web endpoints. This article provides instructions on how to set up and use Synapse REST endpoints and describes the Apache Spark pool operations supported by the REST APIs.
Read on to see which Spark pool management options are available to you via the REST API.
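As a hedged sketch of the kind of call involved (the subscription, resource group, and workspace names are placeholders), listing a workspace's Apache Spark pools through Azure Resource Manager looks roughly like this:

```python
# Hedged sketch: list Apache Spark pools in a Synapse workspace via the ARM REST API.
# Requires the azure-identity and requests packages; all resource names are placeholders.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.Synapse"
    "/workspaces/<workspace-name>/bigDataPools?api-version=2021-06-01"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

for pool in resp.json().get("value", []):
    print(pool["name"], pool["properties"].get("sparkVersion"))
```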
Robert Cain continues a series on learning KQL:
In the previous article, Fun With KQL – Make_Set and Make_List, we saw how to get a list of items and return them in a JSON array. In this article we’ll see how to break that JSON array into individual rows of data using the mv-expand operator.
Read on to learn more about mv-expand.
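As a hedged illustration of what the operator does (the inline datatable and the public samples cluster below are placeholders, not examples from Robert's article), you can run an mv-expand query from Python with the azure-kusto-data package:

```python
# Hedged sketch: run an mv-expand query with the azure-kusto-data package
# (pip install azure-kusto-data); authentication here goes through the Azure CLI.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# mv-expand breaks each element of a dynamic (JSON) array out into its own row.
query = """
datatable(name: string, tags: dynamic) [
    "server01", dynamic(["prod", "web"]),
    "server02", dynamic(["dev"])
]
| mv-expand tags
"""

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication("https://help.kusto.windows.net")
client = KustoClient(kcsb)
response = client.execute("Samples", query)

for row in response.primary_results[0]:
    print(row["name"], row["tags"])  # three rows: server01/prod, server01/web, server02/dev
```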
Bill Fellows runs into an issue:
Perfect, now I have a row for each second from midnight to approximately 5.5 hours later. What if my duration needs to vary because I’m going to compute these ranges for a number of different scenarios? I should make that 19565 into a variable and let’s overengineer this by making it a bigint.
Things don’t work out quite the way you might have expected there. Read on and see what Bill found and how you can circumvent the problem.
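Without giving away Bill's specific findings: if, as the excerpt suggests, this involves SQL Server 2022's GENERATE_SERIES, one detail worth knowing is that its start and stop arguments must share a data type, so a bigint variable needs a bigint on the other side as well. A hedged pyodbc sketch (the connection string is a placeholder):

```python
# Hedged sketch with pyodbc against SQL Server 2022; the DSN is a placeholder.
# GENERATE_SERIES requires start and stop to have the same data type, so the
# literal 0 is cast to bigint to match the bigint variable.
import pyodbc

sql = """
SET NOCOUNT ON;
DECLARE @duration bigint = 19565;
SELECT DATEADD(SECOND, s.value, CAST('00:00:00' AS time(0))) AS tick
FROM GENERATE_SERIES(CAST(0 AS bigint), @duration) AS s;
"""

with pyodbc.connect("DSN=LocalSqlServer") as conn:  # placeholder connection
    rows = conn.execute(sql).fetchall()
    print(rows[0].tick, "...", rows[-1].tick)  # 00:00:00 ... 05:26:05
```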
Olivier Van Steenlandt continues a series on database projects:
In a previous blog post (Database Projects – Merging changes), we successfully merged our feature branch into our development branch. Now, as a final step in our development process, we want to get our changes deployed to our development environment.
In this blog post, we will go through the process step by step to execute a manual deployment. We will take a look at what happens behind the scenes and how deployment works, and we will also take a look at Publishing Profiles.
Check out that process.
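As a related aside, the same kind of dacpac deployment can also be driven from the command line with SqlPackage; a hedged sketch (the file names are placeholders, and this is not Olivier's exact process):

```python
# Hedged sketch: a dacpac deployment driven from the command line with SqlPackage,
# wrapped in Python's subprocess; all file names are placeholders.
import subprocess

subprocess.run(
    [
        "SqlPackage",
        "/Action:Publish",
        "/SourceFile:bin/Release/MyDatabase.dacpac",  # output of building the database project
        "/Profile:Dev.publish.xml",                   # publishing profile: target server, database, options
    ],
    check=True,  # raise if the deployment fails
)
```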
I am often asked about all kinds of various errors, of course with absolutely no context. I also get asked what error X is or means or says… I don’t remember that stuff off the top of my head. The thing is, you kind of need SQL Server to go look it up and there have been a plethora of times when this wasn’t possible. I’ve also noticed that people tend to give you just the error number and not anything else.
Read on to learn more about what Sean has created, akin to the SQLskills wait stats compendium.
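For reference, when you do have a SQL Server available, that lookup is a query against sys.messages; a hedged pyodbc sketch (the connection string is a placeholder):

```python
# Hedged sketch: look up an error number in sys.messages when a server is handy.
import pyodbc

sql = """
SELECT message_id, severity, text
FROM sys.messages
WHERE message_id = ? AND language_id = 1033;  -- 1033 = English
"""

with pyodbc.connect("DSN=LocalSqlServer") as conn:  # placeholder connection
    row = conn.execute(sql, 8134).fetchone()
    print(row.message_id, row.severity, row.text)  # 8134: divide by zero error encountered
```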
Gilbert Quevauvilliers tries it:
Did you know that there is an easy way to run and extract Power BI REST API data?
The good news is that you can do this directly in your web browser. You don’t have to install or configure anything!
The method below works well if you want to test the API to see what it returns, or if you want to run it to extract some data.
Read on for the process.
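And if you later want the same kind of call scripted rather than run in the browser, a hedged Python sketch using the Get Groups endpoint as the example:

```python
# Hedged sketch: the scripted equivalent of a browser call to the Power BI REST API,
# using azure-identity for an interactive login and the Get Groups endpoint.
import requests
from azure.identity import InteractiveBrowserCredential

token = InteractiveBrowserCredential().get_token(
    "https://analysis.windows.net/powerbi/api/.default"
).token

resp = requests.get(
    "https://api.powerbi.com/v1.0/myorg/groups",  # lists the workspaces you can access
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for workspace in resp.json()["value"]:
    print(workspace["id"], workspace["name"])
```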
Marc Lelijveld wants to see what’s out there in the wild:
In some scenarios, you may not even have a Power BI Desktop data model. For example, you may have migrated from Analysis Services to Power BI Premium, or you may be dealing with large datasets where the model is developed directly in Visual Studio, Tabular Editor, or any other tool of your preference and deployed over the XMLA endpoint. A similar setup could be that you once enriched your data model using Tabular Editor or ALM Toolkit, with the result that your Power BI Desktop file is no longer the golden version of your data model.
Another scenario could be gaining an overview of partitioning when using incremental refresh. The partitions of Incremental Refresh are only generated in the Power BI Service. So, including this information in your generated documentation is only possible when you connect directly to the Power BI Service.
But what if you still want to show a complete view of your Power BI data model and extract insights using the Power BI Model Documenter? I can tell you: it is possible!
Read on to see what you can do in that case.
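As a hedged sketch of the sort of direct connection involved (this is not the Model Documenter's own code, and the workspace and dataset names are placeholders), you can read the service-generated partitions over the XMLA endpoint with the pyadomd package, which requires the ADOMD.NET client libraries on Windows:

```python
# Hedged sketch: read partition metadata over the Premium XMLA endpoint with pyadomd
# (requires the ADOMD.NET client libraries on Windows); names are placeholders.
from pyadomd import Pyadomd

conn_str = (
    "Provider=MSOLAP;"
    "Data Source=powerbi://api.powerbi.com/v1.0/myorg/My Workspace;"
    "Initial Catalog=My Dataset;"
)

# TMSCHEMA_PARTITIONS is the DMV exposing the partitions that incremental
# refresh generates in the Power BI Service.
with Pyadomd(conn_str) as conn:
    with conn.cursor().execute("SELECT [Name], [TableID] FROM $SYSTEM.TMSCHEMA_PARTITIONS") as cur:
        for name, table_id in cur.fetchall():
            print(table_id, name)
```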
David Wilson needs to open a module:
When you’re testing PowerShell modules, importing, referencing, and removing modules can be a little bit tricky. These are some things that I’ve found to make tests more reliable.
Click through for several tips on how to do this.