Press "Enter" to skip to content

Month: March 2022

Filling Values to the Right (or Left) in Power Query

Imke Feldmann has a new function:

The first function argument takes the table you want to apply the function on. The second argument is the list of column names that shall be filled up into empty values to the right. In the example in the function documentation, this is:  {“H1”, “H2”, “H3”}. The curly brackets define a list object in Power Query and its list elements must be put in quotes if they shall represent strings. So here the columns H1, H2 and H3 are included.
An optional 3rd argument can be used to fill to the left instead. You can fill in any value there, so once it is used, the fill will work to the left instead.

This might not be something you use on a daily basis, but I will say I’ve run into situations in which having a function like this at hand would have been quite valuable.
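For a sense of what the function does (though not how Imke implements it in M), here is a rough pandas sketch of the same fill-to-the-right idea, with made-up data and the column names from her example:

```python
import pandas as pd

# hypothetical header rows with gaps, like the H1/H2/H3 example above
df = pd.DataFrame({
    "H1": ["Region", None, "Year"],
    "H2": [None, "Product", None],
    "H3": [None, None, None],
})

cols = ["H1", "H2", "H3"]           # the equivalent of the M list {"H1", "H2", "H3"}
df[cols] = df[cols].ffill(axis=1)   # propagate values to the right; bfill(axis=1) would fill left
print(df)
```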

Migrating SSIS On-Prem Workloads into Azure

Jitendra Yadeo has put together a how-to guide:

– There can be a scenario where an organization wants to migrate its existing SSIS ETL process to the cloud, so instead of rewriting SSIS packages using cloud-specific ETL tools like Azure Data Factory, we can directly migrate the SSIS packages and call them through Azure Data Factory.

– The goal of this blog is to show how SSIS packages hosted on-premises can be migrated to Azure Data Factory (ADF) using the Azure-SSIS Integration Runtime (IR).

Read on for a step-by-step guide.

Simple Parameterization and Data Types

Paul White continues a series on simple parameterization:

The details of how each data type is guessed are complex and incompletely documented. As a starting point, SQL Server infers a basic type from the textual representation of the value, then uses the smallest compatible subtype.

For a string of numbers without quotation marks or a decimal point, SQL Server chooses from tinyint, smallint, and integer. For such numbers beyond the range of an integer, SQL Server uses numeric with the smallest possible precision. For example, the number 2,147,483,648 is typed as numeric(10,0). The bigint type isn’t used for server-side parameterization. This paragraph explains the data types selected in the prior examples.

Read the whole thing to see what’s in and what’s out, as well as what this all means.
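As a quick illustration of the selection rule Paul describes (a toy sketch, not SQL Server’s actual logic), here is how the smallest compatible type falls out for unquoted integer literals:

```python
def guess_integer_literal_type(text: str) -> str:
    """Toy version of the rule described above: smallest compatible type, never bigint."""
    value = int(text)                       # assumes no quotes and no decimal point
    if 0 <= value <= 255:
        return "tinyint"
    if -32768 <= value <= 32767:
        return "smallint"
    if -2**31 <= value <= 2**31 - 1:
        return "int"
    # beyond the int range: numeric with the smallest precision that fits
    return f"numeric({len(str(abs(value)))},0)"

print(guess_integer_literal_type("255"))         # tinyint
print(guess_integer_literal_type("2147483648"))  # numeric(10,0), as in the example above
```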

Exponential Smoothing in Excel

Chris Webb starts an interesting series:

[This function] calculates or predicts a future value based on existing (historical) values by using the AAA version of the Exponential Smoothing (ETS) algorithm. The predicted value is a continuation of the historical values in the specified target date, which should be a continuation of the timeline. You can use this function to predict future sales, inventory requirements, or consumer trends.

Recently I started playing around with this function to see how it could be used with cube functions and since I learned so many interesting things I thought it would make a good series of blog posts.

Read on for an example of the normal way to use this function.
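If you want to experiment with the same idea outside Excel, here is a loose Python analogue of the additive (“AAA”-style) exponential smoothing described in the excerpt, using statsmodels’ Holt-Winters implementation; the series and seasonal period are invented for illustration:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# hypothetical two years of monthly sales history
sales = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
     115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
    index=pd.date_range("2020-01-01", periods=24, freq="MS"),
)

# additive trend and additive seasonality, roughly the "AAA" flavour described above
model = ExponentialSmoothing(sales, trend="add", seasonal="add", seasonal_periods=12).fit()
print(model.forecast(6))  # predicted values for the next six months
```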

Visualizing Air Pressure Spikes from the Hunga Tonga Eruption in R

Neil Saunders reviews some personal weather station data:

Wow. Now, pause for a moment and try to recall the last time you read any news about Tonga since the event.
The eruption sent an atmospheric pressure wave, clearly visible in this imagery, around the world. Friends online reported that this was detected by their personal weather stations (PWS) which made me wonder: was the wave apparent in online weather station data and can it be visualized using R?

The answers are yes and yes again.

Read on to see how.
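Neil works in R; purely as a sketch of the shape of the analysis (in Python, with a hypothetical CSV export of station readings), the pressure wave shows up as a brief spike when you plot pressure against time around the eruption:

```python
import pandas as pd
import matplotlib.pyplot as plt

pws = pd.read_csv("pws_readings.csv", parse_dates=["timestamp"])  # hypothetical PWS export

ax = pws.plot(x="timestamp", y="pressure_hpa", figsize=(10, 4), legend=False)
ax.axvline(pd.Timestamp("2022-01-15 04:15"), color="red", linestyle="--")  # approximate eruption time (UTC)
ax.set_ylabel("Pressure (hPa)")
ax.set_title("Surface pressure around the Hunga Tonga eruption")
plt.tight_layout()
plt.show()
```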

Writing a Single JSON File in Databricks

Falek Miah performs a surprisingly difficult task:

When writing to a JSON destination using the DataFrameWriter the dataset is split into multiple files to reflect the number of RDD partitions in the dataframe when in memory – this is the most efficient way for Spark to write data out.

However, this creates a directory containing the data files, as well as Spark metadata files…but what if you just wanted a single JSON file? It’s a scenario that comes up a lot with our clients and, despite it not being the most efficient way to use Spark, we need to implement it all the same.

Click through to see how to do this, including the removal of all metadata files (committed, started, and success files).
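The general pattern (a minimal sketch assuming a Databricks notebook, where spark and dbutils are available, df is the DataFrame you want to write, and the paths are made up; not Falek’s exact code) is to write a single-partition copy to a temporary folder, pull out the lone part file, and then delete the folder along with its metadata files:

```python
tmp_dir = "/tmp/single_json_out"          # hypothetical scratch location
final_path = "/mnt/output/result.json"    # hypothetical final destination

# one partition means one data file (plus Spark's metadata files) in tmp_dir
df.coalesce(1).write.mode("overwrite").json(tmp_dir)

# copy the single part file to its final name, then drop the folder,
# which also removes the _committed_*, _started_* and _SUCCESS files
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, final_path)
dbutils.fs.rm(tmp_dir, True)
```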

Configuring FIDO2 for Azure Active Directory

Joey D’Antoni takes us through a process:

If this sounds scary, and it does to me, who is by far not an expert in all things security, but knows a little bit, you may ask, what are some alternative solutions? The answer to that question is Fido2, a different protocol for MFA and auth. Remember all of that stuff Microsoft talks about with passwordless login? That’s all based around Fido2. I configured this for DCAC’s Azure Active Directory yesterday, and I wanted to walk you through the steps.

Click through to learn how.

Intelligent Cache for Spark in Synapse

Avinanda Chattapadday makes an announcement:

Traditionally, when querying a file or table from your data lake, the Apache Spark engine in Synapse makes a call to your remote ADLS Gen2 storage for each read of the data. For workloads with frequent repeat queries, this process can be redundant and add latency to the overall processing time. Although Apache Spark provides a great caching feature, it must be manually set and released to minimize the latency and improve overall performance. It can also result in queries of stale data if the underlying data changes. This is where the intelligent cache in Azure Synapse can simplify the process; by automatically detecting changes to the underlying files and automatically refreshing them in the cache, you ensure you have access to the most recent data. When the cache reaches its size limit, it will automatically release the least-read data to make space for more recent data.

Click through to see how you can enable this, as well as a few more details on the process.

The Practical Costs of Index Fragmentation

Tibor Karaszi digs into index performance:

See numbers and diagrams at the end, or at the top. I measured a few cases: the difference between no external fragmentation and severe external fragmentation (over 99%). I have both a narrow index and a wide index, and I read one (1), 10,000 and 100,000 rows using index searches (“range scan”). There was obviously no difference reading 1 row, so I exclude that from my discussion below. For the other cases, the extra time with an extreme level of external fragmentation is (from lowest impact to highest) 7%, 10%, 13% and 32%. The highest number (32%) is when reading many rows from a narrow index, i.e. many rows per page. Again, this is with an extreme level of fragmentation.

What’s interesting is that for the most part, there’s a negligible difference between ~0% internal fragmentation and ~99% internal fragmentation. The follow-on question is, how much are defrag operations costing you in performance and when is the benefit worth the cost?

Finding Free Space in a SQL Server Filegroup

John McCormack does some digging:

I just realised that in all my scripts that I use on a regular basis, I didn’t have one for working out free space in SQL Server filegroups. It’s not something that comes up too often but it’s handy to know. For methods of working out space in individual files, you could refer to this post on mssqltips.

Click through for the query and congrats to John on 100 posts.
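John shares his own query; as one common shape for this kind of check (a sketch, not his script, shown here run through pyodbc against a hypothetical server and database), you can aggregate sys.database_files per filegroup and compare allocated size with FILEPROPERTY’s used-space figure:

```python
import pyodbc

query = """
SELECT fg.name AS filegroup_name,
       SUM(df.size) / 128.0 AS size_mb,
       SUM(CAST(FILEPROPERTY(df.name, 'SpaceUsed') AS bigint)) / 128.0 AS used_mb,
       SUM(df.size - CAST(FILEPROPERTY(df.name, 'SpaceUsed') AS bigint)) / 128.0 AS free_mb
FROM sys.database_files AS df
JOIN sys.filegroups AS fg ON df.data_space_id = fg.data_space_id
GROUP BY fg.name;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
for row in conn.cursor().execute(query):
    print(f"{row.filegroup_name}: {row.free_mb:.1f} MB free of {row.size_mb:.1f} MB")
conn.close()
```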
