Kevin Feasel – Page 592

The Data Professional Salary Survey

Published 2021-12-02 by Kevin Feasel

Brent Ozar has re-opened the data professional salary survey:

We’re data people, you and I. We make better decisions when we work off data instead of feelings.
It’s time for our annual salary survey to find out what data professionals make. You fill out the data, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary:

Click through for the link to the survey. It looks like most of the questions have stayed the same this year, which is good for longer-term analysis.

PowerShell Equality Operations

Published 2021-12-02 by Kevin Feasel

Dave Mason is not amused:

When comparing two values in PowerShell, you’ll have to march to the beat of a different drum. The syntax is drastically different:

The short reason why PowerShell uses comparison operators like -eq is that Bash uses them. The funny thing is that Bash actually uses == for string equality and only uses -eq for numeric equality. The POSIX norm for string equality is =, adding yet another level of fun.

Updates in Azure Synapse Analytics

Published 2021-12-01 by Kevin Feasel

Saveen Reddy shows how the Synapse product team has been busy this year:

Previously, Synapse workspaces had a kind of database called a Spark Database. Spark databases had two key characteristics:
– Tables in Spark databases kept their underlying data in Azure Storage accounts (i.e. data lakes)
– Tables in Spark databases could be queried by both Spark pools and by serverless SQL pools.
To help make it clear that these databases are supported by both Spark and SQL and to clarify their relationship to data lakes, we have renamed Spark databases to Lake databases. Lake databases work just like Spark databases did before. They just have a new name.

Okay, this is the kind of change I can do without. That’s a really dumb name. Spark databases tell you what a thing is. It’s a database which lives in Apache Spark. Lake databases run what? Apache Spark. But if anything really should be called a Lake database, it’d be a serverless SQL pool’s database because everything in there is built on top of the data lake—it’s all external tables pointing to a lake. So calling a Spark database a Lake database brings more confusion than elucidation.

Most of the other changes on that list? Really cool. This one? Not at all.

Variables and Scope in PowerShell

Published 2021-12-01 by Kevin Feasel

Dave Mason continues a quest into the bowels of PowerShell:

Let’s talk a little bit about PowerShell variables and how long they exist within the scopes they’re defined. I’ve encountered some behavior that for me, was unexpected. It’s made my development efforts unproductive–especially when it comes to debugging.

Just like with notebooks, it’s important to remember that the PowerShell prompt has a session, and that you aren’t starting fresh on every run. You can also use Dave’s solution to the problem, which makes sense as well.
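
The notebook comparison can be sketched in Python rather than PowerShell. This is an analogy under stated assumptions, not Dave’s code: the `session` dictionary and the `run_script` helper are hypothetical names standing in for a long-lived prompt, where a variable set by one run is still around for the next.

```python
# Hypothetical stand-in for a long-lived interactive session: one shared
# namespace that survives between script runs, as a prompt or notebook does.
session = {}


def run_script(source, namespace):
    """Run script text inside the given namespace and return that namespace."""
    exec(source, namespace)
    return namespace


# First run defines a variable.
run_script("total = 10", session)

# Second run never initialises `total`, yet it still works because the
# value left over from the first run is still sitting in the session.
run_script("total = total + 1", session)
leftover_total = session["total"]

# The same script in a fresh namespace fails, which is what a clean
# session would have reported straight away.
try:
    run_script("total = total + 1", {})
    fresh_run_failed = False
except NameError:
    fresh_run_failed = True
```

The bug only surfaces in the fresh namespace, which is why stale session state can make debugging feel unpredictable.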

Beyond the Basics with PowerShell Enums

Published 2021-12-01 by Kevin Feasel

Robert Cain hits us again on the topic of enumerations in PowerShell:

In a previous post, Fun with PowerShell Enums, I introduced the concept of Enums.
In this post we’ll dive deeper into enums, taking a look at more of its properties as well as other ways to use enums.

Read on to see what happens when you accidentally include a particular value twice, as well as more about using enums.

A Heap of Pain

Published 2021-12-01 by Kevin Feasel

Chad Callihan explains the dislike for heaps in SQL Server:

A table is considered a heap when it is created without a clustered index. Data isn’t in any type of ordered state. Some data is over here, some data is over there.
When you are inserting data into a heap, that data is tossed in wherever. Think of it like your junk drawer. It’s not organized into its own little sections. What do you do when you have something to add such as a pair of scissors or an old pen? You open the drawer, toss it in, and close it up without giving it a second thought.

As Chad mentions, there are uses for heaps. And when you move to Azure Synapse Analytics, there are more uses for heaps. But with on-premises SQL Server, a heap is usually a mistake.

Using the Fail Activity in Azure Data Factory

Published 2021-12-01 by Kevin Feasel

Rayis Imayev thinks about failure:

Recently, Microsoft introduced a new Fail activity (https://docs.microsoft.com/en-us/azure/data-factory/control-flow-fail-activity) in the Azure Data Factory (ADF) and I wondered about a reason to fail a pipeline in ADF when my internal being tries very hard to make the pipelines successful once and for all. Yes, I understand a documented explanation that this activity can help to “customize both its error message and error code”, but why?

Click through for Rayis’s take. I’ll just be here cracking jokes about how Fail activities are banned in my code because I expect it to have a positive outlook on life.

Building an ETL Pipeline with Airflow and Containers

Published 2021-11-30 by Kevin Feasel

Nikita Vasilev needs to move some data:

Obviously, we can use one of the many ready-made ETL systems that implement the functions of loading information into the corporate data warehouse. Informatica PowerCenter, Oracle Data Integrator, SAP Data Services, Oracle Warehouse Builder, Talend Open Studio, and Pentaho are just a sliver of the off-the-shelf solutions. However, when it comes to large volumes of data at high speeds and Big Data infrastructure already in place, boxed solutions fail to satisfy your needs.
Therefore, Big Data pipelines require something like Apache Airflow. It’s an open-source set of libraries for developing, planning, and monitoring workflows. Airflow is written in Python and allows you to create and configure task chains both visually, with a clear web GUI, and by writing Python code.

Click through for an example using Airflow with AWS’s Elastic Container Service.
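
To make the idea of a task chain concrete, here is a minimal sketch in plain Python. It does not use Airflow’s API, and it is not Nikita’s code: the `run_pipeline` helper and the three toy tasks are hypothetical, and the standard library’s `graphlib` stands in for a scheduler that runs each task only after its upstream tasks have finished.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, upstream):
    """Run each task once, only after all of its upstream tasks have run."""
    # static_order() yields task names with every predecessor listed first.
    order = list(TopologicalSorter(upstream).static_order())
    results = {}
    for name in order:
        # Each task receives the results produced so far.
        results[name] = tasks[name](results)
    return order, results


# Hypothetical extract -> transform -> load chain over toy data.
tasks = {
    "extract": lambda done: [3, 1, 2],
    "transform": lambda done: sorted(done["extract"]),
    "load": lambda done: len(done["transform"]),
}

# Maps each task to the set of tasks that must complete before it.
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_pipeline(tasks, upstream)
```

A workflow tool adds scheduling, retries, and monitoring on top of this kind of dependency ordering, which is the part a hand-rolled loop does not give you.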

Building 2048 in T-SQL

Published 2021-11-30 by Kevin Feasel

Tomaz Kastrun gives you a way to slack off at work while everybody else thinks you’re working on a really important SQL problem:

What is the 2048 game? It is a classic puzzle game that is easy and fun to play. The objective of the game is to move the numbers (tiles in the matrix/board) so that they combine to create a tile with the number 2048.

Click through to see how to use it and check out the scripts on Tomaz’s GitHub repo. This definitely merits the Wacky Ideas category.
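
For a feel of the core mechanic, here is a small Python sketch of sliding one row to the left. It is not Tomaz’s T-SQL; the `slide_left` function and the use of 0 for an empty cell are assumptions made for illustration, following the rule as the game is commonly played: equal neighbours combine once per move.

```python
def slide_left(row):
    """Slide one board row to the left, merging equal neighbours once."""
    # Drop empty cells (assumed to be 0) so tiles sit next to each other.
    tiles = [value for value in row if value != 0]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            # Two equal tiles combine into one tile of double the value.
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells so the row keeps its original width.
    return merged + [0] * (len(row) - len(merged))


example = slide_left([2, 2, 4, 0])
winning = slide_left([1024, 1024, 0, 0])
```

The other three directions can be handled by reversing or transposing rows and reusing the same function.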

Data Processing in Data Explorer Pools

Published 2021-11-30 by Kevin Feasel

Tsuyoshi Matsuzaki shows us how Data Explorer pools work in Azure Synapse Analytics:

At Microsoft Ignite 2021, the new Data Explorer (DX) pool in Azure Synapse Analytics was released in preview. You might wonder which one to choose among 3 different analytical pools – Spark pool, Dedicated SQL pool, and DX pool.
In this post, I’ll briefly summarize how data is processed in Data Explorer (Kusto) – Azure Data Explorer (ADX) and Azure Synapse Data Explorer (DX) pool.
I hope this will give you a hint for your optimal analytical platform.

Read on for this explanation.


Author: Kevin Feasel