February 2024 – Page 6

Notes on Data Engineering in Microsoft Fabric

Published 2024-02-20 by Kevin Feasel

John Miner shares some notes. Part 1 looks at getting started and tables, both managed and unmanaged:

The architectural diagram shows how information flows from a source system into a delta lake house, is transformed by programs, and is used by end users. To get source data into the lake, we can use any of three methods to retrieve the data as files: pipelines (traditional Azure Data Factory components), dataflows (wrangling data flows based on Power Query), and shortcuts (the ability to link external storage to the lake). Once the data is in the lake, there are two types of programs that can transform the data files: Spark notebooks and data flows.

Part 2 covers file and folder management:

In practice, I have seen an additional quality zone called raw be used to stage files in their native format before converting to a delta file format. Please note, the lake house uses either shortcuts or pipelines to get files into the lake. We will talk more about bronze, silver and gold zones when I cover full and incremental loading later in this article.

Read on for John’s thoughts.


Environmental Deployment in Microsoft Fabric

Published 2024-02-20 by Kevin Feasel

Kevin Chant takes us through deployment pipelines in Microsoft Fabric:

One question that I get frequently asked is how many workspaces are required? In reality, the answer is that it depends.

However, if you want your solution to be flexible and loosely coupled I do recommend at the very least one Microsoft Fabric workspace per environment.

That’s also required if you’re using deployment pipelines, as each stage in the pipeline pushes to a unique workspace.


The apply() Functions in R

Published 2024-02-19 by Kevin Feasel

Steven Sanderson talks about a series of functions:

Welcome, fellow R warriors! Today, we delve into the heart of vectorized operations with R’s “apply” family: apply(), lapply(), sapply(), and tapply(). These functions are your secret weapons for efficiency and elegance, so buckle up and prepare to be amazed!

But first, the “why”: Loops are great, but for repetitive tasks on data structures, vectorization reigns supreme. It’s faster, cleaner, and lets you focus on the “what” instead of the “how” of your analysis. Enter the apply family, each member offering a unique twist on applying functions to your data.

The trickiest part about the apply() series is remembering which one does what. This is where purrr’s map() function does a better job, I think.
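As a memory aid for which function does what, here is a rough sketch of the four behaviours in plain Python. It is an analogy only, not R code and not taken from Steven's post: the helper name `tapply_like` and the sample data are made up for illustration.

```python
from collections import defaultdict

matrix = [[1, 2, 3],
          [4, 5, 6]]

# apply(m, 1, sum): run a function over each row (MARGIN = 1)
row_sums = [sum(row) for row in matrix]

# apply(m, 2, sum): run a function over each column (MARGIN = 2)
col_sums = [sum(col) for col in zip(*matrix)]

# lapply(x, f): one result per element, always returned as a (named) list;
# a dict stands in for R's named list here
samples = {"a": [1, 2, 3], "b": [10, 20]}
lapply_like = {name: max(vals) for name, vals in samples.items()}

# sapply(x, f): same idea, but R simplifies the result to a vector when it can;
# a flat Python list stands in for that vector
sapply_like = [max(vals) for vals in samples.values()]


def tapply_like(values, groups, func):
    """Analogue of tapply(values, groups, func): apply func within each group."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: func(vals) for group, vals in buckets.items()}


group_totals = tapply_like([5, 10, 20, 40], ["x", "y", "x", "y"], sum)

print(row_sums, col_sums, lapply_like, sapply_like, group_totals)
```

The short version: `apply()` works along the rows or columns of a matrix, `lapply()` and `sapply()` work element by element and differ mainly in the shape of what they return, and `tapply()` works group by group.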


Maximizing the Savings for SQL Developers

Published 2024-02-19 by Kevin Feasel

Bob Ward is speaking my language:

Whether you build applications for SQL Server on-premises or in Azure, there are several options for you to develop or test for free, or with substantial cost savings. Some of these options you may be familiar with, but this post will help explain details and answer some questions Microsoft often gets to provide you a clear story.

The part in particular I want to focus on is Azure Dev/Test pricing. I give a talk on saving money in the cloud and this is one of the bits most people know very little about. With a Dev/Test subscription in Azure (not just a regular subscription that you call “dev” or “test”!), you can avoid Windows and SQL Server licensing fees, get discounts on certain services like Azure Application Services, and do a bit more with it. It’s something most people don’t know about and the documentation is a little confusing, but Bob does a good job of clarifying things.


Enabling SQL Agent after Upgrading from Express Edition

Published 2024-02-19 by Kevin Feasel

Warwick Rudd ponies up the cash:

With the above limitations, this product can provide all of the functionality for your application to operate as required. Because there is no SQL Agent service in this edition this makes it more difficult for you to implement scheduled database maintenance. We are not going to discuss ways to implement database maintenance on an Express edition of SQL Server in this post.

But at some point in time you may find yourself in the position that you need to perform an upgrade of editions. This may or may not include an upgrade of versions. An upgrade to the version can be achieved and can bring benefits of its own. Normally, I would not be writing about and recommending performing in-place upgrades. However, sometimes due to restrictions of vendor applications it is a requirement to perform an in-place upgrade.

Read on to see what you’d need to do, knowing that Express Edition does not come with a SQL Server Agent and does not perform the appropriate setup for one.


Indexing for Sorted Data

Published 2024-02-19 by Kevin Feasel

Erik Darling digs into indexes and sorts:

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.

The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.

Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

Click through for an interesting problem and solution.


Selective Column Replication in Postgres

Published 2024-02-19 by Kevin Feasel

Semab Tariq shows how to replicate a limited number of columns:

Selective column replication, a feature of logical replication in PostgreSQL, enables the selective transfer of data changes from one database to another. This feature offers flexibility by addressing bandwidth and storage optimization concerns. It allows users to choose specific columns to replicate rather than replicating entire tables, ensuring that only essential data is transferred. This selective approach reduces network traffic and storage space required for replication while maintaining data consistency across databases.

In this blog, I will demonstrate the process of replicating specific columns from a table using logical replication. Additionally, we will see how you can create multiple subscribers connected to a single publisher to replicate data across multiple locations.

Replication seems to be a lot easier to set up in Postgres than SQL Server, though check out the FAQ at the end for a few kicks in the pants around Postgres replication.
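For a sense of what this looks like in practice, here is a minimal sketch that builds the two statements involved. It assumes PostgreSQL 15 or later, where `CREATE PUBLICATION` accepts a column list. The helper functions and the publication, table, column, and host names are all hypothetical and not taken from Semab's post, and the helpers do no identifier quoting or escaping.

```python
def publication_sql(pub_name, table, columns):
    """Build a CREATE PUBLICATION statement that publishes only the given columns.

    Assumes PostgreSQL 15+ column-list syntax. Names are interpolated as-is,
    so this is for illustration only, not for untrusted input.
    """
    col_list = ", ".join(columns)
    return f"CREATE PUBLICATION {pub_name} FOR TABLE {table} ({col_list});"


def subscription_sql(sub_name, conninfo, pub_name):
    """Build the CREATE SUBSCRIPTION statement to run on a subscriber."""
    return (
        f"CREATE SUBSCRIPTION {sub_name} "
        f"CONNECTION '{conninfo}' PUBLICATION {pub_name};"
    )


# Hypothetical example: one publisher, two subscribers in different locations.
pub = publication_sql("emp_pub", "employees", ["id", "name", "department"])
subs = [
    subscription_sql(f"emp_sub_{site}", "host=publisher.example dbname=hr", "emp_pub")
    for site in ("east", "west")
]

print(pub)
for statement in subs:
    print(statement)
```

Two things to keep in mind with this approach: if the publication replicates updates or deletes, the column list has to include the table's replica identity columns (typically the primary key), and each subscriber needs its own subscription name because, by default, that name is also used for the replication slot created on the publisher.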


Upcoming SSMS 20 Connection Security Changes

Published 2024-02-19 by Kevin Feasel

Erin Stellato announces an “above-the-fold” change:

In SSMS 20 we have made changes to the connection dialog. In the screenshot below, you will see the Login and Connection Properties pages for the connection dialogs for SSMS 19 and 20 compared side-by-side.

Moving the Connection Security section to the Login tab (and it presumably showing up without needing to select Options) is a good change. It’s also interesting to see SSMS support Strict encryption, a part of Tabular Data Stream (TDS) 8.0 that Azure Data Studio supports but SSMS (and most other SQL Server-related things) don’t yet.


Cannot Create Property ‘groupid’ On String

Published 2024-02-16 by Kevin Feasel

Barney Lawrence fixes a problem in Azure Data Studio:

This is a solution to a bug I’ve encountered in Azure Data Studio when trying to create new connections that leads to an error message of cannot create property ‘groupid’ on string ‘.

File this post under “things I couldn’t find a neat answer to in Google so I figure I’d best create a page for it”.

Read on for Barney’s answer as I lament how far downhill Google has gone as a search engine.


The Importance of Source Control for DBAs

Published 2024-02-16 by Kevin Feasel

Steve Jones explains that Git isn’t just for developers:

Git has become a fantastic tool for me, and many other technologists, over the last ten years. It’s almost ubiquitous in most of my clients, and so many people are comfortable with it. Many others aren’t, which is why I started a Git series for DBAs (and other Ops people) on my blog.

Quite a few people asked me why I recommend git over a file share for storing code that a team of Ops people or DBAs might use. Why isn’t a global file share a better choice in an organization? I think I have a few good reasons, but if you disagree, let me know in the discussion for this piece.

There are some annoyances around Git but good UI tools minimize a fair amount of the pain and the benefits are huge.

