Author: Kevin Feasel

Azure Data Factory Deployment Methods

Kamil Nowinski contrasts two methods for deploying Azure Data Factory pipelines:

It turned out that two-thirds of people use Microsoft’s deployment method, according to their answers in that poll (including a few people who publish the code manually). In one-third of cases, people prefer to deploy directly from code. You may ask: what are the differences? What characterizes each method? Which one is better?
Before I start answering these questions, let me present both methods of publishing.

Read on to learn more about these approaches.

Loading a Spark DataFrame in .NET

Ed Elliott shows how to get data and convert it into a Spark DataFrame using .NET:

When I first started working with Apache Spark, one of the things I struggled with was that I would have some variable or data in my code that I wanted to work on with Apache Spark. Getting the data into a state that Apache Spark can process involves putting it into a DataFrame. How do you take some data and get it into a DataFrame?

This post will cover all the ways to get data into a DataFrame in .NET for Apache Spark.

Click through for several methods.
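Ed’s post works through the .NET for Apache Spark APIs specifically. Purely as a point of comparison, here is the same idea sketched in PySpark rather than .NET, with made-up rows and column names: you hand Spark some in-memory data plus either an inferred or an explicit schema.

    # Sketch only: PySpark analogue of "get in-memory data into a DataFrame".
    # The column names and rows below are invented for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

    rows = [("ENG", 10), ("SQL", 35), ("SPARK", 3)]

    # Option 1: let Spark infer the schema from the tuples.
    df = spark.createDataFrame(rows, ["team", "size"])

    # Option 2: supply an explicit schema.
    schema = StructType([
        StructField("team", StringType(), nullable=False),
        StructField("size", IntegerType(), nullable=False),
    ])
    df2 = spark.createDataFrame(rows, schema)

    df2.show()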

Auto-Pausing Dedicated SQL Pools in Azure Synapse Analytics

Fonseca Sergio automates an important cost-saving measure when working with Azure Synapse Analytics dedicated SQL pools:

As a Synapse engineer or Synapse support engineer, you may need to start and test some pools, and you want this to be as cost-efficient as possible. Leaving a Synapse pool with a lot of DWUs turned on over the weekend because you forgot to pause the DW after shutting down your computer is not a good approach, and we can quickly resolve this by using PowerShell + Automation accounts.

This is also a good introduction to Azure Automation if you aren’t familiar with it.
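Sergio’s solution uses PowerShell inside an Azure Automation account. As a rough sketch of the same check-and-pause logic in Python with the azure-mgmt-synapse SDK (not his approach), here is one way it could look; the resource names are placeholders, and the exact pause method name depends on the SDK version you have installed, so verify it against your package.

    # Sketch: pause a dedicated SQL pool if it is currently online.
    # Placeholders throughout; begin_pause is the track-2 SDK naming and
    # may differ (e.g., pause) in older azure-mgmt-synapse versions.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.synapse import SynapseManagementClient

    subscription_id = "<subscription-id>"
    resource_group = "<resource-group>"
    workspace = "<synapse-workspace>"
    pool_name = "<dedicated-sql-pool>"

    client = SynapseManagementClient(DefaultAzureCredential(), subscription_id)

    pool = client.sql_pools.get(resource_group, workspace, pool_name)
    if pool.status == "Online":
        # Pausing is a long-running operation; wait for it to complete.
        client.sql_pools.begin_pause(resource_group, workspace, pool_name).result()
        print(f"Paused {pool_name}")
    else:
        print(f"{pool_name} is already {pool.status}")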

Working with Calendar Tables

Peter Schott hits on one of my favorite concepts:

Maybe you’ve worked with data warehouses before, in which case the concept of a “Date Dimension” is going to be familiar. If not, the general idea behind a Calendar or Date table is that you have a table of Dates and metadata about those dates. This can include business-specific flags, alternate Quarter structures, alternate Week Start data, or whatever fits your needs.

Pre-populating all of the data about a date in a table makes querying for specific date-based criteria a lot easier, especially when your fiscal year isn’t aligned with the calendar year or you need to deal with multiple fiscal years. It also helps with those holidays which are aligned with lunar calendars and thus “change date” every year.
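Peter works with these as actual tables in the database; purely to illustrate the shape of the thing, here is a quick pandas sketch of a date dimension. The fiscal-year rules (a July 1 start) are an arbitrary example, not something from his post.

    # Sketch of a calendar/date dimension built in pandas.
    import pandas as pd

    dates = pd.date_range("2020-01-01", "2029-12-31", freq="D")
    calendar = pd.DataFrame({"date": dates})

    calendar["year"] = calendar["date"].dt.year
    calendar["quarter"] = calendar["date"].dt.quarter
    calendar["month"] = calendar["date"].dt.month
    calendar["day_of_week"] = calendar["date"].dt.day_name()
    calendar["is_weekend"] = calendar["date"].dt.dayofweek >= 5

    # Example fiscal calendar: fiscal year starts July 1.
    calendar["fiscal_year"] = calendar["year"] + (calendar["month"] >= 7).astype(int)
    calendar["fiscal_quarter"] = ((calendar["month"] - 7) % 12) // 3 + 1

    # With the metadata pre-computed, date-based filters become simple lookups,
    # e.g. "all weekdays in fiscal year 2021":
    fy2021_weekdays = calendar[(calendar["fiscal_year"] == 2021) & ~calendar["is_weekend"]]
    print(fy2021_weekdays.head())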

When Power Query Doesn’t Load All Excel Data

Chris Webb lets us know about some new functionality in Power Query:

It’s a new option on the Excel.Workbook function called InferSheetDimensions. Here’s what the docs for the Excel.Workbook function say:

Can be null or a logical (true/false) value indicating whether the area of a worksheet that contains data should be inferred by reading the worksheet itself, rather than by reading the dimensions metadata from the file. This can be useful in cases where the dimensions metadata is incorrect. Note that this option is only supported for Open XML Excel files, not for legacy Excel files. Default: false.

Read on to understand when that might happen.

Power BI and Unix Epoch Time

Ed Hansberry knows what time it is:

You may need to write a Power BI report that works with Unix Epoch Time, converting either to or from it. Epoch Time is simply the number of seconds that have passed since January 1, 1970, at 12:00 am UTC. So at 1 am of that date, Epoch Time would be 3600, since 3,600 seconds have transpired. Today Epoch Time is around 1610644465, or 1,610,644,465 to make it easier to read, though you will never see the commas in a database.

Converting to and from this time format is deceptively simple, as long as you understand a few functions in Power Query and the concept of Duration.

Read on for a demo.
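Outside of Power Query, the arithmetic Ed describes looks like the sketch below: epoch time is just an offset in seconds from January 1, 1970 at midnight UTC, in both directions. His post does the equivalent with Power Query functions and Durations; this Python version only reuses the example value from the quote.

    # Sketch: converting to and from Unix epoch seconds.
    from datetime import datetime, timedelta, timezone

    EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def from_epoch(seconds: int) -> datetime:
        """Epoch seconds -> UTC datetime."""
        return EPOCH + timedelta(seconds=seconds)

    def to_epoch(dt: datetime) -> int:
        """UTC datetime -> epoch seconds."""
        return int((dt - EPOCH).total_seconds())

    print(from_epoch(1_610_644_465))  # the value used in the quoted example
    print(to_epoch(datetime(1970, 1, 1, 1, 0, tzinfo=timezone.utc)))  # 3600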

The Benefit of Multiple Visuals

Elizabeth Ricks continues a series on data visualization techniques:

Data can be visualized in many different ways, which often makes selecting the best chart a daunting task. We also sometimes complicate our audience’s experience, packing too much information into a single graph in the hopes that it will answer every possible question. Instead of making the Swiss Army Knife of charts, we might instead create different views of the same data, examine what each iteration allows a viewer to see, and then decide which version(s) to share with our audience.

Read on for more.

Including Database Name in Extended Events

Aaron Bertrand has a change of heart:

In my previous tip about SQL Server Extended Events, I discussed the trade-offs involved with choosing to store the database name instead of looking up the name via database_id later. At the time, we decided to use the latter approach since we are generally not concerned about the edge case where DDL is performed against a database that is then dropped before collection occurs.

Something else that has since come up: we want to filter out activity against a set of databases matching a specific name pattern since the DDL activity there is system-generated and does not need to be audited. We quickly learned that you have to be careful with how you use the database name in an Extended Events predicate.

Click through to understand the problem as well as the solution.

Saving Money on Backups to Azure Blob Storage

John McCormack has a few tips for saving some cash:

You have 5 databases on a SQL Server Instance. You take daily full backups of each database on your instance. You also take log backups every 15 minutes as each database is in full recovery mode. This means in 1 week, you will have 35 full backups and 3,360 transaction log backups. This multiplies to 1,820 full and 174,720 t-log backups over 52 weeks. Multiply this for 7 years or more and the costs can get very expensive.

Click through to see how you can save a considerable amount with a bit of planning.
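To make the scale concrete, here is the arithmetic from John’s example worked out as a quick sketch; the only figures in it are the ones quoted above.

    # Reproducing the numbers from the quoted example: 5 databases, one full
    # backup per day each, plus log backups every 15 minutes.
    databases = 5
    fulls_per_week = databases * 7                   # 35
    log_backups_per_week = databases * (24 * 4) * 7  # 3,360 (4 per hour)

    weeks = 52
    print(fulls_per_week * weeks)        # 1,820 full backups per year
    print(log_backups_per_week * weeks)  # 174,720 log backups per year

    # Over 7 years of retention that is 12,740 fulls and roughly 1.22 million
    # log backups, which is why tiering and retention planning pay off.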

Change Tracking with Temporal Tables

Erik Darling continues a line of thought:

Temporal Tables are different, because they will add new columns to the history table and drop removed ones from it as columns change in the base table. They’ll also push data type changes across. But the history table will not have the various permutations tracked. It just mirrors current state.

The one thing I do want to mention is that the column Erik calls JunkDate isn’t junk at all in its proper context: it’s the row end date for a Kimball-style dimension. I get that it isn’t useful in the “type 1” table, but it’s quite useful in the “type 2” table. The sorts of queries temporal tables were expected to solve use the row start date and the row end date together to tell you whether the version of the row you’re seeing is the current version or not.
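To see why that pairing matters, here is a toy as-of lookup (entirely made-up data, not Erik’s tables): both a temporal history table and a type 2 dimension answer “what did this row look like at time T?” by checking that T falls between the row start and row end dates.

    # Sketch: the as-of semantics behind row start/end dates.
    from datetime import datetime

    history = [
        # (customer_id, city,      row_start,             row_end)
        (1, "Cleveland", datetime(2019, 1, 1), datetime(2020, 6, 1)),
        (1, "Columbus",  datetime(2020, 6, 1), datetime(9999, 12, 31)),  # current row
    ]

    def version_as_of(rows, key, as_of):
        """Return the row version valid at a point in time: start <= as_of < end."""
        return next(r for r in rows if r[0] == key and r[2] <= as_of < r[3])

    print(version_as_of(history, 1, datetime(2020, 1, 1)))  # the Cleveland version
    print(version_as_of(history, 1, datetime(2021, 1, 1)))  # the current (Columbus) version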
