
January 18, 2021

Driving Action with Graphs

Elizabeth Ricks completes a series on designing visuals. Question 4 involves the audience:

When communicating with data, always consider what broader purpose it serves. Too often, we don’t pause and think about the context (understandable in the real-world maelstrom of deadlines, constraints and organizational politics!). Rather, we force the visuals that we used to analyze the data upon our audience, hoping they’ll come to the same conclusion. 

Question 5 looks at the text on a visual:

In data visualization, words can be more powerful than we realize. When designing graphs, the non-negotiable text includes explicit axis labels, descriptive chart titles, supporting annotations, and footnotes with relevant methodology and assumptions. Never assume it’s apparent what’s being shown—make it your default to include this important context so your audience doesn’t have to guess what they’re seeing.
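To see what that advice looks like in practice, here is a minimal sketch in Python with matplotlib, using made-up data and labels, that puts each piece of that non-negotiable text on a chart:

import matplotlib.pyplot as plt

# Hypothetical data, purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 150, 180, 240, 260]
x = range(len(months))

fig, ax = plt.subplots()
ax.plot(x, signups, marker="o")
ax.set_xticks(list(x))
ax.set_xticklabels(months)

# The non-negotiable text: a descriptive title, explicit axis labels,
# a supporting annotation, and a footnote on methodology.
ax.set_title("New customer signups rose sharply in Q2 2020")
ax.set_xlabel("Month (2020)")
ax.set_ylabel("New signups")
ax.annotate("Pricing change", xy=(4, 240), xytext=(1.5, 245),
            arrowprops=dict(arrowstyle="->"))
fig.text(0.01, 0.01, "Source: CRM export; excludes trial accounts.", fontsize=8)

plt.show()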

Be sure to read both.


Loading a Spark DataFrame in .NET

Ed Elliott shows how to get data and convert it into a Spark DataFrame using .NET:

When I first started working with Apache Spark, one of the things I struggled with was that I would have some variable or data in my code that I wanted to work on with Apache Spark. Getting the data into a state that Apache Spark can process involves putting it into a DataFrame. How do you take some data and get it into a DataFrame?

This post will cover all the ways to get data into a DataFrame in .NET for Apache Spark.

Click through for several methods.
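Ed's post covers the .NET APIs; as a point of comparison, here is a rough sketch of the same idea in PySpark (assuming a local SparkSession), turning an in-memory collection into a DataFrame:

from pyspark.sql import SparkSession

# Assumes a local Spark installation; the .NET for Apache Spark flow is analogous.
spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

# Turn plain in-memory data into a DataFrame by supplying rows and column names.
rows = [("alpha", 1), ("beta", 2), ("gamma", 3)]
df = spark.createDataFrame(rows, schema=["name", "value"])
df.show()

# Reading from a file (hypothetical path) is the other common route:
# df = spark.read.csv("data.csv", header=True, inferSchema=True)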


Azure Data Factory Deployment Methods

Kamil Nowinski contrasts two methods for deploying Azure Data Factory pipelines:

It turned out that two-thirds of people use Microsoft’s deployment method, according to their answers in that poll (including a few people who publish the code manually). In one-third of cases, people prefer to deploy directly from code. You may ask: what are the differences? What characterizes each method? Which one is better?
Before I start answering these questions, let me present both methods of publishing.

Read on to learn more about these approaches.


Working with Calendar Tables

Peter Schott hits on one of my favorite concepts:

Maybe you’ve worked with data warehouses before, in which case the concept of a “Date Dimension” is going to be familiar. If not, the general idea behind a Calendar or Date table is that you have a table of Dates and metadata about those dates. This can include business-specific flags, alternate Quarter structures, alternate Week Start data, or whatever fits your needs.

Pre-populating all of the data about a date in a table makes querying for specific date-based criteria a lot easier, especially when your fiscal year isn’t aligned with the calendar year or you need to deal with multiple fiscal years. It also helps with those holidays that follow lunar calendars and thus “change date” every year.
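As a rough illustration of the concept (not Peter's code, and assuming a fiscal year that starts in July), here is a small calendar table built in Python with pandas:

import pandas as pd

# One row per calendar date across the range you care about.
dates = pd.date_range("2020-01-01", "2022-12-31", freq="D")
calendar = pd.DataFrame({"date": dates})

# Pre-compute the attributes you would otherwise re-derive in every query.
calendar["year"] = calendar["date"].dt.year
calendar["month"] = calendar["date"].dt.month
calendar["day_of_week"] = calendar["date"].dt.day_name()
calendar["is_weekend"] = calendar["date"].dt.dayofweek >= 5

# Example business rule (an assumption for illustration): fiscal year starts July 1.
calendar["fiscal_year"] = calendar["year"] + (calendar["month"] >= 7).astype(int)

print(calendar.head())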


Auto-Pausing Dedicated SQL Pools in Azure Synapse Analytics

Fonseca Sergio automates an important cost-saving measure when working with Azure Synapse Analytics dedicated SQL pools:

As a Synapse engineer or Synapse support engineer, you may need to start and test some pools, and you want this to be as cost-efficient as possible. Leaving a Synapse dedicated SQL pool with a lot of DWUs turned on over the weekend because you forgot to pause it after you shut down your computer is not a good approach, and we can quickly resolve this by using PowerShell + Automation accounts.

This is also a good introduction to Azure Automation if you aren’t familiar with it.
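Sergio's solution uses PowerShell and an Automation account; as a very rough sketch of the same idea using the Python management SDK (the azure-identity and azure-mgmt-synapse packages; treat the resource names as placeholders and the exact method names as assumptions that can vary by SDK version), you might pause every pool that is still online like this:

from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

# Placeholder names; substitute your own subscription, resource group, and workspace.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-resource-group"
WORKSPACE = "my-synapse-workspace"

client = SynapseManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Walk the dedicated SQL pools in the workspace and pause anything still online.
for pool in client.sql_pools.list_by_workspace(RESOURCE_GROUP, WORKSPACE):
    if pool.status == "Online":
        print(f"Pausing {pool.name}...")
        client.sql_pools.begin_pause(RESOURCE_GROUP, WORKSPACE, pool.name).result()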


Power BI and Unix Epoch Time

Ed Hansberry knows what time it is:

You may need to write a Power BI report that works with Unix Epoch Time, converting either to or from it. Epoch Time is simply the number of seconds that have passed since January 1, 1970, at 12:00 am UTC. So at 1 am of that date, Epoch Time would be 3600, since 3,600 seconds have transpired. Today Epoch Time is around 1610644465, or 1,610,644,465 to make it easier to read, though you will never see the commas in a database.

Converting to and from this time format is deceptively simple, as long as you understand a few functions in Power Query and the concept of Duration.

Read on for a demo.
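The post shows the conversion in Power Query; the arithmetic itself is easy to sanity-check in Python:

from datetime import datetime, timezone, timedelta

# Epoch time is the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
epoch_seconds = 1610644465

# Epoch to datetime: add the seconds as a duration to the epoch origin...
as_datetime = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=epoch_seconds)
# ...which matches the built-in conversion.
assert as_datetime == datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Datetime back to epoch: the whole seconds elapsed since the origin.
assert int(as_datetime.timestamp()) == epoch_seconds

print(as_datetime)  # 2021-01-14 17:14:25+00:00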


When Power Query Doesn’t Load All Excel Data

Chris Webb lets us know about some new functionality in Power Query:

It’s a new option on the Excel.Workbook function called InferSheetDimensions. Here’s what the docs for the Excel.Workbook function say:

Can be null or a logical (true/false) value indicating whether the area of a worksheet that contains data should be inferred by reading the worksheet itself, rather than by reading the dimensions metadata from the file. This can be useful in cases where the dimensions metadata is incorrect. Note that this option is only supported for Open XML Excel files, not for legacy Excel files. Default: false.

Read on to understand when that might happen.
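The option itself is Power Query-specific, but the same failure mode exists in any reader that trusts the dimension metadata stored in an Open XML file. For example, openpyxl's read-only mode has a similar escape hatch; here is a rough sketch (the file and sheet names are hypothetical, and behavior can vary by library version):

from openpyxl import load_workbook

# .xlsx files carry "dimensions" metadata describing the used range of each sheet.
# Like Power Query, openpyxl's read-only mode trusts that metadata by default.
wb = load_workbook("report.xlsx", read_only=True)
ws = wb["Sheet1"]

print(ws.calculate_dimension())  # e.g. "A1:A1" when the metadata is wrong

# Ignore the stored dimensions and scan the worksheet itself (the same idea
# as passing InferSheetDimensions to Excel.Workbook).
ws.reset_dimensions()
for row in ws.rows:
    print([cell.value for cell in row])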
