Press "Enter" to skip to content

Month: June 2022

Reading from and Writing to Excel with R

Benjamin Smith needs to modify an Excel file:

I was recently asked as part of a larger task to combine multiple sheets from an excel workbook into a into a single sheet. When approached about the problem I immediately was asked if I was going to use VBA to do it. While I know my way around VBA, since VBA does not have a native way to undo its operations I was uncomfortable with the potential hazard using VBA would yield if a mistake was made or something wrong happens.

In this blog I share how its possible to combine and format sheets using the openxlsx package and base R. Since I’m limiting myself to one library and base R, I will be employing base R’s pipe operator – |>, instead of the superior magrittr pipe – %>% (my opinion only, don’t take it too seriously).

Can confirm, the magrittr “default” pipe is better.

Leave a Comment

SharePoint Lists Showing 100 Items in Logic Apps

Koen Verbeeck needs more than 100 results:

I was reading a SharePoint List using the “Get Items” activity in an Azure Logic App. I explain how you can create such a Logic App in the blog post Reading a SharePoint List with Azure Logic App.

It all worked fine for a while, but recently the list grew larger than 100 items. Suddenly, I started getting complaints that some items didn’t make it into the data warehouse. What’s going on? I ran the Logic App and I could see only 100 items were inserted into the SQL Server table:

Read on to see how you can bump that number past 100.

Leave a Comment

Using a Tree Map as a Legend in Power BI

Prathy Kamasani makes clever use of a tree map:

I recently worked on two projects where the client wanted to show multiple metrics sliced by the same categorical data. For example, seeing how various metrics are performing over different regions or different product groups. A use case like this can be achieved in many ways; probably the best approach is to use small multiples functionality or to keep it simple, five same visuals with different metrics.

Let’s look into it with energy consumption data. Here, I want to show metrics 1 to 5 on different income levels over the years.

I like this solution when you have multiple graphs off of the same base data, like in the small multiples scenario Prathy shows us.

Leave a Comment

Restoring SQL Managed Instance Backups to SQL Server 2022

Mladen Andzic has a preview around how we can take a Managed Instance backup and go on-premises:

Restoring a backup file is the easiest way to copy a SQL Server database to another instance. It allows you to create a copy of your production database for easier troubleshooting or debugging of an issue, to provide a copy of a database to your end users or eligible third parties, or as a light-weight business continuity/disaster recovery solution to restore functionality on another instance of SQL Server. These are just a few use cases, and the list is much longer and there are some very inventive ways of using backup-restore in the wild.

This article explains the challenges of cross-release restore to an older version of SQL engine and announces the private preview of a capability to restore a backup of a database taken from Azure SQL Managed Instance to instance of SQL Server 2022. 

So much SQL Server functionality has been built with the idea of getting you from on-premises into the cloud (specifically Azure) but it’s good to see them spend some development effort on the entirely reasonable and realistic scenario that Azure is not the best choice for a company and there are many such companies still willing to throw money at Microsoft for a good product.

Leave a Comment

“Unsafe Repository” When using Git

Niels Berglund sees something odd:

Every 6 – 9 months (or so), I clean up my development PC just to keep it “lean and mean”. I do it by formatting the hard-drive partition the OS (in this case, Windows) is on, followed by a new install. Recently I had a four-day weekend here in SA. Four glorious days off, a perfect time to “nuke” my PC and re-install!

Off I go, everything goes to plan (Chocolatey is my friend), and after a while, I am done (or as done as one can be). At this stage, I needed to do my weekly roundup blog post for the week gone by, and as I had done some changes to the GitHub repo from my MacBook Pro, I wanted to do a git pull in the repo directory for my blog. Part of the story is that on my dev PC, I have all my repos on a separate partition from the system partition, so the non-system partition was un-affected by the reformat (or so I thought). Imagine my surprise when doing the git pull I got:

Click through to see the error and root cause.

Leave a Comment

Report-Level Measures in Power BI

Soheil Bakhshi explains the concept of report-level measures:

Report level measures are the measures created by the report writers within a Thin Report. Hence, the report level measures are available within the hosting Thin Report only which means the report level measures are not written back to the underlying dataset and hence they are not available to any other reports.

Read on to see how you can create one. I don’t think I’d ever heard them called that before but the name makes sense.

Leave a Comment

Connection Types and DirectQuery in Power BI

Reza Rad has two posts covering connection types in Power BI. First up is an overview of data importation:

Power BI is one of the BI tools in the market that supports more than one type of connection. Each connection type has pros and cons. In this section, we are going to cover everything about Import Data or Scheduled Refresh type of connection. You will learn briefly how Power BI stores data into xVelociy in-memory engine, and what are pros and cons of this method in detail.

But wait, there’s more!:

In the last post, you learned about Import Data or Scheduled Refresh as a connection type. In this post, you’ll learn about the second type of connection named; DirectQuery. This type of connection is only supported by a limited number of data sources, and mainly targets systems with a huge amount of data. DirectQuery is different from another type of connection which I’ll talk about it in the next post named Live Connection.

Understanding when (and how) to use each is important for long-term success in Power BI.

Leave a Comment

Software Engineering Practices for Notebooks

Rafi Kurlansik and Austin Ford explain how to get the most out of notebooks, using Databricks as an example:

Notebooks are a popular way to start working with data quickly without configuring a complicated environment. Notebook authors can quickly go from interactive analysis to sharing a collaborative workflow, mixing explanatory text with code. Often, notebooks that begin as exploration evolve into production artifacts. For example,

1. A report that runs regularly based on newer data and evolving business logic.

2. An ETL pipeline that needs to run on a regular schedule, or continuously.

3. A machine learning model that must be re-trained when new data arrives.

Perhaps surprisingly, many Databricks customers find that with small adjustments, notebooks can be packaged into production assets, and integrated with best practices such as code review, testing, modularity, continuous integration, and versioned deployment.

Read on for several tips and recommendations.

Leave a Comment

Semantics Layers for Data Lakehouses

Jans Aasman explains why semantic modeling is so important for a data lakehouse:

Data lakehouses would not exist — especially not at enterprise scale — without semantic consistency. The provisioning of a universal semantic layer is not only one of the key attributes of this emergent data architecture, but also one of its cardinal enablers.

In fact, the critical distinction between a data lake and a data lakehouse is that the latter supplies a vital semantic understanding of data so users can view and comprehend these enterprise assets. It paves the way for data governance, metadata management, role-based access, and data quality.

For a deeper dive into the topic, Kyle Hale has a post covering this with Databricks and Power BI as examples.

Leave a Comment

The First 30 Days as a DBA

Tracy Boggiano has some experience with new jobs:

Over the last four years, ok it seems longer than that, I’ve started four jobs. A couple just weren’t good fits. One I was at for three years. I currently just finished my first 30 days at my fourth one. Having done the first 30 days several times over the last few years, I’ve searched each time what you would do when you start that new position to take over the environment. What would you evaluate, where to start with everything, what to do first? With no luck mind you. So, I’m going to blog about my journey through this as I’ve done it several times over my career and believe it can help others as they start new positions know where to begin. Coincidentally, Aaron Bertrand (t) just blogged about his first month at Stack Overflow as a DBRE.

I can’t believe it’s been four years since Tracy and I worked together. Somebody’s been messing with my time machine, right?

Leave a Comment