Month: November 2023

Exploring a Dataset for Microsoft Fabric Suitability

Eugene Meidinger continues a series on learning Microsoft Fabric:

This is week 1 where I try to take Magic the Gathering draft data to learn Microsoft Fabric. Check out week 0 for some reasoning why.

So, before I do anything else, I want to get a sense of the data I’m looking at to see if it’s suitable for this project. I download the data, and because it’s gzipped, I use 7-Zip to open it up on Windows 10, or Windows Explorer on Windows 11. In either case, the first thing I notice is the huge size disparity. When compressed, it is a quarter of a gigabyte. Uncompressed, it’s about 10 GB. This tells us something.
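Eugene’s numbers work out to a compression ratio of roughly 40:1 (about 10 GB uncompressed against 0.25 GB compressed). A minimal sketch for getting those numbers, plus the column list and an approximate row count, without writing the decompressed file to disk might look like this. It streams the archive with Python’s standard-library gzip module; the function name is hypothetical, and the row count assumes one header line, a trailing newline, and no quoted fields containing line breaks.

```python
import csv
import gzip
import os


def profile_gzip_csv(path, chunk_size=1 << 20):
    """Stream a gzipped CSV once, in chunks, and report its shape and compression ratio."""
    compressed_bytes = os.path.getsize(path)
    uncompressed_bytes = 0
    newline_count = 0
    first_chunk = b""
    with gzip.open(path, "rb") as f:
        # Read 1 MiB of decompressed data at a time so memory use stays flat.
        while chunk := f.read(chunk_size):
            if not first_chunk:
                first_chunk = chunk
            uncompressed_bytes += len(chunk)
            newline_count += chunk.count(b"\n")
    # Assumes the header line fits inside the first chunk.
    header_line = first_chunk.split(b"\n", 1)[0].rstrip(b"\r").decode("utf-8-sig")
    columns = next(csv.reader([header_line]), [])
    return {
        "columns": columns,
        # Approximate: one header line, a trailing newline, no line breaks inside quoted fields.
        "data_rows": max(newline_count - 1, 0),
        "compressed_bytes": compressed_bytes,
        "uncompressed_bytes": uncompressed_bytes,
        "ratio": uncompressed_bytes / compressed_bytes,
    }
```

Usage would be something like profile_gzip_csv("draft_data.csv.gz"), where the filename is a placeholder for whichever file you downloaded. Reading in fixed-size chunks keeps memory use constant no matter how large the uncompressed data is.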

Read on to learn more about the dataset and how Eugene tackled some of the exploratory data analysis.

I also agree completely with Eugene’s point about serendipity. Keeping your metaphorical eyes open will increase the likelihood that you’ll happen upon something that can help you later, or something that serves a need you didn’t know you had. Back in my university days, I used to wander around the library because I didn’t know what I didn’t know about topics (that is, the “unknown unknown” quadrant), so I’d just pick up books that caught my eye. Not all of them were hits, but enough were to make the strategy worthwhile.

Comments closed

When an Update Doesn’t Update

Aaron Bertrand offers some troubleshooting advice:

Tell me if you’ve heard this one before:

I changed data from my application, but when I checked the database, I couldn’t see the change!

I’ve seen this. Loads. It can be quite perplexing for folks because they expect to see an error message if the insert, update, or delete failed. I put this post together to provide some things you can investigate if this happens to you – you are sure that you updated the data, but when you check using SQL Server Management Studio (SSMS), your change isn’t there. For the remainder of the post, I’m going to use the word “update” to mean any change to the data, even MERGE {shudder}.

Read on for three major classes of reasons. One bonus reason: you left the transaction open. Most application frameworks will close transactions after a statement, but if you’re hand-writing transaction logic in your app, forgetting a COMMIT can happen.
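To see the forgotten-COMMIT case in miniature, here is a sketch that uses Python’s built-in sqlite3 module as a stand-in for SQL Server, with one connection playing the application and a second playing the SSMS session. The table and column names are made up. In its default (legacy) transaction mode, sqlite3 implicitly opens a transaction before the UPDATE, so the checking connection keeps seeing the old value until commit() runs.

```python
import os
import sqlite3
import tempfile


def demo_uncommitted_update():
    """Return (value seen before commit, value seen after commit) from a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    app = sqlite3.connect(path)      # plays the application
    checker = sqlite3.connect(path)  # plays the SSMS session
    app.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    app.execute("INSERT INTO orders VALUES (1, 'new')")
    app.commit()

    # sqlite3 implicitly begins a transaction before this UPDATE; nothing is committed yet.
    app.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    before = checker.execute("SELECT status FROM orders WHERE id = 1").fetchall()[0][0]

    app.commit()  # the step that is easy to forget in hand-written transaction logic
    after = checker.execute("SELECT status FROM orders WHERE id = 1").fetchall()[0][0]

    app.close()
    checker.close()
    return before, after


print(demo_uncommitted_update())  # ('new', 'shipped')
```

SQLite is only an analogy here. In SQL Server, what the SSMS session sees depends on the isolation level: under the default locking READ COMMITTED, its SELECT will typically wait on the open transaction’s locks rather than return stale data, while with READ_COMMITTED_SNAPSHOT enabled it returns the last committed version, much as SQLite does in this sketch.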

Constraints in Microsoft Fabric Data Warehouses

Brian Bønk slips out of the constraints:

When working with data and building data models, I personally seldom use the constraints feature of a database. Call me lazy, but I think constraints add unnecessary complexity when building data models for reporting. This is especially true if you are working with some of the new platforms, like Microsoft Fabric, where compute is stateless, a.k.a. data storage is separated from the compute layer.

I understand the need for constraints in other database systems, such as OLTP systems.

In reporting models, it can be somewhat useful to have constraints between tables, as they help (or force) you to apply some level of governance to your data model.

But how can we use this in Microsoft Fabric and are they easy to work with?

Read on for those answers. I will note that I’m a stickler about constraints in transactional systems, though I agree that constraints in warehouses are not critical—assuming, at least, that you’re following the Kimball approach and have one and only one mechanism to write data, and that you have other mechanisms for vetting data quality.
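If constraints aren’t doing the vetting, the checks they would have performed still need to happen somewhere. (Fabric Warehouse, for instance, only accepts primary key, unique, and foreign key constraints declared as NOT ENFORCED, so they describe relationships rather than reject bad rows.) A minimal Python sketch of two such checks follows: duplicate keys in a dimension, and fact rows pointing at a dimension key that doesn’t exist. The table and column names are hypothetical, and in practice you would more likely run the equivalent SQL or Spark query against the warehouse itself.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Key values that appear on more than one row (what a PRIMARY KEY or UNIQUE constraint would catch)."""
    counts = Counter(row[key] for row in rows)
    return {value for value, n in counts.items() if n > 1}


def orphan_rows(fact_rows, fact_key, dimension_rows, dimension_key):
    """Fact rows whose key has no matching dimension row (what a FOREIGN KEY would catch)."""
    known = {row[dimension_key] for row in dimension_rows}
    return [row for row in fact_rows if row[fact_key] not in known]


# Hypothetical sample data
dim_customer = [
    {"customer_key": 1, "name": "Contoso"},
    {"customer_key": 2, "name": "Fabrikam"},
    {"customer_key": 2, "name": "Fabrikam (duplicate load)"},
]
fact_sales = [
    {"sale_id": 10, "customer_key": 1, "amount": 25.0},
    {"sale_id": 11, "customer_key": 3, "amount": 40.0},
]

print(duplicate_keys(dim_customer, "customer_key"))  # {2}
print(orphan_rows(fact_sales, "customer_key", dim_customer, "customer_key"))
# [{'sale_id': 11, 'customer_key': 3, 'amount': 40.0}]
```

Running checks like these after each load is one way to get the governance benefit of constraints while keeping a single, controlled write path.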

Operating on Time Series Data in R

Dario Radečić understands that time is a flat circle:

If there’s one type of data no company has a shortage of, it has to be time series data. Yet many beginner and intermediate R developers struggle to wrap their heads around basic R time series concepts, such as manipulating datetime values, visualizing data over time, and handling missing date values.

Lucky for you, that will all be a thing of the past in a couple of minutes. This article gives you a basic introduction to the world of R time series analysis. We’ll cover many concepts, from the key characteristics of time series datasets to loading such data in R, visualizing it, and even doing some basic operations such as smoothing the curve and visualizing a trendline.

We have a lot of work to do, so let’s jump straight in!

Click through for a high-level overview. H/T R-Bloggers.

Building an App with Streamlit

Riqo Chaar demonstrates Streamlit:

Off-the-shelf solutions for interactive data app development such as Microsoft Power BI are great – they allow users to easily develop data apps using a GUI. However, Power BI’s ease of use comes at the expense of reduced functionality. This is where programming languages such as Python, JavaScript or C# shine – you can practically code anything you like!

This blog will focus on Streamlit as a means of building interactive data apps. Streamlit is an open-source Python library that enables rapid creation of web apps (including, but not limited to, data apps) with minimal code. It acts as an intermediary between the easy-to-use, but functionally-limited characteristics of Power BI and the functionally-enhanced, but difficult-to-use characteristics of other programming tools such as JavaScript or C#.

I’ve grown to like Streamlit a lot. It’s really simple to put together a good-looking page, similar to Shiny in R.

Scraping the Microsoft Fabric Road Map with Microsoft Fabric

Prathy Kamasani wants a report, not a webpage:

Like many, I am also playing with Fabric, and many of my clients are excited about Fabric and want to know more about it. As a solution architect in the consulting world, one of the most common questions I get asked is: “When will certain features be available, and where are they on the roadmap?” That’s what sparked the idea of scraping the Microsoft Fabric roadmap and creating this Power BI report. It is based on a Direct Lake connection, so it has been a bit temperamental.

So, how did I do it? If you are not interested in the whole story, here is Python code you can run to get the roadmap. If you are interested in my process, carry on reading.
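Prathy’s actual code is behind the link, and nothing here describes the real markup of the roadmap page. As a hypothetical sketch of the general scraping step, assuming the features were laid out in an HTML table, Python’s standard-library html.parser can pull cell text out row by row. The sample HTML and its column names are invented for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the whitespace-normalized text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so " Feature   B " becomes "Feature B".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup; the real roadmap page may look nothing like this.
SAMPLE_HTML = """
<table>
  <tr><th>Feature</th><th>Estimated release</th></tr>
  <tr><td>Feature A</td><td>Q1 2024</td></tr>
  <tr><td> Feature   B </td><td>Q2 2024</td></tr>
</table>
"""

rows = table_rows(SAMPLE_HTML)
records = [dict(zip(rows[0], row)) for row in rows[1:]]
print(records)
# [{'Feature': 'Feature A', 'Estimated release': 'Q1 2024'}, {'Feature': 'Feature B', 'Estimated release': 'Q2 2024'}]
```

Against a live page you would first fetch the HTML, for example with urllib.request.urlopen(url).read().decode("utf-8"), and then write the records to a lakehouse table that the report reads from.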

Click through for the process and explanation.

An Overview of Docker Security Principles

Jagdish Mohite talks security:

Docker incorporates several inherent security features that contribute to its overall security posture. When you use Docker to quickly create an environment and test some code, security is important enough (especially if you execute any […]), but when using Docker for production, multi-user environments, it is essential to treat the container as you would any other server environment.

The following is a list of some of the basic security principles that are baked into Docker.

This includes some of the things Docker does for you automatically, limitations around securing containers, and common attack modes. It’s a high-level overview, but an interesting read.

Controlling Fallback Behavior in Direct Lake

Sandeep Pawar talks about fallback options:

When you create a Direct Lake semantic model, by default it is in Direct Lake mode, i.e., you will directly query the delta table from the lakehouse/warehouse. This is what we want, because query performance will be very much comparable to Import mode. However, under certain circumstances, the DAX query can fall back to DirectQuery if Direct Lake limitations are hit.

Read on to learn more about circumstances in which this could happen and ways to change the default behavior.
