Press "Enter" to skip to content

Day: August 3, 2022

Visualizing Data in Python

Mark Litwintschik provides some recommendations:

There are two major phases of data analysis. The first is building up a basic understanding of a new dataset. Once this is done, there is a second phase of understanding what’s changing over time and whether there are any new outliers.

For the first phase, I find Tableau to be more productive than writing code in a Jupyter Notebook. For the second phase, I like to build periodic Airflow jobs that send charts and Excel files to operational channels on Slack. These are formatted to be mobile-friendly and allow me to do more of my work on a phone rather than being chained to a laptop. This also means access is controlled via Slack rather than a custom web app.

Mark also covers some examples with Altair.
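
As a taste of what that looks like, here is a minimal Altair sketch (not Mark's exact example); the DataFrame and column names are made up for illustration:

import altair as alt
import pandas as pd

# Hypothetical data; in practice this would come from your dataset.
df = pd.DataFrame({
    "category": ["A", "B", "C", "D"],
    "value": [28, 55, 43, 91],
})

# Build a simple bar chart: categories on the x-axis, values on the y-axis.
chart = (
    alt.Chart(df)
    .mark_bar()
    .encode(
        x=alt.X("category:N", title="Category"),
        y=alt.Y("value:Q", title="Value"),
        tooltip=["category", "value"],
    )
    .properties(width=400, height=250, title="Sample bar chart")
)

# Render inline in a notebook, or save to a standalone HTML file.
chart.save("sample_chart.html")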

Worrying over Columns or Rows

John Mount explains an attitude difference:

I say: if you are a data scientist or working on an analytics project, worry over columns, not rows.

In analytics “rows” are instances, and “columns” are possible measurements. For example: each click on a website might generate a row recording the visit, and this row would be populated with columns describing what was clicked on (and if you are lucky there are more records recording what else was presented and not clicked on).

Read the whole thing. This is also why formats like Parquet and ORC are so popular for data analysis. The same goes for business intelligence practitioners, who reason mostly over columns, which is why columnstore indexes are so useful.
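
To make the point concrete, here is a small Python sketch, assuming a hypothetical clicks.parquet file: with a columnar format you read only the measurements you care about, rather than scanning every row in full.

import pandas as pd

# Only the named columns are read from disk; the file's other columns
# are never scanned. File path and column names are hypothetical.
clicks = pd.read_parquet(
    "clicks.parquet",
    columns=["page_url", "clicked_at"],
)

# Column-oriented questions, e.g. clicks per page, touch just these columns.
print(clicks.groupby("page_url").size())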

SQL Server 2019 CU 16 and Log Shipping

Lee Markum notes a change in SQL Server 2019 CU 16:

As a data professional managing SQL Servers, you need to be thinking about Windows and SQL Server patching.

You want to stay up to date, but you also have to be careful because sometimes staying fully up to date comes with the risk that a new patch might break a feature you are using. This is possibly the case for SQL Server 2019 CU 16. If you’re using Log Shipping on a database that has TDE enabled and compression is enabled on those backups, then you need to take heed.

I’ll walk you through what can happen and what I’ve seen work to resolve the problem.

This is the tricky part of getting rid of service packs: when I think cumulative update, I don’t think “Here’s a thing that could break backwards compatibility with other SQL Server instances which have not been patched.” Read on to see an example of this in action.

Organizing Data Domains in a Data Mesh

Paul Andrew continues a series on data mesh architecture:

Defining an organisation hierarchy is always hard, even more so for large enterprises with massive amounts of interlock between business functions. In the context of data analytics, we attempt to tackle the problem by creating an organisation dimension as part of our star schema data model. This could include things like region, operating company, branch, department, team etc.

So, my friends, how do we go about handling this when considering a data mesh architecture and the de-centralised domains that support the natural scalability we crave? For me, it feels like we are just frontloading the dimensional modelling problem. Tackling it from the beginning in the very foundations of our data platform. But, with a twist.

Read on for that twist and for some solid guidance on data domains in practice compared to the theory.

Showing Filter Selections on Power BI Reports

Mara Pereira makes a Power BI breadcrumb:

When I’m developing reports, I’m asked multiple times to “hide” the slicers/filters from the report page. Usually this is to make space in the page for other visuals and because customers don’t want to use the filter pane for some reason.

This happened so many times, and only in the last couple of months did I decide to try some things out and get a bit creative with Power BI.

I came up with two solutions, which I think work great in these scenarios.

Read on to see the solutions.
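
One common pattern in this space (not necessarily either of Mara's solutions) is a DAX measure that surfaces the current slicer selections in a card or text visual; the table and column names below are hypothetical:

Filter Breadcrumb =
-- Show the selected values for each slicer, or "All" when nothing is selected.
VAR SelectedYears =
    IF (
        ISFILTERED ( 'Date'[Year] ),
        CONCATENATEX ( VALUES ( 'Date'[Year] ), 'Date'[Year], ", " ),
        "All"
    )
VAR SelectedRegions =
    IF (
        ISFILTERED ( 'Geography'[Region] ),
        CONCATENATEX ( VALUES ( 'Geography'[Region] ), 'Geography'[Region], ", " ),
        "All"
    )
RETURN
    "Year: " & SelectedYears & "   Region: " & SelectedRegions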

T-SQL Improvements in CTP 2.1

Itzik Ben-Gan looks at some recent T-SQL improvements:

One of the complexities that is of a specific importance to this section is comparisons that potentially involve NULL comparands, such as ones that you use in filter and join predicates. Most operators that you use in such comparisons, including the equals (=) and different than (<>) operators, use three-valued logic. This means that there are three possible truth values as a result of a predicate that uses such operators: true, false and unknown. When both comparands are non-NULL, such operators return true or false as you would intuitively expect. When any of the comparands is NULL, including when both are NULL, such comparisons return the unknown truth value. Both filter and join predicates consider unknown as a non-match, and sometimes that’s not the behavior that you’re after.

The distinct predicate (IS [NOT] DISTINCT FROM) should simplify a lot of code in the wild.
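
As a rough illustration (hypothetical tables, not Itzik's example), compare a join on a nullable column using the equals operator with one using the distinct predicate:

-- With =, NULL = NULL evaluates to unknown, so rows where both
-- sides are NULL are treated as a non-match.
SELECT s.OrderID, t.OrderID
FROM dbo.Source AS s
    INNER JOIN dbo.Target AS t
        ON s.RegionCode = t.RegionCode;

-- With the distinct predicate, two NULLs are considered not distinct,
-- so those rows do match, without ISNULL wrappers or extra
-- (s.RegionCode IS NULL AND t.RegionCode IS NULL) logic.
SELECT s.OrderID, t.OrderID
FROM dbo.Source AS s
    INNER JOIN dbo.Target AS t
        ON s.RegionCode IS NOT DISTINCT FROM t.RegionCode;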

Variables in DAX

Marco Russo and Alberto Ferrari show us how to use variables in DAX:

Variables were introduced in DAX in 2015 and so far, they have proven to be the best enhancement of the DAX language ever. When presented with the concept of variables, most newbies focus on performance improvement, thinking that you introduce variables in your code mainly to obtain better performance. Although variables can improve performance, performance is a minor advantage. There are several more important considerations that should encourage any DAX developer to make extensive use of variables. In this article we share a few considerations, along with best practices about variables and DAX.

Read on to see what makes variables so powerful.
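
A minimal sketch of the VAR / RETURN pattern (the measure, table, and column names here are hypothetical):

Sales YoY % =
VAR CurrentSales = [Total Sales]
VAR PriorSales =
    CALCULATE ( [Total Sales], DATEADD ( 'Date'[Date], -1, YEAR ) )
RETURN
    -- Each variable is evaluated once and can be reused, which keeps the
    -- expression readable and avoids repeating the prior-year calculation.
    DIVIDE ( CurrentSales - PriorSales, PriorSales )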

Power BI CAT Guidance

Matthew Roche advises customers:

I’ve written previously about some of what the Power BI CAT team does, but the Power BI guidance documentation only gets a passing mention… and it’s worth going into more deeply.

A lot of what the Power BI CAT team does involves working with large enterprise customers. These customers are often trying to achieve difficult goals that often involve complex data architectures, and Power BI is often a significant part of their end-to-end information supply chain. We get involved when these enterprise customers need help achieving their strategic goals, and this help often includes helping them effectively use the existing capabilities of Power BI.

Read on to learn more about this guidance documentation.
