
Curated SQL Posts

Finding Power BI V1 Workspace Owners

Brett Powell has a process to find Power BI V1 workspace owners:

As most readers of this blog likely know, there are two very different kinds of workspaces in Power BI – V1 or ‘classic’ workspaces, which are tied to Office 365 groups, and V2 or ‘modern’ workspaces, which are not. V2 workspaces have many advantages beyond their independence from Office 365, which you can read about elsewhere, but for a bit of context you can read the GA announcement of V2 workspaces from back in April.

Since upgrading to V2 workspaces has been a manual process thus far, most Power BI tenants contain a mix of V1 and V2 workspaces. You may also have read the recent announcement of a new feature in the Power BI service available to workspace admins to upgrade their V1 workspaces. This blog post is all about identifying these V1 workspaces and their admins.

Click through for the process, as well as Brett’s recommendation regarding migration to V2 right now.


Power BI Premium Capacity Testing

Matthew Roche announces an interesting tool:

This new tool was included as part of the BRK2046 session on Power BI Premium at MBAS. The whole session is valuable, but the tool itself comes in around the 32-minute mark. There’s a demo at the 37-minute mark. The tool is available today on GitHub.

This tool will help Power BI Premium customers better plan for how their specific workloads (reports, dashboards, datasets, dataflows, and patterns of access) will perform on a given Premium capacity.

Click through for instructions and a description of how it works.


MSDTC and the Firewall

Josh Smith shows how you can enable MSDTC in a buttoned-down environment:

This is just a fancy way of saying you need to be better friends with whoever is managing your enterprise firewall. I hadn’t had to touch the DTC until a recent vendor insisted their application wouldn’t work without it (despite their only having a single data store). The MSDTC was developed to coordinate transactions that would span multiple machines and was originally introduced in SQL Server 2000.

In theory it’s not super complicated: just enable the DTC service/communication on the servers in question and turn on some built-in firewall rules on the servers, right? Almost.

Read on for the full set of instructions.


Partition Switching to Make Table Changes

Daniel Hutmacher shows a couple of things you can change with near-zero downtime using partition switching:

Look, I’m not saying that you’re the type that would make a change in production while users are working.

But suppose that you would want to add an identity column to dbo.Demo, and change the clustered index to include that identity column, and make the index unique? Because it’s the table’s clustered index, you’re effectively talking about rebuilding the table (remember, the clustered index is the table), which involves reorganizing all of the rows into a new b-tree structure. While SQL Server is busy doing that, nobody will be able to read the contents of the table.

Daniel mentions a read-only table, though you could also do this with a read-write table as long as you have triggers to keep the two tables in sync until go time. That adds to the complexity, but it is an option if you need it.


Columnar File Formats in Hadoop

Matthew Rathbone gives us an overview of the benefits behind the ORC and Parquet file formats:

People throw this term around a lot, but I don’t think it is always clear exactly what this means in practice.

The textbook definition is that columnar file formats store data by column, not by row. CSV, TSV, JSON, and Avro are traditional row-based file formats. Parquet and ORC are columnar file formats.

Read on for a comparison and example. In the SQL Server world, think columnstore versus rowstore indexes and you won’t be too far off.
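To make the row-versus-column distinction concrete, here is a minimal in-memory sketch in Python. It does not read or write real Parquet or ORC files; the sample records and the `to_columnar` helper are hypothetical and exist only to illustrate the two layouts.

```python
# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"id": 1, "city": "Oslo", "sales": 120.0},
    {"id": 2, "city": "Lima", "sales": 75.5},
    {"id": 3, "city": "Oslo", "sales": 210.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: each whole record is visited to total a single field.
row_total = sum(record["sales"] for record in rows)

# Column layout: only the values of the "sales" column are touched.
column_total = sum(columns["sales"])
```

Both totals are the same, but the columnar version only has to scan one list. That is roughly why analytic queries that read a few columns out of many tend to favor columnar storage.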


Aggregations in Power BI

Shabnam Watson takes us through aggregations in Power BI:

In Power BI, Aggregations start as tables just like any other table in a model. They can be based off a view or table in the source database, or created in Power BI with Power Query. They can be in Import or DirectQuery storage mode.

Once in the model, these tables can be configured so that the engine can use them instead of a detail table to answer queries when possible. The process of creating and configuring aggregations in Power BI is significantly easier than the process of creating aggregations in SSAS multidimensional.

Once an aggregation table is configured, it becomes hidden from end users. Report developers and end users don’t know that it exists and don’t need to change anything in how they query the dataset.

This was one of the key benefits of a multidimensional model. Shabnam has an excellent, detailed article here, so give it a read if you are a Power BI developer.
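The idea of answering a query from an aggregation table when possible, and from the detail table otherwise, can be sketched in a few lines of Python. This is only an illustration of the routing concept, not how the Power BI engine is implemented; the `detail` data, `build_aggregation`, and `total_amount` are all hypothetical names.

```python
from collections import defaultdict

# Hypothetical detail table: one row per sale.
detail = [
    {"year": 2018, "product": "A", "amount": 10},
    {"year": 2018, "product": "B", "amount": 20},
    {"year": 2019, "product": "A", "amount": 30},
]


def build_aggregation(rows, group_by, measure):
    """Sum a measure per distinct value of the grouping column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


# Aggregation table pre-computed once at the "year" grain.
agg_by_year = build_aggregation(detail, "year", "amount")


def total_amount(group_by):
    """Return (totals, source), using the aggregation when it covers the grain."""
    if group_by == "year":
        return agg_by_year, "aggregation"
    # The aggregation cannot answer this grain, so scan the detail rows.
    return build_aggregation(detail, group_by, "amount"), "detail"
```

The caller asks the same question either way and never needs to know which table answered it, which mirrors the point that report developers and end users do not have to change how they query the dataset.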


Decomposition Trees in Power BI

Tomaz Kastrun takes us through a new visual in Power BI:

A decomposition tree is a data presentation that slices and dices selected metrics based on the attributes of these metrics or in combination with other metrics. Another great aspect of this visual is that it can analyze the selected variable with many metrics or attributes (dimensions) at the same time.

It’s not the type of visual I’d want to see on a dashboard, but I can see it as quite useful in exploratory data analysis.


Profiling Hive Jobs on Tez

Dmitry Tolpeko takes us through Hive query diagnostics:

I was asked to diagnose and tune a long and complex ad-hoc Hive query that spent more than 4 hours on the reduce stage. The fetch from the map tasks and the merge phase completed fairly quickly (within 10 minutes) and the reducers spent most of their time iterating the input rows and performing the aggregations defined by the query – MIN, SUM, COUNT, PERCENTILE_APPROX, and others on the specific columns.

After the merge phase a Tez reducer does not output many log records to help you diagnose the performance issues and find the bottlenecks. In this article I will describe how you can profile an already running Tez task without restarting the job.

Click through for the process, as well as the root cause of the problem.
