Press "Enter" to skip to content

Curated SQL Posts

The Importance of Power BI Deployment Pipelines

Marc Lelijveld explains the importance of Power BI Deployment Pipelines:

You might have seen the announcement: Power BI Deployment Pipelines were released in May 2020. They have been around for about two months now. On different social channels I have seen a lot of buzz around them already, both positive and negative, honestly. Still, I think this is a great step forward!

Back in 2018, I posted a blog about multi-tier architecture and continuous delivery with Power BI. If you are not familiar with a DTAP approach and why it helps you structure your development processes, I advise you to read that blog first. Personally, I am really excited about Deployment Pipelines! With this functionality, Microsoft starts offering out-of-the-box functionality that helps you move your Power BI content through your DTAP pipeline more easily.

I think it’s a pretty big step in the right direction, though the “Why this isn’t so great” section is a bit lengthy.


Parallel Processing in Analysis Services

Kasper de Jonge takes us through parallel processing of partitions and tables in Analysis Services:

One thing that has come up several times in the last few weeks is a misconception that you cannot process multiple partitions on the same table, or multiple tables, in parallel, as it would cause a lock. This could be true if you try to manage the parallelism yourself. Just as in SQL Server, you need to think about transactions; the AS engine is a transactional system too.

So, the AS engine is definitely capable of loading data in parallel, but only if you let it work out the concurrency itself. This means you must send processing commands to the AS engine in one transaction so the AS engine can manage the locks itself. There are other benefits to letting AS do the work, like recalculating the calculated items (tables, columns, etc.) once instead of multiple times, which improves processing performance.

Read on for an example.
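
As a rough illustration of what "one transaction" means in practice, a single TMSL refresh command can name several partitions and tables at once, and the engine then handles locking and parallelism internally. The database, table, and partition names below are made up; you could run a script along these lines from an XMLA query window in SSMS:

```json
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      { "database": "SalesModel", "table": "FactSales", "partition": "FactSales 2019" },
      { "database": "SalesModel", "table": "FactSales", "partition": "FactSales 2020" },
      { "database": "SalesModel", "table": "DimCustomer" }
    ]
  }
}
```

Because all three objects travel in one command, the engine can load them in parallel and recalculate dependent calculated items a single time at commit.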


Migrating From Cosmos DB to SQL Server

Eitan Blumin builds an app:

The general idea is this:

The app executes a Cosmos DB query and collects a number of records into its “buffer”.

Once that “buffer” reaches a certain number of records (configurable), it’s time to “flush” it into the SQL Server. That could be either a database table receiving a Bulk Copy stream or a stored procedure receiving a table-valued parameter (again, configurable).

After the buffer is flushed, we have the option to execute a “merge” procedure. This is a stored procedure that implements “upsert” logic from the “staging” table into the actual destination table.

Read on for more explanation and check out Eitan’s GitHub repo.
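
As a sketch of what that “merge” step might look like (table and column names here are hypothetical, not taken from Eitan's repo), the procedure upserts from the staging table into the destination and then clears the staging area for the next flush:

```sql
CREATE OR ALTER PROCEDURE dbo.Merge_Orders
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.Orders AS tgt
    USING dbo.Orders_Staging AS src
        ON tgt.OrderId = src.OrderId
    WHEN MATCHED THEN
        UPDATE SET tgt.CustomerId = src.CustomerId,
                   tgt.OrderDate  = src.OrderDate,
                   tgt.Amount     = src.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderId, CustomerId, OrderDate, Amount)
        VALUES (src.OrderId, src.CustomerId, src.OrderDate, src.Amount);

    -- Empty the staging table so the next buffer flush starts clean
    TRUNCATE TABLE dbo.Orders_Staging;
END;
```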


Apache Flink 1.11.0 Released

Marta Paes announces Apache Flink version 1.11:

Change Data Capture (CDC) has become a popular pattern to capture committed changes from a database and propagate those changes to downstream consumers, for example to keep multiple datastores in sync and avoid common pitfalls such as dual writes. Being able to easily ingest and interpret these changelogs into the Table API/SQL has been a highly demanded feature in the Flink community — and it’s now possible with Flink 1.11.

Click through for the full list of updates.
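
For a flavor of what this looks like, Flink 1.11's SQL DDL can declare a Kafka-backed table whose messages are Debezium change events; the topic and schema below are invented for illustration:

```sql
-- The debezium-json format (new in 1.11) interprets each change event
-- as an INSERT, UPDATE, or DELETE row in the dynamic table
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders-cdc',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'orders-consumer',
  'format' = 'debezium-json'
);

-- Queries over the table stay in sync with the source database
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;
```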


Query Scheduling with Apache Hive

Zoltan Haindrich and Jesus Camacho Rodriguez walk us through scheduled queries in Apache Hive:

To fulfill that purpose, recently Apache Hive introduced a new feature called scheduled queries. Using SQL statements, users can schedule Hive queries to run on a recurring basis, monitor their progress, and optionally disable a query schedule.

In a nutshell, every scheduled query in Hive consists of (i) a unique name to identify the schedule, (ii) the actual SQL statement to be executed, and (iii) the schedule at which the query should be executed, defined by a Quartz cron expression. In addition, a scheduled query belongs to a namespace, i.e., a collection of HiveServer2 instances that are responsible for executing the query.

Read on for examples of how you might create, use, and learn about scheduled queries running on a system.
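
As a rough sketch of the syntax (table names invented for the example), a schedule is created with a Quartz cron expression and can later be disabled or inspected through information_schema:

```sql
-- Rebuild a summary table at the top of every hour (Quartz cron)
CREATE SCHEDULED QUERY rebuild_daily_summary
CRON '0 0 * * * ? *' AS
INSERT OVERWRITE TABLE daily_summary
SELECT order_date, SUM(amount)
FROM orders
GROUP BY order_date;

-- Pause the schedule without dropping it
ALTER SCHEDULED QUERY rebuild_daily_summary DISABLE;

-- Review defined schedules and their execution history
SELECT * FROM information_schema.scheduled_queries;
SELECT * FROM information_schema.scheduled_executions;
```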


Iterating over JSON and XML Data in SQL Server

Steve Stedman explains how you can iterate through XML and JSON data using the APPLY operator:

The results are what we are looking for in this specific example, but where this approach breaks down is when more employees are represented in the XML: for each additional employee, we need to add another UNION to bring the results together. That is not very iterative, and since the title of this post includes the word iterating, we need to focus on how to do that.

Now we introduce the CROSS APPLY functionality, which can be used like a JOIN to take a value from one result set (table) and apply it to a function that gets called once for each row. You can reference my JOIN TYPES poster for using CROSS APPLY.

Click through for the full set of examples.
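
For a small taste of the technique (the sample data here is invented, not Steve's exact tables), CROSS APPLY pairs naturally with both the XML nodes() method and OPENJSON:

```sql
-- Shred an XML document: nodes() returns one row per Employee element
DECLARE @x XML = N'<Employees>
  <Employee><Name>Ann</Name><Title>DBA</Title></Employee>
  <Employee><Name>Bob</Name><Title>Developer</Title></Employee>
</Employees>';

SELECT  e.n.value('(Name)[1]',  'NVARCHAR(50)') AS Name,
        e.n.value('(Title)[1]', 'NVARCHAR(50)') AS Title
FROM    (SELECT @x AS doc) AS src
CROSS APPLY src.doc.nodes('/Employees/Employee') AS e(n);

-- The JSON equivalent: OPENJSON is called once per row of the outer table
SELECT  t.Id, j.Name, j.Title
FROM    (VALUES (1, N'{"Employees":[{"Name":"Ann","Title":"DBA"},
                       {"Name":"Bob","Title":"Developer"}]}')) AS t(Id, EmpJson)
CROSS APPLY OPENJSON(t.EmpJson, '$.Employees')
     WITH (Name NVARCHAR(50), Title NVARCHAR(50)) AS j;
```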


Moving a Semantic Model from SSAS to Power BI

Teo Lachev explains why it might make sense to move a semantic model from SSAS over to Power BI:

We decided to move the semantic model to Power BI so that Power BI owns the data. Besides potentially improving the report load time, this architecture also has other important advantages (to learn more, read my “Power BI Large Datasets: The Good, the Bad, and the Ugly” post). If you’re not on Power BI Premium, that “movement” might not be easy if you have opted to use Visual Studio or Tabular Editor for development. That’s because Power BI Pro doesn’t expose the XMLA endpoint, so your only option is to migrate the model to Power BI Desktop. But migrating an SSAS Tabular project to Power BI Desktop is not officially supported and there is no automatic migration path.

Click through for the full explanation of why, as well as notes on the process.


Dropping Temporary Tables at the Beginning of a Procedure

Richard Swinbank walks through a technique to drop temp tables as a procedure begins:

Local temporary tables – tables with names that begin with a single # character – are dropped automatically by SQL Server when they are no longer in scope. So why drop them explicitly at all?

I’m writing this from the perspective of a data engineer. ETL processes or their components are often implemented as T-SQL stored procedures (SPs) which might move significant amounts of data around and take a while to run. I’m not talking about the sort of SP that needs to run in 30ms to guarantee application responsiveness!

Sometimes it’s useful to store intermediate results in temporary tables, sometimes it’s necessary, and sometimes you find yourself maintaining code built by someone who just really loves them. Adding explicit DROPs to code like this can make development and maintenance a bit smoother.

My concern with doing this is that it prevents temporary object caching. Granted, Richard explicitly calls out that these are long-running procedures (and probably only one instance of a particular procedure running at a time), so that does reduce the concern accordingly.
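
For reference, here's one way the pattern might look at the top of a procedure (names invented for the example); DROP TABLE IF EXISTS is a no-op when the table doesn't exist:

```sql
CREATE OR ALTER PROCEDURE dbo.LoadFactSales
AS
BEGIN
    SET NOCOUNT ON;

    -- Clear any leftover temp table before recreating it (SQL Server 2016+)
    DROP TABLE IF EXISTS #staged_rows;

    CREATE TABLE #staged_rows
    (
        SaleId INT            NOT NULL,
        Amount DECIMAL(18, 2) NOT NULL
    );

    -- ... long-running ETL work against #staged_rows ...
END;
```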


Deploying ADF via Azure DevOps

Kamil Nowinski has part two on a series about releasing Azure Data Factory code:

Struggling with #ADF deployment? adf_publish branch doesn’t suit your purposes? Don’t have skills with PowerShell? I have good news for you. There is a new tool on the market. It’s a task for an Azure DevOps Release Pipeline that deploys a whole ADF from code (JSON files) to an ADF instance in Azure. Behind the scenes, it runs a PowerShell module which does all the work for you.
Sounds unbelievable? But it’s real! Check it out for yourself.

Click through for the video.
