Curated SQL Posts

Experimenting With Power BI Data Privacy Settings

Chris Webb takes us through some of the intricacies of Power BI data privacy settings and what it means when data sets are Private:

Not only does it show only the Package Search endpoint, but there is also a warning that says:
“Some data sources may not be listed because of hand-authored queries”
This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.
Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

Read the whole thing.
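For context on the call pattern Chris describes, here is a rough Python sketch of the two data.gov.uk endpoints involved (this is my illustration of the API flow, not his Power Query M; the query parameters are assumptions):

```python
import requests

BASE = "https://data.gov.uk/api/3/action"

# The Package Search endpoint -- the data source the privacy dialog lists on its own
search = requests.get(f"{BASE}/package_search", params={"q": "cycling", "rows": 5}).json()

# A dynamically generated URL per result against the Package Show endpoint --
# the "hand-authored query" part the warning refers to
for package in search["result"]["results"]:
    detail = requests.get(f"{BASE}/package_show", params={"id": package["name"]}).json()
    print(detail["result"]["title"])
```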


Using Slicers In Power BI To Filter Chart Categories

Prathy Kamasani shows how you can use slicers in Power BI to filter out specific categories in a line chart:

The logic is to create a table with the DAX function UNION. Each table expression in the UNION function represents a value of the slicer. Apart from that slicer-related value, all the rest of the values are blank.  It is key to have them as blanks rather than zeros, so we don’t see any data for them.

In other words, pivoting the table to turn one measure with several different category values into one measure per category.  If you know the number of categories (4 in this case), this solution can work well for you.
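If it helps to see the reshaping outside of DAX, here is a rough pandas sketch of that pivot (my illustration, not Prathy's code; the column names and values are made up):

```python
import pandas as pd

# One measure (Amount) with a Category column holding the slicer values
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb"],
    "Category": ["Bikes", "Helmets", "Bikes"],
    "Amount": [100, 40, 120],
})

# Pivot to one column per category; missing combinations stay blank (NaN),
# not zero, so they draw nothing on a chart
wide = df.pivot_table(index="Month", columns="Category", values="Amount")
print(wide)
```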


Making dbatools Load Faster

Chrissy LeMaire has found a technique to reduce the load time for dbatools:

I don’t know the in-depth internals of Import-Module, but I know that importing a DLL filled with C# cmdlets is extremely fast. For instance, Microsoft’s SqlServer module imports 100 commands in less than a second. Sometimes it’s closer to half a second!
I wondered if we could somehow compile our commands into a C# binary but that seemed far-fetched. One thing we could do, though, is use compression! It works for SQL Server backups, could it work for PowerShell?

Spoilers:  the answer is “yes.”  Read on to see how much faster the initial import gets.
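The same general idea translates to other scripting runtimes. As a loose analogy only (this is not how dbatools does it), Python can import modules straight from a compressed archive, trading a little CPU for far fewer file reads:

```python
import sys
import zipfile

# Bundle a tiny, made-up module into a compressed archive
with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("greet.py", "def hello():\n    return 'hello from the archive'\n")

# zipimport handles .zip entries on sys.path, so this import reads from the archive
sys.path.insert(0, "bundle.zip")
import greet

print(greet.hello())
```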


Building A Secure Microservice Which Uses Kafka Streams

George Vetticaden has a reference architecture with sample code for a secured microservice running atop Kafka Streams:

One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps / microservices don’t need a cluster. Rather, each microservice can be run as a standalone app (e.g., a JVM container). You can then spin up multiple instances of each to scale out the microservice. Kafka will treat this as a single consumer group with multiple instances. Kafka Streams takes care of consumer partition reassignments for scalability.
You can see how to start these three microservices here.

If you’re trying to wrap your head around streaming apps, give this a try.  George has all of the code in his GitHub repo.
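George's sample code is Java on Kafka Streams; purely to illustrate the consumer group behavior he mentions, here is a sketch using the confluent-kafka Python client instead. Run several copies with the same group.id and Kafka spreads the topic's partitions across them, rebalancing as instances come and go (the broker address and topic name are my assumptions):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "orders-microservice",      # every instance shares this group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])              # hypothetical topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        print(f"partition {msg.partition()}: {msg.value().decode('utf-8')}")
finally:
    consumer.close()
```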


Using Databricks Delta In Lieu Of Lambda Architecture

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta:

The major problem with the Lambda architecture is that we have to build two separate pipelines, which can be very complex and, ultimately, make it difficult to combine the processing of batch and real-time data. However, it is now possible to overcome this limitation if we are willing to change our approach.
Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Apache Spark and Databricks File System (DBFS). It is a single data management tool that combines the scale of a data lake, the reliability and performance of a data warehouse, and the low latency of streaming in a single system. The core abstraction of Databricks Delta is an optimized Spark table that stores data as parquet files in DBFS and maintains a transaction log that tracks changes to the table.

It’s an interesting contrast and I recommend reading the whole thing.
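To give a flavor of the Delta abstraction Jose describes, here is a minimal PySpark sketch (it assumes a Databricks cluster or a local Spark session configured with the delta-spark package; the path is made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(0, 1000).withColumnRenamed("id", "order_id")

# Writing in Delta format produces parquet files plus a _delta_log transaction log
df.write.format("delta").mode("overwrite").save("/tmp/delta/orders")

# The same table serves both batch and streaming reads
batch_df = spark.read.format("delta").load("/tmp/delta/orders")
stream_df = spark.readStream.format("delta").load("/tmp/delta/orders")
```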


An Overview Of Apache Kafka

Leona Zhang has a series going on Apache Kafka.  Part one covers some of the concepts around messaging systems:

The most significant difference between batch processing applications and stream processing applications is whether the data has a visible boundary. If it does, the processing is called batch processing. For example, a client collects data once every hour, sends this data to the server for statistics, and then saves the statistical results in a statistics database.
If the boundary doesn’t exist, it is called stream processing. Here is an example: logs and orders are generated continuously on a large website, just like a data flow. If each log or order is processed within several hundred milliseconds or several seconds of its generation, the application is a streaming application. If logs and orders are instead collected once every hour and transmitted in one batch, the original stream data is converted into batch data.
Sometimes, stream processing is a hard requirement. For example, Jack Ma wanted to display Tmall’s orders and sales for November 11 on a large screen. If the data center worked in a T+1 mode and could only deliver the data for November 11 on November 12, Jack Ma would not be happy.

Part two is an overview of the architectural components used in Kafka:

Kafka uses the group concept to integrate the producer/consumer and publisher/subscriber models.
One topic may have multiple groups, and one group may include multiple consumers. Within a group, each message is consumed by only one consumer. Across groups, consumers follow the publisher/subscriber model: every group receives each message.
Note: a partition is assigned to only one consumer within the same group. If there are three partitions and four consumers in a group, one consumer is redundant and will not receive any data.
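To make that note concrete, here is a small sketch with the confluent-kafka Python client that creates a three-partition topic; with three partitions, at most three consumers in a single group receive assignments, and a fourth sits idle (the broker address and topic name are my assumptions):

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed broker

# Three partitions means at most three consumers per group get data
futures = admin.create_topics([NewTopic("orders", num_partitions=3, replication_factor=1)])
for topic, future in futures.items():
    future.result()  # raises if topic creation failed
    print(f"created topic {topic}")
```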

This looks to be the start to a good series.


Integrating PowerApps With Power BI

Wolfgang Strasser continues a series on the PowerPlatform with a post showing how to integrate an existing PowerApp with Power BI:

When creating a new PowerApp using the Power BI integration, you get an additional data source, PowerBIIntegration, which serves as the connection to the Power BI report. Whenever a filtering action occurs in the Power BI report, that information is available in this property.
During the PowerApp creation I selected the option to add a new form, which in the next step needs a connection to the Article table (which holds the additional article details).

Check out the entire series too.


The Kitchen Sink Powershell Prompt

Jeffery Hicks shares a heavy-duty Powershell prompt:

Since it is Friday and time for some more PowerShell fun, and I’ve been sharing some of my prompt functions, I thought I’d re-share my kitchen sink prompt. This PowerShell prompt function does *a lot* of things and gives you a snapshot view of your system every time you press Enter. It would work cross-platform, but because the function uses Get-CimInstance to retrieve system information, it needs Windows. The prompt function customizes not only the on-screen prompt but also the title bar.

Click through for a link to the prompt, as well as to see it in action.


Working With Files In Azure Cloud Shell

Melissa Coates shows us how to export a Power BI report to Azure using Azure Cloud Shell:

Cloud Shell is a lightweight way to run scripts using either Bash or PowerShell. You can run scripts in a browser using the Azure portal or shell.azure.com, with the Azure mobile app, or using the VS Code Azure Account extension. If you have seen the “Try it now” links in Azure documentation pages, those will direct you to use Cloud Shell.

The rest of this post focuses on using PowerShell with Cloud Shell.

Click through for the demo.


Azure Data Lake Storage Generation 2

James Baker announces updates to Azure Data Lake Storage Gen2:

As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is due to the fact that every percentage point improvement in storage performance results in that same percentage reduction in the requirement for the very expensive compute layer. Given that the disaggregated storage model allows us to scale compute and storage independently, that percentage reduction in compute requirement results in almost the same (compute typically equates to 90 percent of the TCO) reduction in TCO.
So, when I say that ADLS Gen2 provides performance improvements of 10 to 50 percent over existing storage solutions, depending on the nature of the workload, this equates to VERY significant reductions in the monthly analytics spend. It also has the added benefit of providing your insights sooner!
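To make the arithmetic concrete, here is a quick back-of-the-envelope check using only the figures quoted above:

```python
# Compute is taken as ~90% of TCO, per the excerpt above
compute_share_of_tco = 0.90

for storage_improvement in (0.10, 0.25, 0.50):
    # The quoted claim: faster storage reduces the compute requirement by the
    # same percentage, and that reduction flows through compute's share of TCO
    tco_reduction = storage_improvement * compute_share_of_tco
    print(f"{storage_improvement:.0%} faster storage -> roughly {tco_reduction:.0%} lower TCO")
```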

Check out all of the changes.
