
Month: December 2018

Tee-Object In PowerShell

Adam Bertram has a paean in honor of Tee-Object:

What does Tee-Object do anyway? Tee-Object basically kills two birds with one stone. This cmdlet redirects output from a PowerShell command and either saves it to a file or to a variable while also returning the output to the pipeline. It allows a scripter to save the output and send the output across the pipeline all in one shot.
Let’s say a scripter wants to send some text to a file and then perform some kind of task on that same text afterward. Perhaps we’re pulling a list of Windows services from a computer and saving that output to a text file to create a snapshot of the “before” state before we attempt to start the services. This task can be done in two separate steps, or in a single one with Tee-Object.

I’ve used it several times in PowerShell, as well as the tee command in Linux. It’s great when you need to do several things with the same data but don’t want to break out of your pipeline.
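As a quick illustration, here is a minimal sketch of the before-snapshot scenario from the quote (the file path and the stopped-services step are my own, not Adam’s):

```powershell
# Snapshot the current service state to a file while keeping the
# objects flowing down the pipeline for further work.
Get-Service |
    Tee-Object -FilePath 'C:\Temp\services-before.txt' |
    Where-Object Status -eq 'Stopped' |
    Start-Service -WhatIf   # drop -WhatIf to actually start them

# Tee-Object can also capture to a variable instead of a file:
Get-Service | Tee-Object -Variable beforeState | Measure-Object
```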


A Graph Database Of US Capitals

James Livingston has a graph database to share:

While there are countless relational databases out there for practice, there’s not much in the way of graph databases. It is my intent to share my graph databases with the world in hopes that it removes the friction associated with your learning.
US Capitals is a popular data set for working with graphs. Nodes identify a state capital. An edge connects a capital in one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find T-SQL scripts to create the graph using the SQL Server 2017 graph database feature. I created those scripts and have made them readily available on GitHub.

I’m interested in the forthcoming post on Dijkstra’s algorithm; I think the last time I saw that was in my undergrad days.
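If you haven’t seen SQL Server 2017’s graph syntax, the scripts boil down to something like this sketch (table and column names here are illustrative, not necessarily what James uses):

```sql
-- Capitals are nodes; state adjacency is an edge between capitals.
CREATE TABLE Capital (
    StateName   NVARCHAR(30) NOT NULL,
    CapitalName NVARCHAR(30) NOT NULL
) AS NODE;

CREATE TABLE BordersWith AS EDGE;

-- Find every capital one hop away from Columbus.
SELECT c2.CapitalName
FROM Capital c1, BordersWith b, Capital c2
WHERE MATCH(c1-(b)->c2)
  AND c1.CapitalName = N'Columbus';
```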


Tips For Migrating SSISDB

Kenneth Fisher shares some thoughts on SSISDB:

We’ve been doing a lot of upgrading recently and at one point had to move an instance from one 2016 server to another. In the process, we found out (the hard way) that it’s not that easy to move SSISDB (the SSIS Catalog that may or may not be named SSISDB). I mean it’s not hard, but it’s definitely not a basic backup/restore. The full BOL instructions on how to do this are here. That said, here are the elements that are involved.

Read on for the list as well as an order of operations.
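The element that tends to trip people up is the database master key: after restoring the backup on the destination instance, you have to open it with the password from the source. A fragment of the documented process, with placeholder path and password:

```sql
-- On the destination instance, after copying the backup over:
RESTORE DATABASE SSISDB
    FROM DISK = N'C:\Backups\SSISDB.bak'
    WITH REPLACE;
GO
USE SSISDB;
GO
-- Re-encrypt the database master key using the source instance's password.
OPEN MASTER KEY DECRYPTION BY PASSWORD = N'<password from the source instance>';
ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY;
```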


Experimenting With Power BI Data Privacy Settings

Chris Webb takes us through some of the intricacies of Power BI data privacy settings and what it means when data sets are Private:

Not only does it show just the Package Search endpoint, there is also a warning that says:
“Some data sources may not be listed because of hand-authored queries”
This refers to the output step in the query that calls the Package Show endpoint with the dynamically generated URL.
Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

Read the whole thing.


Using Slicers In Power BI To Filter Chart Categories

Prathy Kamasani shows how you can use slicers in Power BI to filter out specific categories in a line chart:

The logic is to create a table with the DAX function UNION. Each table expression in the UNION function represents one value of the slicer. Apart from that slicer-related value, all the rest of the values are blanks. It is key to have them as blanks rather than zeros, so that no data is shown for the other values.

In other words, this pivots the table, turning one measure with several different category values into one measure per category. If you know the number of categories (four in this case), this solution can work well for you.
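A stripped-down sketch of the UNION pattern, assuming a hypothetical Sales table with Date, Category, and Amount columns (two categories shown; Prathy’s post handles four):

```dax
SlicerTable =
UNION (
    SELECTCOLUMNS (
        Sales,
        "Slicer", "Category A",
        "Date", Sales[Date],
        -- IF without an else branch returns BLANK, not zero,
        -- so no line is drawn for the other categories.
        "Amount", IF ( Sales[Category] = "A", Sales[Amount] )
    ),
    SELECTCOLUMNS (
        Sales,
        "Slicer", "Category B",
        "Date", Sales[Date],
        "Amount", IF ( Sales[Category] = "B", Sales[Amount] )
    )
)
```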


Making dbatools Load Faster

Chrissy LeMaire has found a technique to reduce the load time for dbatools:

I don’t know the in-depth internals of Import-Module, but I know that importing a DLL filled with C# cmdlets is extremely fast. For instance, Microsoft’s SqlServer module imports 100 commands in less than a second. Sometimes it’s closer to half a second!
I wondered if we could somehow compile our commands into a C# binary, but that seemed far-fetched. One thing we could do, though, is use compression! It works for SQL Server backups; could it work for PowerShell?

Spoilers:  the answer is “yes.”  Read on to see how much faster the initial import gets.
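The shape of the trick looks something like this minimal sketch (this is not dbatools’ actual loader, just the general idea of compressing the module body and inflating it at import time):

```powershell
# Build time: merge every function file into one script and gzip it.
$body  = (Get-ChildItem .\functions\*.ps1 | Get-Content -Raw) -join "`n"
$bytes = [Text.Encoding]::UTF8.GetBytes($body)
$out   = New-Object IO.MemoryStream
$gzip  = New-Object IO.Compression.GZipStream($out, [IO.Compression.CompressionMode]::Compress)
$gzip.Write($bytes, 0, $bytes.Length)
$gzip.Close()
[IO.File]::WriteAllBytes("$PWD\allcommands.ps1.gz", $out.ToArray())

# Import time (inside the .psm1): inflate and dot-source one blob
# instead of dot-sourcing hundreds of individual .ps1 files.
$in     = [IO.File]::OpenRead("$PWD\allcommands.ps1.gz")
$gunzip = New-Object IO.Compression.GZipStream($in, [IO.Compression.CompressionMode]::Decompress)
$reader = New-Object IO.StreamReader($gunzip, [Text.Encoding]::UTF8)
. ([ScriptBlock]::Create($reader.ReadToEnd()))
$reader.Close(); $in.Close()
```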


Building A Secure Microservice Which Uses Kafka Streams

George Vetticaden has a reference architecture with sample code for a secured microservice running atop Kafka Streams:

One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps / microservices don’t need a cluster. Rather, each microservice can be run as a standalone app (e.g., a JVM container). You can then spin up multiple instances of each to scale out the microservice. Kafka will treat this as a single consumer group with multiple instances. Kafka Streams takes care of consumer partition reassignments for scalability.
You can see how to start these three microservices here.

If you’re trying to wrap your head around streaming apps, give this a try.  George has all of the code in his GitHub repo.
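The “no cluster needed” point comes down to application.id: every instance started with the same id joins the same consumer group. A bare-bones Java sketch (broker address and topic names are placeholders, not from George’s repo):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class OrdersMicroservice {
    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this id form one consumer group;
        // Kafka rebalances partitions across them automatically.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-microservice");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders").to("validated-orders");

        // Scaling out is just running main() again, on this host or another.
        new KafkaStreams(builder.build(), props).start();
    }
}
```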


Using Databricks Delta In Lieu Of Lambda Architecture

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta:

The major problem with the Lambda architecture is that we have to build two separate pipelines, which can be very complex and, ultimately, difficult to combine when processing batch and real-time data. However, it is now possible to overcome this limitation if we are able to change our approach.
Databricks Delta delivers a powerful transactional storage layer by harnessing the power of Apache Spark and Databricks File System (DBFS). It is a single data management tool that combines the scale of a data lake, the reliability and performance of a data warehouse, and the low latency of streaming in a single system. The core abstraction of Databricks Delta is an optimized Spark table that stores data as parquet files in DBFS and maintains a transaction log that tracks changes to the table.

It’s an interesting contrast and I recommend reading the whole thing.
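For a flavor of what “Parquet files plus a transaction log” looks like in practice, here is a minimal PySpark sketch (paths are placeholders; spark is the session a Databricks notebook provides, with Delta enabled):

```python
# Batch write: Delta stores the data as Parquet in DBFS and records
# every change in a transaction log alongside it.
events = spark.read.json("/mnt/raw/events")
events.write.format("delta").mode("append").save("/mnt/delta/events")

# A stream can feed the very same table, replacing the separate
# batch and speed layers of a Lambda architecture.
(spark.readStream
      .format("json")
      .schema(events.schema)
      .load("/mnt/raw/events")
      .writeStream
      .format("delta")
      .option("checkpointLocation", "/mnt/delta/events/_checkpoints")
      .start("/mnt/delta/events"))

# And it queries like any other table.
spark.sql("CREATE TABLE IF NOT EXISTS events USING DELTA LOCATION '/mnt/delta/events'")
spark.read.format("delta").load("/mnt/delta/events").count()
```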


An Overview Of Apache Kafka

Leona Zhang has a series going on Apache Kafka.  Part one covers some of the concepts around messaging systems:

There is a difference between batch processing applications and stream processing applications. The most significant difference is whether a visible boundary exists in the data: if it does, we call it batch processing. For example, a client collects the data once every hour, sends this data to the server for statistics, and then saves the statistical results in the statistical database.
If the boundary doesn’t exist, the processing is called stream processing. Here is an example: logs and orders are generated continuously on a large website, just like a data flow. If each log or order is processed within several hundred milliseconds or several seconds of its generation, the application is a stream application. If the collection of logs and orders happens once every hour, followed by a unified transmission, the original stream data is converted into batch data.
Occasionally, stream processing becomes imperative. For example, Jack Ma wanted to display the orders and sales on Tmall for November 11 on a large screen. If the data center works in a T+1 mode and can only obtain data for November 11 on November 12, Jack Ma would not be happy.

Part two is an overview of the architectural components used in Kafka:

Kafka uses the group concept to integrate the producer/consumer and publisher/subscriber models.
One topic may have multiple groups, and one group may include multiple consumers. Within a group, only one consumer can consume a given message. Across groups, consumers follow the publisher/subscriber model: every group receives each message.
Note: a partition is allocated to only one consumer within the same group. If there are three partitions and four consumers in one of the groups, one consumer is redundant and cannot receive any data.

This looks to be the start to a good series.
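The group and partition rules in that note map directly onto the consumer API. In this hedged Java sketch (broker and topic names are mine), starting the program a second time with the same group.id splits the topic’s partitions between the two processes, while a copy with a different group.id receives every message:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        // Consumers sharing a group.id divide the partitions among themselves.
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```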


Integrating PowerApps With Power BI

Wolfgang Strasser continues a series on the PowerPlatform with a post showing how to integrate an existing PowerApp with Power BI:

When creating a new PowerApp using the Power BI integration, you get an additional data source, PowerBIIntegration, that serves as the connection to the Power BI report. Whenever a filtering action occurs in the Power BI report, this information is available in this property.
During the PowerApps creation step, I selected the action to add a new form, which in the next step needs to get a connection to the Article table (which holds the additional article details).

Check out the entire series too.
