Month: May 2020

Securing a Kafka Deployment

Published 2020-05-29 by Kevin Feasel

Rick Spurgeon and Nikoleta Verbeck give us some tips on securing Apache Kafka:

Confluent Platform provides Role-Based Access Control (RBAC), which addresses the gaps listed above. RBAC and ACLs perform similar functions but approach the problem differently and can be used independently or cooperatively. RBAC is powered by Confluent’s Metadata Service (MDS), which integrates with LDAP and acts as the central authority for authorization and authentication data. RBAC leverages role bindings to determine which users and groups can access specific resources and what actions can be performed within those resources (roles). RBAC is empowered on the Kafka cluster by way of Confluent Server. Confluent Server is a fully compatible Kafka broker which integrates commercial security features like RBAC.

The list of items is pretty straightforward with no major surprises.

Comments closed

Pandas UDFs and Python Type Hints in Spark 3.0

Published 2020-05-29 by Kevin Feasel

Hyukjin Kwon announces some updates forthcoming in Apache Spark 3.0:

The Pandas UDFs work with Pandas APIs inside the function and Apache Arrow for exchanging data. It allows vectorized operations that can increase performance up to 100x, compared to row-at-a-time Python UDFs.
The example below shows a Pandas UDF to simply add one to each value, in which it is defined with the function called pandas_plus_one decorated by pandas_udf with the Pandas UDF type specified as PandasUDFType.SCALAR.

Click through for explanations and demos for each.

Comments closed

Switching Between Solution and Folders Views with SSRS Projects

Published 2020-05-29 by Kevin Feasel

Elizabeth Noble deals with a head-scratcher:

This past week, I made the goal of automating the deployment of our first SSRS report at work. I created the report and after adding the report to source control, my Object Explorer looked like the image below.
I added my solution to source control and synced the project up to Github. However, when my colleague tried to clone the repo and open the Report Project, they saw an image like the one below.

Click through for the images (which really tell the story) and how you can fix this problem.

Comments closed

Passing Around JSON Arrays in Azure Data Factory

Published 2020-05-29 by Kevin Feasel

Rayis Imayev continues a series on JSON in Azure Data Factory:

It’s not a new thing to know that we can reference nested elements of ADF activities’ output since it’s represented in JSON format or pass the JSON file content to other tasks/components that can process this format.

But what if you need to pass a complete output of your ADF activity task further down your pipeline. Or you need to pass a JSON array elements to another ADF activity or sub-pipeline as the parameter value. Let’s explore what other options available in Azure Data Factory for this very interesting use case.

Read on for the demo.

Comments closed

How SQL Server Stores the Money Data Type

Published 2020-05-29 by Kevin Feasel

Randolph West wants to talk to us about how we use our MONEY:

While MONEY and DECIMAL(19,4) are functionally the same, they are stored differently on disk, and this is where it gets interesting. Let’s use a random amount of $ 4,513.19. Since it’s small enough to fit in both MONEY and SMALLMONEY, we can do a simple experiment. When we ask SQL Server to store this value in a MONEY data type, it will store it (byte-reversed) as 0x7CA8B00200000000. The SMALLMONEY version of this amount would look almost identical, stored as 0x7CA8B002 (without the leading zeroes). A quick look at this byte-reversed value (0x02B0A87C) in a hex calculator gives us the amount of 45,131,900. After moving the decimal point four places to the left, we get our starting value of 4513.1900.

But read on to see how that compares to other data types.

Comments closed

Custom Power BI Maps

Published 2020-05-29 by Kevin Feasel

Alice Drummond has two articles on improving the map experience in Power BI. First up is creating drill-down maps:

But why would we want to drill down on a map?
See the big picture: Using drill down, we can visualise the dataset at the ‘big picture’ scale, and then explore finer details for areas of interest (just think of the classic spatial hierarchy of Country à State à City à Suburb)
Save on real estate: We can also display multiple datasets at different levels in the same map – which goes a long way to conserving your precious report page real estate by reducing the number of visuals in your report!
Performance: Following on from above, less visuals almost always equates to better report performance! And this is especially true when using MapBox, which is at the slower end of the Power BI performance scale at the best of times – so using drill down to reduce the number of maps combined with filtering your datasets goes a long way to speeding up your reports!

The second post is all about styling choropleth maps:

If you’ve played around with MapBox in Power BI – you’ll know that it has loads of great features to create really rich and beautiful maps, including some great ‘out of the box’ map styles (i.e. base maps). However, you might not be aware that it also gives you the ability to design custom styles with your own spatial layers. I discovered this feature on a recent project where my client wanted to include Victoria’s Catchment Management Authority (or CMA) boundaries on their base-map to provide greater context to help interpret their data. Up until this point, the only option I knew of was to purchase an expensive ArcGIS Online licence to create custom map styles. So, you can imagine just how excited I was when I discovered that I could also do this in MapBox – for FREE!!!

Click through for the posts and videos.

Comments closed

Feeding Databricks Output to Azure SQL Database

Published 2020-05-29 by Kevin Feasel

Arun Sirpal takes us through the process of moving data from Databricks into Azure SQL Database:

Recently I got to a stage where I leveraged Databricks to the best of my ability to join couple of CSV files together, play around some aggregations and then output it back to a different mount point ( based on Azure Storage) as a parquet file, I decided that I actually wanted to move this data into Azure SQL DB, which you may want to do one day.

This isn’t just dropping files into Blob Storage and picking them up, but rather a direct integration.

Comments closed

Stopping an ADF SSIS Integration Runtime

Published 2020-05-29 by Kevin Feasel

Andy Leonard shows how to shut off the Azure Data Factory SSIS integration runtime when you don’t need to use it:

I have blogged about stopping an Azure-SSIS IR in the past (Start Azure-SSIS, Execute Package, Stop Azure-SSIS). The pattern described in the Start Azure-SSIS, Execute Package, Stop Azure-SSIS post is sound and awesome. My only issue is the pipeline will report failure if the Azure-SSIS IR is not running when the Web Activity executes the Stop command. Mind you, the operation succeeds. But it reports failure.
Reporting failure when the operation succeeds seriously messes with my CDO (that’s OCD with the letters in the proper order).

Read on to see how to do this the best way.

Comments closed

SQL Agent Job Control Based on AG/Mirroring Status

Published 2020-05-28 by Kevin Feasel

Eitan Blumin has an interesting solution for us:

Lo and behold, my new and improved scripts, which implement the following logic:
Create a single Master Control Job with an hourly schedule, and also set it to run when SQL Agent is started up. The Master Control Job will do the following:
Automatically detect which jobs have steps that run on databases that are involved in an HADR solution.
For each such job, automatically detect whether there’s at least one step run on a database which currently has the Primary/Principle role.
If so, make sure it’s enabled. Otherwise, make sure it’s disabled.
Create an alert for a role/state change event, if such doesn’t exist yet, and set it to run the Master Control Job when triggered.

Click through for more details and links to scripts for Availability Group and database mirroring scenarios.

Comments closed

TDE and Managed Instances

Published 2020-05-28 by Kevin Feasel

Tim Radney notes a change to the way new Azure SQL Database Managed Instance databases are configured:

Another recent change is that all newly created databases have “Encryption Enabled” set to True. This enables Transparent Data Encryption. By default, if you haven’t created your own key, it will use the service-managed key. A bug I’ve found in SSMS is that even though you set “Encryption Enabled” to false, it will still enable TDE. This is not the behavior when using T-SQL.

Read on to understand the ramifications of this change.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31