Press "Enter" to skip to content

February 8, 2021

Connecting Confluent and Databricks on Azure

Angela Chu, et al., take us through a streaming data ingestion process:

How do you process IoT data, change data capture (CDC) data, or streaming data from sensors, applications, and sources in real time? Apache Kafka® and Azure Databricks are widely adopted technologies in the industry, but they require specific skills and expertise to run. Leveraging Confluent Cloud and Azure Databricks as fully managed services in Microsoft Azure, you can implement new real-time data pipelines with less effort and without the need to upgrade your datacenter (or set up a new one).

This blog post demonstrates how to configure Azure Databricks to interact with Confluent Cloud so that you can ingest, process, store, make real-time predictions and gain business insights from your data.

Click through for a detailed demonstration.
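
As a taste of what such a pipeline involves, here is a minimal sketch (my own illustration, not the article's code) of reading a Confluent Cloud topic from an Azure Databricks notebook with Structured Streaming. The bootstrap server, secret scope, topic name, and paths are all placeholder assumptions:

# A minimal sketch, not the article's code: read a Confluent Cloud topic from
# an Azure Databricks notebook and land it in a Delta table. `spark` and
# `dbutils` are ambient in Databricks notebooks; the bootstrap server, secret
# scope, topic, and paths below are placeholders.
bootstrap_servers = "pkc-xxxxx.westeurope.azure.confluent.cloud:9092"
api_key = dbutils.secrets.get("confluent", "api-key")
api_secret = dbutils.secrets.get("confluent", "api-secret")

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrap_servers)
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      # Databricks ships a shaded Kafka client, hence the kafkashaded prefix.
      .option("kafka.sasl.jaas.config",
              "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
              f'required username="{api_key}" password="{api_secret}";')
      .option("subscribe", "sensor-readings")
      .option("startingOffsets", "earliest")
      .load())

# Kafka keys and values arrive as binary; cast to strings and stream to Delta.
(df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
   .writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/bronze/_checkpoints/sensor-readings")
   .start("/mnt/bronze/sensor-readings"))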

Secure Cluster Connectivity in Azure Databricks

Abhinav Garg and Premal Shah have an announcement:

We’re excited to announce the general availability of Secure Cluster Connectivity (also commonly known as No Public IP) on Azure Databricks. This release applies to Microsoft Azure Public Cloud and Azure Government regions, in both Standard and Premium pricing tiers. Hundreds of our global customers including large financial services, healthcare and retail organizations have already adopted the capability to enable secure and reliable deployments of the Azure Databricks unified data platform. It allows them to securely process company and customer data in private Azure Virtual Networks, thus satisfying a major requirement of their enterprise governance policies.

Read on for more detail about how this works.
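
For a sense of what turning this on looks like programmatically, here is a hedged sketch using the azure-mgmt-databricks Python SDK; the subscription ID, resource group, VNet, and subnet names are placeholder assumptions, and this is illustrative rather than the announcement's own deployment method:

# A hedged sketch, not from the announcement: deploy a VNet-injected workspace
# with Secure Cluster Connectivity (No Public IP) enabled. All names and IDs
# below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.databricks import AzureDatabricksManagementClient
from azure.mgmt.databricks.models import (
    Sku, Workspace, WorkspaceCustomBooleanParameter,
    WorkspaceCustomParameters, WorkspaceCustomStringParameter,
)

client = AzureDatabricksManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
)

workspace = Workspace(
    location="eastus2",
    sku=Sku(name="premium"),
    managed_resource_group_id="/subscriptions/<sub>/resourceGroups/adb-managed-rg",
    parameters=WorkspaceCustomParameters(
        # Secure Cluster Connectivity, a.k.a. No Public IP:
        enable_no_public_ip=WorkspaceCustomBooleanParameter(value=True),
        # VNet injection, so clusters run in your own private virtual network:
        custom_virtual_network_id=WorkspaceCustomStringParameter(
            value="/subscriptions/<sub>/resourceGroups/net-rg/providers/"
                  "Microsoft.Network/virtualNetworks/adb-vnet"),
        custom_public_subnet_name=WorkspaceCustomStringParameter(value="adb-public"),
        custom_private_subnet_name=WorkspaceCustomStringParameter(value="adb-private"),
    ),
)

poller = client.workspaces.begin_create_or_update(
    resource_group_name="analytics-rg",
    workspace_name="secure-workspace",
    parameters=workspace,
)
print(poller.result().provisioning_state)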

Windows Authentication Across Domains

Daniel Hutmacher shows three methods for connecting to a remote SQL Server instance on a different domain:

A jump box is a virtual desktop on the client’s domain that you can connect to using Remote Desktop. You’d obviously have to ask somebody for access to one, and you’d have to set up your development environment from scratch. This may not be a big issue if you’re in SSMS all of the time, but when you need Power BI Desktop, Excel, or even Visual Studio, this setup can take some time (not to mention asking for local admin credentials on the jump box).

A fourth option is to run the executable with runas and /netonly, like:

runas /netonly /user:domain\username ssms.exe

Note that /netonly comes before /user:. You’ll be prompted for the password, and the credentials are used only for remote connections, so the process itself still runs under your local account.

Testing Power BI Report Server Datasources

Aaron Nelson has some cmdlets for us:

I finally did it. I created a function I’ve been wanting to be able to use for *years*. Test-RsRestItemDataSource is here.

I can’t tell you how many times I’ve started to work on a report I was told was working, only to find the connection info was invalid. This wastes valuable time, especially when you’ve already made changes to the report.

Other times, I’ve been asked to figure out why a bunch of subscriptions weren’t working, only to find out it was a simple connection issue. I’ve always wanted a simple PowerShell command to check the credentials of a bunch of reports before I touch anything.

Turns out, it wasn’t that hard to build at all. I only wish I had built it years ago.

Click through to see an example of this, as well as two more cmdlets.

Memory Grant Internals

Deepthi Goguri has two posts for us. First up, we learn about queries spilling to disk:

At build time, memory is allocated based on the cardinality estimates, and those estimates are based on the size of the input and the probe. Buckets are created in memory and we place the rows into their respective buckets. If the grant is exceeded, some of the buckets are sent to disk. With some buckets in memory and some on disk, the initial probe of the data uses the inner set. As the probe rows are scanned, any row that hashes to an in-memory bucket is matched immediately. If a row hashes to an on-disk bucket, the probe row is written to disk along with the outer-side bucket. Because of this, we have more disk writes to tempdb at probe time. This is a build-side spill.
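
To make those mechanics concrete, here is a toy Python sketch (my own illustration, not SQL Server's actual implementation): build rows hash into buckets, whole buckets spill once the pretend memory grant is exhausted, and probe rows that hash to a spilled bucket must also be written out and matched in a later pass:

# Toy illustration of a build-side hash join spill. "Disk" here is just
# another dictionary; the point is the extra writes at probe time.
from collections import defaultdict

NUM_BUCKETS = 4
MEMORY_GRANT_ROWS = 6  # pretend the grant holds only 6 build rows

def bucket_of(key):
    return hash(key) % NUM_BUCKETS

def hash_join(build_rows, probe_rows):
    in_memory = defaultdict(list)      # bucket -> build rows kept in memory
    spilled_build = defaultdict(list)  # bucket -> build rows "on disk"
    spilled_probe = defaultdict(list)  # bucket -> probe rows "on disk"
    rows_in_memory = 0

    # Build phase: hash rows into buckets; spill whole buckets past the grant.
    for key, payload in build_rows:
        b = bucket_of(key)
        if b in spilled_build:
            spilled_build[b].append((key, payload))
        elif rows_in_memory >= MEMORY_GRANT_ROWS:
            # Grant exhausted: spill this bucket, evicting any rows it
            # already had in memory.
            evicted = in_memory.pop(b, [])
            rows_in_memory -= len(evicted)
            spilled_build[b] = evicted + [(key, payload)]
        else:
            in_memory[b].append((key, payload))
            rows_in_memory += 1

    # Probe phase: rows hashing to in-memory buckets match now; rows hashing
    # to spilled buckets are written out too (the extra tempdb writes).
    results = []
    for key, payload in probe_rows:
        b = bucket_of(key)
        if b in spilled_build:
            spilled_probe[b].append((key, payload))
        else:
            results.extend((bk, bp, payload)
                           for bk, bp in in_memory[b] if bk == key)

    # Later pass: join each spilled build bucket against its spilled probes.
    for b, rows in spilled_build.items():
        lookup = defaultdict(list)
        for bk, bp in rows:
            lookup[bk].append(bp)
        for pk, pp in spilled_probe[b]:
            results.extend((pk, bp, pp) for bp in lookup[pk])
    return results

print(hash_join([(i % 5, f"b{i}") for i in range(10)],
                [(k, f"p{k}") for k in range(5)]))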

Then, Deepthi wraps up this series with a bit of balance:

In this final part of the series, let’s talk about how we balance these memory grants. If a lot of your queries request more memory than they require, that causes concurrency issues. Memory grants are allocated to a query before it even executes, based on the cardinality estimates and on the memory-consuming iterators. Once memory is granted to a query, it is only released when execution completes, even if the query actually uses just a part of the grant. If queries request more memory than they need, we waste memory. And if queries receive smaller memory grants than they need, there is a performance impact on the running queries.

Read on for ways to fix excessive or insufficient memory grant problems.
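
As one flavor of fix for the excessive-grant side, here is a hedged sketch (my own, not from Deepthi's post) of capping a query's grant with the MAX_GRANT_PERCENT query hint, issued here through pyodbc against the AdventureWorks sample database; the connection string is a placeholder assumption:

# Illustration only: cap an oversized memory grant with MAX_GRANT_PERCENT.
# pyodbc and the connection details are assumptions for demonstration.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=AdventureWorks;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Without the hint, the optimizer's estimate drives the grant; the hint caps
# the grant at 10% of the maximum available query memory.
cursor.execute("""
    SELECT soh.SalesOrderID, sod.ProductID, sod.LineTotal
    FROM Sales.SalesOrderHeader AS soh
    JOIN Sales.SalesOrderDetail AS sod
        ON soh.SalesOrderID = sod.SalesOrderID
    ORDER BY sod.LineTotal DESC
    OPTION (MAX_GRANT_PERCENT = 10);
""")
rows = cursor.fetchall()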
