Press "Enter" to skip to content

Month: September 2021

Data Lakehouse Point-of-Sale Analytics Demo

Bryan Smith and Rob Saker share a pattern:

Disruptions in the supply chain – from reduced product supply and diminished warehouse capacity – coupled with rapidly shifting consumer expectations for seamless omnichannel experiences are driving retailers to rethink how they use data to manage their operations. Prior to the pandemic, 71% of retailers named lack of real-time visibility into inventory as a top obstacle to achieving their omnichannel goals. The pandemic only increased demand for integrated online and in-store experiences, placing even more pressure on retailers to present accurate product availability and manage order changes on the fly. Better access to real-time information is the key to meeting consumer demands in the new normal.

In this blog, we’ll address the need for real-time data in retail, and how to overcome the challenges of moving real-time streaming of point-of-sale data at scale with a data lakehouse.

It’s a cool scenario, at the least.

Comments closed

Building a Deployment Pipeline with sqlpackage

Daniel Hutmacher builds a cloudless deployment pipeline:

I did some googling to see just how simple I could make a database deployment pipeline. I keep the DDL scripts in a git repository on the local network, but I can’t use Azure DevOps or any other cloud service, and I don’t have Visual Studio installed, so the traditional database project in SSDT that I know and love is unfortunately not an option for me.

So I googled a little, and here’s what I ended up doing.

Click through to see how Daniel was able to do it.

Comments closed

Fixing Remote Desktop Resolution to Match Screen Size

Kenneth Fisher fixes a problem:

One of the things I hate most about using remote desktop is when the resolution for the remote server is way off what the size of my window is. You end up with the scroll bars on the side and bottom of the screen and spend half your time moving the screen around so that you can find what you need. Open Notepad, scroll to the top of the screen so you can save it, then scroll to the bottom of the screen so you can switch to the Excel spreadsheet that’s hidden by the Notepad window, etc.

It turns out there is an easy setting that forces the resolution to match the window size. Even if you end up with really weird window sizes.

My fix historically was to use mRemoteNg, which did this for you.

Comments closed

Patched Security Flaw in Azure Container Instances

Ionut Ilascu reports on a vulnerability:

Microsoft has fixed a vulnerability in Azure Container Instances called Azurescape that allowed a malicious container to take over containers belonging to other customers on the platform.

An adversary exploiting Azurescape could execute commands in the other users’ containers and gain access to all their data deployed to the platform, the researchers say.

This is fixed now, but it’s a good reminder that platform-as-a-service offerings can still have security problems (as we’ve also seen recently with Power Apps and Cosmos DB).

Comments closed

Hash Spill Wait Stats

Erik Darling continues a series on interesting wait stats:

Hash spills are, as we’ll see, sometimes identified by a different wait than sort spills. In small quantities, spills are often not worth bothering with. But when they pile up, they can really cause some severe performance issues.

In this post, I want to show that both Hash Aggregate and Joins can cause the same wait type to show, along with some evidence that strings make things worse.

Click through for a good explanation of the wait stat you’re liable to see the most.

Comments closed

The Purview Data Map

Angela Henry wraps up a series:

Welcome to the final installment of our Azure Purview Series, focusing on Purview Data Map. The Purview Data Map is the foundation for data discovery and serves as the jumping-off point for all things data source related. It is the cloud-native platform-as-a-service (PaaS) that collects the metadata about your data sources and runs the scans to keep your metadata up to date.

Click through to see what’s available in Data Map and information on how to kick off and view scans.

Comments closed

magrittr’s Four Pipes

Gregory Janesch shows off the various pipes in the magrittr R package:

The magrittr package is a part of the extended tidyverse – i.e., not one of the ones normally loaded. It is the one that supplies the pipe operator (%>%), but it turns out that the package actually contains four pipe operators in total. All are intended to streamline and improve the readability of code, though the three non-basic ones are a bit more situational, and I’ve rarely seen them used, so I thought I would go into them a bit.

The CRAN page for magrittr is here; much of this post is based off of the package’s vignettes and documentation.

Click through for demonstrations of each. I’ve only seen the basic pipe in use as well, but the others look quite interesting and I can see use cases where knowing about them would be helpful. Also, note in the comments about the secret 5th pipe. H/T R-Bloggers

Comments closed

Databricks Serverless SQL

Nikhil Jethava and Kevin Clugage announce serverless SQL on Databricks:

Databricks SQL already provides a first-class user experience for BI and SQL directly on the data lake, and today, we are excited to announce another step in making data and AI simple with Databricks Serverless SQL. This new capability for Databricks SQL provides instant compute to users for their BI and SQL workloads, with minimal management required and capacity optimizations that can lower overall cost by an average of 40%. This makes it even easier for organizations to expand adoption of the lakehouse for business analysts who are looking to access the rich, real-time datasets of the lakehouse with a simple and performant solution.

Under the hood of this capability is an active server fleet, fully managed by Databricks, that can transfer compute capacity to user queries, typically in about 15 seconds. The best part? You only pay for Serverless SQL when users start running reports or queries.

Things are getting interesting between Databricks and Azure Synapse Analytics, as both now have serverless SQL and Spark offerings. Synapse Analytics has the better implementation for serverless SQL and Databricks the superior Spark implementation, so it becomes a question of which weakness you take in order to gain the strength.

1 Comment

Interchangability between ADF and Synapse Integration Pipelines

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution so to answer it directly:

Read on to learn the answer.

Comments closed