
Category: Cloud

Parameterizing Queries with Amazon Athena

Blayze Stefaniak, et al., architect a service to provide data via Amazon Athena:

Customers tell us they are finding new ways to make effective use of their data assets by providing data as a service (DaaS). In this post, we share a sample architecture using parameterized queries applied in the form of a DaaS application. This is helpful for many types of organizations, whether you’re working with an enterprise making data available to other lines of business, a regulator making reports available to your industry, a company monetizing your data assets, an independent software vendor (ISV) enabling your applications’ tenants to query their data when they need it, or trying to share data at scale in other ways.

In DaaS applications, you can provide predefined queries to run against your governed datasets with values your users input. You can expand your DaaS application to break away from monolithic data infrastructure by treating data as a product (DaaP) and providing a distribution of datasets, which have distinct domain-specific data pipelines. You can authorize these datasets to consumers in your DaaS application permissions.

You can use Athena parameterized queries as a way to predefine your queries, which you can use to run queries across your datasets, and serve as a layer of protection for your DaaS applications. This post first describes how parameterized queries work, then applies parameterized queries in the form of a DaaS application.

Click through to learn how.
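
If you'd like a feel for the mechanics before the architecture, here's a minimal boto3 sketch of a parameterized query, where the ? placeholders are bound at execution time rather than concatenated into the SQL. The database, table, workgroup, and S3 output location are all made-up names for illustration.

```python
import boto3

athena = boto3.client("athena")

# The '?' placeholders are bound from ExecutionParameters at run time,
# so user input never becomes part of the SQL text itself.
# All resource names below (sales.orders, the workgroup, the S3 bucket)
# are hypothetical.
response = athena.start_query_execution(
    QueryString="SELECT * FROM sales.orders WHERE region = ? AND order_date >= ?",
    ExecutionParameters=["EMEA", "2022-01-01"],
    QueryExecutionContext={"Database": "sales"},
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

print(response["QueryExecutionId"])
```

Because the query text is fixed and only the parameter values vary, this pattern doubles as the "layer of protection" the post describes.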


Visualizing Kafka Stream Lineage

David Araujo and Julia Peng show off stream lineage in Confluent Cloud:

Stream Lineage is a tool Confluent built to address the lack of data visibility in Kafka and event-driven architectures. Confluent’s Stream Lineage provides an interactive map of all your data flows that enables users to:

1. Understand what data flows are running now or at any point in the past

2. Trace where each data flow originated from

3. Track how data is transformed along its journey

4. Observe where each data flow ends up

Read on to see how it works.


Removing a Data Disk from a Running Azure VM

Joey D’Antoni tightrope walks without a net for fun:

I was working with a client recently, where we had to reconfigure storage within a VM (which is always a messy proposition). In doing so, we were adding and removing disks from the VM. This all happened mostly during a downtime window, so it wasn’t a big deal to down a VM, which is how you can remove a disk from a VM via the portal. However, upon further research, I learned that through the portal you can remove a disk from a running VM.

Read on to see how. Though I’d generally still recommend shutting the VM off first just to be sure.
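
For the scripted version of the same trick, here's a hedged sketch with the Azure SDK for Python; the resource group, VM, and disk names are placeholders, and as with the portal, make sure nothing in the guest OS is still using the disk before you pull it.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholder names for illustration only.
subscription_id = "<subscription-id>"
resource_group = "my-rg"
vm_name = "my-vm"
disk_to_remove = "my-vm-data-disk-1"

compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Fetch the VM, drop the data disk from its storage profile, and push the
# update back. The VM can stay running while the update is applied.
vm = compute.virtual_machines.get(resource_group, vm_name)
vm.storage_profile.data_disks = [
    d for d in vm.storage_profile.data_disks if d.name != disk_to_remove
]
compute.virtual_machines.begin_create_or_update(resource_group, vm_name, vm).result()
```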


Summarizing Data & AI Summit Announcements

Zach Stagers hits the high notes:

One of the biggest cheers of the keynote was that Delta is being fully open sourced! Databricks continue to share their incredible work to help drive our industry forward. Delta already has wide adoption, but with the open sourced version now being levelled up to the same standard as the ‘proprietary’ one, this should help cement it as the default choice for lake-based storage.

There were some announcements of things to come with Delta too, such as optimised deletes and updates by removing single rows instead of having to completely rewrite the file. It’ll be really interesting to see how this works, and just how much it boosts performance.

Read on for more notes on several big announcements.


Date-Time Binning in Cosmos DB

Hasan Savran bins some data:

I wrote about the Date_Bucket() function in SQL Server a couple of weeks ago. The Azure Cosmos DB team announced the same functionality under a different name, the DateTimeBin() function. It works exactly the same as the Date_Bucket() function in SQL Server.

The Cosmos DB version of the function has the same number of parameters, but the order is different. All the datetime parameters must be in ISO 8601 format (YYYY-MM-DDThh:mm:ss.fffffffZ).

Read on to see how it works.
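
As a quick feel for the syntax, here's a sketch using the Python SDK to bucket documents into six-hour bins with DateTimeBin(); the account, container, and property names are invented.

```python
from azure.cosmos import CosmosClient

# Account, database, container, and property names are all placeholders.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("sales").get_container_client("orders")

# DateTimeBin takes the value first, then the date part, then the bin size,
# which is a different argument order than SQL Server's DATE_BUCKET. The
# property being binned must hold an ISO 8601 timestamp string.
query = """
SELECT DateTimeBin(c.orderDate, 'hh', 6) AS bucket, COUNT(1) AS orders
FROM c
GROUP BY DateTimeBin(c.orderDate, 'hh', 6)
"""

for row in container.query_items(query, enable_cross_partition_query=True):
    print(row)
```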


Data Lakehouse Cleanrooms in Databricks

Matei Zaharia, et al., announce an interesting idea:

We are excited to announce data cleanrooms for the Lakehouse, allowing businesses to easily collaborate with their customers and partners on any cloud in a privacy-safe way. Participants in the data cleanrooms can share and join their existing data, and run complex workloads in any language – Python, R, SQL, Java, and Scala – on the data while maintaining data privacy.

With the demand for external data greater than ever, organizations are looking for ways to securely exchange their data and consume external data to foster data-driven innovations. Historically, organizations have leveraged data sharing solutions to share data with their partners and relied on mutual trust to preserve data privacy. But the organizations relinquish control over the data once it is shared and have little to no visibility into how data is consumed by their partners across various platforms. This exposes potential data misuse and data privacy breaches. With stringent data privacy regulations, it is imperative for organizations to have control and visibility into how their sensitive data is consumed. As a result, organizations need a secure, controlled and private way to collaborate on data, and this is where data cleanrooms come into the picture.

Read on to learn more about how this all works. It’s definitely a lot better than sending off a bunch of CSVs…


Triggering a Power BI Dataset Refresh from Synapse

Nick Edwards updates a dataset:

Log in to powerbi.com and in the top right-hand corner locate "Settings" and then "Admin portal".

Under "Tenant settings" locate "Developer Settings" and then "Allow service principals to use Power BI APIs".

Set this setting to "Enabled" using the toggle. Next, under the heading "Apply to:", select "Specific security groups (Recommended)". Then add the newly created security group "AzureSynapsePowerBIIntegration" and click apply.

Click through for the full process.
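
Nick wires this up with Synapse pipeline activities; purely to illustrate the underlying REST call the service principal ends up making, here's a hedged Python sketch where every ID and secret is a placeholder.

```python
import msal
import requests

# All IDs and secrets below are placeholders for illustration.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<service-principal-app-id>"
CLIENT_SECRET = "<service-principal-secret>"
WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<dataset-guid>"

# Authenticate as the service principal that sits in the security group
# enabled under "Allow service principals to use Power BI APIs".
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(
    scopes=["https://analysis.windows.net/powerbi/api/.default"]
)["access_token"]

# Kick off the dataset refresh; a 202 response means the refresh was queued.
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```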


SharePoint Lists Showing 100 Items in Logic Apps

Koen Verbeeck needs more than 100 results:

I was reading a SharePoint List using the “Get Items” activity in an Azure Logic App. I explain how you can create such a Logic App in the blog post Reading a SharePoint List with Azure Logic App.

It all worked fine for a while, but recently the list grew larger than 100 items. Suddenly, I started getting complaints that some items didn’t make it into the data warehouse. What’s going on? I ran the Logic App and I could see only 100 items were inserted into the SQL Server table:

Read on to see how you can bump that number past 100.
