Cloud – Page 86 – Curated SQL

Interchangeability between ADF and Synapse Integration Pipelines

Published 2021-09-09 by Kevin Feasel

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse’, I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.
As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure-based data platform solution, so to answer it directly:

Read on to learn the answer.

Attributing Redshift Costs to Users

Published 2021-09-08 by Kevin Feasel

Jason Pedreza, et al., show how you can break down query utilization by user in an Amazon Redshift database:

At its simplest, cost attribution can be determined from the amount of storage assigned to individual objects, using the ownership of those objects to map them to groups. But the downside of this approach is that it doesn’t provide a true translation of the resource usage. For example, let’s say Team 1 has a total object size of 1 TB, whereas Team 2 has 100 GB in total size. Team 1 runs 10 queries daily, and Team 2 runs 1,000 queries per day. Of course, Team 2 uses more resources than Team 1.
The Amazon Redshift RA3 architecture allows you to pay for the compute and data warehouse storage capacity separately; therefore, storage doesn’t reflect the resources used by the teams for the cost attribution.

Click through to see how.
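
To make the idea concrete, here is a minimal sketch of usage-based attribution. It is not the method from the linked post: it simply assumes you already have a list of (team, elapsed seconds) pairs for each query and splits a compute bill in proportion to run time. The function name, the team labels, the 3-second run time, and the dollar figure are all hypothetical.

```python
from collections import defaultdict


def attribute_compute_cost(query_log, total_compute_cost):
    """Split a compute bill across teams in proportion to query run time.

    query_log is an iterable of (team, elapsed_seconds) pairs. How you
    collect those pairs is outside the scope of this sketch.
    """
    seconds_by_team = defaultdict(float)
    for team, elapsed_seconds in query_log:
        seconds_by_team[team] += elapsed_seconds

    total_seconds = sum(seconds_by_team.values())
    if total_seconds == 0:
        # No measurable usage, so there is nothing to attribute.
        return {team: 0.0 for team in seconds_by_team}

    return {
        team: total_compute_cost * seconds / total_seconds
        for team, seconds in seconds_by_team.items()
    }


# Hypothetical workload mirroring the excerpt: Team 1 runs 10 queries a day
# and Team 2 runs 1,000, each assumed to take 3 seconds.
query_log = [("team1", 3.0)] * 10 + [("team2", 3.0)] * 1000
shares = attribute_compute_cost(query_log, total_compute_cost=1010.0)
```

Under these assumptions, Team 2 picks up roughly 99% of the compute cost even though it owns a tenth of the storage, which is the mismatch the excerpt describes.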

An Overview of Function-as-a-Service

Published 2021-09-07 by Kevin Feasel

Grace O’Halloran lays out the basics of serverless computing in cloud platforms:

The term serverless computing can be misleading; how can you compute things without a server? Well, the answer is that you don’t. The term “serverless” comes from the idea that the server is abstracted from the developer, and is totally maintained by the cloud provider. In other words, the developer doesn’t really care what environment their code is run in; they just need it hosted somewhere where it can be executed. This removes the responsibility of infrastructure configuration and maintenance from the developer, but naturally gives them less flexibility and control over the environment.

It took me watching several presentations before I really understood the value behind serverless compute.

Troubleshooting Microsoft.Purview not Registered

Published 2021-09-03 by Kevin Feasel

Wolfgang Strasser investigates an issue:

In my last Azure Purview Quickstart video (#3 – Create an Azure Purview Account), I’ve shown you how to create a new Azure Purview account.
And what pre-prepared demos have in common, well – it “just” works there.

BUT: there are some requirements that need to be configured beforehand, in order to create an Azure Purview Account.
Basically, problems during the creation process come down to:
– Security / permissions
– Missing Resource providers

Read on to learn more about permissions requirements and how to deal with these issues as they arise.

Azure Database for PostgreSQL Replicas

Published 2021-09-03 by Kevin Feasel

Gauri Mahajan takes us through replica creation in Azure Database for PostgreSQL:

Azure Database for PostgreSQL is an Azure offering of the open-source Postgres database. As there are many databases and data warehouses that are derived from Postgres, during migration from Postgres to a different flavor of another database or data warehouse that is compatible with Postgres, often read replicas are employed. The replicas are read-only since it’s a one-way replication from the master database to replicas. And replicas serve the purpose of decreasing the load on the primary transactional database in production environments. Replicas are typically used as migration sources, reporting and ad-hoc analytics sources and for other purposes. Let’s go ahead and learn to create and manage read replicas in Azure Database for PostgreSQL.

Click through for the process.

Receiving Notifications on Cosmos DB 429 Errors

Published 2021-08-31 by Kevin Feasel

Hasan Savran wants to remain in the loop:

Developers like to know when things go wrong in applications. It is an easy and simple solution to send an email when a bad error occurs. Things can go wrong easily in Cosmos DB, and one of the most common errors you will get from Cosmos DB is the “Request rate too large (429)” exception. This error says that you do not have enough request units to run a query. This error usually occurs at peak times. The usual cause of 429 errors is the configuration of Request Units settings. You need to scale up your application or optimize your queries.
It takes more time to retrieve data from Cosmos DB when error 429 occurs. You should get a notification when this occurs, but you do not want to get an email each time it occurs either. 1-5% of requests with 429 is acceptable. You can always open the Cosmos DB Monitoring tools and keep an eye on it, or you can create Cosmos DB Alerts to get emails.

Click through for a demonstration of how to use Cosmos DB Alerts.
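
As a rough illustration of the "1-5% is acceptable" rule of thumb, the sketch below computes the share of throttled requests in a window and flags it only when that share passes a threshold. It does not use Cosmos DB Alerts or any Azure SDK: it assumes you already have a list of HTTP status codes from your own logging, and the function names and the 5% default are my own choices.

```python
def throttle_rate(status_codes):
    """Return the fraction of requests in the window that came back as 429."""
    if not status_codes:
        return 0.0
    throttled = sum(1 for code in status_codes if code == 429)
    return throttled / len(status_codes)


def should_alert(status_codes, threshold=0.05):
    """Alert only when the 429 share exceeds the threshold (5% by default)."""
    return throttle_rate(status_codes) > threshold


# Hypothetical windows of 100 requests each.
quiet_window = [200] * 97 + [429] * 3   # 3% throttled: tolerated
busy_window = [200] * 90 + [429] * 10   # 10% throttled: worth an email

quiet_alert = should_alert(quiet_window)
busy_alert = should_alert(busy_window)
```

Alerting on a rate over a window, rather than on each individual 429, is what keeps the inbox manageable.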

Security Breach in Cosmos DB: ChaosDB

Published 2021-08-27 by Kevin Feasel

Nir Ohfeld and Sagi Tzadik discovered a flaw in Azure Cosmos DB:

Nearly everything we do online these days runs through applications and databases in the cloud. While leaky storage buckets get a lot of attention, database exposure is the bigger risk for most companies because each one can contain millions or even billions of sensitive records. Every CISO’s nightmare is someone getting their access keys and exfiltrating gigabytes of data in one fell swoop.
So you can imagine our surprise when we were able to gain complete unrestricted access to the accounts and databases of several thousand Microsoft Azure customers, including many Fortune 500 companies. Wiz’s security research team (that’s us) constantly looks for new attack surfaces in the cloud, and two weeks ago we discovered an unprecedented breach that affects Azure’s flagship database service, Cosmos DB.

Read on for details about the attack. Microsoft has already mitigated the issue by disabling the functionality necessary to pull off the attack. H/T Ben Stegink.

Multi-Cloud Pros and Cons

Published 2021-08-27 by Kevin Feasel

James Serra lays out some of the benefits and drawbacks of using multiple cloud providers:

A discussion I have seen many companies have is whether they should be single-cloud (using only one cloud company) or multi-cloud (using more than one cloud company). The three major Cloud Service Providers (CSPs) that companies use for nearly all use cases are Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP).

Without spoiling it too much, James is not really sold on the idea.

Migrating Azure Analysis Services to Power BI PPU: the Conclusion

Published 2021-08-25 by Kevin Feasel

Gilbert Quevauvilliers wraps up a series:

This is my final post and conclusion in my blog series to look at migrating from AAS to PPU.
I have really enjoyed the series and I have learnt a lot along the way. I hope you enjoyed following along!

Read on for a nice graphic showing each step along the way.

Changing the Slow Query Log Threshold in RDS

Published 2021-08-20 by Kevin Feasel

John McCormack wants to know about those slow queries:

The slow query log will record all queries which are above the threshold level. The default value is 10 (seconds) but you can set it higher or lower depending on your requirements. It is useful for finding slow queries and allows you to pick out candidates for tuning.
If you set the threshold too low, it can increase I/O overhead on your instance and use a lot of valuable disk space. If you set it too high, it might not capture enough useful information.

This is a setting in Amazon Relational Database Service (RDS) and mirrors functionality in MySQL.
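
For a sense of what changing the threshold might look like in code, here is a hedged sketch using boto3's `modify_db_parameter_group` call. It assumes an RDS for MySQL instance attached to a custom parameter group, that the MySQL parameters `slow_query_log` and `long_query_time` are the ones you want to change, and that boto3 and AWS credentials are available. The helper names and the parameter group name are hypothetical, and this is not necessarily the approach John takes in the post.

```python
def slow_log_parameters(threshold_seconds):
    """Build the parameter list that enables the slow query log
    and sets its threshold, in the shape boto3 expects."""
    if threshold_seconds < 0:
        raise ValueError("threshold_seconds must be non-negative")
    return [
        {
            "ParameterName": "slow_query_log",
            "ParameterValue": "1",
            "ApplyMethod": "immediate",
        },
        {
            "ParameterName": "long_query_time",
            "ParameterValue": str(threshold_seconds),
            "ApplyMethod": "immediate",
        },
    ]


def apply_slow_log_threshold(parameter_group_name, threshold_seconds):
    """Send the change to RDS. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without it

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=parameter_group_name,
        Parameters=slow_log_parameters(threshold_seconds),
    )


# Hypothetical usage: log queries that run longer than 2 seconds.
# apply_slow_log_threshold("my-mysql-params", 2)
params = slow_log_parameters(2)
```

Keeping the payload builder separate from the API call makes it easy to review the exact values before anything is sent to AWS.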

Category: Cloud