Press "Enter" to skip to content

Category: Cloud

Environments in Azure ML

Luis Valencia explains what environments are in Azure ML:

An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can’t be used across different workspaces.

In basic terms for a developer, it's a Docker image with all the needed dependencies (conda/pip packages) to run your experiment.

A friendly word of advice from some bad experiences: stick with the curated environments as much as you can. Those are easy and rarely fail. Building your own environments from Conda files is a possibility, but it’s an, err, probabilistic exercise as to whether your compute target will actually work or not.
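If you do want to grab a curated environment and use it in a run, the Python SDK (v1, azureml-core) makes that fairly painless. A rough sketch, with the script name and compute target as placeholders:

```python
# Minimal sketch using the Azure ML Python SDK v1 (azureml-core);
# the training script and compute target names are placeholders.
from azureml.core import Workspace, Environment, Experiment, ScriptRunConfig

ws = Workspace.from_config()  # reads the workspace's config.json

# Pull a curated, versioned environment instead of building one from a Conda file
env = Environment.get(workspace=ws, name="AzureML-Minimal")

src = ScriptRunConfig(
    source_directory=".",
    script="train.py",            # placeholder training script
    compute_target="cpu-cluster", # placeholder compute target
    environment=env,
)

run = Experiment(ws, "environment-demo").submit(src)
run.wait_for_completion(show_output=True)
```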


AWS EC2 I3 Instance Types and Storage Persistence

Steve Rezhener has a warning for us:

Amazon Web Services Elastic Compute Cloud (a.k.a. EC2) is a service that lets anyone with a credit card rent a virtualized server from Amazon. To cater to different clients' needs, AWS provides various instance types that are either general-purpose or specific-purpose instances (focused on CPU, RAM, IO). You can see the different types in Fig 1. This blog post is going to talk about a storage-optimized instance, the I3 instance type family, its little-known problem, and the solution in the form of Elastic Block Storage (a.k.a. EBS).

Click through for the warning, more explanation, and what you can do about it. H/T the SQLServerCentral newsletter.
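The short version of the fix: put anything you care about on an EBS volume rather than the instance store. A rough boto3 sketch (region, instance ID, and device name are placeholders):

```python
# Create a persistent gp3 EBS volume and attach it to an I3 instance so the
# data survives a stop/start (instance-store NVMe does not). Placeholders:
# region, availability zone, instance ID, and device name.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,               # GiB
    VolumeType="gp3",
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "Name", "Value": "i3-persistent-data"}],
    }],
)

# Wait until the volume is ready, then attach it to the instance
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # hypothetical I3 instance
    Device="/dev/sdf",
)
```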


Patched Security Flaw in Azure Container Instances

Ionut Ilascu reports on a vulnerability:

Microsoft has fixed a vulnerability in Azure Container Instances called Azurescape that allowed a malicious container to take over containers belonging to other customers on the platform.

An adversary exploiting Azurescape could execute commands in the other users’ containers and gain access to all their data deployed to the platform, the researchers say.

This is fixed now, but it’s a good reminder that platform-as-a-service offerings can still have security problems (as we’ve also seen recently with Power Apps and Cosmos DB).


Databricks Serverless SQL

Nikhil Jethava and Kevin Clugage announce serverless SQL on Databricks:

Databricks SQL already provides a first-class user experience for BI and SQL directly on the data lake, and today, we are excited to announce another step in making data and AI simple with Databricks Serverless SQL. This new capability for Databricks SQL provides instant compute to users for their BI and SQL workloads, with minimal management required and capacity optimizations that can lower overall cost by an average of 40%. This makes it even easier for organizations to expand adoption of the lakehouse for business analysts who are looking to access the rich, real-time datasets of the lakehouse with a simple and performant solution.

Under the hood of this capability is an active server fleet, fully managed by Databricks, that can transfer compute capacity to user queries, typically in about 15 seconds. The best part? You only pay for Serverless SQL when users start running reports or queries.

Things are getting interesting between Databricks and Azure Synapse Analytics, as both now have serverless SQL and Spark offerings. Synapse Analytics has the better implementation for serverless SQL and Databricks the superior Spark implementation, so it becomes a question of which weakness you take in order to gain the strength.
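If you want to poke at a SQL warehouse (serverless or otherwise) from code, the databricks-sql-connector package is the usual route. A quick sketch, with the hostname, HTTP path, and token as placeholders:

```python
# Query a Databricks SQL warehouse from Python; connection details are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_date() AS today")
        print(cursor.fetchall())
```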


Interchangeability between ADF and Synapse Integration Pipelines

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure-based data platform solution, so to answer it directly:

Read on to learn the answer.


Attributing Redshift Costs to Users

Jason Pedreza, et al., show how you can break down query utilization by user in an Amazon Redshift database:

In its simplest form, cost attribution can be determined from the amount of storage assigned to individual objects, using object ownership to map them to groups. But the downside of this approach is that it doesn't provide a true picture of resource usage. For example, let's say Team 1 has a total object size of 1 TB, whereas Team 2 has 100 GB in total. Team 1 runs 10 queries daily, and Team 2 runs 1,000 queries per day. Of course, Team 2 uses more resources than Team 1.

The Amazon Redshift RA3 architecture allows you to pay for the compute and data warehouse storage capacity separately, therefore storage doesn’t reflect the resources used by the teams for the cost attribution.

Click through to see how.
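For a rough flavor of query-level attribution (not necessarily the exact approach in the article), you can roll up query counts and runtimes per user from the stl_query system table. A sketch using redshift_connector, with placeholder connection details:

```python
# Rough per-user query attribution from stl_query; connection details are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123xyz0.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="<password>",
)

cursor = conn.cursor()
cursor.execute("""
    SELECT u.usename,
           COUNT(*)                                  AS query_count,
           SUM(DATEDIFF(ms, q.starttime, q.endtime)) AS total_runtime_ms
    FROM stl_query q
    JOIN pg_user  u ON u.usesysid = q.userid
    WHERE q.starttime >= DATEADD(day, -1, GETDATE())
    GROUP BY u.usename
    ORDER BY total_runtime_ms DESC;
""")

for usename, query_count, runtime_ms in cursor.fetchall():
    print(usename, query_count, runtime_ms)
```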


An Overview of Function-as-a-Service

Grace O'Halloran lays out the basics of serverless computing in cloud platforms:

The term serverless computing can be misleading; how can you compute things without a server? Well, the answer is that you don’t. The term “serverless” comes from the idea that the server is abstracted from the developer, and is totally maintained by the cloud provider. In other words, the developer doesn’t really care what environment their code is run in; they just need it hosted somewhere where it can be executed. This removes the responsibility of infrastructure configuration and maintenance from the developer, but naturally gives them less flexibility and control over the environment.

It took me watching several presentations before I really understood the value behind serverless compute.
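For a concrete feel of what "the server is abstracted away" looks like, here's about the smallest possible Azure Functions handler (Python v2 programming model); the route name is arbitrary:

```python
# A minimal HTTP-triggered function; the platform provisions, scales, and
# patches the host, so this code only sees the request and returns a response.
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.route(route="hello")
def hello(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!")
```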


Troubleshooting Microsoft.Purview Not Registered

Wolfgang Strasser investigates an issue:

In my last Azure Purview Quickstart video (#3 – Create an Azure Purview Account – link), I’ve shown you how to create a new Azure Purview account.

And what pre-prepared demos have in common – well, it "just" works there.

BUT: there are some requirements that need to be configured beforehand, in order to create an Azure Purview Account.

Basically, problems during the creation process come down to:

– Security / permissions

– Missing Resource providers

Read on to learn more about permissions requirements and how to deal with these issues as they arise.
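For the resource provider piece specifically, registration is scriptable. A hedged sketch with the azure-mgmt-resource SDK (the subscription ID is a placeholder, and the exact list of providers you need may vary):

```python
# Register the resource providers a Purview account typically depends on;
# the subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="00000000-0000-0000-0000-000000000000",
)

for namespace in ["Microsoft.Purview", "Microsoft.Storage", "Microsoft.EventHub"]:
    provider = client.providers.register(namespace)
    print(namespace, provider.registration_state)
```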


Azure Database for PostgreSQL Replicas

Gauri Mahajan takes us through replica creation in Azure Database for PostgreSQL:

Azure Database for PostgreSQL is an Azure offering of the open-source Postgres database. As there are many databases and data warehouses that are derived from Postgres, during migration from Postgres to a different flavor of another database or data warehouse that is compatible with Postgres, often read replicas are employed. The replicas are read-only since it’s a one-way replication from the master database to replicas. And replicas serve the purpose of decreasing the load on the primary transactional database in production environments. Replicas are typically used as migration sources, reporting and ad-hoc analytics sources and for other purposes. Let’s go ahead and learn to create and manage read replicas in Azure Database for PostgreSQL.

Click through for the process.
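Once a replica exists, using it is mostly a matter of pointing read workloads at the replica's endpoint. A small psycopg2 sketch, with hostnames, credentials, and the table as placeholder assumptions:

```python
# Route writes to the primary and reads to a read replica of an
# Azure Database for PostgreSQL server; all connection details are placeholders.
import psycopg2

primary = psycopg2.connect(
    host="myserver.postgres.database.azure.com",
    dbname="appdb", user="appadmin@myserver", password="<password>", sslmode="require",
)
replica = psycopg2.connect(
    host="myserver-replica.postgres.database.azure.com",
    dbname="appdb", user="appadmin@myserver-replica", password="<password>", sslmode="require",
)

# Writes go to the primary (the replica is read-only).
with primary, primary.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))

# Reporting / ad-hoc queries go to the replica to keep load off the primary.
# Replication is asynchronous, so reads here can lag slightly behind writes.
with replica, replica.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM events")
    print(cur.fetchone())
```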


Receiving Notifications on Cosmos DB 429 Errors

Hasan Savran wants to remain in the loop:

Developers like to know when things go wrong in applications. Sending an email when a bad error occurs is an easy, simple solution. Things can go wrong easily in Cosmos DB, and one of the most common errors you will get is the "Request rate too large (429)" exception. This error says that you do not have enough request units to run a query. It usually occurs at peak times, and the usual cause is the configuration of Request Units settings: you need to scale up your application or optimize your queries.

It takes more time to retrieve data from Cosmos DB when error 429 occurs. You should get a notification when this happens, but you do not want to get an email each time it occurs either; 1-5% of requests returning 429 is acceptable. You can always open the Cosmos DB monitoring tools and keep an eye on it, or you can create Cosmos DB Alerts to get emails.

Click through for a demonstration of how to use Cosmos DB Alerts.
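If you also want to catch throttling in application code (on top of the portal alerts Hasan demonstrates), the Python SDK surfaces 429s as CosmosHttpResponseError once its built-in retries are exhausted. A sketch with placeholder account, database, and container names:

```python
# Detect "Request rate too large" (429) when querying Cosmos DB;
# endpoint, key, database, and container names are placeholders.
from azure.cosmos import CosmosClient, exceptions

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("sales").get_container_client("orders")

try:
    items = list(container.query_items(
        query="SELECT * FROM c WHERE c.status = @status",
        parameters=[{"name": "@status", "value": "open"}],
        enable_cross_partition_query=True,
    ))
except exceptions.CosmosHttpResponseError as ex:
    if ex.status_code == 429:
        # Not enough RU/s for this workload right now: this is the spot to hook
        # in a notification (email, Teams, etc.) instead of relying only on
        # portal-configured Cosmos DB alerts.
        print("429 received:", ex.message)
    else:
        raise
```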
