
Category: Cloud

CosmosDB Via Linked Server

Rolf Tesmer shows us how to connect to Azure CosmosDB using a linked server:

Recently I had a requirement to combine data that I already had in SQL Server (2016) with JSON document data already stored in Azure CosmosDB.  Both databases were operational and continuously accepting data, so I didn’t want to go to the trouble of doing the delta load thing between them; instead, I just wanted to be able to query directly on demand.

And so – the purpose of this article is to outline the method to connect direct to Azure CosmosDB from SQL Server using a SQL Linked Server.

Click through for the step-by-step details.  Ultimately, it’s a linked server connecting via ODBC, so nothing magical—but it is nice to see interoperability.
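As a rough sketch of where this ends up (the linked server name, DSN name, and collection name are placeholders, and the ODBC driver/DSN configuration is the part Rolf walks through), the SQL Server side is ordinary linked server plumbing:

EXEC master.dbo.sp_addlinkedserver
    @server     = N'COSMOSDB',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @datasrc    = N'CosmosDSN';   -- system ODBC DSN configured for the Cosmos DB ODBC driver

SELECT *
FROM OPENQUERY(COSMOSDB, 'SELECT * FROM MyCollection');
-- The inner query syntax depends on how the ODBC driver exposes the collection's schema.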


Serverless Lambda Architecture

Laith Al-Saadoon shows off a new Amazon Web Services product, AWS Glue, which allows you to build a data processing system on the Lambda architecture without directly provisioning any EC2 instances:

With the launch of AWS Glue, AWS provides a portfolio of services to architect a Big Data platform without managing any servers or clusters. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (for example, the table definition and schema) in the AWS Glue Data Catalog. After it’s cataloged, your data is immediately searchable, queryable, and available for ETL.

AWS Glue generates the code to execute your data transformations and data loading processes. Furthermore, AWS Glue provides a managed Spark execution environment to run ETL jobs against a data lake in Amazon S3. In short, you can now run a Lambda Architecture in AWS in a completely 100% serverless fashion!

“Serverless” applications allow you to build and run applications without thinking about servers. What this means is that you can now stream data in real-time, process huge volumes of data in S3, and run SQL queries and visualizations against that data without managing server provisioning, installation, patching, or capacity scaling. This frees up your users to spend more time interpreting the data and deriving business value for your organization.

Laith has a working demo of the process available as well.


Instant Log Initialization In Azure

Dimitri Furman shows a benefit of creating database files with Azure Blob Storage:

Recently, we were working on a performance testing exercise using a SQL Server database with files in Azure Blob Storage. After creating the database using the default 8 MB size for data and log file (as in the example above), we wanted to increase the size of all files to be sufficient for the expected workload. IFI was not yet enabled for the SQL Server instance we were working with, and growing the data file from 8 MB to 1 TB took about one minute (using Premium Storage). This was expected, since the data file had to be fully initialized. We expected that the log growth to 1 TB would take about as much time, for the same reason. It was very surprising then that the same operation on the log file completed in less than one second.

It turns out that this is due to differences in Azure Blob Storage versus traditional storage systems.
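For context, the setup being tested looks roughly like this (a minimal sketch; the storage account, container, SAS token, and database name are placeholders, and Dimitri's post has the full walkthrough):

-- Credential named after the container URL, secured with a SAS token.
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/data]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = '<SAS token>';

CREATE DATABASE BlobDB
ON (NAME = BlobDB_data,
    FILENAME = 'https://mystorageaccount.blob.core.windows.net/data/BlobDB_data.mdf')
LOG ON (NAME = BlobDB_log,
    FILENAME = 'https://mystorageaccount.blob.core.windows.net/data/BlobDB_log.ldf');

-- Growing the data file required full initialization (about a minute);
-- the same growth on the log file completed almost instantly.
ALTER DATABASE BlobDB MODIFY FILE (NAME = BlobDB_data, SIZE = 1TB);
ALTER DATABASE BlobDB MODIFY FILE (NAME = BlobDB_log,  SIZE = 1TB);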


Using Multiple Cosmos DB APIs

Vincent-Philippe Lauzon shows how to access graph data stored in Cosmos DB using the DocumentDB API:

Now here’s a little secret:  although we choose the “model” (e.g. Gremlin) at the Cosmos DB account level, we can use other models to query the data.

Not all combinations are possible, but many are.  Specifically, we can query a Gremlin graph using the DocumentDB / SQL query language.

The graph is then projected into documents.

We will explore that in this article.

Why is that interesting?  Because there are a lot of tools out there we might be familiar with to manipulate DocumentDB (or MongoDB).  Having the possibility to look at a graph with other APIs extends our toolset beyond Gremlin-based ones.

That is interesting.
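To give a flavor of what "projected into documents" means, here is roughly what querying the graph through the SQL (DocumentDB) API might look like. The internal property names (label, _isEdge, _sink) reflect how Cosmos DB happens to store Gremlin data and are assumptions here, so treat this as a sketch rather than a contract:

-- Vertices surface as JSON documents carrying their Gremlin label.
SELECT c.id, c.label
FROM c
WHERE c.label = 'person'

-- Edges surface as documents flagged with an internal marker property.
SELECT c.id, c.label, c._sink
FROM c
WHERE c._isEdge = true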


Using The Kubernetes Dashboard

Andrew Pruski shows how to set up and use the Kubernetes dashboard inside Azure Container Services:

But not only can existing objects be viewed, new ones can be created.

In my last post I created a single pod running SQL Server. I want to move on from that, as you’d generally never just deploy one pod. Instead, you would create what’s called a deployment.

The dashboard makes it really simple to create deployments. Just click Deployments on the right-hand side menu and fill out the details:

Check it out; this looks like a good way of managing Kubernetes on the small, or getting an idea of what it can do.


Integrating Active Directory: Local And Azure

Shannon Lowder sets up an on-prem Active Directory domain and links it to Azure Active Directory:

You’ll need to plan out your domain before you begin.  In my case, I already had my network configured to use 192.168.254.x. My fiber router serves as my default gateway as well as my DHCP server and primary DNS server for my local network. My wireless access points, primary workstation, and printer are already set up for static IP addresses.  I have already set aside a subnet of addresses for static servers.  I also already own a domain name (toyboxcreations.net).  Having all this set up before trying to install my domain controller helped save time.

Shannon glosses over the local AD part, but once that’s set up, shows how to tie it in with Azure Active Directory.


Azure SQL Data Warehouse Patterns

Murshed Zaman shows us a couple of patterns and anti-patterns for Azure SQL Data Warehouse:

Azure SQL DW is a Massively Parallel Processing (MPP) data warehousing service. It is a service because Microsoft maintains the infrastructure and software patching to make sure it’s always on up-to-date hardware and software in Azure. The service makes it easy for a customer to start loading their tables on day one and to start running queries quickly, and it allows scaling of compute nodes when needed.

In an MPP database, table data is distributed among many servers (known as compute or slave nodes), and in many MPP systems shared-nothing storage subsystems are attached to those servers. Queries come through a head (or master) node where the location metadata for all the tables/data blocks resides. This head node knows how to deconstruct the query into smaller queries, introduce various data movement operations as needed, and pass smaller queries on to the compute nodes for parallel execution. Data movement is needed to align the data by the join keys from the original query. The topic of data movement in an MPP system is a whole other topic by itself, one that we will tackle in a different blog post. Besides Azure SQL DW, some other examples of MPP data warehouses are Hadoop (Hive and Spark), Teradata, Amazon Redshift, Vertica, etc.

The opposite of MPP is SMP (Symmetric Multiprocessing), which basically means traditional one-server systems. Until the invention of MPP we had SMP systems. In the database world, the examples are traditional SQL Server, Oracle, MySQL, etc. These SMP databases can also be used for both OLTP and OLAP purposes.

Murshed spends the majority of this blog post covering things you should not do, which is probably for the best.
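The "align the data by the join keys" point is the one that bites in practice. As a hedged illustration (table and column names are invented), hash-distributing two large tables on their common join key lets each compute node join its own slice locally instead of shuffling rows between nodes:

-- Distribute both tables on the key they are joined on, so the join
-- can be satisfied node-locally without data movement.
CREATE TABLE dbo.FactSales
(
    CustomerKey INT NOT NULL,
    SaleAmount  MONEY NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.FactReturns
(
    CustomerKey  INT NOT NULL,
    ReturnAmount MONEY NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

-- A join between these two tables on CustomerKey avoids a shuffle;
-- joining on a non-distribution column would force data movement.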


CHECKDB On Azure SQL Database

Arun Sirpal ponders running DBCC CHECKDB on Azure SQL Database:

I was exchanging messages with Azure Support, and even though I didn’t get a concrete answer to confirm this, I ended up asking the question within a Microsoft-based Yammer group, and yes, they do automatically carry out consistency checks.

This is great, as it is one less thing for me to worry about, and if there is serious corruption (you know, potential data loss, which would be rare) then they will definitely tell you and work with you.

However, it doesn’t mean you CAN’T run it. I was curious, so I ran DBCC CHECKDB on my Azure SQL Databases, but as with any other consistency check it is best to do it during OFF-PEAK hours. I would probably take it a step further and wouldn’t even bother running it.

It’s an interesting post, reminding us that administering an Azure database isn’t the same as on-prem.
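If you do decide to run your own check anyway, a lighter-weight variant such as the one below (a sketch, and still something to schedule off-peak) keeps the cost of the exercise down:

-- Run in the context of the Azure SQL Database itself, off-peak.
-- PHYSICAL_ONLY skips the logical checks for a much cheaper pass.
DBCC CHECKDB WITH PHYSICAL_ONLY, NO_INFOMSGS;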


Azure SQL Database Multi-Factor Authentication

Arun Sirpal notes that the latest version of SQL Server Management Studio supports Multi-Factor Authentication with Azure Active Directory:

Quite a mouthful for a title, but nevertheless very exciting. With the new version of SQL Server Management Studio (SSMS) 17.2, you now have the option to use Azure AD authentication for Universal Authentication with Multi-Factor Authentication (MFA) enabled. By that I mean using a login via SSMS that is enabled for MFA, where below I will show you the two-step verification using a push notification to my iPhone. (Yes, iPhone, I love it.)

Download SSMS 17.2 from this link: https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms

Once installed, you will see new authentication options; the option that I want is the one highlighted below: “Active Directory – Universal with MFA support”

Click through for a demo of this.  I wonder if (when?) something like this comes to on-prem, maybe in conjunction with a third-party multi-factor authentication service.
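One adjacent detail worth noting (it isn’t covered in the quoted excerpt, so take it as a general pointer): the Azure AD account you sign in with usually needs to exist in the target database as a contained user created from the external provider, along these lines:

-- Run while connected as an Azure AD admin for the server; the account name is a placeholder.
CREATE USER [someone@yourdomain.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [someone@yourdomain.com];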


Azure Archive Blob Storage

James Serra talks about a new tier of blob storage:

Last year Microsoft introduced Azure Cool Blob storage, which cost customers a penny per GB per month in some Azure regions.  Now, users have another, lower-cost option in Azure Archive Blob Storage, along with new Blob-Level Tiering data lifecycle management capabilities.  So there are now three Azure blob storage tiers: Hot, Cool, and Archive.

Azure Archive Blob Storage costs 0.18 cents per GB per month when the service is delivered through its cloud data center in the East US 2 region (for comparison, in the same region hot is 1.8 cents and cool is 1.0 cents per GB per month).  Customers can expect a 99 percent availability SLA (service level agreement) when the service makes its way out of the preview stage.

This is Azure’s response to AWS Glacier.  The immediate sticker price is a bit higher, but if there aren’t any incremental costs associated with deletion, uploading, or retrieving files, then it could end up matching Glacier in TCO.
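To put the quoted prices in per-terabyte terms (simple arithmetic on the numbers above, nothing more): at 0.18 cents per GB per month, 1 TB (1,024 GB) of archive storage works out to roughly $1.84 per month, versus about $10.24 for cool and $18.43 for hot in the same region. As noted above, the deciding factor will be whatever upload, retrieval, and deletion charges sit on top of that headline rate.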
