Press "Enter" to skip to content

Category: Cloud

Issue with PolyBase and Cosmos DB

I found an issue with connecting to Cosmos DB from PolyBase after installing SQL Server 2019 CU2:

After upgrading to SQL Server 2019 CU2, I noticed some issues when trying to connect to a Cosmos DB collection via PolyBase. Specifically, I started getting the following error message:

Msg 105082, Level 16, State 1, Line 35
105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15), SqlState: HY000, NativeError: 110.
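
For reference, the kind of PolyBase setup that hits this error looks roughly like the sketch below; the account, credential, and object names are placeholders rather than anything from the post, and it assumes a database master key already exists.

```sql
-- Hypothetical PolyBase connection to the Cosmos DB Mongo API (SQL Server 2019).
-- Assumes a database master key already exists; all names are placeholders.
CREATE DATABASE SCOPED CREDENTIAL CosmosCredential
WITH IDENTITY = '<cosmos account name>', SECRET = '<cosmos account key>';

CREATE EXTERNAL DATA SOURCE CosmosMongo
WITH
(
    LOCATION = 'mongodb://<cosmos account name>.documents.azure.com:10255',
    CREDENTIAL = CosmosCredential,
    CONNECTION_OPTIONS = 'ssl=true',
    PUSHDOWN = ON
);

-- External table over a hypothetical Orders collection in a SalesDb database.
CREATE EXTERNAL TABLE dbo.CosmosOrders
(
    _id       NVARCHAR(24),
    OrderDate DATETIME2(3),
    Amount    DECIMAL(10, 2)
)
WITH (LOCATION = 'SalesDb.Orders', DATA_SOURCE = CosmosMongo);
```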

Read on for a couple of attempts at a solution and some more detail.


Cosmos DB Changes

Hasan Savran hits on a few changes to Cosmos DB:

Free Tier of Cosmos DB
A couple of big changes came from the Azure Cosmos DB team this week. The first one is the free tier of Cosmos DB. In the free tier, you get the first 400 Request Units per second and 5 GB of storage for free for the lifetime of the account. This is a great deal if you'd like to use Azure Cosmos DB in your new project. Give it a try; it's free! All you need to do is select Apply for the Free Tier Discount.

This is a big one, but there are a few other interesting changes in there as well.


Transforming JSON to CSV with Azure Data Factory

Rayis Imayev shows how you can use the Flatten task in Azure Data Factory to convert JSON text to CSV:

What this new task does is help transform/transpose/flatten your JSON structure into a denormalized, flattened dataset that you can upload into a new or existing flat database table.

I like the analogy of the Transpose function in Excel, which helps rotate your vertical set of data pairs (name : value) into a table with the column names and values for the corresponding objects. And when this vertical JSON structure contains several similar sets (an array), the ADF Mapping Data Flows Flatten task does a really good job of transforming it into a table with several rows (records).
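
If you haven't seen the flattening idea in action, here is a rough T-SQL analogy (using OPENJSON rather than Data Factory, and with made-up sample data) of what the Flatten transformation does: one document containing an array becomes one row per array element.

```sql
-- Rough T-SQL analogy of the Flatten transformation; sample data is invented.
DECLARE @json NVARCHAR(MAX) = N'{
    "customer": "Contoso",
    "orders": [
        { "orderId": 1, "amount": 25.00 },
        { "orderId": 2, "amount": 40.50 }
    ]
}';

-- The array element becomes the row grain; the parent value repeats on each row.
SELECT
    JSON_VALUE(@json, '$.customer') AS Customer,
    o.orderId,
    o.amount
FROM OPENJSON(@json, '$.orders')
WITH
(
    orderId INT            '$.orderId',
    amount  DECIMAL(10, 2) '$.amount'
) AS o;
```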

Click through for a demonstration.


ADF.PROCFWK 1.1 Released

Paul Andrew has a new name for the metadata-driven framework in Azure Data Factory:

Hi data friends! Version 1.1 of my ADF.procfwk is ready!

Following the great response I got to the version 1.0 blog series, I decided to push ahead and get version 1.1 out there as soon as possible. The main thing I've addressed with this release is the big problem of having hard-coded service principal credentials in the body of the Azure Function request that called our execution pipelines. Hopefully we can all agree this was/is not great for a number of reasons.

Read on for more, including a detailed changelog, and check out the GitHub repo.


Network Security Changes Around Azure SQL DB

Rohit Nayak announces some changes to Azure SQL Database’s connectivity and network security:

Now in general availability, Private Link enables users to have private connectivity from a Microsoft Azure Virtual Network to Azure SQL Database.

This feature creates a private endpoint which maps a private IP Address from the Virtual Network to your Azure SQL Database.

From a security perspective, Private Link provides you with data exfiltration protection on the login path to SQL Database. Additionally, it does not require adding any IP addresses to the firewall on Azure SQL Database or changing the connection string of your application.

Private Link is built on best-in-class Software Defined Networking (SDN) functionality from the Azure Networking team. Clients can connect to the private endpoint from within the same Virtual Network, from a peered Virtual Network in the same region, or via a VNet-to-VNet connection across regions. Additionally, clients can connect from on-premises using ExpressRoute, private peering, or VPN tunneling. More information can be found here.

Click through to see what else they’ve been working on.


Getting Started with Azure Cognitive Search

Matt How introduces us to Azure Cognitive Search:

Intuitive and powerful search technologies are becoming more and more important as businesses look to get more value from their unstructured data. Having the ability to full-text search across an entire organisation's worth of files can present huge opportunities for efficiency and understanding. Modern search tools now offer Artificial Intelligence (AI) capabilities that allow value-driven enrichment of the raw content using Machine Learning and Data Science techniques. Microsoft's Azure Cognitive Search product is a leader in this space and offers an excellent search experience with many out-of-the-box AI competencies.

Click through for an overview and a demo.


Oracle’s Automatic Workload Repository Explained

Kellyn Pot’vin-Gorman explains to us what the Automatic Workload Repository is:

The Automatic Workload Repository (AWR) has been around since Oracle 10g and requires the Diagnostic and Tuning management pack licensing to use all of its features in Oracle's Enterprise Edition database. Versions before 10.2.0.4 had limited collections vs. the modern reporting schema, and every subsequent release of Oracle has added to its content, which explains the increase in the size and number of objects stored in the SYSAUX tablespace.

By default, and since version 11.2.0.4, the AWR retention is 8 days with an automatic snapshot taken once per hour. It's common for DBAs to up this retention to at least 31 days to capture a month of workload information, and these snapshot identifiers can then be used to identify workload intervals for querying and reporting. Oracle can also be configured to shorten the interval between snapshots to change the granularity of the AWR reports, or, my preference, the DBA or a privileged user can take manual snapshots to mark the beginning or end of an important period.
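
For reference, the interval and retention Kellyn mentions live in DBA_HIST_WR_CONTROL and are changed through DBMS_WORKLOAD_REPOSITORY; the 31-day figure below is just the example from the quote.

```sql
-- Check the current AWR snapshot interval and retention.
SELECT snap_interval, retention FROM dba_hist_wr_control;

-- Raise retention to roughly 31 days and keep hourly snapshots (both values are in minutes).
BEGIN
    DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
        retention => 44640,   -- 31 days * 24 hours * 60 minutes
        interval  => 60
    );
END;
/

-- Take a manual snapshot to mark the start or end of an interesting workload period.
BEGIN
    DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
END;
/
```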

Kellyn goes into a good amount of detail in this post and, based on the title, promises at least a part 2. Though this could be a History of the World: Part I trick Kellyn is playing on us.


Metadata-Driven ADF Pipelines

Paul Andrew wraps up a series on metadata-driven processing of Azure Data Factory pipelines. Part 3 covers the ADF wrapper necessary for our custom pipelines:

Firstly, to help guide this post, below is a mock-up of our Data Factory pipelines and activities to show the end goal. Hopefully this view informs how things are going to be connected using what I call a pipeline hierarchy system and how they will work in the overall framework. For our metadata processing framework, we can make the following category distinctions about the activities represented:

Grandparent – This is the top level orchestration of our wider data platform solution. Here a scheduled trigger could be connected or processing in our solution grouped into natural areas. Technically this level isn’t required for our processing framework, but I’ve included it as good practice.
Parent – Our parent pipeline's primary purpose (try saying that fast 3 times 🙂 ) is to handle the stages of our processing framework. The stages will then be passed off sequentially to our child pipeline using another Execute Pipeline activity.
Child – At this level in the framework the child is hitting the Azure Function to call the lowest level executors, or the pipelines that we want to actually do the work in our data platform solution. In my previous post I added some example metadata to call pipelines named Stage X-X. These in turn relate to a set of empty placeholder pipelines that contained only Wait activities.

Part 4 puts it all together:

For the end-to-end run of the framework we have a few options to see progress once it's been triggered. Before that, it's worth pointing out that in the below I've used the sample metadata provided with the database scripts in GitHub. Then for each execution pipeline I've added a single Wait activity with a random time delay of a few seconds. The point here is to test the framework execution, not the pipelines being called. To further clarify, the Stage X-X pipeline names should be replaced with your actual pipeline names in your data platform solution.
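
To make the metadata-driven part a bit more concrete, here is a purely illustrative sketch of the kind of stage and pipeline metadata that could drive a parent/child hierarchy like this; it is not Paul's actual procfwk schema, so check the GitHub repo for the real thing.

```sql
-- Purely illustrative metadata sketch; not the actual procfwk schema.
CREATE TABLE dbo.ProcessingStages
(
    StageId        INT IDENTITY(1, 1) PRIMARY KEY,
    StageName      NVARCHAR(100) NOT NULL,  -- e.g. 'Raw', 'Clean', 'Transform'
    ExecutionOrder INT NOT NULL             -- the parent pipeline walks stages in this order
);

CREATE TABLE dbo.StagePipelines
(
    PipelineId   INT IDENTITY(1, 1) PRIMARY KEY,
    StageId      INT NOT NULL REFERENCES dbo.ProcessingStages (StageId),
    PipelineName NVARCHAR(200) NOT NULL     -- the worker pipeline the child level executes, e.g. 'Stage 1-1'
);
```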

Definitely worth the read.


Syncing Mobile Apps via Change Tracking

Davide Mauri shows how we can perform data synchronization using change tracking in Azure:

Sending data from the cloud to the app is way more tricky. You want to do it in the most efficient way, to spare bandwidth and device battery life, so you need a way to know what has changed since the last time that specific user and device synced. As data is surely stored in a database of some sort, you also need some efficient method on the database side to make sure you can quickly get everything that is new or changed and that is in the scope for that specific user/device. If your mobile application is successful, this means that you may literally have millions and millions of rows or documents to scan and check for changes.

Not an easy task: all hope is lost then? Just send back the whole data set and that’s it? Of course not! We don’t want to just be developers, but better developers, right?

Modern databases can help a lot in tackling this challenge. Azure SQL, for example, has a feature called Change Tracking that, guess what?, will take care of keeping track of changes for you.
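
As a quick illustration of the feature itself (this is not Davide's sample application, and the table is made up), enabling Change Tracking and pulling everything that changed since a client's last sync looks roughly like this:

```sql
-- Enable Change Tracking at the database and table level (the table must have a primary key).
ALTER DATABASE SyncDemo
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.TodoItems
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- On sync, the device sends the version it last saw; we return only what changed since then.
DECLARE @last_sync_version BIGINT = 42;  -- supplied by the client

SELECT ct.SYS_CHANGE_OPERATION, ct.Id, t.Title, t.IsDone
FROM CHANGETABLE(CHANGES dbo.TodoItems, @last_sync_version) AS ct
LEFT JOIN dbo.TodoItems AS t ON t.Id = ct.Id;   -- deleted rows come back with NULL values

-- The client stores this as its new sync version for the next round trip.
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;
```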

Davide includes a lot of detail and even a sample application on GitHub.


Troubleshooting Azure SQL DB Elastic Jobs

Kate Smith wraps up a series on elastic jobs in Azure SQL Database:

This error means that the Elastic Job Agent cannot connect to the target server(s) because the target has some firewall rules blocking the connection requests.  Indeed – it is required that every target in the target group allows connections from Azure Services in order for Elastic Jobs to work.  To fix this, I go to the target server in the Azure Portal and click on the “Firewalls and virtual networks” item under “Security”.  Next, I toggle the “Allow Azure services” from OFF to ON, and save my changes.  
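
If you'd rather script that change than use the portal, the toggle corresponds to the special 0.0.0.0 server-level firewall rule, which you can set from the logical server's master database. A minimal sketch:

```sql
-- Run in the master database of the target logical server.
-- The 0.0.0.0 rule is how "Allow Azure services and resources to access this server" is represented.
EXEC sp_set_firewall_rule
    @name             = N'AllowAllWindowsAzureIps',
    @start_ip_address = '0.0.0.0',
    @end_ip_address   = '0.0.0.0';
```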

This has been an interesting series to read through, even though I don’t do much at all with Azure SQL Database.
