
Day: March 11, 2019

Accidentally Building a Population Graph

Neil Saunders shares an example of a newspaper headline which ultimately just shows us population sizes:

Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is serves as a proxy variable for “station busyness”.

When working with spatial data, it’s important to distinguish an effect you see because it’s genuinely unique or interesting from an effect you see simply because that’s where all of the people are.
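One common way to separate the two is to divide raw counts by population before comparing areas. The following is a minimal sketch of that idea; the area names and numbers are made up for illustration and do not come from Neil's post.

```python
from typing import Dict


def per_capita_rates(
    counts: Dict[str, int], population: Dict[str, int], per: int = 1000
) -> Dict[str, float]:
    """Return events per `per` residents for each area with a known population."""
    rates = {}
    for area, count in counts.items():
        people = population.get(area)
        if not people:
            # Skip areas with a missing or zero population.
            continue
        rates[area] = count * per / people
    return rates


# Hypothetical data: raw event counts and resident populations.
counts = {"Central": 500, "Suburb": 50, "Village": 10}
population = {"Central": 100_000, "Suburb": 5_000, "Village": 500}

rates = per_capita_rates(counts, population)
ranked_by_count = sorted(counts, key=counts.get, reverse=True)
ranked_by_rate = sorted(rates, key=rates.get, reverse=True)
```

In this made-up data the raw counts rank Central first, while the per-capita rates rank it last: the raw map would mostly be showing where people live.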


Aspect-Based Sentiment Analysis

Federico Pascual explains aspect-based sentiment analysis and then shows how to implement it with MonkeyLearn:

Imagine you have a large dataset of customer feedback from different sources such as NPS, satisfaction surveys, social media, and online reviews. Some positive, some negative and others that contain mixed feelings. You’d use sentiment analysis to classify the polarity of each text, right? After all, it’s already proven to be a highly efficient tool.

But what if you wanted to pick customer feedback apart, home in on the details, and get down to the nitty-gritty of each review for a more accurate analysis of your customers’ opinions?

Cue aspect-based sentiment analysis (ABSA), a text analysis technique that breaks down text into aspects (attributes or components of a product or service) and allocates each one a sentiment level. This technique can help businesses become customer-centric, which means putting their customers at the heart of everything they do. It’s about listening to their customers, understanding their voice, analyzing their feedback and learning more about customer experiences, as well as their expectations for products or services.

Click through for the demo.
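To make the idea concrete, here is a toy sketch of aspect-based sentiment using hand-written keyword lists. It is not MonkeyLearn's API or model; the aspect names, word lists, and clause-splitting rule are all assumptions chosen for illustration, and a real system would use trained classifiers.

```python
import re
from typing import Dict

# Hypothetical keyword lists; a production system would learn these from data.
ASPECTS = {
    "price": {"price", "cost", "expensive", "cheap"},
    "support": {"support", "service", "helpdesk"},
    "usability": {"interface", "ui", "easy", "confusing"},
}
POSITIVE = {"great", "good", "love", "easy", "fast", "cheap", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "confusing", "expensive", "rude"}


def aspect_sentiment(text: str) -> Dict[str, str]:
    """Label each aspect mentioned in `text` as positive, negative, or neutral."""
    results = {}
    # Treat each clause separately so mixed reviews can yield mixed labels.
    clauses = re.split(r"[.;,!?]|\bbut\b", text.lower())
    for clause in clauses:
        words = set(re.findall(r"[a-z']+", clause))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect, keywords in ASPECTS.items():
            if words & keywords:
                # If an aspect appears in several clauses, the last one wins.
                results[aspect] = label
    return results


review = "The interface is great, but support was slow and rude."
labels = aspect_sentiment(review)
```

A plain sentiment classifier would have to give this review a single polarity; splitting by aspect keeps the praise for the interface apart from the complaint about support.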


Azure Data Studio and T-SQL Notebooks

Constantine Kokkinos takes us through the preview of T-SQL notebooks in Azure Data Studio:

I have been waiting for word about the new Notebook functionality in Azure Data Studio, and when I heard it was available in the insider build, I jumped in to take a look.

Jupyter Notebook is a web application that allows you to host programming languages, run code (often with different programming languages), return results, annotate your data, and importantly, share the source-controlled results with your colleagues.

This is an exciting addition; SQL is a great language to combine with notebooks given the exploratory nature of the language. I’m going to wait until it’s officially out before diving too far into it, though.


Querying Cosmos DB From SQL Server

Jovan Popovic shows how you can use the Cosmos DB ODBC driver to perform OPENROWSET queries against Cosmos DB:

Now you need to install ODBC Driver for CosmosDB on the computer where you have SQL Server installed. I’m using Microsoft Azure Cosmos DB ODBC 64-bit.msi for 64-bit Windows – 64-bit versions of Windows 8.1 or later, Windows 8, Windows 7, Windows Server 2012 R2, Windows Server 2012, and Windows Server 2008 R2.

Once you install this driver, you should set up an ODBC source as a system DSN and test the connection:

If you’re running SQL Server 2019, you can follow Jovan’s first two steps and then create an external data source and table with PolyBase to get to the same results.


Updating SQL Server on Linux Docker Images

Max Trinidad shows us how you can make a change to the default SQL Server container and save it for your own purposes:

With the “docker commit …” command, you’ll provide both the image name (all lowercase) and a TAG name (uppercase allowed). You can be creative in having a naming convention for your image repositories.

It’s very important to save images after doing the commit. I found out that having an active container would be useless without an image. As far as I know, there isn’t a way to rebuild an image from an existing container if the image was previously removed.

Max has a full demo, including installing various tools and programs as well as tips on how to minimize the pain.
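As a small illustration of the naming rule, the sketch below builds the argument list for `docker commit CONTAINER REPOSITORY:TAG` and rejects an uppercase image name before Docker would. The helper name and the container and image names are hypothetical, and the validation patterns are a simplified approximation of Docker's rules rather than a complete implementation.

```python
import re
from typing import List

# Simplified approximations: repository names are lowercase; tags may mix case.
REPO_RE = re.compile(
    r"[a-z0-9]+(?:[._-][a-z0-9]+)*(?:/[a-z0-9]+(?:[._-][a-z0-9]+)*)*"
)
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def docker_commit_args(container: str, repository: str, tag: str) -> List[str]:
    """Build the argv for `docker commit`, validating the image name and tag."""
    if not REPO_RE.fullmatch(repository):
        raise ValueError(f"image name must be lowercase: {repository!r}")
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    return ["docker", "commit", container, f"{repository}:{tag}"]


# Hypothetical container and image names.
args = docker_commit_args("sql1", "my-sqlserver", "Tools_V1")
# To actually run it: subprocess.run(args, check=True)
```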


Finding Three-Part or Four-Part Names in SQL Server

Louis Davidson shows how we can find three-part or four-part naming in T-SQL code:

In order to make this work, one of the considerations is to eliminate cross database dependencies, as you can’t reference objects that don’t exist in views, and even in stored procedures, which offer delayed resolution of objects, you can’t test the code without the database it is referencing.

In addition, and somewhat more important to the process, is dealing with three part names that reference the name of the database your object is in. During the comparison process, the database can be created with a name that is different from your target database to compare to (referred to as a shadow database.) So if you are in database X and have references to X.schema.table, but the database is generated as X_Shadow, the X. is now a cross database reference rather than the local reference you are desiring.

Four part names to linked servers are a different sort of nightmare, but one that is (hopefully) exceedingly rare. The queries presented will help with this as well.

Louis has a few scripts to help you find these. If your code is in source control already, you could also build a regular expression to search through it.
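A rough sketch of the regular expression approach follows. It is a heuristic and not one of Louis's scripts: it will also flag three-part column references such as schema.table.column and anything inside comments or string literals, and it does not handle the empty-schema form (database..table).

```python
import re
from typing import List, Tuple

# One identifier part: either [bracketed] or a plain T-SQL identifier.
PART = r"(?:\[[^\]]+\]|[A-Za-z_][\w@#$]*)"

# Three or four dot-separated parts, not starting in the middle of a longer name.
MULTI_PART = re.compile(rf"(?<![\w\]\.])({PART}(?:\.{PART}){{2,3}})")


def find_multipart_names(sql: str) -> List[Tuple[int, str]]:
    """Return (line number, name) for each three-part or four-part name found."""
    hits = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        for match in MULTI_PART.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


# Hypothetical query text to scan.
sample = (
    "SELECT o.OrderID\n"
    "FROM X.dbo.Orders AS o\n"
    "JOIN [Srv1].[Y].[dbo].[Customers] AS c ON c.ID = o.CustomerID;"
)
found = find_multipart_names(sample)
```

Running this over every file in a repository gives a list of candidates to review by hand, which is usually enough to find stray references to the database's own name.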


Azure Blob Storage and Data Lake Storage Gen2

Melissa Coates shows what you need to know about Azure Blob Storage with Azure Data Lake Storage Gen2:

– You may need to consider separate storage accounts if you need to segregate access control (RBAC), virtual networks, access keys, and the like. (Note that RBAC can also be set at the container level, but ACL-type permissions only apply to ADLS Gen2 and not to blob storage.)
– If you don’t need the hierarchical namespace whatsoever (for non-analytical use cases), this could mean a separate storage account. The storage cost is the same but transaction costs are higher when the HNS is enabled (discussed in item #8 of this post).

Click through for more details, including several more tips about Azure Storage Accounts, Azure Blob Storage Containers, and the Azure Storage Blobs themselves.


Passing Messages to Azure Service Bus via Data Factory

Rayis Imayev shows how we can use Logic Apps to let Azure Data Factory send messages to Azure Service Bus:

Summary:
1) Azure Data Factory and Service Bus can find common ground to communicate with each other, and Azure Logic Apps can serve as a good mediator to establish this type of messaging communication.
2) As soon as messages land in a service bus queue, it is the responsibility of the recipient side to obtain and process those messages, which may be part of another blog post.

Click through for a demo of the process.
