Press "Enter" to skip to content

Curated SQL Posts

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot:

After you’ve created your knowledge base, you can then edit and update it. There are a few different ways to update your knowledge base.

a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by editing the questions and modifying the text of your knowledge base.

b. Edit the source of your knowledge base. Click the Settings tab on the left to edit the URL of your FAQs or upload a new document.

Building a bot is pretty easy, and Dustin shows you just how to do it.

Data Lake Archive Tier

Ust Oldfield looks at an important part of a data lake:

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle to archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be made online and available. Bringing Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Read on for more details, including a pattern for archiving data lake data.
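
If you want to script the archive-and-rehydrate cycle, here is a minimal sketch using the azure-storage-blob (v12) Python SDK, which post-dates the post; the connection string, container, and blob names are placeholders:

    from azure.storage.blob import BlobServiceClient

    # Placeholders: swap in your own connection string, container, and blob
    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="datalake-archive", blob="2017/sales.csv")

    # Move the blob offline into the Archive tier
    blob.set_standard_blob_tier("Archive")

    # Later: rehydrate by setting an online tier (this can take hours, as noted above)
    blob.set_standard_blob_tier("Cool")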

R In Linux For Windows

David Smith shows how to install and use R in the Windows Subsystem for Linux:

R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creators Update (version 1709). If you need access to the gcc toolchain for building R packages, or simply prefer the bash environment, it’s easy to get things up and running.

Once you have things set up, you can launch a bash shell and run R at the terminal like you would in any Linux system. And that’s because this is a Linux system: the Windows Subsystem for Linux is a complete Linux distribution running within Windows. This page provides the details on installing Linux on Windows, but here are the basic steps you need and how to get the latest version of R up and running within it.

Click through for a quick tutorial.

Data File Migration With Minimal Downtime

Nate Johnson weaves a yarn around moving from one storage system to another with minimal downtime:

Our ERP database has been chosen by the IT gods to get moved to the shiny new flash storage array, off the old spinning-rust SAN.  This is fantastic news for the business users.  But lo, the executives warn us, “You must do this with no downtime!” (said in my best Brent Ozar PHB-imitation voice).  Of course when we tell them that’s impossible, they say, “OK, you must do this with minimal downtime.”  That’s mo’ betta’.

So what are our typical options for doing a database migration?  Or, more specifically, a data file migration.  See, we’re not moving to a new server, and we’re not moving a bunch of databases together; we’re just moving this one ERP database.  And we’re keeping it on the same SQL instance, we’re just swapping the storage underneath.

Click through for some discussion on options, followed by implementation of a particular strategy.

A Hack For Dynamic ML Services Result Sets

Dave Mason has put together a solution to his dynamic data frame naming problem:

We can take those names and R types, string them together, and “convert” them to SQL data types. (Mapping data types from one language to another is waaaay outside the scope of this post. Lines 11-13 are quick and dirty, just for demonstration purposes. Okie dokie?)

It’s certainly a less-than-ideal solution, but it does the job.  And I hope as well that someday this functionality ends up built into the product.
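
Dave works in R and T-SQL; as a rough Python analogue of the same trick, this sketch maps a data frame’s column types to SQL Server types and strings them together into a WITH RESULT SETS clause. The type map is quick and dirty, just like the post warns:

    import pandas as pd

    # Assumption: a minimal, incomplete dtype-to-SQL-type mapping
    SQL_TYPE_MAP = {
        "int64": "INT",
        "float64": "FLOAT",
        "bool": "BIT",
        "object": "NVARCHAR(4000)",
    }

    def result_sets_clause(df: pd.DataFrame) -> str:
        # Build one "[column] TYPE" fragment per data frame column
        cols = ", ".join(
            f"[{name}] {SQL_TYPE_MAP.get(str(dtype), 'NVARCHAR(4000)')}"
            for name, dtype in df.dtypes.items()
        )
        return f"WITH RESULT SETS (({cols}))"

    df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 0.7]})
    print(result_sets_clause(df))
    # WITH RESULT SETS (([id] INT, [name] NVARCHAR(4000), [score] FLOAT))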

Decrypting Always Encrypted Columns In SSMS

Monica Rathbun shows how to view Always Encrypted data within Management Studio:

Viewing decrypted data within SQL Server Management Studio (SSMS) is very easy. SSMS uses .NET 4.6 and the modern SQL Server client, so you can pass in the necessary encryption options. SSMS uses the connection string to access the Master Key and return the data in its decrypted format.

First, create a new SQL connection and click Options to expand the window.

Then go to the Additional Connection Parameters tab of the login window and simply type column encryption setting = enabled. Then choose Connect.

Click through to see the whole demo.
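
The same switch works outside SSMS too. As a minimal sketch, here is a pyodbc connection using the Microsoft ODBC driver’s ColumnEncryption keyword; the server, database, and table names are placeholders:

    import pyodbc

    # ColumnEncryption=Enabled plays the same role as SSMS's
    # "column encryption setting = enabled" parameter
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
        "ColumnEncryption=Enabled;"
    )

    # Encrypted columns come back decrypted (placeholder table name)
    for row in conn.execute("SELECT TOP 5 * FROM dbo.Patients;"):
        print(row)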

String Formatting With PowerShell

Thomas Raynor provides a quick tip for string formatting in PowerShell:

And, for some reason, instead of the default output, which is formatted like a table, I want output presented like this.

This is a silly example, but notice that even though there are extensions of varying length (.ps1 and .dll are four characters including the dot, and .xlsx is five), all of the “file extension: <number>” is aligned.

How’d I do that?

Read on to learn how.
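
The post’s answer uses PowerShell’s format operator; purely to illustrate the fixed-width idea, here is the same alignment effect sketched in Python with made-up counts:

    # Pad the extension to a fixed width and right-align the count,
    # so rows line up even when extensions vary in length
    counts = {".ps1": 12, ".dll": 3, ".xlsx": 7}
    for ext, n in counts.items():
        print(f"file extension: {ext:<6} count: {n:>4}")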

Blockchain For Business Notes

Allison Tharp has some notes on an edX course entitled Blockchain for Business.  This looks like it will be a multi-part series.  Part one:

A distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions).  Distributed ledger technologies have three basic components:

  • A data model to capture the current state of the ledger
  • A language of transactions to track the changes in the ledger state
  • A protocol that builds consensus among participants around which transactions can be accepted

In other words, we can think of distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary.  Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol.  Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.
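
To make those three components concrete, here is a toy Python sketch with made-up names: a dictionary as the data model, (key, value) tuples as the transaction language, and unanimous acceptance standing in for a real consensus protocol:

    class Ledger:
        """Data model: each peer keeps its own copy of the current state."""
        def __init__(self):
            self.state = {}

        def verify(self, txn):
            # Transaction language: a txn is a (key, value) pair here
            return isinstance(txn, tuple) and len(txn) == 2

        def apply(self, txn):
            key, value = txn
            self.state[key] = value

    def broadcast(peers, txn):
        """Protocol: apply the txn only if every participant accepts it."""
        if all(p.verify(txn) for p in peers):
            for p in peers:
                p.apply(txn)

    peers = [Ledger() for _ in range(3)]
    broadcast(peers, ("alice_balance", 100))
    assert all(p.state == peers[0].state for p in peers)  # copies stay in sync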

Part two:

Another consensus algorithm is called the Proof of Stake algorithm.  With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee.  Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning.  Another way to look at this is that the nodes are randomly selected to validate blocks.  The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).
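
Here is a toy sketch of that stake-weighted selection, with made-up nodes and stakes; the chance of being picked to validate is proportional to the coins a node holds:

    import random
    from collections import Counter

    # Made-up stakes: node-a holds half the coins
    stakes = {"node-a": 50, "node-b": 30, "node-c": 20}

    def pick_validator():
        nodes = list(stakes)
        weights = list(stakes.values())
        return random.choices(nodes, weights=weights)[0]

    # Over many rounds, node-a should be selected roughly half the time
    print(Counter(pick_validator() for _ in range(10_000)))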

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade.  It’ll be interesting to see if it catches on and becomes vital to businesses in the next ten years.

Avro Schemas In Kafka

Stephane Maarek explains the value of using Apache Avro as a schema structure for your Kafka topics:

  • Avro has support for primitive types (int, string, long, bytes, etc.), complex types (enums, arrays, unions, optionals), logical types (date, timestamp-millis, decimal), and data record (name and namespace). All the types you’ll ever need.

  • Avro has support for embedded documentation. Although documentation is optional, in my workflow I will reject any Avro Schema PR (pull request) that does not document every single field, even if obvious. By embedding documentation in the schema, you reduce data interpretation misunderstandings, you allow other teams to know about your data without searching a wiki, and you allow your devs to document your schema where they define it. It’s a win-win for everyone.

  • Avro schemas are defined using JSON. Because every developer knows or can easily learn JSON, there’s a very low barrier to entry.

Read on for more about Avro as well as the possibilities of using other techniques for defining schemas in Kafka.
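
As a quick sketch of what such a schema looks like, here is a hypothetical Customer record defined as JSON (expressed as a Python dict) with a doc on every field, validated with fastavro:

    from fastavro import parse_schema

    # Hypothetical record: every field carries embedded documentation
    schema = {
        "type": "record",
        "name": "Customer",
        "namespace": "com.example",
        "doc": "A customer of the store.",
        "fields": [
            {"name": "id", "type": "long", "doc": "Unique customer id."},
            {"name": "name", "type": "string", "doc": "Full display name."},
            {"name": "email", "type": ["null", "string"], "default": None,
             "doc": "Optional contact email."},
            {"name": "signup_date",
             "type": {"type": "int", "logicalType": "date"},
             "doc": "Date the account was created."},
        ],
    }

    parsed = parse_schema(schema)  # raises if the schema is malformed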

When Spark Meets Hive

Anna Martin and Rosaria Silipo look at combining HiveQL and Spark SQL:

We set our goal here to investigate the age distribution of Maine residents, men and women, using SQL queries. But the question is… on Apache Hive or on Apache Spark? Well, why not both? We could use SparkSQL to extract men’s age distribution and HiveQL to extract women’s age distribution. We could then compare the two distributions and see if they show any difference.

But the main question, as usual, is: Will SparkSQL queries and HiveQL queries blend?

Topic: Age distribution for men and women in the U.S. state of Maine.

Challenge: Blend results from Hive SQL and Spark SQL queries.

Access mode: Apache Spark and Apache Hive nodes for SQL processing.

Using KNIME, the authors are able to blend together data from different sources.
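
The authors drive this from KNIME nodes; as a rough standalone equivalent, here is a PySpark sketch in which one Hive-enabled session runs both queries (the table and column names are assumptions):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("maine-age-distribution")
             .enableHiveSupport()   # lets spark.sql() reach Hive tables
             .getOrCreate())

    men = spark.sql(
        "SELECT age, COUNT(*) AS men FROM census.maine "
        "WHERE sex = 'M' GROUP BY age")
    women = spark.sql(
        "SELECT age, COUNT(*) AS women FROM census.maine "
        "WHERE sex = 'F' GROUP BY age")

    # Join on age to compare the two distributions side by side
    comparison = men.join(women, on="age", how="outer").orderBy("age")
    comparison.show()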
