Press "Enter" to skip to content

Curated SQL Posts

Why SQL On Linux

David Klee explains the benefits of SQL Server on Linux:

First and foremost (IMHO), Microsoft wants to appeal to developers. They want their development stack to run anywhere there are developers. Notably, Microsoft just released Visual Studio 2017 for Mac on May 10th! Many developers out there run on non-Microsoft workstations, notably Apple computers. Apple’s OSX operating system is originally derived from the FreeBSD operating system. FreeBSD and other *BSD operating systems share much in common with Linux. So, if you can make SQL Server work on the Apple, you’ve probably made it work on Linux. Arguably, covering these two platforms nails just about every widely adopted development platform out there.

Microsoft also wants to appeal to a broader customer base, which means exploring the other environments that software runs on. An exceptionally high number of the world’s servers are powered by Linux. It’s lean, mean, stable, and powerful. Lots of shops refuse to run a Windows-based server for a number of reasons, including that their in-house IT staff only have Linux knowledge. These same shops are most likely pressured to run a SQL Server for various applications. I know a number of third-party vended applications that require a SQL Server, and previously, if an organization dictated no Windows-based servers, that meant the application would never be adopted in that organization, no matter how well it functioned.

David provides a good explanation and sets up the context behind his upcoming SQL Server on Linux series.


So You Want A FAQ-Bot

Steph Locke shows how easy it can be to create a Q&A bot:

Now we need to go to Azure and finish our bot.

Add a new Bot Service. You’ll need to give it a name and set which region you want to host it in. It will then set everything up in the background, which takes a couple of minutes.

Once it is successfully deployed, navigate to your bot service and Create an App, making sure to copy and paste the values from the new tab into the interface. Select the Q&A Bot type. It should bring up a pop-up that allows you to select your bot from a dropdown.

Bots are fun and worth learning, but that technology is still in its infancy, regardless of whose bot platform you’re using.


Triggers And Memory-Optimized Table Modifications

Jack Li troubleshoots a customer issue when trying to modify a memory-optimized table:

Recently I assisted on a customer issue where the customer wasn’t able to alter a memory-optimized table, receiving the following error:

Msg 41317, Level 16, State 3, Procedure ddl, Line 4 [Batch Start Line 35]
A user transaction that accesses memory optimized tables or natively compiled modules cannot access more than one user database or databases model and msdb, and it cannot write to master.

If you access a memory-optimized table, you can’t span databases or access model or msdb.  The ALTER statement doesn’t involve any other database.

It turns out there was a DDL trigger defined on the instance that wrote data to msdb.  Click through for Jack’s repro script.  I’d be able to use memory-optimized tables a lot more frequently (to the chagrin of company DBAs, perhaps) if they supported cross-database operations.
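
To make the failure concrete, here is a minimal sketch of the kind of setup that produces it (this is not Jack’s actual repro; the trigger name, audit table, and table names are invented):

-- Server-scoped DDL trigger that logs ALTER TABLE events to a hypothetical audit table in msdb
CREATE TRIGGER ddl_audit_to_msdb
ON ALL SERVER
FOR ALTER_TABLE
AS
BEGIN
    INSERT INTO msdb.dbo.ddl_audit (event_data, logged_at)
    VALUES (EVENTDATA(), SYSDATETIME());
END;
GO

-- In a user database, altering a memory-optimized table now fails with Msg 41317,
-- because the trigger forces the same transaction to write to msdb as well.
ALTER TABLE dbo.SomeInMemoryTable ADD NewColumn int NULL;
GO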


Using Azure Cloud Shell

Jeffrey Verheul shows off a bit of Azure Cloud Shell:

Connecting to a database
Now that your Cloud Shell is ready to go, you can start using Bash. This means you can also use sqlcmd from within Bash.

You can connect to a database with sqlcmd, by using the following command:

sqlcmd -S servername.database.windows.net -U username -P password
Once the connection to your database has been made, you can run queries against it.

There’s no PowerShell support yet (Bash is currently the only option), but PowerShell is in the works.
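
Once connected, anything you would normally run through sqlcmd works the same way from a Cloud Shell session. A trivial sanity check might look like this (purely illustrative):

SELECT @@VERSION;
SELECT name FROM sys.databases;
GO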


Creating An Azure Database For MySQL

Arun Sirpal shows how to create a new MySQL database in Azure:

Here you have the concept of compute units. No such thing as DTUs here but just as confusing.

Compute Units are a measure of CPU processing throughput guaranteed to be available to a single Azure Database for MySQL server. A Compute Unit is a blended measure of CPU and memory resources. In general, 50 Compute Units equate to half-core, 100 Compute Units equate to one core, and 2000 Compute Units equate to twenty cores of guaranteed processing throughput available to your server. I am not going to rehash official documentation on these concepts so I recommend reading https://docs.microsoft.com/en-gb/azure/mysql/concepts-compute-unit-and-storage

Different database product, different metric, it seems.  Check out Arun’s post as he walks you through the process step by step.


Azure MySQL Backups

Grant Fritchey focuses on an area where Azure’s MySQL Platform as a Service offering really makes sense:

Why Is MySQL Platform as a Service Important?

I am going to answer this question. There are a lot of advantages to creating, using and developing against data storage within a PaaS offering. One of the biggest for me is backups. Microsoft is automatically taking backups of the MySQL databases you create within Azure. These are real, full backups. Microsoft validates the backups. As I write this, you’ll have the ability to restore your entire database, to any point in time, at intervals of five minutes, over a 35 day preceding period. By programming against a MySQL database within Azure, you are gaining protection of the information you’re storing within your database, and you don’t have to do anything to benefit from this. It’s all part of the service.

Read the whole thing.


Sub-Second Hive Analytics

Carter Shanklin and Slim Bouguerra have started a series on using Hive and Druid to obtain sub-second SQL queries over terabytes of data:

We’ll show how the Hive/Druid integration delivers ultra-fast SQL analytics that can be consumed from your favorite BI tool to get accelerated business results.  And we will show benchmark results of BI queries running in just milliseconds over a 1TB dataset.

What Is Druid?

Druid is a high-performance, column-oriented, distributed data store, which is well suited for user-facing analytic applications and real-time architectures. Druid is included as a technical preview in HDP 2.6 and you can read more about Druid on our project page, or at the project website.

This first post is mostly about Druid, which sounds like it could eventually become a very interesting technology for implementing Kimball-style warehouse models, were it not for the whole “Joins?  We don’t need no steenkin’ joins” philosophy.  But when used as one engine component (as mentioned in the post), I can see it being quite useful.
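
For a flavor of what the integration looks like from the Hive side, a Druid-backed table is created through a storage handler, roughly like this (a sketch based on the HDP 2.6 technical preview; the table and column names are invented rather than taken from the benchmark in the post):

CREATE TABLE druid_sales
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "MONTH")
AS
SELECT
  CAST(sale_date AS timestamp) AS `__time`,  -- Druid requires a timestamp column named __time
  region,
  SUM(revenue) AS revenue
FROM sales
GROUP BY CAST(sale_date AS timestamp), region;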


Keeping Up With Analytics

Jen Underwood discusses the need to stay relevant in analytics and shares some tips on how to do so:

Although most analytics applications today still leverage older data warehouse and OLAP technologies on-premises, the pace of the cloud shift is significantly increasing. Infrastructure is getting better and is almost invisible in mature markets. Cloud fears are subsiding as more organizations witness the triumphs of early adopters. Instant, easy cloud solutions continue to win the hearts and minds of non-technical users. Cloud also accelerates time to market allowing for innovation at faster speeds than ever before. As data and analytics professionals, be sure to make time to learn a variety of cloud and hybrid analytics tools.

Exploring novel technologies across various ecosystems in the cloud world is usually as simple as spinning up a cloud image or service to get started. There are literally zillions of free and low cost resources for learning. As you dive into a new world of data, you will find common analytics architectures, design patterns, and types of technologies (hybrid connectivity, storage, compute, microservices, IoT, streaming, orchestration, database, big data, visualization, artificial intelligence, etc.) being used to solve problems.

It’s worth reading the whole thing.


Basic Data Tidying

Sarah Dutkiewicz tidies up a data set in R:

Looking at this data, the first thing I thought was untidy. There has to be a better way. When I think of tidy data, I think of the tidyr package, which is used to help make data tidy and easier to work with. Specifically, I thought of the spread() function, where I could break things up. Once the data was spread into appropriate columns, I figured I could operate on it a bit better.

Sarah has also made the data set available in case you’re interested in following along.
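
If you think in SQL rather than R, spread() does roughly what a long-to-wide pivot via conditional aggregation does; a generic sketch (not Sarah’s data set, and the column names are made up):

SELECT subject_id,
       MAX(CASE WHEN measure = 'height' THEN value END) AS height,
       MAX(CASE WHEN measure = 'weight' THEN value END) AS weight
FROM long_measurements
GROUP BY subject_id;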


Interpreting Regression Coefficients

Steph Locke explains what beta values on parameters in a regression actually signify:

When we read the list of coefficients, here is how we interpret them:

  • The intercept is the starting point – so if you knew no other information it would be the best guess.

  • Each coefficient multiplies the corresponding column to refine the prediction from the estimate. It tells us how much one unit in each column shifts the prediction.

  • When you use a categorical variable, in R the intercept represents the default position for a given value in the categorical column. Every other value then gets a modifier to the base prediction.

Linear regression is easy, but the real value here is Steph’s explanation of logistic regression coefficients.
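
As a quick worked example with made-up numbers: given an intercept of 10 and a coefficient of 2.5 on x, a row with x = 4 is predicted at 10 + 2.5 * 4 = 20. For a categorical column, R folds the reference level into the intercept, so rows at the reference level are predicted from the intercept alone and every other level adds its own coefficient on top. In the logistic case the coefficients live on the log-odds scale, so exp(coefficient) gives the multiplicative change in the odds for a one-unit increase.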
