2017-05-16 – Curated SQL

The new Visual Studio 2017 has built-in support for programming in R and Python. For older versions of Visual Studio, support for these languages has been available via the RTVS and PTVS add-ins, but the new Data Science Workloads in Visual Studio 2017 make them available without a separate add-in. Just choose the “Data Science and analytical applications” option during installation to install everything you need, including Microsoft R Client and the Anaconda Python distribution.

I’m personally going to wait a little bit before jumping onto Visual Studio 2017, but I’m glad that RTVS is now available.

Comments closed

What’s New In Hadoop 3.0?

Published 2017-05-16 by Kevin Feasel

Shubham Sinha explains some of the changes coming to Hadoop:

Integrating EC with HDFS can maintain the same fault-tolerance with improved storage efficiency. As an example, a 3x replicated file with 6 blocks will consume 6*3 = 18 blocks of disk space. But with EC (6 data, 3 parity) deployment, it will only consume 9 blocks (6 data blocks + 3 parity blocks) of disk space. This only requires the storage overhead up to 50%.

Since Erasure coding requires additional overhead in the reconstruction of the data due to performing remote reads, thus it is generally used for storing less frequently accessed data. Before deploying Erasure code, users should consider all the overheads like storage, network and CPU overheads of erasure coding.

Now to support the Erasure Coding effectively in HDFS they made some changes in the architecture. Lets us take a look at the architectural changes.

There are some nice features coming to Hadoop version 3.

Comments closed

Optimizing Kafka

Published 2017-05-16 by Kevin Feasel

Yeva Byzek explains different tuning options available within Apache Kafka:

Without needing to make any changes to Kafka configuration parameters, you can setup a development Kafka environment and test basic functionality. Yet the fact that Kafka runs straight off the shelf does not mean you won’t want to do some tuning before you go into production. The reason to tune is that different use cases will have different sets of requirements that will drive different service goals. To optimize for those service goals, there are Kafka configuration parameters that you should change. In fact, the Kafka design itself provides configuration flexibility to users, and to make sure your Kafka deployment is optimized for your service goals, you absolutely should investigate tuning the settings of some configuration parameters and benchmarking in your own environment. Ideally, you should do that before you go to production, or at least before you scale out to a larger cluster size.

We have written a white paper to help you identify those service goals, configure your Kafka deployment to optimize for them, and ensure that you are achieving them through monitoring.

Read the whole thing, especially the part about throughput versus latency.

Comments closed

Why SQL On Linux

Published 2017-05-16 by Kevin Feasel

David Klee explains the benefits of SQL Server on Linux:

First and foremost (IMHO), Microsoft wants to appeal to developers. They want their development stack to run anywhere there are developers. Notably, Microsoft just released Visual Studio 2017 for Mac on May 10th! Many developers out there run on non-Microsoft workstations, notably Apple computers. Apple’s OSX operating system is originally derived from the FreeBSD operating system. FreeBSD and other *BSD operating systems share much in common with Linux. So, if you can make SQL Server work on the Apple, you’ve probably made it work on Linux. Arguably, covering these two platforms nails just about every widely adopted development platform out there.

Microsoft also wants to appeal to a broader customer base, which means exploring the other environments that software runs on. An exceptionally high number of the world’s servers are powered by Linux. It’s lean, mean, stable, and powerful. Lots of shops refuse to run a Windows-based server because of a number of reasons, including that their in-house IT staff only have Linux knowledge. These same shops are most likely pressured to run a SQL Server for various applications. I know a number of third-party vended application that require a SQL Server, and previously if an organization dictated no Windows-based servers, that meant that this application would never be adopted in the organization, no matter how well it would function.

David provides a good explanation and sets up the context behind his upcoming SQL Server on Linux series.

Comments closed

So You Want A FAQ-Bot

Published 2017-05-16 by Kevin Feasel

Steph Locke shows how easy it can be to create a Q&A bot:

Now we need to go to Azure and finish our bot.

Add a new Bot Service. You’ll need to give it a name and set which region you want to host it in. It will then setup everything in the background, and takes a couple of minutes.

Once it is successfully deployed, navigate your bot service and Create an App, making sure to copy and paste the values from the new tab into the interface. Select the Q&A Bot type. It should bring up a poppup that allows you to select your bot from a dropdown.

Bots are fun and worth learning, but that technology is still in its infancy, regardless of whose bot platform you’re using.

Comments closed

Triggers And Memory-Optimized Table Modifications

Published 2017-05-16 by Kevin Feasel

Jack Li troubleshoots a customer issue when trying to modify a memory-optimized table:

Recently I assisted on a customer issue where customer wasn’t able to alter a memory optimized table with the following error

Msg 41317, Level 16, State 3, Procedure ddl, Line 4 [Batch Start Line 35]
A user transaction that accesses memory optimized tables or natively compiled modules cannot access more than one user database or databases model and msdb, and it cannot write to master.

If you access a memory optimized table, you can’t span database or access model or msdb. The alter statement doesn’t involve any other database.

It turns out there was a DDL trigger defined on the instance that wrote data to msdb. Click through for Jack’s repro script. I’d be able to use memory-optimized tables a lot more frequently (to the chagrin of company DBAs, perhaps) if they supported cross-database operations.

Comments closed

Using Azure Cloud Shell

Published 2017-05-16 by Kevin Feasel

Jeffrey Verheul shows off a bit of Azure Cloud Shell:

Connecting to a database
Now that your Cloud Shell is ready to go, you can start using Bash. This means you can also use sqlcmd from within Bash.

You can connect to a database with sqlcmd, by using the following command:

sqlcmd -S servername.database.windows.net -U username -P password
Once the connection to your database has been made, you can run queries against it.

There’s no Powershell support yet, but Bash is currently supported and Powershell is in the works.

Comments closed

Creating An Azure Database For MySQL

Published 2017-05-16 by Kevin Feasel

Arun Sirpal shows how to create a new MySQL database in Azure:

Here you have the concept of compute units. No such thing as DTUs here but just as confusing.

Compute Units are a measure of CPU processing throughput guaranteed to be available to a single Azure Database for MySQL server. A Compute Unit is a blended measure of CPU and memory resources. In general, 50 Compute Units equate to half-core, 100 Compute Units equate to one core, and 2000 Compute Units equate to twenty cores of guaranteed processing throughput available to your server. I am not going to rehash official documentation on these concepts so I recommend reading https://docs.microsoft.com/en-gb/azure/mysql/concepts-compute-unit-and-storage

Different database product, different metric, it seems. Check out Arun’s post as he walks you through the process step by step.

Comments closed

Azure MySQL Backups

Published 2017-05-16 by Kevin Feasel

Grant Fritchey focuses on an area where Azure’s MySQL Platform as a Service offering really makes sense:

Why Is MySQL Platform as a Service Important?

I am going to answer this question. There are a lot of advantages to creating, using and developing against data storage within a PaaS offering. One of the biggest for me is backups. Microsoft is automatically taking backups of the MySQL databases you create within Azure. These are real, full backups. Microsoft validates the backups. As I write this, you’ll have the ability to restore your entire database, to any point in time, at intervals of five minutes, over a 35 day preceding period. By programming against a MySQL database within Azure, you are gaining protection of the information you’re storing within your database, and you don’t have to do anything to benefit from this. It’s all part of the service.

Read the whole thing.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Day: May 16, 2017

R And Python Support In VS 2017

What’s New In Hadoop 3.0?

Optimizing Kafka

Why SQL On Linux

So You Want A FAQ-Bot

Triggers And Memory-Optimized Table Modifications

Using Azure Cloud Shell

Creating An Azure Database For MySQL

Azure MySQL Backups

Why Is MySQL Platform as a Service Important?