Month: July 2017

Last week in Part Two I went through how to create named volumes and map them to containers in order to persist data.

However, there is also another option available in the form of data volume containers.

The idea here is that create we a volume in a container and then mount that volume into another container. Confused? Yeah, me too but best to learn by doing so let’s run through how to use these things now.

Read through for the step-by-step description.

Comments closed

Spell Checking With Visual Studio Code

Published 2017-07-06 by Kevin Feasel

Andy Levy shows how to create a custom dictionary for a programming language in Visual Studio Code:

But as you can see from the marketplace page there, by default this plugin doesn’t know PowerShell. In my user settings file settings.json, I added PowerShell to the cSpell.enabledLanguageIds section so it’s always recognized:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

"cSpell.enabledLanguageIds": [

        "c",

        "cpp",

        "csharp",

        "go",

        "javascript",

        "javascriptreact",

        "json",

        "latex",

        "markdown",

        "php",

        "plaintext",

        "powershell",

        "python",

        "text",

        "typescript",

        "typescriptreact",

        "yml",

        "powershell"

    ],

And with that, VSCode was giving me green squiggles under lots of words – both misspelled and not. Code Spellchecker doesn’t understand PowerShell in its default setup, it doesn’t have a dictionary for it. Just to get things started, I added a cSpell.userWords section to my settings.json and the squiggles started disappearing.

It’s an interesting post, so read the whole thing.

Comments closed

Monitoring On Linux

Published 2017-07-06 by Kevin Feasel

Steven Schneider shows how Microsoft’s SQLCAT monitors SQL Server on Linux:

The following solutions were tested:

Graphing with Grafana and Graphite

Collection with collectd and Telegraf

Storage with Graphite/Whisper and InfluxDB

We landed on a solution which uses InfluxDB, collectd and Grafana. InfluxDB gave us the performance and flexibility we needed, collectd is a light weight tool to collect system performance information, and Grafana is a rich and interactive tool for visualizing the data.
In the sections below, we will provide you with all the steps necessary to setup this same solution in your environment quickly and easily. Details include step-by-step setup and configuration instructions, along with a pointer to the complete GitHub project.

I’ve been a big fan of Grafana since Hortonworks introduced it as the primary monitoring tool in HDP 2.5. We use Grafana extensively for monitoring SQL on Windows and SQL on Linux.

Comments closed

Fixing Power Settings With T-SQL

Published 2017-07-06 by Kevin Feasel

Randolph West shows how to use T-SQL and xp_cmdshell to switch a server’s power settings from Balanced to High Performance:

Windows has the same setting. It’s in the Power Options under Control Panel, and for all servers, no matter what, it should be set to High Performance.

Here’s a free T-SQL script I wrote that will check for you what the power settings are. We don’t always have desktop access to a server when we are checking diagnostics, but it’s good to know if performance problems can be addressed by a really simple fix that doesn’t require the vehicle to be at rest.

(The script also respects your settings, so if you had xp_cmdshell disabled, it’ll turn it off again when it’s done.)

Click through for the script.

Comments closed

Migrating To In-Memory OLTP

Published 2017-07-05 by Kevin Feasel

Erin Stellato is kicking the hornet’s nest again; this time it’s about In-Memory OLTP:

In the past few months I’ve had several clients reach out about migrating to In-Memory OLTP solutions. When the second or third request rolled in I remember thinking, “Finally!” As in, I’ve been wondering why businesses haven’t been looking to implement In-Memory sooner. In fact, I added a section on In-Memory to our IEPTO2 course because with the availability of the feature in SQL Server 2016 SP1 (yes, there are memory limitations, but it’s a good place to start) I figured we would see an uptick in interest. But here we are, half way through 2017 and over 6 months since the release of SQL Server 2016 SP1, and I still see a lot of hesitation around it.

I wrote a post over on SQLPerformance last week, Testing DML Statements for In-Memory OLTP, and that generated some discussions on Twitter. So I figured it was time for a post to find out what’s holding companies back. This isn’t a true poll – it’s a fill-in-the-blank. As in: post a comment. If you have considered migrating to In-Memory and then didn’t, I want to understand why. I recognize there are limitations – the product is still new and it’s evolving. But perhaps if we understand the largest inhibitors to migration we can help move them up on Microsoft’s list via Connect and other pathways. Understand I am asking for specifics here, for example: we can’t use In-Memory tables because they don’t support spatial data types. Whatever the reason, share it via a comment.

If I were to take a wild guess, the most common answers will be something like:

Not using SQL Server 2014 EE or later, or any edition of 2016 SP1 or later
Limitations in what memory-optimized tables provide: can’t go cross-database, can’t create useful constraints, etc.
Syntax troubles, particularly in 2014: no outer joins, etc.
Difficulties fitting this into a legacy system: it’s not just as simple as drop-and-replace tables due to limitations above. Also, due to size limits (none of that NVARCHAR(MAX) business), candidate tables might need to be broken up or restructured so that they fit the mold.

I like using memory-optimized tables where I can, but have had much more success with memory-optimized table-valued parameters.

Comments closed

DevOps And The DBA

Published 2017-07-05 by Kevin Feasel

Kellyn Pot’Vin-Gorman explains that DevOps does not signal the end of the DBA:

As I travel to multiple events focused on numerous platforms the database is crucial to, I’m faced with peers frustrated with DevOps and considerable conversation dedicated to how it’s the end of the database administrator. It may be my imagination, but I’ve been hearing this same story, with the blame assigned elsewhere- either its Agile, DevOps, the Cloud or even a latest release of the actual database platform. The story’s the same- the end of the Database Administrator.

The most alarming and obvious pain point of this, is that in each of these scenarios, the result was the Database Administrator a focal point in the end more so than they were when it began. When it comes to DevOps, the specific challenges of the goal needed the DBA more so than any of these storylines. As development hurdled top speed to deliver what the business required, the DBA and operations as a whole, delivered the security, the stability and the methodologies to build automation at the level that the other groups simply never needed previously.

This is a useful rejoinder to fears of imminent job loss.

Comments closed

How Kafka Does Exactly-Once

Published 2017-07-05 by Kevin Feasel

Neha Narkhede explains exactly-once messaging and describes how Kafka implements their exactly-once process:

In a distributed system, the computers that make up the system can always fail independently of one another. In the case of Kafka, an individual broker can crash, or a network failure can happen while the producer is sending a message to a topic. Depending on the action the producer takes to handle such a failure, you can get different semantics:

At least once semantics: if the producer receives an acknowledgement (ack) from the Kafka broker and acks=all, it means that the message has been written exactly once to the Kafka topic. However, if a producer ack times out or receives an error, it might retry sending the message assuming that the message was not written to the Kafka topic. If the broker had failed right before it sent the ack but after the message was successfully written to the Kafka topic, this retry leads to the message being written twice and hence delivered more than once to the end consumer. And everybody loves a cheerful giver, but this approach can lead to duplicated work and incorrect results.
At most once semantics: if the producer does not retry when an ack times out or returns an error, then the message might end up not being written to the Kafka topic, and hence not delivered to the consumer. In most cases it will be, but in order to avoid the possibility of duplication, we accept that sometimes messages will not get through.
Exactly once semantics: even if a producer retries sending a message, it leads to the message being delivered exactly once to the end consumer. Exactly-once semantics is the most desirable guarantee, but also a poorly understood one. This is because it requires a cooperation between the messaging system itself and the application producing and consuming the messages. For instance, if after consuming a message successfully you rewind your Kafka consumer to a previous offset, you will receive all the messages from that offset to the latest one, all over again. This shows why the messaging system and the client application must cooperate to make exactly-once semantics happen.

Read on for a discussion of technical details. I appreciate how Neha linked to a 60+ page design document as well, for those wanting to dig into the details.

Comments closed

Connecting Hadoop To LDAP

Published 2017-07-05 by Kevin Feasel

Giovanni Lanzani walks through some of the difficulties of getting LDAP working with Hadoop:

This section could probably could have much less workarounds if I’d knew more about LDAP.

But I’m a data scientist at heart and I want to get things done.

If you ever dealt with Hadoop, you know that there are a bunch of non-interactive users, i.e. users who are not supposed to login, such as hdfs, spark, hadoop, etc. These users are important to have. However the groups with the same name are also important to have. For example when using airflow and launching a spark job, the log folders will be created under the airflow user, in the spark group.

LDAP, however, doesn’t allow you, to my knowledge, to have overlapping user/groups, as Unix does.

The way I solved it was to create, in LDAP, the spark_user (or hdfs_user or …) to work around this limitation.

Also, Giovanni apparently lives in an interesting neighborhood.

Comments closed

Neural Nets With R And Power BI

Published 2017-07-05 by Kevin Feasel

Leila Etaati continues her series on using neural nets in Power BI:

we are going to predict the concrete strength using neural network. neural network can be used for predict a value or class, or it can be used for predicting multiple items. In this example, we are going to predict a value, that is concrete strength.

I have loaded the data in power bi first, and in “Query Editor” I am going to write some R codes. First we need to do some data transformations. As you can see in the below picture number 2,3 and 4,data is not in a same scale, we need to do some data normalization before applying any machine learning. I am going to write a code for that (Already explained the normalization in post KNN). So to write some R codes, I just click on the R transformation component (number 5).

There’s a lot going on in this demo; check it out.

Comments closed

Loan Chargeoff Templates

Published 2017-07-05 by Kevin Feasel

Ajay Jagannathan announces a couple new Cortana Intelligence Solutions Gallery templates:

For more information, read this blog: End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service

We have published two solution templates deployable using two technology stacks for the above chargeoff scenario:-

Loan Chargeoff Prediction using SQL Server 2016 R Services – Using DSVM with SQL Server 2016 and Microsoft ML, this solution template walks through how to create and clean up a set of simulated data, use 5 different models to train, select the best performant model, perform scoring using the model and save the prediction results back to SQL Server. A PowerBI report connects to the prediction table and show interactive reports with the user on the chargeoff prediction.
Loan Chargeoff Prediction using HDInsight Spark Clusters – This solution demonstrates how to develop machine learning models for predicting loan chargeoff (including data processing, feature engineering, training and evaluating models), deploy the models as a web service (on the edge node) and consume the web service remotely with Microsoft R Server on Azure HDInsight Spark clusters. The final predictions is saved to a Hive table which could be visualized in Power BI.

These tend to be nice because they show you how the different pieces of the Azure stack tie together.

Comments closed