Press "Enter" to skip to content

Author: Kevin Feasel

Azure Data Lake Tools For VS Code

Jenny Jiang announces that Azure Data Lake Tools for Visual Studio Code is now generally available:

ADLA Integration

The ADL Tools for VS Code integrate well with ADLA. Azure Data Lake includes the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. U-SQL on ADLA offers Job as a Service with the Microsoft-invented U-SQL language. Customers do not have to manage deployment of clusters, but can simply submit their jobs to ADLA, an analytics platform managed by Microsoft.
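For a sense of what such a job looks like, here is a minimal U-SQL sketch modeled on Microsoft's introductory samples (the schema and file paths are illustrative, not from the announcement):

    // Extract a rowset from a file in the lake, supplying the schema on read.
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    // Transform with SQL-like syntax; expressions use C# semantics (note the ==).
    @result =
        SELECT UserId, Region
        FROM @searchlog
        WHERE Region == "en-gb";

    // Write the result back to the lake; ADLA provisions the compute for you.
    OUTPUT @result
        TO "/output/SearchLog_filtered.tsv"
        USING Outputters.Tsv();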

Click through for the full announcement.

File Management In Containers

Andrew Pruski shows how to copy files into and out of Docker containers:

Last week I was having an issue with a SQL install within a container, and to fix it I needed to copy the setup log files out of the container onto the host so that I could review them.

But how do you copy files out of a container?

Well, thankfully there’s the docker cp command. A really simple command that lets you copy whatever files you need out of a running container into a specified directory on the host.

I’ll run through a quick demo, but I won’t install SQL; I’ll use an existing SQL image and grab its Summary.txt file.
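The copy itself is a one-liner. A hedged sketch of what it might look like (the container name and the in-container path to Summary.txt are illustrative):

    # Copy the setup summary from the container to the current directory on the host.
    docker cp sqlserver:"C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\Summary.txt" .

    # Copying into a container is the same command with source and destination swapped.
    docker cp setup-fix.ini sqlserver:"C:\install\setup-fix.ini"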

Read on for the demo.

Cardinality Estimation On Memory-Optimized Table Variables

Jack Li explains that the cardinality estimator works the same for memory-optimized table variables as it does for regular table variables:

In a previous blog, I talked about how a memory-optimized table variable consumes memory until the end of the batch. In this blog, I want to make you aware of the cardinality estimate for memory-optimized table variables, as we have had customers call in for clarification. By default, a memory-optimized table variable behaves the same way as a disk-based table variable: it will have 1 row as an estimate. With a disk-based table variable, you can control the estimate by using OPTION (RECOMPILE) at the statement level (see this blog) or by using trace flag 2453.

You can control the same behavior using those two approaches on a memory-optimized table variable if you use it in an ad hoc query or inside a regular T-SQL stored procedure; the behavior will be the same. This little repro will show that the estimate is correct with OPTION (RECOMPILE).
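A minimal sketch of that kind of repro (the type name is illustrative; this assumes a database with a MEMORY_OPTIMIZED_DATA filegroup, and the estimates are visible in the actual execution plan):

    -- Memory-optimized table type; table variables of this type live in memory.
    CREATE TYPE dbo.tvpOrders AS TABLE
    (
        OrderID INT NOT NULL PRIMARY KEY NONCLUSTERED
    )
    WITH (MEMORY_OPTIMIZED = ON);
    GO

    DECLARE @Orders dbo.tvpOrders;

    INSERT INTO @Orders (OrderID)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects;

    -- Estimated rows for @Orders: 1, regardless of the 1,000 rows inside.
    SELECT o.OrderID
    FROM @Orders AS o;

    -- With the hint, the statement recompiles after the insert,
    -- so the estimate reflects the actual row count.
    SELECT o.OrderID
    FROM @Orders AS o
    OPTION (RECOMPILE);
    GO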

Jack also explains how this works for natively compiled stored procedures (spoilers: it doesn’t), so read the whole thing.

Build Your Own Power BI App

Reza Rad shows how to build a Power BI app:

Till now there were about six methods of sharing content in Power BI.

I have written about some of these already (follow links above), and will write about the rest soon. Yesterday, Microsoft announced the preview version of Power BI Apps, which is a new method of sharing. It is an enhanced version of two previous methods put together: workspaces and content packs!

Read on for a step-by-step guide.

Picking An R Package Name

Marcelo Perlin has fun looking at package names in CRAN:

Looking at package names, one strategy that I commonly observe is to use small words, a verb or noun, and add the letter R to it. A good example is dplyr. Letter d stands for dataframe, ply is just a tool, and R is, well, you know. In a conventional sense, the name of this popular tool is informative and easy to remember. As always, the extremes are never good. A few bad examples of package naming are A3, AF, BB, and so on. Googling the package name is definitely not helpful. On the other end, package samplesizelogisticcasecontrol provides a lot of information, but it is plain unattractive!

Another strategy that I also find interesting is developers using names that, on first sight, are completely unrelated to the purpose of the package. But there is a not-so-obvious link. One example is package sandwich. At first sight, I challenge anyone to figure out what it does. This is an econometric package that computes robust standard errors in a regression model. These robust estimates are also called sandwich estimators because the formula looks like a sandwich. But you only know that if you studied a bit of econometric theory. This strategy works because it is easier to remember things that surprise us. Another great example is package janitor. I’m sure you already suspect that it has something to do with data cleaning. And you are right! The message of the name is effortless and it works! The author even got the privilege of using letter R in the name.

Marcelo uses word and character analysis to come up with his conclusions, making this a good way of seeing how to graph and slice data. h/t R-bloggers

Improving Performance For JSON In SQL Server

Bert Wagner shows how to use indexed, non-persisted columns to pre-parse JSON data in SQL Server:

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.
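To make the pattern concrete, here is a minimal sketch of what Bert describes (the table and column names are hypothetical):

    -- Non-persisted computed column that parses one property out of the JSON.
    ALTER TABLE dbo.Cars
        ADD Make AS JSON_VALUE(CarDetailsJson, '$.Make');

    -- Indexing the computed column materializes the parsed values,
    -- so the JSON is parsed once per write rather than once per read.
    CREATE NONCLUSTERED INDEX IX_Cars_Make
        ON dbo.Cars (Make);

    -- This can now seek on the index instead of scanning and parsing every row.
    SELECT CarId, Make
    FROM dbo.Cars
    WHERE Make = N'Volkswagen';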

It’s great that the database engine is smart enough to do this, but I’m not really a big fan of storing data in JSON and parsing it within SQL Server, as that violates first normal form.  If you know you’re going to use Make as an attribute and query it in SQL, make it a real attribute instead of holding multiple values in a single attribute.

Stored Procedures Are Code

Rob Farley hates hearing that stored procedures don’t need to go into source control:

Hearing this is one of those things that really bugs me.

And it’s not actually about stored procedures, it’s about the mindset that sits there.

I hear this sentiment in environments where there are multiple developers. Where they’re using source control for all their application code. Because, you know, they want to make sure they have a history of changes, and they want to make sure two developers don’t change the same piece of code, maybe they even want to automate builds, all those good things.

But checking out code and needing it to pass all those tests is a pain. So if there’s some logic that can be put in a stored procedure, then that logic can be maintained outside the annoying rigmarole of source control. I guess this is appealing because developers are supposed to be creative types, and should fight against the repression, fight against ‘the man’, fight against control.

When I come across this mindset, I worry a lot.

Read on for Rob’s set of worries, and hie thee to the source control repository.  It really doesn’t matter which source control product you use (ideally, the same one that developers use for their app code), just as long as it’s in source control.

Smoother Deployments

Steve Jones talks about how, at a prior company, he and his coworkers simplified their deployment processes and their lives:

The lead developer and I both had little children at the time. Spending 12+ hours at work on Wednesday wasn’t an ideal situation for us, and we decided to get better.

The first thing we did was ensure that all code was tracked in VSS. We had most web code here, but there were always a few files that weren’t captured, so we cleaned that up. I also added database code to VSS with the well known, time tested and proven File | Save, File | Open method of capturing SQL code. This took a few months, and some deployment issues, to get everyone in the habit of modifying code in this manner. I refused to deploy code that wasn’t in VSS, and since our CTO was a former developer, I had support.

The other change was the lead developer and I started building a release branch of code each week. We’d move over the changes that were going to be released to this branch, which simplified our process. We could now see exactly which code was being deployed. This was before git and more modern branching strategies, but we were able to easily copy code from the mainline of development to the release branch as we made changes for this week.

It’s a good read.

Azure Cosmos DB

Victoria Holt discusses a new Azure product announcement:

Another database announcement today from Microsoft: the first globally distributed, multi-model database service. Microsoft describes Azure Cosmos DB as containing a write-optimized, resource-governed, schema-agnostic database engine that natively supports multiple data models: key-value, documents, graphs, and columnar.

The troll in me really wants this to be SQL Server, but it is instead a very much beefed-up DocumentDB.

Continuous Disintegration

Alex Yates gets on his soapbox about continuous integration:

Everyone does CI wrong!

(OK, perhaps not everyone, but a lot of people.)

Whenever I deliver a conference session about database continuous integration (CI), I like to start by asking a question to the audience. “Who can tell me what continuous integration means?”

I almost always get responses like:

“Automated deployments!”

“Automated builds upon commit!”

Very occasionally someone will impress me with something like:

“Unit tests!” or “Automatically running my unit tests!”

Not bad answers. Have a biscuit. But you are still missing the fundamental point.

Alex makes a number of great points, so check it out.
