Press "Enter" to skip to content

Month: May 2017

Picking An R Package Name

Marcelo Perlin has fun looking at package names in CRAN:

Looking at package names, one strategy that I commonly observe is to use small words, a verb or noun, and add the letter R to it. A good example is dplyr. Letter d stands for dataframe, ply is just a tool, and R is, well, you know. In a conventional sense, the name of this popular tool is informative and easy to remember. As always, the extremes are never good. A couple of bad examples of package naming are A3, AF, BB and so on. Googling the package name is definitely not helpful. On the other end, packagesamplesizelogisticcasecontrol provides a lot of information but it is plain unattractive!

Another strategy that I also find interesting is developers using names that, on first sight, are completely unrelated to the purpose of the package. But, there is a not so obvious link. One example is package sandwich. At first sight, I challenge anyone to figure out what it does. This is an econometric package that computes robust standard errors in a regression model. These robust estimates are also called sandwich estimators because the formula looks like a sandwich. But, you only know that if you studied a bit of econometric theory. This strategy works because it is easier to remember things that surprise us. Another great example is package janitor. I’m sure you already suspect that it has something do to with data cleaning. And you are right! The message of the name is effortless and it works! The author even got the privilege of using letter R in the name.

Marcelo uses word and character analysis to come up with his conclusions, making this a good way of seeing how to graph and slice data. h/t R-bloggers

Comments closed

Improving Performance For JSON In SQL Server

Bert Wagner shows how to use indexed, non-persisted columns to pre-parse JSON data in SQL Server:

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

It’s great that the database engine is smart enough to do this, but I’m not really a big fan of storing data in JSON and parsing it within SQL Server, as that violates first normal form.  If you know you’re going to use Make as an attribute and query it in SQL, make it a real attribute instead of holding multiple values in a single attribute.

Comments closed

Stored Procedures Are Code

Rob Farley hates hearing that stored procedures don’t need to go into source control:

Hearing this is one of those things that really bugs me.

And it’s not actually about stored procedures, it’s about the mindset that sits there.

I hear this sentiment in environments where there are multiple developers. Where they’re using source control for all their application code. Because, you know, they want to make sure they have a history of changes, and they want to make sure two developers don’t change the same piece of code, maybe they even want to automate builds, all those good things.

But checking out code and needing it to pass all those tests is a pain. So if there’s some logic that can be put in a stored procedure, then that logic can be maintained outside the annoying rigmarole of source control. I guess this is appealing because developers are supposed to be creative types, and should fight against the repression, fight against ‘the man’, fight against control.

When I come across this mindset, I worry a lot.

Read on for Rob’s set of worries, and hie thee to the source control repository.  It really doesn’t matter which source control product you use (ideally, the same one that developers use for their app code), just as long as it’s in source control.

Comments closed

Smoother Deployments

Steve Jones talks about how, at a prior company, he & coworkers simplified their deployment processes and their lives:

The lead developer and I both had little children at the time. Spending 12+ hours at work on Wednesday wasn’t an ideal situation for us, and we decided to get better.

The first thing we did was ensure that all code was tracked in VSS. We had most web code here, but there were always a few files that weren’t captured, so we cleaned that up. I also added database code to VSS with the well known, time tested and proven File | Save, File | Open method of capturing SQL code. This took a few months, and some deployment issues, to get everyone in the habit of modifying code in this manner. I refused to deploy code that wasn’t in VSS, and since our CTO was a former developer, I had support.

The other change was the lead developer and I started building a release branch of code each week. We’d move over the changes that were going to be released to this branch, which simplified our process. We could now see exactly which code was being deployed. This was before git and more modern branching strategies, but we were able to easily copy code from the mainline of development to the release branch as we made changes for this week.

It’s a good read.

Comments closed

Azure Cosmos DB

Victoria Holt discusses a new Azure product announcement:

Another database annoucment today from Microsoft. The first globally distributed, multi-model database service. Microsoft describe Azure Cosmos DB as containing a write optimized, resource governed, schema-agnostic database engine that natively supports multiple data models: key-value, documents, graphs, and columnar.

The troll in me really wants this to be SQL Server, but it is instead a very much beefed-up DocumentDB.

Comments closed

Continuous Disintegration

Alex Yates gets on his soapbox about continuous integration:

Everyone does CI wrong!

(OK, perhaps not everyone, but a lot of people.)

Whenever I deliver a conference session about database continuous integration (CI), I like to start by asking a question to the audience. “Who can tell me what continuous integration means?”

I almost always get responses like:

“Automated deployments!”

“Automated builds upon commit!”

Very occasionally someone will impress me with something like:

“Unit tests!” or “Automatically running my unit tests!”

Not bad answers. Have a biscuit. But you are still missing the fundamental point.

Alex makes a number of great points, so check it out.

Comments closed

Narrative Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Narrative Power BI Custom Visual.  The Narrative visual is developed by Narrative Science and it gives you the ability to automatically deliver analysis of your data.  The results look similar to a final report that a analyst might provide after spending weeks with your data.

It’s not going to replace a seasoned analyst (or any analyst at all, frankly) but if you need a couple paragraphs of text summing up a trend, it’s a good start.

Comments closed

The DBA Role Isn’t Dead

Robert Davis discusses shipping database changes and the role of the DBA:

I don’t know what percentage of people out there are doing DevOps, but I’m going to go out on a limb and say that it is most likely some number LESS than MOST of them. I don’t think more than half the companies or people out there that do Ops are doing DevOps. I also believe that DBAs that make really good money aren’t making it because DBAs are rare. They are making it because DBA is a tough job to be really good at and the ones who are really good at it are rare. All DBAs are molded by the environments they work in, but really good DBAs are ones that eventually learn to mold their environments to them.

A normal DBA may say something like, “I do it this way because that is the way it is done here.”

An exceptional DBA says, “We do it that way here because that is the way I have it done.”

Robert makes great points.  I have on my agenda to write a post entitled something like “The Cloud’s Not Going To Steal Our Jobs,” somewhat in the same spirit as this post.  Definitely a must-read.

Comments closed

Baby Steps With Deployments

Riley Major looks at small things you can do to help smooth out deployment processes:

It’s not just about exercising restraint, though. It’s also about taking small additional steps to minimize the impact of new development. If you are making a new stored procedure parameter, can you provide a default so that its omission will make the procedure act as before? If so, you can now deploy that change without affecting any consuming code. If you are changing the name of a column, can you create a computed column with the old name which simply mirrors the original column’s data? All insert- and update-related code would still need to change, but you could save the SELECT statements for the next round.

Ideally, these steps are transitory. Optional parameters can be made mandatory once all of the consuming procedures have been adapted. The legacy-named computed columns should be deleted once all old queries are updated. But recognize that this might not happen. Other priorities come up, and these supposedly temporary constructs can linger.

Click through for additional hints.

Comments closed

Managing Federated Systems

Aaron Bertrand tells how he would work with a federated system back in the SQL Server 2005 days, and ties it back to modern database deployment practices:

The biggest concern I have about database deployment is not about how you deploy, or even whether or not you use source control. It’s that your database changes are backward compatible – meaning they won’t break the current application, in the event the application can’t be changed at the same time (and with distributed software, that’s impossible). The largest number of bugs I had a hand in creating were caused because I assumed the application or middle tiers would be deployed at the same time.

I quickly became very motivated to make sure my changes would work before and after the application tier changes were deployed. Add a parameter to a stored procedure? Make sure it’s at the end, and has a default. Add a column to a table? Make sure views and stored procedures don’t expose it (yet) or require it. And a hundred other examples.

It’s a good read.

Comments closed