Curated SQL Posts

Bash For The PowerShell-Minded

Mark Wilkinson has started a new series on Bash.  His first post is an introduction to the scripting language:

Bash (the Bourne Again Shell) was created in 1989 for the GNU Project as a free replacement for the Unix Bourne shell. Most modern Linux systems use Bash as their default command line shell, so if you have ever dropped to a command line on a Linux system, you have probably used Bash. Just like PowerShell, Bash is both a scripting language and a command shell/interpreter. So not only can you execute commands in an interactive shell session, but you can also write scripts that incorporate multiple commands.

Once you get your hands dirty with Bash you’ll notice a lot of features that were incorporated into PowerShell. Command substitution, for example: PowerShell’s $(Get-Date) was pulled directly from Bash’s $(date). Other features will look familiar as well, like the ability to pipe multiple commands together.

One thing you need to understand right away is that Bash is string based, not object based like PowerShell. This means you’ll find yourself doing a lot more string processing to get tasks done. Things like string splitting will be much more common. Bash does support objects, like arrays, but few if any commands output an array. As we go through this series you’ll see that this might not be as limiting as it sounds.

The best part about learning Bash is that you can then get into arguments about Bash vs ksh vs zsh.
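
If you want to play with the string-versus-object distinction without leaving Python, here is a minimal sketch using the standard library's subprocess module (assuming a Unix-like system where the date and df commands exist): capturing a command's output is the moral equivalent of $(date), and getting individual values back out means splitting strings.

import subprocess

# Equivalent of Bash's today=$(date): capture the command's stdout as one string.
today = subprocess.run(["date"], capture_output=True, text=True).stdout.strip()
print(today)

# Because the output is plain text rather than objects, pulling out individual
# fields means string processing, such as splitting each line on whitespace.
df_output = subprocess.run(["df", "-h"], capture_output=True, text=True).stdout
for line in df_output.splitlines()[1:]:  # skip the header row
    filesystem, size, *rest = line.split()
    print(filesystem, size)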

Handling Permissions Changes With PowerShell

Drew Furgiuele has a process to store and then re-run rights grants on SQL Server databases:

Permission requirements for these environments can change over time, just like the code and data going into your databases. It’s hard to track permissions because a database permission is much more than just a user principal; database objects often contain permission definitions for GRANT and DENY states, and users may belong in certain database roles in one environment, but not another. This isn’t a big deal… until it is: sooner or later your data and code drift will be different than production, or maybe some new change really breaks an environment. Then, you’ll be asked to restore these environments to either an earlier version, or, more likely, you’ll be asked to “refresh” these editions to what is currently in production.

You probably already have a process for this, but how are you handling maintaining differences in permissions between environments? Wouldn’t it be nice if you had a way to quickly evaluate, store, and then re-apply permissions as part of refresh? Even better, wouldn’t it be cool if you could do this for all your databases on a given instance? Or what about all your instances in a given environment?

You can, and you can do it pretty easily with PowerShell.

My one problem with Drew’s otherwise-excellent post is that he approved far too many entry visas in the opening GIF.  100% deny, 0 problems.

Virtualize Data Or Move It?

James Serra contrasts data virtualization with traditional ETL moving data to a warehouse:

Data virtualization integrates data from disparate sources, locations and formats, without replicating or moving the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users.

Data movement is the process of extracting data from source systems and bringing it into the data warehouse and is commonly called ETL, which stands for extraction, transformation, and loading.

If you are building a data warehouse, should you move all the source data into the data warehouse, or should you create a virtualization layer on top of the source data and keep it where it is?

Read on for James’s thoughts.

Performance Problems With ADO.NET's AddWithValue

Dan Guzman inveighs against using AddWithValue in ADO.NET:

The nastiness with AddWithValue is that ADO.NET infers the parameter definition from the supplied object value. Parameters in SQL Server are inherently strongly-typed, including the SQL Server data type, length, precision, and scale. Types in .NET don’t always map precisely to SQL Server types, and are sometimes ambiguous, so AddWithValue has to make guesses about the intended parameter type.

The guesses AddWithValue makes can have huge implications when wrong because SQL Server uses well-defined data type precedence rules when expressions involve unlike data types; the value with the lower precedence is implicitly converted to the higher type. The implicit conversion itself isn’t particularly costly but is a major performance concern when it is the column value rather than the parameter value that must be converted, especially in a WHERE or JOIN clause predicate. The implicit column value conversion can prevent indexes on the column from being used with an index seek (i.e. a non-sargable expression), resulting in a full scan of every row in the table or index.

Read the whole thing.
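
The same kind of type inference bites in other client libraries too. As a rough parallel in Python's pyodbc (not Dan's ADO.NET examples; the DSN, the dbo.Orders table, and its indexed varchar(20) OrderCode column here are hypothetical), a Python string parameter is sent as nvarchar by default, so it is the column side of the predicate that gets converted:

import pyodbc

# Hypothetical DSN; adjust for your environment.
conn = pyodbc.connect("DSN=SqlServerDsn")
cur = conn.cursor()

# Python str parameters default to nvarchar. nvarchar outranks varchar in SQL Server's
# data type precedence, so the OrderCode column is implicitly converted, which can turn
# an index seek into a scan, the same effect as a bad AddWithValue guess.
cur.execute("SELECT OrderID FROM dbo.Orders WHERE OrderCode = ?", "ABC123")

# Declaring the parameter's SQL type and length up front keeps the predicate sargable.
cur.setinputsizes([(pyodbc.SQL_VARCHAR, 20, 0)])
cur.execute("SELECT OrderID FROM dbo.Orders WHERE OrderCode = ?", "ABC123")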

Dealing With Dates In R

Mathew McLean shows how to convert strings to dates using a couple well-known packages and introduces flipTime:

The package flipTime provides utilities for working with time series and date-time data. The package can be installed from GitHub using

require(devtools)
install_github("Displayr/flipTime")

I will discuss only two functions from the package in this post, AsDate() and AsDateTime(). These are used for the conversion of date and date-time strings, respectively. These functions build on the convenience and speed of the lubridate package. Furthermore, the flipTime functions provide additional functionality, making them easier to use. The functions are smart about identifying the proper format to use, so the user doesn’t need to specify the format(s) as inputs. At the same time, both AsDate() and AsDateTime() are careful not to convert any strings to dates when they are not formatted as dates. Additionally, they will warn the user when the date format is ambiguous.

Check it out.
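
For Python users, the closest analogue I know of to this "don't make me spell out the format" behavior is dateutil's parser, which likewise infers the format from the string itself (this is only an analogy, not flipTime):

from dateutil import parser

# dateutil works out the format from each string, much as AsDate() and AsDateTime()
# are described as doing, so no format string is required.
print(parser.parse("15 Jan 2018"))        # 2018-01-15 00:00:00
print(parser.parse("2018-01-15 09:30"))   # 2018-01-15 09:30:00

# Ambiguous day/month strings are resolved with a flag rather than a warning.
print(parser.parse("01/02/2018", dayfirst=True))  # 2018-02-01 00:00:00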

ARIMA In R

Subhasree Chatterjee shows us how to use R to implement an ARIMA model:

Once the data is ready and satisfies all the assumptions of modeling, to determine the order of the model to be fitted to the data, we need three variables: p, d, and q which are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively.

To examine which p and q values will be appropriate, we need to run the acf() and pacf() functions.

pacf() at lag k is the partial autocorrelation function, which describes the correlation between all data points that are exactly k steps apart, after accounting for their correlation with the data between those k steps. It helps to identify the number of autoregressive (AR) coefficients (the value of p) in an ARIMA model.

ARIMA feels like it should be too simple to work, but it does.
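
The walkthrough is in R, but the workflow translates almost one for one. Here is a minimal sketch in Python with statsmodels, assuming a CSV of sales data and a placeholder (1, 1, 1) order:

import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# Assumed input: a CSV with a date column and a single value column.
sales = pd.read_csv("sales.csv", index_col=0, parse_dates=True).squeeze("columns")

# ACF and PACF plots guide the choice of q and p, just like acf() and pacf() in R.
plot_acf(sales)
plot_pacf(sales)

# Fit an ARIMA(p, d, q) model; the order here is only a placeholder.
model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.summary())
print(model.forecast(steps=12))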

Logs Are For Parsing

Tim Wilde shares an oft-forgotten truth:

How often have you found yourself contemplating some harebrained regex scheme in order to extract an inkling of value from a string and wishing the data had just arrived in a well-structured package without all the textual fluff?

So why do we insist on writing prose in our logs? Take “Exception while processing order 1234 for customer abc123” for example. There are at least four important pieces of information drowning in that one sentence alone:

  1. An exception was raised!
  2. During order processing
  3. Order number 1234
  4. Customer abc123

Being an exception log message, it’s more than likely followed by a stack trace, too. And stack traces certainly don’t conform to carefully crafted log layout patterns.

Logging is something we tend to forget about and slap in at the last minute.  We also think about it from the viewpoint of a developer looking at a single error message.  Those are both mistakes that lead to a huge amount of extra work later.
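
To make the structured alternative concrete, here is a small sketch in Python using only the standard library: the same failure goes out as one JSON record, with the order and customer as real fields and the stack trace riding along as another field instead of being glued into a sentence.

import json
import logging

class JsonFormatter(logging.Formatter):
    # Render each record as a single JSON object instead of a prose sentence.
    def format(self, record):
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

try:
    raise ValueError("card declined")  # stand-in for the real failure
except ValueError:
    logger.error("order_processing_failed", exc_info=True,
                 extra={"fields": {"order_id": 1234, "customer_id": "abc123"}})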

KSQL 0.4 Released

Apurva Mehta announces the release of KSQL 0.4:

The SHOW TOPICS command has been enhanced to include the number of active consumers and also the number of active consumer groups which are reading the topics.

Consumer groups are a feature of Apache Kafka which enable multiple consumer processes to divide the work of consuming a Kafka topic. You can learn more about them in the Kafka Consumer JavaDocs, and of course you should read the SHOW TOPICS documentation for more information.

Read on for the full set of changes.

Understanding Kafka Consumers And Offsets

Simarpreet Kaur Monga builds a simple Kafka consumer in Scala to demonstrate how offsets work:

The method endOffsets accepts a collection of TopicPartitions for which you want to find the end offsets.

As I want to find the endOffsets of the partitions that are assigned to my topic, I have passed the value of consumer.assignment() in the parameter of endOffsets. consumer.assignment gives the set of TopicPartitions that the Consumer has been assigned.

Note: You should call the method assignment only after calling poll on the consumer; otherwise, it will give null as the result. Additionally, the method endOffsets doesn’t change the position of the consumer, unlike seek methods, which do change the consumer position/offset.

Read the whole thing.
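
The post's examples are in Scala, but the same calls exist in Python's kafka-python client. A minimal sketch, assuming a broker on localhost and a topic named my-topic:

from kafka import KafkaConsumer

consumer = KafkaConsumer("my-topic",
                         bootstrap_servers="localhost:9092",
                         group_id="offset-demo",
                         enable_auto_commit=False)

# Poll first: until the group coordinator assigns partitions, assignment() is empty.
consumer.poll(timeout_ms=1000)

partitions = consumer.assignment()               # set of TopicPartition objects
end_offsets = consumer.end_offsets(partitions)   # does not move the consumer

for tp in partitions:
    print(tp.partition, "position:", consumer.position(tp),
          "end offset:", end_offsets[tp])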

Displaying Power BI Filter Values

Reid Havens shows how to display Power BI report filter values on the report itself:

The client and I brainstormed, and we decided to create card visuals to identify filter selections. The beautiful thing about DAX, as mentioned in many articles on our site, is that it can easily return text values. I actually did something similar to this with Dynamic Titles when I posted about Power BI’s new Drill Through feature last year; that article can be found here. So, I essentially wanted to do something similar here, but to call out the filter selection for each slicer. The end result looked something like this:

The end result was a new section of the report, dedicated to calling out slicer selections. The BIGGEST reason the client wanted this was for screenshots. They often took screenshots of this report and pasted them into emails or slides to use in presentations. The result works well, and uses a bit of clever DAX to always return the right selections, no matter the combination of selections among the slicers.

Read on to see a couple odd scenarios that Reid ran into and how to fix them.
