Press "Enter" to skip to content

Curated SQL Posts

Building A Gantt Chart With ggplot2

Sebastian Sauer shows us how to build a gantt chart in R:

Of importance are only TaskPrevious_Evnet and Duration. In addition, we need an overall start date (“2019-03-01” in this case). Each subsequent task is assumed to follow neatly its predecessing event.

Our job is to compute the start date and end date of task given that we know the initial start date and the durations. As said, this procedure is based on the assumption that there is a frictionless and gapless sequence of tasks.

Read on for a code-heavy example.  I’ve always had a soft spot in my heart for gantt charts.

Comments closed

Configuring MSDTC

Ryan Adams gives us the skinny on the Microsoft Distributed Transaction Coordinator:

MSDTC configuration is not as straight forward as you might think.  It’s a different choice if you are using a local MSDTC, clustered MSDTC, on-premises, Azure, Failover Cluster, or Availability Group.  Every one of those variables leads you down a different path with different choices.  The goal of this article is to clarify those choices.

If you think you do not use the MSDTC, you better check again.  I frequently see folks using it that do not realize it.  The most common is Linked Servers.  You got it.  If you are using Linked Servers, you’re using the MSDTC.

You are no longer required to install an MSDTC with a SQL Failover Cluster starting in SQL Server 2008.  That does not mean you do not need an MSDTC configuration.  If you wanted to make your application database highly available, but your application did not use the MSDTC you were being required to install one anyway.  That limitation is what was removed, not that you simply didn’t need a clustered MSDTC.

Read the whole thing.

Comments closed

Using Wireshark

Julia Evans explains how she uses Wireshark:

Usually I use Wireshark to debug networking problems in production. My Wireshark workflow is:

  1. Capture packets with tcpdump (typically something like sudo tcpdump port 443 -w output.pcap
  2. scp the pcap file to my laptop (scp host:~/output.pcap .)
  3. Open the pcap file in Wireshark (wireshark output.pcap)

That’s pretty simple! But once you have a pcap file with a bunch of packets on your laptop, what do you do with it?

Wireshark is my go-to tool for diagnosing networking issues.

Comments closed

SQL Server 2008 (And R2) Nearing End Of Support

The SQL Server team reminds us that those SQL Server 2008 and 2008 R2 instances are nearing end of support:

SQL Server 2008 and 2008 R2 have had a tremendous run. But all good things come to an end, right? On July 9, 2019, Microsoft will end Extended Support, which means no more updates or support of any kind, potentially leaving you vulnerable to security and compliance issues.

The good news is, you still have plenty of time and options to avoid any heartburn caused by the technology “circle of life.” And we’ll lay out all of those options for you in a webinar on July 12.

Non-contrarian opinion:  get those old SQL Server instances updated.  Life is so much better on the 2012 and later branch.

Contrarian opinion:  core-based licensing had such a major impact on businesses that five years from now, we’ll still see more SQL Server 2008 and 2008 R2 in the wild than we see 2005 today.

Comments closed

Finding Dependencies On Tables

Lori Brown shows us a few methods for finding references to tables:

I use sys.sql_modules to check for references to strings in stored procedures. You might think that this would not be necessary since you can find that in sys.dm_sql_referencing_entities for those. However, what if you do a lot of stuff in stored procedures that uses dynamic SQL to create objects or other stored procs on the fly? sys.dm_sql_referencing_entities might miss those. For example, when I look in sys.sql_modules for all occurrences of Sales.Orders I find more.

This is necessarily an incomplete list, but at least it gets you started.

Comments closed

Package References And ssisUnit

Bartosz Ratajczyk shows us a few methods for setting package references in ssisUnit:

When you set the packages’ references in the ssisUnit tests you have four options for the source (StoragePath) of the package:

  • Filesystem – references the package in the filesystem – either within a project or standalone
  • MSDB – package stored in the msdb database
  • Package store – packages managed by Integration Services Service
  • SsisCatalog – references the package in the Integration Services Catalog

In this post, I will show you how to set the package reference (PackageRef) for each option.

Read on for an example of using each method.

Comments closed

The Hidden Performance Costs Of Collation Mismatch

Nate Johnson explains why you want collation consistency when joining tables together on varchar/nvarchar columns:

There’s a subtle difference here, vs. those many community blog posts, which I’ll repeat.  The columns are of the same type.  Just different collations.

And when the collation on the join predicates is different, bad things happen. Let’s take CustomerNumber for example. On the ERP side, it’s a nvarchar(20) collate Latin1_General_100_CI_AS. On the internal & web apps side, it’s a varchar(20) collateSQL_Latin1_General_CP1_CI_AS. As you might imagine, this is a prime field for joining because it’s the main customer identified throughout all the systems.

Let’s be clear here. This is a numeric value only. Did it need to support Unicode? Absolutely not. Should it have been an int or bigint? Probably. But did The ERP designers choose to make it Unicode string anyway? Yep.

Read on to see how Nate tries to dig himself out of this hole.

Comments closed

Ingesting Google Analytics Data Into Kafka With Python

Bill Ward shows us an example using Python of reading Google Analytics data into a Kafka topic:

Google Analytics is a very powerful platform for monitoring your web site’s metrics including top pages, visitors, bounce rate, etc. As more and more businesses start using Big Data processes, the need to compile as much data as possible becomes more advantageous. The more data you have available, the more options you have to analyze that data and produce some very interesting results that can help you shape your business.

This article assumes that you already have a running Kafka cluster. If you don’t, then please follow my article Kafka Tutorial for Fast Data Architecture to get a cluster up and running. You will also need to have a topic already created for publishing the Google Analytics metrics to. The aforementioned article covers this procedure as well. I created a topic called admintome-ga-pages since we will be collection Google Analytics (ga) metrics about my blog’s pages.

In this article, I will walk you through how to pull metrics data from Google Analytics for your site then take that data and push it to your Kafka Cluster. In a later article, we will cover how to take that data that is consumed in Kafka and analyze it to give us some meaningful business data.

This is a good tutorial and I’m looking forward to part two.

Comments closed

Machine Learning With F#

Diogo Souza gives us an introduction to using Accord.NET in F#:

F# is a scripting as well as a REPL language. REPL comes from Read-Eval-Print Loop, which means that the language processes single steps one at a time like reading the user inputs (usually expressions), evaluating their values and, in the end, returning the result to the same user. All that happens in a loop until the loop ends. Visual Studio provides a great F# Interactive view that runs the scripts in REPL mode and shows the results. Take the following Hello World example:

This code just creates a single variable (let keyword) and assigns a string value to it. When you run this code (select all the code text and press Alt + Enter), you’ll see the following result in the F# Interactive window (Figure 5):

You can also use C# with Accord.NET, but there’s a strong bias toward F# among people in the .NET space who work with ML, for the same reason that there’s a bias toward Scala over Java for Spark developers:  the functional programming paradigm works extremely well with mathematical concepts.  Also, in addition to Accord.NET, you might also want to check out Math.NET.  My experience has been that this package tends to be a bit faster than Accord.

Comments closed

Constrained Optimization In Python: pyomo

Jeff Schecter introduces us to pyomo, a Python package for constrained optimization problems:

Constrained optimization is a tool for minimizing or maximizing some objective, subject to constraints. For example, we may want to build new warehouses that minimize the average cost of shipping to our clients, constrained by our budget for building and operating those warehouses. Or, we might want to purchase an assortment of merchandise that maximizes expected revenue, limited by a minimum number of different items to stock in each department and our manufacturers’ minimum order sizes.

Here’s the catch: all objectives and constraints must be linear or quadratic functions of the model’s fixed inputs (parameters, in the lingo) and free variables.

Constraints are limited to equalities and non-strict inequalities. (Re-writing strict inequalities in these terms can require some algebraic gymnastics.) Conventionally, all terms including free variables live on the lefthand side of the equality or inequality, leaving only constants and fixed parameters on the righthand side.

To build your model, you must first formalize your objective function and constraints. Once you’ve expressed these terms mathematically, it’s easy to turn the math into code and let pyomo find the optimal solution.

I haven’t touched it in a decade, but I did have some success with LINGO for solving the same type of problem.

Comments closed