
Author: Kevin Feasel

Alternatives To Temp Tables In SSIS

Tim Mitchell gives us a few methods for avoiding temp tables in SQL Server Integration Services:

While temp tables are a good option for in-flight data transformation, there are some unique challenges that arise when using temp tables in SSIS.

SQL Server Integration Services uses tight metadata binding for data flow operations. This means that when you connect to a relational database, flat file, or other structure in an SSIS data flow, the SSIS design-time and runtime tools will check those data connections to validate that they exist and that the metadata has not changed. This tight binding is by design, to avoid potential runtime issues arising from unexpected changes to the source or destination metadata.

Because of this metadata validation process, temp tables present a challenge to the SSIS data flow. Since temp tables exist only for the duration of the session(s) using them, it is likely that one of these tables created in a previous step in an SSIS package may not be present when validation needs to occur. During the design of the package (or even worse, when you execute the deployed package in a scheduled process), you could find yourself staring at an “object not found” error message.

It’s good to have alternatives, though there are times when you really just need a temp table.


Taking Screenshots With R

Abdul Majed Raja shows us how to take screenshots of webpages using R:

The webshot package provides one simple function, webshot(), that takes a webpage URL as its first argument and saves it under the file name given as its second argument. Note that the file name includes an extension such as ‘.jpg’, ‘.png’, or ‘.pdf’, which determines how the output file is rendered. Below is the basic structure of the function call:

library(webshot)

# webshot(url, filename.extension)
webshot("https://www.listendata.com/", "listendata.png")

If no folder path is specified along with the file name, the file is saved to the current working directory, which can be checked with getwd().

Now that we understand the basics of the webshot() function, it is time to begin with our cases, starting with downloading/converting a webpage as a PDF copy.

This isn’t something I’d expect to do every day, but I could see it being useful as part of a notebook to give the user a sanity check, like if a webpage or data set has a last updated timestamp that you want to check.  H/T R-Bloggers


Native Scoring With SQL Server 2017 R Services

Tomaz Kastrun gives us an example using native scoring in SQL Server 2017 Machine Learning Services:

Native scoring in SQL Server 2017 comes with a couple of limitations, but also with a lot of benefits. Limitations are:

  • currently supports only SQL Server 2017 and the Windows platform

  • trained model should not exceed 100 MiB in size

  • Native scoring with the PREDICT function supports only the following algorithms from the RevoScaleR library:

    • rxLinMod (linear model as linear regression)

    • rxLogit (logistic regression)

    • rxBTrees (Parallel external memory algorithm for Stochastic Gradient Boosted Decision Trees)

    • rxDtree (External memory algorithm for Classification and Regression Trees)

    • rxDForest (External memory algorithm for Classification and Regression Decision Trees)
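
The limits above amount to a simple eligibility check. Here is a minimal Python sketch of that check, assuming you already know the model's algorithm name and its serialized size in bytes; the function name native_scoring_blockers and the message wording are made up for illustration and are not part of SQL Server or RevoScaleR.

```python
# Algorithms the excerpt lists as supported by native scoring with PREDICT.
SUPPORTED_ALGORITHMS = {"rxLinMod", "rxLogit", "rxBTrees", "rxDtree", "rxDForest"}

# The excerpt says the trained model should not exceed 100 MiB.
MAX_MODEL_BYTES = 100 * 1024 * 1024


def native_scoring_blockers(algorithm: str, model_bytes: int) -> list[str]:
    """Return the reasons (hypothetical wording) a model would not qualify."""
    blockers = []
    if algorithm not in SUPPORTED_ALGORITHMS:
        blockers.append(f"{algorithm} is not in the supported algorithm list")
    if model_bytes > MAX_MODEL_BYTES:
        blockers.append("serialized model exceeds 100 MiB")
    return blockers


# A small logistic regression model passes both checks.
small_logit = native_scoring_blockers("rxLogit", 2 * 1024 * 1024)

# A neural network (not in the list) that is also too large fails both.
big_other = native_scoring_blockers("rxNeuralNet", 150 * 1024 * 1024)
```

An empty list means neither limit from the excerpt applies; the platform restriction (SQL Server 2017 on Windows) is not something this sketch can check.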

Read on for an example.  If you’re using one of these methods, then native scoring is extremely fast and a bit more flexible than I originally anticipated.  The problem is that you have to use one of those methods.


WVPlots 1.0.0

John Mount announces WVPlots 1.0.0:

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN!

The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions. This is a package to produce plots while you are in the middle of another task.

I like this idea:  I know the kind of plot I need and just want to throw something together for myself to give me an idea of the underlying data.


Row Width And Snapshot Isolation

Kendra Little shows us the impact that row width has on snapshot isolation:

So I went to work to demonstrate row width impact on the version store — when only a tiny bit column is changed in the row.

Here’s how I did the test:

  • I created two tables, dbo.Narrow and dbo.Wide. They each have a bit column named bitsy, along with some other columns.
  • I inserted one row in each table, but I put a lot more data into the row in dbo.Wide.
  • I allowed snapshot isolation on the database
  • I began a transaction in another session under snapshot isolation and left the transaction open (so version store cleanup wouldn’t kick in while I looked around)
  • I updated the bit column named bitsy for the single row in each table, thereby generating a row-version in tempdb for each table

The code I ran to test this is here, if you’d like to play around with it.

Read on for the results.


Tic-Tac-Toe In T-SQL

Riley Major implements Tic-Tac-Toe in T-SQL:

It turns out there’s a concept called bitmasking which can work a lot like this cardboard cut-out process. (Props to Dylan Beattie for his quick visual demonstration at NDC Minnesota which drove this point home.) First, you represent your game state with a bunch of bits (“OXOOOXXXX” yields “0100011110” for our example above, remembering that we’re padding that last 0 just to make the powers 1-based instead of 0-based) and then you represent your winning state with a bunch of bits (“0000001110” for our example winning state here). Now you use the magic of “bitwise math” to compare the two.

For our use, we want to find out whether our mask exposes the winning three bits. We want to block everything else out. With bits, to check if both items are true, you use “AND” (0 and 0 is 0; 0 and 1 is 0; 1 and 1 is 1). If we apply that “AND” concept to each bit in our game, it will squash out any values which don’t match. If what we have left matches the mask (fills in all of the space we can see through), then we have a match and a win.
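
The comparison described above can be sketched in a few lines. This is a minimal Python illustration of the bitmask idea, not Riley's T-SQL; the helper names to_bits and is_win are invented for the example, and it only tracks X's squares, padding a trailing 0 as the excerpt does.

```python
def to_bits(board: str) -> int:
    """Encode a 9-character board as an integer: 'X' becomes 1, anything else 0.

    A trailing 0 is appended to mirror the padding described in the excerpt.
    """
    bits = "".join("1" if cell == "X" else "0" for cell in board) + "0"
    return int(bits, 2)


def is_win(state: int, mask: int) -> bool:
    """AND the state with the mask; a win leaves every bit of the mask set."""
    return state & mask == mask


# Game state from the excerpt: "OXOOOXXXX" encodes to 0100011110.
state = to_bits("OXOOOXXXX")

# Winning mask from the excerpt (the last three squares), plus a top-row mask.
bottom_row = int("0000001110", 2)
top_row = int("1110000000", 2)

print(is_win(state, bottom_row))  # X holds all three masked squares
print(is_win(state, top_row))     # X holds only one of these squares
```

The AND clears every bit outside the mask, so comparing the result to the mask itself asks whether all three squares that show through are filled.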

The twist in all of this is that the end result doesn’t quite work as expected, but it was interesting watching the process.  That said, there’s a good reason why we don’t use T-SQL as a primary language for development…


Why You Should Read Gartner Critical Capabilities Reports

Jen Underwood explains the value behind Gartner Critical Capabilities reports, specifically the one for analytics and BI platforms:

Notably, the three Magic Quadrant Leaders except Tableau were ranked near the middle in all use cases. MicroStrategy, Birst, Sisense, TIBCO, Yellowfin, Salesforce, SAS and a few other players excelled above the rest with high scores on this report. These results are a bit refreshing to see. Gartner Critical Capabilities scores seem to better align with Forrester’s rankings of Analytics and Business Intelligence Platforms and also my own understanding of several top offerings. I admit that I was surprised by these results. I was rarely – if ever – asked about several of the top scoring vendors over the past three years.

Read the whole thing, and then read the report.


Azure Data Factory V2 Pricing

Chris Seferlis gives us the details on how Azure Data Factory V2 pricing works:

2. Volume of data moved – this is measured in DMUs (data movement units). This is one you should be aware of, as it defaults to auto, which basically uses all the DMUs it can, and it is paid for by the hour. Let’s say you specify and use 2 DMUs and it takes an hour to move that data. Alternatively, you could use 8 DMUs and it takes 15 minutes; the price is going to end up the same. You’re using 4X the DMUs but it’s happening in a quarter of the time.

This is good to look at and do some comparisons, since how many DMUs you’re using is where the bulk of your spend is going to be.
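
The arithmetic in that example is easy to check. Below is a minimal Python sketch, assuming data movement is billed purely as DMUs multiplied by elapsed hours; the rate used is a placeholder and not an actual Azure price, and the function name dmu_hours is invented for the example.

```python
# Placeholder rate per DMU-hour; an assumption for illustration only.
HYPOTHETICAL_RATE_PER_DMU_HOUR = 0.25


def dmu_hours(dmus: int, minutes: float) -> float:
    """Billable DMU-hours for a copy using `dmus` units for `minutes` minutes."""
    return dmus * minutes / 60


# The two scenarios from the excerpt.
slow = dmu_hours(dmus=2, minutes=60)   # 2 DMUs for one hour
fast = dmu_hours(dmus=8, minutes=15)   # 8 DMUs for a quarter of an hour

slow_cost = slow * HYPOTHETICAL_RATE_PER_DMU_HOUR
fast_cost = fast * HYPOTHETICAL_RATE_PER_DMU_HOUR

print(slow, fast)            # both scenarios consume 2.0 DMU-hours
print(slow_cost, fast_cost)  # so they cost the same at any rate
```

The costs only match when the copy actually scales linearly with DMUs; if 8 DMUs took 20 minutes rather than 15, the faster run would also be the more expensive one.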

There are a few moving parts here, so the calculation is not trivial.  But Chris makes good sense of it all.


Power BI Color Palettes

Meagan Longoria helps us choose a color palette for Power BI reports:

A color palette is simply a collection of colors applied to the visual elements in your report. What we typically refer to as color is a combination of three main properties: hue (base color on the color wheel), intensity (brightness or gray-ness) and value (lightness or darkness). You can build an engaging and professional looking report with just 6 colors. It’s possible to have fewer colors or more colors, but 6 should cover many common visualization needs. If you are using more than 6 colors, you might want to check that you are optimizing engagement and cognitive load.

  1. Main color – default color on graphs

  2. Color 2 – used when multiple colors are needed in a graph or report

  3. Color 3 – used when multiple colors are needed in a graph or report and Color 2 has already been used

  4. Highlight color – a color used to highlight important data points to make them stand out from other points on the page

  5. Border color – a light color used for borders on tables and KPIs where necessary

  6. Title color – color used for visual titles and axis labels as appropriate

There’s a lot of good advice in here.


Finding Procedure Parameters Which Don’t Match Column Names

Shane O’Neill has a process to update procedures to make input parameter names match output column names:

I was asked to standardise stored procedures we use for common support cases before we hand them over to IT Helpdesk.

One of the comments that came back from the Helpdesk while testing was that the parameter names that they had to put values in for didn’t match what they saw in the application.

Luckily for me (or unluckily) the application was a third party developed one and they didn’t bother renaming the columns. So if the column is called create_date in the database then the application is going to show create_date.

However, if I created a parameter called DateCreated or even CreateDate, then they don’t want it.

Shane has a PowerShell script which uses the Find-DbaStoredProcedure command in dbatools; click through to see it in action.
