Press "Enter" to skip to content

Curated SQL Posts

Executing Azure SSIS Packages from Blob Storage

Andy Leonard cranks it to the next level:

I confess: I have been waiting for this feature since I first learned of Azure-SSIS.

When I first saw Azure-SSIS – which creates an Azure Data Factory Integration Runtime and SSIS Catalog in the cloud, my first thought was a paraphrase Ferris Bueller’s question about dying the river green: “If we can execute SSIS packages from the SSIS Catalog in Azure Data Factory, why can’t we execute SSIS packages from Azure Blob Storage?” Today, we can.

Read on to see how you can do it.

Comments closed

Column From Examples in Power Query

Matthew Roche shows off the “column from examples” feature in Power Query:

Here’s the quick overview:

1. In the Power Query editor in Power BI Desktop, choose “Column from Examples” from the “Add Column” tab.
2. Enter the values that the new column should have for rows that are already in your data set.
3. Review the values that Power Query is suggesting for the other rows, and when they are all correct, choose OK, and then say “Ooooooohhhhh” when Power Query does all the work for you.

It’s a little hard to see from Matthew’s image but this is a great feature when you have a regular pattern but don’t want to put together a regular expression yourself.

Comments closed

Pandas Multiindex and T-SQL

Tomaz Kastrun explains why you should never cross the streams:

1. SQL Server and Python Pandas Indexes are two different worlds and should not be mixed.
2. SQL Server uses Index primarily for DML operations and to keep data ACID.
3. Python Pandas uses Index and MultiIndex for keeping data dimensionality when performing data wrangling and statistical analysis.
4. SQL Server Index and Python Pandas Index don’t know about each other’s existence, meaning if user want to propagate the T-SQL index to Python Pandas (in order to minimize the impact of duplicates, missing values or to impose the relational model), it needs to be introduced and created, once data enters “in the python world”.

Read on for additional conclusions and the demos which bring us here.

Comments closed

CQL: Category Theory-Based Querying Language

John Cook looks at a querying language based on category theory:

My interest in category theory waxes and wanes, and just as it was was at its thinnest crescent phase I ran across CQL, categorical query language. I haven’t had time to look very far into it, but it seems promising. The site’s modest prose relative to the revolutionary rhetoric of some category enthusiasts makes me have more confidence that the authors may be on to something useful.

I’m going through some lectures on category theory now and am in a big functional programming phase, so this is interesting but I won’t be giving up SQL anytime soon for it.

Comments closed

Checkpoints Under Simple Recovery

Max Vernon shows us how checkpointing works when your database is in the simple recovery model:

Even though the transaction has been rolled back, the log records will not be cleared until a checkpoint occurs. An automatic checkpoint could be triggered by other ongoing transactions being written to the log, or a manual CHECKPOINT statement could be executed. However, for a database that is not seeing frequent transactions, the log may stay nearly full for an extended period of time. This scenario might be seen often during development where there are a very limited number of transactions being generated. 

Read the whole thing. Just because you’re in simple recovery mode doesn’t mean the transaction log becomes any less useful.

Comments closed

When Power BI Publish to Web is the Wrong Choice

Treb Gatte takes us through Publish to Web and the importance of getting the correct licensing for Power BI:

Some folks, either to avoid the need to buy a Microsoft Power BI license or in trying to embed Microsoft Power BI content in an On-Premises site like Microsoft SharePoint 2013, published their content using this function.

The risk is that if the content is on a page that gets indexed by a major search engine, like Google, the embed code will likely live in Google’s index forever. Then anyone can search for your data.

Read on to see the right way to do this. Treb also notes that there are good use cases for Publish to Web; you just have to make sure yours is one of them.

Comments closed

Unicode Escaping Across Various Languages

Solomon Rutzky shows how to perform Unicode character escaping in a dozen places:

The purpose of this post is to correct the overall lack of examples. Everything shown below are actual working examples of creating both a Unicode-only BMP character (meaning a non-Supplementary Character that would require Unicode) and a Supplementary Character. Most examples include a link to an online demo, either on db<>fiddle (for database demos) or IDE One (for non-database demos), both very cool and handy sites.

This includes T-SQL and MySQL as well as .NET languages, PHP, JavaScript, and even Excel. It’s a handy reference page.

Comments closed

Pre-Filtering Power BI Slicers

Matt Allington takes us through a new feature in Power BI:

Now back to the point of this post.  As of June 2019, it is possible to pre-filter slicers as well.  It may seem weird, but this previously wasn’t possible – it seemed weird to me, anyway.  This is now fixed and it is possible to use the side filter pane in the same way as other visuals.  I can think of quite a few useful scenarios, including:
– Hiding the dreaded (Blank) in a slicer.  [Actually, please don’t do that, but instead fix your data model].
– Filtering out items not relevant (eg category managers may only want to see their own products)
– Hiding items with no sales

Read on for a few examples of how to use this.

Comments closed

Random Forest on Small Numbers of Observations

Neil Saunders takes us through an interesting problem:

A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows – one per nutrient – and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient?

The random forest was not working as expected: not one of the nutrients was correctly classified. I pointed out that with only one case for each outcome, this was to be expected – as the random forest algorithm samples a proportion of the rows, no correct predictions are likely in this case. As sometimes happens the question was promptly deleted, which was unfortunate as we could have further explored the problem.

Neil decided to explore the problem further regardless and came to some interesting conclusions.

Comments closed

Monitoring Big Data Clusters

Mohammad Darab continues a series on Big Data Clusters:

There are many ways to view the health of your Big Data Cluster. As of CTP 3.0, there are kubectl commands, mssqlctl commands as well as dashboards. For the sake of this series, I will focus on the dashboards. I will blog about some of the useful kubectl and mssqlctl commands in later posts.

The first dashboard is the Microsoft Cluster Administration portal (see below snapshot). This is a view into the Big Data Cluster Controller. As you can see from the image below, the Overview pane shows the Controller, Master Instance and all the pools. On the left hand side you can see more details. If you click on the “Service Endpoint” option, you will see a list of endpoints that you can bookmark.

Something I appreciate is that Microsoft thought ahead on what the monitoring story should look like rather than waiting until the end and slapping something together.

Comments closed