Press "Enter" to skip to content

Month: March 2020

Options for SQL Server in Azure

Josh Darnell looks at the options for running SQL Server in Azure:

Things are always changing the cloud, but as it stands in early 2020, these are the main options for SQL Server in Azure:

– SQL Virtual Machines
– Managed Instances
– Databases (this one really has a number of sub-options, like serverless, hyperscale, and elastic pools)

I’m going to touch on each of these options, why you might choose them, and some of my personal takeaways from the talk as well.

Read on for the summary of each. Each one fills a different set of needs and has a different set of available options.

Comments closed

Coding Tips for a Younger Self

Parvinder Nijjar has some good tips for less-experienced developers:

Always break a big problem into a series of small problems. Sometimes a big problem can feel daunting and be more difficult than it seems. Break the problem down into smaller blocks and then solve each block one at a time. It’s okay to skip a block and return to a previous block as it may give you clues to fixing the skipped block. In all honesty whilst coding the parts you can do a piece of divine inspiration will enable you to problem you couldn’t solve or you will enough code to show one of your more capable colleagues that you had tried to fix the issue and now this rubbish was there problem to fix.

I am basically at the point of self-parody in my love of functional programming languages, but this is one of those things FP really reinforces: bottom-up development and chaining together lots of little pieces to create big pieces. But there is good advice in here no matter what language you use.

Comments closed

Fun with Python: Calculating Pi

Jon Fletcher implements a method of estimating the value of Pi:

This series converges to Pi, the more terms that are added to the series, the closer the value is to Pi.
For the proof on why this series converges to Pi – https://proofwiki.org/wiki/Leibniz’s_Formula_for_Pi
There are several points to note about the series:

– It’s infinite, we need to find a way to continue adding term after term.
– The denominator of the fraction increases by 2 every term.
– The terms alternate between positive and negative.

Click through for the implementation of the formula in Python. And what you should do if you really need to reference Pi in your Python code.

Comments closed

Azure Data Studio March 2020 Release

Alan Yu announces the March 2020 release of Azure Data Studio:

Now you can add visualizations using a T-SQL query. In addition, as the gif illustrates, you can also customize your visualization whether it is a scatter or time series graph.

You can also copy your visualization or save the image so that you can quickly add this in an email or report to other team members.

We will continue to bring improvements to charting over the next few months.

They’ve put a lot of time and effort into notebooks. They’re still missing some of the quality of life improvements I want to see before moving to them full-time, but they’re consistently getting better.

Comments closed

Improving Queries at the Margin

Jared Poche has a story about improving a query which is already pretty fast:

In my last post, I spoke about optimizing a procedure that was being executed hundreds of millions of times per day, and yes, that is expected behavior.

The difficult thing about trying to optimize this procedure is that it only takes 2.5ms on average to run. Tuning this isn’t a matter of changing a scan to a seek; we’ll have to look hard to find the opportunities here. A one millisecond Improvement on a procedure running 100 million times a day would save 100,000 seconds every day.

Well, I’ve found a few more options since my last post, and wanted to share my findings.

Read on to see how Jared tries to tackle one specific case.

Comments closed

Cloudera Data Platform in Azure Marketplace

Ram Venkatesh announces availability for Cloudera Data Platform in the Azure Marketplace:

Cloudera Data Platform (CDP) is now available on Microsoft Azure Marketplace – so joint customers can easily deploy the world’s first enterprise data cloud on Microsoft Azure.

Last week we announced the availability of Cloudera Data Platform (CDP) on Azure Marketplace. CDP is an integrated data platform that is easy to secure, manage, and deploy. With its availability on the Azure Marketplace, joint customers of Cloudera and Microsoft will be able to easily discover and provision CDP Public Cloud across all Azure regions. Additionally, by procuring CDP through the Azure Marketplace, these customers can leverage integrated billing i.e. the cost of CDP will be part of a single Azure bill making procurement simple and friction-free.

The new Cloudera’s approach has been cloud-first to the point of being cloud-only. It’s an interesting shift from the merger of two on-prem companies.

Comments closed

Color Palettes in R

Paul van der Laken talks to us about paleteer:

I often cover tools to pick color palettes on my website (e.g. herehere, or here) and also host a comprehensive list of color packages in my R programming resources overview.

However, paletteer is by far my favorite package for customizing your colors in R!

The paletteer package offers direct access to 1759 color palettes, from 50 different packages!

Just make sure to run your graphics through something like Coblis afterward to ensure that they’re CVD-friendly. H/T R-Bloggers.

Comments closed

Illogical Errors and Implicit Conversion

Aaron Bertrand takes us through a problem with seemingly indeterminate query errors:

I’ve talked about illogical errors before. In several answers on Database Administrators (onetwothree), I show how you can use a CASE expression or TRY_CONVERT to work around an error where a non-numeric value, that should have been filtered out by a join or other clause, still leads to a conversion error. Erland Sommarskog raised a Connect item over a decade ago, still unaddressed, called “SQL Server should not raise illogical errors.”

Recently we had a scenario where a query was failing on one server but not another. But this was slightly different; there were no numerics involved. Imagine this scenario: a source table has a column that is varchar(20). A query creates a table variable with a column that is varchar(10), and inserts rows from the source table, with a filter in place that only exposes values that are 10 characters or less.

In a lot of cases, of course, this scenario is perfectly fine, and everything works as expected.

Read the whole thing. There is a method to the madness, and Aaron explains how it can come up in some cases but not others.

Comments closed

Tuning a Query Searching for a Substring in Text

Eddy Djaja gives us two methods for improving performance of a search for a fixed substring:

The reason substring function is used because the column ACCOUNTDISPLYVALUE has multiple values combined in one column. In this case, the query is searching for the Account Number which is the first six characters. The long running query is listed below:

set statistics io on
go
select sum(ACCOUNTINGCURRENCYAMOUNT)from [d365].[GeneralJournalAccountMultiCompanyEntries]where substring([ACCOUNTDISPLAYVALUE], 1, 6)  = '877601'

Eddy gives us two solutions. As a quick note, these solutions work because the query is looking for a specific stretch of characters after a specific starting point. For arbitrary text, things get a little trickier.

Comments closed

An Example of Complex CSV Rule Parsing with Power Query

Cedric Charlier shows off some of the benefit of Power Query with a fairly complicated set of rules:

At the beginning, some of us thought that it would be easy to fix these issues by returning to the data quality team and ask them to fix these issues but it was not so easy. Identifing the rules needing a fix would be huge task (the CSV files are not created if the test is successful, maling it impossible to address this issue in one run and other impediments). I took the decision to go over this issue with the implementation of the following heuristic:

– if the CSV has a column DateTime then we’ll use it
– if the header is empty or no column is named DateTime then use the first column
– if the content of the selected column is not a date then try to parse it as the inner content of a JSON element.

Read on to see how.

Comments closed