Press "Enter" to skip to content

Day: March 23, 2020

Cloudera Data Platform in Azure Marketplace

Ram Venkatesh announces the availability of Cloudera Data Platform in the Azure Marketplace:

Cloudera Data Platform (CDP) is now available on Microsoft Azure Marketplace – so joint customers can easily deploy the world’s first enterprise data cloud on Microsoft Azure.

Last week we announced the availability of Cloudera Data Platform (CDP) on Azure Marketplace. CDP is an integrated data platform that is easy to secure, manage, and deploy. With its availability on the Azure Marketplace, joint customers of Cloudera and Microsoft will be able to easily discover and provision CDP Public Cloud across all Azure regions. Additionally, by procuring CDP through the Azure Marketplace, these customers can leverage integrated billing i.e. the cost of CDP will be part of a single Azure bill making procurement simple and friction-free.

The new Cloudera’s approach has been cloud-first to the point of being cloud-only. That’s an interesting shift, considering it came out of the merger of two on-prem companies.


Color Palettes in R

Paul van der Laken talks to us about paletteer:

I often cover tools to pick color palettes on my website (e.g. here, here, or here) and also host a comprehensive list of color packages in my R programming resources overview.

However, paletteer is by far my favorite package for customizing your colors in R!

The paletteer package offers direct access to 1759 color palettes, from 50 different packages!

Just make sure to run your graphics through something like Coblis afterward to ensure that they’re CVD-friendly. H/T R-Bloggers.


Illogical Errors and Implicit Conversion

Aaron Bertrand takes us through a problem with seemingly indeterminate query errors:

I’ve talked about illogical errors before. In several answers on Database Administrators (one, two, three), I show how you can use a CASE expression or TRY_CONVERT to work around an error where a non-numeric value, that should have been filtered out by a join or other clause, still leads to a conversion error. Erland Sommarskog raised a Connect item over a decade ago, still unaddressed, called “SQL Server should not raise illogical errors.”

Recently we had a scenario where a query was failing on one server but not another. But this was slightly different; there were no numerics involved. Imagine this scenario: a source table has a column that is varchar(20). A query creates a table variable with a column that is varchar(10), and inserts rows from the source table, with a filter in place that only exposes values that are 10 characters or less.

In a lot of cases, of course, this scenario is perfectly fine, and everything works as expected.

Read the whole thing. There is a method to the madness, and Aaron explains how it can come up in some cases but not others.
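
To make the setup concrete, here is a minimal T-SQL sketch of the kind of scenario Aaron describes; the object names and the LEN filter are my own invention for illustration, not his actual repro.

-- Hypothetical setup: a wider source column feeding a narrower table variable,
-- with a filter that logically keeps the too-long values out.
CREATE TABLE dbo.Source
(
    Val varchar(20) NOT NULL
);

INSERT dbo.Source (Val)
VALUES ('short'), ('exactly10!'), ('definitely too long');

DECLARE @Target table (Val varchar(10) NOT NULL);

-- Only rows of 10 characters or fewer should qualify, yet depending on the plan,
-- the conversion to varchar(10) may be checked before the filter is applied,
-- so the insert can fail with a truncation error on one server and succeed on another.
INSERT @Target (Val)
SELECT s.Val
FROM dbo.Source AS s
WHERE LEN(s.Val) <= 10;

Whether it actually fails depends on where the optimizer places the filter relative to the conversion, which is exactly the behavior Aaron digs into.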


Tuning a Query Searching for a Substring in Text

Eddy Djaja gives us two methods for improving performance of a search for a fixed substring:

The reason the substring function is used is that the column ACCOUNTDISPLAYVALUE has multiple values combined in one column. In this case, the query is searching for the Account Number, which is the first six characters. The long-running query is listed below:

set statistics io on
go
select sum(ACCOUNTINGCURRENCYAMOUNT)
from [d365].[GeneralJournalAccountMultiCompanyEntries]
where substring([ACCOUNTDISPLAYVALUE], 1, 6) = '877601'

Eddy gives us two solutions. As a quick note, these solutions work because the query is looking for a specific stretch of characters after a specific starting point. For arbitrary text, things get a little trickier.
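
As a general note, and not necessarily one of Eddy's two solutions: because the match is anchored at the first character, the predicate can be rewritten as a LIKE with a trailing wildcard, which is effectively equivalent and lets an index on the column (assuming one exists) seek on the prefix instead of scanning:

-- sargable rewrite of substring([ACCOUNTDISPLAYVALUE], 1, 6) = '877601'
select sum(ACCOUNTINGCURRENCYAMOUNT)
from [d365].[GeneralJournalAccountMultiCompanyEntries]
where [ACCOUNTDISPLAYVALUE] like '877601%'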


An Example of Complex CSV Rule Parsing with Power Query

Cedric Charlier shows off some of the benefits of Power Query with a fairly complicated set of rules:

At the beginning, some of us thought that it would be easy to fix these issues by returning to the data quality team and asking them to fix these issues, but it was not so easy. Identifying the rules needing a fix would be a huge task (the CSV files are not created if the test is successful, making it impossible to address this issue in one run, among other impediments). I took the decision to go over this issue with the implementation of the following heuristic:

– if the CSV has a column DateTime then we’ll use it
– if the header is empty or no column is named DateTime then use the first column
– if the content of the selected column is not a date then try to parse it as the inner content of a JSON element.

Read on to see how.
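
Purely to illustrate the shape of that fall-back chain, here is a loose T-SQL analogue using cascading TRY_CONVERT checks; the real implementation is Power Query M, and the staging table, column names, and JSON property name below are invented assumptions.

-- Hypothetical staging table with the raw CSV fields loaded as text.
select
    case
        when try_convert(datetime2, r.DateTimeColumn) is not null
            then try_convert(datetime2, r.DateTimeColumn)      -- 1) the DateTime column parses: use it
        when try_convert(datetime2, r.FirstColumn) is not null
            then try_convert(datetime2, r.FirstColumn)         -- 2) otherwise fall back to the first column
        when isjson(r.FirstColumn) = 1
            then try_convert(datetime2, json_value(r.FirstColumn, '$.DateTime'))  -- 3) otherwise try it as JSON
    end as EventDateTime
from dbo.StagedCsvRows as r;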
