Kevin Feasel – Page 826

The definition of a Date is very simple: It’s a combination of the year, month and day fields, like (year=2012, month=12, day=31). However, the values of the year, month and day fields have constraints, so that the date value is a valid day in the real world. For example, the value of month must be from 1 to 12, the value of day must be from 1 to 28/29/30/31 (depending on the year and month), and so on.
These constraints are defined by one of many possible calendars. Some of them are only used in specific regions, like the Lunar calendar. Some of them are only used in history, like the Julian calendar. At this point, the Gregorian calendar is the de facto international standard and is used almost everywhere in the world for civil purposes. It was introduced in 1582 and is extended to support dates before 1582 as well. This extended calendar is called the Proleptic Gregorian calendar.
Starting from version 3.0, Spark uses the Proleptic Gregorian calendar, which is already being used by other data systems like pandas, R and Apache Arrow. Before Spark 3.0, it used a combination of the Julian and Gregorian calendar: For dates before 1582, the Julian calendar was used, for dates after 1582 the Gregorian calendar was used. This is inherited from the legacy java.sql.Date API, which was superseded in Java 8 by java.time.LocalDate, which uses the Proleptic Gregorian calendar as well.

Even in this three-paragraph snippet, you can already get a feeling for how complex working with dates can be. Then throw in the complexities of time and you get a detailed post full of good information.
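To make the field constraints concrete, here is a minimal Python sketch. It is not from the linked post: it leans on Python's `datetime.date`, which also assumes a Proleptic Gregorian calendar, and the helper name `is_valid_date` is my own.

```python
from datetime import date


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Return True if (year, month, day) is a real day in the
    Proleptic Gregorian calendar, as implemented by datetime.date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        # Raised when month is outside 1..12 or day is outside the
        # valid range for that particular year and month.
        return False


# (year=2012, month=12, day=31) from the excerpt is a valid date.
valid_example = is_valid_date(2012, 12, 31)      # True

# 2013 is not a leap year, so February has only 28 days.
invalid_leap_day = is_valid_date(2013, 2, 29)    # False

# 1500 is a leap year under Julian rules but not under Gregorian rules,
# so a Proleptic Gregorian calendar rejects this date.
julian_only_leap_day = is_valid_date(1500, 2, 29)  # False
```

The last case is where the calendar choice shows up: the same three fields can be valid under one calendar and invalid under another.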

Building an End-to-End Streaming App with Flink SQL

Published 2020-07-29 by Kevin Feasel

Jark Wu lays down the gauntlet:

Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.
In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real-time. All exercises in this blogpost are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation.

Read on for a demo using only bash and Flink SQL.

Futureproofing a Power BI Solution

Published 2020-07-29 by Kevin Feasel

Paul Turley begins a series on doing Power BI the right way:

The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.
Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.

That italicized part was important enough for me to call it out. Far too often we develop proofs of concept which work well enough for a demo, and then the next question is “Great, when will it be in production?”

Working with SQL Server Configuration Files

Published 2020-07-29 by Kevin Feasel

Jamie Wick takes us through an underrated part of the SQL Server installer:

The ability to use a parameter file (configurationfile.ini), for automating the installation of SQL Server, has been around for many years. However, each release of SQL Server has had different parameters that could be included in the file. Here are some directions on how to find or create a parameter file, along with the parameter values that are supported by each version of SQL Server.

I appreciate the fact that every installation of SQL Server generates one of these and even points it out to you as you go through the installer wizard. And Jamie has gone a step further by giving us an Excel spreadsheet with all of the available settings and their defaults.
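If you want to inspect one of these files from a script, a rough Python sketch might look like the following. The excerpt is hypothetical and heavily trimmed; `[OPTIONS]`, `ACTION`, `FEATURES` and `INSTANCENAME` are setup parameters I'd expect to see, but check Jamie's spreadsheet for what your version supports. The `read_options` helper is my own name.

```python
import configparser

# Hypothetical, trimmed-down excerpt of a ConfigurationFile.ini.
SAMPLE_CONFIG = """\
; SQL Server configuration file (illustrative excerpt only)
[OPTIONS]
ACTION="Install"
FEATURES=SQLENGINE
INSTANCENAME="MSSQLSERVER"
"""


def read_options(config_text: str) -> dict:
    """Parse the [OPTIONS] section into a dict, stripping the
    double quotes that surround many of the values."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep parameter names in their original upper case; by default
    # configparser lower-cases option names.
    parser.optionxform = str
    parser.read_string(config_text)
    return {name: value.strip('"') for name, value in parser["OPTIONS"].items()}


options = read_options(SAMPLE_CONFIG)
```

With the parameters in a dictionary, comparing the files generated by two different installs becomes a simple key-by-key diff.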

Simulating DATESERIAL in SQL Server

Published 2020-07-29 by Kevin Feasel

Madhivanan has nearly a dozen methods to replicate the functionality of the old DateSerial function in VB:

The DateSerial function accepts three parameter values (year, month and day) and returns a valid date value with the time set to midnight. SQL Server did not support this function until version 2012 was released.
There can be many ways to simulate this functionality. Here are 10 different ways

I’d be pretty upset if some of those were in my code base, but this shows that there are plenty of ways to get to the same result. Just, uh, don’t use all of them.
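For comparison outside of T-SQL, here is a small Python sketch of the same idea. It is not one of Madhivanan's methods. The name `date_serial` is mine, and the handling of out-of-range months and days (rolling over into the next period, which is how I understand VB's DateSerial to behave) is an assumption for illustration.

```python
from datetime import datetime, timedelta


def date_serial(year: int, month: int, day: int) -> datetime:
    """Build a datetime at midnight from year, month and day.

    Assumption: out-of-range months and days roll over instead of
    raising an error, so month 14 means February of the next year.
    """
    # Fold the month into a running count of months, then split it back
    # into a year and a zero-based month so overflow carries into the year.
    total_months = year * 12 + (month - 1)
    normalized_year, zero_based_month = divmod(total_months, 12)
    first_of_month = datetime(normalized_year, zero_based_month + 1, 1)
    # Adding (day - 1) days lets day overflow carry into the next month.
    return first_of_month + timedelta(days=day - 1)


new_years_eve = date_serial(2012, 12, 31)   # 2012-12-31 00:00:00
rolled_month = date_serial(2012, 14, 1)     # 2013-02-01 00:00:00
rolled_day = date_serial(2013, 2, 29)       # 2013-03-01 00:00:00
```

Starting from the first of the month and adding offsets is the same shape as stacking date-add calls on a base date, which keeps the time portion at midnight for free.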

Changing SQL Server Settings with dbatools

Published 2020-07-29 by Kevin Feasel

Mikey Bronowski walks through a slew of dbatools cmdlets which replicate behavior from SQL Server Management Studio:

The reasons to alter the database might be as many as different users. Some want to rename the old database, or maybe change the owner or recovery model after restoring the database. Enabling features like Query Store is also common. Most of this can be done from the Database Properties screen and with dbatools as well.

This is part of a whole series on dbatools which Mikey has been putting together, and to which you can find a link as you click through.

Transforming JSON to CSV: ADF vs Databricks

Published 2020-07-29 by Kevin Feasel

Rayis Imayev compares two methods of transforming a JSON-structured data set into a CSV:

There is a well known and broadly advertised message from Microsoft that Azure Data Factory (ADF) is a code-free environment to help you to create your data integration solutions – https://azure.microsoft.com/en-us/resources/videos/microsoft-azure-data-factory-code-free-cloud-data-integration-at-scale/. I agree and support this approach of using drag and drop visual UI to build and automate data pipelines without writing code. However, I’m also interested to try if I can recreate certain ADF operations by writing code, just out of my curiosity.

Rayis includes a link to the Azure Data Factory step-by-step demonstration and then kicks it up a notch with Databricks. Read on to see how the two compare.
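If you just want a feel for what the transformation involves, here is a plain-Python sketch using only the standard library. It is neither the ADF nor the Databricks approach from the post. The sample data, the `flatten` and `json_to_csv` helpers, and the dotted column names are all my own assumptions.

```python
import csv
import io
import json

# Hypothetical sample input: a JSON array of nested objects.
SAMPLE_JSON = """
[
  {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}},
  {"id": 2, "name": {"first": "Alan", "last": "Turing"}}
]
"""


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text with a header row."""
    rows = [flatten(record) for record in json.loads(json_text)]
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = json_to_csv(SAMPLE_JSON)
```

Nested arrays are left untouched here; deciding whether to explode them into extra rows or serialize them into one column is the kind of choice both tools in the post have to make as well.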

Lessons Learned from Migrating to SQL Server 2017 with Availability Groups

Published 2020-07-29 by Kevin Feasel

Lee Markum has an after-action report:

In late 2019, a project that had been in progress for well over a year finally came to a conclusion. I had collaborated with a number of people to migrate a stand alone SQL Server 2008 and two stand alone SQL Server 2008 R2 instances. Each stand alone SQL Server was migrated to a three node Availability Group. Here are a few things learned along the way.

Click through for some good advice.

Slopegraphs in Action

Published 2020-07-28 by Kevin Feasel

Mike Cisneros takes us through slopegraphs:

When we use slopegraphs, we are trying to show one specific thing: is the value in the first column higher, lower, or the same as the value in the second column? That change is easy to see when we connect those values with lines, because the lines will slope up or down, in the direction of the change.
The steeper the slope, the bigger the change; and, if one thing is going up more dramatically than its neighbors, a slopegraph will make that easier to see than a traditional line graph would.

Mike has some examples of where slopegraphs make sense as well as cases when they don’t.

Real-World Sentiment Analysis Examples

Published 2020-07-28 by Kevin Feasel

Ines Roldos shares a few examples of sentiment analysis:

Net Promoter Score (NPS) surveys are one of the most common ways of knowing how customers perceive a product or service. Basically, they consist of two stages: first, you ask a customer to score a business from 0 to 10, then you ask them to give reasons for the score they leave with an open-ended question.
When it comes to processing the results, the first stage is easy: you just have to calculate the average score. But when it comes to analyzing tons of open-ended NPS responses, the analysis becomes more complicated. Imagine if your team had to tag hundreds of responses manually. Not only would it be a tedious and time-consuming task, it may also lead to inconsistent results derived from different criteria during the tagging process.
Fortunately, sentiment analysis enables you to process large volumes of NPS responses and obtain consistent results in a very fast and simple way.

It might just be the industry I’m in, but I don’t really get excited about sentiment analysis. Still, don’t let my biases influence your thought process too much.
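For a sense of the two stages, here is a toy Python sketch. It follows the excerpt in averaging the scores for stage one, and it uses a tiny word-list tagger for stage two. The word lists and survey responses are made up, and a real sentiment model would be far more capable than counting words.

```python
from statistics import mean

# Hypothetical mini-lexicon, for illustration only.
POSITIVE_WORDS = {"great", "love", "fast", "helpful", "easy"}
NEGATIVE_WORDS = {"slow", "broken", "confusing", "expensive", "rude"}


def tag_sentiment(response: str) -> str:
    """Tag a response by counting positive and negative lexicon words."""
    words = [word.strip(".,!?;:").lower() for word in response.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up survey results: (0-10 score, open-ended reason).
survey = [
    (9, "Great product, support was fast and helpful."),
    (3, "Checkout is slow and the pricing page is confusing."),
    (7, "It does the job."),
]

# Stage one: the average score, as described in the excerpt.
average_score = mean(score for score, _ in survey)

# Stage two: apply the same tagging rule to every response.
tags = [tag_sentiment(text) for _, text in survey]
```

The consistency argument from the excerpt shows up even in this toy version: every response is tagged by the same rule, whatever you think of the rule itself.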

Author: Kevin Feasel

Dates and Timestamps in Spark 3.0
