Press "Enter" to skip to content

July 29, 2020

Dates and Timestamps in Spark 3.0

Maxim Gekk, et al., look at the different date and time data types in Apache Spark 3.0:

The definition of a Date is very simple: it’s a combination of the year, month, and day fields, like (year=2012, month=12, day=31). However, the values of the year, month, and day fields have constraints so that the date value is a valid day in the real world. For example, the value of month must be from 1 to 12, the value of day must be from 1 to 28/29/30/31 (depending on the year and month), and so on.

These constraints are defined by one of many possible calendars. Some of them are only used in specific regions, like the Lunar calendar. Some of them are only used in history, like the Julian calendar. At this point, the Gregorian calendar is the de facto international standard and is used almost everywhere in the world for civil purposes. It was introduced in 1582 and is extended to support dates before 1582 as well. This extended calendar is called the Proleptic Gregorian calendar.

Starting from version 3.0, Spark uses the Proleptic Gregorian calendar, which is already being used by other data systems like pandas, R, and Apache Arrow. Before Spark 3.0, it used a combination of the Julian and Gregorian calendars: for dates before 1582, the Julian calendar was used; for dates after 1582, the Gregorian calendar was used. This was inherited from the legacy java.sql.Date API, which was superseded in Java 8 by java.time.LocalDate, which uses the Proleptic Gregorian calendar as well.

Even in this three-paragraph snippet, you can already get a feeling for how complex working with dates can be. Then throw in the complexities of time and you get a detailed post full of good information.
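
To see the new behavior in action, here is a minimal PySpark sketch, assuming a local Spark 3.x installation; the column names and date values are mine, not from the post:

```python
# Minimal sketch of Spark 3.0's date handling; assumes a local Spark 3.x
# installation. Values are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("date-demo").getOrCreate()

# make_date (new in Spark 3.0) builds a DATE from year, month, and day
# fields, enforcing the validity constraints described above.
df = spark.sql("SELECT make_date(2012, 12, 31) AS d, DATE'1500-01-01' AS early")
df.show()

# Under the Proleptic Gregorian calendar, DATE'1500-01-01' is interpreted
# consistently with pandas, R, and Apache Arrow; Spark 2.x's hybrid
# Julian/Gregorian calendar could shift such pre-1582 dates by several days.
```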

Building an End-to-End Streaming App with Flink SQL

Jark Wu lays down the gauntlet:

Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL, which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.

In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real time. All exercises in this blog post are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation.

Read on for a demo using only bash and Flink SQL.
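
For a taste of what that looks like, here is a hedged sketch of the kind of Kafka-backed source table the demo starts from. The post itself works in the Flink SQL CLI; this version wraps equivalent DDL in PyFlink (Flink 1.11+) so it stays in one runnable script. The schema follows the e-commerce user-behavior shape the post describes, but treat the exact names and connection settings as placeholders:

```python
# Hedged sketch: registering a Kafka-backed table with Flink SQL via PyFlink.
# Requires Flink 1.11+ and the Kafka SQL connector jar on the classpath;
# topic, field names, and broker address are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(settings)

t_env.execute_sql("""
    CREATE TABLE user_behavior (
        user_id BIGINT,
        item_id BIGINT,
        behavior STRING,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'user_behavior',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# From here, aggregations written as plain SQL (say, hourly counts of 'buy'
# events) can be INSERTed into an Elasticsearch sink table for Kibana.
```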

Working with SQL Server Configuration Files

Jamie Wick takes us through an underrated part of the SQL Server installer:

The ability to use a parameter file (configurationfile.ini) for automating the installation of SQL Server has been around for many years. However, each release of SQL Server has had different parameters that could be included in the file. Here are some directions on how to find or create a parameter file, along with the parameter values that are supported by each version of SQL Server.

I appreciate the fact that every installation of SQL Server generates one of these and even points it out to you as you go through the installer wizard. And Jamie has gone a step further by giving us an Excel spreadsheet with all of the available settings and their defaults.
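
Purely as a sketch of how such a file gets used, here is one way to drive an unattended install from Python. The parameters shown (ACTION, FEATURES, INSTANCENAME, and so on) are common ones, but the full set varies by version, which is exactly what the spreadsheet documents; the feature list and instance name are placeholders:

```python
# Hedged sketch: write a minimal parameter file, then hand it to SQL Server's
# setup.exe. Must run on Windows from the installation media directory;
# feature list and instance name are placeholders.
import subprocess
import textwrap

config = textwrap.dedent("""\
    [OPTIONS]
    ACTION="Install"
    FEATURES=SQLENGINE
    INSTANCENAME="MSSQLSERVER"
    QUIET="True"
    IACCEPTSQLSERVERLICENSETERMS="True"
""")

with open("ConfigurationFile.ini", "w") as f:
    f.write(config)

# setup.exe ships on the SQL Server installation media.
subprocess.run(["setup.exe", "/ConfigurationFile=ConfigurationFile.ini"],
               check=True)
```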

Futureproofing a Power BI Solution

Paul Turley begins a series on doing Power BI the right way:

The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.

Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.

That italicized part was important enough for me to call it out. Far too often we develop proofs of concept which work well enough for a demo, and then the next question is “Great, when will it be in production?”

Changing SQL Server Settings with dbatools

Mikey Bronowski walks through a slew of dbatools cmdlets which replicate behavior from SQL Server Management Studio:

There may be as many reasons to alter a database as there are users. Some want to rename an old database, or maybe change the owner or recovery model after restoring it. Enabling features like Query Store is also common. Most of this can be done from the Database Properties screen, and with dbatools as well.

This is part of a whole series on dbatools which Mikey has been putting together, and to which you can find a link as you click through.
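
The cmdlets themselves are PowerShell, so any real example belongs there; purely as a sketch of the kinds of property changes Mikey covers, here is a Python wrapper that shells out to two dbatools cmdlets (the instance and database names are placeholders):

```python
# Hedged sketch: invoking dbatools cmdlets from Python via PowerShell.
# Assumes PowerShell (pwsh) and the dbatools module are installed;
# instance and database names are placeholders.
import subprocess

ps_script = """
Set-DbaDbRecoveryModel -SqlInstance localhost -Database AdventureWorks -RecoveryModel Simple -Confirm:$false
Set-DbaDbQueryStoreOption -SqlInstance localhost -Database AdventureWorks -State ReadWrite
"""

subprocess.run(["pwsh", "-NoProfile", "-Command", ps_script], check=True)
```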

Simulating DATESERIAL in SQL Server

Madhivanan has nearly a dozen methods to replicate the functionality of the old DateSerial function in VB:

The DateSerial function accepts three parameter values (year, month, and day) and returns a valid date value with the time set to midnight. SQL Server did not support this function until version 2012 was released.

There can be many ways to simulate this functionality. Here are 10 different ways.

I’d be pretty upset if some of those were in my code base, but this shows that there are plenty of ways to get to the same result. Just, uh, don’t use all of them.
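
For a flavor of the range involved, here is a sketch of two of the classic approaches, run from Python via pyodbc (the connection string is a placeholder): the DATEFROMPARTS function that SQL Server 2012 introduced, and the older arithmetic built on the 1900-01-01 epoch.

```python
# Hedged sketch: two ways to build 2012-12-31 from year/month/day parts.
# Assumes pyodbc and a reachable SQL Server; connection string is a placeholder.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;Trusted_Connection=yes"
)
cur = conn.cursor()

# SQL Server 2012+: the built-in equivalent of DateSerial.
cur.execute("SELECT DATEFROMPARTS(2012, 12, 31)")
print(cur.fetchone()[0])

# Pre-2012 style: add years, months, and days to the 1900-01-01 epoch.
cur.execute("""
    SELECT DATEADD(DAY, 31 - 1,
           DATEADD(MONTH, 12 - 1,
           DATEADD(YEAR, 2012 - 1900, CAST('19000101' AS DATETIME))))
""")
print(cur.fetchone()[0])
```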

Lessons Learned from Migrating to SQL Server 2017 with Availability Groups

Lee Markum has an after-action report:

In late 2019, a project that had been in progress for well over a year finally came to a conclusion. I had collaborated with a number of people to migrate a standalone SQL Server 2008 instance and two standalone SQL Server 2008 R2 instances. Each standalone SQL Server was migrated to a three-node Availability Group. Here are a few things learned along the way.

Click through for some good advice.

Transforming JSON to CSV: ADF vs Databricks

Rayis Imayev compares two methods of transforming a JSON-structured data set into a CSV:

There is a well-known and broadly advertised message from Microsoft that Azure Data Factory (ADF) is a code-free environment to help you create your data integration solutions – https://azure.microsoft.com/en-us/resources/videos/microsoft-azure-data-factory-code-free-cloud-data-integration-at-scale/. I agree with and support this approach of using a drag-and-drop visual UI to build and automate data pipelines without writing code. However, I’m also interested in whether I can recreate certain ADF operations by writing code, just out of curiosity.

Rayis includes a link to the Azure Data Factory step-by-step demonstration and then kicks it up a notch with Databricks. Read on to see how the two compare.
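
On the Databricks side, the core of the transformation is short. Here is a minimal PySpark sketch under assumed inputs; the file paths and the records array field are placeholders, not taken from Rayis’s post:

```python
# Hedged sketch: flatten a nested JSON dataset and write it out as CSV.
# Paths and the 'records' array field are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("json-to-csv").getOrCreate()

df = spark.read.option("multiLine", "true").json("/mnt/raw/input.json")

# CSV cannot hold nested columns, so explode the array and promote its
# struct fields to top-level columns before writing.
flat = df.select(explode(col("records")).alias("r")).select("r.*")

flat.write.mode("overwrite").option("header", "true").csv("/mnt/curated/output")
```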
