Press "Enter" to skip to content

Day: March 11, 2021

Batch Execution Mode in Flink’s DataStream API

Dawid Wysakowicz takes us through batch execution mode in a streaming solution:

Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the first steps towards enabling efficient batch execution in the DataStream API.

Read on to see the progress they’ve achieved so far.

Comments closed

Two Disliked Data Types

Aaron Bertrand has two bones to pick:

I am not often one to do the bare minimum, unless it comes to looking for things around the house. After about 22 seconds I throw my hands in the air and exclaim, “I can’t find it!” As my wife so kindly added: It usually turns out to be in a spot I already looked.

But when it revolves around SQL Server and opinions, I’m all over it. So I’m not going to talk about a data type today; I’m going to talk about two of them. One of them Brent already mentioned in his invite:

I kinda-sorta disagree with Aaron’s second choice. By that I mean that I fully agree with his premise: use UTC everywhere. But if you don’t use UTC everywhere, then use DATETIMEOFFSET everywhere and apply the time zones.

Comments closed

The Problems with VARCHAR(MAX)

Deepthi Goguri has a complaint about VARCHAR(MAX):

Though VARCHAR(MAX) is suitable in situations with large strings of data, it has its own complications that we need to consider. In my career as a DBA, I at least saw couple of times SQL developers using VARCHAR(MAX) when they should not. Fixing the datatypes once in production is painful and causes risks.

I think Deepthi’s advice is sound: use VARCHAR(MAX) when necessary but not as a starting point. I don’t shy away from VARCHAR(MAX) on principle (except with columnstore indexes—get that noise right out of here) and I don’t think you should either, as long as you understand the ramifications.

Comments closed

Date and Time Data Types

Deborah Melkin takes a look at the different date and time data types:

But I think I’ll “wax poetic” about datetimes and all of the varieties. When I first got started with SQL Server back in the SQL 6.5 days, we really just had datetime, which was the date and time. If you needed just the time, you ignored the date portion. If you needed just the date, you put midnight (00:00:00.000) for the time. And when you dealt with the time portion, you deal with rounding to the .003 millisecond.

Read on for more info about each type.

1 Comment

Fragmentation and GUIDs

Chad Callihan doesn’t llike the UNIQUEIDENTIFIER data type:

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

On this topic, Jeff Moden had a really great presentation for our SQL Server user group. He has a rather contrarian take and interesting findings on fragmentation in practice. It’s a lengthy and advanced talk, but definitely worth the nearly 2 1/2 hours.

Comments closed

Against VARBINARY(MAX)

Travis Page wants you to think twice before using VARBINARY(MAX):

Brent Ozar put out a call for blogging about your data type of choice. This might be a favorite (or least favorite) datatype. Naturally, varchar(max) and nvarchar(max) are going to have their punishment, deservedly. The datatype I’m not so sure is going to be getting the bludgeoning it deserves is varbinary(max). Sure, there are valid uses for the data type, but having the potential for just shy of 2GB of binary LOB data stored in a database has some negative potentials. Let’s take a look at some of these pitfalls.

Click through for a good reckoning of the downsides. In terms of upside, two good ones that I actively use are:

  • VARBINARY(MAX), when combined with the COMPRESS() function, can efficiently store a lot of text data, as that text data gets gzipped. This works best if you just need to store data but not search through it. Then, DECOMPRESS() when it’s time for that data to show up in an app somewhere.
  • SQL Server Machine Learning Services models are stored in VARBINARY(MAX).

Comments closed