Apache Avro 1.9.0 Released

Kevin Feasel

2019-05-20

Hadoop

Fokko Driesprong announces the release of Apache Avro 1.9.0:

Avro is a remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. If you’re unfamiliar with Avro, I would highly recommend the explanation of Dennis Vriend at Binx.io about an introduction into Avro.

Over 272 Jira tickets have been resolved, and 844 PRs are included since 1.8.2. I’d like to point out several major changes.

That’s a lot of tickets.

Loading Data Into SnowflakeDB

Dan Bilsborough shows a couple ways of loading data into SnowflakeDB from Azure:

Before being loaded into a Snowflake table, the data can be optionally staged, which is essentially just a pointer to a location where the files are stored. There are different types of stages including:
– User stages, which each user will have by default
– Table stages, which each table will have by default
– Internal named stages, meaning staged within Snowflake

Internal named stages are the best option for regular data loads, if you are thinking along the lines of your standard daily ETL process. One benefit of these is the flexibility in that they are database objects, so you can grant privileges to roles to access these objects as you would expect. Alternatively, there are external stages, such as Azure Blob storage.

Read on to see what comes next.

Proposed Max Server Memory Defaults

Randolph West has a proposal for default max server memory on a SQL Server instance:

As noted in the previous post in this series, memory in SQL Server is generally divided between query plans in the plan cache, and data in the buffer pool (other uses for memory in SQL Server are listed later in this post).

The official documentation tells us:
[T]he default setting for max server memory is 2,147,483,647 megabytes (MB).

Look carefully at that number. It’s 2 billion megabytes. In other words, we might think of it as either 2 million gigabytes2,048 terabytes, or 2 petabytes.

Randolph is writing this like we don’t all have multiple petabytes of RAM on each machine.

Bash Script Introductions

Kellyn Pot’vin-Gorman continues a series on Bash scripting:

For Part II, we’ll start with the BASH script “introduction”.

The introduction in a BASH script should begin the same in all scripts.
1. Set the shell to be used for the script
2. Set the response to failure on any steps, (exit or ignore)
3. Add in a step for testing, but comment out or remove when in production

For our scripts, we’ll keep to the BASH format that is used by the template scripts, ensuring a repeatable and easy to identify introduction.

Click through to see what that entails.

Multi-Line Powershell Comments

Jess Pomfret shows how you can build out Powershell comments with multiple lines of code in them:

You can see above the first example looks good, however in the second example the first two lines should both have a prompt to show they are code. I spent a little while Googling this without much avail. I then figured, somewhere within dbatools there must be an example with two lines of code. Sure enough I found my answer, and it’s pretty straightforward.

Click through for the answer, as well as one of the most important Powershell cmdlets you’ll ever find on the Internet.

Making Corporate Color Palettes Palatable

Meagan Longoria takes us through a corporate coloring problem:

Last week, I had a conversation on twitter about dealing with corporate color palettes that don’t work well for data visualization. Usually, this happens because corporate palettes are designed with websites and/or marketing collateral in mind rather than information graphic design. This often results in colors being too bright, dark, or dull to be used together in a report. Sometimes the colors aren’t easily distinguishable from each other. Other times, the colors needed for various situations (main color, ancillary colors, highlight color, error color, KPIs, text, borders) aren’t available in the corporate palette.

You can still stay on brand and create a consistent user experience with a color palette optimized for data visualization. But you may not be using the exact hex values as defined in the corporate palette. I like to say the data viz color palette is “inspired by” the marketing color palette.

Click through for lots of goodies, including a link to a really interesting color tester.

Minimal Logging into Empty Clustered Indexes

Paul White explains how to perform minimal logging when using the INSERT..SELECT pattern to insert into an empty table with a clustered index:

The summary top row suggests that all inserts to an empty clustered index will be minimally logged as long as TABLOCK and ORDER hints are specified. The TABLOCK hint is required to enable the RowSetBulk facility as used for heap table bulk loads. An ORDER hint is required to ensure rows arrive at the Clustered Index Insert plan operator in target index key order. Without this guarantee, SQL Server might add index rows that are not sorted correctly, which would not be good.

Unlike other bulk loading methods, it is not possible to specify the required ORDER hint on an INSERT...SELECT statement. This hint is not the same as using an ORDER BY clause on the INSERT...SELECT statement. An ORDER BY clause on an INSERTonly guarantees the way any identity values are assigned, not row insert order.

Read on to see what you can do.

SQL Server and Recent Security Patches

Allan Hirt takes us through recent security updates and how they pertain to SQL Server:

After Spectre and Meltdown a few months back (which I cover in this blog post from January 4), another round of processor issues has hit the chipmaker. This one is for MDS (also known as a ZombieLoad) This one comprises the following security issues: CVE-2019-11091, CVE-2018-12126, CVE-2018-12127, and CVE-2018-12130. Whew! Fun fact: CVE stands for “Common Vulnerabilities and Exposures”.

As of now, this is only known to be an Intel, not AMD, issue. That is an important distinction here. The official Intel page on this issue can be found at this link. This issue does not exist in select 8th and 9th generation Intel Core processors as well as the 2nd generation Xeon Scalable processor family. (read: the latest stuff) 

Be sure to read through all of this. Most of the notes are for non-SQL Server items which have an impact rather than bugs in SQL Server itself, but that doesn’t make patching any less important.

Categories

May 2019
MTWTFSS
« Apr Jun »
 12345
6789101112
13141516171819
20212223242526
2728293031