Day: December 26, 2018

Vectorization With Apache Hive And Parquet Tables

Vihang Karajgaonkar, et al., walk us through a performance improvement for Apache Hive queries against Parquet tables:

The performance benchmarks on CDH 6.0 show that enabling Parquet vectorization significantly improves performance for a typical ETL workload. In the test workload (TPC-DS), enabling parquet vectorization gave 26.5% performance improvement on average (geomean value of runtime for all the queries). Vectorization achieves these performance improvements by reducing the number of virtual function calls and leveraging the SIMD instructions on modern processors. A query is vectorized in Hive when certain conditions like supported column data-types and expressions are satisfied. However, if the query cannot be vectorized its execution falls back to a non-vectorized execution. Overall, for workloads which use the Parquet file format on most modern processors, enabling Parquet vectorization can lead to better query performance in CDH 6.0 and beyond.

This is worth looking into, especially if you are on the Cloudera stack.
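To make the "fewer function calls" point concrete, here is a toy sketch in Python. It is not Hive's implementation (Hive's engine is Java and works on column batches); the function names, the filter operator, and the batch size of 1024 are all assumptions chosen for illustration. It only shows that handing an operator a batch of rows takes far fewer calls than handing it one row at a time, while producing the same result.

```python
from typing import Callable, List

BATCH_SIZE = 1024  # assumed batch size, for illustration only


def run_row_at_a_time(values: List[int], keep_row: Callable[[int], bool]) -> List[int]:
    """Invoke the operator once per row."""
    return [v for v in values if keep_row(v)]


def run_vectorized(
    values: List[int],
    keep_batch: Callable[[List[int]], List[int]],
    batch_size: int = BATCH_SIZE,
) -> List[int]:
    """Invoke the operator once per batch of rows."""
    out: List[int] = []
    for start in range(0, len(values), batch_size):
        out.extend(keep_batch(values[start:start + batch_size]))
    return out


def compare_call_counts(row_count: int, threshold: int) -> dict:
    """Run the same filter both ways and count operator invocations."""
    values = list(range(row_count))
    calls = {"row": 0, "batch": 0}

    def keep_row(v: int) -> bool:
        calls["row"] += 1  # one call per row
        return v > threshold

    def keep_batch(batch: List[int]) -> List[int]:
        calls["batch"] += 1  # one call per batch
        return [v for v in batch if v > threshold]

    row_result = run_row_at_a_time(values, keep_row)
    batch_result = run_vectorized(values, keep_batch)
    return {
        "same_result": row_result == batch_result,
        "row_calls": calls["row"],
        "batch_calls": calls["batch"],
    }


if __name__ == "__main__":
    print(compare_call_counts(row_count=5000, threshold=2500))
```

The real gains in Hive also come from SIMD-friendly tight loops over primitive arrays, which a pure Python sketch like this cannot show.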

More On Confluent’s Licensing Change

Alex Woodie has an article covering Confluent’s recent licensing change:

Confluent this month became the latest commercial open source software company to restrict the use of its software in the cloud. The move prevents cloud companies from using parts of the Confluent Platform, such as the KSQL component that uses SQL to process streaming data, as standalone software as a service (SaaS) offerings.
Jay Kreps, the co-creator of Apache Kafka and the CEO of Confluent, explained the significance of switching the Confluent Platform from the Apache 2.0 license to the new Confluent Community License.

Over at Aiven, CTO Heikki Nousiainen shares his thoughts:

The new Confluent Community License is a proprietary software license, specifically excluding “making available any software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online service that competes with Confluent products or services that provide the Software.”
While the license change does apply to all future versions of the specific software, it doesn’t alter the licensing status of the components in the versions that have been released and utilized by Aiven.

I believe it would be best to read the latter article looking for the significant silences.

Forecasting Field Goal Percentages With Prophet

Marlon Ribunal uses the Prophet library in R to forecast critical information:

I’ve been looking for an easy way to get to learning predictive analysis and forecasting. Prophet provides that path. Prophet is released by Facebook’s Core Data Science Team.
“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.”
Just to dip my toes into the waters, I tried Prophet’s Quick Start Guide in R.
Let’s forecast the Field Goal Percentage (FG%) of Kyle Kuzma of the Los Angeles Lakers for the next 6 Months.

It’d be critical and important if it were hockey data. Or football data or baseball data or maybe even cricket data (but I don’t understand cricket data and why is that guy still running didn’t he get thrown out or something I don’t get it?).

As far as Prophet goes, it’s a useful library and works well if you’re looking at seasonal time series data.
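The additive model in the quoted description is usually written as a sum of components. The symbols below are the ones commonly used when describing Prophet's model, not something taken from Marlon's post: g(t) is the trend, s(t) the combined seasonal effects, h(t) the holiday effects, and the last term is the error left over.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

This is why the library suits strongly seasonal data: each component is fit separately and the forecast is their sum.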

Switching Azure Portal Accounts

John Morehouse is happy with a change to the Azure Portal:

This means that I could have multiple email accounts that I have to use in order to sign into the portal.  Using a password manager such as 1Password, not usually a big deal and more of an annoyance rather than a headache.
Within the past month or so, Microsoft has updated the portal to allow me to easily switch accounts.  Previously you had to log out of the portal and then log back in.

This is quite convenient. Prior to this change, switching to a different account could goof with other sites I had open (like if I was sending an Outlook e-mail through one account, switching the Azure Portal signed-in account would log me out from Outlook). It’s still not a perfect experience but it’s a lot better.

Improving The SSMS Scroll Bar

Michelle Haarhues shows how you can enable an enhanced scroll bar in SQL Server Management Studio:

There are so many tools within SQL Server Management Studio (SSMS) that can make your job as a DBA or Developer easier that you may or may not be using.  One of the tools available is the customization of the Scroll Bar.  You can change the display and the behavior of the scroll bars, which can make working with code a lot easier and more efficient, especially when working with long code.  The two options we will discuss are Scroll Bar Display and Behavior.

I didn’t like this a lot at first, but as I used it a few times, it grew on me.

Combining Windows And Linux Docker Containers

Rob Sewell crosses the streams:

This is NOT a production ready solution, in fact I would not even recommend that you try it.
I definitely wouldn’t recommend it on any machine with anything useful on it that you want to use again.
We will be using a re-compiled dockerd.exe created by someone else and you know the rules about downloading things from the internet don’t you? and trusting unknown unverified people?
Maybe you can try this in an Azure VM or somewhere else safe.
Anyway, with that in mind, lets go.

That’s the kind of intro that makes me want to try it out.

Troubleshooting Network Issues From The Command Line

Jeff Mlakar walks us through a few tools for troubleshooting network connectivity solely from the command line:

NSLOOKUP
The nslookup command can check the name which an IP address will resolve to or which IP address resolves to a name (aka reverse lookup). This can be done either way as shown:

After having spent the long weekend futzing with Server Core instances for an upcoming project, I also recommend learning the PowerShell tools.
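If you would rather script the two lookups than type them, the forward and reverse checks described in the nslookup excerpt can be approximated with Python's standard `socket` module. This is a minimal sketch and not what Jeff's post shows: the function names are made up for illustration, and `socket` goes through the operating system's resolver (hosts file included), whereas nslookup queries DNS servers directly, so the two can disagree.

```python
import socket
from typing import Optional


def forward_lookup(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # covers socket.gaierror and socket.herror
        return None


def reverse_lookup(ip_address: str) -> Optional[str]:
    """Return the primary host name for ip_address, or None if there is none."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return name
    except OSError:
        return None


if __name__ == "__main__":
    # Both calls stay on the local machine, so they work without a network.
    print(forward_lookup("localhost"))
    print(reverse_lookup("127.0.0.1"))
```

Returning `None` instead of raising keeps the helpers easy to use in a loop over many hosts, at the cost of hiding why a lookup failed.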
