I’m starting a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think it’s interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.
In the first part: the GENERATOR function. In short, it lets you generate a virtual table with a specified number of rows, or in database lingo: a tally table (or numbers table). A tally table can help you solve a lot of problems in SQL, but the TL;DR version is that it replaces loops/cursors most of the time and allows you to tackle the issue in a true set-based manner.
Naturally, as I read the article, I got the Bad Religion song stuck in my head. That’s an occupational hazard, I suppose.
Tab order is the order in which users interact with the items on a page using the keyboard. Generally, we want tab order to be predictable and to closely match the visual order on the page (unless there is a good reason to deviate). If you press the tab key in a Power BI report, you might be surprised at the seemingly random order in which you move from visual to visual on the page. The default order is the order in which the visuals were placed on the page in Power BI Desktop, or the last modified order in PowerBI.com if you have edited your report there.
I had not actually tried to tab through a dashboard in Power BI, so this is news to me.
This is noted in the tweet as a cause, but let’s test this.
One of the common ways that we get gaps in identity values is when rows are deleted. Let’s remove the row with Steve in it.
Steve explains a few others, but even that’s not complete: identity columns can jump after the service restarts as well.
In short, please do not use identity values in cases where you need to guarantee sequentiality (like check numbers or invoice numbers). Don’t use sequences either, as they’ll behave similarly.
The performance benchmarks on CDH 6.0 show that enabling Parquet vectorization significantly improves performance for a typical ETL workload. In the test workload (TPC-DS), enabling parquet vectorization gave 26.5% performance improvement on average (geomean value of runtime for all the queries). Vectorization achieves these performance improvements by reducing the number of virtual function calls and leveraging the SIMD instructions on modern processors. A query is vectorized in Hive when certain conditions like supported column data-types and expressions are satisfied. However, if the query cannot be vectorized its execution falls back to a non-vectorized execution. Overall, for workloads which use the Parquet file format on most modern processors, enabling Parquet vectorization can lead to better query performance in CDH 6.0 and beyond.
This is worth looking into, especially if you are on the Cloudera stack.
Confluent this month became the latest commercial open source software company to restrict the use of its software in the cloud. The move prevents cloud companies from using parts of the Confluent Platform, such as the KSQL component that uses SQL to process streaming data, as standalone software as a service (SaaS) offerings.
Jay Kreps, the co-creator of Apache Kafka and the CEO of Confluent, explained the significance of switching the Confluent Platform from the Apache 2.0 license to the new Confluent Community License.
Over at Aiven, CTO Heikki Nousiainen shares his thoughts:
The new Confluent Community License is a proprietary software license, specifically excluding “making available any software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online service that competes with Confluent products or services that provide the Software.”
While the license change does apply to all future versions of the specific software, it doesn’t alter the licensing status of the components in the versions that have been released and utilized by Aiven.
I believe it would be best to read the latter article looking for the significant silences.
I’ve been looking for an easy way to get to learning predictive analysis and forecasting. Prophet provides that path. Prophet is released by Facebook’s Core Data Science Team.
“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.”
Just to dip my toes into the waters, I tried Prophet’s Quick Start Guide in R.
Let’s forecast the Field Goal Percentage (FG%) of Kyle Kuzma of the Los Angeles Lakers for the next 6 Months.
It’d be critical and important if it were hockey data. Or football data or baseball data or maybe even cricket data (but I don’t understand cricket data and why is that guy still running didn’t he get thrown out or something I don’t get it?).
As far as Prophet goes, it’s a useful library and works well if you’re looking at seasonal time series data.
This means that I could have multiple email accounts that I have to use in order to sign into the portal. Using a password manager such as 1Password, not usually a big deal and more of an annoyance rather than a headache.
Within the past month or so, Microsoft has updated the portal to allow me to easily switch accounts. Previously you had to log out of the portal and then log back in.
This is quite convenient. Prior to this change, switching to a different account could goof with other sites I had open (like if I was sending an Outlook e-mail through one account, switching the Azure Portal signed-in account would log me out from Outlook). It’s still not a perfect experience but it’s a lot better.
There are so many tools within SQL Server Management Studio (SSMS) that can make your job as a DBA or Developer easier that you may or may not be using. One of the tools available is the customization of the Scroll Bar. You can change the display and the behavior of the scroll bars, which can make working with code a lot easier and more efficient, especially when working with long code. The two options we will discuss are Scroll Bar Display and Behavior.
I didn’t like this a lot at first, but as I used it a few times, it grew on me.
When I had the first idea of this blog series, it was just a vision – and now – at the end of this series I hardly cannot believe that I kept up and wrote those 24 posts.
… more work than expected,
… more feedback than expected and
… it was more fun than expected!
Click through for 23 links grouped by topic.
This is NOT a production ready solution, in fact I would not even recommend that you try it.
I definitely wouldn’t recommend it on any machine with anything useful on it that you want to use again.
We will be using a re-compiled dockerd.exe created by someone else and you know the rules about downloading things from the internet don’t you? and trusting unknown unverified people?
Maybe you can try this in an Azure VM or somewhere else safe.
Anyway, with that in mind, lets go.
That’s the kind of intro that makes me want to try it out.