This is a quick post to share a script that allows spinlock statistics to be captured for a defined period of time (as I need to reference it in my next post). Enjoy!
Click through if you don’t already know the correct DMV to use.
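If you want a rough sense of the technique before clicking through, here is a minimal sketch (not the linked script, and all names here are my own): snapshot sys.dm_os_spinlock_stats into a temp table, wait out the capture window, and then diff the counters.

SELECT name, collisions, spins, backoffs
INTO #spinlock_before
FROM sys.dm_os_spinlock_stats;

WAITFOR DELAY '00:01:00';  -- the capture window; adjust as needed

SELECT s.name,
       s.collisions - b.collisions AS collisions_delta,
       s.spins - b.spins AS spins_delta,
       s.backoffs - b.backoffs AS backoffs_delta
FROM sys.dm_os_spinlock_stats AS s
JOIN #spinlock_before AS b ON b.name = s.name
ORDER BY spins_delta DESC;

DROP TABLE #spinlock_before;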
Correlated Datetime Columns works. Clearly it’s not something you’re going to enable on all your databases. Probably most of your databases don’t have clustered indexes on datetime columns, let alone enough tables with correlation between the data stored in them. However, when you do have that type of data correlation, enabling Correlated Datetime Columns and ensuring you have a clustered index on the datetime column is a viable tuning mechanism. Further, this is a mechanism that has been around since 2005. Just so you know, I did all my testing in SQL Server 2016, so this is something that anyone in the right situation can take advantage of. Just remember that TANSTAAFL always applies. Maintaining the statistics needed for Correlated Datetime Columns is done through materialized views that are automatically created as part of the optimization process. You can see the views in SSMS and in any queries against the system objects. You’ll need to take this into account during your statistics maintenance. However, if Correlated Datetime Columns is something you need, it can really help with this fairly narrow aspect of query tuning.
I don’t know that I’ll ever do this, but it’s worth filing away just in case.
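If you do end up filing it away, the moving parts are small. A minimal sketch, with the database, table, and column names all hypothetical (note that changing the option requires that no other connections are active in the database):

ALTER DATABASE SalesDB SET DATE_CORRELATION_OPTIMIZATION ON;

-- The datetime column should be the clustered index key for the optimization to pay off
CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);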
2016 and beyond – this is an interesting time for “Big Data”. Cloudera’s valuation has dropped by 38%. Hortonworks’ valuation has dropped by almost 40%, forcing them to cut the professional services department. Pivotal has abandoned its Hadoop distribution, going to market jointly with Hortonworks. What happened and why? I think the main driver of this decline is enterprise customers that started adopting the technology in 2014-2015. After a couple of years playing around with “Big Data”, they have finally understood that Hadoop is only an instrument for solving specific problems; it is not a turnkey solution for overtaking your competitors by leveraging the holy power of “Big Data”. Moreover, you don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale. All of this has caused a big wave of priority re-evaluation by enterprises, which are shrinking their investments in “Big Data” and focusing on solving specific business problems.
There are some good points around product saturation and a general skills shortage, but even if you look at it pessimistically, this is a product with 30% market penetration, one which is currently making the move from a large-scale batch data processing product to a streaming + batch processing product.
In the previous blog, we looked at converting the CSV format into the Parquet format using Hive. It was a matter of creating a regular table, mapping it to the CSV data, and finally moving the data from the regular table to the Parquet table using the INSERT OVERWRITE syntax. In this blog we will look at how to do the same thing with Spark using the DataFrames feature.
Most of the code is basic setup; writing to Parquet is really a one-liner.
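For comparison, the Hive-based approach from the previous post boils down to something like this (table names, columns, and the file path are all hypothetical):

-- Regular table mapped over the CSV files
CREATE EXTERNAL TABLE sales_csv (
    sale_id   INT,
    sale_date STRING,
    amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_csv';

-- Parquet-backed table with the same schema
CREATE TABLE sales_parquet (
    sale_id   INT,
    sale_date STRING,
    amount    DOUBLE
)
STORED AS PARQUET;

-- Move the data across
INSERT OVERWRITE TABLE sales_parquet
SELECT sale_id, sale_date, amount FROM sales_csv;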
According to a posting on the Hortonworks site, both the compression and the performance for ORC files are vastly superior to both plain text Hive tables and RCFile tables. For compression, ORC files are listed as 78% smaller than plain text files. And for performance, ORC files support predicate pushdown and improved indexing that can result in a 44x (4,400%) improvement. Needless to say, for Hive, ORC files will gain in popularity. (You can read the posting here: ORC File in HDP 2: Better Compression, Better Performance.)
There are several considerations around picking the correct file format, and it’s probably best to experiment with them in your specific environment.
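If you want to try ORC out in Hive, the table definition is the main thing that changes. A hypothetical example, with the compression codec called out explicitly:

CREATE TABLE sales_orc (
    sale_id   INT,
    sale_date STRING,
    amount    DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'ZLIB');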
Or if you’re ready to take it to the ╰[ ⁰﹏⁰ ]╯ level, then change the column to nchar and use the hex of your favorite emoji, like:
DECLARE @testtable1 TABLE (testid int identity(1,1), testvalue nchar(255))

-- use the below for the final query
-- NCHAR(0xD83D) + NCHAR(0xDE20) builds the emoji from its UTF-16 surrogate pair
SELECT ISNULL((SELECT testvalue FROM @testtable1 WHERE testvalue > 101), NCHAR(0xD83D) + NCHAR(0xDE20)) AS testvaluethatworks
There are a few other alternatives, such as loading results into a temp table and inserting a default row if the temp table is empty.
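A quick sketch of that temp table alternative, with the source table name hypothetical:

CREATE TABLE #results (testvalue nchar(255));

INSERT INTO #results (testvalue)
SELECT testvalue
FROM dbo.SomeTable           -- hypothetical source table
WHERE testvalue > N'101';

-- If nothing came back, fall back to a default row (here, the same emoji as above)
IF NOT EXISTS (SELECT 1 FROM #results)
    INSERT INTO #results (testvalue) VALUES (NCHAR(0xD83D) + NCHAR(0xDE20));

SELECT testvalue FROM #results;

DROP TABLE #results;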
There it is! The ninja cat database! You can see that even IntelliSense shows the ninja cat. Cool, right? How does it show in Object Explorer?
DOH! There’s obviously something strange going on here. Let’s validate the sys.databases table:
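A plain query does the job here (this is just the obvious check, not necessarily the exact one from the post):

SELECT database_id, name
FROM sys.databases;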
If full emoji support is the thing keeping you from moving to SQL Server, you might have to wait until the next version.
SQL Server first introduced columnar storage with the SQL Server 2012 Enterprise release. In that release, columnstores were read-only indexes, so refreshing one with the latest data required dropping the index, loading the table or partition, and then rebuilding the index.
SQL Server 2014 upgraded Columnstores with full read-write capabilities, allowing the Columnstore to become the ‘clustered’ index for the table and hold all the data instead of just being one more index on top of row-organized data. 2014 also introduced many improvements to batch operations so more pieces of an execution plan could take advantage of this faster processing mode.
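For reference, making the columnstore the table’s clustered index in SQL Server 2014 and later is a one-liner (table name hypothetical):

CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;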
Read on to see changes in 2016.
Now that we know which statistics we can gather, we need to actually start gathering them. While the DMVs for Resource Governor are great, they will only give you an aggregate of the usage information since the last time the statistics were reset or the service was restarted.
In most cases it makes sense to store your data in a separate table so you can calculate differentials between two time periods. For our example, we are only going to be interested in request counts and CPU usage. For this, we will create the following table.
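A plausible shape for that table (all names and types here are assumptions, not necessarily what the post uses), with one row per workload group per capture:

CREATE TABLE dbo.WorkloadGroupStats
(
    CaptureTime       datetime2(0) NOT NULL DEFAULT SYSDATETIME(),
    GroupName         sysname      NOT NULL,
    TotalRequestCount bigint       NOT NULL,
    TotalCpuUsageMs   bigint       NOT NULL
);

-- Populate it on a schedule from the workload group DMV, then diff rows between captures
INSERT INTO dbo.WorkloadGroupStats (GroupName, TotalRequestCount, TotalCpuUsageMs)
SELECT name, total_request_count, total_cpu_usage_ms
FROM sys.dm_resource_governor_workload_groups;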
Read the whole thing.
There have been a couple of fixes in SQLCover this week, kindly submitted by John Mclusky (https://github.com/jmclusky):
Go check out SQLCover.