Many of the in-memory features in the Microsoft platform rely on the xVelocity (formerly known as VertiPaq) engine. The implementations do differ somewhat between products, for example in the requirements for data to be truly memory-resident.
The remainder of this post will focus on the two columnstore technologies in SQL Server: clustered and nonclustered.
If you’re not very familiar with columnstore and aren’t quite ready to tackle Niko Neugebauer’s columnstore series, this is a good way to get started.
We will use the sqlserver module, so you will need to have installed the latest version of SSMS from https://sqlps.io/dl.
This code was run using PowerShell version 5 and will not work on PowerShell version 3 or lower, as it uses the where() method.
Let's grab all of our jobs on the estate. (You will need to fill the $Servers variable with the names of your instances, perhaps from a database, a CMS, or a text file.)
One oddity with SQL Agent jobs is that you must call the Alter() method at the end, or else the changes will not actually take effect.
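As a rough sketch of that pattern (the server names and the property being changed are placeholders of my own, not the linked author's code):

```powershell
# Hedged sketch: enable every disabled Agent job across an estate via SMO.
Import-Module SqlServer
$Servers = 'SQL01', 'SQL02'   # fill from a CMS, database, or text file

foreach ($server in $Servers) {
    $smo = New-Object Microsoft.SqlServer.Management.Smo.Server $server
    # the where() collection method requires PowerShell 4 or higher
    $jobs = $smo.JobServer.Jobs.where({ $_.IsEnabled -eq $false })
    foreach ($job in $jobs) {
        $job.IsEnabled = $true
        $job.Alter()   # without Alter(), the change never reaches the server
    }
}
```

The Alter() call is what pushes the modified job definition back to msdb; until then you have only changed the in-memory SMO object.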
Apache HBase is an open-source NoSQL Hadoop database: a distributed, scalable big data store. It provides real-time read/write access to large datasets. HDInsight HBase is offered as a managed cluster integrated into the Azure environment. HBase provides many features as a big data store, but in order to use HBase, customers first have to load their data into it.
There are multiple ways to get data into HBase, such as using client APIs, a MapReduce job with TableOutputFormat, or entering the data manually in the HBase shell. Many customers are interested in using Apache Phoenix, a SQL layer over HBase, for its ease of use. This post describes how to use Phoenix bulk load with HDInsight clusters.
Phoenix provides two methods for loading CSV data into Phoenix tables – a single-threaded client loading tool via the psql command, and a MapReduce-based bulk load tool.
Anunay explains both methods, allowing you to choose based on your data needs.
The data access and ingestion tests were on a cluster composed of 14 physical machines, each equipped with:
- 2 x 8 cores @2.60GHz
- 64GB of RAM
- 2 x 24 SAS drives
The Hadoop cluster was installed from the Cloudera Data Hub (CDH) distribution, version 5.7.0, which includes:
- Hadoop core 2.6.0
- Impala 2.5.0
- Hive 1.1.0
- HBase 1.2.0 (configured JVM heap size for region servers = 30GB)
- (not from CDH) Kudu 1.0 (configured memory limit = 30GB)
Apache Impala (incubating) was used as a data ingestion and data access framework in all the conducted tests presented later in this report.
I would have liked to have seen ORC included as a file format for testing. Regardless, I think this article shows that there are several file formats for a reason, and you should choose your file format based on your most likely expected use. For example, Avro or Parquet for “write-only” systems or Kudu for larger-scale analytics.
The online documentation (https://msdn.microsoft.com/en-us/library/mt791503.aspx) states that the new query execution statistics profiling infrastructure dramatically reduces the performance overhead of collecting per-operator query execution statistics.
Read on, as Arun might have discovered a bug in it.
Ultimately, I think any thought of the readable secondary having a vastly different plan was a red herring. Statistics are going to be the same on both instances, and if there were a missing statistic on the secondary, SQL Server would create it in tempdb. Anyway, columnstore indexes don’t use statistics in the traditional sense.
Fortunately I was able to catch a query in the process of waiting on HTDELETE, so I no longer had to look for the needle in the haystack, and I could get to tuning the plans. I was able to grab the SELECT part of the query and generate an estimated plan on both the primary and secondary nodes. The plans were virtually the same on both nodes, with just a minor difference in memory grant between them.
Click through for the solution.
Having had a quick bingle around for a prewritten regex example, I didn’t come up with much that fitted the bill. So, in the hope that this will help the next person trying to do this, here they are:
A PowerShell function name is of the form Word-Word
A PowerShell function definition is of the form “Function Word-Word”
A PowerShell function call can be preceded by a '|', '(', or ' '
The script is written using a reasonable style, so there is a ' ' after the call
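Translating those rules directly, a rough sketch might look like this (these patterns are my own illustration, not the linked author's script):

```powershell
# Rough regex sketches of the rules above:
# a function name is of the form Word-Word
$name       = '[A-Za-z]+-[A-Za-z]+'
# a definition is "Function Word-Word"
$definition = "(?i)function\s+($name)"
# a call is preceded by '|', '(' or ' ' and followed by ' '
$call       = "(?<=[|(\s])($name)(?=\s)"

# Example: list the function names defined in a script file
Select-String -Path .\MyScript.ps1 -Pattern $definition -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }
```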
Click through for the script.
Step 2: Check out the GitHub project page to see what’s in development.
Next, you should visit the project issues page. Here, you’ll find a list of all the features requested, in development, and completed on the project. You can also filter the pages to look at current bugs or requested enhancements. Once you see what’s what, if you think of something you want to work on or help with, make a note of it. You should also look at examples of things in development and things that have been completed, so you get an idea of the creative and technical process that goes into the project.
Step 3: Speak up!
Head on back to the Slack channel and let everyone know you want to help out. Someone (probably Chrissy) will add your GitHub account to the project as a contributor so you can have things assigned to you. Congrats, you’re now on the hook!
I’m happy that the dbatools community has sprung up and hope it’s a gateway to further open source development in the SQL Server community.
I’ve gone minimal on the script. I’m creating a connection to the local instance, defining a command, and returning the data into a data set. From there, since the data set consists of a single column, I’m walking through the rows to export them to a file:
It’s PowerShell, so it’s a short snippet.
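A minimal sketch along those lines might look like this (the query, connection string, and output path are placeholders of my own, not the linked author's actual script):

```powershell
# Connect to the local instance, fill a data set, and export the single column to a file.
$connection = New-Object System.Data.SqlClient.SqlConnection(
    'Server=localhost;Database=master;Integrated Security=True')
$command = New-Object System.Data.SqlClient.SqlCommand(
    'SELECT name FROM sys.databases', $connection)
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($command)
$dataSet = New-Object System.Data.DataSet
$adapter.Fill($dataSet) | Out-Null

# The data set holds a single column, so walk each row and write the value out
$dataSet.Tables[0].Rows |
    ForEach-Object { $_.name } |
    Out-File -FilePath .\output.txt
```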