Kevin Feasel – Page 1552

The data access and ingestion tests were on a cluster composed of 14 physical machines, each equipped with:

2 x 8 cores @2.60GHz

64GB of RAM

2 x 24 SAS drives

Hadoop cluster was installed from Cloudera Data Hub(CDH) distribution version 5.7.0, this includes:

Hadoop core 2.6.0

Impala 2.5.0

Hive 1.1.0

HBase 1.2.0 (configured JVM heap size for region servers = 30GB)

(not from CDH) Kudu 1.0 (configured memory limit = 30GB)

Apache Impala (incubating) was used as a data ingestion and data access framework in all the conducted tests presented later in this report.

I would have liked to have seen ORC included as a file format for testing. Regardless, I think this article shows that there are several file formats for a reason, and you should choose your file format based on most likely expected use. For example, Avro or Parquet for “write-only” systems or Kudu for larger-scale analytics.

Comments closed

Lightweight Statistics Profiling

Published 2017-02-14 by Kevin Feasel

Arun Sirpal looks at trace flag 7412 in SQL Server 2016 SP1:

According to documentation online (https://msdn.microsoft.com/en-us/library/mt791503.aspx) it states that the new query execution statistics profiling infrastructure dramatically reduces performance overhead of collecting per-operator query execution statistics.

Read on, as Arun might have discovered a bug in it.

Comments closed

HTDELETE Wait Type

Published 2017-02-14 by Kevin Feasel

Joey D’Antoni troubleshoots a query with excessive HTDELETE waits:

Ultimately I think any thought of the readable secondary having a vastly different plan was a red herrings. Statistics are going to be the same on both instances, and if there were a missing statistic on the secondary, SQL Server would create it in TempDB. Anyway, columnstore indexes don’t use statistics in the traditional sense.

Fortunately I was able to catch a query in the process of waiting on HTDELETE, so I no longer had to look for the needle in the haystack, and I could get to tuning the plans. I was able to grab the SELECT part of the query and generate an estimated plan on both the primary and secondary nodes. The plans were virtually the same on both nodes, with just a minor difference in memory grant between them.

Click through for the solution.

Comments closed

Searching For Powershell Functions In Scripts

Published 2017-02-14 by Kevin Feasel

Stuart Moore has a regex which looks for Powershell cmdlets used in a script:

Having had a quick bingle around for a prewritten regex example I didn’t come up with much that fitted the bill. So in the hope that this will help the next person trying to do this here they are:

Assumptions:

A PowerShell function name is of the form Word-Word
A PowerShell function definition is of the form “Function Word-Word”
A Powershell function call can be preceeded by a ‘|’,'(‘, or ‘ ‘
The script is written using a reasonable style, so there is a ‘ ‘ post call

Click through for the script.

Comments closed

Contributing To Open Source

Published 2017-02-14 by Kevin Feasel

Drew Furgiuele explains the process of contributing to an open source project, specifically dbatools:

Step 2: Check out the Github project page what’s in development.

Next, you should visit the project issues page. Here, you’ll find a list of all the features requested, in development and completed on the project. You can also filter the pages to look at current bugs or requested enhancements. Once you see what’s what, if you think of something you want to work on or help with, make a note of it. You should also look at examples of things in development and things that have been completed so you get an idea of the creative and technical process that goes into the project.

Step 3: Speak up!

Head on back to the Slack channel and let everyone know you want to help out. Someone (probably Chrissy) will add your Github account to to the project as a contributor so you can have things assigned to you. Congrats, you’re now on the hook!

I’m happy that the dbatools community has sprung up and hope it’s a gateway to further open source development in the SQL Server community.

Comments closed

Exporting The Plan Cache

Published 2017-02-14 by Kevin Feasel

Grant Fritchey wrote a Powershell script to export query plans from the plan cache into a .sqlplan file:

I’ve gone minimal on the script. I’m creating a connection to the local instance, defining a command, and returning the data into a data set. From there, since the data set consists of a single column, I’m walking through them all to export out to a file:

It’s Powershell, so it’s a short snippet.

Comments closed

MDX Calculation Duration

Published 2017-02-13 by Kevin Feasel

Chris Webb wants to know how long specific MDX calculations take:

In my last two blog posts (see here and here) I showed how to use the Calculation Evaluation and Calculation Evaluation Detailed Information trace events to work out which MDX calculations are evaluated when a query runs in Analysis Services Multidimensional. That’s very useful, but wouldn’t it be great if you could work out how long any single calculation contributes to the overall duration of a query? If you could, it would make performance tuning MDX calculations much easier.

While you can’t get an exact amount of time taken for each calculation, the good news is that it is possible to get a duration rounded to the next second if your calculation is evaluated in bulk mode.

It’s an interesting way of backing into an answer.

Comments closed

Power BI Quick Calc

Published 2017-02-13 by Kevin Feasel

Nicolo Grando talks about a couple of Power BI features, conditional formatting and Quick Calc:

If you select a text column you can:

Show only the first attribute

show only the last attribute

Count the attribute

Distinct count the attribute

If you select a numeric column you can:

Sum of value
find the minimum or maximum value
Average the value of column
standard deviation of value
Count the value
Distinct count of value
Variance fo value
Median of value

The screenshots are in Italian, but it’s pretty easy to get the context behind them.

Comments closed

RStudio Connect

Published 2017-02-13 by Kevin Feasel

Jen Underwood discusses RStudio Connect:

RStudio officially introduced the newest product in RStudio’s product lineup: RStudio Connect. RStudio Connect is a new publishing platform for R that allows analytics users to share Shiny applications, R Markdown reports, dashboards, plots, and more. This release adds an improved user experience for parameterized R Markdown reports, simple button-click publishing from the RStudio IDE, scheduled execution and distribution of reports, and more security policies include hybrid data connections. Essentially RStudio Connect eases enterprise deployment scenarios.

Between what Microsoft is doing with its old Revolution Analytics holdings and what RStudio is doing, this is a great time to be an enterprise R customer.

Comments closed

Temporal Tables

Published 2017-02-13 by Kevin Feasel

Alex Grinberg has a tutorial on temporal tables, including combining temporal tables with In-Memory OLTP:

Although the process of converting an In-Memory Optimized OLTP table to a system-versioned table is similar, there are some differences that we need to cover and demonstrate in this section.

You need to be aware of some specific details when converting the in-memory optimized table to the system-versioned table

Read on for those specifics.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Author: Kevin Feasel

Performance Testing Hadoop File Formats

Lightweight Statistics Profiling

HTDELETE Wait Type

Searching For Powershell Functions In Scripts

Contributing To Open Source

Step 2: Check out the Github project page what’s in development.

Step 3: Speak up!

Exporting The Plan Cache

MDX Calculation Duration

Power BI Quick Calc

RStudio Connect

Temporal Tables