
Author: Kevin Feasel

Ignoring LoadGeneratorLocationError

Melissa Connors shows how to ignore LoadGeneratorLocationError errors in Visual Studio load tests:

I use Visual Studio for performance testing and overhead analysis with the SentryOne products. Currently, I have Microsoft Visual Studio Enterprise 2015 Version 14.0.25431.01 Update 3 installed. Since the first edition of 2015 (possibly even Visual Studio 2013), I’ve received a LoadGeneratorLocationError during each Load Test execution.

Since I am running the test locally, this error is noise. Furthermore, no one wants to see an error in an otherwise successful test. It simply ruins the final results report. In addition, when the Load Test was created, “On-premise Load Test” was selected, which makes this frustrating. Possibly more frustrating is that it’s called “On-premise” when you get started in the New Load Test Wizard.

Read on for the answer.


Unnecessary, Mandatory Work

Lukas Eder lays out one of the biggest performance drains today:

We’re using 8x as much memory in the database when doing SELECT * rather than SELECT film, rating. That’s not really surprising though, is it? We knew that. Yet we accepted it in many many of our queries where we simply didn’t need all that data. We generated needless, mandatory work for the database, and it does sum up. We’re using 8x too much memory (the number will differ, of course).

Now, all the other steps (disk I/O, wire transfer, client memory consumption) are also affected in the same way, but I’m skipping those.
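To make the contrast concrete, here is a minimal sketch of the two query shapes (the table and column names follow the quoted excerpt; the exact schema is an assumption):

```sql
-- Pulls every column through the buffer pool, the wire, and client memory,
-- whether or not the caller ever reads them:
SELECT *
FROM film;

-- Reads only what the caller actually needs:
SELECT film, rating
FROM film;
```

The narrower version also gives the optimizer a chance to satisfy the query from a covering index, something SELECT * rules out.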

This article is absolutely worth reading and sharing with developers.


Finding Physical Row Location

Wayne Sheffield shows how to find the physical location of a row in SQL Server:

Acquiring the physical location of a row

SQL Server 2008 introduced a new virtual system column: “%%physloc%%”. “%%physloc%%” returns the file_id, page_id and slot_id information for the current row, in a binary format. Thankfully, SQL Server also includes a couple of functions to split this binary data into a more useful format. Unfortunately, Microsoft has not documented either the column or the functions.

Read on for two functions you can use to format this data more nicely, as well as a short re-write Wayne did to improve performance of one of them.
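For reference, the two undocumented helpers are sys.fn_PhysLocFormatter and sys.fn_PhysLocCracker. A minimal sketch (dbo.SomeTable is a placeholder for any user table):

```sql
-- Formatted as a single '(file:page:slot)' string:
SELECT t.*,
       sys.fn_PhysLocFormatter(%%physloc%%) AS row_location
FROM dbo.SomeTable AS t;

-- Split into separate file_id / page_id / slot_id columns:
SELECT t.*,
       pl.file_id, pl.page_id, pl.slot_id
FROM dbo.SomeTable AS t
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS pl;
```

Since neither function is documented, treat them as debugging aids rather than something to bake into production code.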


replyr

John Mount shows off replyr, which is dplyr for remote, distributed data sets (think SparkR or sparklyr):

Suppose we had a large data set hosted on a Spark cluster that we wished to work with using dplyr and sparklyr (for this article we will simulate such using data loaded into Spark from the nycflights13 package).

We will work a trivial example: taking a quick peek at your data. The analyst should always be able, and willing, to look at the data.

It is easy to look at the top of the data, or any specific set of rows of the data.

Read on for more details.


R 3.3.3 Released

David Smith alerts us to R 3.3.3:

The R core group announced today the release of R 3.3.3 (code-name: “Another Canoe”). As the wrap-up release of the R 3.3 series, this update mainly contains minor bug-fixes. (Bigger changes are planned for R 3.4.0, expected in mid-April.) Binaries for the Windows version are already up on the CRAN master site, and binaries for all platforms will appear on your local CRAN mirror within the next couple of days.

For now, I’m holding out until R 3.4.0.


WebHCat

Jiang Mouren has a two-parter on WebHCat.  First, how it works:

SSH shell/Oozie Hive actions interact directly with YARN for Hive execution, whereas programs using the HDInsight Jobs SDK or ADF (Azure Data Factory) use the WebHCat REST interface to submit jobs.

WebHCat is a REST interface for remote job execution (Hive, Pig, Sqoop, MapReduce). WebHCat translates job submission requests into YARN applications and reports status based on the YARN application status. WebHCat’s results come from YARN, and troubleshooting some of them requires going to YARN.

Then, how to debug issues:

2.1.2. WebHCat times out

The HDInsight Gateway times out responses which take longer than two minutes, resulting in a “502 Bad Gateway” error. WebHCat queries YARN services for job status, and if YARN takes longer than that, the request may time out.

When this happens collect the following logs for further investigation:

/var/log/webhcat. Typical contents of the directory will look like this:

  • webhcat.log is the Log4j log to which the server writes logs
  • webhcat-console.log is the stdout of the server when it is started
  • webhcat-console-error.log is the stderr of the server process

NOTE: webhcat.log rolls over daily, so files like webhcat.log.YYYY-MM-DD will also be present. For logs in a specific time range, make sure the appropriate file is selected.

Because HDInsight doesn’t support WebHDFS, WebHCat is the primary method for cluster access, so it’s good to know.


Collapsable Subqueries

Dmitry Pilugin notes a new query simplification rule in SQL Server vNext:

You may see that in the first plan, there are two clustered index scans of the table SalesOrderDetail; however, the subquery “exists (select * from Sales.SalesOrderDetail f where f.SalesOrderID = d.SalesOrderID)” is exactly the same, just referenced twice.

In the second case, compiled under the next compatibility level, the double reference to the subquery is collapsed and we see only one reference to the SalesOrderDetail table and a more efficient plan, even though the query still has two subqueries against SalesOrderDetail.

In the third case, also compiled under the vNext level, we see the second branch with SalesOrderDetail again, but that is because we turned off the rule CollapseIdenticalScalarSubquery with the undocumented hint queryruleoff (which I originally described in my blog post).
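A sketch of the query shape in question (the outer query here is an assumption; Dmitry’s exact repro is in his post):

```sql
-- Two textually identical scalar subqueries; under vNext's
-- CollapseIdenticalScalarSubquery rule they compile to a single branch.
SELECT d.SalesOrderID,
       CASE WHEN EXISTS (SELECT * FROM Sales.SalesOrderDetail f
                         WHERE f.SalesOrderID = d.SalesOrderID)
            THEN 1 ELSE 0 END AS has_detail,
       CASE WHEN EXISTS (SELECT * FROM Sales.SalesOrderDetail f
                         WHERE f.SalesOrderID = d.SalesOrderID)
            THEN 1 ELSE 0 END AS has_detail_again
FROM Sales.SalesOrderHeader AS d;
```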

I think Dmitry has the expected use case nailed:  ORMs.  But I can see people writing (well, copy-pasting) similar queries, so maybe it’ll be useful in more contexts as well.


Checkpoints And Memory-Optimized Filegroups

Jack Li explains why, even without a memory-optimized table, you can see XTP checkpoints in your database:

What are checkpoint files?

They are data and delta files, as documented in Durability for Memory-Optimized Tables. When you use disk-based tables, the data is written to data files.  Even though data is stored in memory for memory-optimized tables, SQL Server still needs to persist data for disaster recovery.  Data for memory-optimized tables is stored in what we call checkpoint files.  The data file contains rows from insert and update operations; the delta file contains deleted rows.  Over time, these files can be ‘merged’ to increase efficiency.  Unneeded files can eventually be removed after the merge (but this can only happen after a log backup).

Click through for a demo script to see this in action.
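If you just want to peek at the checkpoint file metadata itself, sys.dm_db_xtp_checkpoint_files exposes it. A minimal sketch (assuming SQL Server 2014 or later and a database with a MEMORY_OPTIMIZED_DATA filegroup):

```sql
-- Run in the context of the database in question.
SELECT file_type_desc,       -- DATA or DELTA
       state_desc,           -- e.g., PRECREATED, UNDER CONSTRUCTION, ACTIVE
       file_size_in_bytes,
       relative_file_path
FROM sys.dm_db_xtp_checkpoint_files
ORDER BY file_type_desc, state_desc;
```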


Power BI Themes

Gogula Aralingam notes that Power BI can now support basic skinning:

The March 2017 update of Power BI Desktop comes with a preview of Themes. Right now it is in its simplest form: you manually create a JSON file with a few attributes that set basic color themes for your reports. So all you have to do is create a file that looks like this:
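The example file itself did not survive the excerpt, but based on the March 2017 theme preview format, a minimal theme looks something like this (the name and colors are arbitrary placeholders):

```json
{
    "name": "Example Theme",
    "dataColors": ["#31B6FD", "#4584D3", "#5BD078", "#A5D028"],
    "background": "#FFFFFF",
    "foreground": "#252423",
    "tableAccent": "#31B6FD"
}
```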

Click through for an example.  This isn’t a true fix for the lack of Color Vision Deficiency support, but you can plug in safe colors (for example, this article includes some) and skirt the issue until there’s real support.
