MRAppMaster Errors Running MapReduce Jobs

I have a post looking at potential causes when PolyBase MapReduce jobs are unable to find the MRAppMaster class:

Let me tell you about one of my least favorite things I like to see in PolyBase:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

This error is not limited to PolyBase but is instead an issue when trying to run MapReduce jobs in Hadoop. There are several potential causes, so let’s cover each of them as they relate to PolyBase and hopefully one of these solves your issue.

Click through for four potential solutions to what ails you.

Sizing Azure SQL Database

Arun Sirpal takes us through finding the right size for Azure SQL Database:

Do you want to identify the correct Service Tier and Compute Size ( was once known as performance level) for your Azure SQL Database? How would you go about it? Would you use the DTU (Database Transaction Unit) calculator? What about the new pricing model vCore? How would you translate you current on-premises workload to the cloud?

It can be a form of trial and error especially if you are new to this but I really do recommend trying out the PowerShell script that you can access once you have installed  DMA – Database Migration Assistant.

Read on to see how to run this tool and potentially save some money.

Pivoting Performance Counter Data

Dave Bland shows how you can build a dynamic pivot to see performance counter data over a stretch of time:

The next step is to write the code to capture the counter values and insert the data it the temporary table created above.  Because we need to capture the values over a period of time, the WAITFOR DELAY is used.  In this case the delay is 10 seconds, although you can change this to suit your needs. Of course, don’t forget to increment the counter variable. You will need to determine what counters you would like to capture.  Notice in the WHERE clause, we are looking for an instance_name of ” or ‘_total’.  This will allow the code to only capture one row for each counter.  The number 10 is the number of times we want to capture the counter values.  If you want to capture the data more frequently, simpley modify the number of seconds in the WAITFOR DELAY.  Here is link to my post in this topic, WAITFOR.

Dynamic pivoting in SQL is unnecessarily difficult, especially compared to languages like R.

Cleaning Up After Yourself in Azure Data Factory

Rayis Imayev shows how you can automatically delete old files in Azure Data Factory:

File management may not be at the top of my list of priorities during data integration projects. I assume that once I learn enough about sourcing data systems and target destination platform, I’m ready to design and build a data integration solution between two or more connecting points. Then, a historical file management process becomes a necessity or a need to log and remove some of the incorrectly loaded data files. Basically, a step in my data integration process to remove (or clean) such files would be helpful. 

Click through to see how to do this.

Reading From a Secondary with Azure Data Studio

Mohammad Darab shows how you can access a readable secondary using Azure Data Studio:

Recently, I had to use Azure Data Studio to access a application intent read only secondary replica. I had to use Azure Data Studio because I was using a Mac. I usually use SSMS on my Windows machines. If you want to connect with the “applicationintent=readonly” property via SQL Server Management Studio, you do so by typing it out in the “Additional Connection Parameters” as shown in the screenshot below:

Since I am fairly new to Azure Data Studio I was fumbling my way around to find the equivalent setting. And I finally found it…

Read on to see how you can set this in Azure Data Studio.

Building a DMV Diagnostic Queries Notebook

Gianluca Sartori shows how you can use dbatools and Powershell to build a Jupyter notebook in Azure Data Studio for Glenn Berry’s DMV scripts:

For presentations, it is fairly obvious what the use case is: you can prepare notebooks to show in your presentations, with code and results combined in a convenient way. It helps when you have to establish a workflow in your demos that the attendees can repeat at home when they download the demos for your presentation.

For troubleshooting scenarios, the interesting feature is the ability to include results inside a Notebook file, so that you can create an empty Notebook, send it to your client and make them run the queries and send it back to you with the results populated. For this particular usage scenario, the first thing that came to my mind is running the diagnostic queries by Glenn Berry in a Notebook.

Obviously, I don’t want to create such a Notebook manually by adding all the code cells one by one. Fortunately, PowerShell is my friend and can do the heavy lifting for me.

This type of scenario is one of the best ones I see for database administrators: consistent, documented troubleshooting guides. Oh, and you can save results off if you need to review them later. This has the potential to be a killer feature for Azure Data Studio.

String or Binary Data and Associated Bug

Brent Ozar looks at the “String or binary data would be truncated” improvement:

Don’t leave this trace flag enabled.
There’s at least one bug with it as of today on SQL Server 2017 CU13: table variables will throw errors saying their contents are being truncated even when no data is going into them. Andrew Pruski reports:

Special shout out to three of my co-workers on finding that issue. I had nothing to do with it but will take credit nonetheless.

Categories

March 2019
MTWTFSS
« Feb Apr »
 123
45678910
11121314151617
18192021222324
25262728293031