What’s New In Ambari 2.7

Paul Codding and Kat Petre share some of the new features in Ambari 2.7:

With this release, we wanted to make Ambari more enjoyable to use every day, simplify finding and using our API, and unblock teams managing very large clusters.  Here is a preview of a few features we’re excited to share with you:
Revamped UI – Our Hortonworks Product Design Team has created a great new UI design system, and we’ve taken the time to update Ambari to use the new styles, colors, and navigation concepts from it.

I’ll have to rebuild my home Hadoop cluster someday, so I’ll have to check this out when I do.

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber:

If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will see this output even if your machine already has the randomForest package installed. This is because the package is being installed to the R session inside the container, which is distinct from the one running the code shown here.)
All docker calls in AzureContainers, like the one to build the image, return the actual docker commandline as the cmdline attribute of the (invisible) returned value. In this case, the commandline is docker build -t bos_rf . Similarly, the push() method actually involves two Docker calls, one to retag the image, and the second to do the actual pushing; the returned value in this case will be a 2-component list with the command lines being docker tag bos_rf deployreg.azurecr.io/bos_rf and docker push deployreg.azurecr.io/bos_rf.

I love this confluence of technologies and at the same time get a “descent into madness” feeling from the sheer number of worlds colliding.

The Power BI Visual Vocabulary

Jason Thomas has put together a great Power BI report:

Note that there are some R/Python visuals and currently, R/Python visuals are not available on “Publish to Web”. Hence, I have just used a checkbox on the top of the report to show the images wherever R visuals are used (can be identified by the colorful border around the image). However, you can download the source file and then publish it to your tenant, and see the actual R visuals there in a browser by unselecting the checkbox. You can also look at the pbix file and see the source code behind the visuals.

Definitely check this out.  Jason did a great job.

UI Versus UX

Rajeev Thakur explains the differences between UI and UX and how they fit together:

Most of the people swap UI with UX and that’s the most common mistake. Let’s understand their difference.
UI mainly focuses on the look of your application. It is the process of improving the interactivity and presentation of your web or mobile app. Being a UI designer you need to have creative and convergent thinking, so you can improve its look and contribute to better user interaction with the application. With the unique visualization of UI designer, we can have every screen page, buttons and other visual elements of the app look intuitive. UI Designer must also have basic knowledge of the tools in order to create a better app UI plus keeping in mind the user’s requirements. Tools that designer basically use are Adobe XD, Adobe Photoshop, Illustrator and sketch.
Whereas UX is all about creating the basic skeleton of any application. It works on wireframing of an application and structuring all its components appropriately to create the user flow. The thought process of a UX Designer must be both a mix of critical and creative thinking. UX design is more of a human-centric design an enhancement of user’s experience is the main goal here. User’s needs and research play a significant role here. Usability testing must also be done frequently after the basic skeleton of the app has been prepared because that helps in cross-checking all the components.

Read the whole thing.

Fill Factor And The Performance Tradeoff

Tara Kizer explains the performance tradeoff when setting fill factor for an index:

There are workloads where frequent page splits are a problem. I thought I had a system like this many years ago, so I tested various fill factor settings for the culprit table’s clustered index. While insert performance improved by lowering the fill factor, read performance drastically got worse. Read performance was deemed much more critical than write performance on this system. I abandoned that change and instead recommended a table design change since it made sense for that particular table.

Click through for a demo.

Check Those SSMS Warnings

Arthur Daniels recommends you review any warning signs in execution plans:

Some things in life we ignore. For example, the “check engine” light. That’s just there as a suggestion, right?
But when you’re performance tuning, you can’t afford to ignore the warning signs. I can’t count the number of times that I’ve found the issue with a query by looking at the warnings.

The example Arthur uses involves implicit conversion, but there are several important warnings SSMS bubbles up.

Finding The Slow Query In A Procedure

Erin Stellato shows us how we can find the slowest query within a stored procedure:

Figuring out exactly what causes slow performance for a stored procedure can sometimes feel like trying to unravel a ball of Clark Griswold’s Christmas lights.  It’s not uncommon to see procedures with hundreds, even thousands of lines of code.  You may have been told which stored procedure runs slow by a user or manager, or you might have found it by looking in SQL Server DMVs.  Either way, once you have detected the offending procedure, where do you start?
If you’re running SQL Server 2016, one option is Query Store.  Query Store captures individual queries, but it also captures the object_id, so you can find all the queries that are associated with an object to determine which ones are problematic.

This is quite useful when you have to tune a procedure you’ve never seen before, and as you go to open that procedure, the vertical scroll bar keeps getting smaller and smaller.

Using Replication With SQL Server In Containers

Andrew Pruski shows us how we can build up snapshot replication with SQL Server in containers:

Last week I saw a thread on twitter about how to get replication setup for SQL Server running in a container. Now I know very little about replication, it’s not an area of SQL that I’ve had a lot of exposure to but I’m always up for figuring stuff out (especially when it comes to SQL in containers).
So let’s run through how to set it up here.
First, create a dockerfile to build an image from the SQL Server 2019 CTP 2.2 image with the SQL Server Agent enabled: –

Now that Andrew is a replication expert…

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project:

Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.
They estimate a structural econometric model and find that changes in contract design could substantially reduce the duration of road blockages and largely increase total welfare at only minor increases in the risk that road construction firms face.
As part of his Master Thesis at Ulm University, Claudius Schmid has generated a nice and detailed RTutor problem set that allows you to replicate the findings in an interactive fashion. You learn a lot about the structure and outcomes of the currently used contracts, the theory behind better contract design and how the structural model to assess the quantitative effects can be estimated and simulated. At the same time, you can hone your general data science and R skills.

Click through to a couple of ways to get to this RTutor project and learn a bit about building incentive contracts to modify behavior.  H/T R-Bloggers

Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter:

This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.
The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure, you can get the gist, even without the book. 😉

Click through for the code.  This is using the venerable AT&T customer churn data set.


December 2018
« Nov