Press "Enter" to skip to content

Author: Kevin Feasel

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project:

Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article "Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement," published in the Review of Economic Studies in 2014.
They estimate a structural econometric model and find that changes in contract design could substantially reduce the duration of road blockages and considerably increase total welfare at only minor increases in the risk that road construction firms face.
As part of his master's thesis at Ulm University, Claudius Schmid has generated a nice and detailed RTutor problem set that allows you to replicate the findings in an interactive fashion. You learn a lot about the structure and outcomes of the currently used contracts, the theory behind better contract design, and how the structural model used to assess the quantitative effects can be estimated and simulated. At the same time, you can hone your general data science and R skills.

Click through for a couple of ways to get to this RTutor project and to learn a bit about building incentive contracts to modify behavior.  H/T R-Bloggers


Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter:

This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.
The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. 😉

Click through for the code.  This is using the venerable AT&T customer churn data set.
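
If you just want a feel for the Keras side of things before the book arrives, here is a minimal sketch of a binary churn classifier in R.  The feature matrix x_train, the 0/1 label vector y_train, and all hyperparameters are placeholders of my own, not anything from the chapter:

    library(keras)

    # x_train: numeric matrix of customer features; y_train: 0/1 churn flag
    model <- keras_model_sequential() %>%
      layer_dense(units = 16, activation = "relu",
                  input_shape = ncol(x_train)) %>%
      layer_dropout(rate = 0.3) %>%
      layer_dense(units = 1, activation = "sigmoid")

    model %>% compile(
      optimizer = "adam",
      loss = "binary_crossentropy",
      metrics = "accuracy"
    )

    # hold out 20% of the training data for validation
    history <- model %>% fit(x_train, y_train,
                             epochs = 20, batch_size = 32,
                             validation_split = 0.2)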


Tips On Running SQL Server In RDS

Matthew McGiffen shares some tips on running SQL Server in Amazon RDS:

Or you can go with Amazon RDS (Relational Database Service).  This is more of a managed service where Amazon looks after some aspects of your database server for you. In return you give up some of the control you would have with your own server or VM. You can still pick the version of SQL Server you want installed, usually down to which cumulative update you want – though note that RDS normally lags behind the latest box version of SQL by 3 months or so. RDS is what’s known as a PaaS offering (Platform as a Service).
So, what do you give up and what do you gain? Here’s a quick summary of a few things I’ve noticed. This is not intended to be comprehensive, and please bear in mind that AWS is a fast-moving beast – changes happen regularly.

There are some good tips here, so check them out.
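
To make one of those trade-offs concrete: native BACKUP DATABASE ... TO DISK isn't available on RDS.  Instead, assuming the native backup/restore option is enabled on your instance, you back up to S3 through Amazon's stored procedures.  A quick sketch, with placeholder database and bucket names:

    -- back the database up to an S3 bucket (names are placeholders)
    EXEC msdb.dbo.rds_backup_database
        @source_db_name = N'MyDatabase',
        @s3_arn_to_backup_to = N'arn:aws:s3:::my-backup-bucket/MyDatabase.bak';

    -- check on the status of the backup task
    EXEC msdb.dbo.rds_task_status @db_name = N'MyDatabase';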


Window Functions And Default Frames

Steve Jones elaborates on the default frame that a window function has:

There is a framing clause that I can use after the ORDER BY in the OVER clause. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. At least, this is what you get when you include an ORDER BY clause. Many of us do this, but still get confused by the LAST_VALUE() and FIRST_VALUE() functions.
What I really want is a complete set of data, which either starts from the current row and runs to the end of the partition or includes all values. If I modify my framing clause, I’ll get what I expect.

Click through for a demonstration.
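
For a quick taste of the difference, here's a sketch against a hypothetical dbo.Sales table.  With the default frame, LAST_VALUE() simply returns the current row's value; an explicit frame running to the end of the window returns the true last value:

    SELECT
        SaleDate,
        Amount,
        -- default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
        -- the "last" value the frame can see is the current row itself
        LAST_VALUE(Amount) OVER (ORDER BY SaleDate) AS LastValueDefault,
        -- explicit frame through the end of the window gets the real last value
        LAST_VALUE(Amount) OVER (ORDER BY SaleDate
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS LastValueExplicit
    FROM dbo.Sales;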


Finding The Closest Numeric Match

Itzik Ben-Gan has a T-SQL puzzle for us:

As you can see, both T1 and T2 have a numeric column (INT type in this example) called val. The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.val and T1.val is the lowest. In case of ties (multiple matching rows in T2), match the top row based on val ascending, keycol ascending order. That is, the row with the lowest value in the val column, and if you still have ties, the row with the lowest keycol value. The tiebreaker is used to guarantee determinism.

Click through for the details as well as several solutions.
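
One straightforward (if not necessarily fastest) approach, sketched against the T1 and T2 tables as Itzik defines them, is CROSS APPLY with a TOP (1) subquery encoding the tiebreakers exactly as stated:

    SELECT T1.keycol, T1.val, A.keycol AS matchkeycol, A.val AS matchval
    FROM dbo.T1
    CROSS APPLY
    (
        -- closest T2.val; ties broken by val ascending, then keycol ascending
        SELECT TOP (1) T2.keycol, T2.val
        FROM dbo.T2
        ORDER BY ABS(T2.val - T1.val), T2.val, T2.keycol
    ) AS A;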


Lessons From Being Self-Employed

Eugene Meidinger shares some hard-learned lessons from being self-employed:

I’m naturally an introvert. If you and I have a conversation, it’s like a little taxi meter starts running. I may deeply, deeply enjoy the conversation and find it incredibly exciting, but it still taxes my energy levels. Small talk even more so. Imagine that every time someone chatted about the weather, you had to pay the same price as a Lyft ride to go 4 blocks. That’s how I feel about small talk.
That being said, we are still social creatures, and even introverts need human interaction. Especially so when you need to think through new situations, new problems. One of the things I realized attending PASS Summit is that I need social interaction to thrive. So now I spend a lot more time on Twitter and am part of a peer group of authors. I work down at the library whenever I have the chance.

When I did the work-from-home full-time thing, I sought out user groups to build up some technical skills and, more importantly, to get out of the house and talk to a group of people a couple times a week.  That paid off really well in the long run.

Speaking of paying off in the long run, check out Eugene’s BI newsletter.


Naming Is Hard

Tim Mitchell calls out some bad naming practices:

Spaces in object names. The bane of my existence. Most relational databases will allow you to use spaces in object names, requiring anyone accessing that object to put brackets around the name. My rule of thumb is that if it’s something that I will interact with programmatically, it doesn’t get a space in the name. Spaces in object names tend to break things, so please stop doing this.

My favorite naming convention oddity is the idea that if something has “Number” in the name, it’s never a number.
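
A hypothetical example which commits both sins at once: the space forces brackets on everyone downstream, and the "number" is, naturally, a string:

    CREATE TABLE dbo.[Sales Detail]
    (
        [Order Number] VARCHAR(20) NOT NULL  -- a "Number" you'll never do math on
    );

    -- every query now pays the bracket tax
    SELECT [Order Number]
    FROM dbo.[Sales Detail];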


Take The Data Professional Salary Survey

Brent Ozar has a new edition of the Data Professional Salary Survey:

It’s time for our annual salary survey to find out what data professionals make. You fill out the data, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary:

Take the Data Professional Salary Survey now.

The anonymous survey closes Sunday, January 6, 2019. The results will be completely open source, and shared with the community for your analysis.

I like this survey so much that I delivered a talk at PASS Summit making heavy use of it.


Donating To The R Foundation

Mark Niemann-Ross explains how you can donate to the R Foundation:

I benefit from the work of the R Foundation. They oversee the language, but also encourage a healthy ecosystem. CRAN happens because of them. Updates to R happen because of them. useR! happens because of them. Every day, you and I are the recipients of some part of their time.
The least we can do is show them some appreciation. If you point your web browser at https://www.r-project.org/foundation/donations.html you’ll find a convenient (and surprisingly inexpensive) place to express your appreciation. As an individual, you can send these kind folks twenty-five euros to tell them you’re in favor of what they do.

But be sure to read the whole thing, especially if you are an American who wants the donation to be tax-deductible.  I believe that earmarking in this case means adding special instructions on SIAA’s PayPal page.


Timing Means Of Groups With R

John Mount shares some performance measures pitting data.table against various dplyr methods for calculating grouped means:

In this reproduction attempt we see:
– The dplyr time is around 0.05 seconds. This is about 5 times slower than claimed.
– The dplyr sum()/n() time is about 0.2 seconds, about 5 times faster than claimed.
– The data.table time is around 0.004 seconds. This is about three times as fast as the dplyr claims, and over ten times as fast as the actual observed dplyr behavior.

Read the whole thing.  If you want to replicate it yourself, check out the RMarkdown file.
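
If you'd like to run a quick version of the comparison yourself, here's a minimal sketch; the data sizes and the use of microbenchmark are my choices, not John's setup:

    library(dplyr)
    library(data.table)
    library(microbenchmark)

    # simulated data: one grouping column, one value column
    n <- 1e6
    d <- data.frame(g = sample(1e5, n, replace = TRUE), v = runif(n))
    dt <- as.data.table(d)

    microbenchmark(
      dplyr_mean  = d %>% group_by(g) %>% summarize(m = mean(v)),
      dplyr_sum_n = d %>% group_by(g) %>% summarize(m = sum(v) / n()),
      data_table  = dt[, .(m = mean(v)), by = g],
      times = 10
    )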
