Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article "Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement" in the Review of Economic Studies, 2014.
They estimate a structural econometric model and find that changes in contract design could substantially reduce the duration of road blockages and considerably increase total welfare, with only a minor increase in the risk that road construction firms face.
As part of his master's thesis at Ulm University, Claudius Schmid has created a nice, detailed RTutor problem set that lets you replicate the findings interactively. You learn a lot about the structure and outcomes of the currently used contracts, the theory behind better contract design, and how the structural model used to assess the quantitative effects can be estimated and simulated. At the same time, you can hone your general data science and R skills.
Click through for a couple of ways to get to this RTutor problem set and learn a bit about designing incentive contracts that change behavior. H/T R-Bloggers
This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.
The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. 😉
Click through for the code. This is using the venerable AT&T customer churn data set.
Or you can go with Amazon RDS (Relational Database Service). This is more of a managed service where Amazon looks after some aspects of your database server for you. In return, you give up some of the control you would have with your own server or VM. You can still pick the version of SQL Server you want installed, usually down to the cumulative update, though note that RDS normally lags behind the latest boxed release of SQL Server by about three months. RDS is what’s known as a PaaS (Platform as a Service) offering.
So, what do you give up and what do you gain? Here’s a quick summary of a few things I’ve noticed. This is not intended to be comprehensive and please bear in mind that AWS is a fast-moving beast – changes happen regularly.
There are some good tips here, so check them out.
There is a framing clause that I can use after the ORDER BY in the OVER clause. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. At least, this is what applies when you include an ORDER BY clause. Many of us do this, but still get confused by the FIRST_VALUE() and LAST_VALUE() functions.
What I really want is a frame that either runs from the current row to the end of the partition or includes all rows. If I modify my framing clause, I’ll get what I expect.
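To see the difference concretely, here is a minimal sketch that runs the same window functions through Python's built-in sqlite3 module instead of SQL Server. That substitution is an assumption made so the example is self-contained, and it needs SQLite 3.25 or later for window function support. The sales table and its rows are hypothetical; the ids are unique, so no two rows are RANGE peers and the default frame simply ends at the current row.

```python
import sqlite3
from typing import List, Tuple


def frame_demo() -> List[Tuple[int, int, int, int, int]]:
    """Compare FIRST_VALUE/LAST_VALUE under the default frame and explicit ROWS frames."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table with unique ids, so ORDER BY id has no ties.
        conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
        conn.executemany(
            "INSERT INTO sales (id, amount) VALUES (?, ?)",
            [(1, 10), (2, 20), (3, 30)],
        )
        return conn.execute(
            """
            SELECT id,
                   -- ORDER BY only: frame is UNBOUNDED PRECEDING .. CURRENT ROW
                   FIRST_VALUE(amount) OVER (ORDER BY id) AS first_default,
                   LAST_VALUE(amount)  OVER (ORDER BY id) AS last_default,
                   -- frame from the current row to the end of the partition
                   LAST_VALUE(amount)  OVER (
                       ORDER BY id
                       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                   ) AS last_to_end,
                   -- frame covering every row in the partition
                   LAST_VALUE(amount)  OVER (
                       ORDER BY id
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                   ) AS last_all
            FROM sales
            ORDER BY id
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in frame_demo():
        print(row)
```

Calling frame_demo() returns [(1, 10, 10, 30, 30), (2, 10, 20, 30, 30), (3, 10, 30, 30, 30)]. FIRST_VALUE() looks correct under the default frame, while LAST_VALUE() just echoes the current row until the frame is extended to the end of the partition.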
Click through for a demonstration.
As you can see, both T1 and T2 have a numeric column (INT type in this example) called val. The challenge is to match each row from T1 to the row from T2 with the lowest absolute difference between T2.val and T1.val. In case of ties (multiple equally close rows in T2), pick the first row in val ascending, keycol ascending order. That is, the row with the lowest value in the val column, and if you still have ties, the row with the lowest keycol value. The tiebreaker is used to guarantee determinism.
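The matching rule itself can be pinned down with a small brute-force sketch in Python. This is not one of the T-SQL solutions from the article. Rows are modeled as hypothetical (keycol, val) tuples, and the tiebreaker is expressed as a sort key of (absolute difference, val, keycol), so min() returns exactly the row the problem statement asks for.

```python
from typing import Dict, List, Tuple

# A row is (keycol, val); the tables and their contents are hypothetical.
Row = Tuple[int, int]


def closest_matches(t1: List[Row], t2: List[Row]) -> Dict[int, Row]:
    """Map each T1 keycol to the T2 row whose val is closest to T1.val.

    Ties on the absolute difference are broken by T2.val ascending,
    then T2.keycol ascending, so the result is deterministic.
    """
    if not t2:
        raise ValueError("T2 must contain at least one row")
    matches: Dict[int, Row] = {}
    for key1, val1 in t1:
        # Sort key (absolute difference, val, keycol); min() picks the smallest.
        matches[key1] = min(t2, key=lambda row: (abs(row[1] - val1), row[1], row[0]))
    return matches


if __name__ == "__main__":
    t1 = [(1, 3), (2, 5), (3, 9)]
    t2 = [(10, 2), (11, 2), (12, 4), (13, 7)]
    print(closest_matches(t1, t2))  # {1: (10, 2), 2: (12, 4), 3: (13, 7)}
```

For T1 val 3, three T2 rows are one away (2, 2 and 4), and the tiebreaker picks val 2 with the lower keycol 10. This scan is O(|T1| x |T2|); it exists to make the rule explicit. A set-based solution would more likely seek into an index on T2.val on each side of T1.val.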
Click through for the details as well as several solutions.
I’m naturally an introvert. If you and I have a conversation, it’s like a little taxi meter starts running. I may deeply, deeply enjoy the conversation and find it incredibly exciting, but it still taxes my energy levels. Small talk even more so. Imagine that every time someone chatted about the weather, you had to pay the same price as a Lyft ride to go 4 blocks. That’s how I feel about small talk.
That being said, we are still social creatures, and even introverts need human interaction, especially when you need to think through new situations and new problems. One of the things I realized attending PASS Summit is that I need social interaction to thrive. So now I spend a lot more time on Twitter and am part of a peer group of authors. I work down at the library whenever I have the chance.
When I did the work-from-home full-time thing, I sought out user groups to build up some technical skills and, more importantly, to get out of the house and talk to a group of people a couple of times a week. That paid off really well in the long run.
Speaking of paying off in the long run, check out Eugene’s BI newsletter.
Spaces in object names. The bane of my existence. Most relational databases will allow you to use spaces in object names, requiring anyone accessing that object to put brackets around the name. My rule of thumb is that if it’s something that I will interact with programmatically, it doesn’t get a space in the name. Spaces in object names tend to break things, so please stop doing this.
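To show how a space breaks an unquoted query, here is a minimal sketch in Python. The helper names needs_brackets, bracket_quote and space_demo, and the "Customer Orders" table, are hypothetical. The demo uses SQLite as a stand-in for SQL Server because it also accepts bracketed identifiers. The "safe name" rule is a deliberately conservative assumption: T-SQL regular identifiers allow a few more characters, and reserved words also need delimiting, which is not handled here.

```python
import re
import sqlite3
from typing import List, Tuple

# Conservative, hypothetical rule: letters, digits and underscores,
# not starting with a digit.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_brackets(name: str) -> bool:
    """True if the name falls outside the conservative plain-name rule."""
    return _PLAIN_NAME.fullmatch(name) is None


def bracket_quote(name: str) -> str:
    """Wrap a name in [ ], doubling any ] inside it (T-SQL's escaping rule)."""
    return "[" + name.replace("]", "]]") + "]"


def space_demo() -> Tuple[bool, List[Tuple[int]]]:
    """Run the same SELECT with and without brackets around a spaced name."""
    conn = sqlite3.connect(":memory:")
    try:
        table = "Customer Orders"
        conn.execute(f"CREATE TABLE {bracket_quote(table)} (id INTEGER)")
        conn.execute(f"INSERT INTO {bracket_quote(table)} (id) VALUES (1)")
        try:
            # Parsed as table "Customer" with alias "Orders", which does not exist.
            conn.execute(f"SELECT id FROM {table}")
            unquoted_works = True
        except sqlite3.OperationalError:
            unquoted_works = False
        quoted_rows = conn.execute(f"SELECT id FROM {bracket_quote(table)}").fetchall()
        return unquoted_works, quoted_rows
    finally:
        conn.close()


if __name__ == "__main__":
    print(needs_brackets("Customer Orders"), needs_brackets("CustomerOrders"))
    print(space_demo())
```

Running it prints True False, then (False, [(1,)]). The unquoted query does not fail because of a bad character. The parser quietly reads the second word as a table alias, which is exactly the kind of breakage that makes spaces in names painful.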
My favorite naming convention oddity is the idea that if something has “Number” in the name, it’s never a number.
It’s time for our annual salary survey to find out what data professionals make. You fill out the survey, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary.
The anonymous survey closes Sunday, January 6, 2019. The results will be completely open source, and shared with the community for your analysis.
I like this survey so much that I delivered a talk at PASS Summit making heavy use of it.