Category: Learning

2022 Data Professional Salary Survey

Brent Ozar wraps up another year of surveying:

Every year, I run a salary survey for folks in the database industry. This year, I was especially curious to see the results to find out whether salaries went up. Anecdotally, I’ve seen a lot of people jumping ship to new companies due to the Great Resignation – but what does the data actually show? Let’s find out.

Click through to grab a copy of the survey and get analyzing.

An Introduction to BugLab

Miltos Allamanis and Marc Brockschmidt take us through a new paper:

Finding and fixing bugs in code is a time-consuming, and often frustrating, part of everyday work for software developers. Can deep learning address this problem and help developers deliver better software, faster? In a new paper, Self-Supervised Bug Detection and Repair, presented at the 2021 Conference on Neural Information Processing Systems (NeurIPS 2021), we show a promising deep learning model, which we call BugLab. BugLab can be taught to detect and fix bugs, without using labelled data, through a “hide and seek” game.

I think there’s a lot more research required before we get to the point where this is useful in practical circumstances, but it’s exciting to see.

The Data Professional Salary Survey

Brent Ozar has re-opened the data professional salary survey:

We’re data people, you and I. We make better decisions when we work off data instead of feelings.

It’s time for our annual salary survey to find out what data professionals make. You fill out the data, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary:

Click through for the link to the survey. It looks like most of the questions have stayed the same this year, which is good for longer-term analysis.

Decision-Making with Bayes’s Theorem

Bill Schmarzo lays out a framework to classify decision-making:

In my blog “Making Informed Decisions in Imperfect Situations”, I discussed the importance of properly and objectively framing the decision that we seek to make and how that impacts the data that we gather (and ignore) in an effort to make an informed decision. That is:

Are you trying to gather data to determine the right decisions or are you gathering data to support the decision that you have already made? 

In that blog, I introduced two tools that can help us make informed decisions using the best available data, even when that data might be incomplete, conflicting, and/or distorted by others. 

Read the whole thing.
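
The underlying math here is just Bayes's Theorem: P(H|E) = P(E|H) × P(H) / P(E). As a minimal sketch, with made-up numbers purely for illustration, here's how a prior belief in a hypothesis shifts in Java once a piece of evidence arrives:

    public class BayesUpdate {
        public static void main(String[] args) {
            // Hypothetical numbers, purely for illustration.
            // H: "this initiative will succeed"; E: the pilot went well.
            double priorH = 0.30;         // P(H): belief before seeing the pilot
            double probEGivenH = 0.80;    // P(E | H): chance of a good pilot if H is true
            double probEGivenNotH = 0.20; // P(E | not H): chance of a good pilot anyway

            // Total probability of seeing the evidence, across both worlds.
            double probE = probEGivenH * priorH + probEGivenNotH * (1 - priorH);

            // Bayes's Theorem: P(H | E) = P(E | H) * P(H) / P(E)
            double posteriorH = probEGivenH * priorH / probE;

            System.out.printf("P(H | E) = %.3f%n", posteriorH); // prints 0.632
        }
    }

The framing question in the quote above is really about which evidence you allow into that calculation in the first place.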

API Servers and the Importance of Learning

Steve Jones tells a story:

While talking with a client recently about their performance challenges, I was relieved to find that the database wasn’t the problem. Instead, their API server was overloaded by the number of calls taking place in their application. While the database did provide the backing for the API calls, there was a fair amount of caching. However, as they’d moved to microservices, more and more of the interaction between modules was taking place as a network call to a single server, which became overloaded.

Steve goes on to the broader point of people freely donating their time and expertise to explain how to solve problems. And the above is a major problem of moving to microservices: everything gets several times chattier. The biggest tricks I have there are to embrace asynchronous processing via queues and to ensure that messages passed back and forth are as small as possible, which means getting rid of the idea of passing big lists of fully-hydrated objects around.
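
To make the small-messages idea concrete, here's a minimal sketch in Java. The OrderChangedEvent type and its fields are hypothetical, and an in-process queue stands in for a real broker (Kafka, RabbitMQ, Service Bus, etc.); the point is the shape of the message, not the transport:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SmallMessages {
        // Instead of serializing a fully-hydrated Order (customer, line items,
        // addresses, history...), pass only the identifiers the consumer needs
        // to look the rest up, ideally from a cache.
        record OrderChangedEvent(long orderId, String changeType) {}

        public static void main(String[] args) throws InterruptedException {
            // In-process stand-in for a real message broker.
            BlockingQueue<OrderChangedEvent> queue = new LinkedBlockingQueue<>();

            // Producer side: publish a tiny event and move on, asynchronously.
            queue.put(new OrderChangedEvent(42L, "STATUS_UPDATED"));

            // Consumer side: take the event and fetch details only if needed.
            OrderChangedEvent event = queue.take();
            System.out.println("Handling order " + event.orderId() + ": " + event.changeType());
        }
    }

A consumer that already has the order cached never needs the big payload at all, which is exactly where that "fair amount of caching" pays off.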

SQL Saturday Orlando Notes

Andy Warren reflects on hosting the only in-person SQL Saturday in the United States this year:

We held an in-person SQLSaturday here in Orlando last weekend (Oct 30th). We didn’t organize one last year, there was just too much risk and too much uncertainty, so it felt good to return to something close to normal this year, even in scaled back fashion. I’ve got a lot of notes to share about how we ran the event this year!

The journey started at the end of 2020. We wrote up our plan for 2021 knowing there were a lot of unknowns, but hoping things would improve enough to resume doing the things we used to do as a local group and that included organizing a SQLSaturday. As this year has progressed attendance at our virtual meetings dropped, as did our enthusiasm for having them. Enthusiasm matters a lot when it comes to volunteer work and while I know many of you like the virtual format, it’s just not what I want to do. That narrowed the option list to having an in-person SQLSaturday or not doing one at all, not a great range of choices.

Read on for a lot of details. I appreciate how transparent Andy has always been with respect to running events like this, and if you're thinking about a SQL Saturday in 2022, definitely read Andy's post.

Also, the event was small, but it was really nice to get to see people I hadn’t seen in years, so thank you, Andy, for putting on the show.

Eliminate the DeWitt Clause

Justin Olsson and Reynold Xin throw down the gauntlet:

At Databricks, we often use the phrase “the future is open” to refer to technology; it reflects our belief that open data architecture will win out and subsume proprietary ones (we just set a new official record on TPC-DS). But “open” isn’t just about code. It’s about how we as an industry operate and foster debate. Today, many companies in tech have tried to control the narrative on their products’ performance through a legal maneuver called the DeWitt Clause, which prevents comparative benchmarking. We think this practice is bad for customers and bad for innovation, and it’s time for it to go. That’s why we are removing the DeWitt Clause from our service terms, and calling upon the rest of the industry to follow.

One way to tell how influential you are is how many legal terms are named after you, which I'm pretty sure makes Dr. DeWitt the Steve Tasker of the database industry. So put David DeWitt in the Data Platform Hall of Fame.

And good of Databricks to eliminate their DeWitt Clause. Vendors put the clause in ostensibly to prevent rigged or invalid comparisons between products, but there's a much better way to do that: publish the benchmark configuration and allow peer validation. If you put out garbage numbers (including by accident, because you didn't know the right way to do something), people are smart enough to catch that. And if people aren't willing to publish the process, call for them to do it, and if they still don't, ignore the results. 100 times out of 100, that's the right way to do it…assuming that you're looking for the truth and not just trying to hide inferiorities in your product *cough* Oracle *cough*.

Thinking like an Escalation Engineer

Stacy Gray shares stories:

“You new?” asked with an amused grin.

“Yes,” I replied floating 2 inches off the ground with a huge, toothy smile.

“Which team?”

“SQL!”

“Good luck.”

I glanced at the badge.  It was blue.  My opportunity to get some secret, inside wisdom!

“I want to become a blue badge.  Do you have any advice on that?” The elevator doors opened.

“Solve your own cases,” was the reply.

Read on for stories, advice, and more.

A Primer on Kafka Streams

Bill Bejeck has an introduction to Kafka Streams:

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write your own code to process your data using the vanilla Kafka clients, but the Kafka Streams equivalent will have far fewer lines, because it’s declarative rather than imperative. As a library, Kafka Streams lets you create a standalone application that can be run anywhere that can connect to a Kafka broker, whether that’s a laptop or a hefty cloud server. You just need to provide it with the host and port name of a broker. Combining Kafka Streams with Confluent Cloud grants you even more processing power with very little code investment.

Click through for a description as well as a whole series of embedded videos.
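
To give a sense of scale, here's a minimal Kafka Streams topology in Java that filters one topic into another; the topic names and broker address are placeholders:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class PriorityOrderFilter {
        public static void main(String[] args) {
            // All the app needs to find the cluster: a broker's host and port.
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "priority-order-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Declarative: describe what happens to each record, not how to
            // poll, deserialize, or commit offsets.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> orders = builder.stream("orders");
            orders.filter((key, value) -> value != null && value.contains("priority"))
                  .to("priority-orders");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

The equivalent logic against the vanilla consumer and producer clients would spell out the polling loop, deserialization, and offset management by hand, which is where the "far fewer lines" claim comes from.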
