Any DBA who specializes in optimization knows that hardware offers around 15% overall opportunity for improvement. My favorite quote from Cary Millsap, “You can’t hardware your way out of a software problem,” is quite fitting, too. A hardware upgrade can offer a quick increase in performance, only for the problem to seemingly return after a period of time. As we’ve discussed in previous posts, the natural life of a database is growth: growth in data, growth in processing, growth in users. This growth requires more resources, and if the environment is not running as efficiently as possible, more resources will always be required.
Someday I will write my “No, the DBA isn’t going anywhere” opus, but today is not that day. Anyhow, this is a good post for anyone worried that automation will kill the DBA.
The basics of the “Learn SQL Server Starter Pack”:
- SQL Server 2016 Developer Edition (DE)
  - You can get the Developer Edition for…wait for it…FREE!
  - Out in the wild you’ll see mostly Standard Edition (SE) or Enterprise Edition (EE). The great thing about DE is that it is identical to EE (it has all the features) in every aspect except that it cannot be licensed on a production machine; it may only be used for TEST or DEV environments. For home lab purposes you can use it as your development environment and have access to all the features to learn on!
  - Download it here – SQL Server Downloads
  - While you’re there, get used to reading the release notes and what is new in the version. You don’t need to understand everything right away, but get used to the jargon and how Microsoft describes its features.
- Windows Server 2016 Evaluation Edition
  - Download Windows Server 2016 Evaluation Edition
  - You can evaluate the software for 180 days, after which it must be activated. At that point you can register for another evaluation and start again.
- VirtualBox
  - VirtualBox is a free, simple, and reliable virtualization tool. You’ll be able to do a lot to get started and build up your virtualization knowledge with this.
  - Download the latest version of VirtualBox
  - You don’t need to know much about hypervisors and such – VirtualBox is very easy to learn, with good documentation.
Evaluation versions are good for learning because they force you to tear down and rebuild your environment!
Jeff then links to a number of free resources to help out with the learning experience.
Now you may be wondering how these errors are identified and where the advice comes from.
Simple: they are provided by the Scala community. On the official Scala Clippy website there is a “Contribute” tab where you can post your own errors. Each error is parsed first; once parsing succeeds, you can add your advice, which is then reviewed and, if accepted, added to their database, which will in turn be beneficial to others.
Take a close look at the screenshots; I missed it at first, but there’s helpful advice above the error message.
As it turns out, SQL Server Management Studio (SSMS) can display many types of emoji, and this is the key.
Keeping in mind that the whole thing was not written with performance or best practice in mind, I’d like to introduce the world to the very first action adventure game 100% written and played in SQL Server!
The goal here is to have a game which helps teach some basics of development practices. Interesting concept.
I’d summarize the two “competing” curricula as follows:
- Base R first: teach syntax such as [], loops and conditionals, data types (numeric, character, data frame, matrix), and built-in functions like tapply. Possibly follow up by introducing dplyr or data.table as alternatives.
- Tidyverse first: Start from scratch with the dplyr package for manipulating a data frame, and introduce others like ggplot2, tidyr and purrr shortly afterwards. Introduce the %>% operator from magrittr immediately, but skip syntax like $ or leave it for late in the course. Keep a single-minded focus on data frames.
I’ve come to strongly prefer the “tidyverse first” educational approach. This isn’t a trivial decision, and this post is my attempt to summarize my opinions and arguments for this position. Overall, they mirror my opinions about ggplot2: packages like dplyr and tidyr are not “advanced”; they’re suitable as a first introduction to R.
I think this is the better position of the two, particularly for people who already have some experience with languages like SQL.
the write-ahead log
This chapter also helped me better understand what’s going on with write-ahead logs! A write-ahead log is different from log-structured storage; in fact, both kinds of storage engines can use a write-ahead log.
Recently at work the team that maintains Splunk wrote a post called “Splunk is not a write-ahead log”. I thought this was interesting because I had never heard the term “write-ahead log” before!
There are a few different topics in here, all of which are important to understand how databases work.
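The core idea is small enough to sketch. Here is a minimal, illustrative write-ahead log in Python (the class name `SimpleWAL` and the JSON record format are my own inventions, not anything from the chapter): every change is appended and flushed to an append-only log file *before* it is applied to the in-memory structure, so a restart can rebuild state by replaying the log.

```python
import json
import os


class SimpleWAL:
    """A toy key-value store with a write-ahead log (sketch only)."""

    def __init__(self, path):
        self.path = path
        self.state = {}   # the "real" data structure
        self._replay()    # crash recovery: re-apply every logged write

    def set(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.path, "a") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the log entry durable first
        self.state[key] = value   # only then apply the change

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                self.state[rec["key"]] = rec["value"]
```

Opening a fresh `SimpleWAL` over the same file rebuilds identical state, which is essentially what a database does on restart after a crash.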
Why does help exist?
When you think about it, why is there even a function called help?
As far as I’m aware it’s basically the same as Get-Help except it automatically pipes the output to | more so we get pages rather than a wall of text.
Is there more that we can do with Get-Help though? Is there a way that we can return the examples only? Syntax only? Parameters only?
Is there not a way that we can do such things?!
Read on to find out if there is.
Although most analytics applications today still leverage older data warehouse and OLAP technologies on-premises, the pace of the cloud shift is increasing significantly. Infrastructure is getting better and is almost invisible in mature markets. Cloud fears are subsiding as more organizations witness the triumphs of early adopters. Instant, easy cloud solutions continue to win the hearts and minds of non-technical users. Cloud also accelerates time to market, allowing for innovation at faster speeds than ever before. As data and analytics professionals, be sure to make time to learn a variety of cloud and hybrid analytics tools.
Exploring novel technologies across various ecosystems in the cloud world is usually as simple as spinning up a cloud image or service to get started. There are literally zillions of free and low cost resources for learning. As you dive into a new world of data, you will find common analytics architectures, design patterns, and types of technologies (hybrid connectivity, storage, compute, microservices, IoT, streaming, orchestration, database, big data, visualization, artificial intelligence, etc.) being used to solve problems.
It’s worth reading the whole thing.
Raise your standards as high as you can live with, avoid wasting your time on routine problems, and always try to work as closely as possible at the boundary of your abilities. Do this because it is the only way of discovering how that boundary should be moved forward.
Readers of this blog post are just as likely as anyone to fall victim to the classic maxim, “When all you have is a hammer, everything is a nail.” I remember a job interview where my interrogator appeared uninterested in talking further after I wasn’t able to solve a certain optimization using Lagrange multipliers. The mindset isn’t uncommon: “I have my toolbox. It’s worked in the past, so everything else must be irrelevant.”
There’s some good advice in here.
It’s still early days for machine learning. The bounds and guidelines about what is possible or likely are still unknown in a lot of places, and bigger projects that test more of those limitations are more likely to fail. As a fledgling data engineer, especially in the industry, it’s almost certainly the more prudent course to go for the “low-hanging fruit” — easy-to-find optimizations that have real world impact for your organization. This is the way to build trust among skeptical colleagues and also the way to figure out where those boundaries are, both for the field and for yourself.
As a personal example, I was once on a project where we worked with failure data from large machines with many components. The obvious and difficult problem was to use regression analysis to predict the time to failure for a given part. I had some success with this, but nothing that ever made it to production. However, a simple clustering analysis that grouped machines by the frequency of replacement across all parts had some lasting impact: it enabled the organization to “red flag” machines that fell into the “high replacement” group, where users may have been misusing the machines, and to bring those users in for training.
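The “red flag” step in an analysis like this can be surprisingly simple. As a rough sketch (the data, threshold, and function name here are entirely made up, not from the project described), you can flag any machine whose replacement count sits well above the fleet average:

```python
from statistics import mean, stdev

# Hypothetical part-replacement counts per machine over the same period.
replacements = {
    "m1": 3, "m2": 4, "m3": 2, "m4": 21, "m5": 3, "m6": 19, "m7": 4,
}


def red_flag(counts, z=1.0):
    """Flag machines more than z standard deviations above the fleet
    mean replacement count (a crude stand-in for a clustering step)."""
    mu = mean(counts.values())
    sigma = stdev(counts.values())
    return sorted(m for m, c in counts.items() if c > mu + z * sigma)


flagged = red_flag(replacements)  # the two heavy-replacement machines
```

Even this back-of-the-envelope version illustrates the point of the article: the modest, explainable analysis is often the one that survives contact with the organization.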
There’s some good advice. Also read the linked Dijkstra note; even in bullet point form, he was a brilliant guy.