If you’re looking a guide to making publication-ready data visualizations in R, check out the BBC Visual and Data Journalism cookbook for R graphics. Announced in a BBC blog post this week, it provides scripts for making line charts, bar charts, and other visualizations like those below used in the BBC’s data journalism.
I’m still reading through the linked cookbook but it’s a good one.
This is a query I would run when I needed to quickly make bulk changes to Reporting Services subscriptions. It’s part of an “emergency fix” toolkit.
Maybe a DB has went down and I have to quickly suspend specific subscriptions or locate Agent jobs for subscriptions. This was always a quick starting point.
I could take the generated Start, Enable and Disable commands and record these in tickets or email threads to demonstrate actions taken. There are other ways to make bulk changes to SSRS subscriptions involving custom queries but this can be run immediately, I don’t have to tailor a WHERE clause first. I also wrote previously on managing subscription failures.
Click through for the script.
There are far more options when using Faker. Looking at the official documentation you’ll see the list of different data types you can generate as well as options such as region specific data.
Go have fun trying this, it’s a small setup for a large amount of time saved.
These types of tools can be great for generating a bunch of data but come with a couple of risks. One is that in the fake addresses Rich shows, ZIP codes don’t match their states at all, so if your application needs valid combos, it can cause issues. The other problem comes from distributions: generated data often gets created off of a uniform distribution, so you might not find skewness-related problems (e.g., parameter sniffing issues) strictly in your test data.
That said, easily generating test data is powerful and I don’t want to let the good be the enemy of the great.
What does this mean? To have a supported WSFC-based configuration (doesn’t matter what you are running on it – could be something non-SQL Server), you need to pass validation. xFailOverCluster does not allow this to be run. You can create the WSFC, you just can’t validate it. The point from a support view is that the WSFC has to be vetted before you create it. Could you run it after? Sure, but you still have no proof you had a valid configuration to start with which is what matters. This is a crucial step for all AGs and FCIs, especially since AGs do not check this whereas the installation process for FCIs does.
If you look at MSFT_xCluster, you’ll see what I am saying is true. It builds the WSFC without a whiff of Test-Cluster. To be fair, this can be done in non-Azure environments, too, but Microsoft givs you warnings not to do that for good reason. I understand why Microsoft did it this way. There is currently no tool, parser, or cmdlet to examine the output of Test-Cluster results. This goes back to why building WSFCs is *very* hard to automate.
I’m not sure how easy some of these fixes would be, but they’d definitely be nice.
Microsoft has decent instructions on installing SQLCMD on
linux. My current company blocks access to external yum repositories, so I followed the “Offline” installation. There’s two packages to install:
msodbcsql– aka Microsoft’s ODBC driver for SQL Server
mssql-tools – SQLCMD
Read on for the step-by-step instructions.
And there you have it, exact timings for each of the Power Query M queries associated with each of the tables in your dataset. Remember that the time taken by each Power Query M query will include the time taken by any other queries that it references, and it does not seem to be possible to find out the amount of time taken by any individual referenced query in Profiler.
There is a lot more interesting information that can be found in this way: for example, dataset refresh performance is not just related to the performance of the Power Query M queries that are used to load data; time is also needed to build all of the structures inside the dataset by the Vertipaq engine once the data has been returned, and Profiler gives you a lot of information on these operations too.
Check it out if you do any work with Power BI.
No matter how long you work with something, you can always find something that you never knew before. I found one about foreign keys this week.
I was reviewing SQL scripts for coworkers and I noticed that the foreign keys were written without referencing the parent table’s column. But the script didn’t fail and created the foreign keys correctly. So how did this work?
I don’t think I’ve ever seen this syntax either. I’m not a big fan of it for the same reason that Deborah isn’t a big fan of it: adding a couple more words does clarify your intent, and so add the words.
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best.
Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions.
Read on for a few tips. These are aimed at people asking questions but they’re sound advice in general.
This bootcamp is a free online course for everyone who wants to learn hands-on machine learning and AI techniques, from basic algorithms to deep learning, computer vision and NLP. However, the course language is German only, but for every chapter I did, you will find an English R-version here on my blog (see below for links).
Right now, the course is in beta phase, so we are happy about everyone who tests our content and leaves feedback. Also, not the entire curriculum is finished yet, we will update and extend the course during the next months. If there are specific topics you’d like to have us cover, just let us know!
If you understand German and want to learn about data science, check this out and leave feedback.
Another thing to keep in mind here is that you’re only going to load your calendar table once, so if it takes two minutes to do, who really cares? The version I have should run reasonably fast–I calculated 726 years on slow hardware in 19 seconds and fast hardware in 11 seconds. I’m sure you can play code golf and get it done faster, but that’s probably not a good use of your time
. Whatyou want to sweat instead is query time: how long is it taking to access this data?
Click through for a script.