Press "Enter" to skip to content

Curated SQL Posts

Diagnosing Database Corruption

Jeff Mlakar has started a series on database corruption.  His first post involves finding corruption:

The cause lies in layers below SQL Server. The most common are hardware faults; in particular, issues with the I/O Subsystem. Any component in the I/O Subsystem can fail and be the cause of database corruption: disks, controllers, CPU, memory, network switch, network cables, SAN, etc.

Database corruption cannot entirely be prevented. It is not a matter of if but rather when.

Disks go bad. So do NICs, cables, routers, and everything else physical below the SQL Server Instance. This is why it is important to know that we cannot entirely prevent corruption – only deal with and mitigate it.

Click through for a few ways to find potential corruption.
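
If you want a quick starting point before reading, a minimal sketch of the kind of check involved looks something like this (YourDatabase is just a placeholder, and Jeff's post covers the options in far more depth):

-- Full consistency check; NO_INFOMSGS limits the output to actual errors.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Pages that have already raised 823/824-style I/O errors are logged in msdb.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;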


Power BI Log Files

Kellyn Pot’vin-Gorman shows us how to access Power BI log and trace files:

We’ve now identified the process, the amount of memory allocated to perform a task captured in the log, start time and the duration.  The information in these log files can assist when diagnosing if Power BI desktop crashes, but the data collected is quite rudimentary.

If you shut down Power BI Desktop, the PBIDesktop* log file writes to the startup file, which was once empty; it then empties and saves off the timestamp of the program's exit.

The Microsoft Mashup file has much of the same information, but includes deeper level processing work by Power BI, such as work done in the Query Editor or when we create a measure or new column/table.

In the three examples from the file below, you can see a compile, a save and then an evaluate task.

There’s some useful information here for debugging.


Plotting ML Results In R

Bernardo Lares shows off the plots he creates in R to compare ML models:

Split and compare quantiles

This parameter is the easiest to sell to the C-level guys. “Did you know that with this model, if we chop the worst 20% of leads we would have avoided 60% of the frauds and only lose 8% of our sales?” That’s what this plot will give you.

The math behind the plot might be a bit foggy for some readers so let me try and explain further: if you sort all your observations / people / leads from the lowest to the highest score, then you can literally, for instance, select the top 5 or bottom 15% or so. What we do now is split all those “ranked” rows into similar-sized buckets to get the best bucket, the second best one, and so on. Then, if you split all the “Goods” and the “Bads” into two columns, keeping their buckets’ colours, we still have it sorted and separated, right? To conclude, if you’d say that the worst 20% of cases (all from the same worst colour and bucket) were to take an action, then how many of each label would that represent on your test set? There you go!

Read on to see what else he uses and how you can build it yourself.
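
If you want to try the bucketing idea against your own data, here is a rough T-SQL sketch of the same concept (Bernardo's code is in R; the dbo.scored_leads table and its score and label columns are hypothetical stand-ins):

-- Rank every scored row and split the ranking into ten similar-sized buckets.
WITH ranked AS
(
    SELECT
        lead_id,
        score,
        label,                                           -- 'Good' or 'Bad'
        NTILE(10) OVER (ORDER BY score DESC) AS bucket   -- 1 = best-scored decile, 10 = worst
    FROM dbo.scored_leads
)
SELECT
    bucket,
    COUNT(*) AS rows_in_bucket,
    SUM(CASE WHEN label = 'Bad'  THEN 1 ELSE 0 END) AS bad_count,
    SUM(CASE WHEN label = 'Good' THEN 1 ELSE 0 END) AS good_count
FROM ranked
GROUP BY bucket
ORDER BY bucket;

Summing bad_count and good_count across the worst couple of buckets gives you the "cut the worst 20% of leads, avoid 60% of the fraud" style of statement from the quote.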


SQL Operations Studio July Edition

Alan Yu announces a new version of SQL Operations Studio:

Highlights for this release include the following:

  • SQL Server Agent preview extension: job configuration support
  • SQL Server Profiler preview extension: improvements
  • Combine Scripts Extension
  • Wizard and Dialog Extensibility
  • Social content
  • Fix GitHub Issues

For complete updates, refer to the Release Notes.

Alan also has demos for each of these.  I still wish that they wouldn’t call their Extended Events viewer “Profiler” because that makes it harder for us to explain the difference between “good Profiler” and “bad Profiler.”


Custom Statistics Block Column Alteration DDL

Max Vernon demonstrates that custom statistics can prevent you from modifying a column:

Interestingly, if SQL Server has auto-created a stats object on a column, and you subsequently modify that column, you receive no such error. SQL Server silently drops the statistics object, and modifies the column. The auto-created stats object is *not* automatically recreated until a query is executed that needs the stats object. This difference in how auto-created stats and manually created stats are treated by the engine can make for some confusion.

Just one more thing to think about if you manually create statistics on tables.  But at least the error message is clear.
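
A minimal repro of the behavior might look like the following; the table and statistics names are mine, and the exact error text can vary a bit by version:

-- A manually created statistics object blocks the column alteration.
CREATE TABLE dbo.StatsDemo (SomeValue varchar(50) NOT NULL);
CREATE STATISTICS st_SomeValue ON dbo.StatsDemo (SomeValue);

-- Fails, complaining that the statistics object is dependent on the column.
ALTER TABLE dbo.StatsDemo ALTER COLUMN SomeValue varchar(100) NOT NULL;

-- Drop the statistics first and the same ALTER succeeds.
DROP STATISTICS dbo.StatsDemo.st_SomeValue;
ALTER TABLE dbo.StatsDemo ALTER COLUMN SomeValue varchar(100) NOT NULL;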


Power BI Helper 3.0

Reza Rad has a new version of Power BI Helper out:

It is a pleasure to announce the newest version of Power BI Helper, version 3.0 July 2018, with the great feature of exporting model documentation. The documentation part of the insight from Power BI Helper has always been in our backlog, but we haven’t had a chance to work on it. Gladly, now you can export the documentation to an HTML file. At the moment, the exported documentation has information about all tables in the model, all measures, and all columns and calculated columns in each table, with possible expressions and descriptions. If you’d like to learn more about Power BI Helper, read this page.

Read on to see what Power BI Helper has in terms of documentation.


SQL Server Backup Restoration With LOADHISTORY

Kenneth Fisher explains what the LOADHISTORY option means when you run a RESTORE VERIFYONLY command:

So, first of all, it only works with RESTORE VERIFYONLY. RESTORE VERIFYONLY does some basic checking on a backup to make sure that it can be read and understood by SQL. Please note, it does not mean that the backup can be restored. It will check things like the checksum, available diskspace (if you specify a location), the header and that the backup set is actually complete and readable. Basically enough to see if it will start restoring, but it could still have errors later on.

As for what LOADHISTORY actually does? It causes you to write an entry to the restore history table. You can tell which record this is because the restore_type is set to a V. Really, the only benefit here (as I see it) is that you can do reporting on what backups you’ve verified.

Click through for a demo.
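
The overall shape of it, as a quick sketch (the backup path below is just a placeholder):

-- Verify the backup and record the verification in msdb.
RESTORE VERIFYONLY
FROM DISK = N'C:\Backups\YourDatabase.bak'
WITH LOADHISTORY;

-- Verification-only entries are the ones with restore_type = 'V'.
SELECT restore_date, destination_database_name, restore_type
FROM msdb.dbo.restorehistory
WHERE restore_type = 'V'
ORDER BY restore_date DESC;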


The Decorator Pattern

Nancy Jain explains the Decorator pattern:

The Decorator design pattern is a structural design pattern.

Structural design patterns focus on class and object composition, and the Decorator pattern is about adding responsibilities to objects dynamically.

The Decorator design pattern gives some additional responsibility to our base class.

This pattern is about creating a decorator class that can wrap the original class and provide additional functionality while keeping the class's method signatures intact.

I don’t use the Decorator pattern as often as I probably should, but it can be quite useful.


When Paging To Disk Became Cool Again

The Netflix Technology Blog walks us through how they do caching on SSDs:

Storing large amounts of data in volatile memory (RAM) is expensive. Modern disk technologies based on SSD are providing fast access to data but at a much lower cost when compared to RAM. Hence, we wanted to move part of the data out of memory without sacrificing availability or performance. The cost to store 1 TB of data on SSD is much lower than storing the same amount in RAM.

We observed during experimentation that RAM random read latencies were rarely higher than 1 microsecond whereas typical SSD random read speeds are between 100–500 microseconds. For EVCache our typical SLA (Service Level Agreement) is around 1 millisecond with a default timeout of 20 milliseconds while serving around 100K RPS. During our testing using the storage optimized EC2 instances (I3.2xlarge) we noticed that we were able to perform over 200K IOPS of 1K byte items thus meeting our throughput goals with latency rarely exceeding 1 millisecond. This meant that by using SSD (NVMe) we were able to meet our SLA and throughput requirements at a significantly lower cost.

NVMe isn’t as fast as RAM, but we are well beyond the days of spinning disk hard drives.


Data Lakes And Data Swamps

Randolph West talks about data lakes:

Internet companies including search engines (Google, Bing), social media companies (Facebook, Twitter), and email providers (Yahoo!, Outlook.com) are managing data stores measured in petabytes. On a daily basis these organizations handle all sorts of structured and unstructured data.

Assuming they put all their data in one repository, that could technically be thought of as a data lake. These organizations have adapted existing tools, and even created new technologies, to manage data of this magnitude in a field called big data.

The short version: big data is not a 100 GB SQL Server database or data warehouse. Big data is a relatively new field that came about because traditional data management tools are simply unable to deal with such large volumes of data. Even so, a single SQL Server database can allegedly be more than 500 petabytes in size, but Michael J. Swart warns us that if you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Incidentally, I’ll note that the term data swamp has a storied history here at Curated SQL.
