Press "Enter" to skip to content

Author: Kevin Feasel

Extractors In Scala

Jyoti Sachdeva explains what extractors are in Scala and why they’re useful:

An extractor is an object that has an unapply method. It takes an object as an input and gives back arguments. Custom extractors are created using the unapply method. The unapply method is called extractor because it takes an element of the same set and extracts some of its parts. The apply method, also called injection, acts as a constructor: it takes some arguments and yields an element of a given set.
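
As a quick illustration of the apply/unapply pairing described above, here is a minimal sketch (the EmailAddress object and its fields are my own invention, not from the linked post):

```scala
object EmailAddress {
  // apply ("injection"): build an element of the set from its parts.
  def apply(user: String, domain: String): String = s"$user@$domain"

  // unapply ("extraction"): take an element of the set and pull its parts back out.
  def unapply(address: String): Option[(String, String)] =
    address.split("@") match {
      case Array(user, domain) => Some((user, domain))
      case _                   => None
    }
}

object ExtractorDemo extends App {
  val address = EmailAddress("jane.doe", "example.com") // calls apply

  // Pattern matching calls unapply behind the scenes.
  address match {
    case EmailAddress(user, domain) => println(s"user = $user, domain = $domain")
    case _                          => println("not an email address")
  }
}
```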

Click through for explanatory examples.


The Power Of Predicate Pushdown

Pedro Lopes explains how predicate pushdown helps improve performance on queries:

First, let’s define a few terms, so we can see how to detect whether we’re making good use of our indexes, as they relate to the queries running in our SQL Server.

  1. Whenever you submit a query to SQL Server, if it includes a JOIN and/or WHERE clause, that constitutes a row filtering pattern known as a predicate.
  2. The query optimizer can use that predicate to estimate how best to retrieve only the intended rows. The number of rows it expects to remain after the predicate has been applied surfaces in the query plan as the Estimated Number of Rows.
  3. When that estimated plan is executed, and you look at the actual execution plan, this surfaces as the Actual Number of Rows. Usually, a big difference between Estimated and Actual number of rows indicates a misestimation that may need to be addressed to improve performance: maybe you don’t have the right indexes in place?

These are the two row-related properties you had on every SQL Server plan up to SQL Server 2014.
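
Pedro's post is about SQL Server execution plans, but the core idea of pushing a predicate down is language-agnostic: apply the filter as early as possible so fewer rows flow through the rest of the plan. A rough conceptual sketch of that idea in Scala (the toy Customer/Order data is mine, not from the post):

```scala
object PushdownSketch extends App {
  // Toy "tables": orders joined to customers, then filtered on customer country.
  case class Customer(id: Int, country: String)
  case class Order(id: Int, customerId: Int, amount: Double)

  val customers = (1 to 1000).map(i => Customer(i, if (i % 10 == 0) "FR" else "US"))
  val orders    = (1 to 10000).map(i => Order(i, i % 1000 + 1, i.toDouble))

  // No pushdown: join everything first, apply the predicate afterwards,
  // so every matched order/customer pair flows through the join.
  val filteredLate = orders
    .flatMap(o => customers.find(_.id == o.customerId).map(c => (o, c)))
    .filter { case (_, c) => c.country == "FR" }

  // "Pushdown": apply the predicate while reading customers, so far fewer
  // rows ever reach the join.
  val frCustomers   = customers.filter(_.country == "FR").map(c => c.id -> c).toMap
  val filteredEarly = orders.flatMap(o => frCustomers.get(o.customerId).map(c => (o, c)))

  // Same result either way; the difference is how many rows the later steps touch.
  println(filteredLate.size == filteredEarly.size)
}
```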

Read on to learn how predicate pushdown can make queries faster.


Generating Index Drop And Create Statements

Drew Furgiuele says “Game over, man, game over!” to indexes:

The premise is simple: it will generate a series of DROP and then CREATE INDEX commands for every index. The process is a little more complex in practice, but at a high level it:

  1. Creates a special schema to house a temporary object,
  2. Creates a special stored procedure to run the code,
  3. Calls said stored procedure,
  4. Generates a bunch of PRINT statements that serve as the output (along with new line support for readability),
  5. Cleans up the stored procedure it generated,
  6. And finally deletes the schema it created.

Nifty.

Click through for the script, as well as a bonus PowerShell script. Because hey, it’s only six lines of code.


Instance-To-Instance Migrations With Start-DbaMigration

Chrissy LeMaire touts one of the best parts of dbatools:

dbatools is such a fun toolset to work on, but specifically, I can no longer live without Start-DbaMigration. Even in smaller shops, migrations are often required and they are always a lot of work.

At least they used to be, before I built the command that started it all: Start-DbaMigration. Start-DbaMigration is an instance-to-instance migration command that migrates just about everything. It’s really a wrapper that simplifies nearly 30 other copy commands, including Copy-DbaDatabase, Copy-DbaLogin, and Copy-DbaSqlServerAgent.

Also a bonus shout out to dbachecks.


The Impact Of Auto-Growth Settings For Log Files

Jamie Wick has started a series on log file growth, beginning with a look at auto-growth settings:

For the data file, the impact can be illustrated in the following chain of events:

  1. A new 1MB data file is created that contains no information. (ie. a 1MB data file containing 0MB of data)
  2. Data is written to the data file until it reaches the file size. (ie. the 1MB data file now contains 1MB of data)
  3. The SQL server suspends normal operations to the database while the data file is grown by 1MB. (ie. the data file is now 2MB and contains 1MB of data) If Instant File Initialization (IFI) is enabled, the file is expanded and database operations resume. If IFI is not enabled, the expanded part of the data file must be zeroed before db operations resume, resulting in an additional delay.
  4.  Once the data file has been grown successfully, the server resumes normal database processing. At this point the server loops back to Step 2.

The server will continue this run-pause-run-pause processing until the data file reaches its Maxsize, or the disk becomes full. If the disk that the data file resides on has other files on it (ie. the C drive, or a disk that is shared by several databases), there will be other disk write events happening between the data file growth events. This may cause the data file expansion segments to be non-contiguous, increasing the file fragmentation and further decreasing the database performance.
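
As a back-of-the-envelope illustration of why a tiny fixed auto-growth increment hurts, the sketch below (mine, not from Jamie's post; the sizes and increments are made up) just counts how many grow-and-pause events it takes to reach a given file size:

```scala
object AutoGrowthSketch extends App {
  // Count how many growth (pause) events a file goes through on its way
  // from startMb to targetMb when it grows by a fixed incrementMb each time.
  def growthEvents(startMb: Int, targetMb: Int, incrementMb: Int): Int = {
    var size   = startMb
    var events = 0
    while (size < targetMb) {
      size += incrementMb
      events += 1
    }
    events
  }

  println(growthEvents(1, 10240, 1))   // 1MB increments to reach 10GB: 10239 pauses
  println(growthEvents(1, 10240, 512)) // 512MB increments: 20 pauses
}
```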

This is all to answer the question, “What’s the problem with missing a few log backups?”


Gathering Info On Tables

Raul Gonzales has a script which provides useful information for tables and columns:

Useful information it provides at table level:

  • tableType, to identify HEAP tables
  • row_count, to identify tables with plenty of rows or no rows at all
  • TotalSpaceMB, to identify big tables in size
  • LastUserAccess, to identify tables that are not used
  • TotalUserAccess, to identify tables that are heavily used
  • TableTriggers, to identify tables that have triggers

Useful information it provides at column level:

  • DataType-Size, to identify supersized, incorrect, or deprecated data types
  • Identity, to identify identity columns
  • Mandatory-DefaultValue, to identify NULL/NOT NULL columns or columns with default constraints
  • PrimaryKey, to identify primary key columns
  • Collation, to identify columns whose collation might differ from the database’s
  • ForeignKey-ReferencedColumn, to identify foreign keys and the table.column they reference

Click through for the script.


Installing SQL Server On Ubuntu 18.04

Max Trinidad shows us how to install SQL Server on Ubuntu 18.04, though he leads off with a warning:

This has been an issue for some time until now. I found the following link that helped me install SQL Server on the latest Ubuntu 18.04:

https://askubuntu.com/questions/1032532/how-do-i-install-ms-sql-for-ubuntu-18-04-lts

But there are a few missing steps which, once filled in, help ease the burden of errors. At the same time, the information is a little outdated.

But, it works with the following adjustments.

Please Understand!!  This is NOT approved by Microsoft.  Use this method for Test Only!!

I’m waiting somewhat impatiently for Microsoft and Hortonworks to support Ubuntu 18.04.


Partitioning Data For Performance Improvement In R

John Mount shares a few examples of partitioning and parallelizing data operations in R:

In this note we will show how to speed up work in R by partitioning data and process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages.

For each of the above packages we speed up work by using wrapr::execute_parallel which in turn uses wrapr::partition_tables to partition un-related data.frame rows and then distributes them to different processors to be executed. rqdatatable::ex_data_table_parallel conveniently bundles all of these steps together when working with rquery pipelines.
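
The wrapr/rqdatatable machinery is R-specific (R generally needs separate processes for this), but the underlying pattern — partition independent rows by a key, work on each partition concurrently, then recombine — carries over to other runtimes. A minimal JVM-flavoured sketch in Scala using Futures (the toy Row data and the per-group sum are mine, for illustration only):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PartitionDemo extends App {
  // Toy rows: each belongs to one of eight independent groups.
  case class Row(group: Int, value: Double)
  val rows = (1 to 100000).map(i => Row(i % 8, i.toDouble))

  // Step 1: partition the rows by a key (roughly what partition_tables does).
  val partitions: Vector[(Int, Seq[Row])] = rows.groupBy(_.group).toVector

  // Step 2: process each partition concurrently (roughly execute_parallel).
  val perGroup = Future.traverse(partitions) { case (g, part) =>
    Future(g -> part.map(_.value).sum)
  }

  // Step 3: collect the per-partition results back into one structure.
  val result: Map[Int, Double] = Await.result(perGroup, Duration.Inf).toMap
  println(result)
}
```

Threads suffice on the JVM where R needs whole processes; the partition-then-combine shape is the part that transfers.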

There were some interesting results.  I expected data.table to be fast, but did not expect dplyr to parallelize so well.


Sharing R Notebooks

Hanyu Cui and Hossein Falaki show how to share a notebook using RMarkdown:

RMarkdown is the dynamic document format RStudio uses. It is normal Markdown plus embedded R (or any other language) code that can be executed to produce outputs, including tables and charts, within the document. Hence, after changing your R code, you can just rerun all code in the RMarkdown file rather than redo the whole run-copy-paste cycle. And an RMarkdown file can be directly exported into multiple formats, including HTML, PDF,  and Word.

Click through for the demo.


Pipe-Friendly Functions In R

William Doane gives some tips on writing pipe-friendly functions in R:

Languages that don’t begin by supporting pipes often eventually implement some version of them. In R, the magrittr package introduced the %>% infix operator as a pipe operator, which is most often pronounced as “then”. For example, “take the mtcars data.frame, THEN take the head of it, THEN…” and so on.

For a function to be pipe friendly, it should at least take a data object (often named .data) as its first argument and return an object of the same type—possibly even the same, unaltered object. This contract ensures that your pipe-friendly function can exist in the middle of a piped workflow, accepting the input from its left-hand side and passing along output to its right-hand side.
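
The contract William describes — take the data first, give back the same shape — is what makes chaining work in any language. For comparison, a small Scala sketch using the standard library’s pipe from scala.util.chaining (Scala 2.13+; the dropMissing/center functions are invented for illustration):

```scala
import scala.util.chaining._

object PipeDemo extends App {
  // "Pipe-friendly" functions: the data is the first (here, only) argument
  // and the return type matches the input type, so calls chain cleanly.
  def dropMissing(xs: Seq[Option[Double]]): Seq[Double] = xs.flatten
  def center(xs: Seq[Double]): Seq[Double] = {
    val mean = xs.sum / xs.size
    xs.map(_ - mean)
  }

  val raw: Seq[Option[Double]] = Seq(Some(1.0), None, Some(4.0), Some(7.0))

  // `pipe` plays the role of magrittr's %>%:
  // "take raw, THEN drop the missing values, THEN center what's left".
  val centered = raw.pipe(dropMissing).pipe(center)
  println(centered) // List(-3.0, 0.0, 3.0)
}
```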

Click through for a couple of examples.  H/T R-Bloggers
