
Author: Kevin Feasel

Frequency Tables

Mala Mahadevan shows how to generate a frequency table in T-SQL and in R:

My results are as below. I have 1000 records in the table. This tells me that I have 82 occurrences of age cohort 0-5, so 8.2% of my dataset is from this bracket; 82 is again the cumulative frequency, since this is the first record, and 8.2 the cumulative percent. For the next bracket, 06-12, I have 175 occurrences (17.5%), 257 occurrences of ages 12 and below, and 25.7% of my data in this age bracket or below. And so on.

Click through for the T-SQL and R scripts.
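
If you just want the general shape of the T-SQL approach, here is a minimal sketch using window functions; the table and column names (dbo.People, AgeCohort) are placeholders rather than Mala's actual schema.

```sql
-- A minimal sketch, not Mala's script: frequency, percent, cumulative frequency,
-- and cumulative percent per age cohort.  dbo.People and AgeCohort are placeholders.
SELECT
    AgeCohort,
    COUNT(*) AS Frequency,
    100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS Pct,
    SUM(COUNT(*)) OVER (ORDER BY AgeCohort
                        ROWS UNBOUNDED PRECEDING) AS CumulativeFrequency,
    100.0 * SUM(COUNT(*)) OVER (ORDER BY AgeCohort
                                ROWS UNBOUNDED PRECEDING)
          / SUM(COUNT(*)) OVER () AS CumulativePct
FROM dbo.People
GROUP BY AgeCohort
ORDER BY AgeCohort;
```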


Backup And Recovery With Hadoop

Tim Spann explains how to perform backup/recovery operations and disaster recovery using Hadoop:

You can mirror datasets with Falcon. Mirroring is a very useful option for enterprises and is well-documented. This is something that you may want to get validated by a third party. See the following resources:

Tim shows several recovery options, making it useful reading if you use Hadoop as a source system for anything (or if you can’t afford for it to be down for a 2-3 day period while you recover data).


Trace Flag Basics

Erin Stellato explains the basics behind trace flags in SQL Server:

Trace flag 1118 addresses contention that can exist on a particular type of page in a database, the SGAM page.  This trace flag typically provides benefit for customers that make heavy use of the tempdb system database.  In SQL Server 2016, you change this behavior using the MIXED_PAGE_ALLOCATION database option, and there is no need for TF 1118.

Trace flag 3023 is used to enable the CHECKSUM option, by default, for all backups taken on an instance.  With this option enabled, page checksums are validated during a backup, and a checksum for the entire backup is generated.  Starting in SQL Server 2014, this option can be set instance-wide through sp_configure (‘backup checksum default’).

The last trace flag, 3226, prevents the writing of successful backup messages to the SQL Server ERRORLOG.  Information about successful backups is still written to msdb and can be queried using T-SQL.  For servers with multiple databases and regular transaction log backups, enabling this option means the ERRORLOG is no longer bloated with BACKUP DATABASE and Database backed up messages.  As a DBA, this is a good thing: when I look in my ERRORLOG, I really only want to see errors, and I don’t want to scroll through hundreds or thousands of entries about successful backups.

Click through for more useful information, including a list of officially supported trace flags.
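
For reference, here is a hedged sketch of how these typically get set; whether you enable any of them at all, and whether you do it with DBCC TRACEON or as -T startup parameters, depends on your environment. YourDatabase is a placeholder name.

```sql
-- Enable TF 1118 and 3226 globally for the running instance.  Startup
-- parameters (-T1118 -T3226) are the usual way to make them survive a restart.
DBCC TRACEON (1118, 3226, -1);
DBCC TRACESTATUS (-1);   -- list the trace flags currently active

-- From SQL Server 2014 onward, backup checksums can be made the instance
-- default instead of relying on TF 3023.
EXEC sys.sp_configure 'backup checksum default', 1;
RECONFIGURE;

-- In SQL Server 2016+, the per-database replacement for TF 1118's behavior:
ALTER DATABASE YourDatabase SET MIXED_PAGE_ALLOCATION OFF;
```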


PFS Page Repair

Paul Randal explains why DBCC CHECKDB cannot repair Page Free Space pages:

PFS pages occur every 8088 pages in every data file, and each one stores a byte of information about itself and the following 8087 pages. The most important piece of information it stores is whether a page is allocated (in use) or not. You can read more about PFS pages and the other per-database allocation bitmaps in this blog post.

So why can’t they be repaired by DBCC CHECKDB, when all the other per-database allocation bitmaps can?

The answer is that the is-this-page-allocated-or-not information is not duplicated anywhere else in the database, and it’s impossible to reconstruct it in all cases.

In case you’re not particularly familiar with PFS pages, Paul has a blog post from 2006 describing GAM, SGAM, and PFS pages.


Using Azure Data Factory With Biml

Meagan Longoria has a multi-part series on using Biml to script Azure Data Factory tasks to migrate data from an on-prem SQL Server instance to Azure Data Lake Store.  Here’s part 1:

My Azure Data Factory is made up of the following components:

  • Gateway – allows ADF to retrieve data from an on-premises data source

  • Linked Services – define the connection string and other connection properties for each source and destination

  • Datasets – define a pointer to the data you want to process, sometimes defining the schema of the input and output data

  • Pipelines – combine the datasets and activities and define an execution schedule

Click through for the Biml.


Test User Generation With Powershell

Rob Sewell shows how to use dbatools to create test user accounts quickly:

Of course we can use any source for our users – a database, an Excel file, Active Directory – or even just type them in.

We can use the Add-SqlLogin cmdlet from the SqlServer module to add our users as SQL Logins, but at present we cannot add them as database users and assign them to a role.

Rob includes a demo script as well, thereby making it even easier.
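
For context, the piece Rob says the cmdlet does not cover yet is plain T-SQL; here is a hedged sketch of that gap, with placeholder names (YourDatabase, TestUser01, db_datareader).

```sql
-- The part that still has to happen per database: create a user for an
-- existing login and add it to a role.  All names here are placeholders.
USE YourDatabase;
GO
CREATE USER TestUser01 FOR LOGIN TestUser01;
ALTER ROLE db_datareader ADD MEMBER TestUser01;
```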


Automated Database Restoration And CHECKDB

Anthony Nocentino shows how to use dbatools to automate testing restoration of database backups and running DBCC CHECKDB against these restored backups:

Requirements

  1. Automation – Complete autopilot, no human interaction.
  2. Report job status – Accurate reporting in the event the job failed, the CHECKDB failed, or the restore failed.

Solution

  1. Use dbatools cmdlets for restore and CHECKDB operations
  2. Use SQL Agent Job automation, logging and alerting

So let’s walk through this implementation together.

You won’t get a turnkey solution from this blog post, but you will get a good process to follow.


Using DAX For Dynamic Grouping

Jason Thomas shows how to include an “All Others” member in a Power BI visual:

Requirement: The user wants a report with a column chart. The X axis will have Subcategory Name and the value will be the sum of Internet Sales. Along with this chart, the user will have a slicer where they can select the Subcategory Names. The column chart should “update,” showing one column for each selected subcategory and another column named “Others” with the summed amount of the unselected subcategories.

Basically, they wanted a dynamic group called “Others” and the members within this group should change based on what is selected on the slicer.

This would be a good time to show a visual representation of what the requirement means.

Click through for that visualization, as well as the solution.


Error Log Column Size

Cody Konior tests how long messages in the error log can be:

Stupid question… what’s the schema of a table with sys.sp_readerrorlog output? Well you might be surprised if you’re used to using nvarchar(max) or nvarchar(2048).

There’s a datetime (modern datetime2(3)) obviously. ProcessInfo is either “Server” or “spidxxxy” where xxx is an int (max of 11 characters including minus) and y is an optional single character suffix. But as for the text…

Let’s try to max it out!

Moral of the story:  keep those error messages as short as possible while still being meaningful.
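
If you want to poke at this yourself, here is a minimal sketch that captures the current error log into a table and measures the longest message; the column sizes below are deliberately generous placeholders, since the real limits are the point of Cody's post.

```sql
-- Capture the current SQL Server error log and measure the longest message text.
CREATE TABLE #ErrorLog
(
    LogDate     DATETIME      NOT NULL,
    ProcessInfo NVARCHAR(50)  NOT NULL,
    LogText     NVARCHAR(MAX) NOT NULL   -- generous placeholder size
);

INSERT INTO #ErrorLog (LogDate, ProcessInfo, LogText)
EXEC sys.sp_readerrorlog 0, 1;           -- 0 = current log, 1 = SQL Server error log

SELECT MAX(LEN(LogText)) AS LongestMessage
FROM #ErrorLog;

DROP TABLE #ErrorLog;
```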


HBase Compaction

Jitendra Bafna explains how HBase compaction works:

Compaction is a process by which HBase cleans itself. It comes in two flavors: minor compaction and major compaction.

Minor compaction is the process of combining a configurable number of smaller HFiles into one larger HFile. Minor compaction is very important because without it, reading particular rows requires many disk reads and can reduce overall performance.

Major compaction is a process of combining the StoreFiles of regions into a single StoreFile. It also deletes removed and expired versions. By default, major compaction runs every 24 hours and merges all StoreFiles into a single StoreFile. After compaction, if the new, larger StoreFile is greater than a certain size (defined by a property), the region will split into new regions.

Read on for more information about compaction and data locality, which is a totally different topic.
