Press "Enter" to skip to content

Category: Administration

Azure AD On Azure SQL DB

Arun Sirpal shows how to set up Azure SQL Database to use Azure Active Directory accounts:

I think it is important to highlight a couple of points, more specifically around the requirement of ADALSQL.DLL and the proper setup of AD, which I will highlight below and reference some links for. Please do this, as it lays the foundation for you.

ADALSQL.DLL

You need ADALSQL.DLL, which is part of the latest SQL Server Management Studio (SSMS), to test access. The name stands for Active Directory Authentication Library for SQL Server.

This goes through some of the issues Arun had setting everything up and provides workarounds and explanations.
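
As a rough illustration (not from Arun’s post, and the account name is a placeholder), once the Azure AD admin is configured on the logical server, granting an Azure AD account access to a database usually comes down to creating a contained user from the external provider:

    -- Run in the target Azure SQL Database while connected as the Azure AD admin.
    -- The account name is hypothetical; substitute a real Azure AD user or group.
    CREATE USER [jane.doe@contoso.com] FROM EXTERNAL PROVIDER;

    -- Grant whatever access is appropriate for testing, e.g. read-only:
    ALTER ROLE db_datareader ADD MEMBER [jane.doe@contoso.com];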

Check Your Transactions

John Morehouse talks about a mistake he made:

The other day I had to update some records, in Production.  I’m a firm believer in using explicit transactions and double-checking things before committing a transaction.  This helps ensure things go as expected.  It also gives me a way to roll back the changes if they don’t.  It happens.

However, this means that I have to COMMIT said explicit transaction.  And not go to lunch without doing so.

Can you see my mistake?  I bet you can.

Fortunately, it sounds like it wasn’t a critical problem.  If you want to check for open transactions, Jack Vamvas has a couple methods.
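
If you just need a quick check, something along these lines works (standard commands, not necessarily the exact queries Jack uses):

    -- Any uncommitted transactions in the current session?
    SELECT @@TRANCOUNT AS OpenTransactionsInThisSession;

    -- Oldest active transaction in the current database
    DBCC OPENTRAN;

    -- All sessions with an open transaction, and when each transaction began
    SELECT s.session_id,
           s.login_name,
           t.transaction_id,
           t.transaction_begin_time
    FROM sys.dm_tran_session_transactions AS st
    JOIN sys.dm_tran_active_transactions AS t
        ON t.transaction_id = st.transaction_id
    JOIN sys.dm_exec_sessions AS s
        ON s.session_id = st.session_id;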

Finding Partition Boundaries

Kenneth Fisher shows how to find the min and max values for a partition:

So what does it do? Per BOL:

Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function in SQL Server 2016.

So it basically tells us which partition any given row is in. This can be particularly handy at times, for example if you want to know the min and max values of a column per partition.

Read on for a couple of scripts which use $PARTITION.
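
The general shape of such a script, assuming a partitioned table dbo.Orders with partition function pfOrders on OrderDate (both names made up for illustration), is something like:

    -- Min and max partitioning-column values per partition, plus a row count.
    -- $PARTITION.pfOrders(OrderDate) returns the partition number that a given
    -- OrderDate value maps to under partition function pfOrders.
    SELECT $PARTITION.pfOrders(OrderDate) AS PartitionNumber,
           MIN(OrderDate) AS MinOrderDate,
           MAX(OrderDate) AS MaxOrderDate,
           COUNT(*) AS RowsInPartition
    FROM dbo.Orders
    GROUP BY $PARTITION.pfOrders(OrderDate)
    ORDER BY PartitionNumber;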

Log Info DMF

Andrew Pruski looks at the new sys.dm_db_log_info dynamic management function in SQL Server 2017:

This new DMF is intended to replace the (not so) undocumented DBCC LOGINFO command. I say undocumented as I’ve seen tonnes of blog posts about it but never an official Microsoft page.

This is great imho; we should all be analyzing our database’s transaction log, and this will help us do just that. Now there are other undocumented functions that allow us to review the log (fn_dblog and fn_dump_dblog; be careful with the last one).

I’m glad to see this make the cut, as replacing informative / idempotent DBCC commands with DMVs makes it easier to query this data, particularly in conjunction with other DMVs or system tables.
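
For example, a rough VLF summary per database (my sketch, not Andrew’s) is a simple CROSS APPLY away:

    -- SQL Server 2017+: VLF count and total log size per database,
    -- the sort of thing that used to require looping over DBCC LOGINFO.
    SELECT d.name AS DatabaseName,
           COUNT(*) AS VLFCount,
           SUM(li.vlf_size_mb) AS TotalVLFSizeMB
    FROM sys.databases AS d
    CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
    GROUP BY d.name
    ORDER BY VLFCount DESC;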

Apache Solr Backup And Recovery

Hrishikesh Gadre shows how to back up indexes in Apache Solr:

The backup mechanism allows an administrator to create a physically separate copy of index files and configuration metadata for a Solr collection. Any subsequent change to a Solr collection state (e.g. removing documents, deleting index files or changing collection configuration) has no impact on the state of this backup. As part of disaster recovery, the restore operation creates a new Solr collection and initializes it to the state represented by a Solr collection backup.

It’s probably safest to treat data in Solr as secondary data, in the sense that you should be able to rebuild the entire data set from scratch rather than treating Solr as a primary data store.  I’m not a big fan of the author using the term “disaster recovery” instead of just “recovery” or “backup restoration” (as they’re different concepts), but it’s worth the read.

Against Shrinking Database Log Files

Kenneth Fisher is wary of shrinking your database log file:

“It’s too big”
I find that people who say this frequently don’t have a firm idea of what is too big, or even why it might be as big as it is. This also goes along with “I need to free up disk space” with no good reason why the space needs to be freed up.

There are good reasons to shrink the log and they do revolve around space. For example:

  • I had a one-time explosive growth of the log due to a large data load.

  • The usage of the database has changed and we aren’t using as much of the log as we used to.

  • We are billed at 2 am based on space used. We will grow the log back again after the billing period.

  • I need to clean up a large number of VLFs. (Of course, then you are going to grow it back manually, right?)

I quoted the caveats, but Kenneth makes a solid case against shrinking log files without a good counterbalancing reason.
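
For the VLF-cleanup case in particular, the usual pattern is to shrink the log and then immediately grow it back to a sensible size in a few deliberate steps, roughly like this (database name, file name, and sizes are placeholders):

    -- Shrink the log to clear out excess VLFs, then grow it back manually
    -- so it ends up with a sane VLF layout instead of many tiny auto-growths.
    USE MyDatabase;
    GO
    DBCC SHRINKFILE (MyDatabase_log, 1024);   -- target size in MB
    GO
    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = MyDatabase_log, SIZE = 8192MB, FILEGROWTH = 1024MB);
    GO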

Triggers And Memory-Optimized Table Modifications

Jack Li troubleshoots a customer issue when trying to modify a memory-optimized table:

Recently I assisted on a customer issue where the customer wasn’t able to alter a memory-optimized table, getting the following error:

Msg 41317, Level 16, State 3, Procedure ddl, Line 4 [Batch Start Line 35]
A user transaction that accesses memory optimized tables or natively compiled modules cannot access more than one user database or databases model and msdb, and it cannot write to master.

If you access a memory-optimized table, you can’t span databases or access model or msdb.  The alter statement doesn’t involve any other database.

It turns out there was a DDL trigger defined on the instance that wrote data to msdb.  Click through for Jack’s repro script.  I’d be able to use memory-optimized tables a lot more frequently (to the chagrin of company DBAs, perhaps) if they supported cross-database operations.
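
A rough sketch of the kind of setup that produces this error (not Jack’s actual repro; all names are made up):

    -- A server-scoped DDL trigger that logs ALTER TABLE events into msdb.
    -- Any ALTER of a memory-optimized table now also touches msdb inside
    -- the same transaction, which raises Msg 41317.
    USE msdb;
    GO
    CREATE TABLE dbo.DDLAudit
    (
        EventTime datetime2 NOT NULL DEFAULT SYSDATETIME(),
        EventDetails xml NULL
    );
    GO
    CREATE TRIGGER trgAuditAlterTable
    ON ALL SERVER
    FOR ALTER_TABLE
    AS
        INSERT INTO msdb.dbo.DDLAudit (EventDetails) VALUES (EVENTDATA());
    GO
    -- In the user database, altering a memory-optimized table now fails with Msg 41317.
    USE SomeUserDb;
    GO
    ALTER TABLE dbo.SomeInMemoryTable ADD NewColumn int NULL;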

Using Azure Cloud Shell

Jeffrey Verheul shows off a bit of Azure Cloud Shell:

Connecting to a database
Now that your Cloud Shell is ready to go, you can start using Bash. This means you can also use sqlcmd from within Bash.

You can connect to a database with sqlcmd by using the following command:

sqlcmd -S servername.database.windows.net -U username -P password

Once the connection to your database has been made, you can run queries against it.
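
As a small, hypothetical follow-on, a quick sanity check from the sqlcmd prompt might be nothing more than:

    -- Typed at the sqlcmd prompt once connected; GO sends the batch.
    SELECT name, create_date
    FROM sys.databases;
    GO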

There’s no PowerShell support yet, but Bash is currently supported and PowerShell is in the works.

Creating An Azure Database For MySQL

Arun Sirpal shows how to create a new MySQL database in Azure:

Here you have the concept of compute units. No such thing as DTUs here, but it’s just as confusing.

Compute Units are a measure of CPU processing throughput guaranteed to be available to a single Azure Database for MySQL server. A Compute Unit is a blended measure of CPU and memory resources. In general, 50 Compute Units equate to half-core, 100 Compute Units equate to one core, and 2000 Compute Units equate to twenty cores of guaranteed processing throughput available to your server. I am not going to rehash official documentation on these concepts so I recommend reading https://docs.microsoft.com/en-gb/azure/mysql/concepts-compute-unit-and-storage

Different database product, different metric, it seems.  Check out Arun’s post as he walks you through the process step by step.

Monitoring Spark And Kafka

Larry Murdock gives some hints on monitoring Kafka topics and their associated Spark jobs:

Besides alerting for the hardware health, monitoring answers questions about the health of the overall distributed data pipeline. The Site Reliability Engineering book identifies “The Four Golden Signals” as the minimum of what you need to be able to determine: latency, traffic, errors, and saturation.

Latency is the time it takes for work to happen. In the case of data pipelines, that work is a message that has gone through many systems. To time it, you need to have some kind of work unit identifier that is reflected in the metrics that happen on the many segments of the workflow. One way to do this is to have an ID on the message, and have components place that ID in their logs. Alternatively, the messaging system itself could manage that in metadata attached to the messages.

Traffic is the demand from external sources, or the size of what is available to be consumed. Measuring traffic requires metrics that either specifically mean a new arrival or a new volume of data to be processed, or rules about metrics that allow you to proxy the measure of traffic.

Errors are particularly tricky to monitor in data pipelines because these systems don’t typically error out at the first sign of trouble. Some errors in data are to be expected and are captured and corrected. However, there are other errors that may be tolerated by the pipeline but need to be fed into the monitoring system as error events. This requires specific logic in an application’s error capture code to emit this information in a way that will be captured by the monitoring system.

Saturation is the workload consuming all the resources available for doing work. Saturation can be the memory, network, compute, or disk of any system in the data pipeline. The kinds of indicators that we discussed in the previous post on tuning are all about avoiding saturation.

Larry then applies these concepts and gives links to some useful tools.
