I recently ran into this same problem, where the bcp error message stated:
An error occurred while processing the file “D:\MyBigFeedFile.txt” on data row 123798766555678.
Given that the text file was far in excess of Notepad++'s limits (or the limits of practical human interaction), I decided not to waste my time trying to find an editor that could cope, and simply looked for a way to pull a batch of lines out of the file programmatically so I could compare the good rows with the bad.
Click through for a precise and useful PowerShell snippet. The .NET file stream libraries are also good for this kind of thing.
This week, we look at the TIME data type. It is formatted as HH:mm:ss.fffffff, where HH is hours between 0 and 23, mm is minutes between 0 and 59, ss is seconds between 0 and 59, and f represents 0 or more fractional seconds, up to a maximum of seven decimal places.
With a maximum length of 5 bytes, TIME can store a value with a granularity of up to 100 nanoseconds.
I tend not to use TIME very often. It’s useful if you need it, but I rarely find myself needing a dateless time.
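As a quick illustration of the format and precision described above (the literal value is just an example), here is how it plays out in T-SQL:

```sql
-- TIME(7) stores up to seven fractional digits, i.e. 100-nanosecond granularity.
DECLARE @t TIME(7) = '13:45:30.1234567';

SELECT
    @t                       AS FullPrecision,        -- 13:45:30.1234567
    CAST(@t AS TIME(3))      AS MillisecondPrecision, -- 13:45:30.123
    DATEPART(NANOSECOND, @t) AS Nanoseconds;          -- 123456700 (a multiple of 100 ns)
```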
One limitation in the current public preview is that tempdb doesn't preserve custom settings after a failover happens. If you add new files to tempdb or change file sizes, these settings will not be preserved after failover, and the original tempdb will be re-created on the new instance. This is a temporary limitation and it will be fixed during public preview.
However, since Managed Instance supports SQL Agent, and SQL Agent can be configured to execute a script when it starts, you can work around this issue by creating a SQL Agent job that will pre-configure your tempdb.
SQL Agent will start whenever Managed Instance fails over, and a job containing a script along these lines can increase the tempdb size before you start running your workload on the new instance.
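The script from the post isn't reproduced here, but a startup job step could run something like the following. This is only a sketch: the logical file names (tempdev, templog) and sizes are assumptions, so check sys.master_files for your own values.

```sql
-- Hypothetical SQL Agent job step: re-apply custom tempdb sizing after a failover.
-- File names and sizes are illustrative; adjust to your instance's configuration.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 10GB, FILEGROWTH = 1GB);

ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, SIZE = 2GB, FILEGROWTH = 512MB);
```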
Managed Instance is a dedicated resource placed in an Azure Virtual Network with an assigned private IP address. Before you create a Managed Instance, you need to create an Azure Virtual Network using the Azure portal, PowerShell, or the Azure CLI.
If you are using the Azure portal, make sure that you use the Resource Manager deployment model and that the Service Endpoints option is Disabled in the Create Virtual Network blade (this is the default option, so don't change it).
If you want to have only one subnet in your Virtual Network (the Virtual Network blade will let you define a first subnet called default), you need to know that the Managed Instance subnet can have between 16 and 256 addresses. Therefore, use subnet masks /28 to /24 when defining the IP range for the default subnet. If you know how many instances you will have, make sure that you have at least 2 addresses per instance plus 5 system addresses in the default subnet; for example, four instances need 4 × 2 + 5 = 13 addresses, so a /28 subnet (16 addresses) would just cover them.
Both posts are useful if you’re interested in getting started with a managed instance.
This command applies only to Azure SQL Database; at a high level, it empties the database authentication cache of logins and firewall rules for the current USER database.
In Azure SQL Database the authentication cache makes a copy of logins and server firewall rules which are in the master database and puts them into memory within the user database. The Database Engine attempts re-authorisation using the originally submitted password and no user input is required.
If this still doesn’t make sense, then an example will really help.
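Reading between the lines, the command in question appears to be DBCC FLUSHAUTHCACHE (that identification is my inference; its documented behaviour matches the description above). A minimal run, executed in the user database rather than in master, is simply:

```sql
-- Azure SQL Database only: clear the cached logins and firewall rules
-- so they are re-read from master on the next authentication attempt.
DBCC FLUSHAUTHCACHE;
```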
The new data set “iris2” does not have any rows containing “setosa” as a possible value of “Species”, yet the levels() function still shows “setosa” in its output.
According to the user G5W on Stack Overflow, this is desirable behaviour for the levels() function. Here is my interpretation of the intent of the creators of base R: the possible values of a factor are fundamental attributes of that variable, which should not be altered because of changes in the data.
There’s some back-and-forth in the comments; my takeaway is that both are useful functions depending upon what, exactly, you want to learn.
The optimizer assumes that people write a semi join (indirectly e.g. using EXISTS) with the expectation that the row being searched for will be found. An apply semi join row goal is set by the optimizer to help find that expected matching row quickly.
For anti join (expressed e.g. using NOT EXISTS) the optimizer’s assumption is that a matching row will not be found. An apply anti join row goal is not set by the optimizer, because it expects to have to check all rows to confirm there is no match.
If there does turn out to be a matching row, the apply anti join might take longer to locate this row than it would if a row goal had been used. Nevertheless, the anti join will still terminate its search as soon as the (unexpected) match is encountered.
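To make the two shapes concrete, here is a minimal sketch; the table and column names are placeholders rather than anything from the original post:

```sql
-- Semi join: the optimizer sets a row goal, expecting a matching row to exist.
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);

-- Anti join: no row goal, since the optimizer expects to check every row to
-- prove there is no match (though it still stops early if one turns up).
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerID = c.CustomerID);
```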
Another very interesting part of the series and well worth the time to read.
If you've ever dug down into the SQL Server transaction logs or had to build up restore chains, then you'll have come across Log Sequence Numbers (LSNs). Ever wondered why they're so large, why they all look suspiciously similar, why they don't start from 0, and just how SQL Server generates these LSNs? Well, here we're going to take a look at them.
Below we'll go through examples of how to look inside the current transaction log and backed-up transaction logs. This will involve using some DBCC commands and the undocumented fn_dblog and fn_dump_dblog functions. The last two are very handy for digging into SQL Server internals, but be wary about running them on a production system without understanding what's going on. They can leave file handles and processes behind that can impact your system.
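As a taste of what that looks like, here's a minimal query against fn_dblog. This is my own sketch rather than one of the post's examples, and the usual caveat about undocumented functions applies:

```sql
-- Peek at recent records in the current database's transaction log.
-- fn_dblog(NULL, NULL) means no starting or ending LSN filter.
SELECT TOP (10)
       [Current LSN],
       [Operation],
       [Context],
       [Transaction ID]
FROM sys.fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;
```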
It’s an interesting look into SQL Server’s internals.
As you may have seen at PASS Summit 2017 or another event, with the announcement of Azure Data Factory v2 (ADF), Biml will natively support ADF objects.
Please note that the native support is currently only available in BimlStudio 2018. If you're using BimlExpress, you can still generate the JSON for your pipelines, datasets, etc. using Biml, but you cannot use the newly introduced tags.
The really good parts are only available in the paid product; if you do a lot of Azure Data Factory work, that might tip the scales in favor of getting BimlStudio.
Since we are prognosticating, I want to take a guess at one of the constraints limiting the future. I present you with Meidinger’s law:
An industry’s growth is constrained by how much your junior dev can learn in two years.
Let me explain. On my team, one of our developers just left for a different company. We also have a college student who will be going full-time in May, upon graduation. How long do you think it's going to take the new guy to get up to speed?
And how long do you think he’s going to stay?
I think this is a useful dictum, one which explains a fair amount of industry movement.