Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.
Let’s say you have a file in UTF-8 format. What this means is that some of the characters will take a single byte, and others will take two, three, or four.
Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.
Click through for an example. This seems like a bug to me: I interpret fixed-width as a fixed number of characters, not a fixed number of bytes. At the very least, it’s liable to cause confusion.
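To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It is not the tool or the example from the linked post; the five-wide field, the sample value, and the helper names `take_bytes` and `take_chars` are all made up for illustration. It only shows what happens when a field width is counted in bytes rather than characters.

```python
# Illustrative assumptions: a 5-wide field and a sample value whose
# last character (é, U+00E9) needs two bytes in UTF-8.
FIELD_WIDTH = 5
value = "Chlo\u00e9"  # "Chloé": 5 characters, 6 bytes in UTF-8


def take_bytes(text: str, width: int) -> bytes:
    """Read a field as a fixed number of BYTES (hypothetical helper)."""
    return text.encode("utf-8")[:width]


def take_chars(text: str, width: int) -> str:
    """Read a field as a fixed number of CHARACTERS (hypothetical helper)."""
    return text[:width]


print(len(value))                  # 5 characters
print(len(value.encode("utf-8")))  # 6 bytes

# Counting characters keeps the whole value.
print(take_chars(value, FIELD_WIDTH))  # Chloé

# Counting bytes cuts the two-byte é in half, leaving a dangling lead byte.
raw = take_bytes(value, FIELD_WIDTH)
print(raw)  # b'Chlo\xc3'

# Decoding the damaged field swaps the broken byte for U+FFFD.
print(raw.decode("utf-8", errors="replace"))
```

One caveat: Python slices strings by code point, so `take_chars` can still split a character that is built from several code points, such as a letter followed by a combining accent. For the precomposed é used here, that does not come into play.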