File Size mismatch on FTP and local machine

Prefer binary mode for FTP transfers wherever possible. It copies every byte of the file and saves you from image file corruption. Image files such as jpg, jpeg and tiff generally get corrupted when they are transferred in text mode.
This is the second post of the Troubleshooting series:

  1. Troubleshooting with log4net RollingFileAppender
  2. Current post, you are here
  3. Who eats FTP file bytes silently when transferred in binary mode

Why are file sizes different on FTP and the local machine?

Well, the person was only tallying the file size on disk. Size on disk is different from the size of the contents. To understand this, one needs to understand how the operating system manages the file system. Whenever the OS is asked for disk space to store a file's content, it allocates space in fixed-size units. Such a unit is known as a cluster, and its size varies between file systems. Microsoft itself uses several file systems, such as FAT16, FAT32 and NTFS (depending on the OS). The last cluster of a file is usually only partly filled, so it is easy to understand why the size on disk may differ from one machine to another. No harm in it. Sounds good?
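
To make the cluster rounding concrete, here is a minimal sketch. It assumes a simplified model in which the size on disk is the content size rounded up to whole clusters. The function name `size_on_disk` and the 4096-byte default cluster size are assumptions for illustration only; real file systems have extra cases (compression, sparse files, very small files stored inside the file table) that this ignores.

```python
def size_on_disk(file_size_bytes: int, cluster_size_bytes: int = 4096) -> int:
    """Round a byte count up to a whole number of clusters (simplified model)."""
    if file_size_bytes < 0 or cluster_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and cluster size must be > 0")
    # Ceiling division with integers only, to avoid floating-point rounding.
    clusters = -(-file_size_bytes // cluster_size_bytes)
    return clusters * cluster_size_bytes


if __name__ == "__main__":
    # The cluster size is an assumed value; check the actual one on your volume.
    for size in (1, 4096, 4097, 10_000):
        print(size, "bytes of content ->", size_on_disk(size), "bytes on disk")
```

The same 10,000-byte file would occupy 12,288 bytes with 4 KB clusters but 32,768 bytes with 32 KB clusters, while its contents are identical in both places.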

What if the file sizes mismatch, rather than the sizes on disk?

In that case you really need to troubleshoot. Always remember to verify the size in bytes, not in KB or MB. If there is still a byte mismatch, first zero in on the source file. Chances are that your source file is itself corrupted. You may argue that it is not, because you are able to open it on your local machine. You may be puzzled that locally (on the source machine) you can see all the contents of your text file, but not on the remote machine, because it has been silently and partially posted/copied/uploaded. Missing bytes of your text files? Surprising, no?
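
One way to do the byte-level check is sketched below, using only the Python standard library. The helper names `local_fingerprint` and `remote_size` are hypothetical, and so are the host and credential parameters. `remote_size` relies on the server supporting the FTP SIZE command, which not every server does.

```python
import hashlib
import os
from ftplib import FTP
from typing import Optional, Tuple


def local_fingerprint(path: str) -> Tuple[int, str]:
    """Return (exact size in bytes, SHA-256 hex digest) of a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:  # binary mode: read the bytes untouched
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def remote_size(host: str, user: str, password: str, remote_name: str) -> Optional[int]:
    """Ask the FTP server for the exact byte size of a remote file.

    Assumes the server implements SIZE; host and credentials are placeholders.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
        return ftp.size(remote_name)
```

Comparing `local_fingerprint(path)[0]` with `remote_size(...)` gives a bytes-to-bytes comparison. The digest helps when you have two local copies (for example the source file and a re-downloaded one) and want to know whether the contents really differ.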

Cause / Finding(s):

  • The text file may have been transferred (to FTP) in a mode other than ASCII/text

    It is always advisable to use binary mode to make an exact byte copy. But there are always exceptions. The end-of-line (EOL) character(s) can behave strangely. Some operating systems use only a carriage return to indicate a new line. Others use only a line feed. Microsoft operating systems use both, carriage return + line feed, for each new line. When a text file is transferred in binary mode, no EOL conversion takes place, and on a system with a different convention this can show up as junk characters, apparently lost characters or even a file that looks corrupt. ASCII/text mode does convert the line endings, which is also why the byte count can legitimately differ between the two machines. So it is a good idea to have a switch in code that changes the transfer mode from binary to ASCII/text for files like txt, htm, css etc. Be aware that this may also go wrong in some cases, because nowadays text files are richer and may use two or more bytes for a single character (Unicode encodings such as UTF-16) to support languages like Chinese.

  • Your source file was itself copied from another remote machine (using msrtc/RDP)

    In this case, text files are the more likely victims of corruption. It is always advisable to zip your files on the server and download the zip file. After downloading, unzip them to get them back in their original form. This way you can at least minimize the risk of corruption.

  • Your system may be behind a proxy firewall that interferes with the transfer
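
The transfer-mode switch mentioned in the first finding could look roughly like this. It is only a sketch: the names `TEXT_EXTENSIONS`, `choose_transfer_mode` and `upload` are hypothetical, the extension list comes from the examples above (with `.html` added as an assumption), and Unicode text files that are not ASCII-compatible would be safer in binary mode.

```python
import os
from ftplib import FTP

# Extensions treated as plain text. txt/htm/css come from the post;
# ".html" is an added assumption. Adjust the set for your own files.
TEXT_EXTENSIONS = {".txt", ".htm", ".html", ".css"}


def choose_transfer_mode(filename: str) -> str:
    """Return "ascii" for known text extensions, otherwise "binary"."""
    extension = os.path.splitext(filename)[1].lower()
    return "ascii" if extension in TEXT_EXTENSIONS else "binary"


def upload(ftp: FTP, local_path: str, remote_name: str) -> str:
    """Upload one file over an already logged-in FTP connection."""
    mode = choose_transfer_mode(local_path)
    # ftplib expects a file opened in binary mode for both calls.
    with open(local_path, "rb") as handle:
        if mode == "ascii":
            ftp.storlines(f"STOR {remote_name}", handle)   # text (ASCII) transfer
        else:
            ftp.storbinary(f"STOR {remote_name}", handle)  # exact byte copy
    return mode


if __name__ == "__main__":
    # Why byte counts differ after EOL conversion: CRLF is two bytes, LF is one.
    windows_text = b"line one\r\nline two\r\n"
    unix_text = windows_text.replace(b"\r\n", b"\n")
    print(len(windows_text), len(unix_text))  # prints "20 18"
```

Defaulting to binary for unknown extensions keeps images, archives and anything unrecognized as an exact byte copy. Only the listed text types get line-ending conversion, and for those a size difference of one byte per line between a Windows machine and a Unix-like server is expected, not a sign of corruption.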

Thanks for reading.