How to code ? — 3
Sometimes you need to think about failure upfront.
If you have not read Post 2, you can read it to get the background for this one.
I mentioned that we would write the code in Python for the technical requirements identified in Post 2. But before we go ahead with that, the analysis we did to arrive at those requirements raised an interesting question: how should we handle a PC shutdown while a backup is in progress?
Normally, failures of the machine your application runs on are not so important to consider in round one. The term for handling them is High Availability. When the software you write is deployed on the target machine (what we call a Production system; in our case, the user's PC running the bakitup application) and it fails because the system crashes, the ways in which we make the application handle such failures and stay continuously available fall under the High Availability requirements of the system or application we develop.
Highly available systems are designed mainly for complex software such as a banking system, an airline reservation system, or the software that runs your prepaid recharges in a telecom network. They ensure that a transaction, like putting money into your bank account or buying tickets, is done in a fault-tolerant manner, to avoid monetary losses and to make the software reliable. A simple backup application like bakitup, which runs like an app on a PC, need not be highly available.
So how did we end up with such a requirement? Let us examine it in more detail.
Suppose we do not handle the abrupt shutdown of a PC. We copy files from the source folders to the destination folders one by one, so if the PC shuts down, a file copy may still be in progress and left incomplete. If the PC then dies and the user has to restore the files from the backup, the backed-up version will be inconsistent. Users losing their files is not acceptable. Imagine losing all your favorite photographs or music files.
So we conclude that bakitup should be Reliable. It should be able to handle a shutdown or power-off of the PC while it is running, and maintain the state of the backup in terms of the file that was being backed up.
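To make "maintain the state of the backup" a little more concrete, here is a minimal sketch of one possible approach, not the final bakitup design: before each copy we write the name of the file into a small journal file, and we clear it once the copy returns. The names `backup_with_journal`, `interrupted_file` and the journal layout are all hypothetical, made up for this illustration.

```python
import json
import shutil
from pathlib import Path


def backup_with_journal(files, dest_dir, journal_path):
    """Copy each file into dest_dir, noting the in-progress file in a journal."""
    dest_dir = Path(dest_dir)
    journal = Path(journal_path)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for src in files:
        src = Path(src)
        # Record which file we are about to copy, before touching the destination.
        journal.write_text(json.dumps({"in_progress": str(src)}))
        shutil.copyfile(src, dest_dir / src.name)
        # The copy call returned, so no file is in progress any more.
        journal.write_text(json.dumps({"in_progress": None}))


def interrupted_file(journal_path):
    """Return the file that was being copied when the last run stopped, or None."""
    journal = Path(journal_path)
    if not journal.exists():
        return None
    return json.loads(journal.read_text()).get("in_progress")
```

On the next start, bakitup could call `interrupted_file` and copy that file again if the result is not `None`. The sketch assumes the journal write itself is not interrupted halfway; a real implementation would have to protect the journal as well.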
We also need to understand what exactly happens when a file is copied from one folder to another, say on a Windows PC. From a cursory check, I deduce the following:
- When a file is copied, the original timestamp is maintained.
- When a folder is copied, the folder is created at the destination with the current timestamp. However, the files inside it keep their original timestamps.
- When a file is copied, the OS may (I am not sure here) ensure that the file is completely copied before a PC shutdown happens. But I strongly feel this is possible only for a normal shutdown. If the power is cut or the PC crashes abruptly, the OS gets no chance to do any housekeeping to keep things in order.
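Since we cannot count on the OS to finish a copy during an abrupt shutdown, one common way to avoid half-written files in the backup is to copy into a temporary file first and rename it to the final name only after the copy is complete. The sketch below illustrates that idea; the function name `copy_atomically` and the `.part` suffix are my own assumptions. The rename is atomic on POSIX systems, and I am assuming it behaves well enough on Windows for our purpose, which still needs checking.

```python
import os
import shutil
from pathlib import Path


def copy_atomically(src, dst):
    """Copy src to dst so that dst never holds a partially written file."""
    src, dst = Path(src), Path(dst)
    # Hypothetical naming scheme: the temporary file sits next to the target.
    tmp = dst.with_name(dst.name + ".part")
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        # Ask the OS to push the written bytes to the disk.
        os.fsync(fout.fileno())
    # Rename the finished copy to its final name, replacing any older copy.
    os.replace(tmp, dst)
    return dst
```

If the PC dies in the middle of the copy, the destination folder holds at most a leftover `.part` file, which can be deleted on the next run, while any earlier complete copy of the file stays untouched.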
This examination raises the question of metadata: the properties of a file, apart from its content. Should a backup worry about the metadata of the files (created time, modified time, file permissions, etc.) in addition to the content (the data) itself? Would it be possible to preserve the metadata of the files being backed up? How do we make it happen if it is not possible? Or should we drop this need altogether and just ensure that the contents are backed up?
Preserving the metadata of the files being copied is possible only if the underlying OS allows it. In the case of Windows, which is the primary target for bakitup, it is not clear whether the OS supports copying files while preserving their created and modified timestamps. So, for now, we will drop this as a requirement and circle back to it later.
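When we do circle back, a small experiment like the one below could tell us what survives a copy on a given PC. Python's `shutil.copy2` attempts to preserve file metadata along with the content, but which properties actually carry over depends on the OS and the file system. This sketch checks only the modified time; the helper name `copy_and_compare_mtime` and the two-second tolerance (some file systems store timestamps coarsely) are assumptions made for illustration.

```python
import os
import shutil


def copy_and_compare_mtime(src, dst, tolerance_seconds=2.0):
    """Copy src to dst with shutil.copy2 and report both modified times."""
    # copy2 copies the content and then tries to copy the metadata too.
    shutil.copy2(src, dst)
    src_mtime = os.stat(src).st_mtime
    dst_mtime = os.stat(dst).st_mtime
    return {
        "src_mtime": src_mtime,
        "dst_mtime": dst_mtime,
        # True when the two modified times agree within the tolerance.
        "mtime_preserved": abs(src_mtime - dst_mtime) <= tolerance_seconds,
    }
```

Running it on a few files on the target Windows PC would show whether the modified time is kept. The created time and file permissions would need separate checks.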
See Post 4 for a summary of our requirements for bakitup.