How to code? — 9

Thalapathy Krishnamurthy
Oct 28, 2019

Arriving at simplicity is always hard.

We wrote a recursive copy algorithm in post 8.

The question is: how many of the requirements does this address? Since it copies files and folders from a given set of source folders into a destination folder, isn't that good enough?

While this is probably enough to do the backup, it does not track what has already been backed up. So when the PC is shut down and comes back up, there is no way to continue the backup from where it stopped.

First, we need to order the files we are copying by their last-modified timestamp. Then we need to maintain some meta information about the file being copied: its name, its last-modified timestamp, and a state that says whether the file is about to be copied or has been copied. We can keep these three pieces of information in a meta file in the destination. In the copy() function, before a file is copied or a folder is created, we update the meta file to say we are about to perform that action; after the file is copied or the folder is created, we update the meta file to say the action is done.
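A minimal Python sketch of this meta-file bookkeeping for a single file follows. The meta file name `.bakitup_meta.json`, the JSON layout, and the state strings `copying` and `done` are hypothetical choices for illustration, not part of bakitup. Creating a folder would be wrapped the same way.

```python
import json
import os
import shutil

META_NAME = ".bakitup_meta.json"  # hypothetical meta file name


def write_meta(meta_path, name, mtime, state):
    """Overwrite the meta file with the item being handled and its state."""
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump({"name": name, "mtime": mtime, "state": state}, f)


def copy_with_meta(src_file, dest_dir):
    """Copy one file, recording 'copying' before and 'done' after in the meta file."""
    meta_path = os.path.join(dest_dir, META_NAME)
    name = os.path.basename(src_file)
    mtime = os.path.getmtime(src_file)
    write_meta(meta_path, name, mtime, "copying")  # about to copy
    shutil.copy2(src_file, os.path.join(dest_dir, name))
    write_meta(meta_path, name, mtime, "done")  # copy finished
```

Note that `open(..., "w")` truncates the meta file before the new JSON is written, so each update briefly leaves the file empty or partial.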

So, when a meta file is present, the file named in it is either about to be copied or already copied. If the bakitup program is stopped, then when it runs again it can look up the last file backed up in this meta file. If that file was not yet copied, it can redo the copy. If it was copied, it can find the next file to back up using the last-modified timestamp order.
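The restart decision can be sketched as a pure Python function. It assumes the meta file has already been read into a dict with a `name` and a `state` of either `copying` or `done` (hypothetical names). Restarting from the beginning when the recorded file is no longer in the source is also an assumption, not something the scheme above specifies.

```python
def resume_index(ordered_names, meta):
    """Return the index in ordered_names where the backup should restart.

    ordered_names: source file names sorted by last-modified timestamp.
    meta: dict read from the meta file, or None if there is no meta file.
    """
    if meta is None:
        return 0  # nothing recorded yet: start from the beginning
    try:
        i = ordered_names.index(meta["name"])
    except ValueError:
        return 0  # assumption: recorded file vanished from the source, restart
    if meta["state"] == "done":
        return i + 1  # last file finished: continue with the next one
    return i  # last file was only marked "copying": redo it
```

For example, with files `a`, `b`, `c` and a meta file saying `b` is still `copying`, the backup restarts at index 1 and copies `b` again.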

There is a window of vulnerability in this scheme. The PC can go down just before, or even while, the name of the file about to be backed up is being written into the meta file. When bakitup runs again, it then does not know the last backed-up file. How do we ensure the meta file is not corrupted?

Maybe there is a simpler way to solve this?

Perhaps there is no need for a meta file to track the last copied file. If we order the files by last-modified timestamp, the last copied file itself may be enough: its size tells us whether it was copied completely, and therefore whether to continue from it or from the next file.
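The size check can be sketched in a few lines of Python. The function name `is_fully_copied` is hypothetical. Size equality is only a heuristic: a destination file of the same size but different contents would also pass.

```python
import os


def is_fully_copied(src_file, dest_file):
    """Treat a copy as complete when the destination exists and the sizes match."""
    if not os.path.exists(dest_file):
        return False
    return os.path.getsize(src_file) == os.path.getsize(dest_file)
```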

So, we will change the copy algorithm as follows:

backup(source folder array, destination folder)
1. create the destination folder if it does not exist
2. for each folder in source folder array
   2.1 copy(source folder, destination folder)

copy(source folder, destination folder)
1. ordered_list = order the items under source folder by last modified timestamp
2. if destination folder\source folder does not exist
   2.1 create destination folder\source folder
3. begin_file = last backed up file in destination folder\source folder, using the ordering from step 1
4. modified_ordered_list = subset of ordered_list starting from begin_file
5. for each item in modified_ordered_list
   5.1 if the item is a folder
       5.1.1 copy(item, destination folder\source folder)
   5.2 else // the item is a file
       5.2.1 copy file under destination folder\source folder

In the above, we order the items in both the source and the destination by last-modified timestamp, then begin copying from the last copied file found in the destination folder. If the last copied file has the same size as its source, the copy finished, so we begin from the next file in the ordered list. If its size differs from the source, the copy was interrupted, so we begin again from that last copied file.
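Step 1 (ordering) and the begin_file rule can be sketched in Python for a single folder. The helper names `ordered_items` and `begin_index` are hypothetical. Two choices go beyond the text and are marked in comments: if the newest destination item is a folder it is always revisited, because a folder's reported size does not tell us whether its contents were fully copied, and an unrecognised newest item restarts the folder from the beginning. Ties in modification time are broken by name, which is a simplification on file systems with coarse timestamps.

```python
import os


def ordered_items(folder):
    """Entries of folder sorted by last-modified time (oldest first), then name."""
    entries = list(os.scandir(folder))
    entries.sort(key=lambda e: (e.stat().st_mtime, e.name))
    return entries


def begin_index(source_folder, target_folder):
    """Index into ordered_items(source_folder) where copying should begin.

    target_folder is the folder that source_folder is being copied into.
    """
    src = ordered_items(source_folder)
    if not os.path.isdir(target_folder):
        return 0  # nothing copied yet
    dst = ordered_items(target_folder)
    if not dst:
        return 0
    last = dst[-1]  # last copied item = newest item in the destination
    names = [e.name for e in src]
    if last.name not in names:
        return 0  # assumption: unrecognised item, start from the beginning
    i = names.index(last.name)
    if last.is_dir():
        return i  # assumption: a folder's size says nothing, so revisit it
    if src[i].stat().st_size == last.stat().st_size:
        return i + 1  # same size: the copy finished, begin with the next item
    return i  # size differs: the copy was cut short, redo this file
```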

As you can see, copy() is again cluttered with checks, so we will restructure it as follows to keep it readable:

backup(source folder array, destination folder)
1. create the destination folder if it does not exist
2. for each folder in source folder array
   2.1 copy(source folder, destination folder)

get_backup_list(source folder, destination folder)
1. list1 = sort items in source folder by increasing order of last modified timestamp
2. list2 = sort items in destination folder\source folder by increasing order of last modified timestamp
3. if list2 is empty, return list1
4. last_item = last item in list2
5. if the number of items in list1 == the number of items in list2 and size of last_item in list1 == size of last_item
   5.1 return empty list
6. if size of last_item == size of last_item in list1
   6.1 return a sub-list of list1 starting from the item after last_item
7. else
   7.1 return a sub-list of list1 starting from last_item

copy(source folder, destination folder)
1. if destination folder\source folder does not exist
   1.1 create destination folder\source folder
2. backup_list = get_backup_list(source folder, destination folder)
3. if backup_list is empty, return
4. for each item in backup_list
   4.1 if the item is a folder
       4.1.1 copy(item, destination folder\source folder)
   4.2 else // the item is a file
       4.2.1 copy file under destination folder\source folder
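A minimal Python sketch of the restructured algorithm on a local file system follows. `get_backup_list`, `copy` and `backup` mirror the pseudocode; `ordered_items` is a hypothetical helper. Two assumptions go beyond the pseudocode and are marked in comments: the newest destination item is always revisited when it is a folder (its size says nothing about its contents), and an unrecognised newest item restarts the folder. Files are copied with `shutil.copyfile`, which copies contents only, so each copy gets a fresh modification time and the destination sorts in copy order; `shutil.copy2` would preserve the source timestamps instead.

```python
import os
import shutil


def ordered_items(folder):
    """Entries of folder sorted by last-modified time (oldest first), then name."""
    entries = list(os.scandir(folder))
    entries.sort(key=lambda e: (e.stat().st_mtime, e.name))
    return entries


def get_backup_list(source_folder, target_folder):
    """Source entries that still need to be copied into target_folder."""
    list1 = ordered_items(source_folder)
    list2 = ordered_items(target_folder) if os.path.isdir(target_folder) else []
    if not list2:
        return list1  # nothing copied yet
    last = list2[-1]  # most recently copied item in the destination
    names = [e.name for e in list1]
    if last.name not in names:
        return list1  # assumption: unrecognised item, copy everything again
    i = names.index(last.name)
    if last.is_dir():
        return list1[i:]  # assumption: always revisit the last folder
    same_size = list1[i].stat().st_size == last.stat().st_size
    if same_size and len(list1) == len(list2):
        return []  # everything is already backed up
    return list1[i + 1:] if same_size else list1[i:]


def copy(source_folder, destination_folder):
    """Copy source_folder into destination_folder, resuming where it stopped."""
    name = os.path.basename(os.path.normpath(source_folder))
    target = os.path.join(destination_folder, name)
    os.makedirs(target, exist_ok=True)
    for item in get_backup_list(source_folder, target):
        if item.is_dir():
            copy(item.path, target)  # recurse into the sub-folder
        else:
            # copyfile copies contents only, so the copy gets a fresh mtime
            shutil.copyfile(item.path, os.path.join(target, item.name))


def backup(source_folders, destination_folder):
    """Back up each source folder under destination_folder."""
    os.makedirs(destination_folder, exist_ok=True)
    for folder in source_folders:
        copy(folder, destination_folder)
```

Calling `backup(["/home/me/photos"], "/mnt/usb/backup")` (hypothetical paths) a second time after an interruption skips the files that already have matching sizes and redoes the one that was cut short.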
