I have a master folder, which contains many subfolders and files, and a child folder.
When a file is added to the master folder, I need to bring it into the child folder: overwrite the child's copy if the file already exists there, or create it (along with any missing subfolders) if it does not. However, I don't want to delete any file that is present in the child folder but missing from the master folder.
I am calculating the MD5 checksum of every file in the child and master folders to determine which files need to be updated or created.
    import os
    import hashlib


    def md5_checksum(filename):
        """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
        m = hashlib.md5()
        with open(filename, 'rb') as f:
            for data in iter(lambda: f.read(1024 * 1024), b''):
                m.update(data)
        return m.hexdigest()


    def getListOfFiles(dirName):
        """Recursively list every file under dirName as 'path::md5'."""
        listOfFile = os.listdir(dirName)
        allFiles = list()
        for entry in listOfFile:
            fullPath = os.path.join(dirName, entry)
            if os.path.isdir(fullPath):
                allFiles = allFiles + getListOfFiles(fullPath)
            else:
                allFiles.append(fullPath + "::" + md5_checksum(fullPath))
        return allFiles


    local_path = r'C:\test'
    incoming_path = os.path.join(local_path, 'Incoming')  # Master folder
    existing_path = os.path.join(local_path, 'Colors')    # Child folder

    existing_list = getListOfFiles(existing_path)
    download_list = getListOfFiles(incoming_path)

    # Compare paths relative to each folder root; otherwise the differing
    # 'Incoming' / 'Colors' prefixes make every file look new.
    existing_md5 = []
    for file in existing_list:
        path, md5 = file.split('::')
        existing_md5.append([os.path.relpath(path, existing_path), md5])

    for file in download_list:
        path, md5 = file.split('::')
        if [os.path.relpath(path, incoming_path), md5] not in existing_md5:
            print(path, md5)
However, I'm not sure how to replicate the subfolder structure in the child folder while copying the files.
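One way to mirror the subfolder structure while copying, using only the standard library, is sketched below. It is an illustration under stated assumptions, not a definitive implementation: the function name `sync_master_to_child` is hypothetical. It walks the master folder with `os.walk`, recreates each subfolder in the child with `os.makedirs(..., exist_ok=True)`, and copies a file with `shutil.copy2` only when the file is missing from the child or its MD5 differs. Nothing in the child folder is ever deleted.

```python
import hashlib
import os
import shutil


def md5_checksum(filename):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    m = hashlib.md5()
    with open(filename, 'rb') as f:
        for data in iter(lambda: f.read(1024 * 1024), b''):
            m.update(data)
    return m.hexdigest()


def sync_master_to_child(master, child):
    """Hypothetical helper: copy new or changed files from master into child.

    Every subfolder of master is recreated in child (including empty ones).
    Files that exist only in child are left untouched; nothing is deleted.
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(master):
        # Folder path relative to master, e.g. 'Red/Dark' or '.' for the root
        rel_dir = os.path.relpath(dirpath, master)
        target_dir = os.path.normpath(os.path.join(child, rel_dir))
        # Mirror the folder structure before copying any files into it
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # Skip files whose content is already identical in child
            if os.path.isfile(dst) and md5_checksum(src) == md5_checksum(dst):
                continue
            # copy2 overwrites dst and tries to preserve metadata such as mtime
            shutil.copy2(src, dst)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

With the paths from the question, it would be called as `sync_master_to_child(incoming_path, existing_path)`. Hashing both sides re-reads every file on each run; comparing sizes or modification times first would be cheaper, at the cost of trusting timestamps.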
It turns out there is a Python library for exactly this requirement, called dirsync.