I have a master folder, which contains many files and subfolders, and a separate child folder.

When a new file is added to the master folder, I need to copy it into the child folder, creating any subfolders it sits in; if the file already exists in the child folder, it should be updated instead. However, I don’t want to delete any file that is present in the child folder but missing from the master folder.

I am calculating the MD5 checksum of every file in the child and master folders to figure out which files need to be updated or created.

import os
import hashlib

def md5_checksum(filename):
    m = hashlib.md5()
    with open(filename, 'rb') as f:
        for data in iter(lambda: f.read(1024 * 1024), b''):
            m.update(data)
    return m.hexdigest()

def getListOfFiles(dirName):
    listOfFile = os.listdir(dirName)
    allFiles = list()
    for entry in listOfFile:
        fullPath = os.path.join(dirName, entry)
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath + "::" + md5_checksum(fullPath))
    return allFiles

local_path=r'C:\test'
incoming_path=os.path.join(local_path,'Incoming') ## Master Folder
existing_path=os.path.join(local_path,'Colors') ## Child Folder

existing_list=getListOfFiles(existing_path)
download_list=getListOfFiles(incoming_path)
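# Collect the checksums already present in the child folder.
existing_md5 = {entry.rsplit('::', 1)[1] for entry in existing_list}
# Print every master file whose content is not found anywhere in the child folder.
for entry in download_list:
    path, checksum = entry.rsplit('::', 1)
    if checksum not in existing_md5:
        print(path)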

However, I’m not sure how to copy those files while recreating the same subfolder structure in the child folder.

Answer

It turns out there is a Python library for exactly this requirement, called dirsync:

https://pypi.org/project/dirsync/
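
If adding a dependency is not an option, the same one-way update can probably be approximated with the standard library alone. The following is a minimal sketch, not how dirsync works internally: the function name update_child and its parameters are made up for illustration, and it assumes the master and child folders do not contain each other. It walks the master folder, recreates each subfolder under the child folder, and copies a file only when it is missing from the child or its MD5 checksum differs. Nothing is ever deleted from the child folder.

```python
import hashlib
import os
import shutil


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def update_child(master_dir, child_dir):
    """Copy new or changed files from master_dir into child_dir.

    Hypothetical helper: files that exist only in child_dir are left
    untouched. Returns the sorted relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(master_dir):
        # Path of the current folder relative to the master folder.
        rel_root = os.path.relpath(root, master_dir)
        target_root = os.path.normpath(os.path.join(child_dir, rel_root))
        # Recreate the subfolder in the child folder if it is missing.
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            # Copy when the file is missing or its content differs.
            if not os.path.isfile(dst) or md5_checksum(src) != md5_checksum(dst):
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(copied)
```

With the paths from the question, this would be called as update_child(incoming_path, existing_path). Comparing files at the same relative path, rather than comparing checksums across the whole tree, means a file is still copied when identical content happens to exist elsewhere in the child folder.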

Source: https://stackoverflow.com/q/67636334
