Smart Folder Synchronization with Python
I have various files that download to my computer automatically. They arrive at different times of the day or night. When they do, I like to transfer them to a network drive on another computer automatically.
But there’s a wrinkle.
When the files download, they all land in a single folder, but when I transfer them to my network drive, I want them to go into different folders based on their file names.
Python to the rescue.
I whipped up this little script that runs from cron every 5 minutes on my Mac, rsync’ing files from my “Downloads” folder into specific subfolders of “/Volumes/Shared” according to a set of rules. Any file whose name contains the string “dave” (matched case-insensitively) gets copied to the “/Volumes/Shared/Dave Stuff” folder. Any file with “bob” in its name gets sent to “/Volumes/Shared/Files For Bob”, and so on. Hidden files and partial downloads ending in “.part” are skipped.
Here it is for your enjoyment:
```python
#!/usr/bin/env python3
import os
import subprocess
import sys

# Filename filters, and which folders to send them to:
filters = {
    # Filename : Dest Folder
    'dave': 'Dave Stuff',
    'bob': 'Files For Bob',
    'frank': 'Franks Junk',
}

src = '/Users/Dave/Downloads'
dest = '/Volumes/Shared'

# The rsync command is built as a list, so file names containing spaces
# or quotes are passed to rsync directly without going through a shell.
rsync = ['rsync', '--times']

# ----------------------------------------------------------

# Only show progress when we're running in a terminal (and not cron):
if sys.stdout.isatty():
    rsync.append('--progress')

for dirpath, dirnames, filenames in os.walk(src):
    for filename in filenames:
        # Skip hidden files and partially downloaded files
        if filename.startswith('.') or filename.endswith('.part'):
            continue

        fullpath = os.path.join(dirpath, filename)

        for keyword, destfolder in filters.items():
            if keyword in filename.lower():
                fulldest = os.path.join(dest, destfolder)
                print("Copying '%s' to folder '%s'" % (filename, destfolder))
                # The trailing "/." makes rsync treat the target as a directory
                cmd = rsync + [fullpath, fulldest + '/.']
                process = subprocess.Popen(cmd)
                try:
                    process.wait()
                except KeyboardInterrupt:
                    process.kill()
                    sys.exit(1)
                break
        else:
            # The loop finished without a break: no filter matched this file
            print('Could not find a home for file "%s"' % filename)
```
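The heart of the script is the inner loop that matches a file name against the filters. As a minimal sketch (Python 3), that step can be pulled out into a pure function so the rules can be checked without touching rsync or the network drive. `pick_destination`, `should_skip`, and `FILTERS` are hypothetical names introduced here for illustration; they are not part of the script.

```python
# Hypothetical example rules, mirroring the filters dict in the script.
FILTERS = {
    'dave': 'Dave Stuff',
    'bob': 'Files For Bob',
    'frank': 'Franks Junk',
}


def should_skip(filename):
    """Skip hidden files and partial downloads, as the script does."""
    return filename.startswith('.') or filename.endswith('.part')


def pick_destination(filename, filters=FILTERS):
    """Return the destination folder for the first keyword found
    (case-insensitively) in the file name, or None if nothing matches."""
    if should_skip(filename):
        return None
    lowered = filename.lower()
    for keyword, folder in filters.items():
        if keyword in lowered:
            return folder
    return None


print(pick_destination('Notes-for-BOB.txt'))  # prints Files For Bob
```

Dictionaries keep insertion order in Python 3.7 and later, so when a name contains several keywords, the first rule listed wins. In Python 2, dict order was arbitrary, so such ties could go either way.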
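When this script runs, it just blindly tells rsync to transfer the files, but rsync skips any file whose size and modification time already match the copy in the “dest” folder. That works thanks to rsync’s “--times” argument, which tells rsync to preserve the file modification times when it does the transfer. Without it, each copy would be stamped with the time of the transfer, the times would never match, and every file would be updated again on every run.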
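To make the skip rule concrete, here is a simplified Python model of rsync’s default “quick check” (not rsync’s actual code): a file is skipped only if the destination exists with the same size and the same modification time, compared in whole seconds as rsync does by default. It assumes `shutil.copy2`, which preserves the modification time, as a stand-in for `rsync --times`, and `shutil.copyfile`, which does not, as a stand-in for a copy without it. `needs_transfer` and `demo` are hypothetical helpers.

```python
import os
import shutil
import tempfile


def needs_transfer(src_path, dest_path):
    """Simplified quick check: transfer unless the destination exists
    with the same size and the same whole-second modification time."""
    if not os.path.exists(dest_path):
        return True
    src_stat = os.stat(src_path)
    dest_stat = os.stat(dest_path)
    return (src_stat.st_size != dest_stat.st_size
            or int(src_stat.st_mtime) != int(dest_stat.st_mtime))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'dave-report.txt')
        with open(src, 'w') as f:
            f.write('hello')
        # Give the source a fixed, old modification time.
        os.utime(src, (1_000_000_000, 1_000_000_000))

        # copy2 preserves the mtime, much like rsync --times.
        kept = os.path.join(tmp, 'kept.txt')
        shutil.copy2(src, kept)

        # copyfile copies only the data, so the copy gets the current time.
        stamped = os.path.join(tmp, 'stamped.txt')
        shutil.copyfile(src, stamped)

        return needs_transfer(src, kept), needs_transfer(src, stamped)


if __name__ == '__main__':
    print(demo())  # prints (False, True)
```

The time-preserving copy is skipped on the next check, while the copy stamped with the current time would be sent again, which is the behavior the “--times” argument avoids.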
So far it’s working great. I saved the script as ~/bin/sync-download-files, made it executable, and created a crontab file at /Users/Dave/etc/crontab that looks like this:
*/5 * * * * ~/bin/sync-download-files >/dev/null 2>&1
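The five leading fields of a crontab line are minute, hour, day of month, month, and day of week. Here `*/5` in the minute field means “every minute divisible by 5”, the remaining `*` fields match everything, and `>/dev/null 2>&1` discards both the script’s normal output and its error output. As an illustration only, this minimal Python sketch expands a minute field into the minutes it matches; `expand_minute_field` is a hypothetical helper that handles just `*`, `*/N`, single numbers, and comma lists, not cron’s full syntax or its implementation.

```python
def expand_minute_field(field):
    """Expand a (simplified) cron minute field into a sorted list of minutes."""
    minutes = set()
    for part in field.split(','):
        if part == '*':
            minutes.update(range(60))
        elif part.startswith('*/'):
            step = int(part[2:])
            minutes.update(range(0, 60, step))
        else:
            minutes.add(int(part))
    return sorted(minutes)


print(expand_minute_field('*/5'))
# prints [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
```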
Then I ran this crontab command to install the job:
crontab /Users/Dave/etc/crontab
And voila! Files now auto-sync to /Volumes/Shared every 5 minutes. When there are no files to sync, the script completes in a couple of seconds.
By the way, in my case, /Volumes/Shared is a Samba-mounted network share on a WDTV Live box.