FileZoomer » Batch Processing

Beta Version 0.9 Adds S3 Lifecycle, Versioning, Glacier, Batch Processing

Steve — Thu, 20 Dec 2012 14:59:54 +0000

The newest version of FileZoomer adds support for several new Amazon Web Services S3 capabilities, including:

Object Life Cycle: specify that files be deleted or moved to low-cost AWS Glacier storage after a set number of days or after a certain date.

Versioning: Turned on at the bucket level, versioning means that even if you upload multiple updates to a file, all previous versions are saved. The newest version shows up as usual, but if you right-click the file and choose “show versions”, all the prior versions will be displayed, and they can then be downloaded.

It’s important to know that with the current version of S3 these two features — Object Life Cycle and Versioning — are mutually exclusive. If you turn on Versioning you can’t also use Life Cycle rules, and if you are using Life Cycle you can’t turn on Versioning.

The new version of FileZoomer also includes a Batch Processing option. After interactively defining a batch process using “File…Batch Configuration”, you can later initiate that process in FileZoomer with “File…Run Batch”. This makes it easy, for instance, to update a folder and its contents with all new and updated files since the last time the batch upload was run.

Using a pure batch processing version of the FileZoomer Java “jar” file, along with a configuration file you have created interactively, you can also do things like schedule an unattended run of an upload job.

For more details on these new features see the individual posts for Object Life Cycle, Versioning, and Batch Processing.

FileZoomer Batch Processing Automates Repetitive S3 Upload Tasks

Steve — Thu, 20 Dec 2012 14:49:49 +0000

A major new feature of FileZoomer Beta 0.9 is “Batch Processing,” designed to facilitate and automate repetitive S3 transfer tasks.

We needed to upload web site backups and logs to S3 from Linux systems, and the only access from Linux to S3 at the time was a Perl module that did not handle large files well. S3 access code from FileZoomer was reused to create a command line Java jar file that could run using the scheduling system provided by the operating system. Being Java-based, it could be used on Windows and OS X systems as well.

Manually creating the configuration file was soon replaced by an interactive GUI component in FileZoomer itself. After that it was a small step to also add a “Run Batch” command to FileZoomer so that common, repetitive tasks could be easily initiated from the application itself, when that is easier or more useful than using a scheduler to run an actual batch job. The ability to do interactive configuration and easily initiate regular “housekeeping” tasks (like backing up new and changed files in a Documents folder) soon made it popular around the office for Windows and Mac use.

Now it’s available to FileZoomer users. It’s a powerful but somewhat complicated tool, so if you use it, take some time to understand it.

We’ll describe here how to get started using Batch Processing, using as our example the most obvious and common reason to use it — uploading and updating a folder to S3. And remember that it works with all the other new S3 features, most notably Object Life Cycle migration to Glacier for low-cost file archiving.

To start, choose “Batch Configuration” from the File menu:

Then highlight “Add New Configuration File” and click “OK”. Later, after you’ve created one or more configurations, you will see them listed for selection.

Next, specify all the important details about your configuration:

Notice the current bucket and S3 path are filled in for you. Browse to select where you want log files to be saved. Then click “Add New Action” to specify what your batch configuration will accomplish:

This example shows an Upload to S3 of all files modified (or created) since the last time the job ran. This means the first time it runs all the files in the selected path will be uploaded. We’ve chosen “Server Side Encryption” so the files are “encrypted at rest” at AWS. We want to process subfolders. We’re not going to compress on upload, which would cause files to be stored in zip format (if you are concerned about saving space and related charges, consider setting up a migration to Glacier). Uncompress on Download is checked so that IF you used FileZoomer compression the application would uncompress files on a later download.

Browse for the local path of the folder and files to be processed, and click “OK.”

Be sure to give the configuration a descriptive name, notice your new configuration is now listed, and BE SURE TO “SAVE CONFIG.”

Now you are ready to “Run Batch” from the File menu:

Select the desired configuration (there could be more than one) and click “OK”:

FileZoomer will tell you it’s “Running Batch Configuration” but if you need details on what’s transpiring you need to look at the log file.

There are more options available for Batch Processing.

In addition to the “Run Batch” command, you can also run a batch on a schedule or from a command line.

FileZoomer Batch Processing Provides Powerful Options

Steve — Thu, 20 Dec 2012 14:48:50 +0000

An earlier post introduced the new FileZoomer Batch Processing feature and walked through the configuration steps needed to prepare to do a “Run Batch” using a single, but very common, batch processing option. Here we go into more detail on all the available options and what they offer:

When you add a new configuration file (File…Batch Configuration…Add New Configuration File) and then choose “Add New Action”, you see this dialog, which contains all the available options:

Let’s go through all the options starting with Select Action:

Upload will upload files from the Local Path on the PC to the current S3 bucket and path.
Download will download files from the current S3 bucket and path to the Local Path on the PC.
Sync will synchronize the current S3 bucket and path with the Local Path on your PC. Sync does not process subfolders; it only handles files in the specified folder itself. Files that exist only on S3 will be copied to the PC. Files that exist only on the PC will be copied to S3. Files that exist in both places will be compared, with the newer version replacing the older. That means if you delete a file on your PC that is also on S3, it will be copied back from S3. Likewise, if you delete a file on S3 that is also on your PC, it will be copied back to S3. Sync will NOT resolve any update conflicts. For instance, if an S3 file is updated from a source other than where FileZoomer Batch Processing is being used, and the same file is also updated on the local PC, then a FileZoomer Batch Sync will overwrite the older file. It will NOT merge changes to the file from multiple sources. Sync does “All Files” regardless of other settings.
Prune is a specialized action that offers a way to control the number of files in an S3 bucket and path. Unlike Object Life Cycle, which removes (via Delete or Glacier migration) files based on a number of days elapsed, Prune deletes files based on the “Prune File Count”, retaining the most recent Prune-File-Count number of files in the current S3 bucket and path and deleting the rest. Prune should usually not be used on a bucket with Versioning enabled as the two features work at cross-purposes. Note that when doing a “Prune” action, other parameters are ignored as not relevant (Local Path, Process Subfolders, Encryption, Compression, All Files, Files Modified Today, Files Modified since last run). If you don’t have a clear idea of why you need Prune, it’s best not to use it.
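The Sync and Prune rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described in this post, not FileZoomer's actual code: the function names, the use of plain dictionaries mapping file names to modification timestamps, and the choice to skip files whose timestamps are equal are all assumptions made for the example.

```python
from typing import Dict, List


def plan_sync(local: Dict[str, float], remote: Dict[str, float]) -> Dict[str, str]:
    """Decide what a Sync would do with each file name.

    `local` and `remote` map file names to modification timestamps.
    These inputs are hypothetical stand-ins for the PC folder and the
    S3 bucket/path; subfolders are not considered, matching Sync.
    """
    plan = {}
    for name in sorted(set(local) | set(remote)):
        if name not in remote:
            plan[name] = "upload"      # exists only on the PC
        elif name not in local:
            plan[name] = "download"    # exists only on S3
        elif local[name] > remote[name]:
            plan[name] = "upload"      # PC copy is newer, replaces S3 copy
        elif local[name] < remote[name]:
            plan[name] = "download"    # S3 copy is newer, replaces PC copy
        else:
            plan[name] = "skip"        # assumption: equal timestamps need no copy
    return plan


def plan_prune(remote: Dict[str, float], prune_file_count: int) -> List[str]:
    """Return the names Prune would delete, keeping the newest files."""
    if prune_file_count < 0:
        raise ValueError("prune_file_count must not be negative")
    newest_first = sorted(remote, key=lambda name: remote[name], reverse=True)
    return newest_first[prune_file_count:]


local_files = {"a.txt": 100.0, "b.txt": 200.0}
s3_files = {"b.txt": 150.0, "c.txt": 300.0}
print(plan_sync(local_files, s3_files))
print(plan_prune({"log1": 1.0, "log2": 2.0, "log3": 3.0}, 2))
```

Note how a file deleted on one side simply looks like a file that "exists only" on the other side, which is why Sync copies it back rather than deleting it everywhere.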

Effect of All Files, Files Modified Today, and Files Modified Since Last Run:

The “Default” is All Files, which means all files will be processed regardless of date and time stamps. On a Download this could mean, for example, that older files on S3 replace updated files on the local PC. This is always used for “Sync”, and ignored for “Prune”.
Files Modified Today uses the date on the PC for Upload. Not recommended for use with Download, and ignored for “Sync” and “Prune”.
Files Modified Since Last Run works the same as All Files for the first run. After that it looks at the date and time stamp on the files and only does those files created or modified since the last run. IMPORTANT: on Windows and OS X if you move an older file to the local path the date is not changed, and if it is a file created or modified before the Last Run date it will NOT be uploaded using this option.
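To make the “Files Modified Since Last Run” caveat concrete, here is a small Python sketch of that kind of filter. It is an assumption about how such a selection could work, based only on the description above; the function names are invented for the example and it looks only at files directly in one folder.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional, Tuple


def files_modified_since(folder: Path, last_run: Optional[float]) -> List[Path]:
    """Select files in `folder` modified after `last_run`.

    `last_run` is a timestamp, or None for the first run, in which
    case every file is selected (the same as All Files).
    """
    selected = []
    for entry in sorted(folder.iterdir()):
        if not entry.is_file():
            continue
        if last_run is None or entry.stat().st_mtime > last_run:
            selected.append(entry)
    return selected


def demo() -> Tuple[List[str], List[str]]:
    """Show that an older file moved into the folder is skipped."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        last_run = time.time()

        # Simulate an older file moved into the folder: its timestamp
        # stays a day older than the last run.
        old = folder / "old.txt"
        old.write_text("moved in from elsewhere")
        os.utime(old, (last_run - 86400, last_run - 86400))

        # Simulate a file created after the last run.
        new = folder / "new.txt"
        new.write_text("fresh")
        os.utime(new, (last_run + 60, last_run + 60))

        first_run = [p.name for p in files_modified_since(folder, None)]
        later_run = [p.name for p in files_modified_since(folder, last_run)]
        return first_run, later_run


first_run, later_run = demo()
print("first run:", first_run)
print("later run:", later_run)
```

On the later run only the newly written file is selected; the older file is left out even though it is new to the folder, which is the behavior the IMPORTANT note warns about.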

Effect of Server Side Encryption, Compress on Upload, Process Sub Folders and Uncompress on Download:

Server Side Encryption, when checked, tells S3 to store the file encrypted (“encryption at rest”). Technically all this does is protect files in the event AWS discards or otherwise loses control of a drive containing your files without first destroying the files. Note that when files are being uploaded or downloaded using FileZoomer, SSL (encrypted transmission) is also used. Neither requires any key management on the user’s part.
Process Sub Folders, when checked, includes files in subfolders (and will create a subfolder as needed) when doing an Upload or a Download. This option is ignored for Sync and Prune.
Compress on Upload, if checked, zips files before uploading and marks the files as being “FileZoomer Zipped”. When paired with “Uncompress on Download”, any file zipped by FileZoomer will be unzipped on download. Other S3 clients will simply see the files as regular zip files. Files already in a compressed format (e.g. jpeg, png, mp3, and actual zip files) will not be compressed. This option makes uploads complete more quickly and saves a bit on S3 cost.
Uncompress on Download, when checked, will unzip any files marked on S3 as having been “FileZoomer Zipped”. Otherwise it has no effect. It’s a good idea to leave this checked.
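The interplay between Compress on Upload and Uncompress on Download can be sketched as follows. This is an illustration under assumptions, not FileZoomer's implementation: the extension list, the function names, and the use of a simple boolean to stand in for the “FileZoomer Zipped” marker are all invented for the example, and how FileZoomer actually records that marker on S3 is not covered here.

```python
import io
import zipfile
from pathlib import PurePath
from typing import Tuple

# Assumption: a few extensions treated as "already compressed".
ALREADY_COMPRESSED = {".jpg", ".jpeg", ".png", ".mp3", ".zip"}


def should_compress(filename: str) -> bool:
    """Skip files whose format is already compressed."""
    return PurePath(filename).suffix.lower() not in ALREADY_COMPRESSED


def prepare_upload(filename: str, data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, zipped_marker) for an upload with compression on."""
    if not should_compress(filename):
        return data, False
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, data)
    return buffer.getvalue(), True


def restore_download(payload: bytes, zipped_marker: bool, uncompress: bool = True) -> bytes:
    """Unzip only payloads carrying the marker, and only if asked to."""
    if zipped_marker and uncompress:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            return archive.read(archive.namelist()[0])
    return payload


log_data = b"GET /index.html 200\n" * 500
payload, marker = prepare_upload("access.log", log_data)
print(marker, len(payload) < len(log_data))
print(restore_download(payload, marker) == log_data)
print(prepare_upload("photo.jpg", b"\xff\xd8 not really a jpeg")[1])
```

Because unmarked files pass through untouched, leaving Uncompress on Download checked is harmless for files that were never zipped.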

Local Path is required (except for “Prune”) and identifies the folder on the PC to include in processing.

Once your configuration is created, you can use the “Run Batch” option, or set it up to run from a command line, or on a schedule.

Run or Schedule FileZoomer Batch Using Command Line or .BAT file

Steve — Thu, 20 Dec 2012 14:47:24 +0000

Previous posts on the new FileZoomer Batch Option have shown how to interactively create a batch configuration file, and start a batch process from within FileZoomer.

You can also use the true batch version of the Java jar file and run it from a command line, batch file, or terminal session on any operating system that supports command line, shell, or terminal commands (and of course they all do). This makes it easy to create a configuration file interactively but run the process unattended, including on a server or other machine different from the one used to create the configuration file.

When you save a configuration file you will notice that it tells you where that config file was saved, and the name:

You can also check the box to put the path in the clipboard. You can also get this info anytime by clicking “Show Config File” on the Batch Processing Configuration page:

Next, get a copy of FileZoomerBatch.jar and its associated lib folder (the link is to a zip file; unzip it and put it where you want to execute it from). Here’s the zip file’s MD5 if you want to confirm it.

Then use the appropriate technique for your OS to execute FileZoomerBatch.jar, remembering to provide the location of the config file. For instance, a Windows .BAT file located in the same directory as FileZoomerBatch.jar and the config file (mcfg) would have a command that looked like this:

java -jar FileZoomerBatch.jar "xxxxxx.mcfg"

(where xxxxxx is the name you gave your config file).

If you are executing on a Linux or Mac OS X system, use the appropriate method for that system, such as a shell script or terminal command. If you aren’t comfortable with that, consider executing the batch process from within the FileZoomer app with “Run Batch”.
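If you would rather drive the jar from a script than from a .BAT file, something like the Python sketch below could serve as a starting point on any of these systems. It only wraps the same java -jar command shown above; the function names are our own for this example, it assumes the jar, its lib folder, and the config file all sit in one folder, and it makes no claim about what exit codes FileZoomerBatch.jar returns.

```python
import subprocess
from pathlib import Path
from typing import List


def build_batch_command(config_name: str, java: str = "java") -> List[str]:
    """Build the same command as the .BAT example, as an argument list."""
    return [java, "-jar", "FileZoomerBatch.jar", config_name]


def run_batch(jar_dir: Path, config_name: str) -> int:
    """Run the batch jar from the folder holding the jar and the config file.

    Returns the exit code of the java process. What that code means is
    not documented here, so treat it as informational only.
    """
    jar_dir = Path(jar_dir)
    if not (jar_dir / "FileZoomerBatch.jar").is_file():
        raise FileNotFoundError(f"FileZoomerBatch.jar not found in {jar_dir}")
    if not (jar_dir / config_name).is_file():
        raise FileNotFoundError(f"{config_name} not found in {jar_dir}")
    # Run with the jar's folder as the working directory, like a .BAT
    # file located next to the jar would.
    completed = subprocess.run(build_batch_command(config_name), cwd=str(jar_dir))
    return completed.returncode


# Show the command that would be executed for a config named xxxxxx.mcfg.
print(" ".join(build_batch_command("xxxxxx.mcfg")))
```

A scheduler entry (Task Scheduler on Windows, cron on Linux or OS X) can then call a script that invokes run_batch with your own folder and config name. Passing the arguments as a list avoids the quoting problems that come up when a config name contains spaces.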