Amazon S3 Bucket Management with C#: Part 10 – Uploading all files in a directory recursively to an S3 Bucket
Before getting started
Skill Level: Intermediate
Assumptions:
- You have already gone through Parts 1-9 of Managing Amazon AWS with C#.
Additional information: I sometimes cover small sub-topics in a post. Along with AWS, you will also be exposed to:
- Rhyous.SimpleArgs
- Single Responsibility Principle (S in S.O.L.I.D.)
- async, await, parallelism
- 10/100 rule
- Doing things by convention
Step 1 – Add a method to get the list of files in a local directory
This isn’t the focus of our post; however, in order to upload all files in a directory recursively, we have to be able to list them. We are going to create a method that is 10 lines of code. The method has a single responsibility: to return all files in a directory recursively. It is not the responsibility of BucketManager.cs to do this. Hence we need a new class that has this responsibility.
Another reason to move this method to its own file is that this method is itself 10 lines of code. While you can have methods longer than ten lines, exceeding ten lines is usually the first sign that the Single Responsibility Principle is broken. Most beginning developers have a hard time seeing the many ways a method may be breaking the Single Responsibility Principle, so a much easier rule to follow is the 10/100 rule. In the 10/100 rule, a method can only have 10 lines. This rule is pretty soft. Do brackets count? It doesn’t matter. What matters is that the 10-line mark, with or without brackets, is where you start looking at refactoring the method by splitting it into two or more smaller and simpler methods. This is a Keep It Super Simple (K.I.S.S.) rule.
- Add the following utility class: FileUtils.cs.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace Rhyous.AmazonS3BucketManager
{
    public static class FileUtils
    {
        public static async Task<List<string>> GetFiles(string directory, bool recursive)
        {
            var files = Directory.GetFiles(directory).ToList();
            if (!recursive)
                return files;
            var dirs = Directory.GetDirectories(directory);
            // Task.Run moves each subdirectory scan onto the thread pool so the scans can overlap.
            var tasks = dirs.Select(d => Task.Run(() => GetFiles(d, recursive))).ToList();
            while (tasks.Any())
            {
                var task = await Task.WhenAny(tasks);
                files.AddRange(await task);
                tasks.Remove(task);
            }
            return files;
        }
    }
}
Notice: The above class will get all files in a directory and, when recursive is true, in all of its subdirectories. It will do it in parallel. Parallelism likely isn’t needed most of the time; any non-parallel code that lists files recursively would work. But if you were going to sync directories with tens of thousands of files each, parallelism might be a huge benefit.
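For comparison, here is a minimal sketch of the non-parallel alternative mentioned above. It is not part of the project; the class name SimpleFileUtils is hypothetical, and the sketch simply leans on the framework's built-in recursive search.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, shown only as a simpler alternative to FileUtils.
public static class SimpleFileUtils
{
    // Returns every file in 'directory'. When 'recursive' is true,
    // SearchOption.AllDirectories also walks every subdirectory.
    public static List<string> GetFiles(string directory, bool recursive)
    {
        var option = recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
        return Directory.EnumerateFiles(directory, "*", option).ToList();
    }
}
```

Because this version is synchronous, it would need to be wrapped (for example with Task.FromResult) if it had to keep the same Task-returning signature as FileUtils.GetFiles.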
Step 2 – Add an UploadFiles method to BucketManager.cs
- Edit the file called BucketManager.cs.
- Enter this new method:
public static async Task UploadFiles(TransferUtility transferUtility, string bucketName, string directory)
{
    var files = await FileUtils.GetFiles(directory, true);
    var directoryName = Path.GetFileName(directory); // This is not a typo. GetFileName is correct.
    var tasks = files.Select(f => UploadFile(transferUtility, bucketName, f, f.Substring(f.IndexOf(directoryName)).Replace('\\', '/')));
    await Task.WhenAll(tasks);
}
Notice 1: We follow the “Don’t Repeat Yourself” (DRY) principle by having UploadFiles() forward each file to the singular UploadFile().
Notice 2: We don’t use the await keyword when we forward each file to UploadFile(). Instead, we capture the returned Task objects and then await the completion of all of them with Task.WhenAll().
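To make Notice 2 concrete, the following sketch starts several tasks without awaiting them one by one and then awaits them together. It is an illustration only: WhenAllDemo and FakeUpload are hypothetical names, and FakeUpload merely waits briefly instead of talking to S3.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for BucketManager.UploadFile; it only simulates work.
    public static async Task<string> FakeUpload(string file)
    {
        await Task.Delay(100);
        return file + " uploaded";
    }

    public static async Task<string[]> UploadAll(IEnumerable<string> files)
    {
        // ToList() forces the Select to run now, so every upload is started here.
        var tasks = files.Select(f => FakeUpload(f)).ToList();
        // Task.WhenAll completes once all of the started tasks have completed.
        // The results come back in the same order as the tasks were supplied.
        return await Task.WhenAll(tasks);
    }
}
```

Calling WhenAllDemo.UploadAll(new[] { "a.txt", "b.txt", "c.txt" }) lets the three simulated uploads overlap. Awaiting each FakeUpload inside a loop would instead run them one after another.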
Step 3 – Update the Action Argument
We should be very good at this by now. We need to make this method a valid action for the Action Argument.
- Edit the ArgsHandler.cs file and add UploadFiles to the allowed values of the Action argument.
...
AllowedValues = new ObservableCollection<string>
{
    "CreateBucket",
    "CreateBucketDirectory",
    "CreateTextFile",
    "DeleteBucket",
    "DeleteBucketDirectory",
    "ListFiles",
    "UploadFile",
    "UploadFiles"
},
...
Note: There are enough of these now that I alphabetized them.
Step 4 – Delete the Parameter dictionary
In Part 4, we created a method to pass different parameters to different methods. We took note in Part 8 and Part 9 that we now have more exceptions than we have commonalities. It is time to refactor this.
Another reason to refactor this is that the OnArgumentsHandled method is seriously breaking the 10/100 rule.
Let’s start by deleting what we have.
- Delete the Dictionary line from Program.cs.
static Dictionary<string, object[]> CustomParameters = new Dictionary<string, object[]>();
- Delete the section where we populated the dictionary.
// Use the Custom or Common pattern
CustomParameters.Add("CreateBucketDirectory", new object[] { s3client, bucketName, Args.Value("Directory") });
CustomParameters.Add("CreateTextFile", new object[] { s3client, bucketName, Args.Value("Filename"), Args.Value("Text") });
CustomParameters.Add("DeleteBucketDirectory", new object[] { s3client, bucketName, Args.Value("Directory") });
CustomParameters.Add("DeleteFile", new object[] { transferUtility, bucketName, Args.Value("Filename") });
CustomParameters.Add("UploadFile", new object[] { transferUtility, bucketName, Args.Value("File"), Args.Value("RemoteDirectory") });
Step 5 – Implement parameters by convention
To refactor the parameter passing, we are going to use a convention.
A convention is some arbitrary rule that when followed makes the code work. You have to be very careful when using conventions because they are usually not obvious. Because they are not obvious, the first rule of using a convention is this: Conventions must be documented.
The convention is this: Make the Argument names match the method parameter names. Argument names are not case sensitive, so we don’t have to worry about case, only the name.
There are two exceptions to this convention: AmazonS3Client and TransferUtility. We will handle those exceptions statically in code.
Now, let’s implement our convention.
- For each Argument, make sure the associated parameter has the same name.
- Change bucketName to bucket in all methods.
- Change file to filename in the DeleteFile method.
- Change UploadLocation to RemoteDirectory in the UploadFile method.
- Change directory to LocalDirectory in the UploadFiles method.
- Create the following MethodInfoExtension.cs.
using Amazon;
using Amazon.S3;
using Amazon.S3.Transfer;
using Rhyous.SimpleArgs;
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Reflection;

namespace Rhyous.AmazonS3BucketManager
{
    public static class MethodInfoExtensions
    {
        public static List<object> DynamicallyGenerateParameters(this MethodInfo mi)
        {
            var parameterInfoArray = mi.GetParameters();
            var parameters = new List<object>();
            var region = RegionEndpoint.GetBySystemName(ConfigurationManager.AppSettings["AWSRegion"]);
            foreach (var paramInfo in parameterInfoArray)
            {
                if (paramInfo.ParameterType == typeof(AmazonS3Client) || paramInfo.ParameterType == typeof(TransferUtility))
                    parameters.Add(Activator.CreateInstance(paramInfo.ParameterType, region));
                if (paramInfo.ParameterType == typeof(string))
                    parameters.Add(Args.Value(paramInfo.Name));
            }
            return parameters;
        }
    }
}
Notice: This class dynamically queries the method’s parameters. AmazonS3Client and TransferUtility are the exceptions and are created directly. The remaining string parameters are filled by convention from the Argument values of the same name.
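The name-matching convention can be tried in isolation with the sketch below. It is an illustration under stated assumptions: ConventionDemo, FakeArgs, BuildParameters, and InvokeByName are hypothetical names, and a case-insensitive dictionary stands in for the parsed command-line arguments so that the example needs no external library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class ConventionDemo
{
    // Hypothetical stand-in for the parsed arguments.
    // The comparer makes lookups case-insensitive, like the Argument names.
    public static readonly Dictionary<string, string> FakeArgs =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Bucket", "my-bucket" },
            { "LocalDirectory", @"C:\photos" }
        };

    // Hypothetical target method; its parameter names match the argument names.
    public static string UploadFiles(string bucket, string localDirectory)
    {
        return bucket + " <- " + localDirectory;
    }

    // Builds the parameter array by looking up each parameter name in FakeArgs.
    public static object[] BuildParameters(MethodInfo mi)
    {
        return mi.GetParameters().Select(p => (object)FakeArgs[p.Name]).ToArray();
    }

    // Finds a public static method by name and invokes it with the built parameters.
    public static string InvokeByName(string action)
    {
        var mi = typeof(ConventionDemo).GetMethod(action, BindingFlags.Public | BindingFlags.Static);
        return (string)mi.Invoke(null, BuildParameters(mi));
    }
}
```

Here the parameter named localDirectory finds the argument named LocalDirectory only because the lookup ignores case. Renaming either side breaks the call at run time rather than at compile time, which is why the convention has to be documented.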
- Update Program.cs to use this new extension method.
internal static void OnArgumentsHandled()
{
    var action = Args.Value("Action");
    var flags = BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static | BindingFlags.FlattenHierarchy;
    MethodInfo mi = typeof(BucketManager).GetMethod(action, flags);
    List<object> parameters = mi.DynamicallyGenerateParameters();
    var task = mi.Invoke(null, parameters.ToArray()) as Task;
    task.Wait();
}
Notice: Look how simple the Program.OnArgumentsHandled method has become. By using this convention, and by moving the parameter creation to an extension method, we are down to six lines. The total size for the Program.cs class is 25 lines, including spaces.
You can now upload a directory to an Amazon S3 bucket using C#.
Design Pattern: Facade
Yes, we have just implemented the popular Facade design pattern.
Our project, and most specifically BucketManager.cs, represents an entire system: Amazon S3. When code is written to represent an entire system or subsystem behind a simpler interface, that code is called a Facade.
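As a rough illustration of the pattern, the sketch below puts a small facade in front of a made-up storage subsystem. Every type here (FakeStorageClient, StorageFacade) is hypothetical and exists only for the example; none of them are AWS SDK types or classes from this project.

```csharp
using System.Collections.Generic;

// Hypothetical subsystem with several low-level operations.
public class FakeStorageClient
{
    private readonly Dictionary<string, List<string>> _buckets =
        new Dictionary<string, List<string>>();

    public void PutBucket(string name) { _buckets[name] = new List<string>(); }
    public void PutObject(string bucket, string key) { _buckets[bucket].Add(key); }
    public IReadOnlyList<string> ListObjects(string bucket) { return _buckets[bucket]; }
}

// The facade: a small class whose simple methods hide the subsystem calls.
public class StorageFacade
{
    private readonly FakeStorageClient _client = new FakeStorageClient();

    // One call for the caller, two subsystem calls behind it.
    public void CreateBucketWithFile(string bucket, string file)
    {
        _client.PutBucket(bucket);
        _client.PutObject(bucket, file);
    }

    public int CountFiles(string bucket)
    {
        return _client.ListObjects(bucket).Count;
    }
}
```

Callers only ever see StorageFacade, in the same way that the command-line actions in this series only ever call methods on BucketManager.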
Go to: Rhyous.AmazonS3BucketManager on GitHub to see the full example project from this 10 part tutorial.
Return to: Managing Amazon AWS with C#