Sometimes a user needs to schedule the generation of a complex report and download it once it is ready.
Written as a “user story”, this becomes:
As a logged-in User, I want to generate a monthly report and download it.
This simple scenario hides several problems:
UX problems
Request timeouts
Scheduling a long-running task
Persisting the generated content
Making the content available
Setting up ACLs on the content
Disk space usage on the system
You can’t let a user click a button and then wait five minutes or more for the generated file: the user experience would be poor, and the request would most likely end in a timeout error instead. Eventually, the user would start hating and cursing you.
The true story
We need to find another way to accomplish this task. We could try splitting it into separate phases.
Let’s reformulate the previous “user story”:
As a logged-in User, I want to provide a one-time password and generate a monthly report.
I want to receive an email with detailed instructions explaining how to download the archive.
I want to visit the URL in the email, enter the same password and download the archive.
The archive must be deleted after two hours.
The new phases are:
Report generation
Email delivery
Report Download
Report cleanup
Now we need some new “actors” on the main stage:
A job that generates the report, which we will call ReportExporterWorker
A mailer that sends the email with the download instructions, which we will call ReportMailer
A controller that checks the User’s ACL and serves the report, which we will call ReportsController
A job that cleans up the report after two hours, which we will call ReportExportCleaner
Now we can try addressing all the previous problems.
Ruby Environment
The application and gems used in this post are:
Ruby on Rails 3.x
Devise Gem
Sidekiq Gem
Routes
We need to add some new routes inside the config/routes.rb file.
Index route
The first route displays a form containing the email and password fields.
This is the index action inside the ReportsController:
The form shows two fields. The email field is prefilled with the User’s email but remains editable, so a different address can be used.
The password is used to encrypt the download information (the report path and its expiration), so that others cannot read it.
Here is the form part of the view:
This is the DownloadRequest object used inside the form and the controller (@download). It’s a PORO (Plain Old Ruby Object) with the methods needed to use it inside a form, plus some validation rules for its fields.
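To make the role of DownloadRequest concrete, here is a minimal plain-Ruby sketch of its validation side. It is hypothetical: the methods Rails forms need are left out so the example runs without Rails, and the email pattern and the eight-character password minimum are assumptions, not rules taken from the post.

```ruby
# Hypothetical plain-Ruby sketch of DownloadRequest (no Rails form support).
class DownloadRequest
  # Assumed rule: a loose "something@something.tld" shape check.
  EMAIL_FORMAT = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/
  # Assumed rule: minimum length for the one-time password.
  MIN_PASSWORD_LENGTH = 8

  attr_reader :email, :password, :errors

  def initialize(params = {})
    @email = params[:email].to_s.strip
    @password = params[:password].to_s
    @errors = []
  end

  # Returns true when both fields pass; otherwise collects messages in #errors.
  def valid?
    @errors = []
    @errors << 'email is invalid' unless email.match?(EMAIL_FORMAT)
    if password.length < MIN_PASSWORD_LENGTH
      @errors << "password must be at least #{MIN_PASSWORD_LENGTH} characters"
    end
    @errors.empty?
  end
end

# Usage
request = DownloadRequest.new(email: 'user@example.com', password: 'correct horse')
request.valid? # => true
```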
Generate route
The second route is triggered when the User clicks the submit button in the previous form.
Generate action
The generate action inside the ReportsController initializes a new DownloadRequest object with the request’s parameters and validates the email and password fields.
If everything is OK, a new ReportExporterWorker job is scheduled.
This is the generate action inside the ReportsController:
ReportExporterWorker
Generate the report
The ReportExporterWorker performs several steps in its perform method. Let’s dive into them.
Scheduling the job
First, we create a Sidekiq job to handle the report creation. This job creates a new file under the "#{Rails.root}/tmp/reports" folder.
The perform method invokes LastMonthReportGenerator, a service that generates the report.
Once the file is created, we compress it to reduce its size.
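The compression step can be sketched with Ruby's standard Zlib library. The helper name compress_report is hypothetical; the `.gz` suffix matches the file name the cleanup job later looks for. Deleting the uncompressed file is left to the caller, since the post does that after the email is sent.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: gzip a generated report next to the original file.
# Returns the path of the compressed copy ("<source_path>.gz").
def compress_report(source_path)
  gz_path = "#{source_path}.gz"
  Zlib::GzipWriter.open(gz_path) do |gz|
    File.open(source_path, 'rb') do |file|
      # Stream in 64 KB chunks instead of loading the whole report in memory.
      while (chunk = file.read(64 * 1024))
        gz.write(chunk)
      end
    end
  end
  gz_path
end

# Usage: in the post the folder is "#{Rails.root}/tmp/reports"; a temp dir stands in here.
Dir.mktmpdir do |reports_dir|
  report = File.join(reports_dir, 'report_abc123')
  File.write(report, "month,total\n" * 1_000)
  archive = compress_report(report)
  File.delete(report) # the post removes the uncompressed file later, after the email
  puts File.basename(archive) # prints "report_abc123.gz"
end
```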
Sending the content
Now we have a compressed file.
We set up the data that will be encrypted.
We encrypt the expiration and the file path with DownloadEncrypter.
The encrypted payload is passed to the ReportMailer mailer. The email contains the URL needed to download the report.
The mailer is pretty simple: it takes the encrypted path and the email address from its arguments and delivers the email.
Full ReportMailer source code:
And the email template:
Cleanup our system
After the email delivery, we schedule another job named ReportExportCleaner, which is responsible for deleting the report file from the system after two hours.
We also delete the uncompressed report file from the system.
Full ReportExporterWorker source code:
Avoid leaking sensitive data
In this scenario we don’t use any external storage service; we just save the report in a temporary folder.
We don’t persist any report information in our database, so we need a way to pass the archive’s path between steps.
PayloadBuilder helps us build a payload containing the following information:
report expiration
report file path
The expiration is set to two hours after the report is created.
The file path is taken from the method argument.
The extract_expiration and extract_path methods do the opposite: they read those two values back from the payload.
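A possible shape for PayloadBuilder, as a sketch only: the post does not show the payload format, so the JSON layout, the expires_at and path keys, and the ISO 8601 timestamp are all assumptions.

```ruby
require 'json'
require 'time'

# Hypothetical PayloadBuilder: packs the report expiration and file path
# into one string and reads them back. JSON is an assumed format.
module PayloadBuilder
  TTL_SECONDS = 2 * 60 * 60 # the archive is valid for two hours

  def self.build(file_path, now: Time.now)
    JSON.generate(
      'expires_at' => (now + TTL_SECONDS).utc.iso8601,
      'path' => file_path
    )
  end

  def self.extract_expiration(payload)
    Time.iso8601(JSON.parse(payload).fetch('expires_at'))
  end

  def self.extract_path(payload)
    JSON.parse(payload).fetch('path')
  end
end

# Usage
payload = PayloadBuilder.build('/app/tmp/reports/report_abc123.gz',
                               now: Time.utc(2014, 2, 1, 10, 0, 0))
PayloadBuilder.extract_expiration(payload) # => 2014-02-01 12:00:00 UTC
```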
The encrypt method of DownloadEncrypter returns the concatenation IV + separator + our_secret_data + separator + tag, encoded in Base64.
The decrypt method decodes the Base64 string, splits the result on the separator and uses each piece (IV, encrypted data and tag) to decrypt our data.
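The encrypt/decrypt scheme can be sketched with Ruby's OpenSSL bindings and AES-256-GCM, an authenticated cipher that produces both an IV and an authentication tag. This is a hypothetical sketch, not the post's DownloadEncrypter: the key is assumed to be derived from the user's password with PBKDF2, SALT is a placeholder for an application secret, and '--' is an assumed separator. Each part is Base64-encoded before joining, because raw IV or ciphertext bytes could otherwise contain the separator.

```ruby
require 'openssl'

# Hypothetical DownloadEncrypter sketch (AES-256-GCM).
# Token layout: Base64( Base64(iv) + "--" + Base64(ciphertext) + "--" + Base64(tag) )
module DownloadEncrypter
  SEPARATOR = '--' # never appears in strict Base64 output
  SALT = 'hypothetical-app-salt' # assumption: an application secret in practice
  ITERATIONS = 20_000
  TAG_LENGTH = 16

  def self.encrypt(plaintext, password)
    cipher = OpenSSL::Cipher.new('aes-256-gcm')
    cipher.encrypt
    cipher.key = derive_key(password)
    iv = cipher.random_iv # 12 random bytes, unique per token
    ciphertext = cipher.update(plaintext) + cipher.final
    tag = cipher.auth_tag # lets decrypt detect any tampering
    encode([iv, ciphertext, tag].map { |bytes| encode(bytes) }.join(SEPARATOR))
  end

  # Raises OpenSSL::Cipher::CipherError on a wrong password or tampered token.
  def self.decrypt(token, password)
    iv, ciphertext, tag = decode(token).split(SEPARATOR).map { |part| decode(part) }
    raise ArgumentError, 'malformed token' unless iv && ciphertext && tag&.bytesize == TAG_LENGTH

    decipher = OpenSSL::Cipher.new('aes-256-gcm')
    decipher.decrypt
    decipher.key = derive_key(password)
    decipher.iv = iv
    decipher.auth_tag = tag
    (decipher.update(ciphertext) + decipher.final).force_encoding(Encoding::UTF_8)
  end

  def self.derive_key(password)
    OpenSSL::PKCS5.pbkdf2_hmac(password, SALT, ITERATIONS, 32, OpenSSL::Digest.new('SHA256'))
  end

  def self.encode(bytes)
    [bytes].pack('m0') # strict Base64, no newlines
  end

  def self.decode(text)
    text.unpack1('m0') # raises ArgumentError on invalid Base64
  end
end

# Usage
token = DownloadEncrypter.encrypt('{"path":"/app/tmp/reports/report_abc123.gz"}', 's3cret-pass')
DownloadEncrypter.decrypt(token, 's3cret-pass') # returns the original JSON string
```

A wrong password makes `decrypt` raise `OpenSSL::Cipher::CipherError`, which the download action can treat as an invalid request. Strict Base64 may contain `+` and `/`, so the token needs URL escaping (or a URL-safe alphabet) before it goes into the email link.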
Full DownloadEncrypter source code:
Report cleanup
ReportExportCleaner deletes the report from our system. Its perform method receives a file_name as argument.
Sanitize the argument
This job could be exploited by a malicious user passing a path argument such as '../../../some_file_name', tricking the job into deleting arbitrary files on our system, so we had to find a way to sanitize the argument.
First step
The first step is checking if the file_name is valid.
This is accomplished by the sanitize_file_name method, which is called inside is_valid?.
The sanitize_file_name method was taken from this blog post by Gavin Miller (@gavin_miller).
I decided to use both a whitelist and a blacklist approach. To do this, I pass only the file’s basename, without the extension, as the argument (the . character is not allowed by the whitelist).
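One way to combine the whitelist and blacklist checks, as a hypothetical sketch rather than the code from Gavin Miller's post: sanitize_file_name strips every character outside the whitelist, and is_valid? accepts a name only if it contains no blacklisted pattern and sanitization leaves it unchanged. The ReportFileName module name, the allowed characters and the 100-character limit are assumptions.

```ruby
# Hypothetical sketch of the file-name checks (not the code from the post).
module ReportFileName
  # Whitelist: only letters, digits, '_' and '-' survive sanitization.
  DISALLOWED_CHARS = /[^A-Za-z0-9_-]/
  # Blacklist: patterns rejected outright, as a second line of defense.
  BLACKLIST = %r{\.\.|[/\\]|\x00}
  MAX_LENGTH = 100 # assumed limit

  def self.sanitize_file_name(file_name)
    File.basename(file_name.to_s).gsub(DISALLOWED_CHARS, '')
  end

  # Accepts a name only if no blacklisted pattern is present and
  # sanitization leaves it unchanged.
  def self.is_valid?(file_name)
    name = file_name.to_s
    return false if name.empty? || name.length > MAX_LENGTH
    return false if name.match?(BLACKLIST)

    sanitize_file_name(name) == name
  end
end

# Usage
ReportFileName.is_valid?('report_abc123')      # => true
ReportFileName.is_valid?('../../../etc/hosts') # => false
ReportFileName.is_valid?('report_abc123.gz')   # => false ('.' is not whitelisted)
```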
Second step
The second step is checking if a file "#{Rails.root}/tmp/reports/#{file_name}.gz" exists (where file_name is the argument of perform).
If both checks pass, we can safely delete the file.
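The second step can be sketched as a small helper. delete_report is a hypothetical name, the reports directory is a parameter instead of "#{Rails.root}/tmp/reports" so the example runs outside Rails, and the check that the resolved path stays inside that directory is an extra safeguard not described in the post.

```ruby
require 'tmpdir'

# Hypothetical helper for the cleaner's second step. file_name is expected to
# have passed the first-step validation already (e.g. "report_abc123").
def delete_report(reports_dir, file_name)
  reports_dir = File.expand_path(reports_dir)
  path = File.expand_path("#{file_name}.gz", reports_dir)
  # Extra safeguard (not in the post): the resolved path must sit directly
  # inside the reports directory.
  return false unless File.dirname(path) == reports_dir
  # Only delete a regular file that actually exists.
  return false unless File.file?(path)

  File.delete(path)
  true
end

# Usage: in the post the folder is "#{Rails.root}/tmp/reports".
Dir.mktmpdir do |reports_dir|
  File.write(File.join(reports_dir, 'report_abc123.gz'), 'archive bytes')
  delete_report(reports_dir, 'report_abc123') # => true
end
```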
Full ReportExportCleaner source code:
Report download
The third route is triggered when the User clicks the report link in the email.
Decode action
The action reads the id parameter and shows a view with a form containing a password field and a hidden field holding the id.
The view lets the User enter the password chosen earlier and click Download to invoke the download action.
This is the view:
Download action
The last route is called when the User submits this form with the password chosen earlier.
The action reads the id and password parameters from the request and tries to decrypt the id using the decrypt method of DownloadEncrypter.
Then it decodes the decrypted data from Base64.
Next, it retrieves the expiration with the extract_expiration method of PayloadBuilder and checks whether the link has expired.
Finally, it retrieves the report path using the extract_path method of PayloadBuilder and uses send_file to start the download.
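The checks that follow decryption can be sketched in plain Ruby. This assumes the decrypted and decoded payload is JSON with expires_at (ISO 8601) and path keys; resolve_download is a hypothetical helper, and its file-existence check is an addition not described in the post.

```ruby
require 'json'
require 'time'
require 'tmpdir'

# Hypothetical helper for the download action, run after decryption.
# Assumes a JSON payload with "expires_at" (ISO 8601) and "path" keys.
# Returns [:ok, path], [:expired, nil] or [:missing, nil].
def resolve_download(payload, now: Time.now)
  data = JSON.parse(payload)
  expires_at = Time.iso8601(data.fetch('expires_at'))
  return [:expired, nil] if now > expires_at

  path = data.fetch('path')
  # Extra check (not in the post): the cleaner may already have removed the file.
  return [:missing, nil] unless File.file?(path)

  [:ok, path]
end

# Usage
Dir.mktmpdir do |reports_dir|
  path = File.join(reports_dir, 'report_abc123.gz')
  File.write(path, 'archive bytes')
  payload = JSON.generate('expires_at' => (Time.now + 3600).utc.iso8601, 'path' => path)
  resolve_download(payload).first # => :ok
end
```

In the controller, an :ok result could be handed to Rails' `send_file`, for example `send_file(path, filename: File.basename(path))`; the other results would render an error page.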
This is the download action inside the ReportsController:
This is the Download object used inside the form and the ReportsController’s download action.
It’s a PORO with the methods needed to use it inside a form, plus some validation rules for its fields.
Authorization
We use Devise’s authentication directive to require a logged-in User on each action:
Full ReportsController source code:
Final thoughts
There is still space for improvements.
My solution is far from being the best way to address this task, but I hope it is a starting point to help you tackle this problem.
There are several areas I’d like to improve, which may be the subject of future blog posts.
Strong Parameters
You should use strong parameters to whitelist the params accepted by each controller action. This example is based on an old application that still needs to be updated.
If you have a similar scenario, this answer could be a starting point.
Filename sanitization
I’m not sure about this solution. I fear there may be other ways to circumvent these checks.
Initialization vector into our data
I had some doubts about putting components such as the initialization vector (IV) inside our output, but according to this answer it is legitimate: the IV does not need to be secret; it only needs to be unique for each encryption with the same key.
Waiting for your feedback
Thanks for reading this far.
I’d love to hear your suggestions about my solution.