Friday 27 July 2018

Delivering your content faster with CloudFront

About

In this post, I describe how to connect your website to CloudFront so that it loads faster around the world. I also show how to connect your website to CloudFront if it uses SSL.


What is CloudFront?

Amazon CloudFront is a global content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to your viewers with low latency and high transfer speeds. CloudFront is integrated with AWS – including physical locations that are directly connected to the AWS global infrastructure, as well as software that works seamlessly with services including AWS Shield for DDoS mitigation, Amazon S3, Elastic Load Balancing or Amazon EC2 as origins for your applications, and Lambda@Edge to run custom code close to your viewers.


What is a CloudFront edge location?

CloudFront delivers your content through a worldwide network of data centers called edge locations. The regional edge caches are located between your origin web server and the global edge locations that serve content directly to your viewers. This helps improve performance for your viewers while lowering the operational burden and cost of scaling your origin resources.


How does CloudFront help you deliver content faster?

CloudFront caches your data at its edge locations around the world. When a user requests a page from your website, the edge location checks its cache. If the page is in the cache, it returns the response to the user directly. Otherwise, it sends the request to your web server, which sends back the response. The edge location then caches that response according to your cache behavior settings.


How is the request routed?

After you configure CloudFront to deliver your content, here's what happens when users request your objects:


  1. A user accesses your website or application and requests one or more objects, such as an image file and an HTML file.
  2. DNS routes the request to the CloudFront edge location that can best serve the request—typically the nearest CloudFront edge location in terms of latency—and routes the request to that edge location.
  3. In the edge location, CloudFront checks its cache for the requested files. If the files are in the cache, CloudFront returns them to the user. If the files are not in the cache, it does the following:


  1. CloudFront compares the request with the specifications in your distribution and forwards the request for the files to the applicable origin server for the corresponding file type—for example, to your Amazon S3 bucket for image files and to your HTTP server for the HTML files.
  2. The origin servers send the files back to the CloudFront edge location.

As soon as the first byte arrives from the origin, CloudFront begins to forward the files to the user. CloudFront also adds the files to the cache in the edge location for the next time someone requests those files.
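The hit and miss flow above can be sketched in a few lines of Python. This is only a toy model of a single edge location, not how CloudFront is implemented: the EdgeCache class, the origin function and the fixed TTL are all names and values I made up for illustration.

```python
import time


class EdgeCache:
    """Toy model of one edge location's cache (illustrative only)."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: path -> response body
        self._ttl = ttl_seconds          # how long a cached object stays fresh
        self._clock = clock              # injectable so expiry can be simulated
        self._store = {}                 # path -> (body, expiry time)

    def get(self, path):
        """Return (body, "Hit" or "Miss") for a requested path."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "Hit"       # served from the edge cache
        body = self._fetch(path)         # forward the request to the origin
        self._store[path] = (body, now + self._ttl)
        return body, "Miss"


origin_calls = []


def origin(path):
    """Stand-in for your web server; records every request it receives."""
    origin_calls.append(path)
    return "content of " + path


cache = EdgeCache(origin, ttl_seconds=60)
first = cache.get("/index.html")
second = cache.get("/index.html")
print(first[1], second[1], len(origin_calls))
```

The second request never reaches the origin, which is where the speed-up comes from.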



Creating a distribution

Go to the CloudFront home page in the AWS console, click on "Create Distribution" and then on "Get Started" under "Web".

Origin Settings

Enter the "Origin Domain Name". This is the domain name of your website. If you are using Elastic Beanstalk, enter the URL of your Elastic Beanstalk environment.

Leave the Origin Path empty for now.

Enter the Origin ID. This is a name that uniquely identifies this origin within your CloudFront distribution.

For Origin SSL Protocols, leave the default.

Origin Protocol Policy -> This is how CloudFront connects to your website. If you have SSL set up, you can use HTTPS Only. If your website does not support HTTPS, select HTTP Only. If your website supports both HTTP and HTTPS, you can select Match Viewer, although in that case I would suggest using HTTPS Only.
Leave the rest of the settings at their defaults.
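The choice above can be written down as a small Python helper. This is just a sketch of my own recommendation; the function name is made up, and the strings it returns are the values the CloudFront API uses for OriginProtocolPolicy, as far as I know.

```python
def origin_protocol_policy(supports_http, supports_https):
    """Pick an origin protocol policy, preferring HTTPS whenever possible."""
    if supports_https:
        # Covers both "HTTPS only" sites and sites that support both protocols.
        return "https-only"
    if supports_http:
        return "http-only"
    raise ValueError("the origin must support HTTP or HTTPS")


print(origin_protocol_policy(supports_http=True, supports_https=True))
```

I left "match-viewer" out on purpose, because I would rather have the connection to the origin always encrypted when the origin supports it.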

Default Cache Behavior Settings

Set the Viewer Protocol Policy according to your needs. If your website is only on HTTP, keep the default (HTTP and HTTPS). If you want your website to be accessed only over HTTPS, select the second option (Redirect HTTP to HTTPS).

Cache Based on Selected Request Headers -> If you are using HTTP for your website, leave this at the default (None). If you are using HTTPS for your website, select Whitelist, click on Host in the list of headers and then click on Add>> to whitelist it. This is important because if viewers connect to, for example, https://xyz.com and your origin has an SSL certificate for that domain, CloudFront needs to forward the Host header so that the origin knows the request is for xyz.com.
In my case, I had an Elastic Beanstalk environment with an SSL certificate for https://xyz.com on its load balancer. See the section on request routing above to see how the connection is made to your web server.

Object Caching -> If you select Use Origin Cache Headers, CloudFront takes the TTL from the caching headers (such as Cache-Control) that your origin sends in its responses. If you select Customize, you specify yourself how long objects stay in the cache. In the section on creating cache behaviors you will see how to cache different kinds of objects for different lengths of time.
Leave the rest of the settings at their defaults.
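Here is a rough Python sketch of how I understand the customized TTL values to interact with a Cache-Control max-age sent by the origin. The function name is made up, the numbers are only example values, and the real rules have more cases (for example the Expires header and no-cache directives), so treat it as a simplification.

```python
def effective_ttl(origin_max_age, min_ttl, default_ttl, max_ttl):
    """Approximate number of seconds an object stays in the edge cache.

    origin_max_age is the max-age the origin sent, or None if it sent no
    caching header at all.
    """
    if origin_max_age is None:
        # No header from the origin: fall back to the default TTL.
        return default_ttl
    # Otherwise keep the origin's value, clamped between min and max TTL.
    return max(min_ttl, min(origin_max_age, max_ttl))


# Example settings: Minimum TTL 0, Default TTL one day, Maximum TTL one year.
print(effective_ttl(None, 0, 86400, 31536000))
print(effective_ttl(3600, 0, 86400, 31536000))
```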

Distribution Settings

Price Class -> Select the regions whose edge locations you want your objects to be cached in.

Alternate Domain Names -> This is where you specify the domain name through which your website will be accessed. For example, if you want to access your website through the URL https://xyz.com, enter xyz.com here.

SSL Certificate -> If you are using Alternate Domain Names, you have to provide the SSL certificate for that domain here. If you want to get the SSL certificate through AWS Certificate Manager (ACM), make sure you request the certificate in the N. Virginia (us-east-1) region.
Leave the rest of the settings at their defaults.

At the place where you bought your domain (or wherever its DNS is managed), point the domain to the CloudFront URL of your distribution.


Creating cache behaviors

Once you have created your distribution, click on it, go to Behaviors and then click on Create Behavior. Cache behaviors tell CloudFront what to cache, how long to cache it and so on.

Path Pattern -> This tells CloudFront which objects this behavior applies to. You can write the path to a specific object, like images/logo.jpg, or you can write *.jpg to cover all the JPG files. The same can be done for other files such as JavaScript and CSS. If you change these files on your web server, you can invalidate the cache and the new files will be cached.
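To get a feel for how path patterns select a behavior, here is a small Python sketch. It uses the standard library's fnmatch as a stand-in for CloudFront's matching, which is an assumption on my part: both treat * as "any characters" and both are case-sensitive here, but fnmatch also gives special meaning to square brackets, which CloudFront does not. The behavior names are made up.

```python
from fnmatch import fnmatchcase


def pick_behavior(request_path, behaviors):
    """Return the name of the first behavior whose pattern matches the path.

    behaviors is a list of (path_pattern, name) pairs in precedence order.
    """
    path = request_path.lstrip("/")
    for pattern, name in behaviors:
        if fnmatchcase(path, pattern.lstrip("/")):
            return name
    return "default"  # nothing matched, so the default behavior applies


behaviors = [
    ("images/logo.jpg", "logo"),
    ("*.jpg", "jpg-files"),
    ("*.css", "css-files"),
]
for requested in ("/images/logo.jpg", "/photos/cat.jpg", "/index.html"):
    print(requested, "->", pick_behavior(requested, behaviors))
```

The order matters: the more specific images/logo.jpg pattern only wins because it is listed before *.jpg.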

Viewer Protocol Policy -> Follow the same logic we used earlier.

Cache Based on Selected Request Headers -> Follow the same logic we used earlier. If the objects need to be loaded over HTTPS, whitelist the Host header.

Object Caching -> For static files such as images, you can set a long cache time. Select Customize and enter a high value for each TTL (these values are in seconds).
Leave the other settings at their defaults.
Click on Create.


Error pages

You can have CloudFront return an object to the viewer (for example, an HTML file) when your Amazon S3 or custom origin returns an HTTP 4xx or 5xx status code to CloudFront. You can also specify how long an error response from your origin or a custom error page is cached in CloudFront edge caches. You can keep different error pages on S3 or in another location and show a different page for each status code. For example, if your web server is not functioning properly and returns a 5xx status code, you can show a dedicated error page for it.


Invalidating cache

Invalidating the cache is important when you update your static content. For example, if you are caching CSS and JavaScript files and you update them on your web server, the changes will not show up on the website until the TTL of the cached objects expires. To force them to update, you can invalidate the cache. To do this, click on the CloudFront distribution, then on Invalidations and then on Create Invalidation. In Object Paths, write the path of the object you need to invalidate. Invalidation paths start with /, and the * wildcard goes at the end of the path, so for example /css/* invalidates everything under /css/.
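If you would rather script this than click through the console, the request can be sketched in Python. The helper below only builds the invalidation batch; the function name and the example paths are mine. The commented-out lines show how I would expect to pass it to boto3's create_invalidation call, with a placeholder distribution ID.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for a list of object paths."""
    # Invalidation paths are expected to start with a slash.
    normalized = [p if p.startswith("/") else "/" + p for p in paths]
    if caller_reference is None:
        # Any string that is unique per request works; a timestamp is enough here.
        caller_reference = "invalidation-" + str(int(time.time()))
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        "CallerReference": caller_reference,
    }


batch = build_invalidation_batch(["css/*", "/js/app.js"], "example-ref-1")
print(batch)

# Assumed usage with boto3 (needs AWS credentials, so it is not run here):
# import boto3
# boto3.client("cloudfront").create_invalidation(
#     DistributionId="YOUR_DISTRIBUTION_ID",  # placeholder
#     InvalidationBatch=batch,
# )
```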

Geo Restriction 

If you need to prevent users in selected countries from accessing your content, you can specify either a whitelist (countries where they can access your content) or a blacklist (countries where they cannot). For more information, see Restricting the Geographic Distribution of Your Content in the Amazon CloudFront Developer Guide.
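The whitelist and blacklist logic is simple enough to sketch in Python. This is only an illustration of the rule, not something you have to run yourself, since CloudFront applies it for you. The function name is made up; the restriction type strings mirror the ones the CloudFront API uses, as far as I know.

```python
def is_country_allowed(country_code, restriction_type, countries):
    """Decide whether a viewer from country_code may access the content."""
    listed = country_code.upper() in {c.upper() for c in countries}
    if restriction_type == "whitelist":
        return listed        # only the listed countries get access
    if restriction_type == "blacklist":
        return not listed    # everyone except the listed countries
    if restriction_type == "none":
        return True          # no geo restriction configured
    raise ValueError("unknown restriction type: " + restriction_type)


print(is_country_allowed("IN", "whitelist", ["IN", "US"]))
print(is_country_allowed("IN", "blacklist", ["IN", "US"]))
```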

Troubleshooting

If you are not seeing the changes you made to your website, try invalidating the cache.

If you get a 502 error, there may be a problem with the connection between CloudFront and your origin. If you are using HTTPS, check that the SSL certificate is set up. Also check that you have whitelisted the Host header in Cache Based on Selected Request Headers.



Sunday 22 July 2018

Deploying Django project to Elastic Beanstalk part 2

What is Elastic Beanstalk? 

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.

In this post, I will show you how to deploy your code to Elastic Beanstalk through the command line.


Setting up AWS Elastic Beanstalk command line tool

AWS has a command line tool for its Elastic Beanstalk service, which you can install by running the following command.

pip install awsebcli

Before you do this, you should install the AWS CLI.
Please see my previous post, Creating EC2 instance with AWS CLI, to see how to install and set up the AWS CLI.

Deploying your code

Creating the configurations and requirements file

  • Create a folder '.ebextensions' in your Django root folder.
  • Create a config file in this folder and name it Django.config (the name doesn't matter, but the extension should be .config).
  • Add the following content to this file:

option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: django_project_name/wsgi.py

  • Replace django_project_name with the name of your Django project.

Elastic Beanstalk uses the requirements.txt file to see which Python modules you are using. So list all of the modules you are using in requirements.txt and put this file in the root folder of your Django code. You can do this by executing the following command:

pip freeze > requirements.txt

Setting up git

The AWS EB CLI uses git to track which files it should include in the deployment package. So all the files that you want to deploy should be added to git, and the changes must be committed.
Go to the root folder of your Django code.
Execute the following commands to initialize the folder with git:

git init
git add . 
git commit -m "your commit message"

Please note that git add . will add all the files in the current folder and its subfolders. If you don't want this, list the files that you want to leave out in a .gitignore file, or add each file you want in the deployment package manually, one by one.

Once you are done with git, you need to initialize the directory with Elastic Beanstalk as well and then create an environment in it.

Initializing Elastic Beanstalk

Execute the following command to initialize:

eb init

When you execute this command, it will ask you for options such as the application name and the region. The first option you have to enter is the default region where you want your Elastic Beanstalk application to be located. You should select a region that is close to where your website will be used the most, as this improves its load time. If you plan to set up a content delivery network for your website, the region matters much less.



The second thing you have to set up is the application to use. This is the application on Elastic Beanstalk. If you already have an application where you want to deploy the code, select it from the list. Otherwise, select the option to create a new application.




Set up the language you are using. As we are setting this up for Django, it will be Python. The tool detects this automatically, so you just need to confirm it and provide the Python version you are using for your Django project.


Set up SSH. If you want to SSH into the EC2 instance that Elastic Beanstalk creates, select yes from the options. From my experience, it is really helpful to SSH into the instance to watch live logs (tail -f). Enter the name of the key file that will be used to SSH into the instance. This file will be saved in your .ssh folder.

Creating an Environment

Use the eb create command to create an environment. You will be asked to enter your environment name and the load balancer type. Set the name to whatever you want and the load balancer type to classic. You can learn more about load balancers here.
Once you do this, your code will be deployed to Elastic Beanstalk. This may take a few minutes. Once it is done, you can use eb open to open the Django project in a web browser.

Updating your code 

Add the changes you made in your code to git and commit them.
Use eb deploy to deploy the latest code.

Troubleshooting

Checking the logs

There are two ways to see the logs. The first way is to use the command eb logs, which displays the logs in the terminal. The second way is to SSH into the EC2 instance (eb ssh) and view the logs with tail, nano and so on.




Wednesday 11 July 2018

Deploying Django project to Elastic Beanstalk part 1

What is Elastic Beanstalk? 

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.

How do you deploy your code to Elastic Beanstalk?

There are two ways to deploy your Django project to Elastic Beanstalk. The first method is through the AWS console and the other way is through the Elastic Beanstalk command line tool. In this blog, I will be showing you how to deploy the Django project through the AWS console. I assume that you already have a working Django project.

Creating a zip of your code

   1. Create a folder '.ebextensions' in your Django root folder.
   2. Create a config file in this folder and name it Django.config (the name doesn't matter, but the extension should be .config).
   3. Add the following content to this file:

option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: django_project_name/wsgi.py

   4. Replace django_project_name with the name of your Django project.
   5. Add a requirements.txt file to your Django project root folder and list all the required libraries there. You can do this by executing the following command:

pip freeze > requirements.txt

   6. Create a zip of the contents of your Django root folder (the files themselves, not a parent folder that wraps them) and make sure that the .ebextensions folder is in the zip.
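If you prefer to build the zip with a script, here is a minimal Python sketch. It assumes the files should sit at the top level of the zip rather than inside a wrapping folder, and it skips .git and __pycache__, which is my own choice. The function name and the throwaway demo project are made up for illustration.

```python
import os
import tempfile
import zipfile


def make_source_bundle(project_root, zip_path):
    """Zip the contents of project_root and return the archived file names."""
    skip_dirs = {".git", "__pycache__"}
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for folder, dirnames, filenames in os.walk(project_root):
            # Prune folders that should not end up in the bundle.
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
            for filename in filenames:
                full_path = os.path.join(folder, filename)
                if os.path.abspath(full_path) == zip_abs:
                    continue  # never add the zip to itself
                # Store paths relative to the project root (no parent folder).
                bundle.write(full_path, os.path.relpath(full_path, project_root))
    with zipfile.ZipFile(zip_path) as bundle:
        names = bundle.namelist()
    if not any(name.startswith(".ebextensions/") for name in names):
        raise ValueError("the bundle has no .ebextensions folder at its root")
    return names


# Demo on a throwaway project with placeholder files.
with tempfile.TemporaryDirectory() as workdir:
    root = os.path.join(workdir, "mysite")
    os.makedirs(os.path.join(root, ".ebextensions"))
    for relative in ("manage.py", "requirements.txt", ".ebextensions/django.config"):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("# placeholder\n")
    names = make_source_bundle(root, os.path.join(workdir, "bundle.zip"))
    print(sorted(names))
```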

Uploading your code to Elastic Beanstalk


  1. Go to the Elastic Beanstalk page on the AWS console.
  2. Click on 'Get started'.
  3. Give a name to your application.
  4. Set its platform to Python.
  5. Select 'Upload your code' and upload the zip file that you created in the previous step.
  6. Click on 'Create application'.

It will take a few minutes to create an environment for your app.



There is an option to configure more settings before you create the application, but you can change these configurations at any time, so for now let's leave them alone. These configurations cover things like setting up a load balancer, attaching databases and setting environment variables.

Redeploying previous version of your code

Whenever you deploy (upload) your code to Elastic Beanstalk, it keeps track of the different deployments. So if you ever feel that the new code you deployed is not good and you want to revert to the previous version, you can select that version from the AWS Elastic Beanstalk console. To do this, go to your environment's home page on Elastic Beanstalk and click on the Upload and Deploy button in the middle of the screen.


After you click on Upload and Deploy, click on the Application Versions page. This shows you the list of your previous deployments, from which you can select any one and deploy it again.

Troubleshooting

If you get the error "Your WSGIPath refers to a file that does not exist", go to the Django.config file in the .ebextensions folder and make sure that the WSGI path is right.
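You can also catch this mistake before uploading with a quick local check. The Python sketch below pulls the WSGIPath value out of the config text with a simple regular expression (a shortcut I chose instead of a real YAML parser) and checks that the file exists relative to the project root. The function names and the demo project are made up.

```python
import os
import re
import tempfile


def find_wsgi_path(config_text):
    """Return the WSGIPath value from the config text, or None if missing."""
    match = re.search(r"^\s*WSGIPath:\s*(\S+)\s*$", config_text, re.MULTILINE)
    return match.group(1) if match else None


def wsgi_path_is_valid(project_root, config_text):
    """True if the configured WSGIPath points at an existing file."""
    wsgi_path = find_wsgi_path(config_text)
    if wsgi_path is None:
        return False
    return os.path.isfile(os.path.join(project_root, wsgi_path))


config_template = (
    "option_settings:\n"
    "  aws:elasticbeanstalk:container:python:\n"
    "    WSGIPath: {name}/wsgi.py\n"
)

# Demo: a throwaway project whose Django package is called "mysite".
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "mysite"))
    with open(os.path.join(root, "mysite", "wsgi.py"), "w") as handle:
        handle.write("# placeholder\n")
    good = wsgi_path_is_valid(root, config_template.format(name="mysite"))
    bad = wsgi_path_is_valid(root, config_template.format(name="wrong_name"))
    print(good, bad)
```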

You can also look at the logs to get detailed information about what is happening on the EC2 instance that Elastic Beanstalk creates to run your code. There are two ways to see the logs. The first and easier way is to go to the web page of the environment you just created. On the left-hand side there is a Logs option. Click on it and then on Request Logs, and select whether you want the last 100 lines or the complete logs. The second way is to SSH into the underlying EC2 instance and view the logs there. I will explain how to do this in the second part, where I also write about how to deploy your Django project through the command line.