Amazon EC2 Instance


Amazon EC2

Reference: https://medium.com/@josemarcialportilla/getting-spark-python-and-jupyter-notebook-running-on-amazon-ec2-dec599e1c297

Amazon Elastic Compute Cloud (Amazon EC2)

A web service that provides resizable compute capacity in the cloud; each instance behaves much like a virtual machine (VM).

Quick Checks

  • Verify that instances are stopped or terminated when not in use, to limit charges
  • Verify that the security group opens only the ports that are needed
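
Both checks can be scripted. The following is a minimal sketch, assuming the input dictionaries have the shape returned by the boto3 EC2 client calls `describe_instances` and `describe_security_groups` (boto3 is not part of the original notes; it is an assumption here). The helper names `running_instance_ids` and `world_open_ports` are hypothetical. The functions work on plain dictionaries, so they can be tried without AWS credentials.

```python
def running_instance_ids(describe_instances_response):
    """Return the IDs of instances whose state is 'running'."""
    ids = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                ids.append(instance["InstanceId"])
    return ids


def world_open_ports(describe_security_groups_response):
    """Return (group_id, from_port, to_port) for inbound rules open to 0.0.0.0/0.

    Ports are None for rules that cover all protocols, because such rules
    carry no FromPort/ToPort fields.
    """
    findings = []
    for group in describe_security_groups_response.get("SecurityGroups", []):
        for rule in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            if "0.0.0.0/0" in cidrs:
                findings.append(
                    (group["GroupId"], rule.get("FromPort"), rule.get("ToPort"))
                )
    return findings
```

With boto3 installed and credentials configured, the inputs would come from something like `boto3.client("ec2").describe_instances()` and `boto3.client("ec2").describe_security_groups()`. Large accounts may need pagination, which this sketch does not handle.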

Workflow

  1. Create EC2 Ubuntu instance on AWS
  2. Connect to EC2 instance via PuTTY SSH client on Windows
  3. Set up the instance with the applicable Python libraries, including Jupyter access, Spark and Hadoop
  4. Access Jupyter Notebook for data operations
  5. Terminate EC2 instance when complete
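
The final step of the workflow above can also be done programmatically rather than from the console. This is a sketch only, assuming a boto3 EC2 client is passed in as `ec2_client` (boto3 is an assumption, not something the original notes use); the wrapper name `terminate_and_report` is hypothetical. Termination is permanent, so the function does nothing when given an empty list.

```python
def terminate_and_report(ec2_client, instance_ids):
    """Terminate the given instances and return {instance_id: new_state_name}.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    instance_ids = list(instance_ids)
    if not instance_ids:
        # Nothing to terminate; avoid calling the API with an empty list.
        return {}
    response = ec2_client.terminate_instances(InstanceIds=instance_ids)
    return {
        item["InstanceId"]: item["CurrentState"]["Name"]
        for item in response.get("TerminatingInstances", [])
    }
```

Passing the client in as an argument keeps the function easy to try with a stand-in object before pointing it at a real account.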

EC2 Setup Guide

Many guides online document similar processes; they differ in configuration and in how reliably they lead to a working deployment. The following is the reference I have used successfully to set up an EC2 Ubuntu instance for use with Spark.

Create EC2 Instance

  1. Amazon Machine Image (AMI)
    • Preferred: an Ubuntu Server image
  2. Instance Type
    • CPU/Memory: Specify as applicable to project requirements
  3. Instance Configuration
    • Number of Instances: 1, unless deploying to a cluster of instances
    • Storage: 8 GB General Purpose SSD (Default)
    • Tag Instance
      • Key: name (ex. myinstance)
      • Value: webserver (ex. mymachine)
      • Note that these values are case-sensitive.

    • Security Group Configuration
  4. Review Instance
  5. Key Pair
  6. Launch Instances
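
The console steps above map roughly onto the parameters of the boto3 EC2 client's `run_instances` call. The sketch below only builds the parameter dictionary, so it runs without AWS access. Several things in it are assumptions rather than part of the original notes: the use of boto3 itself, the helper name `build_launch_params`, the default instance type `t2.micro`, the root device name `/dev/sda1`, and the volume type `gp2`. The AMI ID, key pair name, and security group IDs must be supplied by the caller.

```python
def build_launch_params(ami_id, key_name, security_group_ids,
                        instance_type="t2.micro", count=1, volume_gb=8,
                        tags=None):
    """Build keyword arguments for boto3's ec2_client.run_instances()."""
    params = {
        "ImageId": ami_id,                # 1. Amazon Machine Image
        "InstanceType": instance_type,    # 2. Instance type (assumed default)
        "MinCount": count,                # 3. Number of instances
        "MaxCount": count,
        "BlockDeviceMappings": [{         # 3. Storage
            "DeviceName": "/dev/sda1",    # assumed root device name
            "Ebs": {"VolumeSize": volume_gb, "VolumeType": "gp2"},
        }],
        "SecurityGroupIds": list(security_group_ids),  # 3. Security group
        "KeyName": key_name,              # 5. Key pair
    }
    if tags:
        # 3. Tag instance; tag keys and values are case-sensitive.
        params["TagSpecifications"] = [{
            "ResourceType": "instance",
            "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
        }]
    return params
```

With boto3 installed and credentials configured, the launch itself would be along the lines of `boto3.client("ec2").run_instances(**params)`. Reviewing the dictionary before that call plays the role of the Review Instance step.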