Objectives
Access your individual jumphost, which has already been set up for each student
Create, access and then terminate an AWS EMR cluster
Responsible use of cloud resources
Introduction
To use AWS EMR, RMIT ITS has created a relatively inexpensive jumphost on AWS for each student enrolled in the course. From this jumphost you will be able to create, access and manage an EMR cluster. Creating a cluster requires allocating additional machines in the AWS datacentre, which takes some minutes depending on the size of the cluster. Because EMR clusters use more expensive machines and are charged for the time the cluster is left running, it is important that clusters are not left running after you have finished using them (and saved any output). So, at the end of a lab class, it is very important that you terminate your cluster.
The jumphost and any EMR clusters you launch will all be in the AWS US East (N. Virginia) Region, also known as "us-east-1" or "Standard". This means any clusters you create will have access to the Public Data Sets hosted in S3 that are provided by AWS in that region.
Example of student jumphost DNS: sXXXXXXX.jump.cosc2637.route53.aws.rmit.edu.au
Each student will have a personal SSH key, which provides access to your jumphost and EMR cluster.
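On a machine with an OpenSSH client, the same key can in principle be used without PuTTY. The sketch below is not part of the course material: it prints a client configuration entry for your jumphost. The function name, the alias "jump" and the 60-second keepalive are assumptions made for illustration; the hostname pattern and the ec2-user login come from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the course scripts): print an OpenSSH
# client config entry for a student's jumphost.
#   $1 = student number including the leading "s", e.g. s1234567
#   $2 = path to the .pem private key that was emailed to you
jumphost_config() {
  local student="$1" key="$2"
  cat <<EOF
Host jump
    HostName ${student}.jump.cosc2637.route53.aws.rmit.edu.au
    User ec2-user
    IdentityFile ${key}
    ServerAliveInterval 60
EOF
}

# Example (review the output before appending it to ~/.ssh/config):
#   jumphost_config s1234567 ~/keys/xxx-xxx.pem >> ~/.ssh/config
#   ssh jump
```

ServerAliveInterval asks the client to send a keepalive message at the given interval, which is one way to avoid an idle session being dropped.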
You can run ./create_cluster.sh from your jumphost to launch your EMR cluster. After the script finishes running, there will be further instructions on how to access that specific cluster (Hue and Hadoop Master node).
Please do not forget to run ./terminate_cluster.sh (from your jumphost) each time you have finished using your cluster (e.g., at the end of the lab class).
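One way to make forgetting harder is to wrap the work in a function that always calls the terminate script on the way out. This is only a sketch under stated assumptions: with_cluster is a hypothetical name, and it assumes both scripts sit in the current directory of the jumphost and can be run without arguments, as described above.

```shell
#!/usr/bin/env bash
# Paths default to the scripts named in these notes; override them if needed.
CREATE_SCRIPT="${CREATE_SCRIPT:-./create_cluster.sh}"
TERMINATE_SCRIPT="${TERMINATE_SCRIPT:-./terminate_cluster.sh}"

# Hypothetical wrapper: create the cluster, run the given command, and
# terminate the cluster afterwards even if the command fails.
with_cluster() {
  (
    # Stop here if creation fails; there is nothing to terminate yet.
    "$CREATE_SCRIPT" || exit 1
    # From now on, leaving this subshell always runs the terminate script.
    trap '"$TERMINATE_SCRIPT"' EXIT
    "$@"
  )
}

# Example: keep the cluster up only for the duration of an interactive shell.
#   with_cluster bash
```

The subshell keeps the EXIT trap local to this one call, so it does not affect the rest of your login session.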
The steps to get you on to the AWS environment (Windows)
Keys (these should be emailed to you at or prior to your practical class)
1. Save the key to a location on your device
2. Convert the key using PuTTYGen in Windows OS
Converting Your Private Key Using PuTTYgen
PuTTY does not natively support the private key format (.pem) generated by Amazon EC2. PuTTY has a tool named PuTTYgen, which can convert keys to the required PuTTY format (.ppk). You must convert your private key into this format (.ppk) before attempting to connect to your instance using PuTTY.
To convert your private key
a. Start PuTTYgen (for example, from the Start menu, choose All Programs > PuTTY > PuTTYgen).
b. Under Type of key to generate, choose RSA.
c. Choose Load. By default, PuTTYgen displays only files with the extension .ppk. To locate your .pem file, select the option to display files of all types.
d. Select your .pem file for the key pair that you specified when you launched your instance, and then choose Open. Choose OK to dismiss the confirmation dialog box.
e. Choose Save private key to save the key in the format that PuTTY can use. PuTTYgen displays a warning about saving the key without a passphrase. Choose Yes. Note: A passphrase on a private key is an extra layer of protection, so even if your private key is discovered, it can't be used without the passphrase. The downside to using a passphrase is that it makes automation harder because human intervention is needed to log on to an instance or copy files to an instance.
f. Specify the same name for the key that you used for the .pem file. PuTTYgen automatically adds the .ppk file extension.
Your private key is now in the correct format for use with PuTTY. You can now connect to your instance using PuTTY's SSH client. Links for more information:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html
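Where a command-line puttygen is available (on some Linux distributions it ships in a putty-tools package), the conversion above can also be scripted. The following is a sketch rather than a course-provided tool: the function names are hypothetical, and it assumes puttygen is on your PATH.

```shell
#!/usr/bin/env bash
# Derive the .ppk file name from a .pem file name
# (same base name, new extension, as in step f).
ppk_name_for() {
  printf '%s.ppk\n' "${1%.pem}"
}

# Convert a .pem private key to PuTTY's .ppk format.
# Assumes the command-line puttygen is installed.
pem_to_ppk() {
  local pem="$1"
  puttygen "$pem" -O private -o "$(ppk_name_for "$pem")"
}

# Example:
#   pem_to_ppk xxx-xxx.pem    # would write xxx-xxx.ppk
```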
3. Set up PuTTY connection
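a. Configure the session so that the shell does not become unresponsive after a period of inactivity (for example, by setting a keepalive interval in the Connection settings)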
b. Add private key you saved earlier (extracted from the key pair file emailed to you)
c. Enter jump host name:
ec2-user@sXXXXXXX.jump.cosc2637.route53.aws.rmit.edu.au
where XXXXXXX is your student number
d. Save settings
4. Open PuTTY connection
Click the Open button.
5. If you entered a passphrase when saving the private key, you will be prompted for it now.
6. You should now see the $ prompt of your AWS jumphost
7. Enter ./create_cluster.sh
Note: sh create_cluster.sh also works
The EMR cluster will now be created, which can take 15 minutes or more. The script begins by checking whether a cluster already exists, so a ValidationError at this point is expected: it simply means that no cluster exists yet.
8. Once the cluster is created, the script will show the URL you need to use to connect to your cluster
9. Use your browser to access the Hadoop environment via Hue. You will need to set up an account each time, as the cluster is new.
http://sXXXXXXX.hue.cosc2637.route53.aws.rmit.edu.au:8888
For example, my Hue account is “e20925”. After logging in to Hue, click Files to see the HDFS file system, as shown below:
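Because the cluster can take a while to come up, it may be convenient to poll the Hue address from the jumphost instead of refreshing the browser. The sketch below is an illustration only: hue_url and wait_until are hypothetical helper names, and the usage example assumes that curl is installed and that Hue answers HTTP requests once it is ready.

```shell
#!/usr/bin/env bash
# Build the Hue URL for a student number such as s1234567.
hue_url() {
  printf 'http://%s.hue.cosc2637.route53.aws.rmit.edu.au:8888\n' "$1"
}

# Retry a command until it succeeds.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep after each failed attempt
#   remaining arguments = the command to run
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: try every 30 seconds, for up to 60 attempts.
#   wait_until 60 30 curl --silent --fail --output /dev/null --max-time 10 \
#     "$(hue_url s1234567)" && echo "Hue is answering"
```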
Using WinSCP to transfer the key to the jumphost home directory (the key should be the xxx-xxx.pem emailed to you at or prior to your practical class)
1. Enter the jump host name and user, click Advanced, then select Advanced
2. Under SSH | Authentication, browse to the private key you extracted from the emailed key pair
3. Click OK, and when you’re back at the Login screen, save the connection if desired.
4. Login. If you entered a passphrase when saving the private key, you will be prompted for it now.
5. In the left panel, browse to the location of the key pair file you were emailed (*.pem), then drag it to the right panel (the jump host) to copy the key pair file.
Then go to the jumphost home directory (as shown in the last picture on page 5 of this document), where you should find xxx-xxx.pem. Next, run:
$ chmod 400 xxx-xxx.pem
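Mode 400 makes the key readable by its owner only; ssh refuses to use a private key file that other users can read. As a small sketch (secure_key is a hypothetical helper name, not a course script), the change can be applied and checked in one step:

```shell
#!/usr/bin/env bash
# Restrict a private key to owner read-only, then print its permission string.
secure_key() {
  local key="$1"
  chmod 400 "$key" || return 1
  # The first 10 characters of `ls -l` are the file type and permission bits.
  ls -l "$key" | cut -c1-10
}

# Example:
#   secure_key xxx-xxx.pem    # prints -r--------
```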
Then log in to the newly created cluster with ssh hadoop@sxxxx.emr.... as shown below
If you log in successfully, you will see
Leave the cluster by typing “exit”, which returns you to the jumphost.
Don’t forget to shut down the cluster using ./terminate_cluster.sh. Shutting down takes about 5 minutes, during which you will still be able to access the cluster via the browser.
A common cluster login issue
You may see the following error when you ssh from your jumphost to the Hadoop master node:
Every time a new cluster is created, it consists of a new set of hosts, so the SSH host key changes and ssh warns you that it has changed. To fix the problem, run the following:
$ ssh-keygen -R sxxxxxxx.emr.cosc2637.route53.aws.rmit.edu.au
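Since this has to be repeated for every new cluster, the command can be wrapped in a small function. This is a sketch with hypothetical names (emr_host, forget_cluster_key and the DRY_RUN switch are not part of the course scripts); the hostname pattern is the one shown above.

```shell
#!/usr/bin/env bash
# Build the master-node hostname for a student number such as s1234567.
emr_host() {
  printf '%s.emr.cosc2637.route53.aws.rmit.edu.au\n' "$1"
}

# Remove the stale host key for the student's cluster from known_hosts.
# With DRY_RUN=1 the command is printed instead of executed.
forget_cluster_key() {
  local host
  host="$(emr_host "$1")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ssh-keygen -R $host"
  else
    ssh-keygen -R "$host"
  fi
}

# Example:
#   DRY_RUN=1 forget_cluster_key s1234567   # show what would be run
#   forget_cluster_key s1234567             # actually remove the old key
```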
AWS EMR - Cheat sheet