HowTo: AWS CLI Elastic MapReduce - Cascading Job Flow
Throughout this series, we look at how to extract the request details from the Elastic MapReduce Ruby client and use them to build the equivalent command with the AWS CLI tool. In this article, we look specifically at running a Cascading job flow.
Elastic MapReduce Ruby client
Credentials
~/.aws/credentials.json
{
  "access_id": "C99F5C7EE00F1EXAMPLE",
  "private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
  "keypair": "my-key",
  "key-pair-file": "~/.ssh/my-key.pem",
  "log_uri": "s3n://my-bucket/hadoop/",
  "region": "us-east-1"
}
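Before running the client, it can help to confirm that the credentials file parses as JSON and contains the keys used in the example above. The following is a minimal sketch; the function name `missing_credential_keys` is hypothetical, and treating every key from the example as required is an assumption rather than a documented rule of the Ruby client.

```python
import json

# Keys that appear in this article's example file. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_KEYS = [
    "access_id",
    "private_key",
    "keypair",
    "key-pair-file",
    "log_uri",
    "region",
]


def missing_credential_keys(text):
    """Return the expected keys that are absent from the JSON text."""
    data = json.loads(text)  # raises ValueError if the file is not valid JSON
    return [key for key in EXPECTED_KEYS if key not in data]


# Example with an incomplete file body.
sample = '{"access_id": "C99F5C7EE00F1EXAMPLE", "region": "us-east-1"}'
print(missing_credential_keys(sample))
```

To check a real file, read its contents (for example from `~/.aws/credentials.json`) and pass the text to the function.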
Create the job flow
Console - user@hostname ~ $
elastic-mapreduce -v \
--create \
--name "Test Cascading" \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--jar "s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar" \
--args \
"-input","s3n://elasticmapreduce/samples/cloudfront/input",\
"-start","any",\
"-end","2010-12-27-02 300",\
"-output","s3n://my-bucket/cloudfront/output/2010-12-27-02",\
"-overallVolumeReport",\
"-objectPopularityReport",\
"-clientIPReport",\
"-edgeLocationReport" \
-c ~/.aws/credentials.json
Output
Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.1.HadoopJarStep.Args.member.7=-output&Instances.KeepJobFlowAliveWhenNoSteps=false&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=-end&Steps.member.1.HadoopJarStep.Args.member.4=any&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Test%20Cascading&Steps.member.1.HadoopJarStep.Args.member.3=-start&Steps.member.1.HadoopJarStep.Jar=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudfront%2Flogprocessor.jar&Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-16T00%3A28%3A39%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=false&Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Steps.member.1.HadoopJarStep.Args.member.8=s3n%3A%2F%2Fmy-bucket%2Fcloudfront%2Foutput%2F2010-12-27-02&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Example%20Jar%20Step&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudfront%2Finput&Signature=BpHRNVUCIPnfi%2B8rLQEpdr3chl7Bjiw5AOh4GZzChbs%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=-input&Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02%20300&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestIdc96170a8-24d5-41fd-bd68-4a32cc7cf85dHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-0G438CH39THZW
Formatted Output
Output - Requesting URL
https://us-east-1.elasticmapreduce.amazonaws.com/
Output - Parameters
AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test Cascading
Signature=BpHRNVUCIPnfi+8rLQEpdr3chl7Bjiw5AOh4GZzChbs=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=-input
Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport
Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport
Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport
Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudfront/input
Steps.member.1.HadoopJarStep.Args.member.3=-start
Steps.member.1.HadoopJarStep.Args.member.4=any
Steps.member.1.HadoopJarStep.Args.member.5=-end
Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02 300
Steps.member.1.HadoopJarStep.Args.member.7=-output
Steps.member.1.HadoopJarStep.Args.member.8=s3n://my-bucket/cloudfront/output/2010-12-27-02
Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport
Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar
Steps.member.1.Name=Example Jar Step
Timestamp=2013-05-16T00:28:39+00:00
VisibleToAllUsers=false
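The parameter list above is the query string from the verbose output, URL-decoded and sorted by parameter name. A minimal Python sketch of that formatting step follows; the function name `format_query` is hypothetical, and the sample string is a shortened excerpt of the real query string.

```python
from urllib.parse import parse_qsl


def format_query(query):
    """Decode a URL query string into sorted 'name=value' lines."""
    # parse_qsl splits on '&' and '=' and percent-decodes names and values.
    pairs = parse_qsl(query, keep_blank_values=True)
    # Sort the (name, value) tuples so that ordering is driven by the name,
    # which places member.1 before member.10 as in the listing above.
    return [f"{name}={value}" for name, value in sorted(pairs)]


sample = (
    "Name=Test%20Cascading"
    "&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F"
    "&Action=RunJobFlow"
)
for line in format_query(sample):
    print(line)
# Action=RunJobFlow
# LogUri=s3n://my-bucket/hadoop/
# Name=Test Cascading
```

Sorting the tuples rather than the joined strings matters: in a joined string, `=` sorts after `0`, so `member.10=...` would land before `member.1=...`.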
Output - Headers
Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: c96170a8-24d5-41fd-bd68-4a32cc7cf85d
Output - Non-verbose output
Created job flow j-0G438CH39THZW
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test Cascading
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Example Jar Step
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar
&Steps.member.1.HadoopJarStep.Args.member.1=-input
&Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudfront/input
&Steps.member.1.HadoopJarStep.Args.member.3=-start
&Steps.member.1.HadoopJarStep.Args.member.4=any
&Steps.member.1.HadoopJarStep.Args.member.5=-end
&Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02 300
&Steps.member.1.HadoopJarStep.Args.member.7=-output
&Steps.member.1.HadoopJarStep.Args.member.8=s3n://my-bucket/cloudfront/output/2010-12-27-02
&Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport
&Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport
&Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport
&Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&*AUTHPARAMS*
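The flattened API parameter names map onto nested JSON: each dotted segment is an object key, and each `member.N` pair is the N-th element of a list. The sketch below illustrates that mapping in Python. The helper names `snake`, `nest`, and `tidy` are hypothetical, and the snake_case convention (including `Ec2KeyName` becoming `ec_2_key_name`) is inferred from the AWS CLI command in the next section rather than taken from any documented rule. All values stay strings here, so numbers and booleans such as `InstanceCount` and `TerminationProtected` would still need converting by hand.

```python
import json
import re


def snake(name):
    """Convert CamelCase to snake_case, splitting before capitals and digits."""
    return re.sub(r"(?<!^)(?=[A-Z0-9])", "_", name).lower()


def nest(flat):
    """Turn dotted parameter names into a tree of nested dicts."""
    tree = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return tree


def tidy(node):
    """Replace {'member': {'1': ..., '2': ...}} with lists and rename keys."""
    if not isinstance(node, dict):
        return node
    if set(node) == {"member"}:
        members = node["member"]
        # Order numerically so that member.10 follows member.9, not member.1.
        return [tidy(members[index]) for index in sorted(members, key=int)]
    return {snake(key): tidy(value) for key, value in node.items()}


flat = {
    "Instances.Ec2KeyName": "my-key",
    "Instances.InstanceGroups.member.1.InstanceRole": "MASTER",
    "Instances.InstanceGroups.member.2.InstanceRole": "CORE",
    "Steps.member.1.Name": "Example Jar Step",
    "Steps.member.1.HadoopJarStep.Args.member.1": "-input",
    "Steps.member.1.HadoopJarStep.Args.member.2": "s3n://elasticmapreduce/samples/cloudfront/input",
}
print(json.dumps(tidy(nest(flat)), indent=2))
```

In the result, the `instances` and `steps` subtrees correspond to the JSON passed to `--instances` and `--steps`, while top-level parameters such as `Name` and `LogUri` become separate command-line options.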
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
  run-job-flow \
  --name "Test Cascading" \
  --instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
      {
        \"name\": \"Master Instance Group\",
        \"instance_role\": \"MASTER\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 1,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      },
      {
        \"name\": \"Core Instance Group\",
        \"instance_role\": \"CORE\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 2,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
  }" \
  --steps "[
    {
      \"name\": \"Example Jar Step\",
      \"action_on_failure\": \"CANCEL_AND_WAIT\",
      \"hadoop_jar_step\": {
        \"jar\": \"s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar\",
        \"args\": [
          \"-input\",
          \"s3n://elasticmapreduce/samples/cloudfront/input\",
          \"-start\",
          \"any\",
          \"-end\",
          \"2010-12-27-02 300\",
          \"-output\",
          \"s3n://my-bucket/cloudfront/output/2010-12-27-02\",
          \"-overallVolumeReport\",
          \"-objectPopularityReport\",
          \"-clientIPReport\",
          \"-edgeLocationReport\"
        ]
      }
    }
  ]" \
  --log-uri "s3n://my-bucket/hadoop/" \
  --ami-version "latest"
Output
{
  "ResponseMetadata": {
    "RequestId": "b1c37304-8d77-42d3-a678-97518e3dc3b1"
  },
  "JobFlowId": "j-Y0KOFGCVBPO87"
}
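Because the AWS CLI returns JSON, the job flow ID can be pulled out in a script instead of being read from the terminal. A minimal sketch, assuming the output has the shape shown above; the function name `job_flow_id` is hypothetical.

```python
import json


def job_flow_id(cli_output):
    """Extract the JobFlowId field from the JSON printed by the AWS CLI."""
    return json.loads(cli_output)["JobFlowId"]


sample = """
{
  "ResponseMetadata": {
    "RequestId": "b1c37304-8d77-42d3-a678-97518e3dc3b1"
  },
  "JobFlowId": "j-Y0KOFGCVBPO87"
}
"""
print(job_flow_id(sample))  # j-Y0KOFGCVBPO87
```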
Resources
Parts in this series
- HowTo: AWS CLI Elastic MapReduce
- HowTo: AWS CLI Elastic MapReduce - Hive Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Hive
- HowTo: AWS CLI Elastic MapReduce - Pig Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Pig
- HowTo: AWS CLI Elastic MapReduce - Streaming Job Flow
- HowTo: AWS CLI Elastic MapReduce - Cascading Job Flow
- HowTo: AWS CLI Elastic MapReduce - Custom JAR Job Flow
- HowTo: AWS CLI Elastic MapReduce - HBase