HowTo: AWS CLI Elastic MapReduce - Streaming Job Flow
This series shows how to capture the API request made by the Elastic MapReduce Ruby client and use it to build the equivalent command with the AWS CLI tool. This article looks specifically at running a streaming job flow.
Elastic MapReduce Ruby client
Credentials
~/.aws/credentials.json
{
  "access_id": "C99F5C7EE00F1EXAMPLE",
  "private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
  "keypair": "my-key",
  "key-pair-file": "~/.ssh/my-key.pem",
  "log_uri": "s3n://my-bucket/hadoop/",
  "region": "us-east-1"
}
Create the job flow
Console - user@hostname ~ $
elastic-mapreduce -v \
  --create \
  --name "Test Streaming" \
  --instance-group MASTER \
  --bid-price 0.06 \
  --instance-count 1 \
  --instance-type m1.small \
  --instance-group CORE \
  --bid-price 0.06 \
  --instance-count 2 \
  --instance-type m1.small \
  --stream \
  --input s3n://elasticmapreduce/samples/wordcount/input \
  --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
  --reducer aggregate \
  --output s3n://my-bucket/streaming \
  -c ~/.aws/credentials.json
Output
Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.1.HadoopJarStep.Args.member.7=-reducer&Instances.KeepJobFlowAliveWhenNoSteps=false&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=-mapper&Steps.member.1.HadoopJarStep.Args.member.4=s3n%3A%2F%2Fmy-bucket%2Fstreaming&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Test%20Streaming&Steps.member.1.HadoopJarStep.Args.member.3=-output&Steps.member.1.HadoopJarStep.Jar=%2Fhome%2Fhadoop%2Fcontrib%2Fstreaming%2Fhadoop-streaming.jar&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-16T00%3A38%3A34%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=false&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Steps.member.1.HadoopJarStep.Args.member.8=aggregate&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Example%20Streaming%20Step&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fwordcount%2Finput&Signature=bxw4ztNqxh8%2F3fvhKq72FS%2BxIG8A3v9YNejf2tCwCkk%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=-input&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=s3%3A%2F%2Felasticmapreduce%2Fsamples%2Fwordcount%2FwordSplitter.py&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestId8cbb460f-5a8c-48b4-ae10-6da34e0af803Hostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-R48QY5DF8OS3M
Formatted Output
Output - Requesting URL
https://us-east-1.elasticmapreduce.amazonaws.com/
Output - Parameters
AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test Streaming
Signature=bxw4ztNqxh8/3fvhKq72FS+xIG8A3v9YNejf2tCwCkk=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=-input
Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/wordcount/input
Steps.member.1.HadoopJarStep.Args.member.3=-output
Steps.member.1.HadoopJarStep.Args.member.4=s3n://my-bucket/streaming
Steps.member.1.HadoopJarStep.Args.member.5=-mapper
Steps.member.1.HadoopJarStep.Args.member.6=s3://elasticmapreduce/samples/wordcount/wordSplitter.py
Steps.member.1.HadoopJarStep.Args.member.7=-reducer
Steps.member.1.HadoopJarStep.Args.member.8=aggregate
Steps.member.1.HadoopJarStep.Jar=/home/hadoop/contrib/streaming/hadoop-streaming.jar
Steps.member.1.Name=Example Streaming Step
Timestamp=2013-05-16T00:38:34+00:00
VisibleToAllUsers=false
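The parameter list above can be reproduced from the raw query string by URL-decoding each pair and sorting by key. Here is a minimal Python sketch, assuming the query string has been copied out of the verbose output. The names `format_query` and `sample` are chosen for illustration only, and the sample is shortened to four of the parameters.

```python
from urllib.parse import parse_qsl


def format_query(query):
    """Decode a URL-encoded query string into sorted 'key=value' lines."""
    # parse_qsl splits on '&' and '=' and decodes %XX escapes.
    pairs = parse_qsl(query, keep_blank_values=True)
    return [f"{key}={value}" for key, value in sorted(pairs)]


# Shortened sample (assumption: four parameters taken from the raw output).
sample = (
    "Steps.member.1.HadoopJarStep.Args.member.7=-reducer"
    "&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F"
    "&Name=Test%20Streaming"
    "&Action=RunJobFlow"
)

for line in format_query(sample):
    print(line)
```

A plain string sort is enough here because no list in this request has ten or more members. With longer lists, `member.10` would sort before `member.2`.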
Output - Headers
Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 8cbb460f-5a8c-48b4-ae10-6da34e0af803
Output - Non-verbose output
Created job flow j-R48QY5DF8OS3M
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test Streaming
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Example Streaming Step
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=/home/hadoop/contrib/streaming/hadoop-streaming.jar
&Steps.member.1.HadoopJarStep.Args.member.1=-input
&Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/wordcount/input
&Steps.member.1.HadoopJarStep.Args.member.3=-output
&Steps.member.1.HadoopJarStep.Args.member.4=s3n://my-bucket/streaming
&Steps.member.1.HadoopJarStep.Args.member.5=-mapper
&Steps.member.1.HadoopJarStep.Args.member.6=s3://elasticmapreduce/samples/wordcount/wordSplitter.py
&Steps.member.1.HadoopJarStep.Args.member.7=-reducer
&Steps.member.1.HadoopJarStep.Args.member.8=aggregate
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&*AUTHPARAMS*
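Compared with the decoded parameter list, the example request leaves out six keys: the five signing parameters, which `*AUTHPARAMS*` stands in for, and the client's `ContentType`. A small Python sketch of that filtering step follows. The names `DROPPED` and `request_params` are hypothetical, and the input is assumed to be a dict of already-decoded parameters.

```python
# Keys present in the client's query string but absent from the example request.
DROPPED = frozenset({
    "AWSAccessKeyId",
    "Signature",
    "SignatureMethod",
    "SignatureVersion",
    "Timestamp",
    "ContentType",
})


def request_params(params):
    """Return only the parameters that describe the job flow itself."""
    return {key: value for key, value in params.items() if key not in DROPPED}


decoded = {
    "Action": "RunJobFlow",
    "Name": "Test Streaming",
    "SignatureVersion": "2",
    "ContentType": "JSON",
}
print(request_params(decoded))
```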
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
  run-job-flow \
  --name "Test Streaming" \
  --instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
      {
        \"name\": \"Master Instance Group\",
        \"instance_role\": \"MASTER\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 1,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      },
      {
        \"name\": \"Core Instance Group\",
        \"instance_role\": \"CORE\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 2,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
  }" \
  --steps "[
    {
      \"name\": \"Example Streaming Step\",
      \"action_on_failure\": \"CANCEL_AND_WAIT\",
      \"hadoop_jar_step\": {
        \"jar\": \"/home/hadoop/contrib/streaming/hadoop-streaming.jar\",
        \"args\": [
          \"-input\",
          \"s3n://elasticmapreduce/samples/wordcount/input\",
          \"-output\",
          \"s3n://my-bucket/streaming\",
          \"-mapper\",
          \"s3://elasticmapreduce/samples/wordcount/wordSplitter.py\",
          \"-reducer\",
          \"aggregate\"
        ]
      }
    }
  ]" \
  --log-uri "s3n://my-bucket/hadoop/" \
  --ami-version "latest"
Output
{
  "ResponseMetadata": {
    "RequestId": "5a35d4dd-4e03-4fd0-a911-00b9dbb4c60d"
  },
  "JobFlowId": "j-2ED0BNL7QVENH"
}
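The `--instances` and `--steps` JSON in the AWS CLI command follows mechanically from the flat API parameters: each dotted segment becomes a nested key, each `member.N` becomes a list item, and key names are written in lower case with underscores. The Python sketch below illustrates that mapping. It rests on assumptions: the naming rule in `snake_case` and the type handling in `coerce` are inferred only from the command shown in this article, not from the CLI's own code, and all function names are hypothetical.

```python
import json
import re


def snake_case(name):
    """Turn an API name such as 'Ec2KeyName' into 'ec_2_key_name'."""
    parts = re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", name)
    return "_".join(part.lower() for part in parts)


def coerce(value):
    """Map 'true'/'false' to booleans and whole numbers to ints."""
    if value in ("true", "false"):
        return value == "true"
    return int(value) if value.isdecimal() else value


def listify(node):
    """Replace dicts whose keys are all numeric with lists ordered by index."""
    if not isinstance(node, dict):
        return node
    if node and all(key.isdecimal() for key in node):
        return [listify(node[key]) for key in sorted(node, key=int)]
    return {key: listify(value) for key, value in node.items()}


def nest(params):
    """Build nested dicts and lists from dotted keys with 'member.N' indexes."""
    root = {}
    for key, value in params.items():
        # Drop the literal 'member' marker; the number after it is the index.
        tokens = [t if t.isdecimal() else snake_case(t)
                  for t in key.split(".") if t != "member"]
        node = root
        for token in tokens[:-1]:
            node = node.setdefault(token, {})
        node[tokens[-1]] = coerce(value)
    return listify(root)


# A few of the decoded parameters from this article.
params = {
    "Instances.Ec2KeyName": "my-key",
    "Instances.InstanceGroups.member.1.InstanceRole": "MASTER",
    "Instances.InstanceGroups.member.1.InstanceCount": "1",
    "Instances.InstanceGroups.member.1.BidPrice": "0.06",
    "Instances.KeepJobFlowAliveWhenNoSteps": "false",
    "Steps.member.1.HadoopJarStep.Args.member.1": "-input",
    "Steps.member.1.HadoopJarStep.Args.member.2":
        "s3n://elasticmapreduce/samples/wordcount/input",
}
nested = nest(params)
print(json.dumps(nested["instances"], indent=2))
print(json.dumps(nested["steps"], indent=2))
```

The bid price stays a string because it is not a whole number, which matches the quoted `"0.06"` in the command, while the instance count and the boolean flags come out unquoted.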
Resources
Parts in this series
- HowTo: AWS CLI Elastic MapReduce
- HowTo: AWS CLI Elastic MapReduce - Hive Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Hive
- HowTo: AWS CLI Elastic MapReduce - Pig Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Pig
- HowTo: AWS CLI Elastic MapReduce - Streaming Job Flow
- HowTo: AWS CLI Elastic MapReduce - Cascading Job Flow
- HowTo: AWS CLI Elastic MapReduce - Custom JAR Job Flow
- HowTo: AWS CLI Elastic MapReduce - HBase