HowTo: AWS CLI Elastic MapReduce - Interactive Pig
In this series we use the verbose output of the Elastic MapReduce Ruby client to see the API request behind each command, then build the equivalent command with the AWS CLI tool. This article looks specifically at running an interactive Pig session.
Credentials
~/.aws/credentials.json
{
    "access_id": "C99F5C7EE00F1EXAMPLE",
    "private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
    "keypair": "my-key",
    "key-pair-file": "~/.ssh/my-key.pem",
    "log_uri": "s3n://my-bucket/hadoop/",
    "region": "us-east-1"
}
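The AWS CLI commands later in this article need credentials as well, and the AWS CLI has its own configuration format rather than the Ruby client's credentials.json. One way to reuse the same values, sketched here on the assumption that jq is installed, is to export them under the environment variable names the AWS CLI reads (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION). load_emr_credentials is a hypothetical helper name, not part of either tool.

```shell
# Export the values from the Ruby client's credentials file under the
# environment variable names that the AWS CLI reads.
# load_emr_credentials is a hypothetical helper; the default path is the
# file shown above.
load_emr_credentials() {
  local file="${1:-$HOME/.aws/credentials.json}"
  AWS_ACCESS_KEY_ID=$(jq -r '.access_id' "$file") || return 1
  AWS_SECRET_ACCESS_KEY=$(jq -r '.private_key' "$file") || return 1
  AWS_DEFAULT_REGION=$(jq -r '.region' "$file") || return 1
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION
}
```

After running load_emr_credentials in the current shell, the --region option could in principle be dropped from the aws commands, since the region would come from AWS_DEFAULT_REGION.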
Start cluster
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce -v \
  --create \
  --name "Interactive Pig" \
  --alive \
  --instance-group MASTER \
  --bid-price 0.06 \
  --instance-count 1 \
  --instance-type m1.small \
  --instance-group CORE \
  --bid-price 0.06 \
  --instance-count 2 \
  --instance-type m1.small \
  --pig-interactive \
  --visible-to-all-users \
  -c ~/.aws/credentials.json
Output
Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Instances.KeepJobFlowAliveWhenNoSteps=true&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Interactive%20Pig&Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F&Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-15T23%3A00%3A39%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=true&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&Steps.member.1.Name=Setup%20Pig&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=--base-path&Signature=6Dy2%2BAiRbk6Hq8RGyoKb1imcy94Xm9ESEzN1jEH7FVc%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=latest&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestId1a1b9c8c-3b5c-4ef2-bf20-249c4b7c4fdaHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-WWF2N0603H0D9
Formatted Output
Output - Requesting URL
https://us-east-1.elasticmapreduce.amazonaws.com/
Output - Parameters
AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=true
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Interactive Pig
Signature=6Dy2+AiRbk6Hq8RGyoKb1imcy94Xm9ESEzN1jEH7FVc=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
Steps.member.1.HadoopJarStep.Args.member.2=--base-path
Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
Steps.member.1.HadoopJarStep.Args.member.6=latest
Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.1.Name=Setup Pig
Timestamp=2013-05-15T23:00:39+00:00
VisibleToAllUsers=true
Output - Headers
Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 1a1b9c8c-3b5c-4ef2-bf20-249c4b7c4fda
Output - Non-verbose output
Created job flow j-WWF2N0603H0D9
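The parameter list under Formatted Output can be produced from the raw query string mechanically: split on &, percent-decode each pair, and sort. A minimal bash sketch, assuming the query string is passed as a single argument and contains no backslashes; decode_query is a hypothetical helper name, not part of the Ruby client or the AWS CLI.

```shell
# Requires bash (arrays, here-strings, and \xHH support in printf %b).
# decode_query is a hypothetical helper: it prints one decoded
# "key=value" line per query parameter, sorted in byte order.
decode_query() {
  local query="$1" pair
  local -a pairs
  # Split the query string on '&' into individual key=value pairs.
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    # Rewrite every %XX escape as \xXX, then let printf's %b expand it.
    printf '%b\n' "${pair//%/\\x}"
  done | LC_ALL=C sort
}
```

Passing the text of the Query string line from the verbose output as the argument prints the decoded parameters one per line.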
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Interactive Pig
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Instances.TerminationProtected=false
&Steps.member.1.Name=Setup Pig
&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
&Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
&Steps.member.1.HadoopJarStep.Args.member.2=--base-path
&Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
&Steps.member.1.HadoopJarStep.Args.member.6=latest
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=true
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
  run-job-flow \
  --name "Interactive Pig" \
  --instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
      {
        \"name\": \"Master Instance Group\",
        \"instance_role\": \"MASTER\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 1,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      },
      {
        \"name\": \"Core Instance Group\",
        \"instance_role\": \"CORE\",
        \"instance_type\": \"m1.small\",
        \"instance_count\": 2,
        \"market\": \"SPOT\",
        \"bid_price\": \"0.06\"
      }
    ],
    \"keep_job_flow_alive_when_no_steps\": true,
    \"termination_protected\": false
  }" \
  --steps "[
    {
      \"name\": \"Setup Pig\",
      \"action_on_failure\": \"TERMINATE_JOB_FLOW\",
      \"hadoop_jar_step\": {
        \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
        \"args\": [
          \"s3://us-east-1.elasticmapreduce/libs/pig/pig-script\",
          \"--base-path\",
          \"s3://us-east-1.elasticmapreduce/libs/pig/\",
          \"--install-pig\",
          \"--pig-versions\",
          \"latest\"
        ]
      }
    }
  ]" \
  --log-uri "s3n://my-bucket/hadoop/" \
  --ami-version "latest"
Output
{
    "ResponseMetadata": {
        "RequestId": "9c8dbce7-bdb9-11e2-965a-07fb1be53dc4"
    },
    "JobFlowId": "j-3TYHC7VKXA235"
}
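The backslash-escaped quotes in the run-job-flow command above are easy to get wrong. One alternative, sketched here under the assumption of a POSIX-compatible shell, is to fill a variable from a quoted heredoc so the JSON can be written without escaping. The variable names steps_json and instances_json are illustrative; the JSON keys are the ones used in the command above.

```shell
# Keep the --steps JSON in a variable. The quoted 'EOF' delimiter stops the
# shell from expanding anything inside the heredoc, so no \" escapes are needed.
# steps_json is an illustrative variable name.
steps_json=$(cat <<'EOF'
[
  {
    "name": "Setup Pig",
    "action_on_failure": "TERMINATE_JOB_FLOW",
    "hadoop_jar_step": {
      "jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
      "args": [
        "s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
        "--base-path",
        "s3://us-east-1.elasticmapreduce/libs/pig/",
        "--install-pig",
        "--pig-versions",
        "latest"
      ]
    }
  }
]
EOF
)

# With instances_json built the same way, the call would become:
#   aws --region us-east-1 emr run-job-flow \
#     --name "Interactive Pig" \
#     --instances "$instances_json" \
#     --steps "$steps_json" \
#     --log-uri "s3n://my-bucket/hadoop/" \
#     --ami-version "latest"
```

Quoting the variables when they are passed keeps each JSON document as a single argument, just as the double-quoted strings do in the original command.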
Describe Cluster
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce --describe j-3TYHC7VKXA235
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=DescribeJobFlows
&JobFlowIds.member.1=j-3TYHC7VKXA235
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
  describe-job-flows \
  --job-flow-ids "[\"j-3TYHC7VKXA235\"]"
Output
{
    "JobFlows": [
        {
            "Name": "Interactive Pig",
            "BootstrapActions": [],
            "Instances": {
                "InstanceCount": 3,
                "Placement": {
                    "AvailabilityZone": "us-east-1e"
                },
                "MasterPublicDnsName": "ec2-23-23-54-39.compute-1.amazonaws.com",
                "NormalizedInstanceHours": 0,
                "MasterInstanceId": "i-062c1b6f",
                "InstanceGroups": [
                    {
                        "ReadyDateTime": "2013-05-15T23:56:30Z",
                        "InstanceType": "m1.small",
                        "InstanceRole": "MASTER",
                        "InstanceRunningCount": 1,
                        "State": "RUNNING",
                        "BidPrice": "0.06",
                        "Market": "SPOT",
                        "StartDateTime": "2013-05-15T23:54:39Z",
                        "InstanceGroupId": "ig-3A46UIUD1WED7",
                        "CreationDateTime": "2013-05-15T23:46:06Z",
                        "InstanceRequestCount": 1,
                        "LastStateChangeReason": "",
                        "Name": "Master Instance Group"
                    },
                    {
                        "ReadyDateTime": "2013-05-15T23:56:43Z",
                        "InstanceType": "m1.small",
                        "InstanceRole": "CORE",
                        "InstanceRunningCount": 2,
                        "State": "RUNNING",
                        "BidPrice": "0.06",
                        "Market": "SPOT",
                        "StartDateTime": "2013-05-15T23:56:43Z",
                        "InstanceGroupId": "ig-OWLA04KCES02",
                        "CreationDateTime": "2013-05-15T23:46:06Z",
                        "InstanceRequestCount": 2,
                        "LastStateChangeReason": "",
                        "Name": "Core Instance Group"
                    }
                ],
                "MasterInstanceType": "m1.small",
                "TerminationProtected": false,
                "HadoopVersion": "1.0.3",
                "KeepJobFlowAliveWhenNoSteps": true,
                "SlaveInstanceType": "m1.small",
                "Ec2KeyName": "my-key"
            },
            "Steps": [
                {
                    "ExecutionStatusDetail": {
                        "State": "COMPLETED",
                        "EndDateTime": "2013-05-15T23:57:47Z",
                        "CreationDateTime": "2013-05-15T23:46:06Z",
                        "StartDateTime": "2013-05-15T23:56:42Z"
                    },
                    "StepConfig": {
                        "HadoopJarStep": {
                            "Args": [
                                "s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
                                "--base-path",
                                "s3://us-east-1.elasticmapreduce/libs/pig/",
                                "--install-pig",
                                "--pig-versions",
                                "latest"
                            ],
                            "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
                            "Properties": []
                        },
                        "Name": "Setup Pig",
                        "ActionOnFailure": "TERMINATE_JOB_FLOW"
                    }
                }
            ],
            "ExecutionStatusDetail": {
                "State": "WAITING",
                "ReadyDateTime": "2013-05-15T23:56:43Z",
                "CreationDateTime": "2013-05-15T23:46:06Z",
                "StartDateTime": "2013-05-15T23:56:43Z",
                "LastStateChangeReason": "Waiting after step completed"
            },
            "VisibleToAllUsers": false,
            "JobFlowId": "j-3TYHC7VKXA235",
            "LogUri": "s3n://my-bucket/hadoop/",
            "AmiVersion": "2.3.5",
            "SupportedProducts": []
        }
    ],
    "ResponseMetadata": {
        "RequestId": "936faeca-bdbb-11e2-8815-b3eb27409c27"
    }
}
Connect to Master
Wait until the execution state is WAITING
Console - user@hostname ~ $
aws --region us-east-1 emr \
  describe-job-flows \
  --job-flow-ids "[\"j-3TYHC7VKXA235\"]" \
  | jq -r '.JobFlows[0].ExecutionStatusDetail.State'
Output
WAITING
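Rather than re-running the state check by hand, the wait can be wrapped in a polling loop. A minimal sketch, assuming the same describe-job-flows call and jq filter as above; the function names and the 30-second default interval are illustrative choices, not part of the AWS CLI.

```shell
# Print the current execution state of a job flow, using the same
# describe-job-flows call and jq filter as the command above.
get_job_flow_state() {
  aws --region us-east-1 emr \
    describe-job-flows \
    --job-flow-ids "[\"$1\"]" \
    | jq -r '.JobFlows[0].ExecutionStatusDetail.State'
}

# Poll until the job flow reaches WAITING. Returns non-zero if the state
# lookup fails or the job flow is in a state that never leads to WAITING.
# wait_for_waiting and its default interval are illustrative.
wait_for_waiting() {
  local job_flow_id="$1" interval="${2:-30}" state
  while :; do
    state=$(get_job_flow_state "$job_flow_id") || return 1
    case $state in
      WAITING)
        return 0 ;;
      SHUTTING_DOWN|TERMINATED|COMPLETED|FAILED)
        echo "job flow $job_flow_id is $state" >&2
        return 1 ;;
    esac
    sleep "$interval"
  done
}
```

For example, `wait_for_waiting j-3TYHC7VKXA235 && echo ready` would block until the cluster can accept an SSH session. The loop has no timeout, so a job flow stuck in STARTING would keep it polling indefinitely.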
Get the master public DNS name
Console - user@hostname ~ $
aws --region us-east-1 emr \
  describe-job-flows \
  --job-flow-ids "[\"j-3TYHC7VKXA235\"]" \
  | jq -r '.JobFlows[0].Instances.MasterPublicDnsName'
Output
ec2-204-236-247-160.compute-1.amazonaws.com
SSH to the master as the user hadoop, using the SSH key specified when starting the cluster.
Console - user@hostname ~ $
ssh -i ~/.ssh/my-key.pem hadoop@ec2-204-236-247-160.compute-1.amazonaws.com
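The DNS lookup and the SSH step can be combined so the host name never has to be copied by hand. A minimal sketch, assuming jq is installed; ssh_to_master is a hypothetical helper name, and the default key path is the one used in this article.

```shell
# Look up the master's public DNS name and open an SSH session as hadoop.
# ssh_to_master is a hypothetical helper, not part of the AWS CLI.
ssh_to_master() {
  local job_flow_id="$1" key_file="${2:-$HOME/.ssh/my-key.pem}" master_dns
  master_dns=$(
    aws --region us-east-1 emr \
      describe-job-flows \
      --job-flow-ids "[\"$job_flow_id\"]" \
      | jq -r '.JobFlows[0].Instances.MasterPublicDnsName'
  ) || return 1
  # jq -r prints "null" when the field is absent, for example before the
  # master instance has been assigned a public DNS name.
  if [ -z "$master_dns" ] || [ "$master_dns" = "null" ]; then
    echo "no master public DNS name yet for $job_flow_id" >&2
    return 1
  fi
  ssh -i "$key_file" "hadoop@$master_dns"
}
```

Usage would be `ssh_to_master j-3TYHC7VKXA235`, optionally followed by the path to a different key file.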
Run pig on the master to start the interactive session.
Console - hadoop@master ~ $
pig
Terminate Cluster
Elastic MapReduce Ruby Client
Console - user@hostname ~ $
elastic-mapreduce --terminate j-3TYHC7VKXA235
API Request
Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=TerminateJobFlows
&JobFlowIds.member.1=j-3TYHC7VKXA235
&*AUTHPARAMS*
AWS CLI
Console - user@hostname ~ $
aws --region us-east-1 emr \
  terminate-job-flows \
  --job-flow-ids "[\"j-3TYHC7VKXA235\"]"
Output
{
    "ResponseMetadata": {
        "RequestId": "c4b9efa2-bdbb-11e2-b959-a99f5a815d16"
    }
}
Parts in this series
- HowTo: AWS CLI Elastic MapReduce
- HowTo: AWS CLI Elastic MapReduce - Hive Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Hive
- HowTo: AWS CLI Elastic MapReduce - Pig Script
- HowTo: AWS CLI Elastic MapReduce - Interactive Pig
- HowTo: AWS CLI Elastic MapReduce - Streaming Job Flow
- HowTo: AWS CLI Elastic MapReduce - Cascading Job Flow
- HowTo: AWS CLI Elastic MapReduce - Custom JAR Job Flow
- HowTo: AWS CLI Elastic MapReduce - HBase