In this series, we extract the request details from the Elastic MapReduce Ruby client and use them to build the equivalent command with the AWS CLI tool. This article looks specifically at running an interactive Hive session.

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}
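
One way to reuse these values with the AWS CLI is to export them as the environment variables it reads (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`). The sketch below assumes bash and a flat file with one key per line, as shown above. The helper names `json_field` and `load_emr_credentials` are made up for illustration, and the sed expression is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Print the string value of a key from a flat JSON file with one
# "key": "value" pair per line. A sed shortcut, not a JSON parser.
json_field() {
    local file="$1" key="$2"
    sed -n 's/^[[:space:]]*"'"$key"'"[[:space:]]*:[[:space:]]*"\(.*\)",\{0,1\}[[:space:]]*$/\1/p' "$file"
}

# Export the Ruby client's credentials under the names the AWS CLI reads.
load_emr_credentials() {
    local file="$1"
    export AWS_ACCESS_KEY_ID="$(json_field "$file" access_id)"
    export AWS_SECRET_ACCESS_KEY="$(json_field "$file" private_key)"
    export AWS_DEFAULT_REGION="$(json_field "$file" region)"
}

# Example (not run here):
# load_emr_credentials ~/.aws/credentials.json
```

The commands in this article pass `--region` explicitly, so the region variable is only a convenience.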

Start cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Interactive Hive" \
--alive \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--instance-group TASK \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--hive-interactive \
--visible-to-all-users \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Instances.KeepJobFlowAliveWhenNoSteps=true&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions&Steps.member.1.HadoopJarStep.Args.member.4=--install-hive&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Name=Interactive%20Hive&Instances.InstanceGroups.member.2.InstanceType=m1.small&Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2F&Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-04-16T05%3A21%3A03%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=true&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.3.InstanceRole=TASK&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&Instances.InstanceGroups.member.3.InstanceType=m1.small&Instances.InstanceGroups.member.3.InstanceCount=2&Steps.member.1.Name=Setup%20Hive&Instances.InstanceGroups.member.3.BidPrice=0.06&Instances.InstanceGroups.member.3.Market=SPOT&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=--base-path&Signature=gDujEissx3TYMAGGGMM7vJX%2Bfu%2FYzxZnAOIA5ogKm34%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Instances.InstanceGroups.member.3.Name=Task%20Instance%20Group&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2Fhive-script&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=latest&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestIdb3b3a806-213e-49de-a3f1-a705d687f67fHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-A94KBLK016YJA

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.InstanceGroups.member.3.BidPrice=0.06
Instances.InstanceGroups.member.3.InstanceCount=2
Instances.InstanceGroups.member.3.InstanceRole=TASK
Instances.InstanceGroups.member.3.InstanceType=m1.small
Instances.InstanceGroups.member.3.Market=SPOT
Instances.InstanceGroups.member.3.Name=Task Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=true
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Interactive Hive
Signature=gDujEissx3TYMAGGGMM7vJX+fu/YzxZnAOIA5ogKm34=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
Steps.member.1.HadoopJarStep.Args.member.2=--base-path
Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
Steps.member.1.HadoopJarStep.Args.member.4=--install-hive
Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions
Steps.member.1.HadoopJarStep.Args.member.6=latest
Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.1.Name=Setup Hive
Timestamp=2013-04-16T05:21:03+00:00
VisibleToAllUsers=true
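
The decoded, sorted parameter list above can be reproduced from the raw query string with a few lines of shell. The sketch below is one possible way to do it, assuming bash (it relies on `printf '%b'` expanding `\xHH` escapes). The function name `format_query` is made up for illustration, and the sample input is a shortened version of the real query string.

```shell
#!/usr/bin/env bash
# Split a raw query string on "&", URL-decode each key=value pair,
# and sort the result so related parameters end up next to each other.
format_query() {
    local query="$1" pair
    printf '%s\n' "$query" | tr '&' '\n' | while IFS= read -r pair; do
        # Turn every %XX escape into \xXX and let printf %b expand it.
        printf '%b\n' "${pair//%/\\x}"
    done | LC_ALL=C sort
}

format_query 'Name=Interactive%20Hive&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Action=RunJobFlow'
# prints:
# Action=RunJobFlow
# LogUri=s3n://my-bucket/hadoop/
# Name=Interactive Hive
```

Sorting in the C locale keeps the byte order predictable, which is why `AWSAccessKeyId` sorts before `Action` in the full list.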

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: b3b3a806-213e-49de-a3f1-a705d687f67f

Output - Non-verbose output

Created job flow j-A94KBLK016YJA

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Interactive Hive
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.InstanceGroups.member.3.Name=Task Instance Group
&Instances.InstanceGroups.member.3.InstanceRole=TASK
&Instances.InstanceGroups.member.3.InstanceType=m1.small
&Instances.InstanceGroups.member.3.InstanceCount=2
&Instances.InstanceGroups.member.3.Market=SPOT
&Instances.InstanceGroups.member.3.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Instances.TerminationProtected=false
&Steps.member.1.Name=Setup Hive
&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
&Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
&Steps.member.1.HadoopJarStep.Args.member.2=--base-path
&Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
&Steps.member.1.HadoopJarStep.Args.member.4=--install-hive
&Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions
&Steps.member.1.HadoopJarStep.Args.member.6=latest
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=true
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Interactive Hive" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Task Instance Group\",
            \"instance_role\": \"TASK\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": true,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Setup Hive\",
        \"action_on_failure\": \"TERMINATE_JOB_FLOW\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/hive/hive-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/hive/\",
                \"--install-hive\",
                \"--hive-versions\",
                \"latest\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest" \
--visible-to-all-users

Output

{
    "ResponseMetadata": {
        "RequestId": "9a10b614-a65a-11e2-ba8d-d59ce4b37f90"
    },
    "JobFlowId": "j-Y9CC7P8SFNAU"
}
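
The escaped quotes in the `--instances` and `--steps` arguments are easy to get wrong. A possible alternative, sketched here under the assumption that bash is the shell, is to hold the JSON in a variable filled from a quoted heredoc, so no backslashes are needed. The variable name `steps` is arbitrary and the keys are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# The heredoc delimiter is quoted ('EOF'), so the shell performs no
# expansion inside it and the JSON can be written without backslashes.
steps="$(cat <<'EOF'
[
    {
        "name": "Setup Hive",
        "action_on_failure": "TERMINATE_JOB_FLOW",
        "hadoop_jar_step": {
            "jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
            "args": [
                "s3://us-east-1.elasticmapreduce/libs/hive/hive-script",
                "--base-path",
                "s3://us-east-1.elasticmapreduce/libs/hive/",
                "--install-hive",
                "--hive-versions",
                "latest"
            ]
        }
    }
]
EOF
)"

# Print the value to check it before using it.
printf '%s\n' "$steps"

# The variable is then passed as a single quoted argument (not run here):
# aws --region us-east-1 emr run-job-flow --steps "$steps" (other options as above)
```

The `--instances` argument can be prepared the same way in a second variable.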

Describe cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --describe j-Y9CC7P8SFNAU

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=DescribeJobFlows
&JobFlowIds.member.1=j-Y9CC7P8SFNAU
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-Y9CC7P8SFNAU\"]"

Output

{
    "JobFlows": [
        {
            "Name": "Interactive Hive", 
            "BootstrapActions": [], 
            "Instances": {
                "InstanceCount": 5, 
                "Placement": {
                    "AvailabilityZone": "us-east-1e"
                }, 
                "MasterPublicDnsName": "ec2-204-236-247-160.compute-1.amazonaws.com", 
                "NormalizedInstanceHours": 0, 
                "MasterInstanceId": "i-ac9ca5cf", 
                "InstanceGroups": [
                    {
                        "ReadyDateTime": "2013-04-16T06:10:17Z", 
                        "InstanceType": "m1.small", 
                        "InstanceRole": "MASTER", 
                        "InstanceRunningCount": 1, 
                        "State": "RUNNING", 
                        "BidPrice": "0.06", 
                        "Market": "SPOT", 
                        "StartDateTime": "2013-04-16T06:08:42Z", 
                        "InstanceGroupId": "ig-2Z6XY7WSAQKOL", 
                        "CreationDateTime": "2013-04-16T05:58:03Z", 
                        "InstanceRequestCount": 1, 
                        "LastStateChangeReason": "", 
                        "Name": "Master Instance Group"
                    }, 
                    {
                        "ReadyDateTime": "2013-04-16T06:10:24Z", 
                        "InstanceType": "m1.small", 
                        "InstanceRole": "CORE", 
                        "InstanceRunningCount": 2, 
                        "State": "RUNNING", 
                        "BidPrice": "0.06", 
                        "Market": "SPOT", 
                        "StartDateTime": "2013-04-16T06:10:24Z", 
                        "InstanceGroupId": "ig-3UPDGFCSJLXNW", 
                        "CreationDateTime": "2013-04-16T05:58:03Z", 
                        "InstanceRequestCount": 2, 
                        "LastStateChangeReason": "", 
                        "Name": "Core Instance Group"
                    }, 
                    {
                        "ReadyDateTime": "2013-04-16T06:14:55Z", 
                        "InstanceType": "m1.small", 
                        "InstanceRole": "TASK", 
                        "InstanceRunningCount": 2, 
                        "State": "RUNNING", 
                        "BidPrice": "0.06", 
                        "Market": "SPOT", 
                        "StartDateTime": "2013-04-16T06:14:55Z", 
                        "InstanceGroupId": "ig-YQX07WNO910N", 
                        "CreationDateTime": "2013-04-16T05:58:03Z", 
                        "InstanceRequestCount": 2, 
                        "LastStateChangeReason": "Resizing complete", 
                        "Name": "Task Instance Group"
                    }
                ], 
                "MasterInstanceType": "m1.small", 
                "TerminationProtected": false, 
                "HadoopVersion": "1.0.3", 
                "KeepJobFlowAliveWhenNoSteps": true, 
                "SlaveInstanceType": "m1.small", 
                "Ec2KeyName": "my-key"
            }, 
            "Steps": [
                {
                    "ExecutionStatusDetail": {
                        "State": "COMPLETED", 
                        "EndDateTime": "2013-04-16T06:11:35Z", 
                        "CreationDateTime": "2013-04-16T05:58:03Z", 
                        "StartDateTime": "2013-04-16T06:10:23Z"
                    }, 
                    "StepConfig": {
                        "HadoopJarStep": {
                            "Args": [
                                "s3://us-east-1.elasticmapreduce/libs/hive/hive-script", 
                                "--base-path", 
                                "s3://us-east-1.elasticmapreduce/libs/hive/", 
                                "--install-hive", 
                                "--hive-versions", 
                                "latest"
                            ], 
                            "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar", 
                            "Properties": []
                        }, 
                        "Name": "Setup Hive", 
                        "ActionOnFailure": "TERMINATE_JOB_FLOW"
                    }
                }
            ], 
            "ExecutionStatusDetail": {
                "State": "WAITING", 
                "ReadyDateTime": "2013-04-16T06:10:24Z", 
                "CreationDateTime": "2013-04-16T05:58:03Z", 
                "StartDateTime": "2013-04-16T06:10:24Z", 
                "LastStateChangeReason": "Waiting after step completed"
            }, 
            "VisibleToAllUsers": true, 
            "JobFlowId": "j-Y9CC7P8SFNAU", 
            "LogUri": "s3n://my-bucket/hadoop/", 
            "AmiVersion": "2.3.3", 
            "SupportedProducts": []
        }
    ], 
    "ResponseMetadata": {
        "RequestId": "71ec785e-a660-11e2-a5bf-b1f408d32c54"
    }
}
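
The `--job-flow-ids` argument is a JSON list, which mirrors the `JobFlowIds.member.N` parameters of the API request. When more than one job flow needs describing, the list can be built from plain arguments instead of being escaped by hand. The helper below is a hypothetical sketch (the name `json_id_list` is ours) and assumes bash and IDs that contain no quotes or backslashes.

```shell
#!/usr/bin/env bash
# Build a JSON list of strings from the given job flow IDs.
json_id_list() {
    local out="" id
    for id in "$@"; do
        # Add a comma before every element except the first.
        out+="${out:+,}\"$id\""
    done
    printf '[%s]\n' "$out"
}

json_id_list j-Y9CC7P8SFNAU
# prints: ["j-Y9CC7P8SFNAU"]

# Example (not run here):
# aws --region us-east-1 emr describe-job-flows \
#     --job-flow-ids "$(json_id_list j-Y9CC7P8SFNAU j-A94KBLK016YJA)"
```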

Connect to Master

Wait until the job flow's execution state is WAITING, which means the Setup Hive step has completed and the cluster is idle.

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-Y9CC7P8SFNAU\"]" \
| jq -r '.JobFlows[0].ExecutionStatusDetail.State'

Output

WAITING
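
Rather than re-running the command by hand, the check can be wrapped in a polling loop. This is a minimal sketch assuming bash. The function names `wait_for_state` and `job_flow_state` are made up for illustration, and the interval and retry count are arbitrary. The loop only looks for the wanted state, so a job flow that fails or terminates is reported only once the retries run out.

```shell
#!/usr/bin/env bash
# Poll a command until it prints the wanted state.
# Usage: wait_for_state WANTED INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# Returns 0 when the state is reached, 1 when the tries are used up.
wait_for_state() {
    local wanted="$1" interval="$2" tries="$3" state
    shift 3
    while [ "$tries" -gt 0 ]; do
        state="$("$@")"
        echo "state: $state" >&2
        if [ "$state" = "$wanted" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

# Wraps the describe-job-flows pipeline from the article.
job_flow_state() {
    aws --region us-east-1 emr \
        describe-job-flows \
        --job-flow-ids "[\"$1\"]" \
        | jq -r '.JobFlows[0].ExecutionStatusDetail.State'
}

# Example (not run here): check every 30 seconds, give up after 40 tries.
# wait_for_state WAITING 30 40 job_flow_state j-Y9CC7P8SFNAU
```

Passing the state command as arguments keeps the loop independent of the AWS CLI, so it can be tried out with any command that prints a state.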

Get the master public DNS name

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-Y9CC7P8SFNAU\"]" \
| jq -r '.JobFlows[0].Instances.MasterPublicDnsName'

Output

ec2-204-236-247-160.compute-1.amazonaws.com

SSH to the master as the user hadoop, using the key pair that was specified when the cluster was started.

Console - user@hostname ~ $

ssh -i ~/.ssh/my-key.pem hadoop@ec2-204-236-247-160.compute-1.amazonaws.com

Run hive on the master to start the interactive session.

Console - hadoop@master ~ $

hive

Terminate Cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --terminate j-Y9CC7P8SFNAU

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=TerminateJobFlows
&JobFlowIds.member.1=j-Y9CC7P8SFNAU
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
terminate-job-flows \
--job-flow-ids "[\"j-Y9CC7P8SFNAU\"]"

Output

{
    "ResponseMetadata": {
        "RequestId": "b4dcda43-a660-11e2-830b-5523a2fd5603"
    }
}
