In this series, we look at how to extract request information from the Elastic MapReduce Ruby client and use it to build the equivalent command with the AWS CLI tool. This article looks specifically at running a Hive script.

Elastic MapReduce Ruby client

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}

Create the job flow

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Test Hive" \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--hive-script "s3n://elasticmapreduce/samples/hive-ads/libs/model-build.q" \
--args \
"-d","LIBS=s3n://elasticmapreduce/samples/hive-ads/libs",\
"-d","INPUT=s3n://elasticmapreduce/samples/hive-ads/tables",\
"-d","OUTPUT=s3n://my-bucket/hive-ads/output/" \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.2.HadoopJarStep.Args.member.2=--base-path&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Steps.member.2.HadoopJarStep.Args.member.15=OUTPUT%3Ds3n%3A%2F%2Fmy-bucket%2Fhive-ads%2Foutput%2F&Instances.InstanceGroups.member.2.InstanceType=m1.small&Steps.member.2.HadoopJarStep.Args.member.14=-d&Steps.member.2.HadoopJarStep.Args.member.9=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fhive-ads%2Flibs%2Fmodel-build.q&Steps.member.2.HadoopJarStep.Args.member.8=-f&Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Steps.member.2.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2Fhive-script&Steps.member.2.HadoopJarStep.Args.member.10=-d&Instances.InstanceGroups.member.2.InstanceCount=2&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.2.HadoopJarStep.Args.member.11=LIBS%3Ds3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fhive-ads%2Flibs&Steps.member.1.HadoopJarStep.Args.member.2=--base-path&Steps.member.2.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Instances.InstanceGroups.member.1.InstanceType=m1.small&AmiVersion=latest&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group&VisibleToAllUsers=false&Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT&Instances.InstanceGroups.member.2.InstanceRole=CORE&SignatureMethod=HmacSHA256&Steps.member.2.HadoopJarStep.Args.member.5=latest&Instances.InstanceGroups.member.1.Market=SPOT&ContentType=JSON&Instances.InstanceGroups.member.1.BidPrice=0.06&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.6=latest&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Signature=IS8Ni3XztEbL0cA%2BQWgfLNemfKaz9bZihF0m5rnC7cQ%3D&Steps.member.2.HadoopJarStep.Args.member.13=INPUT%3Ds3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fhive-ads%2Ftables&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.BidPrice=0.06&Inst
ances.KeepJobFlowAliveWhenNoSteps=false&Steps.member.2.Name=Run%20Hive%20Script&Name=Test%20Hive&Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2Fhive-script&Steps.member.2.HadoopJarStep.Args.member.7=--args&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&Instances.InstanceGroups.member.2.Market=SPOT&Steps.member.2.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2F&Steps.member.2.HadoopJarStep.Args.member.4=--hive-versions&Steps.member.1.Name=Setup%20Hive&Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions&Timestamp=2013-03-24T20%3A41%3A19%2B00%3A00&Steps.member.1.HadoopJarStep.Args.member.4=--install-hive&Instances.Ec2KeyName=my-key&Instances.TerminationProtected=false&Steps.member.2.HadoopJarStep.Args.member.12=-d&Steps.member.2.HadoopJarStep.Args.member.6=--run-hive-script&Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2F&Action=RunJobFlow
Headers:
x-amzn-RequestId8523674d-0729-4695-9500-a241a159f92eHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-KIXYB1JD9MX0P

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test Hive
Signature=IS8Ni3XztEbL0cA+QWgfLNemfKaz9bZihF0m5rnC7cQ=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
Steps.member.1.HadoopJarStep.Args.member.2=--base-path
Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
Steps.member.1.HadoopJarStep.Args.member.4=--install-hive
Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions
Steps.member.1.HadoopJarStep.Args.member.6=latest
Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.1.Name=Setup Hive
Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.2.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
Steps.member.2.HadoopJarStep.Args.member.10=-d
Steps.member.2.HadoopJarStep.Args.member.11=LIBS=s3n://elasticmapreduce/samples/hive-ads/libs
Steps.member.2.HadoopJarStep.Args.member.12=-d
Steps.member.2.HadoopJarStep.Args.member.13=INPUT=s3n://elasticmapreduce/samples/hive-ads/tables
Steps.member.2.HadoopJarStep.Args.member.14=-d
Steps.member.2.HadoopJarStep.Args.member.15=OUTPUT=s3n://my-bucket/hive-ads/output/
Steps.member.2.HadoopJarStep.Args.member.2=--base-path
Steps.member.2.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
Steps.member.2.HadoopJarStep.Args.member.4=--hive-versions
Steps.member.2.HadoopJarStep.Args.member.5=latest
Steps.member.2.HadoopJarStep.Args.member.6=--run-hive-script
Steps.member.2.HadoopJarStep.Args.member.7=--args
Steps.member.2.HadoopJarStep.Args.member.8=-f
Steps.member.2.HadoopJarStep.Args.member.9=s3n://elasticmapreduce/samples/hive-ads/libs/model-build.q
Steps.member.2.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.2.Name=Run Hive Script
Timestamp=2013-03-24T20:41:19+00:00
VisibleToAllUsers=false

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 8523674d-0729-4695-9500-a241a159f92e

Output - Non-verbose output

Created job flow j-KIXYB1JD9MX0P
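
The formatted parameter listing above can be reproduced mechanically from the raw query string. The following is a minimal sketch, not part of the Ruby client: decode_query is a hypothetical helper name, and the sketch assumes a POSIX shell with tr, sort and awk available. It splits the query string on &, sorts the pairs in byte order, and decodes each %XX escape.

```shell
#!/bin/sh
# decode_query (hypothetical helper): read a URL query string on stdin and
# print one decoded key=value pair per line, sorted in byte order.
decode_query() {
    tr '&' '\n' | LC_ALL=C sort | awk '
        # Convert a two-digit hexadecimal string to a number.
        function hex(s,    hi, lo) {
            s = toupper(s)
            hi = index("0123456789ABCDEF", substr(s, 1, 1)) - 1
            lo = index("0123456789ABCDEF", substr(s, 2, 1)) - 1
            return hi * 16 + lo
        }
        {
            out = ""
            # Replace each %XX escape with the character it encodes.
            while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
                out = out substr($0, 1, RSTART - 1) sprintf("%c", hex(substr($0, RSTART + 1, 2)))
                $0 = substr($0, RSTART + 3)
            }
            print out $0
        }'
}

# A shortened query string built from three parameters of the verbose output.
printf '%s\n' 'Name=Test%20Hive&Action=RunJobFlow&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F' | decode_query
```

For the shortened query string this prints Action=RunJobFlow, LogUri=s3n://my-bucket/hadoop/ and Name=Test Hive, one per line. Byte-order sorting is also why member.10 appears before member.2 in the listing above.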

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test Hive
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Setup Hive
&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
&Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
&Steps.member.1.HadoopJarStep.Args.member.2=--base-path
&Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
&Steps.member.1.HadoopJarStep.Args.member.4=--install-hive
&Steps.member.1.HadoopJarStep.Args.member.5=--hive-versions
&Steps.member.1.HadoopJarStep.Args.member.6=latest
&Steps.member.2.Name=Run Hive Script
&Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.2.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.2.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/hive/hive-script
&Steps.member.2.HadoopJarStep.Args.member.2=--base-path
&Steps.member.2.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/hive/
&Steps.member.2.HadoopJarStep.Args.member.4=--hive-versions
&Steps.member.2.HadoopJarStep.Args.member.5=latest
&Steps.member.2.HadoopJarStep.Args.member.6=--run-hive-script
&Steps.member.2.HadoopJarStep.Args.member.7=--args
&Steps.member.2.HadoopJarStep.Args.member.8=-f
&Steps.member.2.HadoopJarStep.Args.member.9=s3n://elasticmapreduce/samples/hive-ads/libs/model-build.q
&Steps.member.2.HadoopJarStep.Args.member.10=-d
&Steps.member.2.HadoopJarStep.Args.member.11=LIBS=s3n://elasticmapreduce/samples/hive-ads/libs
&Steps.member.2.HadoopJarStep.Args.member.12=-d
&Steps.member.2.HadoopJarStep.Args.member.13=INPUT=s3n://elasticmapreduce/samples/hive-ads/tables
&Steps.member.2.HadoopJarStep.Args.member.14=-d
&Steps.member.2.HadoopJarStep.Args.member.15=OUTPUT=s3n://my-bucket/hive-ads/output/
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Test Hive" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Setup Hive\",
        \"action_on_failure\": \"TERMINATE_JOB_FLOW\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/hive/hive-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/hive/\",
                \"--install-hive\",
                \"--hive-versions\",
                \"latest\"
            ]
        }
    },
    {
        \"name\": \"Run Hive Script\",
        \"action_on_failure\": \"CANCEL_AND_WAIT\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/hive/hive-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/hive/\",
                \"--hive-versions\",
                \"latest\",
                \"--run-hive-script\",
                \"--args\",
                \"-f\",
                \"s3n://elasticmapreduce/samples/hive-ads/libs/model-build.q\",
                \"-d\",
                \"LIBS=s3n://elasticmapreduce/samples/hive-ads/libs\",
                \"-d\",
                \"INPUT=s3n://elasticmapreduce/samples/hive-ads/tables\",
                \"-d\",
                \"OUTPUT=s3n://my-bucket/hive-ads/output/\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest"

Output

{
    "ResponseMetadata": {
        "RequestId": "97c75823-a659-11e2-8eb9-4fceb28f23f9"
    }, 
    "JobFlowId": "j-GVLURNZ2X5YS"
}
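
Comparing the API request with the command above, the keys inside the --instances and --steps JSON appear to be the snake_case form of the API parameter names, with digits split off as their own word (Ec2KeyName becomes ec_2_key_name). This is an observation from this one example rather than a documented rule. As a rough sketch under that assumption, the conversion could be scripted as follows; to_cli_key is a hypothetical helper name.

```shell
#!/bin/sh
# to_cli_key (hypothetical helper): convert an API parameter name such as
# Ec2KeyName into the snake_case key seen in the JSON above.
# Assumption: the naming pattern is inferred from this article's example only.
to_cli_key() {
    printf '%s\n' "$1" |
        sed -E -e 's/([a-z0-9])([A-Z])/\1_\2/g' \
               -e 's/([A-Za-z])([0-9])/\1_\2/g' |
        tr '[:upper:]' '[:lower:]'
}

# Print the mapping for a few parameter names taken from the API request.
for name in Ec2KeyName InstanceRole KeepJobFlowAliveWhenNoSteps HadoopJarStep; do
    printf '%s -> %s\n' "$name" "$(to_cli_key "$name")"
done
```

Top-level parameters such as LogUri and AmiVersion follow the same word split but are written as hyphenated options (--log-uri, --ami-version), so the helper covers only the nested JSON keys.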

If you want On-Demand rather than Spot instances, change the value of the market attribute from SPOT to ON_DEMAND and remove the bid_price attribute.

On Demand Instances

        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"ON_DEMAND\"
        }

A Task instance group has the same format as the Master and Core groups, except that instance_role is TASK.

Task Instance Group

        {
            \"name\": \"Task Instance Group\",
            \"instance_role\": \"TASK\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }

To make the job flow visible to all IAM users, add --visible-to-all-users. By default it is not visible to all IAM users, so no option is needed to keep it that way.

To run the cluster in availability zone us-east-1a, add the following attribute to the --instances structure:

Availability Zone

    \"placement\": {
        \"availability_zone\": \"us-east-1a\"
    }
