Friday, October 18, 2013

A few things about the HDP Sandbox for Hadoop



The sandbox is really nice to work with.
That said, here are a few tidbits that helped me and that I want to share:

- There is shell access from Ambari, the UI, but sometimes you want to connect via ssh.

Don't do this:
$ ssh root@127.0.0.1:2222
ssh: Could not resolve hostname 127.0.0.1:2222: nodename nor servname provided, or not known


Do that instead:

urbanlegends-2:~$ ssh -p 2222 root@127.0.0.1

Password should be 'hadoop'.
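If you keep typing the host:port form out of habit, a tiny shell wrapper can split it for you. This is only a sketch: `ssh_hostport` is a hypothetical helper name. It splits on the last colon, so it does not handle bare IPv6 addresses. Setting `DRY_RUN=1` makes it print the ssh command instead of running it.

```shell
# Hypothetical helper: accept [user@]host:port and call ssh with -p,
# because ssh itself treats "host:port" as one (nonexistent) hostname.
ssh_hostport() {
    target=$1
    shift
    case $target in
        *:*)
            host=${target%:*}   # everything before the last colon
            port=${target##*:}  # everything after the last colon
            ;;
        *)
            host=$target
            port=22             # ssh's default port
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo ssh -p "$port" "$host" "$@"
    else
        ssh -p "$port" "$host" "$@"
    fi
}
```

For the sandbox, `ssh_hostport root@127.0.0.1:2222` ends up running `ssh -p 2222 root@127.0.0.1`.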

- If you want to use Hive and you are installing HDP from scratch, surprise: you cannot use Beeswax out of the box (as of this writing, October 2013), because it is not integrated yet.
So you will need to install Beeswax separately from Ambari.
The documentation is incomplete here, and you will need to download it yourself (via yum install beeswax).


- Adding a jar for a SerDe:
Even though you add the jar in the Hue UI File Browser, the jar location may not be picked up properly when using Hive at the command line, and Hue hides the actual path from you.
Workaround: run your SELECT statement from Beeswax, adding the jar as a resource in Beeswax. Beeswax will then report in its log where the jar was added, e.g.:
Added resource: /tmp/hue_3792@sandbox_201310151419_resources/hive-contrib-0.11.0.2.0.5.0-67.jar
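If you save the Beeswax log output to a file, a one-line sed filter can pull the jar path out of it. A minimal sketch: `added_resource_paths` and the file name `beeswax.log` are hypothetical, and it assumes the path contains no spaces (true for the example above).

```shell
# Hypothetical helper: read Beeswax/Hive log text on stdin and print the
# path from every "Added resource: <path>" line, one per line.
added_resource_paths() {
    sed -n 's/.*Added resource: *\([^ ]*\).*/\1/p'
}
```

`added_resource_paths < beeswax.log` prints the paths; you could then try that path with Hive's `ADD JAR` statement at the command line, assuming you run Hive on the same machine as Hue.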

- Installation of Hue:
Documentation gaps:

1. After the creation of the Hue user (documentation step "3. Create a Hue user and either deploy Hue in that user's home directory or under the /usr/share directory."), the documentation fails to mention that you actually need to download and install Hue.
This step, mentioned in the HDP 1.3 documentation, was left out of the HDP 2.0 documentation.


2. When running the daemon via /usr/lib/hue/build/env/bin/supervisor, the IP address needs to remain 0.0.0.0 and the port needs to be a free port (check with netstat). The daemon should then print something like:
Starting beeswax server on port <port>, talking back to Desktop at <host>
and you can then check the UI in the browser. “Desktop” refers to the Hue server (generally the same management node as Ambari).
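The netstat check can be scripted. This sketch assumes Linux `netstat -tln` output, where the fourth column is the local address and the last column is the state; `port_in_use` is a hypothetical helper that reads that output on stdin and succeeds if the port is already listening.

```shell
# Hypothetical helper: succeed if port $1 is listed as a LISTEN socket
# in `netstat -tln` output read from stdin, fail otherwise.
port_in_use() {
    awk -v p="$1" '
        $NF == "LISTEN" {
            # field 4 is the local address; the port follows the last colon
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `netstat -tln | port_in_use 8000 && echo "port 8000 is taken"`, where 8000 is just an example value.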


A few notes:

- Installing g++: the package you actually need to install is gcc-c++,
i.e. yum install gcc-c++.

- You can install multiple yum packages at once (in fact, all of the ones listed in the HDP doc) by putting all their names on the same yum install line.



Hue Integration: as of HDP 2.0, Ambari and Hue are not integrated with each other, so users have to be duplicated in each system. You can integrate both Hue and Ambari with LDAP (Active Directory); if that is done, enterprise users who have access to the Linux boxes will be able to have SSO in Ambari and Hue.

Hue Security: You need to ensure that all users created in Hue are able to create Hive jobs. If they cannot, it could be because they do not have /user/<username> directories in HDFS. You have to create each user's home directory in HDFS before that user can use Hue, since MapReduce jobs need a .staging directory there.
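Creating those home directories can be scripted. A minimal sketch, assuming the HDFS superuser account is `hdfs` and that you run it as a user allowed to `sudo` to it; `create_hdfs_home` and `run_cmd` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create /user/<name> in HDFS for each Hue user and
# hand it over to that user, so their jobs can write .staging there.
# Assumes the HDFS superuser is "hdfs".
create_hdfs_home() {
    for user in "$@"; do
        run_cmd sudo -u hdfs hdfs dfs -mkdir -p "/user/$user"
        run_cmd sudo -u hdfs hdfs dfs -chown "$user" "/user/$user"
    done
}
```

For example, `create_hdfs_home alice bob` for two Hue users (placeholder names).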

Beeswax settings: If there is a specific SerDe jar that all users need every time, you can put it in /usr/lib/hive/lib and restart Hue. Beeswax includes that directory in its classpath when it starts. Check beeswax_server.out for more details.
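Those steps can be sketched as a small script. The `service hue restart` command and the `/var/log/hue` log directory are assumptions about a typical HDP install, not something taken from the docs; adjust them to yours. `deploy_serde_jar` and `run_cmd` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of running them.

```shell
# Assumed locations; override via the environment if yours differ.
HIVE_LIB=${HIVE_LIB:-/usr/lib/hive/lib}
HUE_LOG_DIR=${HUE_LOG_DIR:-/var/log/hue}   # assumption, not from the docs

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy a SerDe jar into Hive's lib directory,
# restart Hue (assumed init script name), then show the end of the
# Beeswax server output so you can check the startup.
deploy_serde_jar() {
    jar=$1
    run_cmd cp "$jar" "$HIVE_LIB/"
    run_cmd service hue restart
    run_cmd tail -n 50 "$HUE_LOG_DIR/beeswax_server.out"
}
```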



2 comments:

  1. Nice post! For information, the Hue website is gethue.com. You can see the latest updates, docs and help there.

