I looked around for a while and found this somewhat out-of-date tutorial by Tom White, which pointed me to a set of EC2 helper scripts in the src/contrib subdirectory of the Hadoop installation.
Unfortunately, those scripts did not get me 'all the way there', but they were a start. I'm going to try to roll my changes into those EC2 helper scripts before I have to set up another cluster :)
Setting Up a Multi Node Hadoop Cluster on EC2
Prior to setting up a Multi Node Hadoop Cluster, I set up a single node standalone installation. I recommend doing this because it allowed me to make sure my code worked, i.e. my jar file was valid, my Mapper and Reducer were working, etc.
Applying the standard Hadoop cluster setup instructions to the EC2 environment meant that I would have to do the following:
- find an AMI with hadoop on it
- bring up N+1 of those
- make one the master and the rest the slaves
- change the master config to account for the slaves
- change the slave config to point to the master
- allow Hadoop component port access between master and slave for namenode and datanode communication
- start the system up.
The scripts at {hadoop src location}/src/contrib/ec2/bin use the EC2 API tools to attempt all of the above. They fall short in a couple of key areas and need to be extended. I'm going to detail the steps I took to get a cluster fully operational so that I can extend those scripts in the future.
What the scripts do:
- Find AMIs and start up instances: N slave instances and 1 master instance.
- Allow you to log into the master as well as push files out to it.
- Generate a private/public key on the master, and push the public key out to the slaves to enable password-less ssh.
- Push the master hadoop-site.xml out to all slaves.
What they do not do:
- they do not configure the master conf/slaves file to contain the IPs of all slaves.
- they do not set up security groups with overridden port values specified in the /etc/rc.local of the AMI I was using. Those values are catted to conf/hadoop-site.xml. To be honest, there is no way they could actually be aware of those values unless the scripts were synchronized to the image, which they weren't.
Both of these mean that true distributed startup doesn't happen. But the failure is 'silent', so unless you are looking at the logs on multiple machines, you don't know that things are failing.
Initial Script Setup Steps
Here are the steps I used to get working with the scripts. Note that the AMI the scripts point to by default has version 0.17 of Hadoop installed.
(1) I configured my EC2_PRIVATE_KEY and EC2_CERT env vars to point to the .pem files I generated for them.
(2) In {hadoop src location}/src/contrib/ec2/bin/hadoop-ec2-env.sh, I set the following env vars:
- AWS_ACCOUNT_ID={acct number}
- AWS_ACCESS_KEY_ID={key id}
- AWS_SECRET_ACCESS_KEY={secret key}
- KEY_NAME={name of KeyPair you want to use} NOTE: the hadoop-ec2 scripts assume that the generated private key for your KeyPair resides in the same directory as the file your EC2_PRIVATE_KEY points to.
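Since a missing variable tends to show up only as a confusing failure later on, a small pre-flight check can help. The following is a minimal sketch, not part of the hadoop-ec2 scripts: check_ec2_env is a hypothetical helper name, and it only tests that the variables named in steps (1) and (2) are non-empty in the current shell.

```shell
# Hypothetical helper (not part of the hadoop-ec2 scripts):
# report which of the variables from steps (1) and (2) are unset or empty.
check_ec2_env() {
  missing=0
  for name in EC2_PRIVATE_KEY EC2_CERT AWS_ACCOUNT_ID \
              AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY KEY_NAME; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything is set, 1 otherwise
  return "$missing"
}
```

Because the AWS_* and KEY_NAME values live in hadoop-ec2-env.sh rather than in your login environment, this assumes you source that file into the current shell before calling check_ec2_env.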
At this point, I thought the cluster was up and running, but when I tried to copy a large file to the cluster, I got this error:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/root/input could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
I googled this and found out that it implied that my data nodes were failing (but I hadn't seen that!). I checked the masters and slaves files in the conf directory on the master machine, and they contained only localhost, which meant that the master knew nothing about the slaves at startup.
I stopped hadoop, changed the conf/slaves config file to include the Amazon internal names of all slaves, and restarted. This time I could see the remote slave data nodes start up. So I tried the copy again, and got the same failure.
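The conf/slaves change can be sketched as a small shell function. This is an illustration under assumptions: write_slaves is a hypothetical helper name, and the hostnames you pass in are whatever internal names EC2 assigned to your slave instances.

```shell
# Hypothetical helper: overwrite FILE with one slave hostname per line.
# usage: write_slaves FILE HOST...
write_slaves() {
  slaves_file=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: write_slaves FILE HOST..." >&2
    return 1
  fi
  # printf repeats the format once per remaining argument
  printf '%s\n' "$@" > "$slaves_file"
}
```

On the master this would be called with the path to conf/slaves followed by the slave names, between stopping Hadoop (bin/stop-all.sh) and starting it again (bin/start-all.sh).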
When I went out to a slave machine, I looked at the datanode log file in the log directory (on this AMI, configured at /mnt/hadoop/logs). I saw that the datanode service was trying to contact the master with no success.
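Because the failure is silent on the master, a quick count of connection retries in a slave's datanode log makes it visible. This is a rough sketch: count_connect_retries is a hypothetical helper, and the search string 'Retrying connect to server' is an assumption about how the Hadoop IPC client words its retry message, so adjust it to whatever your log actually shows.

```shell
# Hypothetical helper: count lines in a datanode log that look like
# failed attempts to reach the master.
# The pattern is an assumption about the log wording; adjust as needed.
count_connect_retries() {
  log_file=$1
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
  # grep -c exits non-zero when the count is 0, so mask that status
  grep -c 'Retrying connect to server' "$log_file" || true
}
```

On the AMI described here the log would be one of the datanode files under /mnt/hadoop/logs; a count that keeps growing suggests the slave cannot reach the master at all.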
This ended up being because of the security policy of EC2. In EC2, you need to explicitly configure which ports are accessible on each instance via EC2 Security Groups. In summary, the current scripts assumed defaults from hadoop-default.xml, and I had overridden some of those defaults in hadoop-site.xml.
Summary:
- extended conf/slaves to include IP addresses of slave instances.
- added 50001 and 50002 access to the master security group (meaning that slave nodes could talk to the master on those ports)
- added 50010 access to the slave security group (same meaning for master to slaves)
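The two security group changes in the summary can be written out as ec2-authorize calls. The sketch below only prints the commands so they can be reviewed first. print_port_rules is a hypothetical helper, the group names are whatever your cluster uses, and whether your version of the EC2 API tools accepts a port (-p) together with a source group (-o/-u) is something to verify; if it does not, a source subnet (-s) is the usual alternative.

```shell
# Hypothetical helper: print (do not run) the ec2-authorize commands for
# the ports listed in the summary above.
# usage: print_port_rules MASTER_GROUP SLAVE_GROUP AWS_ACCOUNT_ID
print_port_rules() {
  master_group=$1
  slave_group=$2
  account_id=$3
  # Slaves need to reach the master on the overridden ports
  for port in 50001 50002; do
    echo "ec2-authorize $master_group -P tcp -p $port -o $slave_group -u $account_id"
  done
  # The master needs to reach the datanodes on the slaves
  echo "ec2-authorize $slave_group -P tcp -p 50010 -o $master_group -u $account_id"
}
```

Printing rather than executing keeps the example safe to run anywhere; once the output looks right, the lines can be pasted into a shell that has the EC2 tools configured.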
At this point Hadoop was configured with 4 slave nodes and 1 master.
Attaching an EBS volume to the master and copying EBS data to HDFS
(1) My data source was located in an Elastic Block Store volume. A volume is attached to a running instance like so:
ec2-attach-volume {volume ID} -i {instance ID} -d {device location on instance, e.g. /dev/sda}
(2) In order to actually access the data, you mount that volume like you would mount a drive:
mount {name of device to mount, e.g. /dev/sda} {name of dir to map to}
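The attach and mount steps run in two different places (attach from the local box, mount on the instance), so the sketch below keeps them as two functions. The names run, attach_volume, mount_volume and the DRY_RUN variable are all hypothetical; with DRY_RUN=1 the commands are only printed.

```shell
# Hypothetical wrapper: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Run from the local box, where the EC2 API tools are configured.
# usage: attach_volume VOLUME_ID INSTANCE_ID DEVICE
attach_volume() {
  run ec2-attach-volume "$1" -i "$2" -d "$3"
}

# Run on the instance itself, as root.
# usage: mount_volume DEVICE MOUNT_POINT
mount_volume() {
  run mkdir -p "$2"    # make sure the mount point exists
  run mount "$1" "$2"  # device first, then directory
}
```

For example, DRY_RUN=1 mount_volume /dev/sdf /mnt/ebs prints the two commands without touching the system; the device and directory here are example values only.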
Running the Job
(1) Once I mounted the volume, I needed to log into the master to start the job.
{hadoop src location}/src/contrib/ec2/bin/hadoop-ec2 {name of cluster} login
(2) Then I needed to push the file to HDFS for processing:
hadoop fs -mkdir input
hadoop fs -copyFromLocal {location of articles.tsv} input
(3) From my local box, I pushed my jar out to the master:
{hadoop src location}/src/contrib/ec2/bin/hadoop-ec2 {name of cluster} push {name of my jar file}
(4) On the master box, I started the job:
hadoop jar {name of jar} input output (NOTE: job will fail immediately if output exists in hdfs)
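Since the job fails immediately when the output directory already exists, reruns are easier with a wrapper that clears it first. This is a sketch only: run_job and the HADOOP_CMD override are hypothetical, and it deletes the previous output without asking, so use it only when that output is disposable.

```shell
# Hypothetical wrapper: remove stale output from HDFS, then start the job.
# HADOOP_CMD is a made-up override so the launcher can be swapped out.
# usage: run_job JAR INPUT_DIR OUTPUT_DIR
run_job() {
  jar=$1
  input=$2
  output=$3
  hadoop_cmd=${HADOOP_CMD:-hadoop}
  # Delete the old output directory; ignore the error if it is absent
  "$hadoop_cmd" fs -rmr "$output" || true
  "$hadoop_cmd" jar "$jar" "$input" "$output"
}
```

Called as run_job {name of jar} input output on the master, it replaces the single hadoop jar command above.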
Arun, have you tried the Cloudera AMI for EC2? That said, it still doesn't auto-mount EBS volumes for you... this is something I'm interested in doing but don't yet have enough knowledge to get my head around it...
p7a, I'm actually in the middle of writing up a guest post for the Cloudera guys, who contacted me after reading this last post. It was _way_ easier. Stay tuned, I'll update when this post goes out on the Cloudera blog.
When you say cluster here, do you mean that you have more than one machine at your end, and that all these machines execute one job on EC2? Please correct me if I am wrong. I have just started reading your post; it is long, so I am taking time to understand it.
Your answers may help me to understand your post better.
The best way to do this for starters is to install, configure and test a "local" Hadoop setup for each of the two Ubuntu boxes, and in a second step to "merge" these two single-node clusters into one multi-node cluster in which one Ubuntu box will become the designated master, and the other box will become only a slave.