Questions tagged [hadoop]

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.

95 questions
30
votes
2 answers

7Zip Cannot create symbolic link, access is denied to libhdfs.so and libhadoop.so

I am working on Windows 10 and trying to install Hadoop, which I downloaded from here. When I try to extract Hadoop, I get this error for the files libhdfs.so and libhadoop.so: Cannot create symbolic link : Access is denied. How do I fix this?
5
votes
1 answer

Hive Installation - Error Executing SQL Query "select "DB_ID" from "DBS""

I'm trying to install Apache Hive (3.1.1) on a Hadoop (3.2.0) multi-node cluster with 1 namenode and 3 data nodes. I have followed the Getting Started tutorial on the Apache website step by step, but when running the "hive" command I get an extremely…
A Tol
  • 51
4
votes
2 answers

Open Eclipse as a Linux user that doesn't have a graphical environment (created from console)

I created a user from the console on my Ubuntu Desktop 14.04 LTS by running: sudo addgroup hadoop and sudo adduser --ingroup hadoop hduser. I used that user for all sorts of things, because I'm using it for some programming that is related…
chomp
  • 143
4
votes
0 answers

Hadoop - java.io.IOException: Connection reset by peer when creating a new directory

I have installed Hadoop 2.4.0 as a single node for learning purposes, but after I start Hadoop and create a directory using the command: hadoop fs -mkdir /tmp, I get the following error: ls: Failed on local exception: java.io.IOException: Connection…
Qurashi
  • 141
4
votes
2 answers

I can't run Hive from the command line

Hive 3.1.2, Hadoop 3.2.1. When I run hive from the command line, it gives me the error message below: which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin) SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding…
4
votes
0 answers

Can you connect two GNOME-boxes via ssh?

I’m trying to set up a 2-node Hadoop cluster using two laptops with GNOME-boxes, both running Ubuntu 17.10. Both laptops are connected to the same wifi, but the GNOME-boxes themselves are not connected to the wifi. I need someone to explain to me the…
4
votes
4 answers

Why is Hadoop not a Data Warehouse?

What are the functional reasons why Hadoop cannot be a data warehouse? On several sites one can see statements that a Hadoop cluster is not a replacement for a traditional data warehouse. However, I can't find the real reasons why. I am aware that…
Dennis Jaheruddin
  • 496
  • 1
  • 5
  • 24
3
votes
4 answers

Unable to send files via SCP

I tried using the scp command with the -i option to transfer a file from my local machine to a remote EC2 instance: Akhis-Macbook-Pro:~ aswinakhilesh$ sudo scp -i Mykey.pem FileA ec2-user@ec2-23-20-46-45.compute-1.amazonaws.com:/home/FileA Instead of…
3
votes
2 answers

Slave: ssh: connect to host slave port 22: Connection timed out

I have set up a single-node cluster on each of two different machines. I have made one the master (192.168.1.1) and the other machine the slave (192.168.1.2). I can successfully ping between the two machines. I have made the following changes to get a 2-node cluster…
Aashu
  • 131
3
votes
2 answers

Setting Hostname as IP on Linux for Hadoop VM

How can I have an Ubuntu server VM set /etc/hostname to the value of the VM's assigned IP address automatically on startup? I'm creating an Ubuntu server VM image to run Hadoop. When a client interacts with Hadoop it returns addresses of nodes in…
3
votes
1 answer

Error when starting Cygwin SSH daemon on Windows 7 Home edition

I am using Windows 7 Home edition and trying to install the Cygwin SSH daemon. The installation succeeded, but when I run the daemon, I see this error in the C:\cygwin\var\log\sshd.log file: Privilege separation user sshd does not exist. I understand that I need to…
Lav
3
votes
3 answers

Port binding error in PySpark

I have been trying to get PySpark to work. I use the PyCharm IDE on a Windows 10 machine. For the setup I took these steps: installed PySpark; installed Java 8u211; downloaded and pasted the winutils.exe; declared SPARK_HOME, JAVA_HOME and HADOOP_HOME…
Moritz
  • 111
2
votes
1 answer

How can I access the Ganglia web interface with SSH tunneling to monitor my EMR job?

I've been using the standard Hadoop monitoring tools with: ssh -L 9100:localhost:9100 -L 9101:localhost:9101 -o ServerAliveInterval=10 -o StrictHostKeyChecking=no -i key.pem hadoop@ec2-blah-blah-.compute-1.amazonaws.com And then just using my…
kelorek
  • 210
2
votes
1 answer

JAVA_HOME not set properly

I am trying to configure Hadoop in Cygwin. I have set JAVA_HOME to /cygdrive/c/work/java/jdk1.6.0_30. If I echo $JAVA_HOME, it displays correctly. If I run the command bin/hadoop version, it gives the following error message: /bin/java :No such…
Rashmi
  • 31
2
votes
0 answers

How to avoid (skip) full datanodes during the write or replication process

I started a small Hadoop cluster for experimental purposes, on my own hardware, with three datanodes with 30GB disk space. Later I added two more nodes with 200GB and now my cluster has approximately 420GB. My replication factor is 2. Today my starting…