## FUSE-DFS

To let existing software access the Hadoop Distributed File System (HDFS) without modification, I compiled and installed FUSE-DFS on my cluster. FUSE-DFS uses FUSE (Filesystem in Userspace) to mount HDFS as a local filesystem. Software can then access the contents of HDFS in the same way that it accesses files on the local filesystem.

Since I am using the standard version of Hadoop (from hadoop.apache.org), rather than a distribution from Cloudera or another company, I had to compile and configure the filesystem myself. I ran into several issues along the way, so I thought that I should share my solution to some of the more difficult problems.

I began by reading a wiki page about Mountable HDFS. I had already downloaded the source for Hadoop 2.4.1, so I tried to compile the version of fuse_dfs included with that download. When I followed the directions for compiling fuse_dfs, I found that the directory structure in the instructions differed from the directory structure of the source tarball that I had downloaded. After spending some time trying to adapt the instructions to my source tree, I decided to compile the code manually. With more knowledge of cmake I probably could have used it for the build, but I don’t know very much about cmake yet.
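A manual build like this links fuse_dfs against pre-built libraries such as libhdfs.so, and the linker rejects a library whose word size differs from the output it is producing. The `file` utility is the usual way to inspect a library. As a minimal sketch that does not depend on it, the function below reads the ELF header directly. The name `elf_class` is hypothetical and is not part of Hadoop or FUSE.

```shell
# elf_class: hypothetical helper, not part of Hadoop or FUSE.
# Prints "32-bit" or "64-bit" for an ELF file, based on the EI_CLASS byte
# (offset 4 in the ELF header: 1 = 32-bit, 2 = 64-bit).
elf_class() {
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Read the single byte at offset 4 as an unsigned decimal number.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, `elf_class /usr/local/hadoop/lib/native/libhdfs.so` would report the class of the library at the path used in this post.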

$ gcc ../fuse-dfs/*.c -o fuse_dfs -D_FILE_OFFSET_BITS=64 -I .. -I ../libhdfs/ \
    -L /usr/local/hadoop/lib/native/ \
    -Wl,-rpath=/usr/local/hadoop/lib/native/:/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/ \
    -lhdfs -lfuse -lpthread -lc -lm

where /usr/local/hadoop/lib/native/ is the location of libhdfs.so and /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/ is the location of libjvm.so. You may also need to make a link to Hadoop’s “config.h” in the fuse-dfs directory, or otherwise make sure the preprocessor can find config.h, for example with an additional -I option.

When I first attempted this, the version of libhdfs.so installed on my system was apparently a 32-bit build, so it could not be linked with fuse_dfs. I compiled libhdfs.so manually as well:

$ gcc -c -fPIC ../libhdfs/*.c -I /usr/lib/jvm/java-7-oracle/include/ \
    -I /usr/lib/jvm/java-7-oracle/include/linux/ \
    -I ../../../../../../hadoop-common-project/hadoop-common/target/native/

where the final include path specifies the location of config.h. I then linked it…

$ gcc -shared -fPIC -o libhdfs.so exception.o expect.o hdfs.o jni_helper.o \
    -L /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/ -ljvm \
    -Wl,-rpath=/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/

Once this was all finished, I installed fuse_dfs and fuse_dfs_wrapper.sh in /usr/local/hadoop/bin/ where all of the other hadoop-related executables are located. Upon trying to mount my HDFS, I encountered errors telling me that certain .jar files could not be found and that CLASSPATH was not defined. The command

$ hadoop classpath

prints the relevant CLASSPATH, but what fuse_dfs actually needs is an explicit listing of all of the .jar files, not just the list of directories. The JVM that libhdfs starts does not expand the wildcard, *, so entries such as /some/dir/* are not resolved to the jars they contain. To build the list of .jar files, I put together a command with awk, sed, ls, and sh and then set the CLASSPATH environment variable to the result of that command. This can probably be done with a shorter command, but this works:

export CLASSPATH=$(hadoop classpath | sed s/:/'\n'/g | awk '/\*$/ {print "ls", $0 ".jar"}' | sh | sed ':a;N;$!ba;s/\n/:/g')
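The same expansion can be written as a small POSIX shell function, which may be easier to read and adapt than the pipeline. This is a minimal sketch, not the command used on my cluster. The name `expand_wildcard_jars` is hypothetical, and like the pipeline above it only emits jars for entries that end in `/*`.

```shell
# expand_wildcard_jars: hypothetical helper name, not part of Hadoop.
# Prints a colon-separated list of the .jar files matched by every
# classpath entry that ends in "/*". Entries without a wildcard are
# skipped, matching the behaviour of the pipeline above.
expand_wildcard_jars() (
    # The body runs in a subshell, so the IFS and "set -f" changes stay local.
    IFS=:
    set -f          # keep "*" literal while splitting on ":"
    set -- $1       # each classpath entry becomes one positional parameter
    set +f          # re-enable globbing for the .jar lookup
    result=
    for entry in "$@"; do
        case $entry in
            */\*)
                dir=${entry%/\*}            # strip the trailing "/*"
                for jar in "$dir"/*.jar; do
                    # An unmatched glob stays literal, so check the file exists.
                    if [ -e "$jar" ]; then
                        result=${result:+$result:}$jar
                    fi
                done
                ;;
        esac
    done
    printf '%s\n' "$result"
)
```

It would be used in the same way as the pipeline: `export CLASSPATH=$(expand_wildcard_jars "$(hadoop classpath)")`.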

This command drops one path, because the awk filter only passes entries that end in *: the path to Hadoop’s configuration .xml files, which is /usr/local/hadoop/etc/hadoop/ in my case. So I add this directory as follows:

export CLASSPATH=/usr/local/hadoop/etc/hadoop/:$CLASSPATH

This CLASSPATH definition is inserted into my .bashrc file on all of the nodes. At this point, I was still unable to mount the drive because I did not have the proper privileges, so I added myself to the fuse group:

$ sudo adduser $USER fuse

Then, I had to uncomment the following line in /etc/fuse.conf:

user_allow_other

Finally, I was able to mount the filesystem:

$ fuse_dfs_wrapper.sh -d dfs://foam:8020 dfsmount/

Here, “foam” is the hostname of the NameNode and dfsmount is the mountpoint.
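One way to confirm from a shell that the mount is live is to look the mountpoint up in the kernel's mount table. The function below is a minimal sketch under a few assumptions: it reads a table in /proc/mounts format, it treats any filesystem type beginning with "fuse" as a FUSE mount, and it expects an absolute mountpoint without spaces. The name `is_fuse_mount` is hypothetical.

```shell
# is_fuse_mount: hypothetical helper, not part of fuse_dfs.
# Succeeds if the given absolute mountpoint is listed in the mount table
# with a filesystem type that starts with "fuse".
# $1: absolute mountpoint
# $2: mount table to read (optional, defaults to /proc/mounts)
is_fuse_mount() {
    # /proc/mounts fields: device, mountpoint, filesystem type, options, ...
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^fuse/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

With the mount from this post, a check could look like `is_fuse_mount "$PWD/dfsmount" && ls dfsmount/`, which lists the HDFS root only if the mount is present.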

### 15 Responses to “FUSE-DFS”

5. Kane Says:

Hey, thanks for the nice post. Any idea where Hadoop's config.h is located?
