I'm fairly new to Hadoop HDFS and quite rusty with Java, and I need some help. I'm trying to read a file from HDFS and calculate the MD5 hash of that file. My general Hadoop setup is as follows:
private FSDataInputStream hdfsDIS;
private FileInputStream FinputStream;
private FileSystem hdfs;
private Configuration myConfig;

// in my initialization code:
myConfig = new Configuration();
myConfig.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
myConfig.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));

hdfs = FileSystem.get(new URI("hdfs://NodeName:54310"), myConfig);
hdfsDIS = hdfs.open(hdfsFilePath);
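hdfsFilePath is a Hadoop Path pointing at the file I want to hash. It comes from elsewhere in my code; for the sake of the question it is built roughly like this (the actual path is just a made-up example):

// hypothetical example; the real path is supplied elsewhere in my program
Path hdfsFilePath = new Path("/user/someuser/somefile.txt");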
The call hdfs.open(hdfsFilePath) returns an FSDataInputStream.
The problem is that I can only get an FSDataInputStream out of HDFS, but I'd like a FileInputStream. Since FSDataInputStream extends java.io.DataInputStream rather than FileInputStream, I can't simply cast one to the other.
The code below performs the hashing and is adapted from something I found on Stack Overflow (I can't seem to find the link to it now).
FileInputStream FinputStream = hdfsDIS;   // <--- This is where the problem is
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");
    FileChannel channel = FinputStream.getChannel();
    ByteBuffer buff = ByteBuffer.allocate(2048);
    while (channel.read(buff) != -1) {
        buff.flip();
        md.update(buff);
        buff.clear();
    }
    byte[] hashValue = md.digest();
    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e) {
    return null;
}
catch (IOException e) {
    return null;
}
The reason I need a FileInputStream is that the hashing code uses a FileChannel, which supposedly makes reading the data from the file more efficient.
Could someone show me how I could convert the FSDataInputStream into a FileInputStream?
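One idea I had, sketched below, is to wrap the HDFS stream with java.nio.channels.Channels so that I still get a channel-style read loop, but I don't know whether that keeps whatever performance benefit a real FileChannel provides, or whether it's the right approach at all:

// wrap the FSDataInputStream in a generic ReadableByteChannel instead of a FileChannel
ReadableByteChannel channel = Channels.newChannel(hdfsDIS);
ByteBuffer buff = ByteBuffer.allocate(2048);
while (channel.read(buff) != -1) {
    buff.flip();
    md.update(buff);
    buff.clear();
}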