Hadoop Ecosystem Contributors (Q3 2018)

Data as of 2018-09-28, based on the count of committers or PMC members.

Hadoop contributions 2018-09-28 06-58-54


WebHDFS on Hadoop 3 with Java 9

Unable to browse WebHDFS on Hadoop 3 running on JDK 9.

On browsing the file system:

Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error

In the NameNode logs:

java.lang.NoClassDefFoundError: javax/activation/DataSource

On the NameNode, in hadoop-env.sh, add:

# Append to any existing HADOOP_OPTS instead of overwriting them
export HADOOP_OPTS="$HADOOP_OPTS --add-modules java.activation"
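To check the fix after restarting the NameNode, the same LISTSTATUS request that the browser UI makes can be issued directly. Below is a minimal Python sketch. The host `localhost`, the port 9870 (the Hadoop 3 default for the NameNode web UI, which may differ on your cluster) and the function names are my assumptions, not part of the original setup.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(host, port=9870, path="/"):
    """Build the WebHDFS LISTSTATUS URL for an HDFS path."""
    return "http://%s:%d/webhdfs/v1%s?op=LISTSTATUS" % (host, port, quote(path))


def entry_names(response_text):
    """Return (pathSuffix, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_root(host="localhost"):
    """Query the NameNode; urlopen raises HTTPError if it still answers 500."""
    with urlopen(liststatus_url(host)) as resp:
        return entry_names(resp.read().decode("utf-8"))
```

If `list_root()` returns a list instead of raising an HTTP 500 error, the activation module is being picked up.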


An alternative to Slack on Pi 3 (Rocket.Chat)

Rocket.Chat is an open-source alternative to Slack, and I have been thinking about using it at home. The Ubuntu install was painless, but it didn’t make sense to keep a box up all the time, at least at home. So the best home deployment platform was an SBC (Pi 3).

Installation on the Pi was ‘mostly’ easy (followed official link) except for the persistence layer, MongoDB. Rocket.Chat required a newer version of MongoDB, and the GitHub page offered a cloud-based alternative for MongoDB hosting. For me, having the whole solution on-premise (actually, the whole solution on the Pi) was a major selling point, so I avoided that route. After a few searches, I located a post by Andy Felong, who had already compiled MongoDB 3.0.9 binaries for the ARM architecture (Pi 2 and 3, Jessie). Although his steps for deploying MongoDB from those binaries were straightforward, I still scripted them in case I had to reinstall (link to script on github).

After MongoDB was installed, the Rocket.Chat server started smoothly. My beta testers (my kids) were ready to jump on it as soon as I created their user accounts. So much so that at one point, when a file upload was a bit slow, I saw my son clicking the button multiple times, and suddenly all of them got a yellow header in the chat interface due to server unavailability. I went back to the terminal and there was a stack trace waiting. This was the right time to graduate from the interactive start to a supervisor-based run.

The more interesting part was integration via webhooks. Rocket.Chat supports writing webhook wrappers in JavaScript; for example, it was very easy to pull live images from a webcam.
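For the sending side, here is a rough Python sketch of posting a webcam snapshot to an incoming webhook. It is not the JavaScript wrapper I used. The webhook URL, the image URL and the function names are placeholders, and the `text` / `attachments` / `image_url` fields assume the Slack-style JSON payload that Rocket.Chat incoming webhooks accept.

```python
import json
from urllib.request import Request, urlopen


def webcam_message(image_url, caption="Live webcam"):
    """Build a Slack-style payload with one image attachment (assumed format)."""
    return {
        "text": caption,
        "attachments": [{"title": caption, "image_url": image_url}],
    }


def build_request(webhook_url, payload):
    """Wrap the payload in a JSON POST request for the incoming webhook."""
    data = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(webhook_url, data=data, headers=headers, method="POST")


def send(webhook_url, payload):
    """Send the message and return the HTTP status code."""
    with urlopen(build_request(webhook_url, payload)) as resp:
        return resp.status
```

Something like `send(hook_url, webcam_message(snapshot_url))` from a cron job would cover the "how does it look right now" use case.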

Here is the launch.sh:

export PORT=3000 
export ROOT_URL= 
export MONGO_URL=mongodb:// 
export ADMIN_USER=admin
export ADMIN_PASS=*****
export ADMIN_EMAIL=admin@somedomain
/home/pi/meteor/dev_bundle/bin/node /home/pi/rocketchat/bundle/main.js

And the supervisor’s run-as-svc.sh:

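echo "Note: Start this script as sudo"
svcconf=/etc/supervisor/conf.d/rocketchat.conf
if [ -f "$svcconf" ]; then
  echo "supervisor config already exists"
else
  echo "creating supervisor config - first time only"
  # Config body is a minimal assumed example; the launch.sh path is a guess
  cat > "$svcconf" <<EOF
[program:rocketchat]
command=/bin/bash /home/pi/rocketchat/launch.sh
autostart=true
autorestart=true
EOF
fi
echo "Starting process"
supervisorctl reread
supervisorctl update
supervisorctl restart rocketchat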

Finally, the only disappointment was the Android Cordova-based app. It kept showing media-less pages (no images or CSS loaded) in the webview. I think it could be due to some baked-in expectation of SSL within the app.

Overall, a fun little project with great potential through webhooks. Some ideas for webhooks: web cluster/pool member status, build status notifications, current server load, or what downtown Chicago looks like at this exact minute.

OAuth2 via curl – Google Calendar API

Some time and technology iterations have passed since the previous post on a similar topic (Google Contact Data API via curl). Still, curl is a fun way to play with APIs; this time it is the Google Calendar API, used to download calendar events.

Below is the general flow for playing with Google APIs, from the manual project setup in the API console until the app (a bash script) gets data. “Almost” all the work was done in bash (JSON parsing was the exception).

This bash script became the data provider part for another Raspberry Pi endeavor: “Announce upcoming calendar events using TTS.”


For my own reference, I also documented the steps in FreeMind.

Screenshot from 2016-12-28 03-59-57.png



For the first phase (the setup), the steps were simple on the API console website.


The second phase was to obtain an authorization code with the help of the user.

Screenshot from 2016-12-28 04-03-46.png

The next step was to obtain an Access Token. This was done via curl. Bash is cool, but it can sometimes become an obsession to do everything in bash. After a break, it seemed more logical to do the JSON parsing in Python while keeping the main control in bash. The lines were still compact.

Screenshot from 2016-12-28 04-06-06.png

Once the Access Token was obtained, it was time to get the data. This was the easy part.

Screenshot from 2016-12-28 04-08-42.png
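As a rough illustration of the data step (not the script in the screenshot, which used curl), here is how the events request and the JSON parsing could look in Python. The function names and the choice of the `primary` calendar are my assumptions. The endpoint and the `singleEvents`, `orderBy` and `timeMin` parameters are from the Calendar API v3 events list call.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def events_request(access_token, calendar_id="primary", time_min=None):
    """Build an authorized GET request for a calendar's upcoming events."""
    params = {"singleEvents": "true", "orderBy": "startTime"}
    if time_min:
        params["timeMin"] = time_min  # RFC 3339 timestamp, e.g. 2016-12-28T00:00:00Z
    url = "https://www.googleapis.com/calendar/v3/calendars/%s/events?%s" % (
        quote(calendar_id, safe=""), urlencode(params))
    return Request(url, headers={"Authorization": "Bearer " + access_token})


def summarize(response_text):
    """Return (start, summary) pairs from an events list JSON response."""
    items = json.loads(response_text).get("items", [])
    result = []
    for event in items:
        start = event.get("start", {})
        # Timed events carry dateTime; all-day events carry date
        when = start.get("dateTime") or start.get("date")
        result.append((when, event.get("summary", "(no title)")))
    return result
```

The `(start, summary)` pairs are all a TTS announcer needs.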

Finally, once the Access Token expires, the Refresh Token is used to get a new one. This is very similar to the Access Token step, but instead of the authorization code, the refresh token is sent.
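To make the difference between the two token requests concrete, here is a small Python sketch of the form bodies and of the JSON parsing that was done outside bash. The field names follow the standard OAuth 2.0 token request and response; the function names and any client ID, secret or redirect URI passed in are placeholders, not values from my project.

```python
import json
from urllib.parse import urlencode


def auth_code_form(client_id, client_secret, code, redirect_uri):
    """Form body that exchanges an authorization code for tokens."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    })


def refresh_form(client_id, client_secret, refresh_token):
    """Form body that exchanges a refresh token for a new access token."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


def parse_token_response(text):
    """Return (access_token, refresh_token, expires_in) or raise on an error reply."""
    body = json.loads(text)
    if "error" in body:
        raise RuntimeError("token request failed: %s" % body["error"])
    return body["access_token"], body.get("refresh_token"), body.get("expires_in")
```

Either form body can be handed to `curl -d` against the token endpoint. A refresh reply usually has no new refresh token, which is why that field is optional here.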

Screenshot from 2016-12-28 04-09-53.png

To do: Upload the complete bash script to GitHub.

ImageMagick with PowerShell


  • Lots of burst images, e.g. consecutive screenshots, time-lapse images, etc.
  • Need to keep only the images with significant changes, i.e. de-dup the nearly identical ones.


ImageMagick is great open-source software. The idea is to use its ‘compare’ tool to compare consecutive images and get a numeric value that can be checked against a threshold. If the value exceeds the threshold, the change is significant. Simple!

Having done it on Linux (Raspberry Pi), the diff-ing part was easy and was tested with the parameters Fuzziness and Quiet. (reference)

The unexpectedly challenging part was the PowerShell syntax to read the ImageMagick output! The simple task of loading the numeric result from stdout (or stderr, in the case of ImageMagick) into a variable turned out to be a big challenge. It appeared as if MS had created hacks to make quoted arguments work. After hours spent with Invoke-Command, Invoke-Expression, iex, OutVariable, &, etc., the most that was accomplished was a console error: after running the actual command, PowerShell reported the numeric output written to stderr (262 below) as a NativeCommandError.

C:\Progra~1\ImageMagick-7.0.4-Q16\magick : 262
At line:1 char:1
+ C:\Progra~1\ImageMagick-7.0.4-Q16\magick compare  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 + CategoryInfo : NotSpecified: (262:String) [], RemoteException
 + FullyQualifiedErrorId : NativeCommandError

The fix was in two parts.

One was redirecting stderr (2) to stdout (1) i.e. 2>&1.

$fullcmdline = '<path to image magick>\magick <compare args> <path to imageN> <path to image N+1> tmp.jpg 2>&1'

The other was iex syntax:

$result = iex (${fullcmdline})

Once this was resolved, the rest was easy – delete files where $result did not exceed the threshold.
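For comparison, the same capture-stderr-then-threshold idea can be sketched in Python instead of PowerShell. This is an illustration under assumptions, not the script I ran: the `-metric AE` and `-fuzz 5%` arguments, the `null:` output target and the function names are my choices here, and each image is compared only with its immediate predecessor.

```python
import subprocess


def parse_metric(stderr_text):
    """Take the first number that 'compare' printed on stderr."""
    return float(stderr_text.split()[0])


def compare_metric(img_a, img_b, magick="magick", fuzz="5%"):
    """Run 'magick compare' and return the metric it writes to stderr."""
    proc = subprocess.run(
        [magick, "compare", "-metric", "AE", "-fuzz", fuzz, img_a, img_b, "null:"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    # compare exits non-zero when the images differ, so the return code is ignored
    return parse_metric(proc.stderr)


def files_to_delete(images, threshold, metric=compare_metric):
    """List images whose change from the previous image is at or below the threshold."""
    doomed = []
    for prev, cur in zip(images, images[1:]):
        if metric(prev, cur) <= threshold:
            doomed.append(cur)
    return doomed
```

Passing `metric` as a parameter keeps the selection logic testable without ImageMagick installed. The returned list can be reviewed before anything is actually removed.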