Installing MongoDB on Ubuntu 14.04

MongoDB is one of the leading NoSQL databases. Given our use of Cloudera, MongoDB is the logical choice for experimenting with NoSQL. To start, we need to import the key:

root@mysql:/home/kev# sudo apt-key adv --keyserver hkp:// --recv 7F0CEB10

Now that we have the key, we need to add the repository to the apt source files on the server. The entry we need to add is:

deb dist 10gen

Next run apt-get update to pull in the new repo details and then apt-get install mongodb-org to install MongoDB.
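
The repository step can be scripted so that it is safe to re-run. The sketch below is a minimal example rather than the official procedure: add_apt_source is a hypothetical helper name, and both the repository line and the target file are passed in as arguments instead of being filled in here.

```shell
# add_apt_source LINE FILE
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line, so re-running the script adds no duplicates.
add_apt_source() {
    line=$1
    file=$2
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Called as root with the deb line from above and a file under /etc/apt/sources.list.d/ (mongodb.list would be an assumed name), it would be followed by the apt-get update and apt-get install mongodb-org steps already described.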


Custom Flume Interceptor

I needed to write a custom Flume interceptor. After much googling I came across a guide which, as a non-Java programmer, was ideal. Although I followed the instructions to the letter, it would not compile. The errors were:

/home/kwincott/jars/tweaker/src/main/java/com/example/flume/interceptors /[18,3] annotations are not supported in -source 1.3
(use -source 5 or higher to enable annotations)

/home/kwincott/jars/tweaker/src/main/java/com/example/flume/interceptors/[24,7] generics are not supported in -source 1.3
(use -source 5 or higher to enable generics)
Map<String, String> headers = event.getHeaders();

/home/kwincott/jars/tweaker/src/main/java/com/example/flume/interceptors/[46,20] for-each loops are not supported in -source 1.3
(use -source 5 or higher to enable for-each loops)
for (Event event:events) {

Adding lines to pom.xml that raise the compiler source level above 1.3 solved the issue.
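
The usual way to raise the source level in a Maven build is a maven-compiler-plugin entry inside the <build><plugins> section of pom.xml. The sketch below prints such an entry. compiler_plugin_fragment is a hypothetical helper name, and the default level of 1.6 is an assumption, not necessarily the value used in the original fix.

```shell
# compiler_plugin_fragment [LEVEL]
# Print a maven-compiler-plugin entry that sets both the source and the
# target Java level. LEVEL defaults to 1.6, which is an assumed value.
compiler_plugin_fragment() {
    level=${1:-1.6}
    cat <<EOF
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>${level}</source>
    <target>${level}</target>
  </configuration>
</plugin>
EOF
}
```

Running compiler_plugin_fragment 1.6 prints the entry, which can then be pasted between <plugins> and </plugins>.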




Sqoop Error – Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

While importing data from a MySQL table I came across this error:


This was using Cloudera 5.0.2. The Sqoop command ran fine on the command line but failed when Oozie was used to schedule it. As with most Hadoop logs, there was little to suggest what the actual error was. To give myself a chance of finding the error, I enabled verbose logging in the Sqoop command (the --verbose option).

This revealed that Oozie was unable to find the mysql-connector.jar and the derby.jar (for interfacing with Hive). Once I added these to the workflow directory in HDFS, the job completed successfully.



Sqoop export failing in CDH5

When doing an export using Sqoop I got this error:

root@c2nn1:/mnt/usb/tweets# sqoop
Warning: /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
/opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/bin/../lib/sqoop/bin/sqoop: line 101: /usr/lib/hadoop/bin/hadoop: No such file or director

This can be resolved by running

rm -rf /usr/lib/hadoop

This is safe because there are no files in that directory.
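
Since rm -rf does not ask questions, it is worth confirming first that the directory really holds no files. A minimal sketch of that check follows; remove_if_no_files is a hypothetical helper name, and it treats anything that is not a directory (regular files, symlinks) as a reason to stop.

```shell
# remove_if_no_files DIR
# Hypothetical helper: delete DIR (and any empty subdirectories) only when
# it contains nothing other than directories. Otherwise leave it untouched.
remove_if_no_files() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir is not a directory" >&2
        return 1
    fi
    # Look for the first entry under DIR that is not a directory.
    if [ -n "$(find "$dir" ! -type d -print | head -n 1)" ]; then
        echo "$dir still contains files, leaving it alone" >&2
        return 1
    fi
    rm -rf -- "$dir"
}
```

In this case the call would be remove_if_no_files /usr/lib/hadoop, which does the same as the rm -rf above but refuses to run if anything is actually stored there.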


Ambari hangs when installing Hadoop

While installing Hadoop using the automated install method offered by Ambari, the install would repeatedly reach 4% and then fail. Investigation in the logs showed that this was down to Puppet timing out. Further investigation revealed that this was because the server was using a DHCP address. Once this was changed to a static IP, the install continued as normal.


EE won't replace faulty iPhone 5

After many pointless emails to EE and their Executive Office they have finally issued their final word on the matter of replacing a faulty iPhone 5 battery:

As EE like to ignore the law I sent this to their CEO, Olaf Swantee:

Having searched online, it is clear that we are not the only people fighting with EE to replace or repair a faulty iPhone 5. To date we have received nothing from EE apart from the email above and continue to have a faulty iPhone, with no reply from Olaf or the “Executive Office”.


MapReduce and Flume's .tmp files

While using one of the many online guides for streaming tweets into Hadoop, it became apparent that as the volume of tweets increased, MapReduce would fail to run any queries against the data. Typical errors would be:

Caused by: org.apache.hadoop.ipc.RemoteException( File does not exist: /user/flume/tweets/FlumeData.1395159404667.tmp

This is because Flume renames the .tmp files to their final file names while MapReduce is still holding them for processing. The way to combat this is to get MapReduce to ignore .tmp files, which requires creating a jar file and making some amendments to hive-site.xml. Not being a Java programmer, it took me a while to understand the process, and most guides relied on using Eclipse; being an Ubuntu user I prefer NetBeans. Here is the precompiled jar file that I created based on the Jira issue for this.
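
The rule the filter applies is small: accept every path except those ending in .tmp. The real filter has to be a Java class that Hive can load, so the shell function below only illustrates the rule and is not the code inside the jar. accept_path is a hypothetical name, and the second file name in the example is made up.

```shell
# accept_path PATH
# Succeeds (exit 0) for paths a job should read and fails (exit 1) for
# Flume's in-progress files, identified here only by the .tmp suffix.
accept_path() {
    case $1 in
        *.tmp) return 1 ;;
        *)     return 0 ;;
    esac
}

# Example: keep only the completed file from two names.
for f in FlumeData.1395159404667.tmp FlumeData.1395159404668; do
    if accept_path "$f"; then
        echo "$f"   # prints only FlumeData.1395159404668
    fi
done
```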

The amendments to hive-site.xml are:

hive.aux.jars.path = file:///usr/lib/hadoop/twitterutil-1.0-SNAPSHOT.jar
mapred.input.pathFilter.class = com.twitter.util.FileFilterExcludeTmpFiles

These either need to be entered into Ambari or added (in XML property format!) to hive-site.xml.
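
In hive-site.xml each setting is a <property> element containing a <name> and a <value>. The sketch below prints the two settings above in that format. hadoop_property is a hypothetical helper name; the names and values are the ones already listed.

```shell
# hadoop_property NAME VALUE
# Print one setting in the <property> format used by Hadoop and Hive
# *-site.xml files. Values are emitted as-is, with no XML escaping.
hadoop_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hadoop_property hive.aux.jars.path file:///usr/lib/hadoop/twitterutil-1.0-SNAPSHOT.jar
hadoop_property mapred.input.pathFilter.class com.twitter.util.FileFilterExcludeTmpFiles
```

The output belongs inside the <configuration> element of hive-site.xml.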

One caveat I noticed was that the jar file needed to be under /usr/lib/hadoop rather than under /usr/lib/hadoop/lib like most of the other jar files.


Load CSV in Hue

When creating a table in Hive based on a large CSV file, the following error occurs:

IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[,], original=[,]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via ‘dfs.client.block.write.replace-datanode-on-failure.policy’ in its configuration.

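This is caused by the replication factor in hdfs-site.xml being too high for the cluster. The default block replication (dfs.replication) is 3; dropping this to 2 (even with a 1-node cluster) resolved the issue. As far as I can tell, with a replication factor below 3 the DEFAULT policy does not try to add a replacement datanode, which would explain why 2 works even on a single node.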


Observium on FreeBSD

This is my guide to installing Observium on FreeBSD. I am using 9.1 patched with all the latest updates.

For some reason Observium would only run from /opt, so these are the commands I used to get a base install:

cd /usr/ports/devel/subversion
make clean install
mkdir /opt
cd /opt
svn co observium
cd /opt/observium/
mv config.php.default config.php
mkdir rrd

At this point you need to log in to the MySQL server and create a database and user for Observium. Once you have this, edit config.php with these details, then run:

php includes/update/update.php

This appears to install the database. At this point I got a few errors about paths to things such as RRD and fping, so you need to edit includes/ to set them correctly. Once that's all done you're ready to add the first user; the command is php ./adduser.php username password 10.

I'm running this on a server with other sites, so I created a virtual host config and browsed to the URL, and all appeared OK. Upon trying to use it I was informed that I needed mcrypt from php5-extensions and some Python stuff. The extensions are difficult to install, but to install the Python module the command was cd /usr/ports/*/py-MySQLdb && make install clean


harfbuzz failing on FreeBSD

While trying to install Cacti on a production FreeBSD 9.1 server, the harfbuzz build failed with:

/usr/local/lib/ undefined reference to `_XEatDataWords'
gmake[2]: *** [hb-view] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[2]: Leaving directory `/usr/ports/print/harfbuzz/work/harfbuzz-0.9.19/util'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/usr/ports/print/harfbuzz/work/harfbuzz-0.9.19'
gmake: *** [all] Error 2
===> Compilation failed unexpectedly.
Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
the maintainer.
*** [do-build] Error code 1

Stop in /usr/ports/print/harfbuzz

It turns out that you need to reinstall libXrender from ports.