InfoSphere Streams is a high-performance computing platform for big data processing. It allows users to develop and reuse applications that rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. You can download the 90-day trial version here.
Here is the list of products I used in my installation:
- IBM InfoSphere Streams v3.0 (paid; a 90-day trial edition is available here)
- VMware Workstation for virtualization (paid; VirtualBox is a free alternative)
- Red Hat Enterprise Linux Server 5.5 (64-bit) (paid; a free evaluation version is available. You can check the system requirements at IBM’s infocenter site. You can also experiment with free Linux distributions, which can work even if IBM doesn’t officially support them)
My virtual machine with RHEL was configured as follows:
- Memory: 3GB (initially 2GB was enough)
- HDD: 20GB
- Processors: 2 (initially 1 was enough)
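If you want to confirm from inside the guest that it has roughly these resources, a small check along the following lines could help. This is only a sketch for Linux guests: the helper names `kb_to_mb` and `check_min` are made up for this example, and the 1800 MB threshold is my assumption (the kernel reports a little less memory than the 2GB configured for the VM).

```shell
#!/bin/sh
# Rough resource check for the VM described above (Linux guests only).
# MIN_MEM_MB is an assumption: 2GB was initially enough, and the kernel
# reports slightly less than the configured amount, so some slack is allowed.
MIN_MEM_MB=1800
MIN_CPUS=1

# kb_to_mb KB: convert kilobytes to whole megabytes (integer division).
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# check_min NAME VALUE MINIMUM: print OK or WARN for one resource.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK: $1=$2 (minimum $3)"
    else
        echo "WARN: $1=$2 is below $3"
    fi
}

# Total RAM in kB from /proc/meminfo; fall back to 0 if it cannot be read.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null) || mem_kb=0
# Count of logical processors visible to the guest.
cpus=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cpus=0

check_min memory_mb "$(kb_to_mb "${mem_kb:-0}")" "$MIN_MEM_MB"
check_min cpus "${cpus:-0}" "$MIN_CPUS"
```

A WARN line does not necessarily mean the installation will fail; it only tells you the guest is below the values that worked for me.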
Installation Process Video
I recorded all the steps I took during the installation. If you are not interested in setting up the VMware image, you can skip to 05:10.
Installation Process Briefly
- Extract the IBM InfoSphere Streams archive:
tar -zxvf Streams-3.0.0.0-x86_64-el6.tar.gz
- Add streamsadmin group
groupadd streamsadmin
- Add streamsadmin user
useradd -g streamsadmin streamsadmin
and set its password:
passwd streamsadmin
- Now we need to edit the sudoers file. To edit the file, type
visudo
and add this line to the user privilege specification section:
streamsadmin ALL=(ALL) NOPASSWD: ALL
The section could then look similar to this:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
streamsadmin ALL=(ALL) NOPASSWD: ALL
This gives the streamsadmin user sudo privileges without having to enter a password, which is necessary to run Streams.
- Switch user to streamsadmin
su - streamsadmin
- Go to
StreamsInstallFiles
and run the dependency checker:
./dependency_checker.sh
If it reports errors, you need to resolve them (you can use Scientific Linux repositories to get the appropriate packages). If the Java JDK is missing, you can install it from the RPM package that is part of the installation files. You can do this with the rpm command (you need to be logged in as the root user):
rpm -i StreamsInstallFiles/rpm/ibm-java-x86_64-sdk-6.0-11.0.0x86_64.rpm
Java is then installed in this directory:
/opt/ibm/java-x86_64-60
- When the dependency checker no longer reports any errors, you can start the installation. Make sure you are logged in as streamsadmin, then start the installation wizard by running this command (located in the StreamsInstallFiles folder):
./InfoSphereStreamsSetup.bin
- When the installation is done, you need to start the First Steps application and perform two required tasks: (1) configure a secure shell environment and (2) configure the InfoSphere Streams environment variables. For more information about post-installation tasks, check the infocenter.
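The first preparation steps above (extracting the archive, creating the group and the user) could also be scripted so that they are safe to run more than once. The following is a minimal sketch, not part of the official installer: the `DRY_RUN` variable and the helpers `run`, `ensure_group`, and `ensure_user` are names I made up for this example. By default it only prints the commands; set `DRY_RUN=0` and run it as root to actually execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command.
# Set DRY_RUN=0 and run as root to really execute them.
DRY_RUN=${DRY_RUN:-1}
STREAMS_USER=streamsadmin
STREAMS_GROUP=streamsadmin

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ensure_group NAME: create the group only if it does not exist yet.
ensure_group() {
    if getent group "$1" >/dev/null 2>&1; then
        echo "group $1 already exists"
    else
        run groupadd "$1"
    fi
}

# ensure_user NAME GROUP: create the user only if it does not exist yet.
ensure_user() {
    if id -u "$1" >/dev/null 2>&1; then
        echo "user $1 already exists"
    else
        run useradd -g "$2" "$1"
    fi
}

# Same archive name as in the extraction step above.
run tar -zxvf Streams-3.0.0.0-x86_64-el6.tar.gz
ensure_group "$STREAMS_GROUP"
ensure_user "$STREAMS_USER" "$STREAMS_GROUP"
```

Setting the password with passwd and editing sudoers with visudo are interactive, so I would still do those two steps by hand.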