Hardware & Software

Hardware

The hardware is continually extended and adapted to the needs of current projects. While the Little Big Data Cluster aims to provide a stable working environment for researchers and teachers at TU Wien, the Development Cluster was set up to meet the latest hardware and software testing requirements. New hardware and software are carried over to the production environment only once it can be guaranteed that users will not be affected.

dataLAB offers straightforward access to the expertise and infrastructure of TU Wien.

The cluster runs on the Hadoop-based Cloudera platform and hosts applications such as Apache Spark, Hive, Cassandra, MongoDB and Kafka. The dataLAB team adapts this offering flexibly to users' requirements in order to support their projects as well as possible.

For teaching and research (Little Big Data Cluster): 1 NameNode and 18 DataNodes, each with:

  • 2x Xeon E5-2650 v4
  • 24 Cores
  • 256 GB Main Memory
  • 16 TB HDD
  • 10 Gbit/s

Additionally, 300 TB of NFS storage.

Scheme of the LBD cluster
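The per-node figures above add up to the cluster's aggregate resources. The following Python sketch does that arithmetic for the 18 DataNodes. It assumes that the full 16 TB of HDD per node is available to HDFS and that HDFS uses its default replication factor of 3; neither is stated on this page. The NameNode and the additional NFS storage are not included.

```python
# Aggregate resources of the Little Big Data (LBD) cluster DataNodes,
# using the per-node figures listed above.
DATA_NODES = 18
CORES_PER_NODE = 24
RAM_GB_PER_NODE = 256
HDD_TB_PER_NODE = 16

# Assumption: HDFS default replication factor (dfs.replication = 3);
# the actual cluster setting is not stated in this document.
REPLICATION = 3


def cluster_totals(nodes, cores, ram_gb, hdd_tb, replication):
    """Return total cores, RAM (GB), raw disk (TB) and approximate usable HDFS capacity (TB)."""
    raw_tb = nodes * hdd_tb
    return {
        "cores": nodes * cores,
        "ram_gb": nodes * ram_gb,
        "raw_hdd_tb": raw_tb,
        # Every HDFS block is stored `replication` times, so usable space is
        # roughly raw / replication (ignoring non-HDFS disk usage and reserved space).
        "usable_hdfs_tb": raw_tb / replication,
    }


totals = cluster_totals(DATA_NODES, CORES_PER_NODE, RAM_GB_PER_NODE,
                        HDD_TB_PER_NODE, REPLICATION)
print(totals)
```

Under these assumptions the DataNodes provide 432 cores, 4,608 GB of main memory and 288 TB of raw disk, or roughly 96 TB of replicated HDFS capacity.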

For testing the latest technologies (Development Cluster): 1 NameNode and 6 DataNodes, each with:

  • 2x Xeon X5550 @ 2.67 GHz
  • 8 Cores
  • 24 GB Main Memory
  • 1 Gbit/s

Scheme of the development cluster
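When Spark runs on YARN, these node specifications determine how many executors fit on a worker. The sketch below applies a common rule of thumb, which is an assumption here and not a documented dataLAB setting: reserve one core and 1 GB of memory per node for the operating system and Hadoop daemons, use about five cores per executor, and leave room for Spark 2's default executor memory overhead of max(384 MB, 10 % of executor memory). The function `plan_executors` and its parameters are hypothetical, and the sketch ignores YARN's container rounding and any NodeManager memory limits configured on the clusters.

```python
from dataclasses import dataclass

OVERHEAD_MIN_MB = 384  # Spark 2 minimum executor memory overhead on YARN


@dataclass
class ExecutorPlan:
    executors_per_node: int
    cores_per_executor: int
    heap_mb: int  # candidate value for spark.executor.memory


def plan_executors(node_cores, node_ram_gb, cores_per_executor=5,
                   reserved_cores=1, reserved_ram_gb=1):
    """Hypothetical rule-of-thumb executor sizing for one worker node."""
    usable_cores = node_cores - reserved_cores
    cores_per_executor = min(cores_per_executor, usable_cores)
    executors = usable_cores // cores_per_executor
    # Memory available to each executor container (heap + overhead), in MB.
    container_mb = (node_ram_gb - reserved_ram_gb) * 1024 // executors
    # Largest heap M with M + max(384, 0.10 * M) <= container_mb:
    # either container - 384 or container / 1.10, whichever is smaller.
    heap_mb = min(container_mb - OVERHEAD_MIN_MB, container_mb * 10 // 11)
    return ExecutorPlan(executors, cores_per_executor, heap_mb)


lbd_node = plan_executors(node_cores=24, node_ram_gb=256)
dev_node = plan_executors(node_cores=8, node_ram_gb=24)
print("LBD node:", lbd_node)
print("Dev node:", dev_node)
```

On an LBD node this yields 4 executors with 5 cores each (3 of the 23 usable cores stay idle), while a development node fits only a single 5-core executor; lowering `cores_per_executor` trades per-executor HDFS throughput for better core utilization.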

Software

  • CentOS 7: Operating system. Status: OK
  • xCAT: Deployment environment. Status: OK
  • Cloudera Manager: Big data deployment and management. Status: OK
  • Cloudera HDFS: Hadoop Distributed File System. Status: OK
  • Cloudera Accumulo: Key/value store. Status: OK
  • Cloudera HBase: Database on top of HDFS. Status: OK
  • Cloudera Hive: Data warehouse queried with SQL. Status: OK
  • Cloudera Hue: Hadoop User Experience, a web GUI and SQL analytics workbench. Status: OK
  • Cloudera Impala: SQL query engine, used by Hue. Status: OK
  • Oozie: Workflow scheduler system for managing Apache Hadoop jobs, used by Hue. Status: OK
  • Cloudera Solr: Open-source enterprise search platform, used by Hue and by the Key-Value Store Indexer. Status: OK
  • Cloudera Key-Value Store Indexer: Uses the Lily HBase NRT Indexer to index the stream of records being added to HBase tables, so that data stored in HBase can be queried with the Solr service. Status: OK
  • Cloudera Spark (Spark 2): Cluster-computing framework with Scala 2.10 (Spark 2: Scala 2.11). Status: OK
  • Cloudera YARN (MR2 included): Yet Another Resource Negotiator, for cluster resource management. Status: OK
  • Cloudera ZooKeeper: Centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. Status: OK
  • Java 1.8: Software development kit. Status: OK
  • Python 2.7, 3.*: Software development kit. Status: OK
  • Scala: Programming language. Status: OK
  • Anaconda Python: Python with package management from Anaconda. Status: OK
  • Jupyter Notebook: Web interface for interactive computing; requires Anaconda. Status: OK
  • JupyterLab: Next-generation web-based user interface for Project Jupyter. Status: OK
  • Cassandra: Distributed NoSQL database; requires disk space, selected nodes only. Status: TODO
  • Kafka: Open-source stream-processing software platform; requires configuration for each specific use case. Status: rollout phase
  • MongoDB: NoSQL database; selected nodes only. Status: rollout phase
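HDFS directories can also be inspected over HTTP through the WebHDFS REST API (`/webhdfs/v1/<path>?op=LISTSTATUS`). The following sketch builds such a request and parses the JSON listing using only the Python standard library. The NameNode host name is a placeholder, the default port assumes a Hadoop 2 based CDH release (50070; Hadoop 3 uses 9870), and the `user.name` parameter only works with simple authentication. Whether WebHDFS is enabled on the dataLAB clusters, and whether they use Kerberos, is not stated here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(namenode, hdfs_path, port=50070, user=None):
    """Build a WebHDFS LISTSTATUS URL for an HDFS directory."""
    params = {"op": "LISTSTATUS"}
    if user:
        params["user.name"] = user  # simple (pseudo) authentication only
    return f"http://{namenode}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


def parse_liststatus(body):
    """Return (name, type, length) tuples from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


def list_hdfs_dir(namenode, hdfs_path, port=50070, user=None):
    """Fetch and parse a directory listing (needs network access to the NameNode)."""
    with urlopen(liststatus_url(namenode, hdfs_path, port, user)) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))


# Offline demo with a response shaped like the WebHDFS documentation example.
sample = ('{"FileStatuses": {"FileStatus": ['
          '{"pathSuffix": "a.patch", "type": "FILE", "length": 24930},'
          '{"pathSuffix": "bar", "type": "DIRECTORY", "length": 0}]}}')
print(liststatus_url("namenode.example.org", "/user/alice", user="alice"))
print(parse_liststatus(sample))
```

`list_hdfs_dir` performs the actual HTTP request; the demo calls only the offline helpers so that it runs without cluster access.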


Service Center


Ticket system: Online Portal
Hotline 01 588 01 42002

help@it.tuwien.ac.at
1040 Wien, Operngasse 11, EG

The Service Center can be reached digitally from 8 a.m. to 4 p.m., Monday to Friday, and is also available in person from 8 a.m. to 12 p.m. on weekdays.

Safety and Security Measures