Martin Toshev

Martin is an IT consultant and Java enthusiast who has been heavily involved in the activities of the Bulgarian Java User Group (BG JUG). His areas of interest include a wide range of Java-related technologies (such as Servlets, JSP, JAXB, JAXP, JMS, JMX, JAX-RS, JAX-WS, Hibernate, Spring Framework, Liferay Portal and Eclipse RCP), cloud computing technologies, cloud-based software architectures, enterprise application integration, and relational and NoSQL databases. You can reach him about any Java- or FOSS-related topic (especially Eclipse and the OpenJDK). Martin is also a regular speaker at Java conferences.

Building highly scalable data pipelines with Apache Spark

Day 2 - 11th Dec 10:30-11:20 Hall 8 #AIST Advanced

Many modern software systems need to process growing volumes of data in a reasonable amount of time. There are different ways to build a highly scalable data pipeline for this, but doing so is a non-trivial activity: the required operations, performance optimizations, and scalability all need to be taken into consideration during the implementation. To avoid building all of that from scratch, we typically fall back on a framework like Apache Spark, which provides the necessary primitives to build such a data pipeline. In this talk we will demonstrate how Apache Spark achieves that by implementing a real-time event management system that aggregates events from a number of data sources in a single application built with Apache Spark.