
Building and Managing Scalable Big Data Architectures using PySpark.

By Qudus Ayoola
Data Engineer at Willowfinch

Abstract:
Today, with data-driven decisions so important to organizations, the ability to handle large volumes of data is a key factor in determining the success or failure of a business.
This session provides an overview of Apache Spark, a high-performance cluster-computing framework for big data that is well suited to building robust, scalable analytic architectures. It will show how AI pipelines are managed with Spark in the field and the benefits of adopting the tool.
We will also examine some of the most common big data challenges organizations struggle with today, such as data quality, latency, and scale. The session offers participants practical solutions and lessons learned for overcoming these hurdles effectively, and takes a look at how to build resilient big data solutions in PySpark and approach complex problems of big data architecture.
This session covers:
- Distributed data processing systems: the key components and how they interact to handle big data.
- An introduction to Spark and the key features that make it suitable for big data architecture.
- Real-world examples of AI pipelines managed with Spark and the benefits of adopting it.
- Strategies for dealing with common big data challenges such as data quality, latency, and scalability.
