Building and Managing Scalable Big Data Architectures using PySpark.


  • General Python, Web/DevOps
  • Tutorial
  • Intermediate

    By Qudus Ayoola

    Data Engineer at Willowfinch

    Abstract:

In today's world, where data-driven decisions are critical for organizations, the ability to handle large volumes of data is one of the key factors determining the success or failure of a business.

    This session will provide an overview of Apache Spark, a high-performance cluster-computing framework for big data that is well suited to building robust, scalable analytics architectures. It will also show real-world AI pipelines managed with Spark and the benefits of adopting the tool.

    We will also examine some of the most common big data challenges that organizations struggle with today, such as data quality, latency, and scale, and share practical solutions and lessons learned for overcoming these hurdles effectively. You will see how to build resilient big data solutions in PySpark and how to approach complex problems in big data architecture.

    This session covers:

  • Distributed data processing systems: their key components and how those components interact to handle big data.
  • An introduction to Spark and the key features that make it suitable for big data architectures.
  • Real-world examples of AI pipelines managed with Spark and the benefits of adopting it.
  • Strategies for dealing with common big data challenges such as data quality, latency, and scalability.
