Manju's Blog

Manju S V

Sr. Associate Technology L2 at Publicis Sapient. M.Tech in Computer Science from NITK Surathkal. Interested in Python, Data Science, Machine Learning, Django, and Flask.

Parsed, Analyzed, and Optimized Logical Plans in Spark

3 minute read

What are Parsed, Analyzed, and Optimized Logical Plans in Spark? Apache Spark employs a sophisticated query optimization mechanism involving several logical ...

Cache vs Persist in Spark

2 minute read

What are Cache and Persist in Spark? Cache Definition: The cache() method stores the RDD or DataFrame for reuse. For a DataFrame, it uses the MEMORY_AND_DISK storage level by default; for an RDD, the default is MEMORY_ONLY ...

Managed vs external table in Spark

3 minute read

What Are Managed Tables and External Tables in Spark? Managed Tables Definition: In a managed table, Spark manages both the metadata and the data itself. ...

Repartition vs Coalesce in Spark

2 minute read

What are Repartition and Coalesce in Spark? Repartition: repartition() increases or decreases the number of partitions in an RDD or DataFrame.

Caching an RDD in Spark

2 minute read

What is Caching an RDD in Spark? Definition: Caching an RDD in Spark means storing it in memory so that subsequent actions on the same RDD can reuse the data ...
