Recent posts

Cache vs Persist in Spark

2 minute read

What are Cache and Persist in Spark? Cache Definition: The cache() method stores the RDD or DataFrame in memory. For DataFrames, it defaults to the MEMORY_AND_DISK st...

Managed vs external table in Spark

3 minute read

What Are Managed Tables and External Tables in Spark? Managed Tables Definition: In a managed table, Spark manages both the metadata and the data itself. ...