pyspark.sql.functions.spark_partition_id

pyspark.sql.functions.spark_partition_id()

A column for the partition ID of each record.

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns
Column

The ID of the partition the record belongs to.

Notes

This is non-deterministic because it depends on data partitioning and task scheduling.

Examples

>>> import pyspark.sql.functions as sf
>>> spark.range(10, numPartitions=5).select("*", sf.spark_partition_id()).show()
+---+--------------------+
| id|SPARK_PARTITION_ID()|
+---+--------------------+
|  0|                   0|
|  1|                   0|
|  2|                   1|
|  3|                   1|
|  4|                   2|
|  5|                   2|
|  6|                   3|
|  7|                   3|
|  8|                   4|
|  9|                   4|
+---+--------------------+