pyspark.sql.functions.spark_partition_id
- pyspark.sql.functions.spark_partition_id()
A column for partition ID.
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
- Returns
Column
The ID of the partition the record belongs to.
Notes
This function is non-deterministic because its result depends on data partitioning and task scheduling.
Examples
>>> import pyspark.sql.functions as sf
>>> spark.range(10, numPartitions=5).select("*", sf.spark_partition_id()).show()
+---+--------------------+
| id|SPARK_PARTITION_ID()|
+---+--------------------+
|  0|                   0|
|  1|                   0|
|  2|                   1|
|  3|                   1|
|  4|                   2|
|  5|                   2|
|  6|                   3|
|  7|                   3|
|  8|                   4|
|  9|                   4|
+---+--------------------+