
Spark: HiveStrategies

Posted 2025/7/12 10:43:57. Source: https://blog.csdn.net/zhixingheyi_tian/article/details/139441801
Code related to HiveTableRelation

HiveStrategies.scala
When relation.tableMeta.stats.isEmpty holds, hiveTableWithStats is called:

class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
  private def hiveTableWithStats(relation: HiveTableRelation): HiveTableRelation = {
    val table = relation.tableMeta
    val partitionCols = relation.partitionCols
    // For partitioned tables, the partition directory may be outside of the table directory.
    // Which is expensive to get table size. Please see how we implemented it in the AnalyzeTable.
    val sizeInBytes = if (conf.fallBackToHdfsForStatsEnabled && partitionCols.isEmpty) {
      try {
        val hadoopConf = session.sessionState.newHadoopConf()
        val tablePath = new Path(table.location)
        val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
        fs.getContentSummary(tablePath).getLength
      } catch {
        case e: IOException =>
          logWarning("Failed to get table size from HDFS.", e)
          conf.defaultSizeInBytes
      }
    } else {
      conf.defaultSizeInBytes
    }
    val stats = Some(Statistics(sizeInBytes = BigInt(sizeInBytes)))
    relation.copy(tableStats = stats)
  }

  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case relation: HiveTableRelation
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      hiveTableWithStats(relation)

    // handles InsertIntoStatement specially as the table in InsertIntoStatement is not added in its
    // children, hence not matched directly by previous HiveTableRelation case.
    case i @ InsertIntoStatement(relation: HiveTableRelation, _, _, _, _, _)
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      i.copy(table = hiveTableWithStats(relation))
  }
}
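The core of the rule is the size-estimation fallback: if HDFS-based stats are enabled and the table is unpartitioned, sum up the bytes under the table location; on any failure, fall back to a configured default. The following is a minimal, self-contained sketch of that logic, with the local filesystem standing in for HDFS and a hypothetical `defaultSizeInBytes` constant in place of `conf.defaultSizeInBytes` (names are illustrative, not Spark's API):

```scala
import java.io.File

object TableSizeFallback {
  // Hypothetical stand-in for conf.defaultSizeInBytes in the rule above.
  val defaultSizeInBytes: Long = Long.MaxValue

  // Mirrors fs.getContentSummary(tablePath).getLength: total bytes under a path.
  def contentLength(path: File): Long =
    if (path.isFile) path.length()
    else Option(path.listFiles()).map(_.map(contentLength).sum).getOrElse(0L)

  // Mirrors hiveTableWithStats: try the filesystem, fall back on failure
  // or when the fallback is disabled.
  def estimateSize(location: String, fallBackToFs: Boolean): BigInt = {
    val size =
      if (fallBackToFs) {
        try contentLength(new File(location))
        catch { case _: Exception => defaultSizeInBytes }
      } else defaultSizeInBytes
    BigInt(size)
  }
}
```

Note the same design choice as in Spark: an estimation failure is logged and absorbed rather than propagated, because a missing size should degrade join planning (e.g. broadcast decisions), not fail the query.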
  • HiveTableRelation
/**
 * A `LogicalPlan` that represents a hive table.
 *
 * TODO: remove this after we completely make hive as a data source.
 */
case class HiveTableRelation(
    tableMeta: CatalogTable,
    dataCols: Seq[AttributeReference],
    partitionCols: Seq[AttributeReference],
    tableStats: Option[Statistics] = None,
    @transient prunedPartitions: Option[Seq[CatalogTablePartition]] = None)
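Because `tableStats` is an `Option` on an immutable case class, `DetermineTableStats` fills it with `relation.copy(tableStats = stats)` only when it is empty. A minimal sketch of that copy-with-stats pattern, using simplified stand-in types (`Stats` and `Relation` here are illustrative, not Spark's classes):

```scala
// Simplified stand-ins for Statistics and HiveTableRelation.
case class Stats(sizeInBytes: BigInt)

case class Relation(name: String, tableStats: Option[Stats] = None) {
  // Mirrors the rule's guard (stats.isEmpty) plus relation.copy(tableStats = stats):
  // only fill in stats when none exist yet.
  def withStats(size: BigInt): Relation =
    if (tableStats.isEmpty) copy(tableStats = Some(Stats(size))) else this
}
```

Usage: `Relation("t").withStats(1024).tableStats` yields `Some(Stats(1024))`, while a relation that already carries stats is returned unchanged, matching the `stats.isEmpty` guard in the rule.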