
Spark: HiveStrategies

Posted 2025/7/12 10:43:57. Source: https://blog.csdn.net/zhixingheyi_tian/article/details/139441801
Code related to HiveTableRelation

HiveStrategies.scala
When relation.tableMeta.stats.isEmpty holds, hiveTableWithStats is called:

class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
  private def hiveTableWithStats(relation: HiveTableRelation): HiveTableRelation = {
    val table = relation.tableMeta
    val partitionCols = relation.partitionCols
    // For partitioned tables, the partition directory may be outside of the table directory.
    // Which is expensive to get table size. Please see how we implemented it in the AnalyzeTable.
    val sizeInBytes = if (conf.fallBackToHdfsForStatsEnabled && partitionCols.isEmpty) {
      try {
        val hadoopConf = session.sessionState.newHadoopConf()
        val tablePath = new Path(table.location)
        val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
        fs.getContentSummary(tablePath).getLength
      } catch {
        case e: IOException =>
          logWarning("Failed to get table size from HDFS.", e)
          conf.defaultSizeInBytes
      }
    } else {
      conf.defaultSizeInBytes
    }
    val stats = Some(Statistics(sizeInBytes = BigInt(sizeInBytes)))
    relation.copy(tableStats = stats)
  }

  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case relation: HiveTableRelation
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      hiveTableWithStats(relation)

    // handles InsertIntoStatement specially as the table in InsertIntoStatement is not added in its
    // children, hence not matched directly by previous HiveTableRelation case.
    case i @ InsertIntoStatement(relation: HiveTableRelation, _, _, _, _, _)
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      i.copy(table = hiveTableWithStats(relation))
  }
}
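The core of the rule is the size-estimation fallback: if HDFS-based stats are enabled and the table is unpartitioned, sum up the bytes under the table location; on any failure, fall back to a configured default. The following is a minimal, self-contained sketch of that logic, with the local filesystem standing in for HDFS and a hypothetical `defaultSizeInBytes` constant in place of `conf.defaultSizeInBytes` (names are illustrative, not Spark's API):

```scala
import java.io.File

object TableSizeFallback {
  // Hypothetical stand-in for conf.defaultSizeInBytes in the rule above.
  val defaultSizeInBytes: Long = Long.MaxValue

  // Mirrors fs.getContentSummary(tablePath).getLength: total bytes under a path.
  def contentLength(path: File): Long =
    if (path.isFile) path.length()
    else Option(path.listFiles()).map(_.map(contentLength).sum).getOrElse(0L)

  // Mirrors hiveTableWithStats: try the filesystem, fall back on failure
  // or when the fallback is disabled.
  def estimateSize(location: String, fallBackToFs: Boolean): BigInt = {
    val size =
      if (fallBackToFs) {
        try contentLength(new File(location))
        catch { case _: Exception => defaultSizeInBytes }
      } else defaultSizeInBytes
    BigInt(size)
  }
}
```

Note the same design choice as in Spark: an estimation failure is logged and absorbed rather than propagated, because a missing size should degrade join planning (e.g. broadcast decisions), not fail the query.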
  • HiveTableRelation
/**
 * A `LogicalPlan` that represents a hive table.
 *
 * TODO: remove this after we completely make hive as a data source.
 */
case class HiveTableRelation(
    tableMeta: CatalogTable,
    dataCols: Seq[AttributeReference],
    partitionCols: Seq[AttributeReference],
    tableStats: Option[Statistics] = None,
    @transient prunedPartitions: Option[Seq[CatalogTablePartition]] = None)
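Because `tableStats` is an `Option` on an immutable case class, `DetermineTableStats` fills it with `relation.copy(tableStats = stats)` only when it is empty. A minimal sketch of that copy-with-stats pattern, using simplified stand-in types (`Stats` and `Relation` here are illustrative, not Spark's classes):

```scala
// Simplified stand-ins for Statistics and HiveTableRelation.
case class Stats(sizeInBytes: BigInt)

case class Relation(name: String, tableStats: Option[Stats] = None) {
  // Mirrors the rule's guard (stats.isEmpty) plus relation.copy(tableStats = stats):
  // only fill in stats when none exist yet.
  def withStats(size: BigInt): Relation =
    if (tableStats.isEmpty) copy(tableStats = Some(Stats(size))) else this
}
```

Usage: `Relation("t").withStats(1024).tableStats` yields `Some(Stats(1024))`, while a relation that already carries stats is returned unchanged, matching the `stats.isEmpty` guard in the rule.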