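The Java series has begun!

I am currently updating the MyBatis series, so let's work through it together, step by step.

Progress so far:

Hadoop (complete)
HDFS (complete)
MapReduce (complete)
Hive (complete)
Flume (complete)
Sqoop (complete)
Zookeeper (complete)
HBase (complete)
Redis (complete)
Kafka (complete)
Spark (complete)
Flink (complete)
ClickHouse (complete)
Kudu (complete)
Druid (complete)
Kylin (complete)
Elasticsearch (complete)
DataX (complete)
Tez (complete)
Data mining (complete)
Prometheus (complete)
Grafana (complete)
Offline data warehouse (in progress…)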

Chapter Contents

In the previous section we covered:

Periodic fact tables for e-commerce analysis
Implementing a zipper table

Overview

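First, decide which tables are fact tables and which are dimension tables. In the original diagram:

Green marks fact tables
Gray marks dimension tables

How should the dimension tables be handled: as daily snapshots or as a zipper table?

Small tables use daily snapshot tables: product category table, merchant shop table, merchant regional organization table, payment method table
Large tables use a zipper table: product information table

Product Category Table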

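Normalization and Denormalization

Database normal forms are a set of guidelines for designing relational schemas. Their purpose is to reduce data redundancy, keep data dependencies sensible, and improve data consistency. Following them does have some potential drawbacks, however:

Performance: a highly normalized database can slow queries down, because complete information has to be assembled by joining several tables.
Added complexity: the further normalization is taken, the more complex the schema becomes and the harder it is to maintain. Developers also find it harder to understand and write queries against a normalized database.
Over-engineering: pursuing normal forms too strictly can over-engineer simple scenarios and add unnecessary complexity and work.
Inefficient reads: the structure that protects data integrity on write can make frequent reads inefficient, especially under highly concurrent read workloads.

These drawbacks can be mitigated with the following strategies:

Choose an appropriate normal form: not every application needs third normal form or higher. Pick the level that fits the requirements; second normal form may be enough.
Denormalization: in specific scenarios such as report generation, analytical processing, or read-performance tuning, relax the normal forms and introduce redundant data to simplify queries and improve performance.
Caching: for data that is read often but changes rarely, a cache can take load off the database and improve performance.
Partitioning and sharding: for large data sets, split the data horizontally (by rows, as in partitioning or sharding) or vertically (by columns) to reduce the amount of data a single query has to scan.
Index optimization: well-chosen indexes speed up queries, but too many indexes slow down inserts and updates, so the trade-off has to be weighed.
Assess business requirements: always design the database around actual business needs rather than chasing a theoretically perfect normal form. Find out which data matters most and which operations are most frequent, and adjust the design accordingly.

In short, apply normalization flexibly in practice: keep the data well structured, but also take performance and ease of use into account.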

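Creating the Table

The data in the source database is normalized (it satisfies third normal form), but normalized data is inconvenient to query.
Note: the product category dimension table below is denormalized. Irrelevant fields are dropped and the three category levels are flattened into one wide table: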

DROP TABLE IF EXISTS dim.dim_trade_product_cat;
create table if not exists dim.dim_trade_product_cat(
    firstId int,        -- first-level category id
    firstName string,   -- first-level category name
    secondId int,       -- second-level category id
    secondName string,  -- second-level category name
    thirdId int,        -- third-level category id
    thirdName string    -- third-level category name
)
partitioned by (dt string)
STORED AS PARQUET;

The query that implements this is:

select T1.catid as firstId, T1.catname as firstName,
       T2.catid as secondId, T2.catname as secondName,
       T3.catid as thirdId, T3.catname as thirdName
from (select catid, catname, parentid
      from ods.ods_trade_product_category
      where level=3 and dt='2020-07-01') T3
left join
     (select catid, catname, parentid
      from ods.ods_trade_product_category
      where level=2 and dt='2020-07-01') T2
on T3.parentid=T2.catid
left join
     (select catid, catname, parentid
      from ods.ods_trade_product_category
      where level=1 and dt='2020-07-01') T1
on T2.parentid=T1.catid;
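
To make the result of this three-way self-join easier to picture, here is a minimal sketch of the same flattening done outside Hive, in shell with awk. The function name `flatten_categories`, the pipe-delimited input layout (catid|catname|parentid|level), and any sample rows are assumptions made for illustration only and are not part of the original pipeline.

```shell
#!/bin/bash
# Hypothetical illustration: flatten a 3-level category tree into one wide row
# per third-level category, mirroring the left joins in the query above.
# Input (assumed layout): catid|catname|parentid|level, one category per line.
flatten_categories() {
  awk -F'|' -v OFS='|' '
    # Index every category by id and remember the input order.
    { name[$1] = $2; parent[$1] = $3; level[$1] = $4; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        c3 = order[i]
        if (level[c3] != 3) continue
        # Like a left join: a missing parent leaves its columns empty.
        c2 = parent[c3]; if (level[c2] != 2) c2 = ""
        c1 = parent[c2]; if (level[c1] != 1) c1 = ""
        print c1, name[c1], c2, name[c2], c3, name[c3]
      }
    }
  ' "$1"
}
```

Called as `flatten_categories categories.txt`, it prints one line per third-level category with the first-level and second-level id and name in front. A third-level category whose parent is missing still appears, with empty parent columns, which is the behavior the left joins give in Hive (NULL columns).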

Data Loading

Write the script:

vim dim_load_product_cat.sh

with the following content:

#!/bin/bash
source /etc/profile
# Use the date passed as the first argument, default to yesterday
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi
sql="
insert overwrite table dim.dim_trade_product_cat
partition(dt='$do_date')
select
    t1.catid,    -- first-level category id
    t1.catname,  -- first-level category name
    t2.catid,    -- second-level category id
    t2.catname,  -- second-level category name
    t3.catid,    -- third-level category id
    t3.catname   -- third-level category name
from
    -- third-level category rows
    (select catid, catname, parentid
     from ods.ods_trade_product_category
     where level=3 and dt='$do_date') t3
left join
    -- second-level category rows
    (select catid, catname, parentid
     from ods.ods_trade_product_category
     where level=2 and dt='$do_date') t2
on t3.parentid = t2.catid
left join
    -- first-level category rows
    (select catid, catname, parentid
     from ods.ods_trade_product_category
     where level=1 and dt='$do_date') t1
on t2.parentid = t1.catid;
"
hive -e "$sql"
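
The load scripts in this article all start with the same step: take the business date from the first argument, or fall back to yesterday. Because that value is spliced directly into the SQL text, it may be worth validating it first. The following is only a sketch of one way to do that. The helper name `resolve_do_date` is hypothetical, and GNU `date` is assumed (the scripts already rely on `date -d`).

```shell
#!/bin/bash
# Hypothetical helper: print the business date a load script should process.
# Uses the first argument if given, otherwise yesterday (GNU date assumed).
resolve_do_date() {
  local candidate="${1:-}"
  if [ -z "$candidate" ]; then
    date -d "-1 day" +%F
    return 0
  fi
  # Accept only the YYYY-MM-DD shape, so nothing unexpected reaches the SQL.
  case "$candidate" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "invalid date: $candidate (expected YYYY-MM-DD)" >&2; return 1 ;;
  esac
  # Round-trip through date to reject impossible dates such as 2020-13-45.
  if [ "$(date -d "$candidate" +%F 2>/dev/null)" = "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    echo "invalid date: $candidate (not a calendar date)" >&2
    return 1
  fi
}
```

A script could then begin with `do_date=$(resolve_do_date "$1") || exit 1` in place of the if/else block. The behavior for valid input is the same as before. The only difference is that malformed input stops the script instead of producing an oddly named partition.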

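Shop Regional Organization Table

Creating the Table

Merchant shop table + merchant regional organization table => one dimension table
This is again a denormalized design: the merchant shop table and the merchant regional organization table are combined into a single wide table.
Each row carries:

Shop information
City information
Region information

Each of these includes an ID and a name: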

drop table if exists dim.dim_trade_shops_org;
create table dim.dim_trade_shops_org(
    shopid int,
    shopName string,
    cityId int,
    cityName string,
    regionId int,
    regionName string
)
partitioned by (dt string)
STORED AS PARQUET;

Implementation:

select T1.shopid, T1.shopname,
       T2.id cityid, T2.orgname cityname,
       T3.id regionid, T3.orgname regionname
from
     (select shopid, shopname, areaid
      from ods.ods_trade_shops
      where dt='2020-07-01') T1
left join
     (select id, parentid, orgname, orglevel
      from ods.ods_trade_shop_admin_org
      where orglevel=2 and dt='2020-07-01') T2
on T1.areaid=T2.id
left join
     (select id, orgname, orglevel
      from ods.ods_trade_shop_admin_org
      where orglevel=1 and dt='2020-07-01') T3
on T2.parentid=T3.id
limit 10;

Data Loading

Write a script to load the data:

vim dim_load_shop_org.sh

with the following content:

#!/bin/bash
source /etc/profile
# Use the date passed as the first argument, default to yesterday
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi
sql="
insert overwrite table dim.dim_trade_shops_org
partition(dt='$do_date')
select t1.shopid,
       t1.shopname,
       t2.id as cityid,
       t2.orgname as cityname,
       t3.id as regionid,
       t3.orgname as regionname
from (select shopId, shopName, areaId
      from ods.ods_trade_shops
      where dt='$do_date') t1
left join
     (select id, parentId, orgname, orglevel
      from ods.ods_trade_shop_admin_org
      where orglevel=2 and dt='$do_date') t2
on t1.areaid = t2.id
left join
     (select id, parentId, orgname, orglevel
      from ods.ods_trade_shop_admin_org
      where orglevel=1 and dt='$do_date') t3
on t2.parentid = t3.id;
"
hive -e "$sql"

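Product Information Table

Data Processing

Product information is maintained with a zipper table.

Historical Data

Historical data => initialize the zipper table (start date: the current day; end date: 9999-12-31). This runs only once.

Daily Data

New data: each day's new records (ODS) => start date: the current day; end date: 9999-12-31
Historical data: left join the zipper table (DIM) to the day's new records (ODS). A row that is still open and finds a match has changed, so its end date becomes the current day. A row with no match is unchanged, so its end date stays as it is.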
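
The two daily rules above can be tried out on plain text files before running them in Hive. The following is a minimal sketch, not the article's implementation: the function name `zipper_merge`, the reduced column set, and the pipe-delimited file layouts are all assumptions. It also uses the load date as the start date of new versions, as the description above does, whereas the load script later in this section derives start_dt from modifyTime or createTime.

```shell
#!/bin/bash
# Hypothetical illustration of the daily zipper-table merge.
# dim file (assumed layout): productId|price|start_dt|end_dt
# ods file (assumed layout): productId|price  (rows new or changed on do_date)
zipper_merge() {
  local do_date="$1" dim_file="$2" ods_file="$3"
  awk -F'|' -v OFS='|' -v d="$do_date" '
    # First file (ODS): remember the key and emit a new open-ended version.
    FILENAME == ARGV[1] { changed[$1] = 1; print $1, $2, d, "9999-12-31"; next }
    # Second file (DIM): close open rows whose key appeared in ODS today.
    { if ($4 == "9999-12-31" && ($1 in changed)) $4 = d; print }
  ' "$ods_file" "$dim_file"
}
```

For example, if the dimension file holds open versions of products 1 and 2, and the ODS file for 2020-07-13 holds a changed product 1 and a new product 3, then `zipper_merge 2020-07-13 dim.txt ods.txt` prints new open versions for products 1 and 3, the old version of product 1 with its end date set to 2020-07-13, and product 2 untouched.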

Creating the Dimension Table

A zipper table needs two extra columns that record each row's effective date and expiry date:

drop table if exists dim.dim_trade_product_info;
create table dim.dim_trade_product_info(
    `productId` bigint,
    `productName` string,
    `shopId` string,
    `price` decimal,  -- note: a bare decimal in Hive means decimal(10,0)
    `isSale` tinyint,
    `status` tinyint,
    `categoryId` string,
    `createTime` string,
    `modifyTime` string,
    `start_dt` string,  -- effective date
    `end_dt` string     -- expiry date, 9999-12-31 while the row is current
) COMMENT 'Product table'
STORED AS PARQUET;
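
With start_dt and end_dt in place, the table can be read as it stood on any given date. The sketch below shows the filter on a text export. It assumes half-open validity (start_dt <= date < end_dt), which matches a load that closes the old version and opens the new one on the same day. The function name `zipper_as_of` and the reduced file layout are hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration: pick the versions valid on a given date.
# Input (assumed layout): productId|price|start_dt|end_dt
# Assumes half-open validity: start_dt <= as_of < end_dt.
zipper_as_of() {
  local as_of="$1" dim_file="$2"
  awk -F'|' -v d="$as_of" '
    # Appending "" forces string comparison; ISO dates sort lexicographically.
    ($3 "") <= (d "") && (d "") < ($4 "")
  ' "$dim_file"
}
```

In Hive the same idea would presumably be a where clause of the form `start_dt <= '2020-07-13' and end_dt > '2020-07-13'`. If a pipeline instead sets end_dt to the day before the change, the second comparison would need to be inclusive.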

Initial Data Load

Load the historical data. This only needs to run once:

insert overwrite table dim.dim_trade_product_info
select productId,
       productName,
       shopId,
       price,
       isSale,
       status,
       categoryId,
       createTime,
       modifyTime,
       -- use modifyTime when it is not null, otherwise createTime
       -- substr keeps only the date part (first 10 characters)
       case when modifyTime is not null
            then substr(modifyTime, 1, 10)
            else substr(createTime, 1, 10)
       end as start_dt,
       '9999-12-31' as end_dt
from ods.ods_trade_product_info
where dt = '2020-07-12';

Incremental Data Load

This step runs repeatedly, once for every data load, so put it in a script:

vim dim_load_product_info.sh

with the following content:

#!/bin/bash
source /etc/profile
# Use the date passed as the first argument, default to yesterday
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi
sql="
insert overwrite table dim.dim_trade_product_info
-- new and changed rows from ODS become open-ended versions
select productId,
       productName,
       shopId,
       price,
       isSale,
       status,
       categoryId,
       createTime,
       modifyTime,
       case when modifyTime is not null
            then substr(modifyTime, 1, 10)
            else substr(createTime, 1, 10)
       end as start_dt,
       '9999-12-31' as end_dt
from ods.ods_trade_product_info
where dt='$do_date'
union all
-- existing rows: close the open version when the product appears in ODS
select dim.productId,
       dim.productName,
       dim.shopId,
       dim.price,
       dim.isSale,
       dim.status,
       dim.categoryId,
       dim.createTime,
       dim.modifyTime,
       dim.start_dt,
       case when dim.end_dt >= '9999-12-31' and ods.productId is not null
            then '$do_date'
            else dim.end_dt
       end as end_dt
from dim.dim_trade_product_info dim left join
     (select *
      from ods.ods_trade_product_info
      where dt='$do_date') ods
on dim.productId = ods.productId
"
hive -e "$sql"