

Date: 2025/7/13 18:55:25  Source: https://blog.csdn.net/sunxunyong/article/details/144451639

1. Fetch the fsimage file from the NameNode:
hdfs dfsadmin -fetchImage /data/xy/
2. Parse the binary fsimage into comma-delimited text:
hdfs oiv -i /data/xy/fsimage_0000000019891608958 -t /data/xy/tmpdir -o /data/xy/out -p Delimited -delimiter ","
3. Create the Hive table:
CREATE DATABASE IF NOT EXISTS hdfsinfo;
USE hdfsinfo;
CREATE TABLE fsimage_info_csv (
  path string,
  replication int,
  modificationtime string,
  accesstime string,
  preferredblocksize bigint,
  blockscount int,
  filesize bigint,
  nsquota string,
  dsquota string,
  permission string,
  username string,
  groupname string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
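
Before loading the parsed file, it may be worth checking that each line has the 12 comma-separated fields the table above expects. The following is a minimal sketch, not part of the original procedure: the function name `check_fields` is hypothetical, and it assumes no path in the image contains a comma (such a line would be reported as having an unexpected field count).

```shell
#!/bin/sh
# check_fields FILE [EXPECTED_FIELDS]
# Hypothetical helper: counts the lines of a comma-delimited file whose
# field count differs from the expected number (default 12, the number
# of columns in fsimage_info_csv).
check_fields() {
    awk -F',' -v want="${2:-12}" '
        NF != want { bad++ }
        END { printf "%d lines, %d with unexpected field count\n", NR, bad + 0 }
    ' "$1"
}
```

For example, `check_fields /data/xy/out` prints a one-line summary; a non-zero second number suggests that some rows would end up with shifted columns in Hive.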

4. Load the parsed HDFS metadata into Hive:
hdfs dfs -put /data/xy/out /user/hive/warehouse/hdfsinfo.db/fsimage_info_csv/
hdfs dfs -ls /user/hive/warehouse/hdfsinfo.db/fsimage_info_csv/
Hive: MSCK REPAIR TABLE hdfsinfo.fsimage_info_csv;
select * from hdfsinfo.fsimage_info_csv limit 5;

5. Count small files per leaf directory, grouped by modification and access time (threshold: 4194304 bytes, i.e. < 4 MB):
SELECT
  dir_path,
  COUNT(*) AS small_file_num,
  modificationtime,
  accesstime
FROM (
  SELECT
    modificationtime,
    accesstime,
    relative_size,
    dir_path
  FROM (
    SELECT
      (CASE filesize < 4194304 WHEN TRUE THEN 'small' ELSE 'large' END) AS relative_size,
      modificationtime,
      accesstime,
      split(
        substr(
          concat_ws('/', split(PATH, '/')),
          1,
          length(concat_ws('/', split(PATH, '/'))) - length(last_element) - 1
        ),
        ',')[0] AS dir_path
    FROM (
      SELECT
        modificationtime,
        accesstime,
        filesize,
        PATH,
        split(PATH, '/')[size(split(PATH, '/')) - 1] AS last_element
      FROM hdfsinfo.fsimage_info_csv
    ) t0
  ) t1
  WHERE relative_size = 'small'
) t2
GROUP BY dir_path, modificationtime, accesstime
ORDER BY small_file_num DESC
LIMIT 500;
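
The nested split/substr expression in the query above derives dir_path by cutting the last path component (and the slash before it) off the full path. The same idea can be sketched in shell with parameter expansion. The function name `parent_dir` and the sample path are illustrative only, not taken from the original.

```shell
#!/bin/sh
# parent_dir PATH
# Hypothetical helper: prints PATH with its last component and the
# preceding "/" removed, mirroring how the query computes dir_path.
parent_dir() {
    printf '%s\n' "${1%/*}"
}

parent_dir /user/hive/warehouse/t1/part-00000   # prints /user/hive/warehouse/t1
```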

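6. Count small files per leaf directory without the time columns, excluding directory entries (threshold: 4194304 bytes, i.e. < 4 MB):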
SELECT
  dir_path,
  COUNT(*) AS small_file_num
FROM (
  SELECT
    relative_size,
    dir_path
  FROM (
    SELECT
      (CASE filesize < 4194304 WHEN TRUE THEN 'small' ELSE 'large' END) AS relative_size,
      split(
        substr(
          concat_ws('/', split(PATH, '/')),
          1,
          length(concat_ws('/', split(PATH, '/'))) - length(last_element) - 1
        ),
        ',')[0] AS dir_path
    FROM (
      SELECT
        filesize,
        PATH,
        split(PATH, '/')[size(split(PATH, '/')) - 1] AS last_element
      FROM hdfsinfo.fsimage_info_csv
      -- Skip directory entries: their permission string starts with 'd'.
      WHERE permission NOT LIKE 'd%'
    ) t0
  ) t1
  WHERE relative_size = 'small'
) t2
GROUP BY dir_path
ORDER BY small_file_num DESC
LIMIT 50000;
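
As a quick cross-check of the Hive result, a similar count can be run directly on the delimited file with awk. This is a sketch under stated assumptions, not the original author's method: the function name `small_files_per_dir` is hypothetical, the field positions (1 = path, 7 = file size, 10 = permission) follow the column order of the table defined in step 3, and paths are assumed not to contain commas.

```shell
#!/bin/sh
# small_files_per_dir FILE [LIMIT_BYTES]
# Hypothetical helper: counts non-directory entries smaller than
# LIMIT_BYTES (default 4194304) per parent directory and prints
# "count<TAB>directory", largest count first.
small_files_per_dir() {
    awk -F',' -v limit="${2:-4194304}" '
        # Keep rows that are not directories and whose size field is a
        # plain number (this also skips a header line, if one is present).
        $10 !~ /^d/ && $7 ~ /^[0-9]+$/ && $7 + 0 < limit {
            dir = $1
            sub(/\/[^\/]*$/, "", dir)   # drop the last path component
            count[dir]++
        }
        END { for (d in count) printf "%d\t%s\n", count[d], d }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Running `small_files_per_dir /data/xy/out | head -n 20` would list the 20 directories with the most small files, which can then be compared against the top rows of the Hive query.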
