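WANG Bo (王博), ZHUANG Jijun (庄暨军), XIONG Jun (熊军), LUO Xiaochen (罗小臣). Research on a random forest algorithm based on high-dimensional feature clustering optimization [J]. Journal of Jinggangshan University (Natural Science), 2022, 43(5): 52-56 |
Research on a Random Forest Algorithm Based on High-Dimensional Feature Clustering Optimization |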
HDFC-RF ALGORITHM BASED ON HIGH-DIMENSIONAL FEATURE CLUSTERING OPTIMIZATION |
Received: 2022-03-23 Revised: 2022-05-27 |
DOI:10.3969/j.issn.1674-8085.2022.05.008 |
Keywords: high-dimensional features; feature clustering; random forest |
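Funding: National Natural Science Foundation of China (61862035); Science and Technology Program of the Jiangxi Provincial Department of Education (GJJ190561) |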
|
Abstract: |
The traditional random forest (RF) algorithm computes slowly and clusters poorly on high-dimensional feature data sets. To address this, a random forest algorithm based on high-dimensional feature clustering (HDFC-RF) is proposed. First, the initial high-dimensional data set is clustered with the traditional RF method. Next, K-means clustering (KM) and fuzzy C-means (FCM) are combined to compute sample similarity and divide the clustered features into groups. Finally, the DBI index is computed, compared against the correlation threshold δ, and sorted to obtain the final high-dimensional feature sequence. HDFC-RF is applied to the high-dimensional feature data set Colon Tumor and compared with the traditional RF and FSRF algorithms. The experimental results show that HDFC-RF achieves a better clustering effect and faster training on high-dimensional feature data sets, and is feasible in practice. |
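The abstract scores clusterings with a DBI index but gives no formula. As a minimal sketch, assuming DBI denotes the standard Davies-Bouldin index, the snippet below computes it for a hard clustering in plain Python. The function name `davies_bouldin` and the toy points are illustrative assumptions; the paper's own procedure (including how the value is compared with the threshold δ) is not described in the abstract and is not reproduced here.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower = tighter, better separated)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in clusters.items()
    }
    # Scatter S_i: mean Euclidean distance from members to their centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case ratio (S_i + S_j) / d(c_i, c_j) over all other clusters j.
        # Assumes distinct centroids; coincident centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy 2-D points, made up for illustration (not from the Colon Tumor data set).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well-separated groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # groups that straddle the gap
```

On this toy input the compact grouping scores lower than the straddling one, which is the sense in which a smaller DBI indicates a better clustering.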