<div class="tip inlineBlock success"> For multiple-choice questions, see the past pre-class quizzes. </div>

# Common supervised learning algorithms

* Support vector machines (SVM)
* Linear regression
* Logistic regression
* Naive Bayes
* Linear discriminant analysis
* Decision trees
* k-nearest neighbors (k-NN)
* Multilayer perceptron

<div class="tip inlineBlock info"> K-means is an unsupervised learning algorithm. </div>

# Classic case studies by category

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-fec5bb3167f3fdc2a407c77cb00b0a2e19" aria-expanded="true"><div class="accordion-toggle"><span style="">Association analysis</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-fec5bb3167f3fdc2a407c77cb00b0a2e19" class="collapse collapse-content"><p></p>

* **Market basket analysis**
* **Amazon's personalized recommendations**
* Pandora's Music Genome Project
* Target's big-data marketing

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e8beae8dec99ea695dd782cff9bda0b076" aria-expanded="true"><div class="accordion-toggle"><span style="">Trend prediction</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e8beae8dec99ea695dd782cff9bda0b076" class="collapse collapse-content"><p></p>

- **Google Flu Trends**
- **Oscar predictions**
- Farecast case study
- Decide case study

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-681864594833e49c285489ff888f3d1151" aria-expanded="true"><div class="accordion-toggle"><span style="">Decision support</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-681864594833e49c285489ff888f3d1151" class="collapse
collapse-content"><p></p>

- **U.S. presidential election**
- *House of Cards*

<p></p></div></div></div>

# Fill-in-the-blank questions

1. Data science is a science that, through **Blank 1**, obtains **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-60eca991c607b0bbaf957a3a2dfb41a478" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-60eca991c607b0bbaf957a3a2dfb41a478" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **systematic study**
Blank 2: **a body of knowledge about data**

<p></p></div></div></div>

2. The field of visualization has three main branches: **Blank 1**, **Blank 2**, and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d59446ca15dea9ed92def6030ff25b5050" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d59446ca15dea9ed92def6030ff25b5050" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scientific visualization**
Blank 2: **information visualization**
Blank 3: **visual analytics**

<p></p></div></div></div>

3. The characteristics of big data that make it useful for fighting crime are veracity, **Blank 1**, and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-51e2a7b96e8e7394f9d86450031407ed47" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-51e2a7b96e8e7394f9d86450031407ed47" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **variety**
Blank 2: **velocity**

PS: the 4 V's of big data:

1. Volume
2. Variety
3. Veracity (value)
4.
Velocity

<p></p></div></div></div>

4. Google Flu Trends is an application of big data to **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2ec50f4e562e5bafe3df72488b237a5526" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2ec50f4e562e5bafe3df72488b237a5526" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **trend prediction**

PS: applications of big data:

1. Prediction
2. Recommendation
3. Business intelligence analysis
4. Scientific research

<p></p></div></div></div>

5. Machine learning with labeled training samples is called **Blank 1** learning, while machine learning with unlabeled training samples is called **Blank 2** learning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ec5f61f1b6205891c23fc08c008cd67178" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ec5f61f1b6205891c23fc08c008cd67178" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **supervised**
Blank 2: **unsupervised**

<p></p></div></div></div>

6. Name two `python`-based Chinese text processing toolkits: **Blank 1**, **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7c78e73859ee9ce3f79cb9dae4d5387815" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7c78e73859ee9ce3f79cb9dae4d5387815" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **jieba**
Blank 2: **thulac**

PS:

* jieba (recommended)
* thulac (Tsinghua University): handles UTF-8 encoded text
* SnowNLP: works on unicode strings, so input must be decoded to unicode first, including for the word segmentation done in its sentiment analysis
* pynlpir
* CoreNLP
* pyNLP (Harbin Institute of Technology)
* NLPIP: a word segmentation package that can handle minority languages

<p></p></div></div></div>

<div
class="tip inlineBlock warning"> The questions above are from last year's test. </div>

---

<div class="tip inlineBlock info"> The questions below are fill-in-the-blank questions from past pre-class quizzes. </div>

7. In Python, the comment symbol is **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1372f29c67de059ea7ef07ab782647a813" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1372f29c67de059ea7ef07ab782647a813" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **#**

<p></p></div></div></div>

8. In Python, the **Blank 1** data structure can hold items of different types.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c1c18e8e14298b398ea4be7984cf11ba69" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c1c18e8e14298b398ea4be7984cf11ba69" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **list**

<p></p></div></div></div>

9. In Python, `else` combined with `if` is written as **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-71773378a7d431382eb5dccadad86cb666" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-71773378a7d431382eb5dccadad86cb666" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **elif**

<p></p></div></div></div>

10. The **Blank 1** library is a machine learning package for Python that supports the mainstream supervised and unsupervised learning methods.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse"
data-target="#collapse-96d4b8c2ad6651de0610ea9ca853af5774" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-96d4b8c2ad6651de0610ea9ca853af5774" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scikit-learn**

<p></p></div></div></div>

11. In Anaconda, a third-party library can be installed by prefixing the library name with **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-20bf2c76274c191b1116d12fc697987f3" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-20bf2c76274c191b1116d12fc697987f3" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **pip install**

<p></p></div></div></div>

12. **Blank 1** is an extension library for Python that supports large multidimensional arrays and matrix operations, and also provides a large collection of mathematical functions for array computation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a64c05b56412ebfd0eae9b67dbfa28de2" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a64c05b56412ebfd0eae9b67dbfa28de2" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **NumPy**

<p></p></div></div></div>

13. To use the numpy library in Python, load it with the **Blank 1** statement.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-06cb77457df9dee7d174435448f350de49" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body
collapse-panel-body"> <div id="collapse-06cb77457df9dee7d174435448f350de49" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **import numpy**

<p></p></div></div></div>

14. Unlike unsupervised learning algorithms, supervised learning algorithms require a **Blank 1** for every training sample.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0e921219b7737899eca57d5b98c9ff6343" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0e921219b7737899eca57d5b98c9ff6343" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **label**

<p></p></div></div></div>

15. In a machine learning workflow, data is first collected and preprocessed; a **Blank 1** is then designed on the training set and its parameters are determined, after which it is used to make predictions on the test set.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b74ec511d361e3328afc9823650017ee96" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b74ec511d361e3328afc9823650017ee96" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **model**

<p></p></div></div></div>

16. **Blank 1** and **Blank 2** are currently common big data processing platforms.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0ce881773215a16116fc80052874266f67" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0ce881773215a16116fc80052874266f67" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **Hadoop**
Blank 2: **Spark**

<p></p></div></div></div>

17. In Python, **Blank 1** is used to indicate that the next statement belongs to the structure of the previous statement.

<div class="panel
panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d3da94233a5a605e5bfb58aa9e47053c73" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d3da94233a5a605e5bfb58aa9e47053c73" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **indentation**

<p></p></div></div></div>

18. Market basket analysis falls roughly into two types, **Blank 1** and **Blank 2** market basket analysis; the fundamental reason the two take different approaches is that **Blank 3** differs greatly.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-50843921264ce87a58382797900a3a0c60" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-50843921264ce87a58382797900a3a0c60" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **American-style**
Blank 2: **Japanese-style**
Blank 3: **store floor area**

<p></p></div></div></div>

19. Association analysis with the Apriori algorithm takes two steps: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-db1e1d7e090e37ce5a3699cac618995810" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-db1e1d7e090e37ce5a3699cac618995810" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **finding frequent itemsets**
Blank 2: **mining association rules**

<p></p></div></div></div>

20. The purpose of visualization is to **Blank 1**; its foremost principles are **Blank 2** and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4e3565eb37688802702fefb764f20c2474"
aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4e3565eb37688802702fefb764f20c2474" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **present complex data effectively**
Blank 2: **accuracy**
Blank 3: **clarity**

<p></p></div></div></div>

21. Video-based traffic flow detection currently relies on two main methods: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ab6d9a50862ffe49904c32a774ac51dd29" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ab6d9a50862ffe49904c32a774ac51dd29" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **setting up virtual loop detectors**
Blank 2: **vehicle target tracking**

<p></p></div></div></div>

22. Traffic flow prediction is divided into **Blank 1** traffic flow prediction and **Blank 2** traffic flow prediction.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-03033390cde51f13f172a3b862e7510735" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-03033390cde51f13f172a3b862e7510735" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **short-term**
Blank 2: **medium- and long-term**

<p></p></div></div></div>

23. Supervised learning builds models from **Blank 1** data, while unsupervised learning builds models from **Blank 2** data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4c8784bcf56bf549cfdd27de97246ffe85" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body
collapse-panel-body"> <div id="collapse-4c8784bcf56bf549cfdd27de97246ffe85" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **labeled**
Blank 2: **unlabeled**

<p></p></div></div></div>

24. An information system is evaluated by two metrics: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-9e2493913f361a2cb236bed02d02da547" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-9e2493913f361a2cb236bed02d02da547" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **accuracy**
Blank 2: **recall**

<p></p></div></div></div>

25. In text retrieval, the vector space model measures the distance between two vectors representing text by computing their **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f1e4de949e09dafd20df0175d7d1726d14" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f1e4de949e09dafd20df0175d7d1726d14" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **cosine of the angle between them**

<p></p></div></div></div>

26. Text must be preprocessed before analysis: the text in a document undergoes **Blank 1** splitting and **Blank 2** segmentation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7ee3ebb7f221a59a95a33bb2dd76fecf61" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7ee3ebb7f221a59a95a33bb2dd76fecf61" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **sentence**
Blank 2: **word**

<p></p></div></div></div>

27. Text data can also be visualized; a common method is the **Blank 1**.

<div class="panel panel-default
collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e4593f003d3154c1294d53f119fd4c387" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e4593f003d3154c1294d53f119fd4c387" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **word cloud**

<p></p></div></div></div>

28. Briefly summarizing the content of a document is called document **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e56ace2cc3d1dbbede635359c8bb747685" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e56ace2cc3d1dbbede635359c8bb747685" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **summarization**

<p></p></div></div></div>

29. Strings can be enclosed in **Blank 1** or **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0c600288f83c2821bc443379b987c51189" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0c600288f83c2821bc443379b987c51189" class="collapse collapse-content"><p></p>

Blank 1: **single quotes**
Blank 2: **double quotes**

<p></p></div></div></div>

# Short-answer questions

1. Briefly explain what big data hubris is:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-029d6e43506e0b9356f5ea54df5b67ae35" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div>
<div class="panel-body collapse-panel-body"> <div id="collapse-029d6e43506e0b9356f5ea54df5b67ae35" class="collapse collapse-content"><p></p>

The belief that, by using big data, one can entirely ignore and replace traditional data collection methods.

<p></p></div></div></div>

2. How does trend prediction differ from association analysis?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-dfc5e1b3e1dc1c581cacdef77c1a7a3f27" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-dfc5e1b3e1dc1c581cacdef77c1a7a3f27" class="collapse collapse-content"><p></p>

The former focuses on modeling the correlations among data, while the latter focuses on discovering whether such correlations exist.

<p></p></div></div></div>

3. Briefly describe the idea behind content-based recommendation:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-98c2beeb2fcd3c3edbbb0301bac3232515" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-98c2beeb2fcd3c3edbbb0301bac3232515" class="collapse collapse-content"><p></p>

Classify items by their content, then recommend among similar items.

<p></p></div></div></div>

4. Name the three common methods for smoothing noisy data during data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-171947efd071d74b84df530302c6dd1198" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-171947efd071d74b84df530302c6dd1198" class="collapse collapse-content"><p></p>

1. Binning
2. Regression
3.
Clustering

<p></p></div></div></div>

5. Describe the types of market basket analysis and the settings in which each is applied.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-83e084c2ffccbbdc960250d8417e239e34" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-83e084c2ffccbbdc960250d8417e239e34" class="collapse collapse-content"><p></p>

There are two types of market basket analysis.

- The first is **American-style market basket analysis**, suited to stores with a **large floor area**, a **wide range of products**, and **product display areas that are far apart**, such as Walmart.
- The second is **Japanese-style market basket analysis**, suited to stores with a **small floor area**, a **narrow range of products**, and **product display areas that are close together**, such as convenience stores.

<p></p></div></div></div>

6. Briefly explain what the line of best fit is.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-bff694bd685844cb6947fd39754648a631" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-bff694bd685844cb6947fd39754648a631" class="collapse collapse-content"><p></p>

The line of best fit is a straight line drawn on a scatter plot so that it passes as close as possible to the data points.

<p></p></div></div></div>

7. What is the principle behind trend prediction?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a7a176629347bab64316a6a5010078de81" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a7a176629347bab64316a6a5010078de81" class="collapse collapse-content"><p></p>

Collect data that may be related to the variable to be predicted, then build a predictive model.

<p></p></div></div></div>

8. Briefly describe demographic-based recommendation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c169f897209809e511a0a0a85db430fc64" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c169f897209809e511a0a0a85db430fc64" class="collapse collapse-content"><p></p>

Group users into categories, then recommend what a user likes to other, similar users.

<p></p></div></div></div>

9. Briefly describe frequent itemsets and association rules in the Apriori algorithm.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2676b0c31c379696c1a9d36080580f0d81" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2676b0c31c379696c1a9d36080580f0d81" class="collapse collapse-content"><p></p>

**Frequent itemset**: a set of items that are frequently purchased together.

**Association rule**: a rule describing how the items in a frequent itemset influence one another.

<p></p></div></div></div>

10. Briefly describe the decision tree method.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c680e8671677c1848b28b74b5f48e0c099" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw
fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c680e8671677c1848b28b74b5f48e0c099" class="collapse collapse-content"><p></p>

The decision tree method represents the probabilities of the natural states or conditions of a decision problem, the candidate courses of action, their gains and losses, and the predicted outcomes in a tree diagram, and uses that diagram to reflect the whole process of thinking, prediction, and decision-making.

<p></p></div></div></div>

11. What are the steps in building a decision tree?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-87f4a71f3cd30ea9feb6d1c4c64d2f7a56" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-87f4a71f3cd30ea9feb6d1c4c64d2f7a56" class="collapse collapse-content"><p></p>

1. Feature selection
2. Tree generation
3. Tree pruning

<p></p></div></div></div>

12. Briefly describe data visualization.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1b4574bc9f4e71a4a61726a6b25a61079" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1b4574bc9f4e71a4a61726a6b25a61079" class="collapse collapse-content"><p></p>

Data visualization uses techniques such as computer graphics to present data graphically, conveying the information, patterns, and logic contained in the data intuitively so that users can observe and understand it.

<p></p></div></div></div>

13. Briefly describe the general processing flow of a big data platform.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5bbbd064565c20bd951ea20846c0978a38" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5bbbd064565c20bd951ea20846c0978a38" class="collapse collapse-content"><p></p>

1. Data collection
2. Data storage
3. Data processing
4.
Data presentation

<p></p></div></div></div>

14. Briefly describe traditional business data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-886173e76ac7f084065f5429241c03f512" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-886173e76ac7f084065f5429241c03f512" class="collapse collapse-content"><p></p>

Traditional business data comes from business systems such as enterprise ERP systems, POS terminals, and online payment systems, and includes automatically generated information such as audit records and logs.

<p></p></div></div></div>

15. Briefly describe internet data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-11687782082afc910cc62749c14cf1ed4" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-11687782082afc910cc62749c14cf1ed4" class="collapse collapse-content"><p></p>

Internet data is the large volume of data generated by interactions in cyberspace.

<p></p></div></div></div>

16. Why is data preprocessing necessary?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1ae1e48043d57c18920211c0cc2a391695" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1ae1e48043d57c18920211c0cc2a391695" class="collapse collapse-content"><p></p>

Because databases are huge and draw on many heterogeneous sources, real-world databases are highly susceptible to noisy, missing, and inconsistent data. Low-quality data leads to low-quality mining results, so data preprocessing is necessary.

<p></p></div></div></div>

17. Briefly describe the methods of big data preprocessing.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a2fa3e3bacd99ea7d3726a75c53fc52694" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a2fa3e3bacd99ea7d3726a75c53fc52694" class="collapse collapse-content"><p></p>

1. Data cleaning
2. Data integration
3. Data transformation
4.
Data reduction

<p></p></div></div></div>

18. Briefly describe the purpose of data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2b3db375f9e27b54823418fe98a65e9e39" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2b3db375f9e27b54823418fe98a65e9e39" class="collapse collapse-content"><p></p>

Data cleaning aims to correct existing errors and make the data consistent.

<p></p></div></div></div>

19. What algorithms are used for data normalization?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5886dde911573b0f258c7b2d521d614222" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5886dde911573b0f258c7b2d521d614222" class="collapse collapse-content"><p></p>

1. Min-max normalization (rescaling)
2. Standardization
3. Centering

<p></p></div></div></div>

20. What challenges does big data storage face?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-75e4929a9479a2c39a2c6869515d195a93" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-75e4929a9479a2c39a2c6869515d195a93" class="collapse collapse-content"><p></p>

1. Capacity
2. Latency
3. Security
4. Cost
5. Data accumulation
6. Flexibility
7.
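Application awareness

<p></p></div></div></div>

# Open-ended problems

### 1. Computing accuracy and recall

An information retrieval system is evaluated by two metrics, accuracy and recall. Given the retrieval results in the table below, compute the system's accuracy and recall.

| | Actually relevant documents | Actually irrelevant documents |
| -------------------------------------- | ------------------ | -------------------- |
| Returned by the system (judged relevant) | 15 | 3 |
| Not returned by the system (judged irrelevant) | 6 | 3 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c7ade04dc5254c3a424bba5d68fc1dbc20" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c7ade04dc5254c3a424bba5d68fc1dbc20" class="collapse collapse-content"><p></p>

No real analysis is needed; apply the formulas directly. With TP = 15 (returned and relevant), FP = 3 (returned but irrelevant), FN = 6 (not returned but relevant), and TN = 3 (not returned and irrelevant):

$$ \text{accuracy}=\frac{TP+TN}{TP+FP+FN+TN} $$

$$ \text{recall}=\frac{TP}{TP+FN} $$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-331beac20ab137ce2ea9655a4c4b374763" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference answer]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-331beac20ab137ce2ea9655a4c4b374763" class="collapse collapse-content"><p></p>

Accuracy = $\frac{15+3}{15+3+6+3}$ = $\frac{18}{27}$ = $\frac{2}{3}$

Recall = $\frac{15}{15+6}$ = $\frac{15}{21}$ = $\frac{5}{7}$

Here the first metric is computed as accuracy, the share of all 27 documents that the system judged correctly. Precision in the strict information-retrieval sense, $\frac{TP}{TP+FP}$, would instead be $\frac{15}{15+3}$ = $\frac{5}{6}$.

<p></p></div></div></div>

### 2. Simple linear regression

The table below records the advertising volume and sales of a 4S car dealership. Assuming an advertising volume of 15, use a simple linear regression model to predict the sales.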

| Advertising volume | Sales |
| -------- | -------- |
| 1 | 6 |
| 7 | 21 |
| 3 | 10 |
| 5 | 18 |
| 9 | 25 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5d11f06559349304189300762e59c6aa26" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5d11f06559349304189300762e59c6aa26" class="collapse collapse-content"><p></p>

Just substitute the data into the simple linear regression equations:

$$ y=kx+b $$

$$ k=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2} $$

$$ b=\overline{y}-k\overline{x} $$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b2f99f05ae226bc691005ab556c951ca16" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b2f99f05ae226bc691005ab556c951ca16" class="collapse collapse-content"><p></p>

```matlab
Answer: Let the advertising volume be the independent variable x and the sales be the dependent variable y.
The regression line can then be written as y = kx + b,
where k is the slope and b is the intercept.
From the recorded data:
mean advertising volume x_bar = (1 + 7 + 3 + 5 + 9)/5 = 5
mean sales y_bar = (6 + 21 + 10 + 18 + 25)/5 = 16
so
k = [(1-5)*(6-16)+(7-5)*(21-16)+(3-5)*(10-16)+(5-5)*(18-16)+(9-5)*(25-16)]/[(1-5)^2+(7-5)^2+(3-5)^2+(5-5)^2+(9-5)^2]
  = (-4*(-10)+2*5+(-2)*(-6)+0*2+4*9)/(16+4+4+0+16)
  = 98/40
  = 2.45
From b = y_bar - k*x_bar = 16 - 2.45*5 = 3.75,
the regression equation is y = 2.45x + 3.75.
So when the advertising volume is 15, the predicted sales are
2.45*15 + 3.75 = 40.5
```

<p></p></div></div></div>

### 3. K-nearest neighbors (KNN) algorithm

Suppose the table below is a training set for diagnosing diabetes. Use the K-nearest neighbors (KNN) algorithm to predict whether user 8 has the disease, with k = 5 and Euclidean distance as the distance metric, and write out the prediction.

| No. | k | u | class |
| ------ | --- | --- | ------- |
| 1 | 2 | 3 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 6 | 2 | 1 |
| 4 | 1 | 4 | 1 |
| 5 | 3 | 7 | 0 |
| 6 | 5 | 2 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 3 | 8 | ? |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5ff4e9b7b29254e23fe14019bceadb1e69" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5ff4e9b7b29254e23fe14019bceadb1e69" class="collapse collapse-content"><p></p>

First, the formulas needed:

* Euclidean distance: $d=\sqrt{\sum_{j=1}^n(x_j-y_j)^2}$

For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

$$ d=\sqrt{(a_1-a_2)^2+(b_1-b_2)^2+(c_1-c_2)^2} $$

* Manhattan distance: $d=\sum_{j=1}^n|x_j-y_j|$

For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

$$ d=|a_1-a_2|+|b_1-b_2|+|c_1-c_2| $$

KNN steps:

1. Compute the distance from every training sample to the sample to be predicted.
2. Sort the samples by distance, from nearest to farthest.
3. Among the k nearest samples (the question usually gives k), find the class that occurs most often.
4. Assign the sample to be predicted to that class.

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c500ab198817a3fc14a3292b8e17540937" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c500ab198817a3fc14a3292b8e17540937" class="collapse collapse-content"><p></p>

```matlab
Answer: The question gives k = 5 and Euclidean distance.
The distances from each numbered sample to user 8 are:
dis[1] = sqrt((2-3)^2+(3-8)^2) = sqrt(26)
dis[2] = sqrt((4-3)^2+(4-8)^2) = sqrt(17)
dis[3] = sqrt((6-3)^2+(2-8)^2) = sqrt(45)
dis[4] = sqrt((1-3)^2+(4-8)^2) = sqrt(20)
dis[5] = sqrt((3-3)^2+(7-8)^2) = sqrt(1)
dis[6] = sqrt((5-3)^2+(2-8)^2) = sqrt(40)
dis[7] = sqrt((6-3)^2+(4-8)^2) = sqrt(25)
Note: sqrt denotes the square root.
With k = 5, take the 5 neighbors nearest to user 8.
Selected: dis[5], dis[2], dis[4], dis[7], dis[1]
Neighbors with class 0: dis[5], dis[7], dis[1], 3 in total.
Neighbors with class 1: dis[2], dis[4], 2 in total.
In conclusion: user 8 has class 0, i.e. user 8 does not have the disease.
```

<p></p></div></div></div>

### 4. Computing support, confidence, and lift

Given the shopping data set below, compute the support S(bread -> milk), the confidence C(bread -> milk), and the lift L(bread -> milk).

| Transaction | Items |
| ---------- | -------------------------------- |
| 1 | beer, bread, fries, aspirin |
| 2 | diapers, bread, wine, rice cereal, milk |
| 3 | Sprite, fries, milk |
| 4 | beer, milk, ice cream, fries |
| 5 | Sprite, coffee, milk, bread, beer |
| 6 | beer, fries |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0d66f2327115ff2c6e2a5fd5ec97327353" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0d66f2327115ff2c6e2a5fd5ec97327353" class="collapse collapse-content"><p></p>

Straight to the formulas. Here $N$ is the total number of transactions, $N(A)$ is the number of transactions containing A, and $N(A\&B)$ is the number containing both A and B.

<div class="tip inlineBlock info"> **1. Support** </div>

- The support $S(A\to B)$ is <span style='color:#A52A2A'>**the probability that A and B occur together**</span>

Formula: $S(A\to B)=\frac{N(A\&B)}{N}$

<div class="tip inlineBlock info"> **2. Confidence** </div>

- The confidence $C(A\to B)$ is <span style='color:#A52A2A'>**the probability that B also occurs given that A occurs**</span>

Formula: $C(A\to B)=\frac{N(A\&B)}{N(A)}$

<div class="tip inlineBlock info"> **3. Lift** </div>

- The lift $L(A\to B)$ is <span style='color:#A52A2A'>**how strongly the occurrence of A influences the occurrence of B**</span>

Formula: $L(A\to B)=\frac{C(A\to B)}{S(B)}$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-76d891756bcb94b0f2aa1f168b76613a85" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-76d891756bcb94b0f2aa1f168b76613a85" class="collapse collapse-content"><p></p>

```matlab
Answer: From the shopping data set,
S(bread->milk) = N(bread&milk)/N = 2/6 = 1/3
C(bread->milk) = N(bread&milk)/N(bread) = 2/3
S(milk) = 4/6 = 2/3
L(bread->milk) = C(bread->milk)/S(milk) = (2/3)/(2/3) = 1
```

<p></p></div></div></div>

### 5. K-means clustering algorithm

Suppose the K-means clustering algorithm is used to split the users in the table below into two clusters. Describe the steps of the K-means algorithm; you may choose the distance function freely.

| User | A | B | C |
| ------ | --- | --- | --- |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 4 | 1 |
| 3 | 4 | 6 | 7 |
| 4 | 3 | 1 | 3 |
| 5 | 1 | 2 | 1 |
| 6 | 6 | 3 | 2 |
| 7 | 5 | 5 | 4 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e9b5210d6c7372def29b2a4434dbab0691" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e9b5210d6c7372def29b2a4434dbab0691" class="collapse collapse-content"><p></p>

Video tutorial below:

<iframe class="iframe_video" src="https://player.bilibili.com/player.html?aid=797539164&bvid=BV1py4y1r7DN&cid=249834109&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3939675c48b4058a270d686f856ed66a31" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3939675c48b4058a270d686f856ed66a31" class="collapse collapse-content"><p></p>

```matlab
Answer: For the data in the table, I choose the Manhattan distance.
Of the 7 data points, take points 3 and 4 as the initial centers of clusters 1 and 2.
Using the Manhattan distance, compute the distance from each point to the two centers.
(1) Distance to center 3:
G1[1] = |1-4| + |1-6| + |2-7| = 13
G1[2] = |2-4| + |4-6| + |1-7| = 10
G1[3] = |4-4| + |6-6| + |7-7| = 0
G1[4] = |3-4| + |1-6| + |3-7| = 10
G1[5] = |1-4| + |2-6| + |1-7| = 13
G1[6] = |6-4| + |3-6| + |2-7| = 10
G1[7] = |5-4| + |5-6| + |4-7| = 5
(2) Distance to center 4:
G2[1] = |1-3| + |1-1| + |2-3| = 3
G2[2] = |2-3| + |4-1| + |1-3| = 6
G2[3] = |4-3| + |6-1| + |7-3| = 10
G2[4] = |3-3| + |1-1| + |3-3| = 0
G2[5] = |1-3| + |2-1| + |1-3| = 5
G2[6] = |6-3| + |3-1| + |2-3| = 6
G2[7] = |5-3| + |5-1| + |4-3| = 7
Assign each point to its nearest center:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Averaging the points in each cluster gives the two cluster means
A(4.5, 5.5, 5.5)
B(2.6, 2.2, 1.8)
which are the new cluster centers.
Recompute the distance from each point to the two centers.
(1) Distance to center A
dis_A[1] = |1-4.5| + |1-5.5| + |2-5.5| = 11.5
dis_A[2] = |2-4.5| + |4-5.5| + |1-5.5| = 8.5
dis_A[3] = |4-4.5| + |6-5.5| + |7-5.5| = 2.5
dis_A[4] = |3-4.5| + |1-5.5| + |3-5.5| = 8.5
dis_A[5] = |1-4.5| + |2-5.5| + |1-5.5| = 11.5
dis_A[6] = |6-4.5| + |3-5.5| + |2-5.5| = 7.5
dis_A[7] = |5-4.5| + |5-5.5| + |4-5.5| = 2.5
(2) Distance to center B
dis_B[1] = |1-2.6| + |1-2.2| + |2-1.8| = 3
dis_B[2] = |2-2.6| + |4-2.2| + |1-1.8| = 3.2
dis_B[3] = |4-2.6| + |6-2.2| + |7-1.8| = 10.4
dis_B[4] = |3-2.6| + |1-2.2| + |3-1.8| = 2.8
dis_B[5] = |1-2.6| + |2-2.2| + |1-1.8| = 2.6
dis_B[6] = |6-2.6| + |3-2.2| + |2-1.8| = 4.4
dis_B[7] = |5-2.6| + |5-2.2| + |4-1.8| = 7.4
Assign each point to its nearest center:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Since the assignments did not change, the computation stops.
So, as the question requires, the users are split into
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
```

<p></p></div></div></div>

Last modified: 25 December 2021

© Reposting permitted with attribution
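
As a cross-check on the hand calculation for question 5 (K-means), the same procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original answer: Python is an assumed choice of language, the names `USERS`, `manhattan`, `assign`, `update`, and `kmeans` are made up for this example, and it follows the worked answer in combining the Manhattan distance with mean-based center updates, starting from users 3 and 4 as the initial centers.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

# User table from question 5 (user id -> features A, B, C).
USERS: Dict[int, Point] = {
    1: (1, 1, 2),
    2: (2, 4, 1),
    3: (4, 6, 7),
    4: (3, 1, 3),
    5: (1, 2, 1),
    6: (6, 3, 2),
    7: (5, 5, 4),
}


def manhattan(p: Point, q: Point) -> float:
    """Manhattan (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points: Dict[int, Point], centers: List[Point]) -> Dict[int, int]:
    """Map each user id to the index of its nearest center (ties go to the lower index)."""
    return {
        uid: min(range(len(centers)), key=lambda c: manhattan(p, centers[c]))
        for uid, p in points.items()
    }


def update(points: Dict[int, Point], labels: Dict[int, int],
           centers: List[Point]) -> List[Point]:
    """Recompute each center as the coordinate-wise mean of its members."""
    new_centers: List[Point] = []
    for c, old in enumerate(centers):
        members = [points[uid] for uid, lab in labels.items() if lab == c]
        if not members:
            # An empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centers


def kmeans(points: Dict[int, Point], centers: List[Point], max_iter: int = 100):
    """Alternate assignment and update until the assignments stop changing."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Initial centers: users 3 and 4, as in the reference solution.
labels, centers = kmeans(USERS, [USERS[3], USERS[4]])
clusters = [sorted(u for u, lab in labels.items() if lab == c) for c in range(2)]
print(clusters)
print(centers)
```

Run on the table from question 5, this sketch groups users 3 and 7 into one cluster and users 1, 2, 4, 5, 6 into the other, with centers (4.5, 5.5, 5.5) and (2.6, 2.2, 1.8), which agrees with the reference solution. A different choice of initial centers or distance function may lead to a different split.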