<div class="tip inlineBlock success"> For multiple-choice questions, see the past pre-class quizzes. </div>

# Common Supervised Learning Algorithms

* Support vector machines (SVM)
* Linear regression
* Logistic regression
* Naive Bayes
* Linear discriminant analysis
* Decision trees
* k-nearest neighbors (k-NN)
* Multilayer perceptron

<div class="tip inlineBlock info"> K-means is an unsupervised learning algorithm. </div>

# Classic Cases by Category

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8c55d35592c2e6763800699712f8ada489" aria-expanded="true"><div class="accordion-toggle"><span style="">Association analysis</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8c55d35592c2e6763800699712f8ada489" class="collapse collapse-content"><p></p>

* **Market basket analysis**
* **Amazon's personalized recommendations**
* Pandora's Music Genome Project
* Target's big data marketing

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-aab658757ba4706659f4756ba2edd87920" aria-expanded="true"><div class="accordion-toggle"><span style="">Trend prediction</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-aab658757ba4706659f4756ba2edd87920" class="collapse collapse-content"><p></p>

- **Google Flu Trends**
- **Oscar predictions**
- Farecast case study
- Decide case study

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6ff337a358c15ce7b0b1c20ed9a4dcf773" aria-expanded="true"><div class="accordion-toggle"><span style="">Decision support</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6ff337a358c15ce7b0b1c20ed9a4dcf773" class="collapse
collapse-content"><p></p>

- **US presidential election**
- *House of Cards*

<p></p></div></div></div>

# Fill-in-the-Blank Questions

1. Data science is the science of acquiring **Blank 2** through **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a2323a2fa9c51a6ce448cfa1dec1495335" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a2323a2fa9c51a6ce448cfa1dec1495335" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **systematic study**

Blank 2: **a body of knowledge about data**

<p></p></div></div></div>

2. The field of visualization has three main branches: **Blank 1**, **Blank 2**, and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2b5a397afbca1823a9bacccf4046c68361" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2b5a397afbca1823a9bacccf4046c68361" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **scientific visualization**

Blank 2: **information visualization**

Blank 3: **visual analytics**

<p></p></div></div></div>

3. The characteristics of big data that make it useful for fighting crime are veracity, **Blank 1**, and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-54f51125d35d522b4a7df67d52d447fe53" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-54f51125d35d522b4a7df67d52d447fe53" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **variety**

Blank 2: **velocity**

PS: The 4 Vs of big data:

1. Volume
2. Variety
3. Veracity (value)
4. Velocity

<p></p></div></div></div>

4. Google's flu monitoring is an application of big data to **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d17bb42e8618a7aaa7e6bbb0e193fa0a52" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d17bb42e8618a7aaa7e6bbb0e193fa0a52" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **trend prediction**

PS: Applications of big data:

1. Prediction
2. Recommendation
3. Business intelligence analysis
4. Scientific research

<p></p></div></div></div>

5. Machine learning with labeled training samples is called **Blank 1** learning, while machine learning with unlabeled training samples is called **Blank 2** learning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2458b555c481586de9285bab4b3a406726" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2458b555c481586de9285bab4b3a406726" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **supervised**

Blank 2: **unsupervised**

<p></p></div></div></div>

6. Name two `python`-based Chinese text processing toolkits: **Blank 1**, **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-697b7a4ac2af5d52e34fda1b1a60f56d23" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-697b7a4ac2af5d52e34fda1b1a60f56d23" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **jieba**

Blank 2: **thulac**

PS:

* jieba (recommended)
* thulac (Tsinghua University): handles UTF-8 encoded text
* SnowNLP: handles unicode text, so input must be decoded to unicode before use; also covers the word segmentation used in its sentiment analysis
* pynlpir
* CoreNLP
* pyNLP (Harbin Institute of Technology)
* NLPIP: a word segmentation package that can handle ethnic minority languages

<p></p></div></div></div>

<div
class="tip inlineBlock warning"> The questions above are from last year's test. </div>

---

<div class="tip inlineBlock info"> The questions below are fill-in-the-blank questions from the past pre-class quizzes. </div>

7. In Python, the symbol that starts a comment is **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-89b341a8b28e1c396aa1c537c464a86328" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-89b341a8b28e1c396aa1c537c464a86328" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **#**

<p></p></div></div></div>

8. In Python, the **Blank 1** data structure can hold data of different types.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-af8b99325f8535234a4c2012c1364f0414" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-af8b99325f8535234a4c2012c1364f0414" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **list**

<p></p></div></div></div>

9. In Python, `else` combined with `if` is written as **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f2cc9b63be42395689fed523bc1ef8f285" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f2cc9b63be42395689fed523bc1ef8f285" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **elif**

<p></p></div></div></div>

10. The **Blank 1** library is a machine learning package for Python that supports the mainstream supervised and unsupervised machine learning methods.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse"
data-target="#collapse-3db6fcc0bc057799b98b5e76c4e1ba8432" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3db6fcc0bc057799b98b5e76c4e1ba8432" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **scikit-learn**

<p></p></div></div></div>

11. In Anaconda, a third-party library can be installed by putting the command **Blank 1** before the library name.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-99fafdcbdafbfaa2ded1729ecffa421b40" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-99fafdcbdafbfaa2ded1729ecffa421b40" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **pip install**

<p></p></div></div></div>

12. **Blank 1** is an extension library for the Python language that supports large multidimensional arrays and matrix operations, and also provides a large library of mathematical functions for array operations.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4c26c38d9adbb1756f6453ce21e3587477" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4c26c38d9adbb1756f6453ce21e3587477" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **NumPy**

<p></p></div></div></div>

13. To use the numpy library in Python, load it with the **Blank 1** statement.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e72546d2bf5351771f38ecfdd8ae6abc22" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div
class="panel-body collapse-panel-body"> <div id="collapse-e72546d2bf5351771f38ecfdd8ae6abc22" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **import numpy**

<p></p></div></div></div>

14. Supervised learning algorithms differ from unsupervised ones in that every training sample must be given a **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0509866da2d17abefde83b1a1415929f1" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0509866da2d17abefde83b1a1415929f1" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **label**

<p></p></div></div></div>

15. In the machine learning workflow, data is first collected and preprocessed; a **Blank 1** is then designed on the training set and its parameters are determined; finally it is used to make predictions on the test set.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3af8df03d0360edd3a090eae2fe7ad2910" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3af8df03d0360edd3a090eae2fe7ad2910" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **model**

<p></p></div></div></div>

16. **Blank 1** and **Blank 2** are the common big data processing platforms today.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-cfdbc3313c55894c05d1f5573d6497fb21" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-cfdbc3313c55894c05d1f5573d6497fb21" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **Hadoop**

Blank 2: **Spark**

<p></p></div></div></div>

17. In Python, **Blank 1** is used to indicate that a statement belongs to the block of the preceding statement.

<div
class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4bd5b86f4e0cbe0a8c193e0025a707e112" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4bd5b86f4e0cbe0a8c193e0025a707e112" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **indentation**

<p></p></div></div></div>

18. Market basket analysis falls roughly into two types, **Blank 1** and **Blank 2** market basket analysis; the fundamental reason the two take different approaches is that **Blank 3** differs greatly.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c176e8fc7e359c5040980f325b0a682952" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c176e8fc7e359c5040980f325b0a682952" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **American-style**

Blank 2: **Japanese-style**

Blank 3: **store floor area**

<p></p></div></div></div>

19. Association analysis with the Apriori algorithm involves two steps: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3f430bb363bd3e05adba5ba7aab40d4744" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3f430bb363bd3e05adba5ba7aab40d4744" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **finding frequent itemsets**

Blank 2: **mining association rules**

<p></p></div></div></div>

20. The purpose of visualization is to **Blank 1**; its primary principles are **Blank 2** and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-afdc7a69ec74cd4d3dca61f0aa2af06896"
aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-afdc7a69ec74cd4d3dca61f0aa2af06896" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **present complex data effectively**

Blank 2: **accuracy**

Blank 3: **clarity**

<p></p></div></div></div>

21. Video-based traffic volume detection currently relies on two main methods: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e981ed60d17728ae0239ebb3ac8ab96f36" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e981ed60d17728ae0239ebb3ac8ab96f36" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **setting up virtual detection loops**

Blank 2: **vehicle target tracking**

<p></p></div></div></div>

22. Traffic flow prediction is divided into **Blank 1** traffic flow prediction and **Blank 2** traffic flow prediction.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f40448fa8a762e4f6ed609f15e4591eb43" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f40448fa8a762e4f6ed609f15e4591eb43" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **short-term**

Blank 2: **medium- and long-term**

<p></p></div></div></div>

23. Supervised learning builds models from **Blank 1** data, while unsupervised learning builds models from **Blank 2** data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-bce18d8a270d5bc86883831b8be1f803100" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body
collapse-panel-body"> <div id="collapse-bce18d8a270d5bc86883831b8be1f803100" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **labeled**

Blank 2: **unlabeled**

<p></p></div></div></div>

24. An information retrieval system is evaluated with two metrics: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0582cee020799cea26ebb647442a1a4016" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0582cee020799cea26ebb647442a1a4016" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **accuracy**

Blank 2: **recall**

<p></p></div></div></div>

25. In text retrieval, the vector space model measures the distance between two vectors representing text by computing the **Blank 1** of the two vectors.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6d22d4dbab56611518ffbacacb0bd77877" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6d22d4dbab56611518ffbacacb0bd77877" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **cosine of the angle between them**

<p></p></div></div></div>

26. Text must be preprocessed before analysis: the text in a document undergoes **Blank 1** splitting and **Blank 2** segmentation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-788b09babd21d644790b2e2cea33882e13" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-788b09babd21d644790b2e2cea33882e13" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **sentence**

Blank 2: **word**

<p></p></div></div></div>

27. Text data can also be visualized; a common method is the **Blank 1**.

<div class="panel panel-default
collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7d81ee42e3a5a90ba08142e60ea8612f20" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7d81ee42e3a5a90ba08142e60ea8612f20" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **word cloud**

<p></p></div></div></div>

28. A brief overview of a document's content is called the document's **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a5e2e97d928f5aa1bdea2867f3763d3444" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a5e2e97d928f5aa1bdea2867f3763d3444" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **summary**

<p></p></div></div></div>

29. A string can be enclosed in **Blank 1** or **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-828ba99666f8ba054f5510b8a10849e090" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-828ba99666f8ba054f5510b8a10849e090" class="collapse collapse-content"><p></p>

Correct answer:

Blank 1: **single quotes**

Blank 2: **double quotes**

<p></p></div></div></div>

# Short-Answer Questions

1. Briefly explain what big data hubris is:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a86d3f81ea8e5efc9a46e310089fb32674" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div>
<div class="panel-body collapse-panel-body"> <div id="collapse-a86d3f81ea8e5efc9a46e310089fb32674" class="collapse collapse-content"><p></p>

The assumption that using big data makes it possible to disregard and completely replace traditional data collection methods.

<p></p></div></div></div>

2. How does trend prediction differ from association analysis?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-74793c85e4209c8da95717beddb2eb4b8" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-74793c85e4209c8da95717beddb2eb4b8" class="collapse collapse-content"><p></p>

The former focuses on modeling the correlations between data, while the latter focuses on discovering whether such correlations exist.

<p></p></div></div></div>

3. Briefly describe the idea behind content-based recommendation:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-56011bf79eb070a19a2073e27968ad3162" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-56011bf79eb070a19a2073e27968ad3162" class="collapse collapse-content"><p></p>

Items are classified by their content, and recommendations are made among similar items.

<p></p></div></div></div>

4. Name three common methods for smoothing noisy data during data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4ea8bd285d7a771683f85cdcea83d4e125" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4ea8bd285d7a771683f85cdcea83d4e125" class="collapse collapse-content"><p></p>

1. Binning
2. Regression
3. Clustering

<p></p></div></div></div>

5. Describe the types of market basket analysis and the settings in which each is applied.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-70bae9ca476018339afa1f9a4a3a00f568" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-70bae9ca476018339afa1f9a4a3a00f568" class="collapse collapse-content"><p></p>

There are two types of market basket analysis.

- The first is **American-style market basket analysis**, suited to stores with a **large floor area**, **many product categories**, and **widely separated display areas**, such as Walmart;
- The second is **Japanese-style market basket analysis**, suited to stores with a **small floor area**, **few product categories**, and **closely spaced display areas**, such as convenience stores.

<p></p></div></div></div>

6. Briefly explain what a line of best fit is.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8f4487a7620beb737b2c1b35c66eb4fc61" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8f4487a7620beb737b2c1b35c66eb4fc61" class="collapse collapse-content"><p></p>

The line of best fit is a straight line drawn on a scatter plot so that it passes as close to the data points as possible.

<p></p></div></div></div>

7. What is the principle behind trend prediction?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a1c3a9a5086dd58c9cb38c0b8444fb0954" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a1c3a9a5086dd58c9cb38c0b8444fb0954" class="collapse collapse-content"><p></p>

Collect data that may be related to the variable to be predicted, then build a prediction model.

<p></p></div></div></div>

8. Briefly describe demographic-based recommendation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5b60aadecba1b390d7de37dea528490422" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5b60aadecba1b390d7de37dea528490422" class="collapse collapse-content"><p></p>

Users are classified into groups, and the items a user likes are recommended to similar users.

<p></p></div></div></div>

9. Briefly describe frequent itemsets and association rules in the Apriori algorithm.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-15a4085ccb2f99e19ef56a89ad0cc6c668" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-15a4085ccb2f99e19ef56a89ad0cc6c668" class="collapse collapse-content"><p></p>

**Frequent itemset**: a set of items that are often purchased together.

**Association rule**: a rule describing how the items in a frequent itemset influence one another.

<p></p></div></div></div>

10. Briefly describe the decision tree method.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-71787d620e1f9b1c11c6071c55a59b1f17" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw
fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-71787d620e1f9b1c11c6071c55a59b1f17" class="collapse collapse-content"><p></p>

The decision tree method represents the probabilities of the natural states or conditions of a decision problem, the courses of action, the payoff values, and the predicted outcomes in a tree diagram, and uses that diagram to reflect the whole process of thinking, prediction, and decision making.

<p></p></div></div></div>

11. What are the steps in building a decision tree?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4abe80e72ed7e8e2083005004959b14314" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4abe80e72ed7e8e2083005004959b14314" class="collapse collapse-content"><p></p>

1. Feature selection
2. Decision tree generation
3. Decision tree pruning

<p></p></div></div></div>

12. Briefly describe data visualization.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f2881dfd1bd88953174e3da923846a1687" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f2881dfd1bd88953174e3da923846a1687" class="collapse collapse-content"><p></p>

Data visualization uses computer graphics and related techniques to present data graphically, conveying the information, patterns, and logic contained in the data in an intuitive way so that users can observe and understand it.

<p></p></div></div></div>

13. Briefly describe the typical processing workflow of a big data platform.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-48fcda12f8461df6b75d22b33f9a12c982" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-48fcda12f8461df6b75d22b33f9a12c982" class="collapse collapse-content"><p></p>

1. Data collection
2. Data storage
3. Data processing
4. Data presentation

<p></p></div></div></div>

14. Briefly describe traditional business data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c64573234aa0441b80f01e3270ae69aa66" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c64573234aa0441b80f01e3270ae69aa66" class="collapse collapse-content"><p></p>

Traditional business data is data from business systems such as enterprise ERP systems, POS terminals, and online payment systems, including automatically generated information such as audit records and logs.

<p></p></div></div></div>

15. Briefly describe Internet data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b233c92fa8f63afcaf24df6d1c0e848598" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b233c92fa8f63afcaf24df6d1c0e848598" class="collapse collapse-content"><p></p>

Internet data is the large volume of data generated during interactions in cyberspace.

<p></p></div></div></div>

16. Why is data preprocessing necessary?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0d3764875d19af0b020220ecdc36a2e448" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0d3764875d19af0b020220ecdc36a2e448" class="collapse collapse-content"><p></p>

Because databases are huge and draw on many heterogeneous data sources, today's real-world databases are highly susceptible to noisy, missing, and inconsistent data. Low-quality data leads to low-quality mining results, so data preprocessing is required.

<p></p></div></div></div>

17. Briefly list the methods of big data preprocessing.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a16192c979960ab9748a251573e807d657" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a16192c979960ab9748a251573e807d657" class="collapse collapse-content"><p></p>

1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction

<p></p></div></div></div>

18. Briefly state the purpose of data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-35e546405c218cf8a1a655e70cbb080914" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-35e546405c218cf8a1a655e70cbb080914" class="collapse collapse-content"><p></p>

The purpose of data cleaning is to correct existing errors and ensure data consistency.

<p></p></div></div></div>

19. What methods are there for data normalization?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0fe27447c853c78a7a77f03d64dacf5862" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0fe27447c853c78a7a77f03d64dacf5862" class="collapse collapse-content"><p></p>

1. Normalization (rescaling)
2. Standardization
3. Centering

<p></p></div></div></div>

20. What challenges does big data storage face?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f4f1a575b40a46eb00d7ba7de3650d1378" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f4f1a575b40a46eb00d7ba7de3650d1378" class="collapse collapse-content"><p></p>

1. Capacity
2. Latency
3. Security
4. Cost
5. Data accumulation
6. Flexibility
7. Application awareness

<p></p></div></div></div>

# Worked Problems

### 1. Computing accuracy and recall

An information retrieval system is evaluated with two metrics, accuracy and recall. Given the retrieval results in the table below, compute the system's accuracy and recall.

| | Actually relevant documents | Actually irrelevant documents |
| -------------------------------------- | ------------------ | -------------------- |
| Documents returned by the system (judged relevant) | 15 | 3 |
| Documents not returned by the system (judged irrelevant) | 6 | 3 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8ee17030d30686a1dd2d3ea678d7c65f8" aria-expanded="true"><div class="accordion-toggle"><span style="">【Algorithm Analysis】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8ee17030d30686a1dd2d3ea678d7c65f8" class="collapse collapse-content"><p></p>

There is little to analyze; just apply the definitions. Accuracy is the number of correctly judged documents (relevant and returned, plus irrelevant and not returned) divided by the total number of documents. Recall is the number of relevant documents returned divided by the total number of relevant documents.

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8276c6e9d61c4c6dfa020788a149b3be71" aria-expanded="true"><div class="accordion-toggle"><span style="">【Reference Solution】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8276c6e9d61c4c6dfa020788a149b3be71" class="collapse collapse-content"><p></p>

Accuracy = $\frac{15+3}{15+3+6+3}$ = $\frac{18}{27}$ = $\frac{2}{3}$

Recall = $\frac{15}{15+6}$ = $\frac{15}{21}$ = $\frac{5}{7}$

Note that the first value is accuracy; precision, the share of returned documents that are actually relevant, would be $\frac{15}{15+3}$ = $\frac{5}{6}$.

<p></p></div></div></div>

### 2. Simple linear regression

The table below records the advertising volume and sales volume of a 4S car dealership. Assuming an advertising volume of 15, use a simple linear regression model to predict the sales volume.
| Advertising volume | Sales volume |
| -------- | -------- |
| 1 | 6 |
| 7 | 21 |
| 3 | 10 |
| 5 | 18 |
| 9 | 25 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-65ad4d5a70d1abaf567fc88b7773969720" aria-expanded="true"><div class="accordion-toggle"><span style="">【Algorithm analysis】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-65ad4d5a70d1abaf567fc88b7773969720" class="collapse collapse-content"><p></p>

Substitute the data into the simple linear regression equations, where $k$ is the slope and $b$ is the intercept:

$$ y=kx+b $$

$$ k=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2} $$

$$ b=\overline{y}-k\overline{x} $$

<p></p></div></div></div> <div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0772c41052bce09d4c5e13d516fc34ca94" aria-expanded="true"><div class="accordion-toggle"><span style="">【Reference solution】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0772c41052bce09d4c5e13d516fc34ca94" class="collapse collapse-content"><p></p>

```matlab
Answer: Let the advertising volume be the independent variable x and the sales volume be the dependent variable y.
The linear regression equation is then y = kx + b, where k is the slope and b is the intercept.
From the data in the table:
Mean advertising volume x̄ = (1 + 7 + 3 + 5 + 9)/5 = 5
Mean sales volume ȳ = (6 + 21 + 10 + 18 + 25)/5 = 16
So
k = [(1-5)*(6-16)+(7-5)*(21-16)+(3-5)*(10-16)+(5-5)*(18-16)+(9-5)*(25-16)]/[(1-5)^2+(7-5)^2+(3-5)^2+(5-5)^2+(9-5)^2]
  = (-4*(-10)+2*5+(-2)*(-6)+0*2+4*9)/(16+4+4+0+16)
  = 98/40
  = 2.45
From b = ȳ - k*x̄ = 16 - 2.45*5 = 3.75
the regression equation is y = 2.45x + 3.75
So when the advertising volume is 15, the predicted sales volume is
2.45*15 + 3.75 = 40.5
```

<p></p></div></div></div>

### 3.K-nearest neighbors (KNN) algorithm

Suppose the table below is a training set for diagnosing diabetes. Use the K-nearest neighbors (KNN) algorithm to predict whether user 8 has the disease. With k=5 and Euclidean distance as the distance metric, write out the prediction.

| No. | k | u | class |
| ------ | --- | --- | ------- |
| 1 | 2 | 3 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 6 | 2 | 1 |
| 4 | 1 | 4 | 1 |
| 5 | 3 | 7 | 0
| 6 | 5 | 2 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 3 | 8 | ? |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f6eb98e8eecead70511298765a65659c67" aria-expanded="true"><div class="accordion-toggle"><span style="">【Algorithm analysis】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f6eb98e8eecead70511298765a65659c67" class="collapse collapse-content"><p></p>

First, the formulas needed:

- Euclidean distance: $d=\sqrt{\sum_{j=1}^n(x_j-y_j)^2}\quad$

$$ \text{For example, for } A(a_1,b_1,c_1) \text{ and } B(a_2,b_2,c_2): $$

$$ d=\sqrt{(a_1-a_2)^2+(b_1-b_2)^2+(c_1-c_2)^2}\quad $$

- Manhattan distance: $d=\sum_{j=1}^n|x_j-y_j|$

$$ \text{For example, for } A(a_1,b_1,c_1) \text{ and } B(a_2,b_2,c_2): $$

$$ d=|a_1-a_2|+|b_1-b_2|+|c_1-c_2| $$

KNN steps:

1. Compute the distance between every training sample and the sample to be predicted
2. Sort the samples by distance, nearest first
3. Among the k nearest samples (k is usually given in the question), find the class that occurs most often
4. Assign the sample to be predicted to that class

<p></p></div></div></div> <div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3f0b1854f242e27bebdf5093a7388d9553" aria-expanded="true"><div class="accordion-toggle"><span style="">【Reference solution】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3f0b1854f242e27bebdf5093a7388d9553" class="collapse collapse-content"><p></p>

```matlab
Answer: As given, k=5 and the Euclidean distance is used.
The distance from each numbered user to user 8 is:
dis[1] = sqrt((2-3)^2+(3-8)^2) = sqrt(26)
dis[2] = sqrt((4-3)^2+(4-8)^2) = sqrt(17)
dis[3] = sqrt((6-3)^2+(2-8)^2) = sqrt(45)
dis[4] = sqrt((1-3)^2+(4-8)^2) = sqrt(20)
dis[5] = sqrt((3-3)^2+(7-8)^2) = sqrt(1)
dis[6] = sqrt((5-3)^2+(2-8)^2) = sqrt(40)
dis[7] = sqrt((6-3)^2+(4-8)^2) = sqrt(25)
Note: sqrt denotes the square root.
With k=5, select the 5 nearest neighbors of user 8.
The selection is: dis[5], dis[2], dis[4], dis[7], dis[1]
The neighbors with class 0 are dis[5], dis[7], dis[1], 3 in total.
The neighbors with class 1 are dis[2], dis[4], 2 in total.
In conclusion: user 8 has class 0, that is, user 8 does not have the disease.
```

<p></p></div></div></div>

### 
4.Computing support, confidence, and lift

Given the shopping dataset below, compute the support S(bread -> milk), the confidence C(bread -> milk), and the lift L(bread -> milk).

| Transaction | Items |
| ---------- | -------------------------------- |
| 1 | beer, bread, fries, aspirin |
| 2 | diapers, bread, wine, rice cereal, milk |
| 3 | Sprite, fries, milk |
| 4 | beer, milk, ice cream, fries |
| 5 | Sprite, coffee, milk, bread, beer |
| 6 | beer, fries |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-437c4fefcc9005f3bc0b9ad11ec2c96474" aria-expanded="true"><div class="accordion-toggle"><span style="">【Algorithm analysis】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-437c4fefcc9005f3bc0b9ad11ec2c96474" class="collapse collapse-content"><p></p>

Without further ado, the formulas:

<div class="tip inlineBlock info"> **1.Support** </div>

- The support $S(A→B)$ is <span style='color:#A52A2A'>**the probability that A and B occur together**</span>

Formula: $S(A→B)=\frac{N(A\&B)}{N}\quad$

<div class="tip inlineBlock info"> **2.Confidence** </div>

- The confidence $C(A→B)$ is <span style='color:#A52A2A'>**the probability that B also occurs given that A occurs**</span>

Formula: $C(A→B)=\frac{N(A\&B)}{N(A)}\quad$

<div class="tip inlineBlock info"> **3.Lift** </div>

- The lift $L(A→B)$ is <span style='color:#A52A2A'>**how much the occurrence of A affects the occurrence of B**</span>

Formula: $L(A→B)=\frac{C(A→B)}{S(B)}\quad$

<p></p></div></div></div> <div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-121626e125b46c5d203c8c8ecb38171f68" aria-expanded="true"><div class="accordion-toggle"><span style="">【Reference solution】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-121626e125b46c5d203c8c8ecb38171f68" class="collapse collapse-content"><p></p>

```matlab
Answer: From the shopping dataset
S(bread->milk) = N(bread&milk)/N = 2/6 = 1/3
C(bread->milk) = N(bread&milk)/N(bread) = 2/3
S(milk) = 4/6 = 2/3
L(bread->milk) = C(bread->milk)/S(milk) = (2/3)/(2/3) = 1
```

<p></p></div></div></div>

### 5.K-means clustering algorithm
Suppose the K-means clustering algorithm is used to split the users in the table below into two classes. Describe the steps of the K-means clustering algorithm; you may choose any distance function.

| User | A | B | C |
| ------ | --- | --- | --- |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 4 | 1 |
| 3 | 4 | 6 | 7 |
| 4 | 3 | 1 | 3 |
| 5 | 1 | 2 | 1 |
| 6 | 6 | 3 | 2 |
| 7 | 5 | 5 | 4 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-46530e420dd7ca34af6575559786d87723" aria-expanded="true"><div class="accordion-toggle"><span style="">【Algorithm analysis】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-46530e420dd7ca34af6575559786d87723" class="collapse collapse-content"><p></p>

Video tutorial below:

<iframe class="iframe_video" src="https://player.bilibili.com/player.html?aid=797539164&bvid=BV1py4y1r7DN&cid=249834109&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>

<p></p></div></div></div> <div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5080649bdbac52f7c238995a201fe40224" aria-expanded="true"><div class="accordion-toggle"><span style="">【Reference solution】</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5080649bdbac52f7c238995a201fe40224" class="collapse collapse-content"><p></p>

```matlab
Answer: For the data in the table, I choose the Manhattan distance.
Among the 7 data points, users 3 and 4 are set as the initial centroids of class 1 and class 2.
Using the Manhattan distance, compute the distance from each point to the two centroids.
(1) Distance to centroid 1 (user 3):
G1[1] = |1-4| + |1-6| + |2-7| = 13
G1[2] = |2-4| + |4-6| + |1-7| = 10
G1[3] = |4-4| + |6-6| + |7-7| = 0
G1[4] = |3-4| + |1-6| + |3-7| = 10
G1[5] = |1-4| + |2-6| + |1-7| = 13
G1[6] = |6-4| + |3-6| + |2-7| = 10
G1[7] = |5-4| + |5-6| + |4-7| = 5
(2) Distance to centroid 2 (user 4):
G2[1] = |1-3| + |1-1| + |2-3| = 3
G2[2] = |2-3| + |4-1| + |1-3| = 6
G2[3] = |4-3| + |6-1| + |7-3| = 10
G2[4] = |3-3| + |1-1| + |3-3| = 0
G2[5] = |1-3| + |2-1| + |1-3| = 5
G2[6] = |6-3| + |3-1|
+ |2-3| = 6
G2[7] = |5-3| + |5-1| + |4-3| = 7
Assign each point to its nearest class:
Class 1: 3, 7
Class 2: 1, 2, 4, 5, 6
Averaging the points in each class gives the mean coordinates of the two classes
A(4.5, 5.5, 5.5)
B(2.6, 2.2, 1.8)
which are the new centroids of the two classes.
Then recompute the distance from each point to the two classes.
(1) Distance to class A
dis_A[1] = |1-4.5| + |1-5.5| + |2-5.5| = 11.5
dis_A[2] = |2-4.5| + |4-5.5| + |1-5.5| = 8.5
dis_A[3] = |4-4.5| + |6-5.5| + |7-5.5| = 2.5
dis_A[4] = |3-4.5| + |1-5.5| + |3-5.5| = 8.5
dis_A[5] = |1-4.5| + |2-5.5| + |1-5.5| = 11.5
dis_A[6] = |6-4.5| + |3-5.5| + |2-5.5| = 7.5
dis_A[7] = |5-4.5| + |5-5.5| + |4-5.5| = 2.5
(2) Distance to class B
dis_B[1] = |1-2.6| + |1-2.2| + |2-1.8| = 3
dis_B[2] = |2-2.6| + |4-2.2| + |1-1.8| = 3.2
dis_B[3] = |4-2.6| + |6-2.2| + |7-1.8| = 10.4
dis_B[4] = |3-2.6| + |1-2.2| + |3-1.8| = 2.8
dis_B[5] = |1-2.6| + |2-2.2| + |1-1.8| = 2.6
dis_B[6] = |6-2.6| + |3-2.2| + |2-1.8| = 4.4
dis_B[7] = |5-2.6| + |5-2.2| + |4-1.8| = 7.4
Assign each point to its nearest class:
Class 1: 3, 7
Class 2: 1, 2, 4, 5, 6
Since the assignments did not change, the computation stops.
So, as the question requires, the users are split into
Class 1: 3, 7
Class 2: 1, 2, 4, 5, 6
```

<p></p></div></div></div>

Last modified: 25 December 2021 © Reposting permitted with proper attribution
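The K-means walkthrough for question 5 can be checked mechanically. Below is a minimal Python sketch, not part of the original answer, that mirrors the hand calculation: Manhattan distance for the assignment step, coordinate means for the update step, and users 3 and 4 as the initial centroids. The function names (`manhattan`, `assign`, `update`, `kmeans`) are illustrative assumptions, and the sketch assumes no class ever becomes empty, which holds for this dataset.

```python
# Users 1-7 with features (A, B, C), copied from the table in question 5.
points = {
    1: (1, 1, 2),
    2: (2, 4, 1),
    3: (4, 6, 7),
    4: (3, 1, 3),
    5: (1, 2, 1),
    6: (6, 3, 2),
    7: (5, 5, 4),
}


def manhattan(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, centroids):
    """Map each user id to the index of its nearest centroid (ties go to the first)."""
    return {
        uid: min(range(len(centroids)), key=lambda c: manhattan(p, centroids[c]))
        for uid, p in points.items()
    }


def update(points, labels, k):
    """Recompute each centroid as the coordinate-wise mean of its members.

    Assumption: every class has at least one member.
    """
    centroids = []
    for c in range(k):
        members = [points[uid] for uid, lab in labels.items() if lab == c]
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centroids


def kmeans(points, init_ids, max_iter=100):
    """Iterate assignment and update until the assignments stop changing."""
    centroids = [points[uid] for uid in init_ids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, len(centroids))
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


labels, centroids = kmeans(points, init_ids=(3, 4))
clusters = [sorted(uid for uid, lab in labels.items() if lab == c) for c in range(2)]
print(clusters)   # [[3, 7], [1, 2, 4, 5, 6]]
print(centroids)
```

Pairing Manhattan distance with mean updates follows the worked answer; a stricter variant for Manhattan distance would update each centroid with the coordinate-wise median instead (often called k-medians).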