<div class="tip inlineBlock success"> For multiple-choice questions, see the earlier pre-class quizzes. </div>

# Common Supervised Learning Algorithms

* Support vector machines (SVM)
* Linear regression
* Logistic regression
* Naive Bayes
* Linear discriminant analysis
* Decision trees
* K-nearest neighbors (k-nearest neighbor algorithm)
* Multilayer perceptron

<div class="tip inlineBlock info"> K-means is an unsupervised learning algorithm. </div>

# Classic Case Studies by Category

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a5f67b8e6ddcc3a6b97831fa9807baaf87" aria-expanded="true"><div class="accordion-toggle"><span style="">Association analysis</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a5f67b8e6ddcc3a6b97831fa9807baaf87" class="collapse collapse-content"><p></p>

* **Market basket analysis**
* **Amazon's personalized recommendations**
* Pandora's Music Genome Project
* Target's big data marketing

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f9c6b8afcbd09b877ea4b19f126a527f0" aria-expanded="true"><div class="accordion-toggle"><span style="">Trend prediction</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f9c6b8afcbd09b877ea4b19f126a527f0" class="collapse collapse-content"><p></p>

- **Google Flu Trends**
- **Oscar predictions**
- Farecast case study
- Decide case study

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b3ba315f691d9639d9c96d0a5c9efd7155" aria-expanded="true"><div class="accordion-toggle"><span style="">Decision support</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b3ba315f691d9639d9c96d0a5c9efd7155" class="collapse collapse-content"><p></p>
- **The US presidential election**
- *House of Cards*

<p></p></div></div></div>

# Fill-in-the-Blank Questions

1. Data science is the science of obtaining **Blank 2** through **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-eaedfc2eb132836801ec9ac89dc3730e67" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-eaedfc2eb132836801ec9ac89dc3730e67" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **systematic study** Blank 2: **a body of knowledge about data**

<p></p></div></div></div>

2. The field of visualization has three main branches: **Blank 1**, **Blank 2**, and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3f5cb1531b51805317435784280604f182" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3f5cb1531b51805317435784280604f182" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **scientific visualization** Blank 2: **information visualization** Blank 3: **visual analytics**

<p></p></div></div></div>

3. The characteristics of big data that make it useful for fighting crime are veracity, **Blank 1**, and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-22763ebac52268334622dc200f628d3462" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-22763ebac52268334622dc200f628d3462" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **variety** Blank 2: **velocity**

PS: The 4 Vs of big data:

1. Volume
2. Variety
3. Veracity (value)
4. 
Velocity

<p></p></div></div></div>

4. Google Flu Trends is an application of big data in **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d44c488d08c8767cd34feaaba63df51137" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d44c488d08c8767cd34feaaba63df51137" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **trend prediction**

PS: Applications of big data:

1. Prediction
2. Recommendation
3. Business intelligence analysis
4. Scientific research

<p></p></div></div></div>

5. Machine learning with labeled training samples is called **Blank 1** learning, while machine learning with unlabeled training samples is called **Blank 2** learning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-29399d35bb493dee2fe1de3d73c6d6df40" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-29399d35bb493dee2fe1de3d73c6d6df40" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **supervised** Blank 2: **unsupervised**

<p></p></div></div></div>

6. Name two Chinese text processing toolkits based on `python`: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-595a8e60908590ae224649f07c3cf25573" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-595a8e60908590ae224649f07c3cf25573" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **jieba** Blank 2: **thulac**

PS:

* jieba (recommended)
* thulac (Tsinghua University): handles UTF-8 encoded text
* SnowNLP: handles unicode text, so input must be decoded to unicode before use; its sentiment analysis component includes word segmentation
* pynlpir
* CoreNLP
* pyNLP (Harbin Institute of Technology)
* NLPIP: a word segmentation package that can handle ethnic minority languages

<p></p></div></div></div>

<div 
class="tip inlineBlock warning"> The questions above are from last year's test. </div>

---

<div class="tip inlineBlock info"> The questions below are fill-in-the-blank questions from the pre-class quizzes. </div>

7. In Python, the comment symbol is **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-095efe8a0575839e428835c144b1fb9652" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-095efe8a0575839e428835c144b1fb9652" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **#**

<p></p></div></div></div>

8. In Python, the **Blank 1** data structure can hold items of different types.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-9f19612f68586d460d328d4a68488b9b42" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-9f19612f68586d460d328d4a68488b9b42" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **list**

<p></p></div></div></div>

9. In Python, `else` combined with `if` is written as **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-28cde82b3e5887caeb639d258891c37888" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-28cde82b3e5887caeb639d258891c37888" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **elif**

<p></p></div></div></div>

10. **Blank 1** is a machine learning package for Python that supports the mainstream supervised and unsupervised learning methods.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" 
data-target="#collapse-f1b0a7c3d2132659f1cedc2ca8d8d3552" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f1b0a7c3d2132659f1cedc2ca8d8d3552" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **scikit-learn**

<p></p></div></div></div>

11. In Anaconda, a third-party library can be installed by putting **Blank 1** before the library name.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c23260bb7b5bce65710fa348932ca7601" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c23260bb7b5bce65710fa348932ca7601" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **pip install**

<p></p></div></div></div>

12. **Blank 1** is an extension library for Python that supports large multidimensional array and matrix operations and also provides a large collection of mathematical functions for working with arrays.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f8b2d1c49d388cfaa6d96c77022efd4b48" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f8b2d1c49d388cfaa6d96c77022efd4b48" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **NumPy**

<p></p></div></div></div>

13. To use the numpy library in Python, load it with the **Blank 1** statement.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0fc621d2a439a1f8fa83e03428866c4c66" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body 
collapse-panel-body"> <div id="collapse-0fc621d2a439a1f8fa83e03428866c4c66" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **import numpy**

<p></p></div></div></div>

14. Supervised learning differs from unsupervised learning in that every training sample must be given a **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e857c7e5e01322fee54d75917df6066e69" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e857c7e5e01322fee54d75917df6066e69" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **label**

<p></p></div></div></div>

15. In a machine learning workflow, the data are first collected and preprocessed; a **Blank 1** is then designed on the training set and its parameters are determined; finally it is used to make predictions on the test set.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-55008383e083ab9deacaff50fd0c61c333" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-55008383e083ab9deacaff50fd0c61c333" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **model**

<p></p></div></div></div>

16. **Blank 1** and **Blank 2** are common big data processing platforms today.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-425bde5acab22753249413a8f8f418fc53" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-425bde5acab22753249413a8f8f418fc53" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **Hadoop** Blank 2: **Spark**

<p></p></div></div></div>

17. In Python, **Blank 1** is used to show that a statement belongs to the block of the preceding statement.

<div class="panel 
panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-cb1886796834052b017a6f8caf0efa1294" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-cb1886796834052b017a6f8caf0efa1294" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **indentation**

<p></p></div></div></div>

18. Market basket analysis falls roughly into two types, **Blank 1** and **Blank 2** market basket analysis; the fundamental reason the two take different approaches is that **Blank 3** differs greatly.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1f8e62240c3a128ad1cac13f18da467179" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1f8e62240c3a128ad1cac13f18da467179" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **American-style** Blank 2: **Japanese-style** Blank 3: **store floor area**

<p></p></div></div></div>

19. Association analysis with the Apriori algorithm involves two steps: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-02b19ff5057976fc10aa3867ad9dc84821" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-02b19ff5057976fc10aa3867ad9dc84821" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **finding frequent itemsets** Blank 2: **mining association rules**

<p></p></div></div></div>

20. The purpose of visualization is to **Blank 1**; its primary principles are **Blank 2** and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-360a54b8c9f0c167f7f1ac2d4aa367d727" 
aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-360a54b8c9f0c167f7f1ac2d4aa367d727" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **present complex data effectively** Blank 2: **accuracy** Blank 3: **clarity**

<p></p></div></div></div>

21. Video-based traffic volume detection currently relies on two main methods: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-51a2ce4e71a8ec1d9f2d35d125f9fcbb95" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-51a2ce4e71a8ec1d9f2d35d125f9fcbb95" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **setting up virtual loop detectors** Blank 2: **vehicle object tracking**

<p></p></div></div></div>

22. Traffic flow prediction is divided into **Blank 1** traffic flow prediction and **Blank 2** traffic flow prediction.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0b5d8c1a1c6a19853e247d1c4c08de9168" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0b5d8c1a1c6a19853e247d1c4c08de9168" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **short-term** Blank 2: **medium- and long-term**

<p></p></div></div></div>

23. Supervised learning builds models from **Blank 1** data, whereas unsupervised learning builds models from **Blank 2** data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7236caadce168c4b73d309859328e55818" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body 
collapse-panel-body"> <div id="collapse-7236caadce168c4b73d309859328e55818" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **labeled** Blank 2: **unlabeled**

<p></p></div></div></div>

24. An information system is evaluated with two metrics: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-065c6659e6402159b4167fd1b463e53099" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-065c6659e6402159b4167fd1b463e53099" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **accuracy** Blank 2: **recall**

<p></p></div></div></div>

25. In text retrieval, when the vector space model measures the distance between two vectors that represent text, its strategy is to compute the **Blank 1** between the two vectors.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-12e648e9810f90738beb151b56b113862" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-12e648e9810f90738beb151b56b113862" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **cosine of the angle**

<p></p></div></div></div>

26. Text must be preprocessed before analysis: the text in a document is split into **Blank 1** and then segmented into **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3d2a9d5344c5fbb00804a4646fe85a6192" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3d2a9d5344c5fbb00804a4646fe85a6192" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **sentences** Blank 2: **words**

<p></p></div></div></div>

27. Text data can also be visualized; a common method is the **Blank 1**.

<div class="panel panel-default 
collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-78fa9f989d4b214aa313d9d27b27c79652" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-78fa9f989d4b214aa313d9d27b27c79652" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **word cloud**

<p></p></div></div></div>

28. Condensing a document's content into a brief overview is called document **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0fd8b0ff44da139d72af5d50cc3eff4778" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0fd8b0ff44da139d72af5d50cc3eff4778" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **summarization**

<p></p></div></div></div>

29. A string can be enclosed in **Blank 1** or **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-79a81643da161dcc4b5d8d5d85b7d45d75" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-79a81643da161dcc4b5d8d5d85b7d45d75" class="collapse collapse-content"><p></p>

Correct answer: Blank 1: **single quotes** Blank 2: **double quotes**

<p></p></div></div></div>

# Short-Answer Questions

1. Briefly explain what big data hubris is:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8ec93434c85b64540accd89ade20f2a415" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> 
<div class="panel-body collapse-panel-body"> <div id="collapse-8ec93434c85b64540accd89ade20f2a415" class="collapse collapse-content"><p></p>

The belief that using big data makes it possible to completely disregard and replace traditional data collection methods.

<p></p></div></div></div>

2. How does trend prediction differ from association analysis?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b8165136f25fa1198c53388af946d21939" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b8165136f25fa1198c53388af946d21939" class="collapse collapse-content"><p></p>

The former focuses on modeling the correlations among the data; the latter focuses on discovering whether such correlations exist.

<p></p></div></div></div>

3. Briefly describe the idea behind content-based recommendation:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-bdc1d080fae42ccb6a9013344e08aaa473" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-bdc1d080fae42ccb6a9013344e08aaa473" class="collapse collapse-content"><p></p>

Classify items by their content, then recommend items that are similar to one another.

<p></p></div></div></div>

4. What are three common methods for smoothing noisy data during data cleaning?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-196494fd2f7301857e969ae32190a6e948" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-196494fd2f7301857e969ae32190a6e948" class="collapse collapse-content"><p></p>

1. Binning
2. Regression
3. 
Clustering

<p></p></div></div></div>

5. Describe the types of market basket analysis and the settings in which each is applied.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-90e7c3742407c9915243672b091072e92" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-90e7c3742407c9915243672b091072e92" class="collapse collapse-content"><p></p>

There are two types of market basket analysis.

- The first is **American-style market basket analysis**, suited to stores with a **large floor area**, **many product categories**, and **widely separated display areas**, such as Walmart;
- The second is **Japanese-style market basket analysis**, suited to stores with a **small floor area**, **few product categories**, and **closely spaced display areas**, such as convenience stores.

<p></p></div></div></div>

6. Briefly explain what a line of best fit is.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7b07b0e520c470a58050a08424e82bd146" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7b07b0e520c470a58050a08424e82bd146" class="collapse collapse-content"><p></p>

A line of best fit is a straight line drawn on a scatter plot so that it passes as close as possible to the data points.

<p></p></div></div></div>

7. What is the principle behind trend prediction? 
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a9238cccd77b5f638ab1b328626f602d69" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a9238cccd77b5f638ab1b328626f602d69" class="collapse collapse-content"><p></p>

Collect data that may be related to the variable to be predicted, then build a prediction model.

<p></p></div></div></div>

8. Briefly describe demographic-based recommendation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1239761d3929c35d4b0e3c495ddd5ffd89" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1239761d3929c35d4b0e3c495ddd5ffd89" class="collapse collapse-content"><p></p>

Classify the users, then recommend what a user likes to other, similar users.

<p></p></div></div></div>

9. Briefly describe frequent itemsets and association rules in the Apriori algorithm.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-318da371a77db5951664b6d08bc598d426" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-318da371a77db5951664b6d08bc598d426" class="collapse collapse-content"><p></p>

**Frequent itemset**: a set of items that are often purchased together.

**Association rule**: a rule describing how the items in a frequent itemset influence one another.

<p></p></div></div></div>

10. Briefly describe the decision tree method.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b7a861e780904e33468be91bb55ff70a22" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw 
fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b7a861e780904e33468be91bb55ff70a22" class="collapse collapse-content"><p></p>

The decision tree method represents the elements of a decision problem (the probabilities of its states of nature or conditions, the alternative courses of action, the gains and losses, the predicted outcomes, and so on) as a tree diagram, and uses that diagram to reflect the whole process of thinking, predicting, and deciding.

<p></p></div></div></div>

11. What are the steps in building a decision tree?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3e8ad2b1979bd27bdae1c7b8c465997f63" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3e8ad2b1979bd27bdae1c7b8c465997f63" class="collapse collapse-content"><p></p>

1. Feature selection
2. Tree generation
3. Tree pruning

<p></p></div></div></div>

12. Briefly describe data visualization.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-05f468a687e9ace8912f2173f639cfc558" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-05f468a687e9ace8912f2173f639cfc558" class="collapse collapse-content"><p></p>

Data visualization uses computer graphics and related techniques to present data graphically, conveying the information, patterns, and logic contained in the data in an intuitive way so that users can observe and understand them.

<p></p></div></div></div>

13. Briefly describe the typical processing workflow of a big data platform.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4c0fbba705a29823b90bd7f707ff0d0292" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4c0fbba705a29823b90bd7f707ff0d0292" class="collapse collapse-content"><p></p>

1. Data collection
2. Data storage
3. Data processing
4. 
Data presentation

<p></p></div></div></div>

14. Briefly describe traditional business data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b98dc6561ca8ad8e528af843dcb67e2c30" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b98dc6561ca8ad8e528af843dcb67e2c30" class="collapse collapse-content"><p></p>

Traditional business data is data from business systems such as enterprise ERP systems, POS terminals, and online payment systems, including automatically generated information such as audit records and logs.

<p></p></div></div></div>

15. Briefly describe Internet data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-52f776e08a684420dfe97d236b38f4d299" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-52f776e08a684420dfe97d236b38f4d299" class="collapse collapse-content"><p></p>

Internet data is the large volume of data generated through interactions in cyberspace.

<p></p></div></div></div>

16. Why is data preprocessing necessary? 
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ca158d14dc49843deddec2ae46d4ca9784" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ca158d14dc49843deddec2ae46d4ca9784" class="collapse collapse-content"><p></p>

Because databases are huge and draw on many heterogeneous data sources, real-world databases are highly susceptible to noise, missing values, and inconsistent data. Low-quality data leads to low-quality mining results, so data preprocessing is necessary.

<p></p></div></div></div>

17. Briefly describe the methods of big data preprocessing.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a303e37bb3dc86a534a3ae522788dda531" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a303e37bb3dc86a534a3ae522788dda531" class="collapse collapse-content"><p></p>

1. Data cleaning
2. Data integration
3. Data transformation
4. 
Data reduction

<p></p></div></div></div>

18. Briefly describe the purpose of data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f5c6d5dbc24160937976252211fcde6258" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f5c6d5dbc24160937976252211fcde6258" class="collapse collapse-content"><p></p>

Data cleaning aims to correct existing errors and make the data consistent.

<p></p></div></div></div>

19. What algorithms are used for data normalization?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5647cadf526dd768f2e1cc6f5419019318" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5647cadf526dd768f2e1cc6f5419019318" class="collapse collapse-content"><p></p>

1. Min-max normalization
2. Standardization
3. Centering

<p></p></div></div></div>

20. What challenges does big data storage face?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-afe1d2b70bd84951c2e914edb8f8c06a87" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-afe1d2b70bd84951c2e914edb8f8c06a87" class="collapse collapse-content"><p></p>

1. Capacity
2. Latency
3. Security
4. Cost
5. Data accumulation
6. Flexibility
7. 
Application awareness

<p></p></div></div></div>

# Open-Ended Questions

### 1. Computing accuracy and recall

An information retrieval system is evaluated with two metrics, accuracy and recall. Given the retrieval results in the table below, compute the system's accuracy and recall.

| | Actually relevant documents | Actually irrelevant documents |
| -------------------------------------- | ------------------ | -------------------- |
| Returned by the system (judged relevant) | 15 | 3 |
| Not returned by the system (judged irrelevant) | 6 | 3 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-89e419e73926952bc351b0be2d88421f10" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-89e419e73926952bc351b0be2d88421f10" class="collapse collapse-content"><p></p>

Nothing much to analyze here; just apply the formulas. Writing TP, FP, FN, and TN for the four cells of the table (TP = 15, FP = 3, FN = 6, TN = 3):

$$
\text{accuracy}=\frac{TP+TN}{TP+FP+FN+TN},\qquad \text{recall}=\frac{TP}{TP+FN}
$$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6966cbab950ff932d7a9db66210872dd92" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6966cbab950ff932d7a9db66210872dd92" class="collapse collapse-content"><p></p>

Accuracy = $\frac{15+3}{15+3+6+3} = \frac{18}{27} = \frac{2}{3}$

Recall = $\frac{15}{15+6} = \frac{15}{21} = \frac{5}{7}$

<p></p></div></div></div>

### 2. Simple linear regression

The table below records the advertising volume and sales volume of a 4S car dealership. If the advertising volume is 15, what sales volume does a simple linear regression model predict? 
| Advertising volume | Sales |
| --- | --- |
| 1 | 6 |
| 7 | 21 |
| 3 | 10 |
| 5 | 18 |
| 9 | 25 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8a526a4dccb274f8c3f2d19c3464eaeb15" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm Analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8a526a4dccb274f8c3f2d19c3464eaeb15" class="collapse collapse-content"><p></p>

Just substitute the data into the simple linear regression equations, where $\overline{x}$ and $\overline{y}$ are the sample means:

$$ y=kx+b $$

$$ k=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2} $$

$$ b=\overline{y}-k\overline{x} $$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5a0b3c43375c598c3c93c03c8372e45757" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference Solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5a0b3c43375c598c3c93c03c8372e45757" class="collapse collapse-content"><p></p>

```matlab
Answer: Let the advertising volume be the independent variable x and the sales be the dependent variable y.
Assume the linear regression equation y = kx + b, where k is the slope and b is the intercept.
From the data in the table:
Mean advertising volume x_mean = (1 + 7 + 3 + 5 + 9)/5 = 5
Mean sales y_mean = (6 + 21 + 10 + 18 + 25)/5 = 16
So
k = [(1-5)*(6-16)+(7-5)*(21-16)+(3-5)*(10-16)+(5-5)*(18-16)+(9-5)*(25-16)]/[(1-5)^2+(7-5)^2+(3-5)^2+(5-5)^2+(9-5)^2]
  = (-4*(-10)+2*5+(-2)*(-6)+0*2+4*9)/(16+4+4+0+16)
  = 98/40
  = 2.45
From b = y_mean - k*x_mean = 16 - 2.45*5 = 3.75
the regression equation is y = 2.45x + 3.75
So when the advertising volume is 15, the predicted sales are 2.45*15 + 3.75 = 40.5
```

<p></p></div></div></div>

### 3. K-Nearest Neighbors (KNN) Algorithm

Suppose the table below is a training set for diagnosing diabetes. Use the K-nearest neighbors (KNN) algorithm to predict whether user 8 has the disease, with k=5 and Euclidean distance as the distance metric. Write out the prediction.

| No. | k | u | class |
| --- | --- | --- | --- |
| 1 | 2 | 3 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 6 | 2 | 1 |
| 4 | 1 | 4 | 1 |
| 5 | 3 | 7 | 0 
| 6 | 5 | 2 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 3 | 8 | ? |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-36b909ed4a0d4c32df62116c05b262553" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm Analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-36b909ed4a0d4c32df62116c05b262553" class="collapse collapse-content"><p></p>

First, the formulas needed:

* Euclidean distance: $d=\sqrt{\sum_{j=1}^n(x_j-y_j)^2}$

  For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

  $$ d=\sqrt{(a_1-a_2)^2+(b_1-b_2)^2+(c_1-c_2)^2} $$

* Manhattan distance: $d=\sum_{j=1}^n|x_j-y_j|$

  For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

  $$ d=|a_1-a_2|+|b_1-b_2|+|c_1-c_2| $$

KNN steps:

1. Compute the distance between every training sample and the sample to be predicted.
2. Sort the samples by distance, from nearest to farthest.
3. Take the k nearest samples (k is usually given in the question) and find the class that occurs most often among them.
4. Assign the sample to be predicted to that class.

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ce9510f54c194922380dee2876a132cb92" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference Solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ce9510f54c194922380dee2876a132cb92" class="collapse collapse-content"><p></p>

```matlab
Answer: As given, k = 5 and the Euclidean distance is used.
The distance from each numbered user to user 8 is:
dis[1] = sqrt((2-3)^2+(3-8)^2) = sqrt(26)
dis[2] = sqrt((4-3)^2+(4-8)^2) = sqrt(17)
dis[3] = sqrt((6-3)^2+(2-8)^2) = sqrt(45)
dis[4] = sqrt((1-3)^2+(4-8)^2) = sqrt(20)
dis[5] = sqrt((3-3)^2+(7-8)^2) = sqrt(1)
dis[6] = sqrt((5-3)^2+(2-8)^2) = sqrt(40)
dis[7] = sqrt((6-3)^2+(4-8)^2) = sqrt(25)
Note: sqrt denotes the square root.
With k = 5, take the 5 nearest neighbors of user 8.
Selected: dis[5], dis[2], dis[4], dis[7], dis[1]
Neighbors with class 0: dis[5], dis[7], dis[1], 3 in total.
Neighbors with class 1: dis[2], dis[4], 2 in total.
Conclusion: user 8 is assigned class 0, i.e. user 8 does not have the disease.
```

<p></p></div></div></div>

### 
4. Computing Support, Confidence, and Lift

Given the shopping dataset below, compute the support S(bread -> milk), the confidence C(bread -> milk), and the lift L(bread -> milk).

| Shopping record | Items |
| --- | --- |
| 1 | beer, bread, fries, aspirin |
| 2 | diapers, bread, wine, rice cereal, milk |
| 3 | Sprite, fries, milk |
| 4 | beer, milk, ice cream, fries |
| 5 | Sprite, coffee, milk, bread, beer |
| 6 | beer, fries |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b0fd2efce4c6e5246cdcabdf2fdbaee937" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm Analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b0fd2efce4c6e5246cdcabdf2fdbaee937" class="collapse collapse-content"><p></p>

Straight to the formulas. Here $N$ is the total number of records, $N(A)$ is the number of records containing A, and $N(A\&B)$ is the number of records containing both A and B.

<div class="tip inlineBlock info"> **1. Support** </div>

- The support $S(A→B)$ is <span style='color:#A52A2A'>**the probability that A and B appear together**</span>

Formula: $S(A→B)=\frac{N(A\&B)}{N}$

<div class="tip inlineBlock info"> **2. Confidence** </div>

- The confidence $C(A→B)$ is <span style='color:#A52A2A'>**the probability that B also appears, given that A appears**</span>

Formula: $C(A→B)=\frac{N(A\&B)}{N(A)}$

<div class="tip inlineBlock info"> **3. Lift** </div>

- The lift $L(A→B)$ is <span style='color:#A52A2A'>**how much the occurrence of A affects the occurrence of B**</span>

Formula: $L(A→B)=\frac{C(A→B)}{S(B)}$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-59d8884bea700171a5bee168fc92cfee32" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference Solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-59d8884bea700171a5bee168fc92cfee32" class="collapse collapse-content"><p></p>

```matlab
Answer: From the shopping dataset,
S(bread->milk) = N(bread&milk)/N = 2/6 = 1/3
C(bread->milk) = N(bread&milk)/N(bread) = 2/3
S(milk) = 4/6 = 2/3
L(bread->milk) = C(bread->milk)/S(milk) = (2/3)/(2/3) = 1
```

<p></p></div></div></div>

### 5. K-means Clustering Algorithm
Suppose the K-means clustering algorithm is used to divide the users in the table below into two clusters. Describe the steps of the K-means algorithm; the distance function may be chosen freely.

| User | A | B | C |
| --- | --- | --- | --- |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 4 | 1 |
| 3 | 4 | 6 | 7 |
| 4 | 3 | 1 | 3 |
| 5 | 1 | 2 | 1 |
| 6 | 6 | 3 | 2 |
| 7 | 5 | 5 | 4 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0c2e1d5b7f50e599f5e47240723d310674" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm Analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0c2e1d5b7f50e599f5e47240723d310674" class="collapse collapse-content"><p></p>

Video tutorial:

<iframe class="iframe_video" src="https://player.bilibili.com/player.html?aid=797539164&bvid=BV1py4y1r7DN&cid=249834109&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0ba9967b5d904912ba9865d3acd7657b85" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference Solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0ba9967b5d904912ba9865d3acd7657b85" class="collapse collapse-content"><p></p>

```matlab
Answer: Based on the data in the table, I choose the Manhattan distance.
Among the 7 data points, take users 3 and 4 as the initial centers of cluster 1 and cluster 2.
Using the Manhattan distance, compute the distance from each point to the two centers.
(1) Distance to center 3:
G1[1] = |1-4| + |1-6| + |2-7| = 13
G1[2] = |2-4| + |4-6| + |1-7| = 10
G1[3] = |4-4| + |6-6| + |7-7| = 0
G1[4] = |3-4| + |1-6| + |3-7| = 10
G1[5] = |1-4| + |2-6| + |1-7| = 13
G1[6] = |6-4| + |3-6| + |2-7| = 10
G1[7] = |5-4| + |5-6| + |4-7| = 5
(2) Distance to center 4:
G2[1] = |1-3| + |1-1| + |2-3| = 3
G2[2] = |2-3| + |4-1| + |1-3| = 6
G2[3] = |4-3| + |6-1| + |7-3| = 10
G2[4] = |3-3| + |1-1| + |3-3| = 0
G2[5] = |1-3| + |2-1| + |1-3| = 5
G2[6] = |6-3| + |3-1| 
+ |2-3| = 6
G2[7] = |5-3| + |5-1| + |4-3| = 7
Assign each point to its nearest center:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Averaging the points in each cluster gives the two new centers:
A(4.5, 5.5, 5.5)
B(2.6, 2.2, 1.8)
Now recompute the distance from each point to the two new centers.
(1) Distance to center A:
dis_A[1] = |1-4.5| + |1-5.5| + |2-5.5| = 11.5
dis_A[2] = |2-4.5| + |4-5.5| + |1-5.5| = 8.5
dis_A[3] = |4-4.5| + |6-5.5| + |7-5.5| = 2.5
dis_A[4] = |3-4.5| + |1-5.5| + |3-5.5| = 8.5
dis_A[5] = |1-4.5| + |2-5.5| + |1-5.5| = 11.5
dis_A[6] = |6-4.5| + |3-5.5| + |2-5.5| = 7.5
dis_A[7] = |5-4.5| + |5-5.5| + |4-5.5| = 2.5
(2) Distance to center B:
dis_B[1] = |1-2.6| + |1-2.2| + |2-1.8| = 3
dis_B[2] = |2-2.6| + |4-2.2| + |1-1.8| = 3.2
dis_B[3] = |4-2.6| + |6-2.2| + |7-1.8| = 10.4
dis_B[4] = |3-2.6| + |1-2.2| + |3-1.8| = 2.8
dis_B[5] = |1-2.6| + |2-2.2| + |1-1.8| = 2.6
dis_B[6] = |6-2.6| + |3-2.2| + |2-1.8| = 4.4
dis_B[7] = |5-2.6| + |5-2.2| + |4-1.8| = 7.4
Assign each point to its nearest center:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Since the cluster memberships did not change, the iteration stops.
So, as the question requires, the users are divided into
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
```

<p></p></div></div></div>

Last modified: December 25, 2021 © Reposting permitted with proper attribution
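As a way to double-check the hand calculation in question 5, here is a minimal Python sketch of the same procedure: Manhattan distance, users 3 and 4 as the initial centers, and each center moved to the coordinate-wise mean of its members. The names `USERS`, `manhattan`, `mean_point`, and `kmeans_manhattan` are illustrative assumptions and are not part of the original answer.

```python
# Sketch of the question 5 walkthrough; all names here are illustrative.

# User id -> (A, B, C), copied from the table in question 5.
USERS = {
    1: (1, 1, 2),
    2: (2, 4, 1),
    3: (4, 6, 7),
    4: (3, 1, 3),
    5: (1, 2, 1),
    6: (6, 3, 2),
    7: (5, 5, 4),
}


def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_manhattan(data, centers, max_iter=100):
    """Repeat assignment and center update until assignments stop changing."""
    centers = list(centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each user joins the nearest center
        # (on a tie, the center with the lower index wins).
        new_assignment = {
            uid: min(range(len(centers)),
                     key=lambda c: manhattan(point, centers[c]))
            for uid, point in data.items()
        }
        if new_assignment == assignment:
            break  # memberships unchanged, so stop
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [data[uid] for uid, c in assignment.items() if c == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean_point(members)
    clusters = [
        sorted(uid for uid, c in assignment.items() if c == k)
        for k in range(len(centers))
    ]
    return clusters, centers


if __name__ == "__main__":
    clusters, centers = kmeans_manhattan(USERS, [USERS[3], USERS[4]])
    print(clusters)  # [[3, 7], [1, 2, 4, 5, 6]]
    print(centers)
```

The result depends on the initial centers, so starting from a different pair of users may produce a different split.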