<div class="tip inlineBlock success"> For multiple-choice questions, see the previous pre-class quizzes. </div>

# Common supervised learning algorithms

* Support vector machines
* Linear regression
* Logistic regression
* Naive Bayes
* Linear discriminant analysis
* Decision trees
* K-nearest neighbors (k-NN)
* Multilayer perceptron

<div class="tip inlineBlock info"> K-means is an unsupervised learning algorithm. </div>

# Classic case studies by category

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0b673f80afa91e61795ff84aa7b571cb89" aria-expanded="true"><div class="accordion-toggle"><span style="">Association analysis</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0b673f80afa91e61795ff84aa7b571cb89" class="collapse collapse-content"><p></p>

* **Market basket analysis**
* **Amazon's personalized recommendations**
* Pandora's Music Genome Project
* Target's big-data marketing

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d01af0928ea7ea38480d5aaa0c19fd6f3" aria-expanded="true"><div class="accordion-toggle"><span style="">Trend prediction</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d01af0928ea7ea38480d5aaa0c19fd6f3" class="collapse collapse-content"><p></p>

- **Google Flu Trends**
- **Oscar predictions**
- Farecast case study
- Decide case study

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2eccbe666a578c608f6d8e524fa1794660" aria-expanded="true"><div class="accordion-toggle"><span style="">Decision support</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2eccbe666a578c608f6d8e524fa1794660" class="collapse collapse-content"><p></p>
- **US presidential election**
- *House of Cards*

<p></p></div></div></div>

# Fill-in-the-blank questions

1. Data science is a science that uses **Blank 1** to obtain **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4e0ae4f6d0395724c526eab3ffcae7dc20" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4e0ae4f6d0395724c526eab3ffcae7dc20" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **systematic study**
Blank 2: **a body of knowledge related to data**

<p></p></div></div></div>

2. The field of visualization has three main branches: **Blank 1**, **Blank 2**, and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-abb0d91ab80de6b14ca296ed0076b9e514" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-abb0d91ab80de6b14ca296ed0076b9e514" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scientific visualization**
Blank 2: **information visualization**
Blank 3: **visual analytics**

<p></p></div></div></div>

3. The characteristics of big data that make it useful for fighting crime are veracity, **Blank 1**, and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-84b1ff62b80095fe2e23d7ae3f8f5da956" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-84b1ff62b80095fe2e23d7ae3f8f5da956" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **variety**
Blank 2: **velocity**

PS: the 4 V's of big data:

1. Volume
2. Variety
3. Veracity (value)
4.
Velocity

<p></p></div></div></div>

4. Google Flu Trends is an application of big data to **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5598daf49d3e41d0f3731aa5981a57fc28" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5598daf49d3e41d0f3731aa5981a57fc28" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **trend prediction**

PS: applications of big data:

1. Prediction
2. Recommendation
3. Business intelligence analysis
4. Scientific research

<p></p></div></div></div>

5. Machine learning with labeled training samples is called **Blank 1** learning, while machine learning with unlabeled training samples is called **Blank 2** learning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e7aa6ae5d6660738e76efb5911e7d21f46" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e7aa6ae5d6660738e76efb5911e7d21f46" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **supervised**
Blank 2: **unsupervised**

<p></p></div></div></div>

6. Name two `python`-based Chinese text processing toolkits: **Blank 1**, **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-64ebdb51aaacf8c74e8945c61f67354b39" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-64ebdb51aaacf8c74e8945c61f67354b39" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **jieba**
Blank 2: **thulac**

PS:

* jieba (recommended)
* thulac (Tsinghua University): works on UTF-8 encoded text
* SnowNLP: works on unicode strings, so input must be decoded to unicode first; includes the word segmentation used by its sentiment analysis
* pynlpir
* CoreNLP
* pyNLP (Harbin Institute of Technology)
* NLPIP: a word segmentation package that can handle ethnic minority languages

<p></p></div></div></div>

<div
class="tip inlineBlock warning"> The questions above are from last year's test. </div>

---

<div class="tip inlineBlock info"> The questions below are the fill-in-the-blank questions from the previous pre-class quizzes. </div>

7. In Python, the symbol that starts a comment is **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-aab8d412aef5ab5e8185b8369a7678d41" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-aab8d412aef5ab5e8185b8369a7678d41" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **#**

<p></p></div></div></div>

8. In Python, the **Blank 1** data structure can hold data of different types.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f2b66b77befa650c77ccc005f86b380c94" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f2b66b77befa650c77ccc005f86b380c94" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **list**

<p></p></div></div></div>

9. In Python, `else` combined with `if` is written as **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1d354dd22a45df8aafb22df5c7aba68532" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1d354dd22a45df8aafb22df5c7aba68532" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **elif**

<p></p></div></div></div>

10. The **Blank 1** library is a machine learning package for Python that supports the mainstream supervised and unsupervised machine learning methods.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse"
data-target="#collapse-ae3ea13904e6d6dc918515f872b174d650" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ae3ea13904e6d6dc918515f872b174d650" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scikit-learn**

<p></p></div></div></div>

11. In Anaconda, a third-party library can be installed by putting the command **Blank 1** in front of the library name.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-041a74f32073ee271cc4ec8aa7f53de945" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-041a74f32073ee271cc4ec8aa7f53de945" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **pip install**

<p></p></div></div></div>

12. **Blank 1** is an extension library for the Python language that supports large multi-dimensional array and matrix operations, and also provides a large collection of mathematical functions for array computation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-246118fe51cc13b0e4f2a9a23dc0c79046" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-246118fe51cc13b0e4f2a9a23dc0c79046" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **NumPy**

<p></p></div></div></div>

13. To use the numpy library in Python, load it with the **Blank 1** command.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-96482bd5975a843baad45a14822ee16267" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div
class="panel-body collapse-panel-body"> <div id="collapse-96482bd5975a843baad45a14822ee16267" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **import numpy**

<p></p></div></div></div>

14. Supervised learning algorithms differ from unsupervised ones in that **Blank 1** must be provided for the training samples.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4fe9236e6681cab526b1a4b72a54055365" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4fe9236e6681cab526b1a4b72a54055365" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **labels**

<p></p></div></div></div>

15. In a machine learning workflow, data is first collected and preprocessed; a **Blank 1** is then designed on the training set and its parameters are determined, after which it is used to make predictions on the test set.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b08b552b40aaea515189aa905d2d475213" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b08b552b40aaea515189aa905d2d475213" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **model**

<p></p></div></div></div>

16. **Blank 1** and **Blank 2** are currently common big data processing platforms.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-46449878fdcaab42eb899b35acde26ff40" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-46449878fdcaab42eb899b35acde26ff40" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **Hadoop**
Blank 2: **Spark**

<p></p></div></div></div>

17. In Python, **Blank 1** is used to show that the next statement belongs to the structure of the previous statement.

<div
class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f7c434fd271d2afd0d14c34e3e7d31c632" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f7c434fd271d2afd0d14c34e3e7d31c632" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **indentation**

<p></p></div></div></div>

18. Market basket analysis can be divided into two types, **Blank 1** and **Blank 2** market basket analysis; the fundamental reason the two take different approaches is that **Blank 3** differs greatly.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3e7e7e09e842a3797286edba5b38507f42" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3e7e7e09e842a3797286edba5b38507f42" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **American-style**
Blank 2: **Japanese-style**
Blank 3: **store floor area**

<p></p></div></div></div>

19. Association analysis with the Apriori algorithm takes two steps: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-46b9ff1dcff0e35c3bdc34ce905d1d4697" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-46b9ff1dcff0e35c3bdc34ce905d1d4697" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **finding frequent itemsets**
Blank 2: **mining association rules**

<p></p></div></div></div>

20. The purpose of visualization is to **Blank 1**; its primary principles are **Blank 2** and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-44aad9179806dab45d50d1a7dce8c6230"
aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-44aad9179806dab45d50d1a7dce8c6230" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **present complex data effectively**
Blank 2: **accuracy**
Blank 3: **clarity**

<p></p></div></div></div>

21. Video-based traffic volume detection currently relies on two main methods: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-23add33fbd988409949ddfda2bea53f722" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-23add33fbd988409949ddfda2bea53f722" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **setting up virtual loops (virtual detection coils)**
Blank 2: **vehicle target tracking**

<p></p></div></div></div>

22. Traffic flow prediction is divided into **Blank 1** traffic flow prediction and **Blank 2** traffic flow prediction.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-26ed2e1ad50b628177c80d33cfe4ea35100" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-26ed2e1ad50b628177c80d33cfe4ea35100" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **short-term**
Blank 2: **medium-to-long-term**

<p></p></div></div></div>

23. Supervised learning means modeling **Blank 1** data, while unsupervised learning means modeling **Blank 2** data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-61c62450ba60d5f6b05b6c40151c582110" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body
collapse-panel-body"> <div id="collapse-61c62450ba60d5f6b05b6c40151c582110" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **labeled**
Blank 2: **unlabeled**

<p></p></div></div></div>

24. An information retrieval system is evaluated on two metrics: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-9f2ee6e1fe2c409b84ce7c144fcea68392" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-9f2ee6e1fe2c409b84ce7c144fcea68392" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **accuracy**
Blank 2: **recall**

<p></p></div></div></div>

25. In text retrieval, when the vector space model measures the distance between two vectors that represent text, the strategy is to compute the **Blank 1** of the two vectors.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-17e2ad6abc6df92e6cd7f68d4ea2118512" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-17e2ad6abc6df92e6cd7f68d4ea2118512" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **cosine of the angle between them**

<p></p></div></div></div>

26. Text must be preprocessed before analysis: the text in a document needs to be split into **Blank 1** and segmented into **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1b07e95aaf7e3c0a46e4fa477aef353b23" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1b07e95aaf7e3c0a46e4fa477aef353b23" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **sentences**
Blank 2: **words**

<p></p></div></div></div>

27. Text data can also be visualized; a common method is **Blank 1**.

<div class="panel panel-default
collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d2a65b74499c3f58b6bcca0abd600ab34" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d2a65b74499c3f58b6bcca0abd600ab34" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **word cloud**

<p></p></div></div></div>

28. Briefly summarizing the content of a document is called document **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1efb63aeb77d2579b90e5eb6089509d487" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1efb63aeb77d2579b90e5eb6089509d487" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **summarization**

<p></p></div></div></div>

29. Strings can be enclosed in **Blank 1** or **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-16349f281df165cc2ab809c17242d37893" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-16349f281df165cc2ab809c17242d37893" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **single quotes**
Blank 2: **double quotes**

<p></p></div></div></div>

# Short-answer questions

1. Briefly explain what big data hubris is:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-665b88fff95b8efc778621c5bd2a3d0519" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div>
<div class="panel-body collapse-panel-body"> <div id="collapse-665b88fff95b8efc778621c5bd2a3d0519" class="collapse collapse-content"><p></p>

The belief that, with big data, traditional data collection methods can be completely ignored and replaced.

<p></p></div></div></div>

2. How does trend prediction differ from association analysis?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-02c03ed6244c1a6fc18136639de506119" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-02c03ed6244c1a6fc18136639de506119" class="collapse collapse-content"><p></p>

Trend prediction focuses on modeling the correlations in the data, while association analysis focuses on discovering that such correlations exist.

<p></p></div></div></div>

3. Briefly describe the idea behind content-based recommendation:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-406c55ce67c924e39075768d68599d1f25" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-406c55ce67c924e39075768d68599d1f25" class="collapse collapse-content"><p></p>

Classify items by their content, then recommend items that are similar to one another.

<p></p></div></div></div>

4. What are the three common methods for smoothing noisy data in the data cleaning stage?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6f3833bbc7e3dbc8a5d966e877d839c828" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6f3833bbc7e3dbc8a5d966e877d839c828" class="collapse collapse-content"><p></p>

1. Binning
2. Regression
3.
Clustering

<p></p></div></div></div>

5. Describe the types of market basket analysis and the settings in which each is applied.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-500b2b30ce3edb0d3be876352666fa4459" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-500b2b30ce3edb0d3be876352666fa4459" class="collapse collapse-content"><p></p>

There are two types of market basket analysis.

- The first is **American-style market basket analysis**, suited to stores with **a large floor area**, **many product categories**, and **large distances between product display areas**, such as Walmart;
- The second is **Japanese-style market basket analysis**, suited to stores with **a small floor area**, **few product categories**, and **small distances between product display areas**, such as convenience stores.

<p></p></div></div></div>

6. Briefly explain what a line of best fit is.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ca391c9841f037957b6057b8c20b835a34" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ca391c9841f037957b6057b8c20b835a34" class="collapse collapse-content"><p></p>

A line of best fit is a straight line drawn on a scatter plot so that it passes as close to the data points as possible.

<p></p></div></div></div>

7. What is the principle behind trend prediction?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-07cb6f2699d534e6fd35c602a076b1f819" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-07cb6f2699d534e6fd35c602a076b1f819" class="collapse collapse-content"><p></p>

Collect data that may be related to the variable to be predicted, then build a predictive model.

<p></p></div></div></div>

8. Briefly describe demographic-based recommendation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-abb55c7f3753d3e68e44adb2b8d216ed6" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-abb55c7f3753d3e68e44adb2b8d216ed6" class="collapse collapse-content"><p></p>

Classify users into groups, then recommend what a user likes to other, similar users.

<p></p></div></div></div>

9. Briefly describe frequent itemsets and association rules in the Apriori algorithm.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7af8035d1f97356a00c6d8be80286d3e75" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7af8035d1f97356a00c6d8be80286d3e75" class="collapse collapse-content"><p></p>

**Frequent itemset**: a set of items that are often purchased together.

**Association rule**: a rule describing how the items in a frequent itemset influence one another.

<p></p></div></div></div>

10. Briefly describe the decision tree method.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-651db1f78d02747ebd5632dba8de24b046" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw
fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-651db1f78d02747ebd5632dba8de24b046" class="collapse collapse-content"><p></p>

The decision tree method represents the states of nature or conditions of a decision problem, their probabilities of occurrence, the courses of action, the payoffs (gains and losses), and the predicted outcomes in a tree diagram, and uses that diagram to reflect the whole process of thinking, predicting, and deciding.

<p></p></div></div></div>

11. What are the steps of a decision tree?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d553bc88ae5ea901198ee11fedf0fe6994" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d553bc88ae5ea901198ee11fedf0fe6994" class="collapse collapse-content"><p></p>

1. Feature selection
2. Tree generation
3. Tree pruning

<p></p></div></div></div>

12. Briefly describe data visualization.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-fb8894dc0ea0028574ebe30c601264b918" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-fb8894dc0ea0028574ebe30c601264b918" class="collapse collapse-content"><p></p>

Data visualization means using computer graphics and related techniques to present data graphically, conveying the information, patterns, and logic contained in the data intuitively so that users can observe and understand it.

<p></p></div></div></div>

13. Briefly describe the general processing workflow of a big data platform.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4d5c6795d400f07c3a66cad314b3fbee76" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4d5c6795d400f07c3a66cad314b3fbee76" class="collapse collapse-content"><p></p>

1. Data collection
2. Data storage
3. Data processing
4.
Data presentation

<p></p></div></div></div>

14. Briefly describe traditional business data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b92a965ddcc948ae65f081a5246d294b7" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b92a965ddcc948ae65f081a5246d294b7" class="collapse collapse-content"><p></p>

Traditional business data is data from business systems such as enterprise ERP systems, POS terminals, and online payment systems, including automatically generated information such as audit records and logs.

<p></p></div></div></div>

15. Briefly describe Internet data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-fbf0a740617c1a08c3042579ec53a82583" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-fbf0a740617c1a08c3042579ec53a82583" class="collapse collapse-content"><p></p>

Internet data is the large volume of data generated by interactions in cyberspace.

<p></p></div></div></div>

16. Why is data preprocessing necessary?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-457dc606df076539692ed424b356c0cb55" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-457dc606df076539692ed424b356c0cb55" class="collapse collapse-content"><p></p>

Because databases are huge and draw on many heterogeneous data sources, real-world databases are highly susceptible to noise, missing values, and inconsistent data. Low-quality data leads to low-quality mining results, so data preprocessing is needed.

<p></p></div></div></div>

17. Briefly list the methods of big data preprocessing.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-60655a1d845aacac90918acccf893d8098" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-60655a1d845aacac90918acccf893d8098" class="collapse collapse-content"><p></p>

1. Data cleaning
2. Data integration
3. Data transformation
4.
Data reduction

<p></p></div></div></div>

18. Briefly state the purpose of data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8543645b74198422b85b6526723d983656" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8543645b74198422b85b6526723d983656" class="collapse collapse-content"><p></p>

The purpose of data cleaning is to correct existing errors and make the data consistent.

<p></p></div></div></div>

19. What algorithms are used for data normalization?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-67aeed9098f8a4291a8dbf5b9ec03b7e95" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-67aeed9098f8a4291a8dbf5b9ec03b7e95" class="collapse collapse-content"><p></p>

1. Normalization (rescaling to a fixed range)
2. Standardization
3. Centering

<p></p></div></div></div>

20. What challenges does big data storage face?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-01bf7fd96a044b03aaa881c1cd6a4fe133" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-01bf7fd96a044b03aaa881c1cd6a4fe133" class="collapse collapse-content"><p></p>

1. Capacity
2. Latency
3. Security
4. Cost
5. Data accumulation
6. Flexibility
7.
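Application awareness

<p></p></div></div></div>

# Open-ended problems

### 1. Computing accuracy and recall

An information retrieval system is evaluated on two metrics, accuracy and recall. Given the retrieval results in the table below, compute the system's accuracy and recall.

|                                                      | Actually relevant documents | Actually irrelevant documents |
| ---------------------------------------------------- | --------------------------- | ----------------------------- |
| Documents returned by the system (judged relevant)       | 15 | 3 |
| Documents not returned by the system (judged irrelevant) | 6  | 3 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-305375aa2a77b6a530e9d340e3a23e6380" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-305375aa2a77b6a530e9d340e3a23e6380" class="collapse collapse-content"><p></p>

There is not much to analyze; just apply the formulas in the figure:

![Accuracy and recall.png](https://blog.fivk.cn/usr/uploads/2021/11/2178677838.png)

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d6ba95d4d5b505ca5638d4acdc4e5aa854" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d6ba95d4d5b505ca5638d4acdc4e5aa854" class="collapse collapse-content"><p></p>

Accuracy = $\frac{15+3}{15+3+6+3} = \frac{2}{3}$

Recall = $\frac{15}{15+6} = \frac{5}{7}$

Note: the first value is accuracy, the share of all documents the system judged correctly (relevant and returned, plus irrelevant and not returned). Precision, the metric more often paired with recall in information retrieval, would instead be $\frac{15}{15+3} = \frac{5}{6}$.

<p></p></div></div></div>

### 2. Simple linear regression

The table below records the advertising volume and sales volume of a 4S car dealership. Assuming an advertising volume of 15, what sales volume does a simple linear regression model predict?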
| Advertising volume | Sales volume |
| -------- | -------- |
| 1 | 6 |
| 7 | 21 |
| 3 | 10 |
| 5 | 18 |
| 9 | 25 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3eb53ee458ce2b3edd19c029735c48dd59" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3eb53ee458ce2b3edd19c029735c48dd59" class="collapse collapse-content"><p></p>

Substitute the data into the simple linear regression equations:

$$
y=kx+b
$$

$$
k=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2}
$$

$$
b=\overline{y}-k\overline{x}
$$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-49bb0595441520253af9f04bea30b9a944" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-49bb0595441520253af9f04bea30b9a944" class="collapse collapse-content"><p></p>

```matlab
Answer: Let the advertising volume be the independent variable x and the sales volume the dependent variable y.
Assume the linear regression equation y = kx + b, where k is the slope and b is the intercept.
From the data in the table:
Mean advertising volume x_mean = (1 + 7 + 3 + 5 + 9)/5 = 5
Mean sales volume y_mean = (6 + 21 + 10 + 18 + 25)/5 = 16
So
k = [(1-5)*(6-16)+(7-5)*(21-16)+(3-5)*(10-16)+(5-5)*(18-16)+(9-5)*(25-16)]/[(1-5)^2+(7-5)^2+(3-5)^2+(5-5)^2+(9-5)^2]
  = ((-4)*(-10)+2*5+(-2)*(-6)+0*2+4*9)/(16+4+4+0+16)
  = 98/40
  = 2.45
From b = y_mean - k*x_mean = 16 - 2.45*5 = 3.75
the regression equation is y = 2.45x + 3.75
So when the advertising volume is 15, the predicted sales volume is
2.45*15 + 3.75 = 40.5
```

<p></p></div></div></div>

### 3. K-nearest neighbors (KNN) algorithm

Suppose the table below is a training set for diagnosing diabetes. Use the K-nearest neighbors (KNN) algorithm to predict whether user 8 has the disease, with k=5 and Euclidean distance as the distance metric. Write out the prediction.

| No. | k | u | class |
| ------ | --- | --- | ------- |
| 1 | 2 | 3 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 6 | 2 | 1 |
| 4 | 1 | 4 | 1 |
| 5 | 3 | 7 | 0 |
| 6 | 5 | 2 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 3 | 8 | ? |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-94f49421a17f03751f26a530f2b9873f47" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-94f49421a17f03751f26a530f2b9873f47" class="collapse collapse-content"><p></p>

First, the formulas that are needed:

* Euclidean distance: $d=\sqrt{\sum_{j=1}^n(x_j-y_j)^2}$

For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

$$
d=\sqrt{(a_1-a_2)^2+(b_1-b_2)^2+(c_1-c_2)^2}
$$

* Manhattan distance: $d=\sum_{j=1}^n|x_j-y_j|$

For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

$$
d=|a_1-a_2|+|b_1-b_2|+|c_1-c_2|
$$

KNN calculation steps:

1. Compute the distance from every training sample to the sample being predicted
2. Sort the samples by distance, nearest first
3. Take the k nearest samples (the value of k is usually given in the question) and find the class that occurs most often among them
4. Assign the sample being predicted to the class found in step 3

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-206e1159a0916ea0ea31ba3e5592ff0e3" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-206e1159a0916ea0ea31ba3e5592ff0e3" class="collapse collapse-content"><p></p>

```matlab
Answer: As given, k=5 and the Euclidean distance is used.
The distances from each numbered user to user 8 are:
dis[1] = sqrt((2-3)^2+(3-8)^2) = sqrt(26)
dis[2] = sqrt((4-3)^2+(4-8)^2) = sqrt(17)
dis[3] = sqrt((6-3)^2+(2-8)^2) = sqrt(45)
dis[4] = sqrt((1-3)^2+(4-8)^2) = sqrt(20)
dis[5] = sqrt((3-3)^2+(7-8)^2) = sqrt(1)
dis[6] = sqrt((5-3)^2+(2-8)^2) = sqrt(40)
dis[7] = sqrt((6-3)^2+(4-8)^2) = sqrt(25)
Note: sqrt denotes the square root.
Since k=5, select the 5 neighbors nearest to user 8.
The selection is: dis[5], dis[2], dis[4], dis[7], dis[1]
The neighbors with class 0 are dis[5], dis[7], dis[1], 3 in total.
The neighbors with class 1 are dis[2], dis[4], 2 in total.
In summary: user 8 has class 0, that is, user 8 does not have the disease.
```

<p></p></div></div></div>

### 4. Computing support, confidence, and lift

Given the shopping data set below, compute the support S(bread → milk), the confidence C(bread → milk), and the lift L(bread → milk).

| Shopping record | Items |
| ---------- | -------------------------------- |
| 1 | Beer, bread, potato chips, aspirin |
| 2 | Diapers, bread, wine, rice cereal, milk |
| 3 | Sprite, potato chips, milk |
| 4 | Beer, milk, ice cream, potato chips |
| 5 | Sprite, coffee, milk, bread, beer |
| 6 | Beer, potato chips |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-969be83eb8a9c3bc757243798ea8d54257" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-969be83eb8a9c3bc757243798ea8d54257" class="collapse collapse-content"><p></p>

Straight to the formulas, where $N$ is the total number of records and $N(\cdot)$ counts the records containing the given items:

<div class="tip inlineBlock info">

**1. Support**

</div>

- Support $S(A \to B)$ is the <span style='color:#A52A2A'>**probability that A and B occur together**</span>

Formula: $S(A \to B)=\frac{N(A\&B)}{N}$

<div class="tip inlineBlock info">

**2. Confidence**

</div>

- Confidence $C(A \to B)$ is the <span style='color:#A52A2A'>**probability that B also occurs given that A occurs**</span>

Formula: $C(A \to B)=\frac{N(A\&B)}{N(A)}$

<div class="tip inlineBlock info">

**3. Lift**

</div>

- Lift $L(A \to B)$ is the <span style='color:#A52A2A'>**degree to which the occurrence of A influences the occurrence of B**</span>

Formula: $L(A \to B)=\frac{C(A \to B)}{S(B)}$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5c3880508c16408d386932fdf9f8bb040" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5c3880508c16408d386932fdf9f8bb040" class="collapse collapse-content"><p></p>

```matlab
Answer: From the shopping data set,
S(bread->milk) = N(bread&milk)/N = 2/6 = 1/3
C(bread->milk) = N(bread&milk)/N(bread) = 2/3
S(milk) = 4/6 = 2/3
L(bread->milk) = C(bread->milk)/S(milk) = (2/3)/(2/3) = 1
```

<p></p></div></div></div>

### 5. K-means clustering algorithm
Suppose the K-means clustering algorithm is used to split the users in the table below into two clusters. Describe the steps of the K-means algorithm; you may choose any distance function.

| User | A | B | C |
| ------ | --- | --- | --- |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 4 | 1 |
| 3 | 4 | 6 | 7 |
| 4 | 3 | 1 | 3 |
| 5 | 1 | 2 | 1 |
| 6 | 6 | 3 | 2 |
| 7 | 5 | 5 | 4 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4a8d7eaf3d2234a7505395abefbfe60f90" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4a8d7eaf3d2234a7505395abefbfe60f90" class="collapse collapse-content"><p></p>

See the video tutorial below:

<iframe class="iframe_video" src="https://player.bilibili.com/player.html?aid=797539164&bvid=BV1py4y1r7DN&cid=249834109&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a6c9db9f460abca6247ce229b9bcf05e68" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a6c9db9f460abca6247ce229b9bcf05e68" class="collapse collapse-content"><p></p>

```matlab
Answer: For the data in the table, I choose the Manhattan distance.
Among the 7 data points, set users 3 and 4 as the initial centroids of cluster 1 and cluster 2.
Use the Manhattan distance to compute the distance from every point to the two centroids.
(1) Distance to the centroid at user 3:
G1[1] = |1-4| + |1-6| + |2-7| = 13
G1[2] = |2-4| + |4-6| + |1-7| = 10
G1[3] = |4-4| + |6-6| + |7-7| = 0
G1[4] = |3-4| + |1-6| + |3-7| = 10
G1[5] = |1-4| + |2-6| + |1-7| = 13
G1[6] = |6-4| + |3-6| + |2-7| = 10
G1[7] = |5-4| + |5-6| + |4-7| = 5
(2) Distance to the centroid at user 4:
G2[1] = |1-3| + |1-1| + |2-3| = 3
G2[2] = |2-3| + |4-1| + |1-3| = 6
G2[3] = |4-3| + |6-1| + |7-3| = 10
G2[4] = |3-3| + |1-1| + |3-3| = 0
G2[5] = |1-3| + |2-1| + |1-3| = 5
G2[6] = |6-3| + |3-1| + |2-3| = 6
G2[7] = |5-3| + |5-1| + |4-3| = 7
Assign each point to its nearest centroid:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Averaging the points in each cluster gives the mean coordinates
A(4.5, 5.5, 5.5)
B(2.6, 2.2, 1.8)
which are the new centroids of the two clusters.
Compute the distance from every point to the two new centroids.
(1) Distance to centroid A:
dis_A[1] = |1-4.5| + |1-5.5| + |2-5.5| = 11.5
dis_A[2] = |2-4.5| + |4-5.5| + |1-5.5| = 8.5
dis_A[3] = |4-4.5| + |6-5.5| + |7-5.5| = 2.5
dis_A[4] = |3-4.5| + |1-5.5| + |3-5.5| = 8.5
dis_A[5] = |1-4.5| + |2-5.5| + |1-5.5| = 11.5
dis_A[6] = |6-4.5| + |3-5.5| + |2-5.5| = 7.5
dis_A[7] = |5-4.5| + |5-5.5| + |4-5.5| = 2.5
(2) Distance to centroid B:
dis_B[1] = |1-2.6| + |1-2.2| + |2-1.8| = 3
dis_B[2] = |2-2.6| + |4-2.2| + |1-1.8| = 3.2
dis_B[3] = |4-2.6| + |6-2.2| + |7-1.8| = 10.4
dis_B[4] = |3-2.6| + |1-2.2| + |3-1.8| = 2.8
dis_B[5] = |1-2.6| + |2-2.2| + |1-1.8| = 2.6
dis_B[6] = |6-2.6| + |3-2.2| + |2-1.8| = 4.4
dis_B[7] = |5-2.6| + |5-2.2| + |4-1.8| = 7.4
Assign each point to its nearest centroid:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
The assignments did not change, so the iteration stops.
Therefore, as the question requires, the users are split into
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
```

<p></p></div></div></div>

Last modified: December 25, 2021 © Reposting permitted with proper attribution
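
As a cross-check of the hand calculation in question 5, here is a minimal Python sketch; it is not the author's program. It assumes the same choices as the worked answer: users 3 and 4 as initial centroids, Manhattan distance for the assignment step, and the coordinate-wise mean for the centroid update. The names `manhattan`, `assign`, `update`, and `kmeans` are illustrative, and the sketch assumes no cluster ever becomes empty.

```python
# Minimal K-means sketch with Manhattan distance (standard library only).
# Data and initial centroids come from question 5; all names are illustrative.

users = {
    1: (1, 1, 2),
    2: (2, 4, 1),
    3: (4, 6, 7),
    4: (3, 1, 3),
    5: (1, 2, 1),
    6: (6, 3, 2),
    7: (5, 5, 4),
}


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, centers):
    """Map each point id to the index of its nearest center (ties go to the lower index)."""
    return {
        pid: min(range(len(centers)), key=lambda c: manhattan(p, centers[c]))
        for pid, p in points.items()
    }


def update(points, labels, k):
    """Recompute each center as the coordinate-wise mean of its members.

    Assumes every cluster has at least one member.
    """
    centers = []
    for c in range(k):
        members = [points[pid] for pid, lab in labels.items() if lab == c]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update until the assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments unchanged: stop
            break
        labels = new_labels
        centers = update(points, labels, len(centers))
    return labels, centers


labels, centers = kmeans(users, [users[3], users[4]])
clusters = [sorted(pid for pid, lab in labels.items() if lab == c) for c in range(2)]
print(clusters)  # [[3, 7], [1, 2, 4, 5, 6]]
print(centers)
```

The loop stops on the second pass because the assignments repeat, which matches the stopping condition used in the worked answer. A different choice of initial centroids or distance function may produce a different split.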