<div class="tip inlineBlock success"> For multiple-choice questions, see the past pre-class quizzes. </div>

# Common supervised learning algorithms

* Support vector machines (SVM)
* Linear regression
* Logistic regression
* Naive Bayes
* Linear discriminant analysis
* Decision trees
* K-nearest neighbors (k-nearest neighbor algorithm)
* Multilayer perceptron

<div class="tip inlineBlock info"> K-means is an unsupervised learning algorithm. </div>

# Classic case studies by category

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-39643f4cae902bd8ab754dde8db818b064" aria-expanded="true"><div class="accordion-toggle"><span style="">Association analysis</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-39643f4cae902bd8ab754dde8db818b064" class="collapse collapse-content"><p></p>

* **Market basket analysis**
* **Amazon's personalized recommendations**
* Pandora's Music Genome Project
* Target's big data marketing

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-db0c96be9b8b3d41aa6d5a96cfa08f8259" aria-expanded="true"><div class="accordion-toggle"><span style="">Trend prediction</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-db0c96be9b8b3d41aa6d5a96cfa08f8259" class="collapse collapse-content"><p></p>

- **Google Flu Trends**
- **Oscar predictions**
- Farecast case study
- Decide case study

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a9922bfbcfe2a7ad39baebfd08d7429038" aria-expanded="true"><div class="accordion-toggle"><span style="">Decision support</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a9922bfbcfe2a7ad39baebfd08d7429038" class="collapse 
collapse-content"><p></p>

- **US presidential election**
- *House of Cards*

<p></p></div></div></div>

# Fill-in-the-blank questions

1. Data science is a science that, through **Blank 1**, obtains **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b2ce99a4fea675961515dcc8284a6a8983" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b2ce99a4fea675961515dcc8284a6a8983" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **systematic study**
Blank 2: **a body of knowledge about data**

<p></p></div></div></div>

2. The field of visualization includes three main branches: **Blank 1**, **Blank 2**, and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d35ffd46af3219da6a52031bcbeadd5828" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d35ffd46af3219da6a52031bcbeadd5828" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scientific visualization**
Blank 2: **information visualization**
Blank 3: **visual analytics**

<p></p></div></div></div>

3. The characteristics of big data that make it useful for fighting crime are veracity, **Blank 1**, and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ae0d047f96c4411ce221a0f84ab84b2836" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ae0d047f96c4411ce221a0f84ab84b2836" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **variety**
Blank 2: **velocity**

PS: The 4 Vs of big data: 1. Volume 2. Variety 3. Veracity (value) 4. 
Velocity

<p></p></div></div></div>

4. Google Flu Trends is an application of big data to **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5b9db186eed64a3835525f0621a1604d82" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5b9db186eed64a3835525f0621a1604d82" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **trend prediction**

PS: Applications of big data:

1. Prediction
2. Recommendation
3. Business intelligence analysis
4. Scientific research

<p></p></div></div></div>

5. Machine learning with labeled training samples is called **Blank 1** learning, while machine learning with unlabeled training samples is called **Blank 2** learning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-081996d0220146f5851cb121be347bde37" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-081996d0220146f5851cb121be347bde37" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **supervised**
Blank 2: **unsupervised**

<p></p></div></div></div>

6. Name two `python`-based Chinese text processing toolkits: **Blank 1**, **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-89dda836df534af8cd820d2d96a9ac5829" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-89dda836df534af8cd820d2d96a9ac5829" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **jieba**
Blank 2: **thulac**

PS:

* jieba (recommended)
* thulac (Tsinghua University): handles UTF-8 encoded text
* SnowNLP: handles Unicode text; input must be converted with decode/unicode before use, including for the word segmentation in its sentiment analysis part
* pynlpir
* CoreNLP
* pyNLP (Harbin Institute of Technology)
* NLPIP: a word segmentation package that can handle ethnic minority languages

<p></p></div></div></div>

<div 
class="tip inlineBlock warning"> The questions above are from last year's test. </div>

---

<div class="tip inlineBlock info"> The questions below are fill-in-the-blank questions from past pre-class quizzes. </div>

7. In Python, the symbol that starts a comment is **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a465c259759ecd187727ab72cb04165795" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a465c259759ecd187727ab72cb04165795" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **#**

<p></p></div></div></div>

8. In Python, the **Blank 1** data structure can hold data of different types.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3f43950e85e5d632254e49600aa3289930" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3f43950e85e5d632254e49600aa3289930" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **list**

<p></p></div></div></div>

9. In Python, `else` combined with `if` is written as **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-d200d05034d204acacace306b4bf984535" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-d200d05034d204acacace306b4bf984535" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **elif**

<p></p></div></div></div>

10. The **Blank 1** library is a machine learning package for Python that supports mainstream supervised and unsupervised machine learning methods.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" 
data-target="#collapse-04dd6ca55f30d2e20cd979899e9194f727" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-04dd6ca55f30d2e20cd979899e9194f727" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **scikit-learn**

<p></p></div></div></div>

11. In Anaconda, a third-party library can be installed with a command that puts **Blank 1** before the library name.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-de7f2b1c24b537e897688ada974d225f51" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-de7f2b1c24b537e897688ada974d225f51" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **pip install**

<p></p></div></div></div>

12. **Blank 1** is an extension library for Python that supports operations on large multidimensional arrays and matrices and also provides a large collection of mathematical functions for array operations.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-492bd9c6415c5372db5fc43435d6912a13" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-492bd9c6415c5372db5fc43435d6912a13" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **NumPy**

<p></p></div></div></div>

13. To use the numpy library in Python, load it with the **Blank 1** command.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8d71171b7e97b03a8560a4088d5836c63" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div 
class="panel-body collapse-panel-body"> <div id="collapse-8d71171b7e97b03a8560a4088d5836c63" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **import numpy**

<p></p></div></div></div>

14. Supervised learning algorithms differ from unsupervised learning algorithms in that a **Blank 1** must be provided for each training sample.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-57d64758ccf0dab8ee32afc00fcc8f6631" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-57d64758ccf0dab8ee32afc00fcc8f6631" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **label**

<p></p></div></div></div>

15. In a machine learning workflow, data is first collected and preprocessed; a **Blank 1** is then designed for the training set and its parameters are determined; finally, it is used to make predictions on the test set.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0d8bfcb5cdb608b17fe0b24ac0747c2814" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0d8bfcb5cdb608b17fe0b24ac0747c2814" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **model**

<p></p></div></div></div>

16. **Blank 1** and **Blank 2** are currently common big data processing platforms.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-33779e84b11ae5496af7cdce0941232792" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-33779e84b11ae5496af7cdce0941232792" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **Hadoop**
Blank 2: **Spark**

<p></p></div></div></div>

17. In Python, **Blank 1** is used to indicate that the next statement belongs to the block of the previous statement.

<div 
class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-860bec851c07c0385f2ff3f49004917a38" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-860bec851c07c0385f2ff3f49004917a38" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **indentation**

<p></p></div></div></div>

18. Market basket analysis falls roughly into two types, **Blank 1** and **Blank 2** market basket analysis; the fundamental reason the two take different approaches is that **Blank 3** differs greatly.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-eed29bb9e2b3211454070513ec9fe7ef2" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-eed29bb9e2b3211454070513ec9fe7ef2" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **American-style**
Blank 2: **Japanese-style**
Blank 3: **store floor area**

<p></p></div></div></div>

19. Association analysis with the Apriori algorithm requires two steps: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-9e6a580e408f3cb3bd165713ec2c38ea88" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-9e6a580e408f3cb3bd165713ec2c38ea88" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **finding frequent itemsets**
Blank 2: **mining association rules**

<p></p></div></div></div>

20. The purpose of visualization is to **Blank 1**; its primary principles are **Blank 2** and **Blank 3**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-03f7e1ef2b46433eb7ca29149bf0bd7a63" 
aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-03f7e1ef2b46433eb7ca29149bf0bd7a63" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **present complex data effectively**
Blank 2: **accuracy**
Blank 3: **clarity**

<p></p></div></div></div>

21. Video-based traffic volume detection currently relies on two main methods: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c882a2f099b35a817e7d7c4b0192a6db45" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c882a2f099b35a817e7d7c4b0192a6db45" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **setting up virtual loop detectors**
Blank 2: **vehicle tracking**

<p></p></div></div></div>

22. Traffic flow prediction is divided into **Blank 1** traffic flow prediction and **Blank 2** traffic flow prediction.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6831a39a6e8fd07d5fe75d680abded9b74" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6831a39a6e8fd07d5fe75d680abded9b74" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **short-term**
Blank 2: **medium- and long-term**

<p></p></div></div></div>

23. Supervised learning models **Blank 1** data, whereas unsupervised learning models **Blank 2** data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-aed11d14434198115fc0b93a34aedd6528" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body 
collapse-panel-body"> <div id="collapse-aed11d14434198115fc0b93a34aedd6528" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **labeled**
Blank 2: **unlabeled**

<p></p></div></div></div>

24. An information system is evaluated with two metrics: **Blank 1** and **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-85d52baddeb5b3ab0945e0547b22c9a621" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-85d52baddeb5b3ab0945e0547b22c9a621" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **accuracy**
Blank 2: **recall**

<p></p></div></div></div>

25. In text retrieval, when the vector space model measures the distance between two vectors representing texts, it computes the **Blank 1** of the two vectors.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ac90609cc91c054d821e9c80dbfa670b61" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ac90609cc91c054d821e9c80dbfa670b61" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **cosine of the angle between them**

<p></p></div></div></div>

26. Text must be preprocessed before analysis: the text in a document undergoes **Blank 1** splitting and **Blank 2** segmentation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-b779d4b44f278bd561c8eed801e73e7f40" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-b779d4b44f278bd561c8eed801e73e7f40" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **sentence**
Blank 2: **word**

<p></p></div></div></div>

27. Text data can also be visualized; a common method is the **Blank 1**.

<div class="panel panel-default 
collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-8e9a226baf4cb7acf360eb8da7ff624e69" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-8e9a226baf4cb7acf360eb8da7ff624e69" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **word cloud**

<p></p></div></div></div>

28. Briefly summarizing the content of a document is called document **Blank 1**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5802ffa6a626fd757c1252d4043838bd49" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5802ffa6a626fd757c1252d4043838bd49" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **summarization**

<p></p></div></div></div>

29. Strings can be enclosed in **Blank 1** or **Blank 2**.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-75a161913febff3938f4a20592c94bbd97" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-75a161913febff3938f4a20592c94bbd97" class="collapse collapse-content"><p></p>

Correct answer:
Blank 1: **single quotes**
Blank 2: **double quotes**

<p></p></div></div></div>

# Short-answer questions

1. Briefly explain what big data hubris is:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6ab318b83fc47f4c13e4d5a01b70ba8d89" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> 
<div class="panel-body collapse-panel-body"> <div id="collapse-6ab318b83fc47f4c13e4d5a01b70ba8d89" class="collapse collapse-content"><p></p>

The assumption that using big data makes it possible to completely disregard and replace traditional data collection methods.

<p></p></div></div></div>

2. How does trend prediction differ from association analysis?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a47facac69c16535fe8900150f7affe281" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a47facac69c16535fe8900150f7affe281" class="collapse collapse-content"><p></p>

The former focuses on modeling the correlations among data, while the latter focuses on discovering whether correlations among data exist.

<p></p></div></div></div>

3. Briefly describe the idea behind content-based recommendation:

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1e134ba746d9304743c27079148a1b1237" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1e134ba746d9304743c27079148a1b1237" class="collapse collapse-content"><p></p>

Items are classified by their content, and recommendations are made among similar items.

<p></p></div></div></div>

4. Name three common methods for smoothing noisy data during data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2962a11d6eae21f4601d964dec2d106310" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2962a11d6eae21f4601d964dec2d106310" class="collapse collapse-content"><p></p>

1. Binning 2. Regression 3. 
Clustering

<p></p></div></div></div>

5. Describe the types of market basket analysis and the settings in which each is applied.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c3fe2140e50fd9b7f8851881b034be6b9" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c3fe2140e50fd9b7f8851881b034be6b9" class="collapse collapse-content"><p></p>

There are two types of market basket analysis.

- The first is **American-style market basket analysis**, suited to stores with a **large floor area**, **many product categories**, and **widely separated display areas**, such as Walmart;
- The second is **Japanese-style market basket analysis**, suited to stores with a **small floor area**, **few product categories**, and **closely spaced display areas**, such as convenience stores.

<p></p></div></div></div>

6. Briefly explain what a line of best fit is.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-05399798e723064ba04cbf71ab9a852274" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-05399798e723064ba04cbf71ab9a852274" class="collapse collapse-content"><p></p>

A line of best fit is a straight line drawn on a scatter plot so that it passes as close as possible to the data points.

<p></p></div></div></div>

7. What is the principle behind trend prediction?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ec555f5b86d72ca2047f7e8a0db34d5854" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ec555f5b86d72ca2047f7e8a0db34d5854" class="collapse collapse-content"><p></p>

Collect data that may be related to the variable to be predicted, then build a prediction model.

<p></p></div></div></div>

8. Briefly describe demographic-based recommendation.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-832564e63bf0e35ac58895c44118ecda82" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-832564e63bf0e35ac58895c44118ecda82" class="collapse collapse-content"><p></p>

Users are classified into groups, and what a user likes is recommended to similar users.

<p></p></div></div></div>

9. Briefly describe frequent itemsets and association rules in the Apriori algorithm.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7d48fb3817f1c152f4ea25504f4e85bc14" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7d48fb3817f1c152f4ea25504f4e85bc14" class="collapse collapse-content"><p></p>

**Frequent itemset**: a set of items that are often purchased together.

**Association rule**: a rule describing how the items in a frequent itemset influence one another.

<p></p></div></div></div>

10. Briefly describe the decision tree method.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-c0ce17e033b47917ef84054ee9ede7f385" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw 
fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-c0ce17e033b47917ef84054ee9ede7f385" class="collapse collapse-content"><p></p>

The decision tree method lays out, in a tree diagram, the probabilities of the states of nature or conditions in a decision problem, the courses of action, the gains and losses, the predicted outcomes, and so on, and uses that diagram to reflect the whole process of thinking, prediction, and decision-making.

<p></p></div></div></div>

11. What are the steps of the decision tree method?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-21611e706143da4c421b2019cc78a48244" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-21611e706143da4c421b2019cc78a48244" class="collapse collapse-content"><p></p>

1. Feature selection
2. Decision tree generation
3. Decision tree pruning

<p></p></div></div></div>

12. Briefly describe data visualization.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-1215f2720496a449230e191c8000af324" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-1215f2720496a449230e191c8000af324" class="collapse collapse-content"><p></p>

Data visualization uses techniques such as computer graphics to present data graphically, intuitively conveying the information, patterns, and logic contained in the data so that users can observe and understand it.

<p></p></div></div></div>

13. Briefly describe the general processing workflow of a big data platform.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-e79019a9bd820a74ed1beb8be631111388" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-e79019a9bd820a74ed1beb8be631111388" class="collapse collapse-content"><p></p>

1. Data collection 2. Data storage 3. Data processing 4. 
Data presentation

<p></p></div></div></div>

14. Briefly describe traditional business data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-4372be4e77e3758c8f1a82e0120c8e2869" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-4372be4e77e3758c8f1a82e0120c8e2869" class="collapse collapse-content"><p></p>

Traditional business data is data from business systems such as enterprise ERP systems, POS terminals, and online payment systems, including automatically generated information such as audit records and logs.

<p></p></div></div></div>

15. Briefly describe Internet data.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-3263dc653015fc94ffa7f5aa2a6952e298" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-3263dc653015fc94ffa7f5aa2a6952e298" class="collapse collapse-content"><p></p>

Internet data is the large volume of data generated by interactions in cyberspace.

<p></p></div></div></div>

16. Why is data preprocessing necessary?
<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a895df8e2210da327315cbb0ffd5796d5" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a895df8e2210da327315cbb0ffd5796d5" class="collapse collapse-content"><p></p>

Because databases are huge and draw on many heterogeneous data sources, real-world databases today are highly susceptible to noise, missing values, and inconsistent data. Low-quality data leads to low-quality mining results, so data preprocessing is necessary.

<p></p></div></div></div>

17. Briefly list the methods of big data preprocessing.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f33f85e5b8acb325e40eb0dd1baae61839" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f33f85e5b8acb325e40eb0dd1baae61839" class="collapse collapse-content"><p></p>

1. Data cleaning 2. Data integration 3. Data transformation 4. 
Data reduction

<p></p></div></div></div>

18. Briefly state the purpose of data cleaning.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-6788a0d063a9d4402dcf5279a5f88ff682" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-6788a0d063a9d4402dcf5279a5f88ff682" class="collapse collapse-content"><p></p>

The purpose of data cleaning is to correct existing errors and ensure data consistency.

<p></p></div></div></div>

19. What algorithms are used for data normalization?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-42ac3ce29e56f4609c916b13422a523d26" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-42ac3ce29e56f4609c916b13422a523d26" class="collapse collapse-content"><p></p>

1. Normalization (min-max scaling)
2. Standardization (z-score)
3. Centering (mean subtraction)

<p></p></div></div></div>

20. What challenges does big data storage face?

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-513e36e171d3fbc6757db30a7c9951d953" aria-expanded="true"><div class="accordion-toggle"><span style="">Answer</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-513e36e171d3fbc6757db30a7c9951d953" class="collapse collapse-content"><p></p>

1. Capacity 2. Latency 3. Security 4. Cost 5. Data accumulation 6. Flexibility 7. 
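Application awareness

<p></p></div></div></div>

# Open-ended questions

### 1. Computing accuracy and recall

An information retrieval system is evaluated with two metrics: accuracy and recall. Given the retrieval results in the table below, compute the system's accuracy and recall.

|                                                          | Actually relevant documents | Actually irrelevant documents |
| -------------------------------------------------------- | --------------------------- | ----------------------------- |
| Documents returned by the system (judged relevant)       | 15                          | 3                             |
| Documents not returned by the system (judged irrelevant) | 6                           | 3                             |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-f07648d3e96fc7909c08477b72a3a7f479" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-f07648d3e96fc7909c08477b72a3a7f479" class="collapse collapse-content"><p></p>

There is little to analyze; apply the formulas directly, writing $TP = 15$, $FP = 3$, $FN = 6$, and $TN = 3$ for the four cells of the table:

$$ \text{accuracy}=\frac{TP+TN}{TP+FP+FN+TN} $$

$$ \text{recall}=\frac{TP}{TP+FN} $$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-5372b8e9fa863396e57f03055e780f9373" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-5372b8e9fa863396e57f03055e780f9373" class="collapse collapse-content"><p></p>

Accuracy = $\frac{15+3}{15+3+6+3}$ = $\frac{18}{27}$ = $\frac{2}{3}$

Recall = $\frac{15}{15+6}$ = $\frac{15}{21}$ = $\frac{5}{7}$

This treats the first metric as accuracy over all 27 documents. Precision, $\frac{TP}{TP+FP}$, would instead be $\frac{15}{15+3}$ = $\frac{5}{6}$.

<p></p></div></div></div>

### 2. Simple linear regression

The table below records the advertising volume and sales volume of a 4S car dealership. Assuming an advertising volume of 15, use a simple linear regression model to predict the sales volume.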
| Advertising volume | Sales |
| --- | --- |
| 1 | 6 |
| 7 | 21 |
| 3 | 10 |
| 5 | 18 |
| 9 | 25 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-a1b1bffb34ebf005c45168a41e436ee447" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-a1b1bffb34ebf005c45168a41e436ee447" class="collapse collapse-content"><p></p>

Substitute the data into the simple linear regression equations:

$$
y=kx+b
$$

$$
k=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2}
$$

$$
b=\overline{y}-k\overline{x}
$$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-bde939570f274351731f05fad5d3a27064" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-bde939570f274351731f05fad5d3a27064" class="collapse collapse-content"><p></p>

```matlab
Answer: Let the advertising volume be the independent variable x and the sales be the dependent variable y.
The regression line can then be written as y = kx + b,
where k is the slope and b is the intercept.
From the data in the table:
mean advertising volume x_mean = (1 + 7 + 3 + 5 + 9)/5 = 5
mean sales y_mean = (6 + 21 + 10 + 18 + 25)/5 = 16
So
k = [(1-5)*(6-16)+(7-5)*(21-16)+(3-5)*(10-16)+(5-5)*(18-16)+(9-5)*(25-16)]/[(1-5)^2+(7-5)^2+(3-5)^2+(5-5)^2+(9-5)^2]
  = (-4*(-10)+2*5+(-2)*(-6)+0*2+4*9)/(16+4+4+0+16)
  = 98/40
  = 2.45
and b = y_mean - k*x_mean = 16 - 2.45*5 = 3.75
The regression equation is y = 2.45x + 3.75
So when the advertising volume is 15, the predicted sales are
2.45*15 + 3.75 = 40.5
```

<p></p></div></div></div>

### 3. K-nearest neighbors (KNN) algorithm

Suppose the table below is a training set for diagnosing diabetes. Use the K-nearest neighbors (KNN) algorithm to predict whether user 8 has the disease, with k = 5 and Euclidean distance as the distance metric, and state the prediction.

| ID | k | u | class |
| --- | --- | --- | --- |
| 1 | 2 | 3 | 0 |
| 2 | 4 | 4 | 1 |
| 3 | 6 | 2 | 1 |
| 4 | 1 | 4 | 1 |
| 5 | 3 | 7 | 0
| 6 | 5 | 2 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 3 | 8 | ? |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-0a487be0853c29c36a25f987b632063817" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-0a487be0853c29c36a25f987b632063817" class="collapse collapse-content"><p></p>

First, the formulas that are needed:

- Euclidean distance: $d=\sqrt{\sum_{j=1}^n(x_j-y_j)^2}$

  For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

  $$
  d=\sqrt{(a_1-a_2)^2+(b_1-b_2)^2+(c_1-c_2)^2}
  $$

- Manhattan distance: $d=\sum_{j=1}^n|x_j-y_j|$

  For example, for $A(a_1,b_1,c_1)$ and $B(a_2,b_2,c_2)$:

  $$
  d=|a_1-a_2|+|b_1-b_2|+|c_1-c_2|
  $$

KNN steps:

1. Compute the distance from every training sample to the sample to be predicted.
2. Sort the samples by distance, from nearest to farthest.
3. Among the k nearest samples (k is usually given in the question), find the class that occurs most often.
4. Assign the sample to be predicted to that class.

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-9805ced2c710cca7a007abb9bec1777063" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-9805ced2c710cca7a007abb9bec1777063" class="collapse collapse-content"><p></p>

```matlab
Answer: As given, k = 5 and Euclidean distance is used.
The distances from each training sample to user 8 are:
dis[1] = sqrt((2-3)^2+(3-8)^2) = sqrt(26)
dis[2] = sqrt((4-3)^2+(4-8)^2) = sqrt(17)
dis[3] = sqrt((6-3)^2+(2-8)^2) = sqrt(45)
dis[4] = sqrt((1-3)^2+(4-8)^2) = sqrt(20)
dis[5] = sqrt((3-3)^2+(7-8)^2) = sqrt(1)
dis[6] = sqrt((5-3)^2+(2-8)^2) = sqrt(40)
dis[7] = sqrt((6-3)^2+(4-8)^2) = sqrt(25)
Note: sqrt denotes the square root.
With k = 5, take the 5 nearest neighbors of user 8.
Selected: dis[5], dis[2], dis[4], dis[7], dis[1]
Neighbors with class 0: dis[5], dis[7], dis[1], 3 in total.
Neighbors with class 1: dis[2], dis[4], 2 in total.
Conclusion: user 8 has class 0, i.e., user 8 is predicted not to have the disease.
```

<p></p></div></div></div>

### 
4. Computing support, confidence, and lift

Given the shopping data set below, compute the support S(bread -> milk), the confidence C(bread -> milk), and the lift L(bread -> milk).

| Transaction | Items |
| --- | --- |
| 1 | beer, bread, potato chips, aspirin |
| 2 | diapers, bread, wine, rice cereal, milk |
| 3 | Sprite, potato chips, milk |
| 4 | beer, milk, ice cream, potato chips |
| 5 | Sprite, coffee, milk, bread, beer |
| 6 | beer, potato chips |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-7f3c29c842e67307bdccdefa67f21fa352" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-7f3c29c842e67307bdccdefa67f21fa352" class="collapse collapse-content"><p></p>

Straight to the formulas. Here $N$ is the total number of transactions, $N(A)$ the number containing A, and $N(A\&B)$ the number containing both A and B.

<div class="tip inlineBlock info">

**1. Support**

</div>

- The support $S(A \to B)$ is <span style='color:#A52A2A'>**the probability that A and B appear together**</span>

Formula: $S(A \to B)=\frac{N(A\&B)}{N}$

<div class="tip inlineBlock info">

**2. Confidence**

</div>

- The confidence $C(A \to B)$ is <span style='color:#A52A2A'>**the probability that B also appears given that A appears**</span>

Formula: $C(A \to B)=\frac{N(A\&B)}{N(A)}$

<div class="tip inlineBlock info">

**3. Lift**

</div>

- The lift $L(A \to B)$ is <span style='color:#A52A2A'>**how much the appearance of A influences the appearance of B**</span>

Formula: $L(A \to B)=\frac{C(A \to B)}{S(B)}$, where $S(B)=\frac{N(B)}{N}$

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-ce6e02f4ac770639e6977da38f2bf3dc42" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-ce6e02f4ac770639e6977da38f2bf3dc42" class="collapse collapse-content"><p></p>

```matlab
Answer: From the shopping data set (bread is in transactions 1, 2, 5; milk is in 2, 3, 4, 5; both are in 2, 5)
S(bread->milk) = N(bread&milk)/N = 2/6 = 1/3
C(bread->milk) = N(bread&milk)/N(bread) = 2/3
S(milk) = 4/6 = 2/3
L(bread->milk) = C(bread->milk)/S(milk) = (2/3)/(2/3) = 1
```

<p></p></div></div></div>

### 5. K-means clustering algorithm
Suppose the K-means clustering algorithm is used to split the users in the table below into two clusters. Describe the steps of the K-means algorithm; the distance function may be chosen freely.

| User | A | B | C |
| --- | --- | --- | --- |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 4 | 1 |
| 3 | 4 | 6 | 7 |
| 4 | 3 | 1 | 3 |
| 5 | 1 | 2 | 1 |
| 6 | 6 | 3 | 2 |
| 7 | 5 | 5 | 4 |

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-37d3d5b6773e13227f15eee93bb97f7235" aria-expanded="true"><div class="accordion-toggle"><span style="">[Algorithm analysis]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-37d3d5b6773e13227f15eee93bb97f7235" class="collapse collapse-content"><p></p>

Video tutorial:

<iframe class="iframe_video" src="https://player.bilibili.com/player.html?aid=797539164&bvid=BV1py4y1r7DN&cid=249834109&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>

<p></p></div></div></div>

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-dd5fba6ed713cb23cafe63b3e74ecba485" aria-expanded="true"><div class="accordion-toggle"><span style="">[Reference solution]</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-dd5fba6ed713cb23cafe63b3e74ecba485" class="collapse collapse-content"><p></p>

```matlab
Answer: For the data in the table, I choose the Manhattan distance.
Of the 7 data points, users 3 and 4 are taken as the initial centroids of cluster 1 and cluster 2.
Compute the Manhattan distance from every point to the two centroids.
(1) Distance to centroid 1 (user 3):
G1[1] = |1-4| + |1-6| + |2-7| = 13
G1[2] = |2-4| + |4-6| + |1-7| = 10
G1[3] = |4-4| + |6-6| + |7-7| = 0
G1[4] = |3-4| + |1-6| + |3-7| = 10
G1[5] = |1-4| + |2-6| + |1-7| = 13
G1[6] = |6-4| + |3-6| + |2-7| = 10
G1[7] = |5-4| + |5-6| + |4-7| = 5
(2) Distance to centroid 2 (user 4):
G2[1] = |1-3| + |1-1| + |2-3| = 3
G2[2] = |2-3| + |4-1| + |1-3| = 6
G2[3] = |4-3| + |6-1| + |7-3| = 10
G2[4] = |3-3| + |1-1| + |3-3| = 0
G2[5] = |1-3| + |2-1| + |1-3| = 5
G2[6] = |6-3| + |3-1|
+ |2-3| = 6
G2[7] = |5-3| + |5-1| + |4-3| = 7
Assign each point to the cluster whose centroid is nearest:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Averaging the points in each cluster gives the two new centroids:
A(4.5, 5.5, 5.5)
B(2.6, 2.2, 1.8)
Now recompute the distance from every point to the two centroids.
(1) Distance to centroid A:
dis_A[1] = |1-4.5| + |1-5.5| + |2-5.5| = 11.5
dis_A[2] = |2-4.5| + |4-5.5| + |1-5.5| = 8.5
dis_A[3] = |4-4.5| + |6-5.5| + |7-5.5| = 2.5
dis_A[4] = |3-4.5| + |1-5.5| + |3-5.5| = 8.5
dis_A[5] = |1-4.5| + |2-5.5| + |1-5.5| = 11.5
dis_A[6] = |6-4.5| + |3-5.5| + |2-5.5| = 7.5
dis_A[7] = |5-4.5| + |5-5.5| + |4-5.5| = 2.5
(2) Distance to centroid B:
dis_B[1] = |1-2.6| + |1-2.2| + |2-1.8| = 3
dis_B[2] = |2-2.6| + |4-2.2| + |1-1.8| = 3.2
dis_B[3] = |4-2.6| + |6-2.2| + |7-1.8| = 10.4
dis_B[4] = |3-2.6| + |1-2.2| + |3-1.8| = 2.8
dis_B[5] = |1-2.6| + |2-2.2| + |1-1.8| = 2.6
dis_B[6] = |6-2.6| + |3-2.2| + |2-1.8| = 4.4
dis_B[7] = |5-2.6| + |5-2.2| + |4-1.8| = 7.4
Assign each point to the cluster whose centroid is nearest:
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
Since the cluster assignments did not change, the computation stops.
So, as the question requires, the users are split into
Cluster 1: 3, 7
Cluster 2: 1, 2, 4, 5, 6
```

<p></p></div></div></div>

Last modified: December 25, 2021

© Reposting permitted with proper attribution
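
The K-means walkthrough in question 5 can be checked with a short Python sketch. It follows the same choices as the reference solution (Manhattan distance, users 3 and 4 as the initial centroids, centroids updated to the per-coordinate mean) and is not a general-purpose implementation. The names `manhattan`, `kmeans`, and `users` are illustrative and do not come from the original notes.

```python
def manhattan(p, q):
    """Manhattan distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Cluster `points` (dict: id -> tuple) around the given starting centroids.

    Assumption: Manhattan distance for assignment and the coordinate-wise
    mean for the centroid update, as in the hand calculation above.
    """
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        new_assignment = {
            uid: min(range(len(centroids)),
                     key=lambda c: manhattan(p, centroids[c]))
            for uid, p in points.items()
        }
        # Step 3: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[u] for u, a in assignment.items() if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment, centroids


# Data from the table in question 5: user -> (A, B, C).
users = {
    1: (1, 1, 2), 2: (2, 4, 1), 3: (4, 6, 7), 4: (3, 1, 3),
    5: (1, 2, 1), 6: (6, 3, 2), 7: (5, 5, 4),
}

assignment, centroids = kmeans(users, [users[3], users[4]])

# Group user ids by cluster index (0 = cluster 1, 1 = cluster 2).
clusters = {}
for uid in sorted(assignment):
    clusters.setdefault(assignment[uid], []).append(uid)

for c in sorted(clusters):
    print(f"cluster {c + 1}: users {clusters[c]}")
# cluster 1: users [3, 7]
# cluster 2: users [1, 2, 4, 5, 6]
```

Strictly speaking, pairing Manhattan distance with a mean update is a hybrid (the median is the usual companion of Manhattan distance), but it mirrors the hand calculation, so the intermediate centroids match A(4.5, 5.5, 5.5) and B(2.6, 2.2, 1.8).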