Glossary

术语表

Aggregation bias
Aggregation bias occurs when incorrect inferences are made from data that have been aggregated, or summarized. This includes making inferences about the characteristics of the parts of a whole based on the characteristics of the whole itself.
加总偏误
加总偏误往往来源于对数据进行汇总、总结的过程中的错误推导(比如依据总体推测部分)。
Aggregation
Aggregation refers to the process by which data have been collected and summarized in some way or sorted into categories.
汇总
表示对数据进行汇总、分类的过程。
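As a minimal sketch of aggregation, the snippet below sorts individual records into categories and summarizes each category with a total. The `sales` records and category names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical individual records: (category, amount)
sales = [("fruit", 3), ("dairy", 5), ("fruit", 4), ("dairy", 1)]

# Sort each record into its category and keep a running total
totals = defaultdict(int)
for category, amount in sales:
    totals[category] += amount

summary = dict(totals)
print(summary)  # {'fruit': 7, 'dairy': 6}
```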
Applied statistics
Applied statistics is a type of statistics that involves the application of statistical methods to disciplines and areas of study.
应用统计学
与偏向统计方法数学原理的理论统计学相对,应用统计学侧重于统计方法的应用。
Arithmetic mean
An arithmetic mean, often simply called a mean, is a type of average, or measure of central tendency, in which the middle of a dataset is determined by adding its numeric values and dividing by the number of values.
算术平均数
通常被简称为平均数,又称均值,数值上等于数据之和与数据个数的比值。
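The definition above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the `scores` dataset are illustrative assumptions, not part of the glossary.

```python
def arithmetic_mean(values):
    """Add the numeric values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

scores = [4, 8, 15, 16, 23, 42]  # hypothetical dataset
mean_score = arithmetic_mean(scores)
print(mean_score)  # 18.0
```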
Attitudinal statement
An attitudinal statement is a type of response given to a scaled question on a survey that asks an individual to rate his or her feelings toward a particular topic.
态度声明
指当问卷中涉及描述个人对一个特殊话题的看法时,给出了划定范围的选项时人们的一种表现。
Axis label
An axis label is used on a graph to denote the kind of unit or rate of measurement used as the dependent or independent variable (or variables), and can be found along an axis of a graph.
坐标轴标签
轴标签沿图表中的坐标轴被标示出,用于标注自变量和因变量的单位种类或比率。
Back transformation
Back transformation is the process by which mathematical operations are applied to data in a dataset that have already been transformed, in order to back transform, or revert, the data to their original form.
逆向转换
逆向转换过程,应用数学运算到已转换的数据集上,为了逆向转换或者说恢复数据到原来的形式。
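A small sketch of a transformation and its back transformation, assuming a base-10 logarithm as the transformation (the glossary does not name a specific one). The `incomes` values are hypothetical.

```python
import math

incomes = [1000.0, 10000.0, 100000.0]  # hypothetical original data

# Transformation: take the base-10 logarithm of each value
logged = [math.log10(x) for x in incomes]

# Back transformation: raise 10 to each logged value to recover the original
restored = [10 ** y for y in logged]
```

Because of floating-point rounding, the restored values should be compared with `math.isclose` rather than with strict equality.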
Back-end check
A back-end check, also called a server-side check, is a type of data validation for datasets gathered electronically, and is performed at the back end, or after data are stored in an electronic database.
后台检测
后台检测,也被称为服务端检测,是对电子数据库进行检测的一种方式。在数据被储存在电子数据库之后,于后台终端进行。
Bar graph
A bar graph or chart uses horizontal or vertical bars whose lengths proportionally represent values in a dataset. A chart with vertical bars is also called a column graph or chart.
条形图
条形图使用水平或垂直的条形,其比例长度表示数据的值。垂直的条形图也被称为柱状图。
Cartogram
A cartogram is a map that overlays categorical data onto a projection and uses a different color to represent each category. Unlike a heat map, a cartogram does not necessarily use color saturation to depict the frequency of values in a category.
变形地图
变形地图是指在一幅地图上,用不同的颜色来代表每个类别,将数据信息呈现在地图上。和热点地图不同,变形地图并不一定需要用到不同深浅的颜色来表现某个种类发生的频率。
Categorical data
Categorical data are data, quantitative or qualitative, that can be sorted into distinct categories.
可分类数据
指能够以定性或定量的方式分门别类的数据。
Category label
A category label is used on a graph to denote the name of a category, or a group, of data and may be descriptive or a range of numeric values.
类别标签
类别标签用于在图表中给某类、某组数据标注类别名称,以解释说明数据信息。
Chart title
A chart title is the description assigned to the graph and includes a summary of the message aimed at the target audience and may include information about the dataset.
图表标题
图表标题是对整个图表以及所有信息概要的解释说明,往往总结了整个图表所要表达的信息,以引起目标受众的注意。
Checkbox response
A checkbox response refers to an answer given to a question in a survey administered in electronic form, for which one or more responses can be selected at a time, as may be indicated by a checkbox an individual clicks on.
复选框选项
复选框通常出现在问卷调查的电子表格中,其问题的答案可供个人同时单选或多选,即多重选项。
Chroma
Chroma is the saturation, or vividness, of a hue.
色度
反映颜色色调和饱和度的物理量。
Closed question
A closed, or closed-ended, question is a type of question featured in a poll or survey that requires a limited, or specific kind of, response and is used to collect quantitative data or data that can be analyzed quantitatively later on.
封闭式问题
封闭式问题是用于调查问卷或投票中的一种问题类型,只给出限定的、明确的答案以供选择。通常被用于定量数据或能被定量统计的数据。
Codebook
A codebook documents the descriptions, terms, variables, and values that are represented by abbreviated or coded words or symbols used in a dataset, and serves as a means for coding and decoding the information.
电报密码本
一本码书以缩略加密的词语或在某一数据集中使用的符号来记录一些类型、条件、变量以及取值,同时它也可以作为一种编码或解码信息的工具。
Color theory
Color theory refers to principles of design that focus on colors and the relationships between them.
颜色理论
颜色理论表示一系列基于颜色以及它们之间关系的设计原则。
Continuous variable
A continuous variable, or continuous scale, has an unlimited number of possible values between the highest and lowest values in a dataset.
连续变量
在一定区间内可以任意取值的变量,其数值是连续不断的。
Correlation
Correlation measures the degree of association, or the strength of the relationship, between two variables using mathematical operations.
相关性
指利用数学运算来衡量两个变量之间的相关程度或联系强度。
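One common way to measure correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The glossary does not name a specific measure, so the choice of Pearson here is an assumption, as are the function name `pearson_r` and the sample data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]    # hypothetical independent variable
scores = [2, 4, 6, 8, 10]  # hypothetical dependent variable
r = pearson_r(hours, scores)  # close to 1.0: a perfect positive linear relationship
```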
CRAAP test
The CRAAP test denotes a set of questions a researcher may use to assess the quality of source information across five criteria: currency, relevance, authority, accuracy, and purpose.
CRAAP测试
研究者可基于以下五个标准:时效性、相关性、权威性、准确性和目的性,提出一系列问题,来评价资源信息的质量。这样的问题即被称为CRAAP测试。
Data cleaning
Data cleaning, also called data checking or data validation, is the process by which missing, erroneous, or invalid data are determined and cleaned, or removed, from a dataset and follows the data preparation process.
数据清理
数据清理也叫数据检测或数据核实,是从数据集中判断、整理或转移那些丢失、弄错或未经核实的数据。一般在数据准备的下一个阶段。
Data label
A data label is used on a graph to denote the value of a plotted point.
数据标签
数据标签被用于说明图表上被标绘点的值。
Data preparation
Data preparation is the process by which data are readied for analysis and includes the formatting, or normalizing, of values in a dataset.
数据准备
指将数据准备好并用于分析的过程,包括将数据集里的值格式化或者正规化。
Data transformation
Data transformation is the process by which data in a dataset are transformed, or changed, during data cleaning and involves the use of mathematical operations in order to reveal features of the data that are not observable in their original form.
数据转换
数据转换是指数据集中的数据在数据清理过程中被转换、更改的过程。通过数学运算,可以将数据在原始形式中隐含的特点挖掘出来。
Data visualization
Data visualization, or data presentation, is the process by which data are visualized, or presented, after the data cleaning process, and involves making choices about which data will be visualized, how data will be visualized, and what message will be shared with the target audience of the visualization. The end result may be referred to as a data visualization.
数据可视化
数据可视化,也叫数据展示,是在经过数据清理后,将数据图形化并呈现出来的过程。这就包括选择将何种类型的数据可视化、如何可视化,以及通过可视化能让目标受众分享到什么样的信息。
Data
Data are observations, facts, or numeric values that can be described or measured, interpreted or analyzed.
数据
数据是观测值、事实或数据值,能够被描绘、计量、解释以及分析。
Dependent variable
A dependent variable is a type of variable whose value is determined by, or depends on, another variable.
因变量
指一类决定于、随另一组变量变化的变量。
Descriptive statistics
Descriptive statistics is a type of applied statistics that numerically summarizes or describes data that have already been collected and is limited to the dataset.
描述统计学
描述统计学是一类应用统计学,当一组数据被采集后,通过数学手段在这组数据范围内对数据进行总结或解释。
Diary
A diary is a data collection method in which data, qualitative or quantitative, are tracked over an extended period of time.
日记
日志是一种数据收集方法,在一段时间范围内跟踪定性或定量的数据。
Dichotomous question
A dichotomous question is a type of closed question featured in a poll or survey that requires an individual to choose only one of two possible responses.
两分问题
两分问题是投票或调查中特有的一类封闭式问题,要求每个个体从两个选项里选出一个答案。
Direct measurement
Direct measurement is a type of measurement method that involves taking an exact measurement of a variable and recording that numeric value in a dataset.
直接测量
测量方法的一类,包括对一个变量进行准确测量,并将其数值记录于数据集中。
Discrete variable
A discrete variable, or a discrete scale, has a limited number of possible values between the highest and lowest values in a dataset.
离散变量
假如一个变量仅可能取有限个值或可列个值则称为离散变量
External data
External data refer to data that a researcher or organization uses, but which have been collected by an outside researcher or organization.
外部数据
来自其他研究者或机构,但被研究者自己或自身机构利用的数据。
Factoid
A factoid, or trivial fact, is a single piece of information that emphasizes a particular point of view, idea, or detail. A factoid does not allow for any further statistical analysis.
仿真陈述
仿真陈述表示一种缺乏可信度及佐证的陈述,往往因其发表在出版物上而被传播并使大众信以为真。
Filter
A filter is a programmed list of conditions that filters, or checks, items that meet those conditions and may specify further instructions for the filtered items.
过滤器
选取某一“滤子”对集合中所有数据进行校验以达到去除某些无关数据的方法。
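A filter in this sense can be sketched as a list of condition functions applied to every item. The helper `apply_filter`, the `responses` records, and the two conditions are hypothetical.

```python
def apply_filter(items, conditions):
    """Return the items that satisfy every condition in the list."""
    return [item for item in items if all(cond(item) for cond in conditions)]

# Hypothetical survey responses
responses = [
    {"age": 34, "country": "KE"},
    {"age": 17, "country": "KE"},
    {"age": 52, "country": "BR"},
]

# The filter: a list of conditions each item must meet
conditions = [
    lambda r: r["age"] >= 18,
    lambda r: r["country"] == "KE",
]

matched = apply_filter(responses, conditions)
print(matched)  # [{'age': 34, 'country': 'KE'}]
```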
Focus group
A focus group is a data collection method used for qualitative research in which a group of selected individuals participate in a guided discussion.
焦点小组
用于定性研究的数据采集方法,在由经特定挑选的成员组成的小组中进行指向性讨论来获得数据。
Forced question
A forced question is a type of scaled question featured in a survey that requires an individual to choose from a given range of possible responses, none of which is neutral.
必选问题
问卷调查中的一类问题,要求被调查者从给定的不中立的选项中选择一项。
Front-end check
A front-end check, also called a client-side check, is a type of data validation for datasets gathered electronically, and is performed at the front end, or before data are stored in an electronic database.
前端检测
也被称作客户端检测,一种对电子采集的数据集进行数据核实的方法。在电子数据集被采集之前与前端执行。
Graphical user interface (GUI)
A graphical user interface, or GUI, is a type of interface that allows a user to interact with a computer through graphics, such as icons and menus, in place of lines of text.
用户图形界面(GUI)
用户图形界面(GUI)让用户能在电脑和图形之间进行交互的界面,比如图标或菜单,来代替文本。
Heat map
A heat map is a graph that uses colors to represent categorical data in which the saturation of the color reflects the category’s frequency in the dataset.
热点地图
热点地图利用颜色来表示不同类别的数据,用色彩的饱和度来反映某类数据发生的频率。
Histogram
A histogram is a graph that uses bars to represent proportionally a continuous variable according to how frequently the values occur within a dataset.
柱状图
柱状图用条形来成比例地表现一组连续变量在数据集中出现的频率。
Hue
A hue, as defined in color theory, is a color without any black or white pigments added to it.
色调
在色彩理论中所定义的未添加任何黑白色素的颜色。
Independent variable
An independent variable is a type of variable that can be changed, or manipulated, and determines the value of at least one other variable.
自变量
指能被修改、操作,决定至少一组其他变量的一类变量。
Inferential statistics
Inferential statistics is a type of applied statistics that makes inferences, or predictions, beyond the dataset.
推论统计学
指基于数据样本来进行推论、预测的一种应用统计学。
Infographic
An infographic is a graphical representation of data that may combine several different types of graphs and icons in order to convey a specific message to a target audience.
信息图
通过组合集中不同类型的图表和图标,来为受众传递某个具体的信息,是数据的图形化展示。
Interactive graphic
An interactive graphic is a type of visualization designed for digital or print media that presents information that allows, and may require, input from the viewer.
交互图形
为数字或平面媒体进行可视化设计,以呈现允许、或要求能让读者参与操作的信息。
Interviewer effect
Interviewer effect refers to any effect an interviewer can have on subjects such that he or she influences the responses to the questions.
采访者效应
指在进行采访时,会由于采访者个人的因素影响被采访者的回答。
Invalid data
Invalid data are values in a dataset that fall outside the range of valid, or acceptable, values during data cleaning.
无效数据
指在数据清理的过程中,一组数据集里超出有限、合理的范围的数值。
Leading question
A leading question is a type of question featured in a poll or survey that prompts, or leads, an individual to choose a particular response and produces a skewed, or biased, dataset.
诱导性提问
指在进行调查的过程中,提示或引导了被采访者选择某个特定的答案,从而导致产生出错误、有偏差的数据集。
Legend
A legend is used on a graph in order to denote the meaning of colors, abbreviations, or symbols used to represent data in a dataset.
图例
图例被用于说明图表中颜色、缩写或符号的意义。
Legibility
Legibility is a term used in typography and refers to the ease with which individual characters in a text can be distinguished from one another when read.
易读性
用于印刷排版学的术语,衔接两篇文章时,通过使用不同的字体使读者在阅读时能更容易辨识。
Line graph
A line graph uses plotted points that are connected by a line to represent values of a dataset with one or more dependent variables and one independent variable.
线形图
有多个标注点组成一条线,通过一个或多个的因变量以及一个自变量来表示数据值。
Median
A median is a type of average, or measure of central tendency, in which the middle of a dataset is determined by arranging its numeric values in order and taking the middle value.
中位数
用于衡量数据集的平均水平和集中趋势,通过将数据按大小排序,来找出其中的中间值。
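A minimal sketch of finding a median. Averaging the two middle values when the dataset has an even number of values is a common convention assumed here; the glossary does not state it. The function name `median` is illustrative.

```python
def median(values):
    """Sort the values and return the middle one."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```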
Metadata
Metadata are data about other data, and may be used to clarify or give more information about some part or parts of another dataset.
元数据
元数据是“描述其他数据的数据”,用于解释或给部分其他数据集提供更多信息。
Missing data
Missing data are values in a dataset that have not been stored sufficiently, whether blank or partial, and may be marked by the individual working with the dataset.
数据丢失
指数据未被有效储存,造成数据空缺或不全。个人操作数据库时容易出现这种情况。
Mode
A mode is a numeric value that appears most often in a dataset.
众数
在一组数据中出现最多的数值。
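The mode can be found by counting how often each value occurs. This sketch returns every value tied for the highest count, which is a design choice made here since a dataset can have more than one mode. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values that appear most often in the dataset."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3]))     # [2]
print(modes([3, 1, 3, 2, 1]))  # [3, 1]
```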
Motion graphic
A motion graphic is a type of visualization designed for digital media that presents moving information without need for input from the viewer.
动态图
为数字媒体设计的可视化数据类型,不需要读者的操作即可呈现动态的信息。
Multiseries
A multiseries is a dataset that compares multiple series, or two or more dependent variables and one independent variable.
多系列
两种或两种以上的因变量和一个自变量。
Normal distribution
A normal distribution, often called a bell curve, is a type of data distribution in which the values in a dataset are distributed symmetrically around the mean value. Normally distributed data take the shape of a bell when represented on a graph, the center of which is determined by the mean of the sample, and the width of which is determined by the standard deviation of the sample.
正态分布
正态分布,有时也被称为钟形曲线,是数据分布的一种,其数值关于均数对称分布。正态分布数据在图表中表现为一个“钟形”,其高度决定于样本的均值,而宽度取决于样本的标准方差。(译注:纯直译,但感觉有点问题)
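For reference, the standard density function of a normal distribution is given below. The symbols are not used in the glossary itself: $\mu$ denotes the mean and $\sigma$ the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```

The curve peaks at $x = \mu$, so the mean sets where the bell is centered. The peak value is $1/(\sigma\sqrt{2\pi})$, so a larger standard deviation gives a wider and lower bell.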
Open content
Open content, open access, open source, and open data are closely-related terms that refer to digital works that are free of most copyright restrictions. Generally, the original creator has licensed a work for use by others at no cost so long as some conditions, such as author attribution, are met (See: Suber, Peter. Open Access, Cambridge, Massachusetts: MIT Press, 2012). Conditions vary from license to license and determine how open the content is.
开放内容
开放内容,也叫开放资源、开放存取、开放数据,是与数字工作息息相关的概念,指免费、公开的著作版权。通常来说,除非在某些特别情况下,比如作者注明了“该版权归作者所有”(如:苏泊尔,皮特。开放存取,剑桥,马萨诸塞州:麻省理工学院出版社,2012),原创作者允许其他人免费使用其作品。不同的内容往往有不同的许可情况。
Open question
An open, or open-ended, question is a type of question featured in a survey that does not require a specific kind of response and is used to collect qualitative data.
开放式问题
开放式问题是一类在调查中特有的问题。没有要求被访者给出明确的答案,用于收集定性数据。
Order bias
Order bias occurs when the sequencing of questions featured in a survey has an effect on the responses an individual chooses, and produces a biased, or skewed, dataset.
顺序误差
顺序误差发生在进行调查时,由于问题的顺序安排会对个体选择答案产生影响,从而产生有误差的、错误的数据。
Outlier
An outlier is an extremely high or extremely low numeric value that lies outside the distribution of most of the values in a dataset.
离群值
某些数值过大或过小,与数据集中其他的大多数值相差过大,从而超出了一定的数据范围。这样的数值被称为离群值。
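The glossary does not give a numeric rule for deciding what counts as an outlier. One widely used rule of thumb, assumed here, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The function name `iqr_outliers` and the `readings` data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (rule of thumb, an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical dataset
flagged = iqr_outliers(readings)
print(flagged)  # [95]
```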
Pattern matching
Pattern matching is the process by which a sequence of characters is checked against a pattern in order to determine whether the characters are a match.
模式匹配
通过核对字符顺序,以确认字符是否匹配的模式。
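Pattern matching is often done with regular expressions. The sketch below checks whether a string matches a `YYYY-MM-DD` date layout; the pattern and the function name `matches_date_pattern` are illustrative assumptions. It checks only the layout of the characters, not whether the date is a real calendar date.

```python
import re

# Four digits, a hyphen, two digits, a hyphen, two digits
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def matches_date_pattern(text):
    """Return True when the whole string fits the pattern."""
    return DATE_PATTERN.fullmatch(text) is not None

print(matches_date_pattern("2014-03-09"))  # True
print(matches_date_pattern("09/03/2014"))  # False
```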
Peer-to-peer (P2P) network
A peer-to-peer network, often abbreviated P2P, is a network of computers that allows for peer-to-peer sharing, or shared access to files stored on the computers in the network rather than on a central server.
对等(P2P)网络
Pie chart
A pie chart is a circular graph divided into sectors, each with an area relative to the whole circle, and is used to represent the frequency of values in a dataset.
饼图
被切分成扇形的圆形图形,每一个区域都与整个圆形相关。用于表示数值出现的频率。
Population
A population is the complete set from which a sample is drawn.
总体
总体指一套完整的整体。一份样本即从总体中抽出。
Probability
Probability is the measure of how likely, or probable, it is that an event will occur.
可能性
用来衡量一个事件发生的可能性或几率。
Qualitative data
Qualitative data are a type of data that describe the qualities or attributes of something using words or other non-numeric symbols.
定性数据
用文字或其他非数字符号来描述某物的品质或属性。
Quantitative data
Quantitative data are a type of data that quantify or measure something using numeric values.
定量数据
用数值来量化、衡量事物的数据类型。
Radio response
A radio response refers to an answer given to a question in a poll or survey administered in electronic form, for which only one response can be selected at a time, as may be indicated by a round radio button an individual clicks on.
单选框响应
用电子表格做调查问卷时,一次只能选择一个选项,通常表现为可单击的圆钮。
Range check
A range check is a type of check used in data cleaning that determines whether any values in a dataset fall outside a particular range.
区域检查
一种用于数据清理的过程中,判断是否有数值不属于所需的特定范围内的检查。
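A range check can be sketched as a function that returns the values falling outside the accepted bounds. The helper name `out_of_range`, the `ages` data, and the bounds 0 to 120 are hypothetical.

```python
def out_of_range(values, low, high):
    """Return the values that fall outside the inclusive range [low, high]."""
    return [v for v in values if not (low <= v <= high)]

ages = [25, 31, -4, 47, 230]  # hypothetical survey answers
flagged = out_of_range(ages, 0, 120)
print(flagged)  # [-4, 230]
```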
Range
A range is determined by taking the difference between the highest and lowest numeric values in a dataset.
区域
区域由数据集中的被划分的数值决定,以最大值和最小值为界限。
Raw data
Raw data refer to data that have only been collected, not manipulated or analyzed, from a source.
原始数据
指仅仅从源头被采集,还没有进行操作、分析的数据。
Readability
Readability is a term used in typography and refers to the ease with which a sequence of characters in a text can be read. Factors affecting readability include the placement of text on a page and the spacing between characters, words, and lines of text.
可读性
印刷出版中的术语,表示受文章中的文字排列影响,造成阅读过程中难易程度不同。影响可读性的因素包括,文章在页面上的排布、字母、文字、行列之间的空间等。
Sample
A sample is a set of collected data.
样本
指被采集的一套数据。
Sampling bias
Sampling bias occurs when some members of a population are more or less likely than other members to be represented in a sample of that population.
抽样偏误
指在从总体抽样的过程中,出现了某些与其他相比差异性过大的个体,从而使样本产生误差。
Scaled question
A scaled question is a type of question featured in a survey that requires an individual to choose from a given range of possible responses.
Scatterplot
A scatterplot uses plotted points (that are not connected by a line) to represent values of a dataset with one or more dependent variables and one independent variable.
散点图
散点图表示利用标出点(这里并非组成线段),以及一个或多个因变量与一个自变量的关系来表示数值的图形。
Series graph
A series graph proportionally represents values of a dataset with two or more dependent variables and one independent variable.
多序列图
用于展示包含两个或以上的应变量和一个自变量的数据集的值。
Series
A series is a dataset that compares one or more dependent variables with one independent variable.
多序列
比较一个或以上的应变量和一个自变量的数据集。
Shade
Shade refers to adding black to a hue in order to darken it.
阴影
指在色调上添加黑色来使其变暗。
Skewed data
Skewed data are data with a non-normal distribution whose values trail off in a longer tail to the left, as in left-skewed, or to the right, as in right-skewed, of the mean value when represented on a graph.
偏态数据
指数据呈非正态分布趋势,并且有多项数值呈现在图表上时,较之均值有较明显的左偏、右偏现象。
Stacked bar graph
A stacked bar graph is a type of bar graph whose bars are divided into sub-sections, each of which proportionally represents a category of data in a dataset that can be stacked together to form a larger category.
堆叠条形图
堆叠条形图的条形被分成不同的小段,每一小段表示数据的小类,通过将其成比例地堆积构成一个更大的类别。
Standard deviation
A standard deviation is a measure of how much the values in a dataset vary, or deviate, from the arithmetic mean, and is calculated by taking the square root of the variance.
标准差
通过求出方差的平方根,来衡量一组数据的变化程度、离散程度。
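A short sketch of computing a variance and taking its square root to get the standard deviation. It assumes the population form, which divides by the number of values; the sample form divides by one less. The glossary does not say which form it intends. The function names and the `data` values are illustrative.

```python
import math

def population_variance(values):
    """Average squared deviation from the arithmetic mean (population form)."""
    if not values:
        raise ValueError("the variance of an empty dataset is undefined")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset, mean is 5
variance = population_variance(data)
sd = standard_deviation(data)
print(variance, sd)  # 4.0 2.0
```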
Static graphic
A static graphic is a type of visualization designed for digital or print media that presents information without need for input from the viewer.
静态图
静态图表示一类被用于数字或平面媒体的可视化信息,其不需要读者参与操作。
Statistics
Statistics is the study of collecting, measuring, and analyzing quantitative data using mathematical operations.
统计学
统计学是一门通过数学运算采集、测量、分析定量数据的学科。
Summable multiseries
A summable multiseries is a type of multiseries with two or more dependent variables that can be added together and compared with an independent variable.
加合多序列
Summary record
A summary record is a record in a database that has been sorted, or aggregated, in some way after having been collected.
汇总记录
数据集被采集后,被分类、汇总时的历史记录。
Tint
Tint refers to adding white to a hue in order to lighten it.
颜色淡化
表示增加白色来提高其亮度。
Transactional record
A transactional record is a record in a database that has not yet been sorted, or aggregated, after collection.
事务记录
指还被采集后还未被分类、汇总的数据集记录。
Value (color)
Value, or brightness, refers to the tint, shade, or tone of a hue that results from adding black or white pigments to a base color.
亮度
亮度指在基础色彩上,综合色彩、阴影或色调所呈现出的黑与白的明暗。
Variance
Variance, or statistical variance, is a measure of how spread out the numeric values in a dataset are, or how much the values vary, from the arithmetic mean.
方差
用方差指在一组数据中,衡量数据与其算术平均数的分散程度或偏离程度的手段。