
Stephen Downes


Oct 04, 2023

Three Frameworks For Data Literacy

Stephen Downes

National Research Council Canada

Ottawa, Ontario, Canada


Data literacy is the ability to collect, manage, evaluate, and apply data in a critical manner. It is a relatively new field of study, dating only from the 2010s. It includes the skills necessary to discover and access data, manipulate data, evaluate data quality, conduct analysis using data, interpret results of analyses, and understand the ethics of using data. This paper considers data literacy education across three frameworks: the competency model defining data literacy, the assessment of data literacy competencies, and methods for the development of data literacy in an organization. These frameworks are then applied to the development of an open online course supporting data literacy, in which the course takes the form of a data structure encompassing the three frameworks identified in the literature.


Data Literacy, Model, Framework, Learning


What is the difference between 'learning' a discipline or domain when thought of as data, and supported by the principles of data literacy, as compared to 'learning' thought of as analogous to reading, and supported by traditional literacy? In this paper we address this question through analysis of the concept of data literacy, an examination of how data literacy is currently assessed, and research and development in the teaching of data literacy for individuals and organizations.

It is evident that 'learning' a 'literacy' involves more than learning about the components of that literacy, and that there is an element of 'being literate', which is intended as an outcome of that learning. To be literate is to embody a set of skills and competencies typically thought to define that literacy, as reflected in an assessment of that literacy, and which in turn informs the teaching of that literacy.

But the study of data literacy is nascent, limited to a few (mostly commercial) initiatives, and not benefiting from a wide-reaching analysis considering all aspects of the definition, development and application of data. This paper seeks to fill that gap, providing a comprehensive overview of data literacy as it is taught and learned today, and suggesting a set of frameworks that will inform future research and development of data literacy learning initiatives.


1.1  Methodology

Originating as work conducted for the office of the Assistant Deputy Minister (Data, Innovation and Analytics) in the government of Canada, this paper is a summarization of a comprehensive literature review and design research project. A formal review was conducted by the National Research Council information management office of Canada's National Science Library for publications related to the definition, application and development of data literacy. A wider search using the same parameters was undertaken using Google Scholar. Approximately 150 results were obtained, from which 20 items were found to contain an identifiable data literacy model, and three major assessment frameworks were identified. A small number of highly specific data literacy development models were also identified. The design framework employed draws from previous work by the author on connectivist massive open online courses (cMOOC) with the specific intent of adapting the data literacy models table into the connectivist course framework.

2.      Three Frameworks

2.1 Competency Model or Framework

Data literacy includes the skills necessary to discover and access data, manipulate data, evaluate data quality, conduct analysis using data, interpret results of analyses, and understand the ethics of using data, where by data we mean the representation of facts in media. These are core skills required to support key competencies in intelligence and trend analysis, mission-driven metric reporting, health and human response to stress and injury, training and development functions, deployment, supply management and logistics, and information warfare, to name a few. The following major themes emerge from the discussion of data literacy over the last decade: data literacy as a set of skills or competencies; the idea of deriving meaningful information from data; the data lifecycle or data workflow; complexity of skills for differing roles; data literacy as individual and corporate capacities.

2.1.1 Competencies

Competencies are commonly defined as "a set of basic knowledge, skills, abilities, and other characteristics that enable people at work to efficiently and successfully accomplish their job tasks." Following Oberländer, et al. (2020) we use the term 'competencies' here to draw on a well-established concept that includes knowledge, skills, abilities, and other characteristics (KSAO).

The concept of competencies also includes the requirement of evidence for competencies. Thus, employing a definition using competencies is well suited to a discussion of data literacy that includes the fostering and assessment of knowledge, skills, and abilities.

2.1.2 Analysis

We drew on 20 studies that offered a (more or less) competency-based definition of data literacy and compared the set of competencies each proposed. The selection of sources was intended to draw from and be representative of various data literacy models. In assigning the competencies, interpretation was required, as the studies did not all employ the same terminology. Figure 1 displays the result of the analysis:

Figure 1
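The comparison described above can be pictured as a study-by-competency table. The following is a minimal sketch of that kind of tally, not the procedure actually used in the review: the study labels and competency names are placeholders invented for illustration, and the function names `tally_competencies` and `coverage_matrix` are hypothetical.

```python
from collections import Counter

# Placeholder inputs: these labels are illustrative only. They are NOT the
# 20 studies or the competency coding used in the analysis.
study_competencies = {
    "study_a": {"data discovery", "data quality", "data ethics"},
    "study_b": {"data discovery", "data analysis", "interpretation"},
    "study_c": {"data quality", "data analysis", "data ethics"},
}


def tally_competencies(studies):
    """Count how many studies include each competency."""
    counts = Counter()
    for competencies in studies.values():
        counts.update(competencies)
    return counts


def coverage_matrix(studies):
    """Return (sorted competency names, {study: [1 or 0 per competency]})."""
    columns = sorted(set().union(*studies.values()))
    rows = {
        name: [1 if c in competencies else 0 for c in columns]
        for name, competencies in studies.items()
    }
    return columns, rows


counts = tally_competencies(study_competencies)
columns, rows = coverage_matrix(study_competencies)
for competency in columns:
    print(f"{competency}: {counts[competency]} of {len(study_competencies)} studies")
```

Because the studies did not share a terminology, the interpretive step of mapping each study's wording onto a common competency name would have to happen before any such tally.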

2.1.3 Models

The list of competencies identified also makes it clear that data literacy does not fall into any single category described above. It contains elements of critical thinking, statistical reasoning, data management, and scientific research. Data literacy therefore represents a certain level of competency across a broad range of data-related skills, not a narrowly defined subset of some other type of literacy. Most work in data literacy falls into one of several models or interpretations. "They each have a different focus which tends to reflect the context in which it was derived. They also have a different level of granularity, not just between the definitions, but also within them" (Wolff, et al., 2016). Schield (2004) describes these as 'perspectives', for example, the 'critical thinking' perspective and the 'social science data' perspective:

•           Data Stewardship Model: This model describes approaches to data literacy that emphasize data acquisition, curation, quality and deployment. A prototypical example of this approach is the Statistics Canada descriptions of data quality and the data journey (Statistics Canada, 2020).

•           Analysis and Decision-Making Model: This model is focused mostly on the use of data to support analytics and decision-making, for example, the collection of approaches taken by members of the Data Literacy Project, including Qlik (a data analytics company), Accenture, Cognizant, Experian, Pluralsight, the Chartered Institute of Marketing, and Data to the People.

•           Information Literacy Model: "According to Hunt (2004), data literacy education should borrow heavily from information literacy education, even if the domain of data literacy is more fragmented than the field of information literacy." (Koltay, 2016). Similarly, Maybee & Zilinski (2016) write, "The emerging construct of data literacy has typically been closely related to information literacy."

•           Science and Research Data Literacy Model: This model of data literacy emphasizes aspects of data related to computer science, mathematics and statistics. It defines a set of data skills including data awareness, forms of statistical representation, the ability to analyze, interpret and evaluate statistical information, and communication of statistical information (Australian Bureau of Statistics, 2010).

•           Social Engagement Model: This model distinguishes the need for everyday uses of data from the deeper requirements of data science. It is only really articulated in a single source (Bhargava, et al., 2015), though it has its origins in a broader definition of literacy, as exemplified by Robinson (2005), who talks of literacy as "enabling individuals to achieve their goals, to develop their knowledge and potential, and to participate fully in their community and wider society" (p. 13).

As discussed below, no single model accounts for all aspects of data literacy applicable in a specific context or role. Hence, rather than describe a metric for model selection, we propose a comprehensive model based on the specific skills and competencies defining a job, task or role.

2.1.4 Application

In our analysis we looked more closely at the nature of artificial intelligence and machine learning, two disciplines largely defined by their relation to data, to understand what might be understood as the full 'data workflow'. This analysis makes it clear that data literacy involves much more than 'reading' and 'writing' with data and includes, but is not limited to, the framing of the problem or context of use, the data set itself, application, and testing.

For example, machine learning engineering describes the construction and use of three elements: data engineering, which describes the acquisition, exploration, cleaning, labeling and management of data; model engineering, which consists of the development or training of the model, testing and evaluation, and packaging for use in an application; and deployment, which describes how the model is served and used, performance evaluation, and performance logging (Visengeriyeva, et al., 2022). Similarly, statistical research methods workflows emphasize "the importance of asking questions throughout the statistical problem-solving process (formulating a statistical investigative question, collecting or considering data, analyzing data, and interpreting results), and how this process remains at the forefront of statistical reasoning for all studies involving data." (Bargagliotti, et al., 2020).

Additionally, this study finds that data literacy is a concept that can be applied equally to both individuals and organizations, though both the description of data literacy as well as the assessment of data literacy will vary in the given context. Framing elements of data literacy as competencies, and employing a widely used model describing knowledge, skills and attitude, an overall framework for describing individual data learning competencies and organizational data literacy capabilities is proposed.

2.2 Evaluation or Assessment Framework

It is important to be able to evaluate or assess the level of data literacy competencies individually or across the organization for the purpose of assessing operational readiness and for the purpose of planning future training and development. Here we first provide an overview of some data literacy assessment programs, then consider some data literacy assessment models, and finally consider some data literacy methods.

2.2.1 Assessment Programs

We analyzed major skills and data literacy assessment programs, including the following:

•           The OECD Programme for the International Assessment of Adult Competencies (PIAAC) literacy assessment distinguishes three task types: "access and identify tasks require respondents to locate information in a text, integrate and interpret tasks involve relating parts of one or more texts to each other, and evaluate and reflect tasks require the respondent to draw on knowledge, ideas or values" (Kirsch & Thorn, 2016).

•           Endorsed by the American Statistical Association, the Guidelines for Assessment and Instruction in Statistics Education (GAISE) emphasize that there is no one route to teaching and assessing statistical literacy and note that "mastering specific techniques is not as important as understanding the statistical concepts and principles that underlie such techniques" (GAISE, 2016, 8).

•           By contrast with the OECD and GAISE programs, the Eckerson Group describes data literacy assessment specifically and includes assessment not only of individual data literacy but also of the organization (Wells, 2021). Assessments are based initially on a comprehensive Data Literacy Body of Knowledge (DLBOK) defined by the organization.

2.2.2 Data Literacy Model-Based Assessment

In the analysis of data literacy competencies described in the first section of this report we obtained an unstructured list of competencies. These competencies were organized into different categories by various studies, but there was no consistency whatsoever in the categorization scheme from study to study. What is offered here is a model based on a slightly modified full list of competencies drawn from the data literacy studies cross-referenced with a comprehensive skills taxonomy as suggested by the assessment programs considered above.

For the sake of consistency with much of the work done previously, a slightly modified version of Bloom's taxonomy is used (Bloom, 1956). Bloom's three separate taxonomies - cognitive, affective and psychomotor - can be thought of as corresponding to the already-described taxonomy of knowledge, attitudes and skills, respectively. This taxonomy needs to be extended to accommodate both individual and organizational competencies.

Table 1.

2.2.3 Role-Defined Data Literacy

It is arguable that a single-factor measure of data literacy is insufficient to account for the variability in both the set of data literacy competencies and also the varying degree to which each competency is required in different job functions or roles. Accordingly, a role-defined data literacy model is proposed here.

Figure 2 illustrates the calculation of a role-defined data literacy profile. It consists of a combination of the set of competencies as defined in the data literacy model with the actual job or function description. This allows for a definition of the relative importance of each competency for that function, demonstrated here in the form of a radar chart (also known as a spider chart).


Figure 2.

Job or function descriptions may be obtained from extant text (the example in the diagram is from the Careers page) or drafted as text by managers and those occupying the position. The competency profile may be created by simply counting the frequency of relevant terms, or by a more nuanced analysis, perhaps using machine learning.

The same process may be used to create actual competency profiles for each individual evaluated, by employing test results or actual communications generated by the person in question (such a process would be subject to ethical and privacy considerations). A similar process may be used to generate organizational level competency profiles. 
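The simple term-counting approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the keyword lists in `COMPETENCY_TERMS`, the sample job text, and the function names `competency_profile` and `profile_gap` are all hypothetical. A real profile would draw its terms from the competencies in the data literacy model.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only. A real profile would
# use the competencies and vocabulary of the data literacy model.
COMPETENCY_TERMS = {
    "data discovery": ["find", "locate", "source"],
    "data quality": ["quality", "clean", "validate"],
    "data analysis": ["analyze", "analysis", "statistics"],
    "data ethics": ["ethics", "privacy", "consent"],
}


def competency_profile(text, terms=COMPETENCY_TERMS):
    """Return the relative weight (0.0 to 1.0) of each competency in a text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    raw = {name: sum(words[t] for t in keywords) for name, keywords in terms.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: count / total for name, count in raw.items()}


def profile_gap(role, person):
    """Positive values mark competencies the role weights more than the person shows."""
    return {name: role[name] - person.get(name, 0.0) for name in role}


# Invented sample text standing in for a job or function description.
job_text = "Analyze survey data, validate data quality, and protect respondent privacy."
role = competency_profile(job_text)
for name, weight in role.items():
    print(f"{name}: {weight:.2f}")
```

The resulting weights are the values that would be plotted on the axes of a radar chart. Applying `competency_profile` to text produced by an individual and passing both results to `profile_gap` gives one possible way of comparing an actual profile with a role-defined one, subject to the ethical and privacy considerations already noted.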

It is arguable that a single-factor measure of data literacy 'levels' as employed by numerous data literacy assessment schemes is insufficient to account for the variability in both the set of data literacy competencies and also the varying degree to which each competency is required in different job functions or roles. Accordingly, a role-defined data literacy model is proposed. This model illustrates the calculation of a role-defined data literacy profile, as well as the process used to create actual competency profiles.

2.3 Teaching Framework

There are few data literacy training initiatives extant, and no organization or institution-wide examples were found. So, in the context of data literacy development two areas of consideration are important: models and designs for data literacy program development in general, and examples of extant data literacy training programs and curricula.

2.3.1 Developing Data Literacy

The development of data literacy in an organization occupies a space between two extremes. On the one hand, we may find data literacy among other types of information and communication competencies, such as digital literacy or information management programs. On the other hand, we might think of data literacy as a first step in the development of higher-level competencies such as data architect or information management. Either approach envisions a large-scale and complex learning initiative.

But it need be neither, provided we think of data literacy not as knowledge or content to be used, but rather as a part of other processes and strategies employed to achieve real objectives or outcomes. This accords with the recommendations found in the literature, for example, to focus on performance rather than content knowledge and to ensure it encompasses real operational challenges using authentic data and examples.

The development of data literacy in the context of this report is tantamount to the development of individual and organizational data literacy, which consist of knowledge, skills and attitudes, or their analogues, in each of the data literacy competencies, defined as described in the first section, such that the achievement of these competencies can be reliably and validly assessed and detected using the assessment methodologies described in the second section.

2.3.2 Data Literacy Programs

There is not yet an established infrastructure for data literacy development; we mostly find commercial training courses and online resources. So, in the context of data literacy development, two areas of consideration are important: models and designs for data literacy program development in general, and examples of data literacy training programs and curricula.

Models and designs for data literacy program development: some universities have conducted background research and there are numerous data literacy program development roadmaps provided by commercial consultants. For example:

•           The Data Information Literacy project, funded by the Institute of Museum and Library Services (IMLS), proposes a four-step methodology of planning, development, implementation, and assessment (Carlson & Johnston, 2015).

•           QuantHub provides a methodology for developing individual and team data literacy learning and development plans. There are two major components: a series of 'foundational steps' to develop a data literacy vision and roadmap; and an iterative process of assessment, planning, learning and practice (Cowell, 2020).

•           Dave Wells of Eckerson Group offers a comprehensive data literacy program development methodology (2021), arguing that organizational data literacy is not merely a sum of individual data literacies but requires, in addition, factors such as tools and systems, incentives and motivators.

•           Gartner, by contrast, offers a report describing a three-phase methodology for the development of an institutional program (Panetta, 2021) consisting of assessment, data literacy training, and then evaluation of the outcome.

Data literacy training programs and curricula: After a brief surge in the mid 2010s, data literacy is enjoying a resurgence in 2023.

•           While no longer extant, the Data Literacy Project, founded in 2015 at Dalhousie University, proposed "a transdisciplinary examination of existing strategies and best practices for teaching data literacy, synthesizing documented explicit knowledge using a narrative-synthesis methodology and identifying areas where additional research is needed." (DataLiteracy.Ca, Internet Archive, 2021).

•           Conducted online between January and March 2022, the EDUCAUSE Data Literacy Institute consisted of a series of eight synchronous online meetings to discuss resources, activities, and projects in support of seven key data literacy competency areas (Kleitz & Shelley, 2022).

2.3.3 Teaching and Learning Methods

Data literacy is new enough that specific pedagogies have not been broadly developed or applied. However, in many ways, data literacy training is similar to that in other disciplines, and especially those characterized as 'literacies'. Thus, recommendations for, say, digital, information or statistical literacy instruction may apply more broadly to data literacy in general. Some specific trials of different methods applied to the teaching of data literacy have been undertaken. The following is not a comprehensive listing of all methods but serves to illustrate how to apply the principles described just above in specific teaching contexts.

•           Datastorming: This is a way of thinking about how to create designs with data using non-digital media. "To overcome their unfamiliarity to data, we aimed to craft abstract data into hands-on design materials in the form of cards." (Lim, et al., 2021)

•           Simulations and Interactive Technologies: Biehler, et al. (2016) describe pre-service teachers' reasoning about modeling a family factory with TinkerPlots, "a data visualization and modeling tool developed for use by middle school through university students."

•           Case-Based Teaching Method: Case-based teaching is "an active learning strategy in which students read and discuss complex, real-life scenarios that call on their analytical thinking skills and decision-making." (Riddle, et al., 2017).

•           Utilising affordances in real-world data: Based on the Teaching for Statistical Literacy Hierarchy, this method analyzes statistical literacy lessons that use real-world data from the perspective of the affordances in the data presentation (Chick & Pierce, 2012).

•           Data-Driven Decision-Making: According to Abbott, et al. (2015), this team-based approach combines a number of competency requirements in a single activity: expertise in data collection, management in a variable environment, allocation of space and time for the process, and the need to ensure process fidelity. This specific activity helps teachers design child literacy instruction, but the approach can be generalized to other data-driven decision-making activities.

2.3.4 Data Literacy MOOC

To a significant degree, discussions of data literacy focus on individual competencies and skills. Nowhere is this more evident than in the development of data literacy learning resources and environments, as just discussed, though with some notable exceptions this trend may be identified throughout.

As an experiment in conceptual design based on the findings of this study we developed a 'Data Literacy Massive Open Online Course (MOOC)', which may be found at [website redacted for peer review]. The course follows the structure described here, addressing each of the three frameworks in turn. The associated concepts and resources identified in the study comprise separate contents for each of the three frameworks.

The model of a connectivist MOOC was employed because, unlike traditional courses, which are structured in a linear or book-like fashion, consisting of sequential modules and lessons, a connectivist MOOC is structured as a graph of connected people, resources, and concepts, in other words, much more like a collection of data.

Technically, a data-based MOOC (dMOOC) organizes content and resources in a structure suggested by the literature being studied in the course. Figure 3 is a sample of the structure used in a similar dMOOC on ethics and analytics.

Figure 3

Student activities in a dMOOC consist less of learning and remembering content and more of working with relevant data, and specifically:

·        Classifying and labeling major sets and subsets of data

·        Identifying and labeling specific instances of data subjects (for example: an article describing 'care' as a legal concept)

·        Identifying and labeling relations between sets and subsets of data, either via argument threads in extant literature, or through data analytics of relevant bodies of literature

·        Assessing the resulting data model, identifying significant threads, and interpreting the resulting model

In the ethics MOOC diagrammed in Figure 3 this activity was undertaken by a single individual, while in the corresponding data literacy MOOC this activity was undertaken collectively by the course participants.
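The student activities listed above can be pictured as operations on a small labeled graph. The following is a minimal sketch only, assuming a simple in-memory structure. The class name `CourseGraph`, its methods, the category labels and the relation labels are hypothetical and do not describe the data structure actually used in either course.

```python
class CourseGraph:
    """A tiny labeled graph of course entities and the relations between them."""

    def __init__(self):
        self.nodes = {}   # node id -> category label
        self.edges = {}   # node id -> set of (relation label, target id)

    def add_node(self, node_id, category):
        """Classify and label an entity (a set, subset, or instance of data)."""
        self.nodes[node_id] = category

    def relate(self, source, relation, target):
        """Label a relation between two entities that have already been added."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a relation must be labeled nodes")
        self.edges.setdefault(source, set()).add((relation, target))

    def by_category(self, category):
        """List every entity carrying a given category label."""
        return sorted(n for n, c in self.nodes.items() if c == category)

    def thread(self, start):
        """Follow relations outward from one entity; return all entities reached."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(target for _, target in self.edges.get(node, ()))
        return seen


# Illustrative labels only, loosely based on the frameworks in this paper.
graph = CourseGraph()
graph.add_node("competency model", "framework")
graph.add_node("data stewardship", "concept")
graph.add_node("Statistics Canada (2020)", "resource")
graph.relate("competency model", "includes", "data stewardship")
graph.relate("data stewardship", "exemplified by", "Statistics Canada (2020)")

print(graph.by_category("concept"))
print(sorted(graph.thread("competency model")))
```

In this sketch, classifying and labeling corresponds to `add_node`, identifying relations corresponds to `relate`, and assessing the resulting model corresponds to queries such as `by_category` and `thread`.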

Ideally, participation in a cMOOC does not involve individual study and retention of a pre-defined body of knowledge. Rather, it requires working with others in order to develop not only individual capacities and skills, but also social or community capacities and skills. These typically resist definition prior to the course, as the consequence of such social interaction and application of a skill or practice is often the development of new knowledge, approaches, and competencies.


Above we asked what the difference is between 'learning' a discipline or domain when thought of as data, and supported by the principles of data literacy, as compared to 'learning' thought of as analogous to reading, and supported by traditional literacy.

3.1 What We Have Learned

What we have learned is that there is no single or simple definition of data literacy. What we think of as 'data literacy' is characterized by a set of widely divergent competencies, and the importance of one or another set of competencies varies according to the task or role in which data literacy is required. This is reflected not only in the many definitions of data literacy that we found, but also in the models of assessment offered by (mostly) commercial providers. Not only is literacy an embodiment of the skills and competencies typically thought to define that literacy, but 'data literacy' is also something that can characterize both an individual and an organization.

But it is not yet taught that way. While the practices and pedagogies of data literacy being researched today address the question of use and immersion in a data-rich environment, they are addressed toward individual learning, and not the development of data literacy as an organizational or social skill. To this end we recommend developing and piloting non-hierarchical cooperative learning environments, such as the cMOOC, for the development of organizational and social competencies required for data literacy.

That said, these are assertions that need to be empirically tested before being widely adopted and applied. This paper offers the conceptual framework within which such assertions may be tested, but does not itself constitute a test of them, beyond the very limited application of the model in the development of the data literacy MOOC. And even so, much wider participation in such a MOOC would be required before any definitive assertions could be made.

3.2 Implications and Limitations

Models of data literacy found in specific domains, and especially scientific domains, do not encompass the full spectrum of data literacy skills and competencies. Hence, the teaching of data literacy should not be based on models originating from a specific discipline, but should rather be designed based on an analysis of the role or skills being taught, with a wide consideration of the corresponding data literacy competencies found across a range of models.

The small range of materials describing models and methods for teaching subjects related to data literacy tends to favour hands-on active learning; however, these methods were applied only in very narrow contexts. Accordingly, a course-wide model for developing data literacy was developed and proposed, whereby the course contents themselves are organized as a database, such that students participate by constructing and interpreting the data model.

While there is reasonable confidence that the list of data literacy models is comprehensive, it is possible that additional models of data literacy may be extant, and these may include competencies not identified in the current study. Thus this work should be seen as a first draft of a wider survey by the data literacy community as a whole. Though role-defined data literacy has antecedents in the literature, it should be clear that there is scope for alternative multi-model approaches to data literacy. Finally, while the application of the data literacy frameworks identified in this paper allowed for the development of an instructional model, this model has not been adequately tested, and should be applied in pilot form before being adopted.



Abbott, Mary, et al. 2015. "A Team Approach to Data-Driven Decision-Making Literacy Instruction in Preschool Classrooms: Child Assessment and Intervention Through Classroom Team Self-Reflection." Young Exceptional Children, vol. 20, no. 3, Sept. 2015, pp. 117–32.

Bargagliotti, Anna, et al. 2020. Pre-K-12 Guidelines for Assessment and Instruction in Statistics Education II (GAISE II): A Framework for Statistics and Data Science Education. American Statistical Association, 2020.

Bhargava, Rahul, et al. 2015. "Beyond Data Literacy: Reinventing Community Engagement and Empowerment in the Age of Data." Data-Pop Alliance, Data-Pop Alliance, Nov. 2015.

Biehler, Rolf, et al. 2016. "Elementary Preservice Teachers' Reasoning about Modeling a 'Family Factory' with Tinkerplots - A Pilot Study." Statistics Education Research Journal, vol. 16, no. 2, Nov. 2017, pp. 244–86.

Bloom, B. S. 1956. Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. David McKay.

Canada School of Public Service (CSPS). 2022. How Data Literate Are You? Government of Canada, Apr. 2022.

Carlson, Jake, and Lisa R. Johnston. 2015. Data Information Literacy: Librarians, Data, and the Education of a New Generation of Researchers. Purdue University Press, 2015.

Chick, Helen L., and Robyn Pierce. 2012. "Teaching for Statistical Literacy: Utilising Affordances in Real-World Data." International Journal of Science and Mathematics Education, vol. 10, no. 2, Apr. 2012, pp. 339–62.

Cowell, Matt. 2020. "A Roadmap for Creating a Data Literacy Program - QuantHub." QuantHub, QuantHub, 18 June 2020.

DataLiteracy. 2021. "About This Project - DataLiteracy.Ca." DataLiteracy.Ca, Internet Archive, 22 Dec. 2021.

GAISE College Report ASA Revision Committee. 2016. Guidelines for Assessment and Instruction in Statistics Education (GAISE). American Statistical Association.

Kirsch, Irwin, and William Thorn. 2016. Technical Report of the Survey of Adult Skills (PIAAC) (2nd Edition). Organisation for Economic Co-operation and Development (OECD), 2016.

Kleitz, Lauren, and Joe Shelley. 2022. EDUCAUSE Data Literacy Institute. EDUCAUSE, 24 Jan. 2022.

Koltay, Tibor. 2016. "Data Governance, Data Literacy and the Management of Data Quality." IFLA Journal, vol. 42, no. 4, Nov. 2016, pp. 303–12.

Maybee, Clarence, and Lisa Zilinski. 2016. "Data Informed Learning: A next Phase Data Literacy Framework for Higher Education." Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, Jan. 2015, pp. 1–4.

Oberländer, Maren, et al. 2020. "Digital Competencies: A Review of the Literature and Applications in the Workplace." Computers & Education, vol. 146, Mar. 2020, p. 103752.

Panetta, Kasey. 2021. "A Data Literacy Guide For D&A Leaders." Gartner, Gartner, 26 Aug. 2021.

Riddle, Derek R., et al. 2017. "Making a Case for Case-Based Teaching in Data Literacy." Kappa Delta Pi Record, vol. 53, no. 3, July 2017, pp. 131–33.

Ridsdale, Chantel, et al. 2015. Strategies and Best Practices for Data Literacy Education. Dalhousie University, 2015.

Robinson, Clinton. 2005. Aspects of Literacy Assessment: Topics and Issues from the UNESCO Expert Meeting. 2005.

Schield, Milo. 2004. "Information Literacy, Statistical Literacy and Data Literacy." IASSIST Quarterly, 2004.

St. Mary's. 2018. St. Mary's Practice Test 2017/2018. 2018.

Statistics Canada. 2020. The Daily - Begin Your Data Journey with Data Literacy Training Videos. Government of Canada, 23 Sept. 2020.

Visengeriyeva, Larysa, et al. 2022. An Overview of the End-to-End Machine Learning Workflow. Dec. 2021.

Wells, Dave. 2021. Building a Data Literacy Program. 4 Jan. 2021.

Wolff, Annika, et al. 2016. "Creating an Understanding of Data Literacy for a Data-Driven Society." Journal of Community Informatics, vol. 12, no. 3, Aug. 2016.

Yi Min Lim, Delia, et al. 2021. "Datastorming: Crafting Data into Design Materials for Design Students' Creative Data Literacy." C&C '21: Creativity and Cognition, Association for Computing Machinery, 2021, pp. 1–9.


