Introduction
The data from large-scale assessments are provided by the organizations that conduct them as whole databases wrapped into ZIP archives. The ZIP archives from IEA studies also contain additional files with parameter estimates, programs, almanacs, codebooks, etc. and documentation (reports, encyclopedia, frameworks, user guides and questionnaires). Even if an analyst is interested in just the data files, and only in the data files from one population (e.g. in TIMSS, TALIS or ICCS 2009), he or she still has to download the entire ZIP archive. This archive can be rather big (e.g. nearly 1GB for TIMSS 2023) and can take a long time to download. After download, the analyst has to extract and process the content manually – move files, rearrange directory structure, etc. This can be a rather tedious process, prone to errors.
The lsa.download.data function is a utility function that downloads only the data files from the large-scale assessments’ ZIP-archived databases. For a given assessment and cycle, it extracts just the files for the requested countries and population, without downloading the entire archive. So, if the analyst needs the data files only for Kazakhstan and Lithuania from TIMSS 2023 grade 8, the function will download just the relevant files without transferring the entire database over the internet to the analyst’s local drive.
The above applies to all IEA studies, OECD TALIS and TALIS 3S. PISA and PISA for Development have a different database structure, providing all countries in a single file per respondent type. As of now, it is not possible to download data for separate countries for these two studies. Instead, the lsa.download.data function downloads all ZIP archives for the respective cycle (as of now this applies to PISA 2015, 2018 and 2022, and PISA for Development 2019), extracts the archives’ content and then removes the ZIP archives.
By default, the downloaded SPSS data files will be automatically converted to RALSA’s .RData files using the lsa.convert.data function. This can be changed if the analyst prefers having just the original SPSS files.
The download data files function and its arguments
The lsa.download.data function has the following arguments:
- study – String, large-scale assessment or study name.
- cycle – Numeric, study year of administration (cycle).
- POP – String, population of interest. If none is provided, the default is taken.
- ISO – Vector containing character ISO codes of the countries whose data files shall be downloaded.
- out.folder – Path to the folder where the downloaded (and optionally converted) files will be stored. If the final folder in the path does not exist, it will be created.
- append – If some files for the study, cycle, populations and countries have already been downloaded, download only the new ones (default is TRUE).
- convert – Logical, shall the data be converted to lsa.data and stored in .RData files after being downloaded (default is TRUE).
- missing.to.NA – Logical, should the user-defined missing values be converted to NA when converting the downloaded data (default is FALSE)? See lsa.convert.data.
Notes:
- IEA studies, as well as OECD TALIS and TALIS 3S, provide their data in SPSS .sav format with the same or very similar structure: one file per country and type of respondent (e.g. school principal, student, teacher, etc.) per population.
- For IEA studies and OECD TALIS and TALIS 3S, use the ISO argument to specify the three-letter ISO codes of the countries whose data is to be downloaded. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. To download the files from all countries for an IEA study and OECD TALIS and TALIS 3S, simply omit the ISO argument; this will download the files for all countries for the population in POP in the study and cycle. The ISO argument will not work for PISA files, as all data for all countries is provided within a single file per respondent type. If ISO is provided anyway, it will be ignored.
- Note that as of now, the function downloads PISA databases only from its latest cycles – 2015, 2018 and 2022.
- When all desired SPSS data files are downloaded, the function converts them to lsa.data objects and stores them as .RData files on the disk, removing the data files downloaded in their original (SPSS) format. This is the default behavior, which can be overridden by setting convert = FALSE.
- The study argument defines the study for which data shall be downloaded. The acceptable strings are as follows:
  - CivED – IEA Civic and Citizenship Education study (CivED)
  - ICCS – IEA International Civic and Citizenship Education Study (ICCS)
  - ICILS – IEA International Computer and Information Literacy Study (ICILS)
  - PIRLS – IEA Progress in International Reading Literacy Study (PIRLS)
  - prePIRLS – IEA PIRLS Literacy (prePIRLS)
  - REDS – IEA Responses to Educational Disruption Survey (REDS)
  - RLII – IEA Reading Literacy Study (RL), second round
  - SITES – IEA Second Information Technology in Education Study (SITES)
  - TIMSS – IEA Trends in International Mathematics and Science Study (TIMSS)
  - preTIMSS – IEA TIMSS Numeracy (TIMSS)
  - eTIMSS PSI – IEA TIMSS with PSI items (TIMSS)
  - TIMSS Advanced Mathematics / TIMSS Advanced Physics – IEA Trends in International Mathematics and Science Study in Mathematics and Physics (TIMSS Advanced)
  - TiPi – IEA joint TIMSS and PIRLS 2011 study
  - PISA – OECD Programme for International Student Assessment (PISA)
  - PISA D – OECD Programme for International Student Assessment for low- and middle-income countries (PISA for Development)
  - TALIS – OECD Teaching and Learning International Survey (TALIS)
  - TALIS 3S – OECD Starting Strong Teaching and Learning International Survey (TALIS Starting Strong Survey)
- The cycle argument provides information about the year of administration of a particular study for which SPSS data files can be downloaded. A numeric value for a specific year of administration needs to be provided. Here is the list of all released cycles for all studies RALSA supports so far:
  - For CivED – 1999
  - For ICCS – 2009, 2016, or 2022
  - For ICILS – 2013, 2018, or 2023
  - For PIRLS – 2001, 2006, 2011, 2016, or 2021
  - For prePIRLS – 2016
  - For ePIRLS – 2016
  - For REDS – 2021
  - For RLII – 1991 or 2001
  - For SITES – 1998 or 2006
  - For TIMSS – 1995, 1998, 2003, 2007, 2011, 2015, 2019 or 2023
  - For preTIMSS – 2015
  - For eTIMSS PSI – 2019
  - For TIMSS Advanced Mathematics / TIMSS Advanced Physics – 1995, 2008, or 2015
  - For TiPi – 2011
  - For PISA – 2015, 2018 or 2022
  - For PISA D – 2019
  - For TALIS – 2008, 2013, or 2018
  - For TALIS 3S – 2018
- The data from the IEA Teacher Education and Development Study in Mathematics (TEDS-M) is not freely available from the IEA website due to data confidentiality issues and is available only on request from the IEA.
- Some studies (e.g. TIMSS and TALIS) have more than one population (i.e. students in grades 4 and 8 in TIMSS and teachers in different ISCED levels in TALIS). For these studies, the POP argument tells lsa.download.data which population’s data is needed. The population strings for the POP argument for the pertinent studies are listed below. For the exact meaning of the population names, see the respective study documentation. Note that if POP is not provided, a default (first population for a study and/or a cycle) is applied.
  - CivED: G8 (grade 8), G12 (grade 12)
  - ICCS: G8 (grade 8), G9 (grade 9, ICCS 2009 only)
  - ICILS: G8 (grade 8)
  - PIRLS: G4 (grade 4)
  - prePIRLS: G4 (grade 4)
  - ePIRLS: G4 (grade 4)
  - REDS: G8 (grade 8)
  - RLII: G4 (grade 4)
  - SITES: M1 POP A (Module 1, 1998, population A), M1 POP B (Module 1, 1998, population B), M1 POP C (Module 1, 1998, population C), M2 (Module 2, 2006)
  - TIMSS: G4 (grade 4), G8 (grade 8)
  - preTIMSS: G4 (grade 4)
  - eTIMSS PSI: G4 (grade 4), G8 (grade 8)
  - TIMSS Advanced Mathematics / TIMSS Advanced Physics: G12 (grade 12)
  - TiPi: G4 (grade 4)
  - PISA: Y15 (15-year-old)
  - PISA for Development: IS (in school), OS (out of school)
  - TALIS: I1 (ISCED 1), I2 (ISCED 2), I3 (ISCED 3), P (PISA schools)
  - TALIS 3S: I0.2 (ISCED 0.2), IU3 (ISCED U3)
- The out.folder argument controls where the files shall be stored. Note that the function will not place the files directly in the folder path provided to the argument. Instead, it will create a folder named after the study name, cycle and population and place the downloaded files there. Note that for IEA studies, OECD TALIS and TALIS 3S, if the download folder already exists and contains data files for a given study, cycle and population for some of the countries, the function will only add the new files to it, keeping the ones that already exist there, if the append argument equals TRUE (default). This can save a lot of time if the user needs to download just the additional files instead of downloading everything again. If the append argument equals FALSE, the existing files will be overwritten. For OECD PISA and PISA for Development the append argument will be ignored and, if the study folder in out.folder exists and contains any SPSS .sav or .RData files, the function will stop its execution and ask for the existing files to be moved.
- It is not recommended to work further in the folder where the downloaded files reside; it is meant only for the downloaded (and possibly converted) files.
- In some study cycles (e.g. TIMSS 2019 and PIRLS 2021), there are the so-called “bridge studies”. These aim to test the differences between electronic and paper testing modes. When a study cycle contains data files from a bridge study, these will be downloaded too for the countries that conducted the study electronically.
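The default-population rule from the notes above can be illustrated with a small sketch. The populations list is only an excerpt copied from the POP list, and resolve_pop is a hypothetical helper written for this illustration, not a RALSA function. It also ignores cycle-specific cases (for example, G9 exists only in ICCS 2009), so the actual default applied by lsa.download.data may differ for some cycles.

```r
# Population codes per study, copied from the POP list above (excerpt only).
populations <- list(
  CivED      = c("G8", "G12"),
  ICCS       = c("G8", "G9"),
  TIMSS      = c("G4", "G8"),
  "PISA D"   = c("IS", "OS"),
  TALIS      = c("I1", "I2", "I3", "P"),
  "TALIS 3S" = c("I0.2", "IU3")
)

# Hypothetical helper (not part of RALSA): return POP if it is listed for
# the study, or the first listed population when POP is not supplied.
resolve_pop <- function(study, POP = NULL) {
  available <- populations[[study]]
  if (is.null(available)) {
    stop("Study not in this excerpt: ", study)
  }
  if (is.null(POP)) {
    return(available[1])
  }
  if (!POP %in% available) {
    stop("POP '", POP, "' is not listed for ", study,
         "; expected one of: ", paste(available, collapse = ", "))
  }
  POP
}

resolve_pop("TIMSS")        # "G4", the first population listed for TIMSS
resolve_pop("TALIS", "I2")  # "I2"
```

Checking the population string before calling the download function is optional; it merely catches typos such as "G12" for TIMSS before any network traffic starts.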
Downloading study data files
The following two sections provide examples on how to download data from TIMSS 2023 using the command line and the graphical user interface.
Downloading study data files using command line
The code box below presents the syntax for downloading TIMSS 2023 grade 4 data for Australia and Slovenia (not Slovakia!) only.
lsa.download.data(study = "TIMSS", cycle = 2023, ISO = c("aus", "svn"), out.folder = "C:/temp")
Note that the POP argument is omitted. As TIMSS 2023 has two populations – grade 4 (G4) and grade 8 (G8) – lsa.download.data will by default take the first one (grade 4). Further, the append argument takes its default value (TRUE), so if there are already any files from other countries in TIMSS 2023 grade 4 downloaded in the folder, only the new files will be downloaded. Also, the convert argument is omitted, so the default (TRUE) value is used and after the files are downloaded, they will be converted to lsa.data .RData files. The same applies to the missing.to.NA argument: it is omitted, so the default FALSE value is used and all user-defined missing values for the variables in the data are preserved.
Note that the folder passed to out.folder is C:/temp. When downloading the data, a sub-folder with the study name, cycle and population will be created and the files will be placed there. That is, the downloaded and converted files will be in C:/temp/TIMSS_2023_G4.
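A script that needs the location of the downloaded files could rebuild the sub-folder path as sketched below. The study_cycle_POP naming pattern is inferred from the single TIMSS_2023_G4 example above; how studies or populations with spaces in their names (e.g. TIMSS Advanced Mathematics or M1 POP A) are named is not covered here. The helper expected_subfolder is hypothetical and not part of RALSA.

```r
# Hypothetical helper (not part of RALSA): rebuild the sub-folder path,
# assuming the <study>_<cycle>_<POP> pattern seen in TIMSS_2023_G4.
expected_subfolder <- function(out.folder, study, cycle, POP) {
  file.path(out.folder, paste(study, cycle, POP, sep = "_"))
}

target <- expected_subfolder("C:/temp", "TIMSS", 2023, "G4")
target  # "C:/temp/TIMSS_2023_G4"

# After a download with convert = TRUE, list the converted files.
# This returns an empty character vector if the folder does not exist yet.
list.files(target, pattern = "\\.RData$")
```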
RStudio will output messages like this in the console:
The next example downloads data from TIMSS 2023 grade 8 for Australia and Cyprus, where the original SPSS files will be retained instead of being converted to .RData.
lsa.download.data(study = "TIMSS", cycle = 2023, POP = "G8", ISO = c("aus", "cyp"), convert = FALSE, out.folder = "C:/temp")
The output in the RStudio console will be similar to the one from above, but without the conversion of the downloaded files. The end of the output provides an instruction for using the conversion function later, if the user changes his or her mind.
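For PISA the call is shorter, because ISO is ignored and the function stops if the study folder already holds data files. The sketch below checks for such files first. The helper has_data_files and the run_download switch are hypothetical, and the sub-folder name PISA_2022_Y15 is an assumption that follows the TIMSS_2023_G4 pattern shown earlier; the cycle 2022 and the C:/temp folder are illustrative choices.

```r
# Hypothetical helper (not part of RALSA): TRUE if the folder already holds
# .sav or .RData files, the situation in which the PISA download stops.
has_data_files <- function(folder) {
  dir.exists(folder) &&
    length(list.files(folder, pattern = "\\.(sav|RData)$",
                      ignore.case = TRUE)) > 0
}

# Assumed sub-folder name, following the TIMSS_2023_G4 pattern.
pisa_folder <- file.path("C:/temp", "PISA_2022_Y15")

# Hypothetical guard (not part of RALSA): set to TRUE to really download.
run_download <- FALSE

if (run_download && !has_data_files(pisa_folder) &&
    requireNamespace("RALSA", quietly = TRUE)) {
  # ISO is left out: it would be ignored for PISA anyway.
  RALSA::lsa.download.data(study = "PISA", cycle = 2022, out.folder = "C:/temp")
}
```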
Downloading study data files using the GUI
To start the RALSA user interface, execute the following command in RStudio:
ralsaGUI()
When the GUI opens in your browser, select Data preparation > Download data from the menu on the left. Once in the Download data section, the interface will show a drop-down menu, Select study. Click on it and select TIMSS. A new drop-down menu, Select cycle, will be displayed next. Click on it and select 2023. A third drop-down menu, Select population, will be displayed next. Click on it and select Grade 4. The interface will now display two panels with the countries available in the study, cycle and population. The screen will look like the following:
Use the mouse to select countries, and the single and double arrow buttons to move them between the lists of Available countries and Selected countries. The single arrow buttons can be used to move individual selected countries between the two panels. The double arrow buttons can be used to move all countries between the two panels, even if none of them are selected. Let’s select Australia and Slovenia (again, Slovenia, not Slovakia!). A Choose destination folder button will appear underneath. Click on the button and navigate to the folder where the datasets shall be saved. The path where the files will be saved appears next to the button. Underneath the button and the path, the Append only new files in the folder, Convert the files after downloading and Convert user-defined missings to NA checkboxes will appear with default selections. These can be changed depending on what is needed for the downloaded data. Under these checkboxes, the syntax and the button to execute it will appear at the bottom of the screen:
Press the Execute syntax button. A pop-up message will notify you that the file conversion started. A continuously updated console will appear at the bottom of the screen, showing the ongoing operations (partial output):
Scroll down if you don’t see it. When all operations are finished, a pop-up message will be shown on the screen notifying you about it.