Variable dictionaries

Introduction

When performing an analysis, the properties of all included variables need to be known in advance. The lsa.vars.dict function produces variable dictionaries, which include the variable names, classes (numeric, factor, or character), and labels, as well as their levels (i.e. response categories in the case of factor variables) or unique values (in the case of numeric or character variables), and their user-defined missing values (if any). The function always prints the dictionaries on the screen in the R/RStudio console and has the option to save them to a text file. We strongly recommend using this option: a saved file can serve as a reference when recoding variables (covered in the next section) or setting up an analysis. The output from the lsa.vars.dict function is concise, yet gives sufficient detail on the variable properties. Both converted .RData files and files in which different countries and/or respondent types have been merged can be used.

The variable dictionaries function and its arguments

The lsa.vars.dict function has the following arguments:

  • data.file – Full path to the .RData file containing an lsa.data object. Either this or data.object shall be specified, but not both.
  • data.object – The object in memory containing an lsa.data object. Either this or data.file shall be specified, but not both.
  • var.names – Vector of variable names whose dictionaries shall be produced.
  • out.file – Optional, full path to a .txt file where the dictionaries shall be saved, if needed.
  • open.out.file – Optional, if a file path is provided to out.file, shall the produced file be opened after it is written?

Notes:

  1. The dictionaries for the variables in var.names will be printed as tables on the screen. For each variable the dictionaries contain the variable name, the variable class, the variable label, unique variable values (see below) and the user-defined missing values (if any).
  2. The unique values’ representation will depend on the variable class. If the variable is a factor, the factor levels will be displayed. If the variable is numeric or character, the unique values will be printed up to the sixth one.
  3. The user-defined missing values for factor variables will be shown as text strings. For numeric variables these will be integers, followed by their labels in brackets.
  4. If a full file path is provided to out.file, the same output will be written to a .txt file, with a line at the top stating which data file/object was used.
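
Note 2 can be illustrated with a few lines of base R. This is only a sketch of the kind of display the note describes, not the function's actual implementation: the helper name preview_unique and the toy vectors are hypothetical and were made up for this illustration.

```r
# Hypothetical helper (not part of RALSA): summarise the values of a
# variable in the spirit of note 2.
preview_unique <- function(x, n = 6) {
  if (is.factor(x)) {
    # Factor variables: show all factor levels.
    return(levels(x))
  }
  # Numeric or character variables: show at most the first n unique values.
  head(unique(x), n)
}

# Toy data, made up for this example.
sex <- factor(c("Girl", "Boy", "Girl"), levels = c("Girl", "Boy"))
score <- c(512.3, 498.7, 512.3, 530.1, 475.9, 601.2, 455.0, 520.8)

preview_unique(sex)    # all factor levels
preview_unique(score)  # only the first six unique values
```

The toy score vector has seven unique values, so the last one is left out of the preview, which mirrors how the dictionaries stay compact for continuous variables.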

Displaying and saving variable dictionaries using the command line

In this example we will use the data file merged in the previous command-line example and produce the dictionaries for all variables in it (if the var.names argument is omitted, the function produces the dictionaries for all variables in the file). In RStudio, execute the following syntax:

lsa.vars.dict(data.file = "C:/temp/merged/PIRLS_2016_ASG_ATG_AUS_SVN.RData")

The function loads the file, produces the dictionaries and prints them in the RStudio console. The entire output with the dictionaries for all 63 variables in the file is too long to show here, so the screenshots below show only the first and the last ones.

We can also limit the output to just the variables we are interested in. Let’s produce the dictionaries for the country ID (IDCNTRY), student sex (ITSEX), and the first plausible value in overall reading (ASRREA01). This time, let’s use an lsa.data object in memory. For this purpose, we will first load the “PIRLS_2016_ASG_ATG_AUS_SVN.RData” file and then pass the loaded object to the function through the data.object argument instead of data.file. The whole code looks like this:

load("C:/temp/merged/PIRLS_2016_ASG_ATG_AUS_SVN.RData")

lsa.vars.dict(data.object = PIRLS_2016_ASG_ATG_AUS_SVN,
              var.names = c("IDCNTRY", "ITSEX", "ASRREA01"))
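
The load() call restores the saved object into memory under the name it was saved with, which is why that name can then be passed to data.object. The following base R sketch shows this behaviour with a temporary file; the object name toy_data and its contents are made up for the illustration and have nothing to do with the PIRLS file.

```r
# Toy object, made up for this example, saved to a temporary .RData file.
toy_data <- data.frame(IDCNTRY = c(36, 705), ITSEX = c("Girl", "Boy"))
tmp_file <- tempfile(fileext = ".RData")
save(toy_data, file = tmp_file)
rm(toy_data)

# load() restores the object under its saved name and invisibly returns
# the names of everything it restored.
loaded_names <- load(tmp_file)
loaded_names        # name of the restored object
exists("toy_data")  # the object is available in memory again
```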

Note that we first load the merged file. The file contains an object with the same name as the file, without the “.RData” extension. In the lsa.vars.dict call we then use this object, which is now located in RAM, rather than the file. We pass the names of the variables we want the dictionaries for to the var.names argument. The output in the RStudio console looks like this:

For each variable a separate table presents its properties: name, class, label, levels/unique values and user-defined missing values, if any. Note the unique values for the last variable, which is the first plausible value in overall reading. This is a continuous variable which can take almost any value. The output would be rather lengthy if all available values were presented. Thus, only the first six are presented, and the table tells us how many more unique values the variable has. Also note the difference in the user-defined missing values between the factor and numeric variables (ITSEX and ASRREA01). For factor variables, one of the levels is assigned as a missing value. For numeric variables, a named numeric value is assigned as a missing value.

All this works well. However, very often we need to view the dictionaries for many variables, or even all variables within a data file. It is then more convenient to store the dictionaries in a file and use it for later reference. To do this, let’s add the out.file and the open.out.file arguments to the call from above:

lsa.vars.dict(data.object = PIRLS_2016_ASG_ATG_AUS_SVN,
              var.names = c("IDCNTRY", "ITSEX", "ASRREA01"),
              out.file = "C:/temp/merged/dictionary.txt",
              open.out.file = TRUE)

The former (out.file) tells the function where to store the output file (a text file with all the dictionaries) by providing a path to it; the latter (open.out.file) instructs the function to open the file after all dictionaries for the requested variables have been produced. The text file will be opened in the default program associated with text files:

Displaying and saving variable dictionaries using the GUI

To start the RALSA user interface, execute the following command in RStudio:

ralsaGUI()

When the GUI opens in your browser, select Data preparation > Variable dictionaries from the menu on the left. In the Variable dictionaries section, click on the Choose data file button. Navigate to the folder containing the merged PIRLS_2016_ASG_ATG_AUS_SVN.RData file, select it and click the Select button.

Once the file is loaded, you will see the two panels with the available variables and selected variables (the latter is currently empty):

Use the mouse to select individual variables and the single arrow buttons to move them from the list of available variables to the list of selected variables and vice versa. Use the double arrow buttons to move all variables at once in either direction. You can use the filter boxes at the top of the panels to find the needed variables quickly. Let’s select the variables IDCNTRY (the country ID), ITSEX (the tracking variable for student sex), and ASRREA01 (the first plausible value for the overall student reading achievement). Once there are any variables in the Selected variables panel, the following elements will appear:

If you need to save the dictionaries in a file, check the Save the variable dictionaries in a file checkbox. Otherwise, the dictionaries will only appear in the GUI console when you press the Execute syntax button. If you check the box, the interface will show another checkbox, asking whether you want the file to be opened automatically in your default text editor after all operations are done. When the first checkbox is ticked, the interface will also display the Define output file name button. Click on it, navigate to the folder where you want the file to be saved, define the file name and click Save in the file save dialog box. The final settings will look like this:

Click on the Execute syntax button. The syntax will be executed and the output will be shown in the console which will appear at the bottom of the screen:

After all operations are completed, the file with the dictionaries will open in your default text editor:

You can keep this file for further reference when recoding variables (covered in the next section) or performing analyses.
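
Because the saved dictionaries are plain text, the file can also be read back into R when it is needed later. The sketch below uses only base R; show_file_head is a hypothetical helper written for this illustration (it is not part of RALSA), and the path is the one used in the command-line example, so adjust it to wherever you saved your own file.

```r
# Hypothetical helper (not part of RALSA): print the first n lines of a
# text file and return them invisibly.
show_file_head <- function(path, n = 20) {
  if (!file.exists(path)) {
    message("No file found at: ", path)
    return(invisible(character(0)))
  }
  first_lines <- readLines(path, n = n)
  writeLines(first_lines)
  invisible(first_lines)
}

# Path assumed from the command-line example; change it as needed.
show_file_head("C:/temp/merged/dictionary.txt")
```

If the file does not exist at the given path, the helper only prints a message instead of stopping with an error.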