VCE Data Analytics Resource Page

A resource for Heathmont College students studying VCE Applied Computing - Data Analytics Unit 3 & 4.

View My GitHub Profile

Spreadsheets and Data Cleansing

Overview

Spreadsheets Image - Created by ChatGPT4o/Dall-E

Spreadsheets are powerful tools for data analysis and management. This section focuses on using spreadsheet functions for calculations, data analysis, and data cleansing. You will learn how to manipulate data using formulas and functions, sort and filter data, and identify and fix errors in data. Additionally, you will explore techniques for statistically analyzing data to identify trends, relationships, and patterns.

Spreadsheets and Data Cleansing in the Study Design

Unit 3 Outcome 1

Key Knowledge:

Unit 4 Outcome 1

Key Knowledge:

Key Skills:

Study design key knowledge and key skills are taken verbatim from the VCE Applied Computing Study Design 2025-2028. © Victorian Curriculum and Assessment Authority. For current versions and related content visit www.vcaa.vic.edu.au

Learning Resources

Design Tools for Spreadsheets (and Databases)

In the problem solving methodology, a Solution Design is focused on answering two key questions:

To answer these questions, we use a range of design tools. For spreadsheets and databases, these tools include:

Data Dictionaries

A data dictionary describes the fields in a spreadsheet or database table. It includes the name of the field, the data type, and a description of the data. This helps users understand the data and how it should be used. Data dictionaries help clarify the meaning of data and ensure consistency in data entry. They may include information about the format of the data (eg. Date format, the structure of a phone number, etc.).

Data dictionaries help build the functional design of a spreadsheet or database.

Example Data Dictionary

Field Name Data Type Description
Student_ID Number Unique identifier for each student
First_Name Text Student’s first name
Last_Name Text Student’s last name
DOB Date Student’s date of birth, input as YYYY-MM-DD
Year_Level Text Student’s year level

This example could be used for a CSV file, Spreadsheet or Database table, showing the design of the fields for a student dataset.

Query Designs

Query designs specify the criteria for selecting data from a database or spreadsheet. They define the conditions that must be met for a record to be included in the results. Query designs are used to extract specific information from a dataset based on user requirements.

Example Query Design

These queries can be used to filter data in a spreadsheet or database to extract the required information.

Layout Diagrams

Layout diagrams are used in Data Analytics to show the structure of a spreadsheet, or possibly the user interface to a database application. Layout diagrams help visualize the data model and understand how different elements are connected. They can include information about the layout of data fields, tables, and relationships between them.

Input-Process-Output (IPO) Charts

IPO charts describe the inputs, processes, and outputs of a system. In the context of spreadsheets and databases, IPO charts can be used to show how data is entered, processed, and presented to users.

IPO charts help in understanding the flow of data and operations in a system, and they are useful for designing and documenting the functional design of the spreadsheet or database.

Example IPO Chart

Input Process Output
Recorded Temperatures from Bureau of Meteorology Calculate Average, Maximum and Minimum temperatures across each month Summary Table for each month
    Line graph showing temperature trends

This IPO chart describes the process of analyzing temperature data to create a summary table and a line graph.

Data Cleansing

Data cleansing is the process of identifying and correcting errors in a dataset. It involves removing or correcting inaccurate, incomplete, or irrelevant data to improve the quality and reliability of the data. It may include solving problems caused by an incorrect data type or Data cleansing is essential for accurate analysis and decision-making.

Techniques for Manipulating and Cleansing Data

Formulas and Functions

Formulas and functions are essential tools for manipulating data in spreadsheets. They allow you to perform calculations, apply logical operations, and transform data. Common functions include SUM, AVERAGE, IF, VLOOKUP, and CONCATENATE.

Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every analysis of data.

The descriptive statistics we study fall into three categories:

Each of these measures is applied to numerical data to provide a summary of the data set.

Average and Median

Average (Mean)

Median

Minimum and Maximum

Minimum

Maximum

Range and Standard Deviation

Measures of spread (such as range and standard deviation) provide information about the variability of the data set. Two data sets could have very similar averages but very different ranges or standard deviations, indicating that one data set is more spread out than the other.

Range

Standard Deviation

Count and Sum

Count and sum are measures of size. Count provides the number of values in a data set, while sum provides the total of all values in a data set. It is important to consider whether the sum or count is relevant for the particular data values being analysed. If you cannot explain what the sum or count represents, it is not useful.

Count

Sum

Pearson’s Correlation Coefficient (r)

Pearson’s correlation coefficient (r) is a measure of the strength and direction of the linear relationship between two numeric variables. It ranges from -1 to 1, where:

Pearson’s correlation coefficient is calculated using the CORREL function in Excel. For example, =CORREL(A1:A10, B1:B10) calculates the correlation between two sets of values in columns A and B.

Strength and Direction of Relationships

Strength

Direction