A resource for Heathmont College students studying VCE Applied Computing - Data Analytics Unit 3 & 4.
Spreadsheets are powerful tools for data analysis and management. This section focuses on using spreadsheet functions for calculations, data analysis, and data cleansing. You will learn how to manipulate data using formulas and functions, sort and filter data, and identify and fix errors in data. Additionally, you will explore techniques for statistically analyzing data to identify trends, relationships, and patterns.
Key Knowledge:
Key Skills:
Key Knowledge:
Key Skills:
Study design key knowledge and key skills are taken verbatim from the VCE Applied Computing Study Design 2025-2028. © Victorian Curriculum and Assessment Authority. For current versions and related content visit www.vcaa.vic.edu.au
In the problem solving methodology, a Solution Design is focused on answering two key questions:
To answer these questions, we use a range of design tools. For spreadsheets and databases, these tools include:
A data dictionary describes the fields in a spreadsheet or database table. It includes the name of the field, the data type, and a description of the data. This helps users understand the data and how it should be used. Data dictionaries help clarify the meaning of data and ensure consistency in data entry. They may include information about the format of the data (eg. Date format, the structure of a phone number, etc.).
Data dictionaries help build the functional design of a spreadsheet or database.
Field Name | Data Type | Description |
---|---|---|
Student_ID | Number | Unique identifier for each student |
First_Name | Text | Student’s first name |
Last_Name | Text | Student’s last name |
DOB | Date | Student’s date of birth, input as YYYY-MM-DD |
Year_Level | Text | Student’s year level |
This example could be used for a CSV file, Spreadsheet or Database table, showing the design of the fields for a student dataset.
Query designs specify the criteria for selecting data from a database or spreadsheet. They define the conditions that must be met for a record to be included in the results. Query designs are used to extract specific information from a dataset based on user requirements.
These queries can be used to filter data in a spreadsheet or database to extract the required information.
Layout diagrams are used in Data Analytics to show the structure of a spreadsheet, or possibly the user interface to a database application. Layout diagrams help visualize the data model and understand how different elements are connected. They can include information about the layout of data fields, tables, and relationships between them.
IPO charts describe the inputs, processes, and outputs of a system. In the context of spreadsheets and databases, IPO charts can be used to show how data is entered, processed, and presented to users.
IPO charts help in understanding the flow of data and operations in a system, and they are useful for designing and documenting the functional design of the spreadsheet or database.
Input | Process | Output |
---|---|---|
Recorded Temperatures from Bureau of Meteorology | Calculate Average, Maximum and Minimum temperatures across each month | Summary Table for each month |
Line graph showing temperature trends |
This IPO chart describes the process of analyzing temperature data to create a summary table and a line graph.
Data cleansing is the process of identifying and correcting errors in a dataset. It involves removing or correcting inaccurate, incomplete, or irrelevant data to improve the quality and reliability of the data. It may include solving problems caused by an incorrect data type or Data cleansing is essential for accurate analysis and decision-making.
Formulas and functions are essential tools for manipulating data in spreadsheets. They allow you to perform calculations, apply logical operations, and transform data. Common functions include SUM, AVERAGE, IF, VLOOKUP, and CONCATENATE.
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every analysis of data.
The descriptive statistics we study fall into three categories:
Each of these measures is applied to numerical data to provide a summary of the data set.
AVERAGE
function in Excel eg. =AVERAGE(A1:A10)
MEDIAN
function in Excel eg. =MEDIAN(A1:A10)
MIN
function in Excel eg. =MIN(A1:A10)
MAX
function in Excel eg. =MAX(A1:A10)
Measures of spread (such as range and standard deviation) provide information about the variability of the data set. Two data sets could have very similar averages but very different ranges or standard deviations, indicating that one data set is more spread out than the other.
MAX
and MIN
functions in Excel eg. =MAX(A1:A10) - MIN(A1:A10)
(Not able to be calculated on its own)STDEV
function in Excel eg. =STDEV(A1:A10)
Count and sum are measures of size. Count provides the number of values in a data set, while sum provides the total of all values in a data set. It is important to consider whether the sum or count is relevant for the particular data values being analysed. If you cannot explain what the sum or count represents, it is not useful.
COUNT
function in Excel eg. =COUNT(A1:A10)
. This counts the number of cells in a range that contain numbers (ignores empty cells).SUM
function in Excel eg. =SUM(A1:A10)
Pearson’s correlation coefficient (r) is a measure of the strength and direction of the linear relationship between two numeric variables. It ranges from -1 to 1, where:
Pearson’s correlation coefficient is calculated using the CORREL
function in Excel. For example, =CORREL(A1:A10, B1:B10)
calculates the correlation between two sets of values in columns A and B.