A Passion Avenue For Science
Introduction
Diagrams in this project are categorized by several variables, but mainly they are classified as time-based or non-time-based. Non-time-based graphs plot a dependent variable against the corresponding independent variable to show how the two are related. Time-based graphs show the relationship between the independent and dependent variables using measurements taken at two different times of day, 9am and 3pm. In real life, however, this is challenging, because time-based data need measurements at many more times of day. Also, because the data keep being updated, more organization is required to synthesize an actual prediction.
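To make the time-based idea concrete, the sketch below turns each day's paired readings into separate 9am and 3pm points, which is the shape a time-based graph plots. It is written in Python for illustration, although the project itself was built with R packages. The column names `Temp9am` and `Temp3pm`, the helper `to_time_series`, and the readings are all hypothetical.

```python
def to_time_series(records, variable):
    """Turn paired 9am/3pm readings into (day, time, value) rows.

    Assumes (hypothetically) that each daily record stores one column per
    measurement time, named variable + "9am" and variable + "3pm".
    """
    rows = []
    for day, record in enumerate(records, start=1):
        for time in ("9am", "3pm"):
            rows.append((day, time, record[variable + time]))
    return rows


# Made-up readings for two days, for illustration only.
sample = [
    {"Temp9am": 14.4, "Temp3pm": 23.6},
    {"Temp9am": 17.5, "Temp3pm": 25.7},
]
for row in to_time_series(sample, "Temp"):
    print(row)
```

With only two measurement times per day, each day contributes just two points, which is why more frequent measurements would be needed for a finer time-based picture.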
R packages used
knitr is used for dynamic report generation, which lets us update the report each time we run the code.
caret creates the regression model and the predictions. It is the core of the weather prediction process in this code.
gmodels provides tools for model fitting, which we use to fit a best-fit curve to the data points.
lattice is used to visualise data by creating Trellis graphics, which show the relationship between two variables across panels conditioned on other variables.
ggplot2 is used to generate graphs; it focuses on how a graph is presented rather than on computing the function being plotted.
gridExtra lets us draw more than one graph and table on a single page, so running the code can produce several tables or graphs per page.
Kmisc is used for creating functions and generating plots.
ROCR is used to visualise the performance of classifiers as two-dimensional curves, such as ROC curves.
corrplot is used not only to display a correlation matrix graphically, but also to colour the labels of the graph.
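corrplot only draws a correlation matrix; the numbers have to be computed first. A minimal Python sketch of that computation is below, assuming Pearson correlation (the usual default, for example in R's `cor()`). The helpers `pearson` and `correlation_matrix`, the column names, and the readings are hypothetical and are not part of corrplot.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up readings, chosen so the relationships are exactly linear.
columns = {
    "Temp9am": [10, 12, 14, 16],
    "Temp3pm": [20, 24, 28, 32],
    "Humidity3pm": [70, 60, 50, 40],
}
matrix = correlation_matrix(columns)
print(round(matrix["Temp9am"]["Humidity3pm"], 6))  # prints -1.0
```

Each entry lies between -1 and 1; a plot such as corrplot's maps these values to colours so strongly related variables stand out.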
For the historical data in this project, we used weather measurements from Canberra, Australia, covering 367 days.
Program Flow
Collect data from the historical data set.
Remove any non-numerical values and unused columns.
Organize the data numerically and run the prediction algorithms.
Plot the data and analyse the results.
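The first two steps of this flow (collecting the data and removing non-numerical values and unused columns) can be sketched as follows. This is a Python illustration under stated assumptions, not the project's R code: the CSV layout, the column names, the made-up rows, and the helpers `parse_number` and `clean_rows` are hypothetical. Rows containing a missing marker such as `NA` are dropped whole.

```python
import csv
import io
import math


def parse_number(text):
    """Return text as a finite float, or None if it is not numeric."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def clean_rows(csv_text, keep_columns):
    """Keep only the chosen columns and drop rows with non-numeric values."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # row.get returns None for a missing column, which parse_number rejects.
        values = {col: parse_number(row.get(col)) for col in keep_columns}
        if all(v is not None for v in values.values()):
            cleaned.append(values)
    return cleaned


# Made-up rows; WindDir9am is text, so it is simply not kept.
raw = """MinTemp,MaxTemp,WindDir9am,Humidity3pm
8.0,24.3,SW,29
14.0,26.9,E,NA
13.7,23.4,N,69
"""
rows = clean_rows(raw, ["MinTemp", "MaxTemp", "Humidity3pm"])
print(len(rows))  # prints 2
```

Dropping whole rows is the simplest choice; it keeps every remaining row complete for the prediction step, at the cost of discarding some days.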
Result
After cleansing and organising the original data so that machine learning could be applied, we were able to predict whether or not it would rain in Canberra the next day, based on the historical data.
We used two types of graph, box plots and line graphs, which show the distribution of the data and its density, respectively. From the box plots, we can evaluate the reliability of the data.
The box plot was useful for judging the reliability of the data but could not show its density, and the line graph had the opposite strength and weakness. Using both types of graph let us cover the disadvantages of each.
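What a box plot summarises can be shown in a few lines. The Python sketch below computes the quartiles and flags outliers with Tukey's 1.5 × IQR rule, a common convention in plotting libraries. The helper name `box_plot_summary` and the sample values are hypothetical. Points flagged this way are the ones worth checking when judging how reliable the data are.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers beyond 1.5 * IQR (Tukey's rule)."""
    data = sorted(values)
    # "inclusive" interpolates between data points like R's default quantile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": min(inliers),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


# Made-up values with one suspicious reading.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary["outliers"])  # prints [100]
```

A box plot built from these numbers shows spread and suspicious points clearly but says little about where values cluster, which is why a density-style line graph complements it.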
Future Outlook
I strongly believe that the algorithm itself is applicable to real-life situations, but only if the code includes more data cleansing and more precise historical data are available.
To use this in real time, real-time data would first be needed, along with additional cleansing so that the incoming data stay as usable as the data we have now.
In this work, Mingyu set out to obtain more accurate weather predictions using machine learning and a regression model.
Weather Prediction Using Regression Model
2018