In this project, I analyzed asteroid data from NASA's Jet Propulsion Laboratory to assess the asteroids' suitability for mining. By comparing factors such as size, material composition, and orbital path, I identified promising asteroid groups. I used deep learning models to impute missing values for geometric albedo and diameter, and then used statistical hypothesis tests to draw further insights.
Asteroid data was collected from NASA's Jet Propulsion Laboratory.
Nearly 99.99% of the light reflectance data was missing, and about 80% of the geometric albedo and diameter values were missing.
Columns with relatively few missing values (< 5%) were imputed with their
Categorical columns were imputed first. They were then used to group rows together, so that the remaining missing values could be imputed with the group median.
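The grouping step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The helper name `impute_by_group_median`, the column names `spec_class` and `rot_per`, and the toy rows are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def impute_by_group_median(rows, group_key, value_key):
    """Fill missing (None) values with the median of the row's group."""
    # Collect the observed (non-missing) values for each group.
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])

    # Fallback for groups that have no observed values at all.
    all_values = [v for values in observed.values() for v in values]
    overall_median = median(all_values)
    group_median = {g: median(v) for g, v in observed.items()}

    # Build new rows and leave the input list untouched.
    return [
        {**row, value_key: group_median.get(row[group_key], overall_median)}
        if row[value_key] is None else dict(row)
        for row in rows
    ]

# Hypothetical toy rows; the column names are illustrative only.
asteroids = [
    {"spec_class": "C", "rot_per": 5.0},
    {"spec_class": "C", "rot_per": None},
    {"spec_class": "C", "rot_per": 9.0},
    {"spec_class": "S", "rot_per": 3.0},
    {"spec_class": "S", "rot_per": None},
]
filled = impute_by_group_median(asteroids, "spec_class", "rot_per")
```

Imputing the categorical column first matters here because a row with a missing group label could not be assigned a group median.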
Two attributes, albedo and diameter, were extremely important for this project.
Two multilayer perceptrons, trained on the cleaned data, were used to predict and impute the missing values.
Used existing attributes to construct explainable and important features. For example:
Statistical tests are sensitive to outliers, so I removed outliers from the dataset before proceeding with the statistical analysis. Outliers were identified using the five-number summary method.
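One common way to turn a five-number summary into an outlier rule is Tukey's fences, which drop values more than 1.5 times the interquartile range beyond the first or third quartile. The sketch below assumes that rule; the multiplier `k=1.5`, the function name `remove_outliers_iqr`, and the sample values are illustrative assumptions, not details taken from the project.

```python
from statistics import quantiles

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical orbital periods (years) with one extreme value.
periods = [3.1, 3.4, 3.6, 3.8, 4.0, 4.2, 4.5, 40.0]
cleaned = remove_outliers_iqr(periods)
```

In this toy sample only the extreme value of 40.0 falls outside the fences, so the other seven values are kept.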
Two-sample proportion test between metallic and non-metallic asteroids.
Factor plot and 2-way ANOVA test to see if orbital periods vary significantly for different compositions.
Welch t-test to identify whether metallic asteroids have lower orbital periods.
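For reference, two of the tests listed above can be sketched with the standard library alone. This is a simplified illustration rather than the code used in the project: the function names, the counts, and the sample periods are made up for the example. The proportion test uses the pooled standard error with a normal approximation, and the Welch sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom.

```python
from math import erfc, sqrt
from statistics import mean, variance

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) using the pooled standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical counts: asteroids meeting some criterion in each group.
z, p = two_proportion_ztest(45, 100, 30, 100)

# Hypothetical orbital periods (years) for the two groups.
metallic = [1.0, 2.0, 3.0, 4.0, 5.0]
non_metallic = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(metallic, non_metallic)
```

A negative `t` points in the direction of shorter periods for the first sample. Turning it into a p-value requires the t distribution, which a statistics library such as SciPy provides (`scipy.stats.ttest_ind` with `equal_var=False`).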
Outliers heavily affect the result of hypothesis tests like the Welch t-test.
Questions often needed to be reframed so that they align well with hypothesis tests.
The appropriate type of test needs to be selected based on what is known about the data distribution.