nathanielcekay.com - Portfolio Continued

CO2 Emissions Analysis - Analyzes global CO2 emissions trends using SQL

For this analysis, I used SQL to examine global CO2 emissions and temperatures from 1850 to 2022. The project identifies trends such as the years with the highest temperatures and how emissions differ by region. I combined two datasets in PostgreSQL and created visualizations of the results.

I joined the CO2 emissions dataset with the temperature dataset so that emissions and temperatures could be compared year by year.
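A join like the one described above might look as follows. This is a minimal sketch, assuming a long-format emissions table with one row per entity and year and a temperature table with one row per year; the table and column names (`emissions`, `temperatures`, `co2`, `avg_temp`) are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a database server. The query itself is standard SQL. The toy rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: the project's real table and column names are not given.
SCHEMA = """
CREATE TABLE emissions (entity TEXT, year INTEGER, co2 REAL);
CREATE TABLE temperatures (year INTEGER, avg_temp REAL);
"""

def yearly_comparison(conn):
    """Join emissions to temperatures on year; average CO2 across entities per year."""
    query = """
        SELECT e.year, AVG(e.co2) AS avg_co2, t.avg_temp
        FROM emissions AS e
        JOIN temperatures AS t ON t.year = e.year
        GROUP BY e.year, t.avg_temp
        ORDER BY e.year
    """
    return conn.execute(query).fetchall()

# Made-up toy rows, not values from the project.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?)",
                 [("A", 2000, 10.0), ("B", 2000, 20.0), ("A", 2001, 30.0)])
conn.executemany("INSERT INTO temperatures VALUES (?, ?)",
                 [(2000, 14.1), (2001, 14.3)])
print(yearly_comparison(conn))
```

An inner join keeps only years present in both datasets; a `LEFT JOIN` would instead keep every emissions year and leave the temperature empty where none was recorded.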

I then identified the ten years with the highest average temperatures and the ten years with the greatest average CO2 emissions between 1850 and 2022.
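Ranking years by a single metric is an `ORDER BY ... DESC LIMIT n` query. The sketch below assumes a hypothetical `yearly` table that already holds the joined per-year averages; `sqlite3` again stands in for PostgreSQL, and the rows are made-up toy values. Column names cannot be passed as query parameters, so the function checks them against a whitelist before formatting them into the SQL.

```python
import sqlite3

# Hypothetical columns of a per-year summary table; only these may be ranked.
ALLOWED_COLUMNS = {"avg_temp", "avg_co2"}

def top_years(conn, column, n=10):
    """Return the n years with the largest value in `column` between 1850 and 2022."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    query = (
        f"SELECT year, {column} FROM yearly "
        "WHERE year BETWEEN 1850 AND 2022 "
        f"ORDER BY {column} DESC LIMIT ?"
    )
    return conn.execute(query, (n,)).fetchall()

# Made-up toy rows, not values from the project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearly (year INTEGER, avg_temp REAL, avg_co2 REAL)")
conn.executemany("INSERT INTO yearly VALUES (?, ?, ?)",
                 [(1900, 13.9, 5.0), (1950, 14.0, 8.0),
                  (2000, 14.4, 20.0), (2020, 14.9, 25.0)])
print(top_years(conn, "avg_temp", n=2))
print(top_years(conn, "avg_co2", n=2))
```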

I also identified the 10 entities with the greatest emissions. High-income countries accounted for over half of the world's CO2 emissions. Year and global average temperature were very strongly correlated, indicating that global temperatures have been increasing over the period.
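Ranking entities and computing each one's share of total emissions could be done with a `GROUP BY` aggregate, as sketched here. The same assumptions apply: the `emissions` table is hypothetical, `sqlite3` stands in for PostgreSQL, and the rows are made up.

```python
import sqlite3

def top_entities(conn, n=10):
    """Return (entity, total_co2, share_of_total) for the n largest emitters."""
    grand_total = conn.execute("SELECT SUM(co2) FROM emissions").fetchone()[0]
    rows = conn.execute(
        """SELECT entity, SUM(co2) AS total
           FROM emissions
           GROUP BY entity
           ORDER BY total DESC
           LIMIT ?""",
        (n,),
    ).fetchall()
    return [(entity, total, total / grand_total) for entity, total in rows]

# Hypothetical table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emissions (entity TEXT, year INTEGER, co2 REAL)")
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?)",
                 [("X", 2000, 30.0), ("X", 2001, 30.0),
                  ("Y", 2000, 25.0), ("Z", 2000, 15.0)])
for entity, total, share in top_entities(conn, n=2):
    print(f"{entity}: {total:.1f} ({share:.0%})")
```

If the source table also contains aggregate rows, such as a world total alongside individual countries, summing every row would double-count; those rows would need to be filtered out, or the world row used as the denominator.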

Next, I visualized the ten years with the highest average CO2 emissions within two periods: 1900 to 1999 and 2000 to 2022.
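Restricting the same kind of ranking to one period only needs a `BETWEEN` filter. A minimal sketch, assuming a hypothetical per-year `yearly` table, `sqlite3` as the stand-in database, and made-up values:

```python
import sqlite3

def top_emission_years(conn, start, end, n=10):
    """Top-n years by average CO2 emissions, restricted to start..end inclusive."""
    return conn.execute(
        """SELECT year, avg_co2 FROM yearly
           WHERE year BETWEEN ? AND ?
           ORDER BY avg_co2 DESC
           LIMIT ?""",
        (start, end, n),
    ).fetchall()

# Hypothetical table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearly (year INTEGER, avg_co2 REAL)")
conn.executemany("INSERT INTO yearly VALUES (?, ?)",
                 [(1910, 3.0), (1979, 9.0), (1999, 12.0),
                  (2005, 18.0), (2019, 24.0), (2020, 22.0)])

periods = {"1900-1999": (1900, 1999), "2000-2022": (2000, 2022)}
for label, (start, end) in periods.items():
    print(label, top_emission_years(conn, start, end, n=2))
```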

Plotting global average temperature against average CO2 emissions per year shows a clear positive correlation between the two. Other factors also influence global temperatures, but the data point to a strong link between these two variables, and both have risen steadily over time.

Using SQL to study CO2 emissions in this way could help track trends and show how different industries or regions contribute to emissions. This kind of analysis could support policymakers, researchers, or businesses looking for ways to reduce emissions and become more sustainable.
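The temperature and CO2 relationship described here can be quantified with the Pearson correlation coefficient, and the steady rise with a least-squares trend slope. This is a minimal pure-Python sketch; the function names are hypothetical and the series are made-up toy values, not the project's data. Pearson's r measures linear association only, so on its own it does not separate CO2 from other contributing factors.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (change in y per unit of x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Made-up toy series, for illustration only.
years = [2000, 2001, 2002, 2003]
avg_co2 = [10.0, 12.0, 13.0, 15.0]
avg_temp = [14.0, 14.1, 14.3, 14.4]

print(round(pearson_r(avg_temp, avg_co2), 3))  # correlation of temperature with CO2
print(round(ols_slope(years, avg_temp), 3))    # warming trend per year in the toy data
```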

View the Project: CO2 Emissions Analysis using SQL

United States Census Analysis - Examines population trends using SQL

For this project, I used SQL to analyze the 2000 and 2010 United States population censuses. The data comes from the United States Census BigQuery dataset.

To begin, I cleaned the data by checking for missing values, formatting issues, and duplicate records.
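Each of the cleaning checks above can be expressed as a short query. This sketch assumes a hypothetical `population_by_zip` table with one row per zip code and census year (the real BigQuery tables may be organized differently), uses `sqlite3` in place of BigQuery, and inserts made-up rows. The zip-code format check (exactly five digits) is done in Python for portability; `'2134'` illustrates a code that has lost its leading zero.

```python
import sqlite3

def data_quality_report(conn):
    """Count missing values, malformed zip codes, and repeated (zipcode, year) rows."""
    missing = conn.execute(
        "SELECT COUNT(*) FROM population_by_zip "
        "WHERE zipcode IS NULL OR year IS NULL OR population IS NULL"
    ).fetchone()[0]
    # Format check in Python: a valid zip code is a string of exactly five digits.
    zips = [z for (z,) in conn.execute(
        "SELECT zipcode FROM population_by_zip WHERE zipcode IS NOT NULL")]
    malformed = sum(1 for z in zips if not (len(z) == 5 and z.isdigit()))
    # Assumes one row per zip code per year, so any repeat is a duplicate.
    duplicates = conn.execute(
        """SELECT zipcode, year, COUNT(*) FROM population_by_zip
           GROUP BY zipcode, year
           HAVING COUNT(*) > 1"""
    ).fetchall()
    return {"missing": missing, "malformed": malformed, "duplicates": duplicates}

# Hypothetical table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population_by_zip (zipcode TEXT, year INTEGER, population INTEGER)")
conn.executemany("INSERT INTO population_by_zip VALUES (?, ?, ?)",
                 [("00501", 2010, 100), ("00501", 2010, 100),
                  ("2134", 2010, 500), ("99999", 2000, None)])
print(data_quality_report(conn))
```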

I then looked for correlations in the data, calculated the mean population, and found the zip codes with the largest and smallest populations.
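The summary statistics described above map onto `AVG` and two `ORDER BY ... LIMIT 1` queries. A sketch under the same assumptions (hypothetical `population_by_zip` table, `sqlite3` standing in for BigQuery, made-up rows):

```python
import sqlite3

def population_summary(conn):
    """Return (mean population, largest row, smallest row) across all zip-year rows."""
    mean_pop = conn.execute(
        "SELECT AVG(population) FROM population_by_zip").fetchone()[0]
    largest = conn.execute(
        "SELECT zipcode, year, population FROM population_by_zip "
        "ORDER BY population DESC LIMIT 1").fetchone()
    smallest = conn.execute(
        "SELECT zipcode, year, population FROM population_by_zip "
        "ORDER BY population ASC LIMIT 1").fetchone()
    return mean_pop, largest, smallest

# Hypothetical table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population_by_zip (zipcode TEXT, year INTEGER, population INTEGER)")
conn.executemany("INSERT INTO population_by_zip VALUES (?, ?, ?)",
                 [("11111", 2000, 400), ("22222", 2000, 1200),
                  ("22222", 2010, 1500), ("33333", 2010, 100)])
print(population_summary(conn))
```

With ties, `LIMIT 1` returns just one of the tied rows; adding a secondary sort key such as `zipcode` would make the choice deterministic.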

I completed the analysis by examining the zip codes at or above the 90th percentile of population. The mean population across the analyzed zip codes was 9,474 people. The most populous zip code was 10456 (Bronx, New York), with 86,547 people in 2010. The five largest zip-code populations were 10456 (2010), 60620 (2010), 60804 (2010), 60505 (2010), and 60640 (2000).
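Selecting zip codes at or above the 90th percentile requires a cutoff value first. BigQuery and PostgreSQL provide percentile functions, but the SQLite build bundled with Python may not, so this sketch computes a nearest-rank percentile in Python and then filters with SQL. The table, function names, and rows are hypothetical, and nearest-rank is only one of several percentile definitions, so the cutoff can differ slightly from a database's interpolated value.

```python
import math
import sqlite3

def nearest_rank_percentile(values, p):
    """Smallest value such that at least p percent of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # integer p keeps this exact
    return ordered[rank - 1]

def zips_at_or_above_percentile(conn, p=90):
    """Return the cutoff and all zip-year rows whose population meets it."""
    pops = [pop for (pop,) in conn.execute("SELECT population FROM population_by_zip")]
    cutoff = nearest_rank_percentile(pops, p)
    rows = conn.execute(
        "SELECT zipcode, year, population FROM population_by_zip "
        "WHERE population >= ? ORDER BY population DESC",
        (cutoff,),
    ).fetchall()
    return cutoff, rows

# Hypothetical table with made-up rows: populations 100, 200, ..., 1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population_by_zip (zipcode TEXT, year INTEGER, population INTEGER)")
conn.executemany("INSERT INTO population_by_zip VALUES (?, ?, ?)",
                 [(f"{10000 + i}", 2010, 100 * i) for i in range(1, 11)])
print(zips_at_or_above_percentile(conn, 90))
```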

This project uses SQL to analyze census data, revealing trends in population, demographics, and what communities need. It could help with things like planning schools, building infrastructure, or figuring out ways to improve public services.

View the Project: Analysis of United States Census Data with SQL

Analysis of Health and Lifestyle - Predicts stress levels with an R-squared of about 0.98

In this exploratory data analysis, I examined the relationships among various health and lifestyle indicators, including stress levels, sleep patterns, physical activity, and occupation, among other lifestyle metrics. The analysis was completed in RStudio.

The dataset is synthetic, meaning the data was generated artificially, and contains 185 female and 189 male participants. Occupations differed noticeably between genders.

Sleep duration had a very strong positive correlation with sleep quality. Both sleep duration and sleep quality had a strong negative correlation with stress level, meaning that participants who slept longer tended to report lower stress.

Doctors and nurses reported the highest stress levels, and men reported higher average stress levels than women. I concluded the analysis by training a machine learning model to predict stress level from the other variables.
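The occupation and gender comparisons above are group averages. The original analysis was done in R; this is a minimal Python sketch with hypothetical column names (`occupation`, `gender`, `stress`) and made-up records, not the project's data.

```python
from collections import defaultdict

def mean_by_group(records, group_key, value_key):
    """Average value_key within each distinct group_key, sorted highest first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        sums[rec[group_key]] += rec[value_key]
        counts[rec[group_key]] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Made-up toy records with hypothetical field names.
records = [
    {"occupation": "Nurse", "gender": "Female", "stress": 7},
    {"occupation": "Nurse", "gender": "Male", "stress": 9},
    {"occupation": "Doctor", "gender": "Male", "stress": 8},
    {"occupation": "Doctor", "gender": "Female", "stress": 6},
    {"occupation": "Engineer", "gender": "Male", "stress": 5},
    {"occupation": "Teacher", "gender": "Female", "stress": 4},
]
print(mean_by_group(records, "occupation", "stress"))
print(mean_by_group(records, "gender", "stress"))
```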

The model performed well on both the training and test sets, with an average R-squared score of 0.978, meaning that on average it explains about 97.8% of the variance in the target variable, stress level. Because the scores are high on both sets, there is no evidence of severe overfitting or underfitting. The model was effective at predicting stress levels from the other health and lifestyle metrics in the dataset.

This project analyzes health and lifestyle data to identify patterns and trends. It could help people see how habits such as exercise or eating affect their health, and researchers, doctors, or wellness programs could use this type of analysis to encourage healthy living or design better care plans.
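The R-squared score reported here is the coefficient of determination, R² = 1 - SS_res / SS_tot. The project's model and features are not specified, so this sketch swaps in a one-predictor ordinary least-squares line on made-up data to show how training and test scores are computed and compared; the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Made-up toy data: hours of sleep (x) versus stress level (y).
train_x = [5.0, 6.0, 7.0, 8.0, 9.0]
train_y = [8.0, 7.0, 5.5, 4.5, 3.0]
test_x = [5.5, 7.5]
test_y = [7.5, 5.0]

b0, b1 = fit_line(train_x, train_y)  # fit on the training set only

def predict(xs):
    return [b0 + b1 * x for x in xs]

r2_train = r_squared(train_y, predict(train_x))
r2_test = r_squared(test_y, predict(test_x))
print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}")
```

Comparing the two scores is the overfitting check described above: a training score far above the test score would suggest overfitting, while low scores on both would suggest underfitting.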

View the Project: Analysis of Health and Lifestyle Metrics
