Report - Data Analysis Project

Author

Aylla von Ermland

Author affiliations

  1. Department of Cellular Biology, University of Georgia, Athens, GA, USA.
  2. Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA.

* These authors contributed equally to this work.

^ Corresponding author: ayllae@uga.edu

0.1 General Background Information

This analysis was performed as part of the READy Workflow Assessment.

1 Methods

We first updated the code to load a new dataset, exampledata2.xlsx, which adds two variables to Height, Weight, and Gender: Body_fat_percentage (numeric) and Exercise_type (categorical). Exercise_type denotes an individual’s primary exercise routine, such as “Cardio”, “Strength”, or “Mixed”.

The data were cleaned and processed using the script “processingfile-v1.qmd”. The cleaned data were saved as processeddata2.rds.
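The clean-and-save step can be sketched as follows. This is a minimal illustration, not the contents of processingfile-v1.qmd: the toy rows are made up, the cleaning rules (numeric coercion, factor conversion, complete cases only) are assumptions, and a temporary file stands in for processeddata2.rds.

```r
# Toy rows with the column names from the report; the values are made up
# for illustration and are NOT taken from exampledata2.xlsx.
rawdata <- data.frame(
  Height = c("180", "165", "sixty", "172"),
  Weight = c(80, 60, 70, NA),
  Gender = c("M", "F", "F", "M"),
  Body_fat_percentage = c(18, 25, 30, 22),
  Exercise_type = c("Strength", "Cardio", "Mixed", "Cardio"),
  stringsAsFactors = FALSE
)

# Hypothetical cleaning rules: numeric Height, categorical variables as
# factors, and rows with any missing value dropped.
processeddata <- rawdata
processeddata$Height <- suppressWarnings(as.numeric(processeddata$Height))
processeddata$Gender <- factor(processeddata$Gender)
processeddata$Exercise_type <- factor(processeddata$Exercise_type)
processeddata <- processeddata[complete.cases(processeddata), ]

# Save and reload; a temporary file stands in for processeddata2.rds.
rds_path <- tempfile(fileext = ".rds")
saveRDS(processeddata, rds_path)
reloaded <- readRDS(rds_path)
```

Saving as .rds rather than .csv keeps column types such as factors intact when the analysis script reloads the data.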

Subsequently, we generated exploratory plots under the direction of Anissa Del Valle. A boxplot was created with Exercise_type on the x-axis and Height on the y-axis. A scatterplot was also produced with Weight on the x-axis and Body_fat_percentage on the y-axis to examine the relationship between the two variables.
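The two plots described above can be sketched with base R graphics. The plotting package used in the project is not stated here, so base R is an assumption, and the data values are made up for illustration.

```r
# Made-up values for illustration only, using the report's column names.
plotdata <- data.frame(
  Height = c(180, 165, 172, 158, 190, 175),
  Weight = c(80, 60, 70, 55, 95, 72),
  Body_fat_percentage = c(18, 25, 22, 28, 15, 20),
  Exercise_type = factor(c("Strength", "Cardio", "Mixed",
                           "Cardio", "Strength", "Mixed"))
)

# Write both figures to a temporary PDF so the sketch runs without a display.
fig_path <- tempfile(fileext = ".pdf")
pdf(fig_path)

# Boxplot: Exercise_type on the x-axis, Height on the y-axis.
boxplot(Height ~ Exercise_type, data = plotdata,
        xlab = "Exercise_type", ylab = "Height")

# Scatterplot: Weight on the x-axis, Body_fat_percentage on the y-axis.
plot(plotdata$Weight, plotdata$Body_fat_percentage,
     xlab = "Weight", ylab = "Body_fat_percentage")

invisible(dev.off())
```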

For statistical modeling, Aylla Ermland performed the analysis using the statistical-analysis.R script located in /code/analysis-code/. We edited the script to fit a third linear model where Height is the outcome and both Exercise_type and Body_fat_percentage are predictors. The results of this model were saved to resulttable3.rds in the appropriate results folder.
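A model of this form can be sketched as below. The data are made up, the object names (moddata, lmfit3, lmtable3) are hypothetical, and the coefficient table is assembled with base R to mirror the column names of the saved table; the actual statistical-analysis.R script may build it differently.

```r
# Made-up values for illustration only, using the report's column names.
moddata <- data.frame(
  Height = c(180, 165, 172, 158, 190, 175, 168, 183, 161),
  Body_fat_percentage = c(18, 25, 22, 28, 15, 20, 24, 17, 27),
  Exercise_type = factor(c("Strength", "Cardio", "Mixed",
                           "Cardio", "Strength", "Mixed",
                           "Cardio", "Strength", "Mixed"))
)

# Height as outcome, Exercise_type and Body_fat_percentage as predictors.
lmfit3 <- lm(Height ~ Exercise_type + Body_fat_percentage, data = moddata)

# Collect the coefficient summary into a tidy data frame.
coefs <- summary(lmfit3)$coefficients
lmtable3 <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value = coefs[, "Pr(>|t|)"],
  row.names = NULL
)

# A temporary file stands in for resulttable3.rds.
out_path <- tempfile(fileext = ".rds")
saveRDS(lmtable3, out_path)
```

Because Exercise_type is a factor, lm() reports one coefficient per non-reference level, each measured relative to the reference level.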

2 Results

Boxplot showing Exercise_type on the x-axis and Height on the y-axis:

Figure 1: Boxplot showing the distribution of height across different exercise types.

Scatterplot showing Weight on the x-axis and Body_fat_percentage on the y-axis:

Figure 2: Scatterplot illustrating the relationship between weight and body fat percentage.

The third linear model assessed the relationship between Height (the outcome) and the predictors Body_fat_percentage and Exercise_type. The results of this model are summarized in Table 3:

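library(here)  # here() builds paths relative to the project root
# Load and print the saved coefficient table for the third linear model
table_file3 <- here("results", "tables", "resulttable3.rds")
resulttable3 <- readRDS(table_file3)
print(resulttable3)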
                   term    estimate std.error  statistic    p.value
1           (Intercept) 169.0751880 54.596026  3.0968405 0.02694952
2   Body_fat_percentage  -0.4270677  2.530697 -0.1687550 0.87260373
3     Exercise_typenone  11.9278195 25.704536  0.4640356 0.66212433
4 Exercise_typestrength  10.7578947 21.040577  0.5112928 0.63091666

This table shows the estimated coefficient, standard error, test statistic, and p-value for each term. Each coefficient describes how a predictor (body composition or type of physical activity) is associated with Height after adjusting for the other predictor.
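As a quick consistency check on the table, the sketch below re-enters the printed values by hand, recomputes each test statistic as estimate divided by standard error, and flags terms with a p-value below 0.05. The helper names (recomputed, max_diff, below_0.05) are illustrative only.

```r
# Values copied from the printed resulttable3 output in this report.
resulttable3 <- data.frame(
  term = c("(Intercept)", "Body_fat_percentage",
           "Exercise_typenone", "Exercise_typestrength"),
  estimate = c(169.0751880, -0.4270677, 11.9278195, 10.7578947),
  std.error = c(54.596026, 2.530697, 25.704536, 21.040577),
  statistic = c(3.0968405, -0.1687550, 0.4640356, 0.5112928),
  p.value = c(0.02694952, 0.87260373, 0.66212433, 0.63091666)
)

# The t statistic is the estimate divided by its standard error.
recomputed <- resulttable3$estimate / resulttable3$std.error
max_diff <- max(abs(recomputed - resulttable3$statistic))

# Flag terms below the conventional 0.05 threshold.
resulttable3$below_0.05 <- resulttable3$p.value < 0.05
print(resulttable3[, c("term", "p.value", "below_0.05")])
```

Only the intercept falls below 0.05; none of the predictor terms do.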

3 Discussion and conclusion

The boxplot of Height by Exercise_type suggests that individuals engaging in “strength-only” exercises tend to be taller, whereas those in the “mixed” exercise group show a wider range of heights. These are visual impressions only, and statistical analysis is needed to confirm them.

The scatterplot does not show a clear positive correlation between Weight and Body_fat_percentage.

In the statistical analysis, neither Body_fat_percentage nor Exercise_type was significantly associated with Height in this dataset. All predictor p-values were well above 0.05 (0.87 for Body_fat_percentage; 0.66 and 0.63 for the two Exercise_type levels), and the standard errors were large relative to the estimates, so the estimated effects cannot be distinguished from zero in this sample.