Fixed a mistake in my analysis - www.eamoncaddigan.net - Content and configuration for https://www.eamoncaddigan.net

commit 99e2828657699fd7e2ab38fdb5b6ead7cb6c4c9d
parent ad75b4b388c6c0189bcc7ae69243de40e7a9afe0
Author: Eamon Caddigan <eamon.caddigan@gmail.com>
Date:   Tue, 22 Aug 2023 09:02:14 -0700

Fixed a mistake in my analysis

I realized I forgot that the data were aggregated by height and width so
I needed to consider a `Count` variable. Fixing this totally changed the
conclusions and necessitated a new approach to clustering (which didn't
change anything). My first tagged update!
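
As a rough illustration of why the `Count` column matters, here is a minimal sketch in base R. The `toy` data frame and its values are made up for the example (they are not from viewports.fyi); only the column names `Width`, `Height`, and `Count` come from the real data. It compares treating each row as one observation against weighting each row by `Count`.

```r
# Hypothetical toy data: one row per unique viewport size, with Count
# giving how often that size was observed (values are invented).
toy <- data.frame(
  Width  = c(390, 1920, 1366),
  Height = c(844, 1080, 768),
  Count  = c(8, 1, 1)
)
toy$aspect_ratio <- toy$Width / toy$Height

# Flag the viewports that are taller than they are wide.
is_tall <- as.numeric(toy$aspect_ratio < 1)

# Unweighted: every unique size counts once, regardless of how common it is.
unweighted_tall <- mean(is_tall)

# Weighted: every unique size counts in proportion to its Count.
weighted_tall <- weighted.mean(is_tall, w = toy$Count)

# Same answer by expanding to one row per observation, which is the
# idea behind repeating rows before clustering.
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$Count), ]
expanded_tall <- mean(expanded$aspect_ratio < 1)
```

In this toy case the single tall size is one of three rows but eight of ten observations, so the unweighted and weighted summaries point in opposite directions.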

Diffstat:
M content/posts/viewports/index.Rmd  | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
M content/posts/viewports/index.md  | 57 ++++++++++++++++++++++++++++++++++++++-------------------
A content/posts/viewports/index_cache/markdown/__packages  | 12 ++++++++++++
A content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.RData  | 0 
A content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdb  | 0 
A content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdx  | 0 
D content/posts/viewports/index_files/figure-markdown/aspect-ratio-1.png  | 0 
D content/posts/viewports/index_files/figure-markdown/cluster-exclude-1.png  | 0 
D content/posts/viewports/index_files/figure-markdown/height-width-1.png  | 0 
A content/posts/viewports/plot-aspect-ratio-1.png  | 0 
A content/posts/viewports/plot-clusters-1.png  | 0 
A content/posts/viewports/plot-height-width-1.png  | 0

12 files changed, 105 insertions(+), 49 deletions(-)
diff --git a/content/posts/viewports/index.Rmd b/content/posts/viewports/index.Rmd
@@ -2,10 +2,13 @@
 title: "Web viewports"
 author: "Eamon Caddigan"
 date: "2023-08-21T21:07:47-07:00"
+lastmod: "2023-08-22T08:58:51-07:00"
 draft: False
 categories:
 - Programming
 - Data Science
+tags:
+- Design
 output: 
   md_document:
     variant: markdown
@@ -15,7 +18,9 @@ output:
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = FALSE,
                       fig.width = 5.5,
-                      fig.height = 3)
+                      fig.height = 3,
+                      dpi = 150,
+                      fig.path = "")
 ```
 
 Andy Bell from [Set Studio](https://set.studio/) shared [the outcome of a study
@@ -26,29 +31,31 @@ I'd take a quick look at it.
 ```{r packages}
 suppressPackageStartupMessages(library(readr))
 suppressPackageStartupMessages(library(dplyr))
+suppressPackageStartupMessages(library(purrr))
+suppressPackageStartupMessages(library(broom))
 suppressPackageStartupMessages(library(ggplot2))
 ```
 
-```{r load-data}
+```{r load-data, cache=TRUE}
 viewports <- suppressMessages(read_csv("https://viewports.fyi/data.csv",
                                        col_types = cols(
                                          Width = col_double(),
                                          Height = col_double(),
                                          Count = col_double(),
                                          ...4 = col_character()
-                                       ))) %>%
+                                       ))) |>
   mutate(Height = abs(Height),
          aspect_ratio = Width / Height,
          area = Width * Height)
 ```
 
 First let's check the distribution of aspect ratios. With mobile browsing more
-popular than desktop these days, I expect tall displays to dominate the wide
-ones, but how does the distribution look?
+popular than desktop, I expect tall displays to dominate the wide ones, but what
+do the data say?
 
-```{r aspect-ratio}
+```{r plot-aspect-ratio, warning=FALSE}
 ggplot(viewports, aes(aspect_ratio)) +
-  geom_density() +
+  geom_density(aes(weight = Count)) +
   scale_x_log10(breaks = c(1/3, 9/16, 1, 16/9, 3/1),
                 labels = c("1:3", "9:16", "1:1", "16:9", "3:1")) +
   coord_cartesian(xlim = c(1/4, 4/1)) +
@@ -63,15 +70,22 @@ ggplot(viewports, aes(aspect_ratio)) +
   labs(title = "Distribution of viewport aspect ratios")
 ```
 
-Surprisingly, wide viewports (with aspect ratios greater than 1:1) are slightly
-more common than narrow ones, the distribution of the latter are just more 
-tightly clustered (apparently around 9:16).
+As expected, tall (narrow) displays dominate the distribution, but the peak also
+seems sharper. This isn't surprising; phone screens come in a few discrete
+sizes, and each phone screen has a small number of viewports (but,
+importantly, more than one) associated with it.
 
-How do the distribution of height and width relate to each other?
+How do the distributions of height and width relate to each other? I'll plot a
+point for each unique viewport, and set the alpha value of each point to the
+number of observations of that specific size. Points above and to the left of
+the dashed line are taller than they are wide (i.e., "portrait mode"), and those
+below and to its right are wider than tall ("landscape mode"). We confirmed that
+tall viewports are the most common by looking at aspect ratios, but how are the
+specific sizes distributed?
 
-```{r height-width}
+```{r plot-height-width}
 ggplot(viewports, aes(x = Width, y = Height)) +
-  geom_point(alpha = 0.1) +
+  geom_point(aes(alpha = Count)) +
   geom_abline(slope = 1, intercept = 0, linetype = "dashed", linewidth = 0.5) +
   scale_x_log10() +
   scale_y_log10() +
@@ -82,31 +96,42 @@ ggplot(viewports, aes(x = Width, y = Height)) +
        y = "Viewport height")
 ```
 
-I feel like my eye detects three clusters of height by width combinations, but
-k-means is clustering these unreliably, which suggests that it's not a clean
-clustering.
+I feel like my eye detects three clusters of height-by-width combinations, but
+k-means is clustering these unreliably—sometimes the "small wide" viewports get
+clustered alone together, and sometimes they're grouped with the "small tall"
+viewports. This suggests that three (and also four—I checked) clusters don't
+describe these data particularly well.
 
-I think there's more to look at here, and I'll update this if I come back to it!
+However, taking these two graphics together, we can see that while tall displays
+tend to be smaller than wide ones in general (which is what we may expect,
+knowing that mobile browsers are more prevalent than desktop browsers), there
+are plenty of exceptions in the data set which should be considered during
+design.
 
-```{r cluster-exclude, include=FALSE}
-viewports <- viewports %>%
-  mutate(log_width = log(Width),
-         log_height = log(Height))
+I think there's more to look at here, and I'll update this if I find anything
+interesting!
 
-clust3 <- kmeans(viewports[, c("log_width", "log_height")], 3)
-viewports <- broom::augment(clust3, viewports)
+```{r cluster-viewports, include=FALSE}
+viewports_long <- viewports |>
+  mutate(log_width = log(Width),
+         log_height = log(Height)) |>
+  map_dfc(~ rep(.x, times = viewports$Count))
+viewports_clust <- kmeans(viewports_long[, c("Width", "Height")], 4)
+viewports_long <- augment(viewports_clust,
+                          viewports_long)
+viewports <- distinct(viewports_long)
+```
 
-ggplot(viewports, aes(x = Width, y = Height)) +
-  geom_point(aes(color = .cluster), alpha = 0.1) +
-  geom_abline(slope = 1, intercept = 0, linetype = "dashed", size = 0.5) +
+```{r plot-clusters, include=FALSE}
+ggplot(viewports_long, aes(Width, Height)) +
+  geom_point(aes(alpha = Count, color = .cluster)) +
+  geom_abline(slope = 1, intercept = 0, linetype = "dashed", linewidth = 0.5) +
   scale_x_log10() +
   scale_y_log10() +
-  scale_color_brewer(palette = "Dark2") +
+  scale_color_brewer(palette = "Paired") +
   theme_minimal() +
   theme(axis.title = element_text(size = 14),
-        axis.text = element_text(size = 12))+
+        axis.text = element_text(size = 12)) +
   labs(x = "Viewport width",
        y = "Viewport height")
 ```
-
-
diff --git a/content/posts/viewports/index.md b/content/posts/viewports/index.md
@@ -2,10 +2,13 @@
 title: "Web viewports"
 author: "Eamon Caddigan"
 date: "2023-08-21T21:07:47-07:00"
+lastmod: "2023-08-22T08:58:51-07:00"
 draft: False
 categories:
 - Programming
 - Data Science
+tags:
+- Design
 output: 
   md_document:
     variant: markdown
@@ -18,22 +21,38 @@ the screen (measured in pixels) available for web pages. They shared the
 data so I thought I'd take a quick look at it.
 
 First let's check the distribution of aspect ratios. With mobile
-browsing more popular than desktop these days, I expect tall displays to
-dominate the wide ones, but how does the distribution look?
-
-![](index_files/figure-markdown/aspect-ratio-1.png)
-
-Surprisingly, wide viewports (with aspect ratios greater than 1:1) are
-slightly more common than narrow ones, the distribution of the latter
-are just more tightly clustered (apparently around 9:16).
-
-How do the distribution of height and width relate to each other?
-
-![](index_files/figure-markdown/height-width-1.png)
-
-I feel like my eye detects three clusters of height by width
-combinations, but k-means is clustering these unreliably, which suggests
-that it's not a clean clustering.
-
-I think there's more to look at here, and I'll update this if I come
-back to it!
+browsing more popular than desktop, I expect tall displays to dominate
+the wide ones, but what do the data say?
+
+![](plot-aspect-ratio-1.png)
+
+As expected, tall (narrow) displays dominate the distribution, but the
+peak also seems sharper. This isn't surprising; phone screens come in a
+few discrete sizes, and each phone screen has a small number of
+viewports (but, importantly, more than one) associated with it.
+
+How do the distributions of height and width relate to each other? I'll
+plot a point for each unique viewport, and set the alpha value of each
+point to the number of observations of that specific size. Points above
+and to the left of the dashed line are taller than they are wide (i.e.,
+"portrait mode"), and those below and to its right are wider than tall
+("landscape mode"). We confirmed that tall viewports are the most common
+by looking at aspect ratios, but how are the specific sizes distributed?
+
+![](plot-height-width-1.png)
+
+I feel like my eye detects three clusters of height-by-width
+combinations, but k-means is clustering these unreliably---sometimes the
+"small wide" viewports get clustered alone together, and sometimes
+they're grouped with the "small tall" viewports. This suggests that
+three (and also four---I checked) clusters don't describe these data
+particularly well.
+
+However, taking these two graphics together, we can see that while tall
+displays tend to be smaller than wide ones in general (which is what we
+may expect, knowing that mobile browsers are more prevalent than desktop
+browsers), there are plenty of exceptions in the data set which should
+be considered during design.
+
+I think there's more to look at here, and I'll update this if I find
+anything interesting!
diff --git a/content/posts/viewports/index_cache/markdown/__packages b/content/posts/viewports/index_cache/markdown/__packages
@@ -0,0 +1,12 @@
+base
+methods
+datasets
+utils
+grDevices
+graphics
+stats
+readr
+dplyr
+purrr
+broom
+ggplot2
diff --git a/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.RData b/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.RData
Binary files differ.
diff --git a/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdb b/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdb
Binary files differ.
diff --git a/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdx b/content/posts/viewports/index_cache/markdown/load-data_34c33536adc2d5733b1aa60a4e85eb48.rdx
Binary files differ.
diff --git a/content/posts/viewports/index_files/figure-markdown/aspect-ratio-1.png b/content/posts/viewports/index_files/figure-markdown/aspect-ratio-1.png
Binary files differ.
diff --git a/content/posts/viewports/index_files/figure-markdown/cluster-exclude-1.png b/content/posts/viewports/index_files/figure-markdown/cluster-exclude-1.png
Binary files differ.
diff --git a/content/posts/viewports/index_files/figure-markdown/height-width-1.png b/content/posts/viewports/index_files/figure-markdown/height-width-1.png
Binary files differ.
diff --git a/content/posts/viewports/plot-aspect-ratio-1.png b/content/posts/viewports/plot-aspect-ratio-1.png
Binary files differ.
diff --git a/content/posts/viewports/plot-clusters-1.png b/content/posts/viewports/plot-clusters-1.png
Binary files differ.
diff --git a/content/posts/viewports/plot-height-width-1.png b/content/posts/viewports/plot-height-width-1.png
Binary files differ.

	www.eamoncaddigan.net Content and configuration for https://www.eamoncaddigan.net
	git clone https://git.eamoncaddigan.net/www.eamoncaddigan.net.git