Political Science Academic Job Market

An NLP Analysis of APSA eJobs Listings

Author: Knowledge Mining Workshop

Published: April 9, 2026

Executive Summary

This report analyzes the political science academic job market using all active listings published in APSA Political Science Jobs, the official monthly eJobs journal of the American Political Science Association. Subfield counts are taken directly from each issue’s Table of Contents (page 2) and serve as ground truth; individual job records are scraped from the body text and verified against those counts.

0.1 Dataset at a Glance

Issues Processed: 113
Unique Job Listings: 10,168
Unique Institutions: 5,576
Years Covered: 2016–2026

0.2 Geographic Overview


1 Data Collection & Parsing

1.1 Methodology

The data pipeline operates in three stages:

  1. PDF Extraction: pdftools::pdf_text() reads the raw text of each monthly issue, one string per page.
  2. Page 2 TOC Parsing: a regex pipeline matches the "(N listings)" pattern in the Table of Contents to extract the official, ground-truth count per subfield per issue.
  3. Body Text Scraping: the parser walks the document line by line, tracking section headers to assign subfields and flushing a record each time it detects an eJobs ID: marker.
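Stages 2 and 3 can be sketched in base R as follows. This is a minimal illustration, not the report's actual parser: the sample TOC and body lines, the function names parse_toc_counts() and scrape_records(), and the assumption that the eJobs ID: marker closes each record are all hypothetical.

```r
# Hypothetical page text; real APSA issues may be laid out differently.
toc_page <- c(
  "Table of Contents",
  "American Government and Politics (12 listings) ........ 3",
  "Comparative Politics (8 listings) ........ 9",
  "Public Law (1 listing) ........ 14"
)
body_lines <- c(
  "American Government and Politics",
  "Assistant Professor, Example University",
  "eJobs ID: 10001",
  "Lecturer, Sample College",
  "eJobs ID: 10002",
  "Comparative Politics",
  "Postdoctoral Fellow, Example Institute",
  "eJobs ID: 10003"
)

# Stage 2: pull "<subfield> (N listings)" pairs out of the TOC lines.
parse_toc_counts <- function(lines) {
  pattern <- "^\\s*(.+?)\\s*\\((\\d+) listings?\\)"
  hits <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  hits <- hits[lengths(hits) == 3]          # drop lines that did not match
  data.frame(
    subfield = vapply(hits, `[`, character(1), 2),
    n_toc    = as.integer(vapply(hits, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Stage 3: walk the body line by line, tracking the current section header
# and flushing one record whenever an "eJobs ID:" marker appears.
scrape_records <- function(lines, headers) {
  current <- NA_character_
  buffer  <- character(0)
  out     <- list()
  for (ln in trimws(lines)) {
    if (ln %in% headers) {                  # section header: switch subfield
      current <- ln
      buffer  <- character(0)
    } else if (grepl("^eJobs ID:", ln)) {   # marker: flush the buffered record
      out[[length(out) + 1]] <- data.frame(
        ejobs_id = sub("^eJobs ID:\\s*", "", ln),
        subfield = current,
        text     = paste(buffer, collapse = " "),
        stringsAsFactors = FALSE
      )
      buffer <- character(0)
    } else if (nzchar(ln)) {                # ordinary line: add to the buffer
      buffer <- c(buffer, ln)
    }
  }
  do.call(rbind, out)
}

toc       <- parse_toc_counts(toc_page)
jobs_demo <- scrape_records(body_lines, headers = toc$subfield)
table(jobs_demo$subfield)
```

Using the TOC's own subfield names as the header list keeps the two stages consistent, so the scraped counts per subfield can later be compared with n_toc.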

1.2 Verification Report


2 Subfield Analysis

2.1 Summary Statistics by Subfield

Summary Statistics by Subfield
Subfield      N Jobs  N w/ Salary  Median Salary  Mean Salary   % TT  % Visiting  % Teaching Track  % Postdoc
Methods         1496          191        $65,000      $72,254  15.4%        3.9%              4.3%       6.5%
CP              1490          172        $65,000      $67,282  16.4%        2.9%              3.0%       7.0%
IR              1461          179        $60,000      $66,753  14.6%        2.7%              3.1%       8.5%
AP               946          105        $65,000      $68,731  16.4%        4.2%              5.5%       9.6%
Other            946           94        $65,000      $67,877  16.1%        3.1%              3.7%       6.0%
PT               942           78        $65,000      $64,371  16.2%        3.8%              3.9%       5.1%
PL               634           60        $75,000      $74,696  23.0%        4.1%              2.8%       4.1%
PP               574           61        $72,500      $78,408  15.9%        2.4%              3.5%       7.1%
Open             525           64        $65,000      $73,694  12.6%        1.5%              4.2%       7.4%
Non-Academic     501           54        $65,000      $72,906  12.4%        2.2%              3.4%       4.6%
Admin            494           55        $65,000      $76,880  14.4%        3.6%              4.0%       3.8%
PAdmin           159           15        $65,000      $67,880  17.6%        1.3%              5.0%       1.3%

3 Rank Analysis

3.1 Distribution of Rank Categories

rank_order <- jobs %>%
  count(rank_category) %>% arrange(n) %>% pull(rank_category)

jobs %>%
  count(rank_category) %>%
  mutate(rank_category = factor(rank_category, levels = rank_order)) %>%
  ggplot(aes(x = n, y = rank_category, fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = comma(n)), hjust = -0.2,
            size = 3.4, family = PAL) +
  scale_fill_viridis_c(option = "E", direction = -1) +
  scale_x_continuous(expand = expansion(mult = c(0, .18))) +
  labs(title = "Job Listings by Rank Category",
       x = "Number of Listings", y = NULL)

3.2 Rank Composition by Subfield

jobs %>%
  filter(rank_category %in% tt_ranks, !is.na(subfield)) %>%
  count(subfield, rank_category) %>%
  group_by(subfield) %>%
  mutate(pct = n / sum(n)) %>% ungroup() %>%
  ggplot(aes(x = reorder(subfield, -n, sum), y = pct,
             fill = factor(rank_category, levels = rev(tt_ranks)))) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_brewer(palette = "Spectral", name = "Rank") +
  labs(title = "Rank Composition by Subfield",
       x = NULL, y = "Share of Listings") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1, family = PAL),
        legend.position = "bottom",
        legend.text = element_text(size = 9, family = PAL))

4 Geographic Distribution

4.1 Listings by US Region

jobs %>%
  count(region) %>%
  arrange(desc(n)) %>%
  ggplot(aes(x = reorder(region, n), y = n, fill = region)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = comma(n)), hjust = -0.2,
            size = 3.5, family = PAL) +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(expand = expansion(mult = c(0, .15))) +
  coord_flip() +
  labs(title = "Job Listings by US Region",
       x = NULL, y = "Listings")

4.2 Top 20 States

jobs %>%
  filter(!is.na(state_raw)) %>%
  count(state_raw, sort = TRUE) %>%
  slice_head(n = 20) %>%
  ggplot(aes(x = n, y = reorder(state_raw, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = comma(n)), hjust = -0.2,
            size = 3.3, family = PAL) +
  scale_fill_viridis_c(option = "D", direction = -1) +
  scale_x_continuous(expand = expansion(mult = c(0, .15))) +
  labs(title = "Top 20 States by Job Listings",
       x = "Listings", y = NULL)

4.3 State Choropleth (Detail)


5 Salary Analysis

Warning

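Only listings with an explicit numeric salary are included here. Most listings say “Competitive” or “Commensurate with experience” and are excluded: the table in Section 2.1 reports a salary figure for only 1,128 of 10,168 listings (about 11%). Interpret the figures with caution.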
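The filtering described above could look roughly like the sketch below. The report's actual extraction rules are not shown, so the sample strings, the parse_salary() helper, and the choice to take the midpoint of a stated range are assumptions made for illustration; only the 10,000 threshold comes from the report's own code.

```r
# Hypothetical salary strings as they might appear in listings.
salary_raw <- c("Competitive", "$65,000", "Commensurate with experience",
                "$60,000 - $70,000", "72500 per year")

# Return a numeric estimate when the text contains numbers, NA otherwise.
parse_salary <- function(x) {
  vapply(x, function(s) {
    # Numbers with optional thousands separators and decimals.
    nums <- regmatches(s, gregexpr("[0-9][0-9,]*(\\.[0-9]+)?", s))[[1]]
    vals <- as.numeric(gsub(",", "", nums))
    if (length(vals) == 0) NA_real_ else mean(vals)  # midpoint of a range (assumption)
  }, numeric(1), USE.NAMES = FALSE)
}

salary_est <- parse_salary(salary_raw)

# Same filter as the report's code: non-missing and above 10,000.
keep <- !is.na(salary_est) & salary_est > 10000
data.frame(salary_raw, salary_est, keep)
```

Non-numeric phrases come back as NA and drop out of the filter, which is why the salary figures rest on a small and possibly unrepresentative subset of listings.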

5.1 Salary by Rank and Subfield

sal_df <- jobs %>% filter(!is.na(salary_est), salary_est > 10000)

p_sal_rank <- sal_df %>%
  filter(rank_category %in% tt_ranks) %>%
  ggplot(aes(x = reorder(rank_category, salary_est, median),
             y = salary_est, fill = rank_category)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 1.5, show.legend = FALSE) +
  scale_y_continuous(labels = dollar_format()) +
  scale_fill_brewer(palette = "Spectral") +
  coord_flip() +
  labs(title = "Salary Distribution by Rank",
       subtitle = "Listings with explicit numeric salary only",
       x = NULL, y = "Estimated Annual Salary")

p_sal_sf <- sal_df %>%
  filter(!is.na(subfield)) %>%
  ggplot(aes(x = reorder(subfield, salary_est, median),
             y = salary_est, fill = subfield)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 1.5, show.legend = FALSE) +
  scale_y_continuous(labels = dollar_format()) +
  scale_fill_brewer(palette = "Paired") +
  coord_flip() +
  labs(title = "Salary Distribution by Subfield",
       x = NULL, y = "Estimated Annual Salary")

p_sal_rank / p_sal_sf

5.2 Salary Trend Over Time

sal_df %>%
  filter(!is.na(year)) %>%
  group_by(year) %>%
  summarise(median_sal = median(salary_est),
            mean_sal   = mean(salary_est),
            n = n(), .groups = "drop") %>%
  ggplot(aes(x = year)) +
  geom_ribbon(aes(ymin = median_sal, ymax = mean_sal),
              alpha = 0.18, fill = "steelblue") +
  geom_line(aes(y = median_sal, colour = "Median"), linewidth = 1) +
  geom_line(aes(y = mean_sal,   colour = "Mean"),
            linewidth = 1, linetype = "dashed") +
  scale_y_continuous(labels = dollar_format()) +
  scale_x_continuous(breaks = pretty_breaks(n = 8)) +
  scale_colour_manual(values = c(Median = "steelblue", Mean = "tomato"),
                      name = NULL) +
  labs(title    = "Salary Trend Over Time",
       subtitle = paste0("n = ", comma(nrow(sal_df)),
                         " listings with explicit numeric salary"),
       x = NULL, y = "Annual Salary") +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 9, family = PAL))


6 Text Mining

6.1 Top 30 Terms

tidy_words %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 30) %>%
  ggplot(aes(x = n, y = reorder(word, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = comma(n)), hjust = -0.2,
            size = 3.3, family = PAL) +
  scale_fill_viridis_c(option = "D", direction = -1) +
  scale_x_continuous(labels = comma,
                     expand = expansion(mult = c(0, .15))) +
  labs(title    = "Top 30 Terms Across All Job Listings",
       subtitle = "After removing stopwords; based on rank + unit fields",
       x = "Frequency", y = NULL)

6.2 TF-IDF: Distinctive Terms by Subfield

TF-IDF (Term Frequency–Inverse Document Frequency) surfaces words that are unusually common in one subfield relative to the others, revealing each field's distinctive vocabulary. Each subfield is treated as a single document, so a word that appears in every subfield scores zero.
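To make the weighting concrete, the sketch below computes TF-IDF by hand in base R on made-up counts, using the same definitions as tidytext's bind_tf_idf(): term frequency is a word's share of its document's tokens, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The toy data frame is hypothetical.

```r
# Toy word counts per subfield (made-up numbers, for illustration only).
counts <- data.frame(
  subfield = c("IR", "IR", "PT", "PT", "PL", "PL"),
  word     = c("security", "professor", "philosophy", "professor",
               "constitutional", "professor"),
  n        = c(4, 6, 3, 7, 5, 5),
  stringsAsFactors = FALSE
)

# Term frequency: share of a subfield's tokens taken up by each word.
counts$tf <- counts$n / ave(counts$n, counts$subfield, FUN = sum)

# Inverse document frequency: ln(number of subfields / subfields containing the word).
# Each (subfield, word) pair occurs once, so table() counts subfields per word.
n_docs   <- length(unique(counts$subfield))
doc_freq <- table(counts$word)
counts$idf <- log(n_docs / as.numeric(doc_freq[counts$word]))

counts$tf_idf <- counts$tf * counts$idf
counts[order(-counts$tf_idf), ]
```

In this toy example "professor" appears in all three subfields, so its IDF is ln(3/3) = 0 and it drops to the bottom, while words confined to one subfield rise to the top.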

tidy_words %>%
  filter(!is.na(subfield)) %>%
  count(subfield, word) %>%
  bind_tf_idf(word, subfield, n) %>%
  group_by(subfield) %>%
  slice_max(tf_idf, n = 8) %>% ungroup() %>%
  mutate(word = reorder_within(word, tf_idf, subfield)) %>%
  ggplot(aes(x = tf_idf, y = word, fill = subfield)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ subfield, scales = "free_y", ncol = 3) +
  scale_y_reordered() +
  scale_fill_brewer(palette = "Paired") +
  labs(title    = "Most Distinctive Terms by Subfield (TF-IDF)",
       subtitle = "Top terms per subfield, ranked by TF-IDF score",
       x = "TF-IDF Score", y = NULL) +
  theme(axis.text.y = element_text(size = 8, family = PAL))

6.3 TF-IDF: Distinctive Terms by Rank

tidy_words %>%
  filter(rank_category %in% tt_ranks) %>%
  count(rank_category, word) %>%
  bind_tf_idf(word, rank_category, n) %>%
  group_by(rank_category) %>%
  slice_max(tf_idf, n = 8) %>% ungroup() %>%
  mutate(word = reorder_within(word, tf_idf, rank_category)) %>%
  ggplot(aes(x = tf_idf, y = word, fill = rank_category)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ rank_category, scales = "free_y", ncol = 3) +
  scale_y_reordered() +
  scale_fill_brewer(palette = "Spectral") +
  labs(title = "Most Distinctive Terms by Rank (TF-IDF)",
       x = "TF-IDF Score", y = NULL) +
  theme(axis.text.y = element_text(size = 8, family = PAL))

6.4 Top Bigrams

jobs %>%
  unnest_tokens(bigram, full_text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("w1","w2"), sep = " ") %>%
  filter(!w1 %in% stop_words$word, !w2 %in% stop_words$word,
         !w1 %in% ps_stopwords$word, !w2 %in% ps_stopwords$word,
         str_length(w1) > 2, str_length(w2) > 2,
         !str_detect(w1,"^\\d+$"), !str_detect(w2,"^\\d+$")) %>%
  unite(bigram, w1, w2, sep = " ") %>%
  count(bigram, sort = TRUE) %>%
  slice_head(n = 25) %>%
  ggplot(aes(x = n, y = reorder(bigram, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = comma(n)), hjust = -0.2,
            size = 3.3, family = PAL) +
  scale_fill_viridis_c(option = "C", direction = -1) +
  scale_x_continuous(labels = comma,
                     expand = expansion(mult = c(0, .15))) +
  labs(title    = "Top 25 Bigrams in Job Listings",
       subtitle = "Common two-word phrases after stopword removal",
       x = "Frequency", y = NULL)

6.5 Word Clouds by Subfield


7 Browse All Jobs


8 Appendix

8.1 R Session Info

sessionInfo()
R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggtext_0.1.2       plotly_4.11.0      DT_0.33            kableExtra_1.4.0  
 [5] knitr_1.50         patchwork_1.3.2    maps_3.4.3         ggridges_0.5.6    
 [9] viridis_0.6.5      viridisLite_0.4.2  wordcloud_2.6      RColorBrewer_1.1-3
[13] SnowballC_0.7.1    tidytext_0.4.2     lubridate_1.9.5    scales_1.4.0      
[17] ggplot2_4.0.0      tibble_3.3.0       stringr_1.6.0      tidyr_1.3.2       
[21] dplyr_1.2.0       

loaded via a namespace (and not attached):
 [1] janeaustenr_1.0.0 sass_0.4.10       generics_0.1.4    xml2_1.5.1       
 [5] stringi_1.8.7     lattice_0.22-9    digest_0.6.39     magrittr_2.0.4   
 [9] evaluate_1.0.5    grid_4.5.3        timechange_0.4.0  fastmap_1.2.0    
[13] jsonlite_2.0.0    Matrix_1.7-4      gridExtra_2.3     httr_1.4.7       
[17] purrr_1.2.1       crosstalk_1.2.1   jquerylib_0.1.4   codetools_0.2-20 
[21] lazyeval_0.2.2    textshaping_1.0.1 cli_3.6.5         rlang_1.1.7      
[25] tokenizers_0.3.0  cachem_1.1.0      withr_3.0.2       yaml_2.3.10      
[29] tools_4.5.3       vctrs_0.7.2       R6_2.6.1          lifecycle_1.0.5  
[33] htmlwidgets_1.6.4 pkgconfig_2.0.3   bslib_0.9.0       pillar_1.11.1    
[37] gtable_0.3.6      data.table_1.17.2 glue_1.8.0        Rcpp_1.1.0       
[41] systemfonts_1.2.3 xfun_0.54         tidyselect_1.2.1  rstudioapi_0.17.1
[45] farver_2.1.2      htmltools_0.5.8.1 labeling_0.4.3    svglite_2.2.1    
[49] rmarkdown_2.30    compiler_4.5.3    S7_0.2.0          gridtext_0.1.5   

8.2 Data Pipeline Summary

Stage                Tool                                             Output
PDF text extraction  pdftools::pdf_text()                             Raw character vectors
TOC count parsing    stringr regex on pages 1–3                       ps_jobs_toc_counts.csv
Body text scraping   Line-by-line section tracker + eJobs ID flush    ps_jobs_all_raw.csv
Deduplication        dplyr::distinct(ejobs_id)                        ps_jobs_all_unique.csv
Verification         TOC count vs scraped count per issue × subfield  ps_jobs_verification.csv
Analytics & Report   ggplot2, tidytext, maps, Quarto                  This document
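The deduplication and verification stages can be sketched in base R as shown below. The two small data frames stand in for ps_jobs_all_raw.csv and ps_jobs_toc_counts.csv, and their column names (issue, subfield, ejobs_id, n_toc) are assumptions. The sketch also assumes verification runs on the raw rows, since a listing repeated across issues counts once in each issue's TOC.

```r
# Hypothetical stand-ins for the raw scrape and the TOC counts.
raw <- data.frame(
  issue    = c("2024-01", "2024-01", "2024-01", "2024-02", "2024-02"),
  subfield = c("AP", "AP", "CP", "AP", "CP"),
  ejobs_id = c("10001", "10002", "10003", "10001", "10004"),
  stringsAsFactors = FALSE
)
toc <- data.frame(
  issue    = c("2024-01", "2024-01", "2024-02", "2024-02"),
  subfield = c("AP", "CP", "AP", "CP"),
  n_toc    = c(2L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Verification: scraped rows per issue x subfield, compared with the TOC count.
scraped <- aggregate(ejobs_id ~ issue + subfield, data = raw, FUN = length)
names(scraped)[names(scraped) == "ejobs_id"] <- "n_scraped"
check <- merge(toc, scraped, by = c("issue", "subfield"), all = TRUE)
check$n_scraped[is.na(check$n_scraped)] <- 0L
check$match <- check$n_toc == check$n_scraped
check[!check$match, ]   # cells where the scraper and the TOC disagree

# Deduplication: keep the first row seen for each eJobs ID.
unique_jobs <- raw[!duplicated(raw$ejobs_id), ]
nrow(unique_jobs)
```

With dplyr, the equivalent of the last step is distinct(raw, ejobs_id, .keep_all = TRUE); without .keep_all = TRUE, distinct() returns only the ejobs_id column.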

8.3 Subfield Code Reference

Code          Full Name
AP            American Government and Politics
CP            Comparative Politics
IR            International Relations
Methods       Methodology
PT            Political Theory
PL            Public Law
PP            Public Policy
PAdmin        Public Administration
Admin         Administration
Non-Academic  Non-Academic Positions
Open          Open Subfield
Other         Other