Scrape, scrape, scrape, your website
Overview
Web-scraping is a powerful tool used to extract data from web pages. In this post I’ll review web-scraping generally using the rvest package, and illustrate how these tools can be put together in two real-world examples.
Tools
Two basic tools are needed to get started with web-scraping in R: familiarity with the rvest package and comfort with the SelectorGadget tool. A natural introduction is Hadley’s brief walk-through of the package, which also covers using the SelectorGadget. Additionally, I used this tutorial as a sanity check to verify I was using the tools correctly. Here’s a bit more detail on each piece.
SelectorGadget
Basically, the SelectorGadget is a super convenient point-and-click widget you can use to find the appropriate CSS selector for some part of a webpage. This means it is not necessary to understand HTML or CSS (I have done these two examples without any knowledge of either language) to scrape a page. You can get a feel for how it works in the link above. One important thing to know is that there are often multiple selector choices for a given field on a webpage, so do not be discouraged if your first choice yields nothing. Scraping with the SelectorGadget is like crab-fishing, except much safer and with better odds.
rvest
This nifty package is the main reason why scraping in R is so easy. It is really easy to gloss through, since there are roughly ten functions from which any scraping algorithm will be assembled. Since this package is basically an intuitive wrapper around the xml2 and httr packages, any fixes to the functions themselves require HTML or CSS knowledge (again, I haven’t had any issues). For convenience, here are some of the most important functions:
html_session - simulates an internet session in an html browser.
read_html - reads the html on the page.
html_nodes - in conjunction with the SelectorGadget, selects the corresponding pieces from the html webpage.
html_text - extracts attributes, text, and tag name from the html; basically returns the text you’re trying to get.
These functions are the bread-and-butter of easy scraping in R. To navigate a webpage, we use these functions:
follow_link - takes a link on the current page (specified by position, text, or CSS selector) and navigates to it.
jump_to - takes a url (relative or absolute) to navigate to the webpage; useful if websites have a predictable url structure.
back - return to the previous page.
These functions become increasingly important as we start to write algorithms to scrape a series of webpages. Additionally, sometimes you may need to input values into a form on a page, e.g. searching the website or entering a username and password. Here are some functions for that:
html_form - parses a form on a page.
set_values - sets the values for all the required fields on the form.
submit_form - submits the form on the page; the equivalent of clicking enter once we type in the values on a form.
There are a few other functions, but basically these are the core tools you’ll need to extract, navigate, and interact with a webpage.
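To see the navigation and form functions in action before the real examples below, here is a minimal sketch. Note the site is an assumption on my part: quotes.toscrape.com is a public practice site built for scrapers, and the "Next" link text and username/password field names are assumptions about that site, not part of the examples that follow.

library(rvest)
#start a session and scrape the quotes on the first page
sesh <- html_session("https://quotes.toscrape.com")
sesh %>% read_html() %>% html_nodes(".text") %>% html_text()
#navigate forward by link text, scrape again, then go back
sesh <- follow_link(sesh, "Next")
sesh %>% read_html() %>% html_nodes(".text") %>% html_text()
sesh <- back(sesh)
#parse and fill (but do not submit) the login form on the same site
login <- html_form(read_html("https://quotes.toscrape.com/login"))[[1]]
login <- set_values(login, username = "user", password = "pass")
#submit_form(sesh, login) would then press enter for us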
Basic scrape
So what does it take to scrape a page? Well, suppose we save the url for the webpage we want to scrape in an object called url, and the CSS selector in an object called css. For the most general scraping of information on a webpage, the process is this:
- Use the SelectorGadget to identify the CSS selector for the corresponding piece.
- Then it’s as easy as read_html(url) %>% html_nodes(css) %>% html_text()!
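As a concrete, runnable version of this recipe (using the same practice site as in the sketch above; the .text selector is the sort of thing the SelectorGadget would hand you):

library(rvest)
#url of the page and the CSS selector found with the SelectorGadget
url <- "https://quotes.toscrape.com"
css <- ".text"
#the whole scrape in one pipe: returns a character vector of quotes
read_html(url) %>% html_nodes(css) %>% html_text()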
That’s all there is to it. With this in mind, it becomes easy to build on these tools for more complicated scraping. To demonstrate how to put these tools together, we’ll review two interesting examples: one described briefly in words, and another in code.
Examples
In the first example I’ll try to give a sense for how you can identify a good opportunity for web-scraping, and go through some pseudocode to accomplish this task I actually faced at work. In the second I review more of the code for an exhaustive web scrape, and show how the code was parallelized to work much faster.
Example 1: Scraping and NLP
Background: Coding requires extended periods of focus. At some point I became the point-guy for a set of documentation analyses which were not hard, but would take 4-5 hours across a couple of days. Eventually I siloed off some free time to automate this task, saving me the focus I needed to perform at my best.
Healthcare companies are required by the FDA to document all of the configurations of parameters they used when developing and evaluating their product. In the statistics part of this submission, there is a primary and a QC (quality check) analysis done for most tasks. In this example, each configuration is stored in an individual file. I’ll walk through the task for this specific example.
Task: Document the total number of runs (files) executed in support of the internal studies for Product X. The values to compare against come from the primary analysis (these are the counts passed into the function below).
To successfully compare against the results from the primary analysis, the procedure is roughly as follows:
- Input each ID into an online database, search for the files in one of a few places, and save the files in a folder. (2-3hrs)
- After all files have been downloaded, remove files with extensions like .bio, .txt, or .xls (keeping only .log and .out files), and remove duplicate files. (1-2hrs)
- Compare against primary. If results agree you are done, otherwise resolve any discrepancies with the primary analyst. (10-30min)
The obvious bottleneck in this repetitive task was the searching and downloading of all of the files. For that reason this task was an ideal candidate for web-scraping. Additionally, I would have to create and fill in folders correctly based on user input, and perform some light NLP to remove files with the above extensions, as sketched below.
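That extension filtering boils down to a regular expression over the file names; here is a minimal sketch (the file names are made up for illustration):

library(stringr)
#keep only files ending in .log or .out; everything else gets removed
all_files <- c("runA.log", "runB.out", "runB.bio", "notes.txt")
keep <- str_detect(all_files, "\\.(log|out)$")
all_files[keep]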
There were two scripts I created to accomplish the task:
run_count_beta.R - contains the primary function (run_count) to document and compare the total number of runs; split into three sections: folder creation, web-scraping, and file-removing / comparing.
run_count_template.R - script to execute the function above to complete the task.
Below we reproduce the code of run_count_template.R, which can be run without the web-scraping piece, given that you would not have access to our database. Currently, the first run of the function creates all of the necessary folders, at which point you have to populate each folder manually. After that, you just re-run the function and it will compare and return the results table along with our computed counts for comparison.
#******************************************************************
#Setup for count --------------------------------------------------
#******************************************************************
source("path/to/run_count_beta.R")
ids <- c("3", "4", "5")
counts <- c(4, 6, 7)
path <- c("any/path/to/folder/")
folder_name <- c("dummy_data")
#******************************************************************
#Obtaining counts -------------------------------------------------
#******************************************************************
#run once to create folders
run_count(ids, counts, path, folder_name, folders_exist = F)
#run again after populating folders
run_count(ids, counts, path, folder_name, folders_exist = T)
setwd(path)
The actual function run_count is split into the three parts I mentioned. The first is pretty self-explanatory in that you use ids and folder_name to create the folders if folders_exist == F.
#******************************************************************
#Packages, folders ------------------------------------------------
#******************************************************************
run_count <- function(ids, counts, path = getwd(), folder_name, folders_exist = T, troubleshoot = F){
library(tidyverse)
setwd(path)
folder_ext <- str_c(folder_name, "files", sep = "/")
if(folders_exist == F){
dir.create(folder_name)
dir.create(folder_ext)
}
setwd(folder_ext)
if(folders_exist == F){
map(ids, function(x) dir.create(as.character(x)))
}
After this is the web-scraping portion, which does all of the heavy lifting, but would not be too meaningful without access to the database. Given my description of the task, you may see that we need conditional logic to keep searching the database in one of a few places, along with the full gamut of functions described above to input our ids, navigate forward and back a webpage, and extract the associated files.
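Since the real database is internal, the best I can offer is a hypothetical sketch of that middle portion; the url, the form field named query, and the link structure are illustrative assumptions, not the actual code (tidyverse is already loaded inside run_count):

#******************************************************************
#Web-scraping (hypothetical sketch; url and form fields are made up)
#******************************************************************
library(rvest)
sesh <- html_session("https://internal.database.example.com")
all_files <- map(ids, function(id){
#fill in and submit the search form for this id
search <- html_form(read_html(sesh))[[1]]
query <- set_values(search, query = as.character(id))
res <- submit_form(sesh, query)
#follow the first matching result and rip the file names
res <- follow_link(res, as.character(id))
res %>% read_html() %>% html_nodes("a") %>% html_text()
})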
The final part removes all of the appropriate files and compares the results against the primary results. There is an option, specified by setting troubleshoot = T, to receive additional information (e.g. each file name) to better help troubleshoot.
#******************************************************************
#Removing .bio, .xls, and duplicated files ------------------------
#******************************************************************
#included files should have only .log or .out extensions
#(all_files, edms_ids, and files come from the web-scraping portion)
out <- map(all_files, function(x) str_detect(x, "\\.out$"))
log <- map(all_files, function(x) str_detect(x, "\\.log$"))
#detecting and removing duplicate files
dups <- inc <- vector(mode = "list", length = length(edms_ids))
inc[[1]] <- all_files[[1]][ out[[1]] | log[[1]]]
names(dups) <- names(inc) <- files
for(i in 2:length(all_files)){
inc[[i]] <- all_files[[i]][ out[[i]] | log[[i]]]
temp <- unlist(inc[1:(i-1)])
dups[[i]] <- all_files[[i]] %in% temp
}
#run count excluding duplicates
run_count <- map_int(inc, length) - map_int(dups, sum)
names(run_count) <- files
names(which(counts != run_count))
sum(run_count)
#results table
dups <- map_int(dups, sum)
count_with_dups <- map_int(inc, length)
res_table <- rbind(counts, run_count, dups, count_with_dups)
rownames(res_table) <- c("exp_counts", "comp_counts", "duplicates", "count_with_dups")
res_table <- t(res_table)
res_table <- rbind(res_table, c(sum(counts), sum(run_count),
sum(res_table[,3]), sum(res_table[,4])))
rownames(res_table) <- c(rownames(res_table)[-nrow(res_table)], "total")
#creating output
results <- list()
results$id <- unlist(edms_ids)
results$table <- res_table
results$file_ext <- map(all_files, function(x) str_sub(x, start = -4L))
if(troubleshoot == T){results}
else{results$table}
At the end of this, we have a results table comparing against the primary results. Scanning through the code above should give you a sense of the NLP needed, and how web-scraping is really the magic that makes automating a task such as this possible. But to actually get into some web-scraping you can reproduce, we’ll dive into our next example.
Example 2: Parallelized scraping
Background: As part of another larger project I’m working on, I needed an easily updatable dataset containing all of the poems on the Poetry Foundation website. How many poems are currently on the website, and how many webpages contain these poems? This function, results_range, will tell us:
#provides key indices for a search result
results_range <- function(url){
index <- url %>% read_html() %>%
html_nodes(".c-txt_starkMeta") %>%
html_text()
index <- unlist(str_extract_all(index, "[\\d,]+")) %>%
str_remove_all(",") %>% as.numeric()
#index holds: first result on page, last result on page, total results
last_page <- ceiling(index[3] / index[2]) - 1
poems_on_last <- index[3] - last_page*index[2]
c(index, last_page, poems_on_last)
}
#running to count poems and pages on Poetry
url <- "https://www.poetryfoundation.org/poems/browse"
poems <- results_range(url)[3]
pages <- results_range(url)[4]
So according to the date at which this code was updated, there were 45471 poems stored across 2273 pages. Each poem takes about 10-30 seconds to scrape, bringing our total run-time anywhere from 5-16 days of non-stop scraping for my poor CPU (45471 poems at 10-30 seconds each)! But actually, the time it would take is closer to 2-3 days, since only a fraction of the poems on the site are digitized. Before getting to how we reel this time back in, let’s get into some of the functions making up my scraping algorithm, apart from results_range, with special emphasis on the scraping bits:
list_poems - for a single webpage, extracts the title, type of document (there are other documents on the website besides poems), author, and first line; handles the scraping.
all_titles - for multiple webpages, extracts the same information above; handles the navigation.
all_poems - for multiple webpages, extracts the poem text and places it into a new column in the data frame outputted from all_titles; handles scraping and navigation to get poems.
While all of these functions are necessary to scrape this website, we see that the true scraping function is list_poems. Let’s look a little closer at it and test it out.
list_poems <- function(url){
#scraping (read the page once, then extract each piece)
html <- read_html(url)
page <- html %>% html_nodes(".c-hdgSans_2 a") %>% html_text() #titles
type <- html %>% html_nodes(".c-vList_bordered_anomaly .c-txt_catMeta") %>% html_text() #doc types
author <- html %>% html_nodes(".c-vList_bordered_anomaly .c-feature-sub") %>% html_text() %>% str_remove("\n") %>% trimws() %>% str_remove("By ") #authors
line <- html %>% html_nodes("p") %>% html_text() %>% str_remove("\n") #first lines
#creating data sets to adjust type, lines, and authors in cases where there are NAs
test <- html %>% html_nodes(".c-mix-feature_shrinkwrap") %>% html_text()
test <- test[1:20] #20 results per page
temp_type <- temp_line <- temp_author <- type2 <- line2 <- author2 <- vector(length = length(page))
#indices to adjust
for(i in 1:length(page)){
temp_type[i] <- any(str_detect(test[i], fixed(unique(type))))
temp_line[i] <- any(str_detect(test[i], fixed(paste0(unique(line), "\n\n")))) #fixes spurious from lax line e.g. 13 bystanders
temp_author[i] <- any(str_detect(test[i], fixed(unique(author)))) #bug fixed below
}
#adjusting vectors before putting in df
type2[which(temp_type)] <- type
type2[which(!temp_type)] <- NA
line2[which(temp_line)] <- line
line2[which(!temp_line)] <- NA
temp_author <- ifelse(type2 == "author", F, temp_author) #fixes author bug where name is in text
author2[which(temp_author)] <- author
author2[which(!temp_author)] <- NA
tibble(type = type2, title = page, author = author2, line = line2)
}
Only the first few lines, the four html_nodes calls, are doing the real scraping; the rest is just formatting. As we see, the input going into html_nodes was found using the SelectorGadget tool. Some trial and error was needed to get a feasible CSS selector. And in each of these four scraped pieces, we see the basic template for scraping we discussed before: read_html(url) %>% html_nodes(css) %>% html_text(). To illustrate the output of this function, we use it below on one of the 2273 webpages:
url <- "https://www.poetryfoundation.org/poems/browse?page=247"
list_poems(url)
## # A tibble: 20 x 4
## type title author line
## <chr> <chr> <chr> <chr>
## 1 poem Breath Gary Short <NA>
## 2 poem BREATH Gigi Marks <NA>
## 3 poem The Breath-Holding Contest Rick Noguchi That boy, the cham…
## 4 poem The Breather Billy Colli… Just as in the hor…
## 5 poem Breathing Mark O'Brien Grasping for straw…
## 6 poem Breathing In David Baker <NA>
## 7 poem Breathing Landscape Muriel Ruke… <NA>
## 8 poem The Breathing Lesson David Wagon… <NA>
## 9 poem The Breathing, the Endless News Rita Dove <NA>
## 10 poem The Breathless Aisle Ray Smith <NA>
## 11 poem Breathless Love Bertha Ten … <NA>
## 12 poem The Breeder’s Cup David Lehman I. TO THE FATES
## 13 poem A Breton Night Ernest Rhys <NA>
## 14 poem Breton Oracles Thomas Macg… <NA>
## 15 poem Breton Song Abbie Husto… <NA>
## 16 poem Breughel Michael Col… The lump on his ne…
## 17 poem The Banquet and Other Poems by France… <NA> <NA>
## 18 poem The Brewers Claire Burch <NA>
## 19 poem Brewing Green Tea in a Glass Percolat… Molly Tenen… <NA>
## 20 poem Brian Age Seven Mark Doty Grateful for their…
Nice! Now, I’ll touch on all_titles to discuss how we can parallelize it. Here is the function:
all_titles <- function(url, max_pages = 10){
sesh <- html_session(url)
ind <- results_range(url)
if(max_pages < 3000){ind[4] <- max_pages}
titles <- as_tibble(matrix(nrow = ind[3], ncol = 4)) #one row per poem
colnames(titles) <- c("type", "title", "author", "line")
i <- 1
while(ind[2] != ind[3] & i < max_pages){
titles[ind[1]:ind[2],] <- list_poems(sesh)
sesh <- sesh %>% follow_link(i = "Next Page")
ind <- results_range(sesh)
i <- i+1
}
titles[ind[1]:ind[2],] <- list_poems(sesh)
titles <- titles[!is.na(titles$title),]
titles
}
In the beginning we use the url to create sesh, an active online session. From there we use results_range and list_poems to index and retrieve the titles. The last key bit is that we continue moving forward using sesh and follow_link to keep moving to the next page until we hit the user-specified maximum number of pages. Not too bad. Now, how does something like this get parallelized? Well, the key is to identify which piece of the code can be split, and make the appropriate changes to make it happen. Here is the parallelized version of the above code:
all_titles_p <- function(url, max_pages = 10){
#setting up parallel run
ind <- results_range(url)
urls <- paste0(url, "?page=", seq(1, ind[4]))
if(is.null(max_pages)){temp <- urls
}else{temp <- head(urls, max_pages)}
#parallel run
cl <- makeCluster(detectCores()-1)
registerDoParallel(cl)
result <- foreach(i = seq_along(temp),
.packages = c("rvest", "tidyverse"),
.combine = "c",
.errorhandling='pass',
.export = 'list_poems') %dopar% {
# scrape the titles and metadata for this page
title <- html_session(temp[i]) %>%
list_poems()
return(list(title))
}
stopCluster(cl)
bind_rows(result)
}
What we can split up are the scraping and navigating of the webpages. To do that, we leverage the predictable nature of the url to fully specify the collection of urls in urls. We can then split the navigation using a foreach loop, and scrape the poems almost identically to the previous version within the %dopar% call.
Since the computational process is roughly linear, this speeds my code up by a factor of roughly one less than the number of cores on my CPU, which turns out to be 11. That means my code is roughly 11x faster! Similar to all_titles_p, I created an all_poems_p function which parallelizes the retrieval of the poems themselves, the longest part of the process.
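The real all_poems_p is not reproduced in this post, so here is only a rough sketch of its shape, patterned off of list_poems and all_titles_p; the .o-poem selector and the href-following logic are assumptions, not the actual code:

all_poems_p <- function(url, titles, max_pages = 10){
#set up the page urls exactly as in all_titles_p
ind <- results_range(url)
urls <- paste0(url, "?page=", seq(1, ind[4]))
if(!is.null(max_pages)){urls <- head(urls, max_pages)}
#parallel run: follow each poem link on a page and rip its text
cl <- makeCluster(detectCores()-1)
registerDoParallel(cl)
result <- foreach(i = seq_along(urls),
.packages = c("rvest", "tidyverse"),
.combine = "c",
.errorhandling = 'pass',
.export = 'results_range') %dopar% {
sesh <- html_session(urls[i])
links <- sesh %>% read_html() %>% html_nodes(".c-hdgSans_2 a") %>% html_attr("href")
map_chr(links, function(l){
jump_to(sesh, l) %>% read_html() %>%
html_nodes(".o-poem") %>% html_text() %>% paste(collapse = "\n")
})
}
stopCluster(cl)
#attach the scraped text as a new column on the titles data frame
titles$poem <- result[seq_len(nrow(titles))]
titles
}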
At the end of the day, the final script to scrape all of the poems looks like so:
#*******************************************************************************
# final-scraper new ------------------------------------------------------------
#*******************************************************************************
library(tidyverse)
library(rvest)
library(foreach)
library(doParallel)
#scraping meta-data, then poems
url <- "https://www.poetryfoundation.org/poems/browse"
poems <- all_titles_p(url, max_pages = NULL)
poems <- all_poems_p(url, poems, max_pages = NULL)
Super “simple”, super sweet. This is pretty much all there is to it. Thanks to the user-friendly rvest package and SelectorGadget tool, scraping data off the internet is a reality. Hopefully this can be of use to you when you’re trying to scrape together new insights of your own.
P.S.
Since my last example only included scraping and navigation, below you’ll find the code to submit a form on a webpage and retrieve the appropriate text if there’s a match. Although the code is a good example of interacting with a webpage, currently it just rips the raw, unformatted html.
search_poem <- function(title, author){
search <- html_form(read_html("https://www.poetryfoundation.org"))[[1]]
query <- set_values(search, query = paste(title, author))
sesh <- html_session("https://www.poetryfoundation.org")
sesh2 <- submit_form(sesh, query)
#checking whether query is found
page <- sesh2 %>% read_html() %>%
html_node(".o-article-bd .o-grid-col_9of12") %>%
html_text()
results <- str_locate_all(page, title)[[1]]
pings <- nrow(results)
if(pings >= 1){
sesh <- follow_link(sesh2, title)
sesh %>% read_html() %>% html_text()
}else{
print("Item not found. Most relevant search results below.")
list_poems(sesh2)}
}
search_poem("Baudelaire", "Delmore Schwartz")
## Submitting with '<unnamed>'
## Navigating to /poems/42643/baudelaire
## [1] "\n (function(d) {\n var config = {\n kitId: 'lhu6lte',\n scriptTimeout: 3000,\n async: true\n }, ... Baudelaire by Delmore Schwartz | Poetry Foundation ... When I fall asleep, and even during sleep,\r I hear, quite distinctly, voices speaking\r Whole phrases, commonplace and trivial, \r Having no relation to my affairs. \r Dear Mother, is any time left to us\r In which to be happy? My debts are immense. ... Please send me money enough for at least three weeks. ... [thousands of characters of raw html, scripts, and site navigation trimmed]"
Here we search for a poem I like; buried a little less than halfway through the raw output, you can see the text of the poem itself.