Initial feedback:
-
I use innerText with the body selector because “detect article” skips too much of the content I need. Unfortunately, the result is a broken CSV, I guess because the quotes, commas, and semicolons (and presumably raw newlines) inside the scraped content aren’t escaped, so they spill into new rows. I’m currently testing this with Google Sheets output, but since that also writes everything into a CSV first instead of an XLSX or similar, I assume I’ll run into the same issue. Edit: yes, same issue. The Google export itself works fine, but I end up with far more rows than input URLs because, depending on the scraped content, extra rows get created.
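For what it’s worth, this is roughly the behavior I’d expect from the export step: a minimal Python sketch (file name and content are made up) where the csv module quotes commas, semicolons, double quotes, and embedded newlines, so each scraped page stays one logical row.

```python
import csv

# Made-up scraped content containing all the characters that currently
# break the export: quotes, a comma, a semicolon, and a newline.
scraped = 'He said "hello", then; left.\nSecond line of the page.'

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(["url", "content"])
    writer.writerow(["https://example.com/page-1", scraped])

# A compliant CSV reader (Google Sheets included) parses this back as a
# single data row, because the embedded newline sits inside a quoted field.
```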
-
I can’t figure out how to get the input URL into column A of the output file and the scraped content for that URL into column B. I need this so I can later merge the scraped data into existing sheets by matching the URLs in those sheets and pulling the scraped content into the correct rows, roughly like the sketch below.
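A hypothetical sketch of the merge I’m after, assuming a url → content mapping from the scraper and an existing.csv whose first column holds the URL (all names here are illustrative, not the tool’s actual API or output):

```python
import csv

# Assumed scraper output: each input URL mapped to its scraped text.
scrape_results = {
    "https://example.com/a": "content for page a",
    "https://example.com/b": "content for page b",
}

with open("existing.csv", newline="", encoding="utf-8") as src, \
     open("merged.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        url = row[0]
        # Append the scraped content to the matching row; leave the
        # cell empty when a URL wasn't scraped.
        writer.writerow(row + [scrape_results.get(url, "")])
```

With the URL in column A and the content in column B of the scraper’s output, the same matching could also be done with a VLOOKUP directly in the existing sheets.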