Issues using Dynamic Page Scraper on modern websites (Investing, Yahoo, Redfin)

1. The problem

Hi SCM team,

I’m having consistent issues with the Dynamic Page Scraper. It works fine for light or simple HTML pages, but fails on many modern, heavily protected websites.

Here are some concrete examples:

  1. Investing.com
    https://www.investing.com/equities/nvidia-corp
  • Page loads visually
  • When trying to add selectors, it throws errors
  • Selectors never capture the actual text (empty or incorrect values)
  2. Redfin
    https://www.redfin.com/CA/Burbank/2031-N-Pass-Ave-91505/home/5284386
  • Returns a 403 error
  • Likely blocking bots / automation tools
  3. Yahoo Finance
    https://finance.yahoo.com/quote/NVDA/
  • Page loads, but element picker cannot capture any real content
  • Any selected element results in empty selectors / no actual text

There are more websites I’d like to scrape, but many of them fail in similar ways.

I understand this is challenging because these sites use strong protections (bot detection, automation blocking, JS rendering, etc.). However, it would be very helpful if SCM could support:

  • Custom User-Agent configuration
  • Additional browser fingerprint options
  • Better handling of JS-rendered content
  • Anti-bot mitigation options
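
On the JS-rendering point: the empty values on Investing.com and Yahoo Finance would be consistent with the text being read before client-side scripts have filled the element in, though that is only a guess. A minimal TypeScript sketch of the idea, assuming a hypothetical `readText` callback that returns the selected element's current text (none of these names are part of SCM): keep polling until the text is non-empty or a timeout passes.

```typescript
// Hypothetical helper, not an SCM API: readText is whatever returns the
// current text of the selected element (or null if it is not in the DOM yet).
type ReadText = () => Promise<string | null>;

// True only when the element has text beyond whitespace.
function hasRealText(text: string | null): text is string {
  return text !== null && text.trim().length > 0;
}

// Poll until the element has real text, or give up after timeoutMs.
async function waitForText(
  readText: ReadText,
  timeoutMs = 10000,
  intervalMs = 250,
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const text = await readText();
    if (hasRealText(text)) return text.trim();
    if (Date.now() >= deadline) return null; // still empty: report no value
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning `null` on timeout, rather than an empty string, would let the scraper distinguish "element never got content" from "element is legitimately blank".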

Some scrapers are experimenting with open-source AI browser agents, which may provide ideas or inspiration.

I’m not suggesting copying these tools directly, but they may offer useful concepts for improving dynamic scraping reliability.
It would be good to know how to make the Dynamic Page Scraper work on the websites mentioned above.

Thank you

2. Screenshot or task log of the problem




There are 2 limitations.

1- Right now the selector tool fails on some complex sites. Ideally we would find a way to use Chrome’s own selection tool. FYI, the one we use right now is a couple of years old.

2- As you pointed out, some sites have anti-scraping code that stops browsers from opening them.

I’m looking at Puppeteer for 2 and will need to look for solutions to 1.

Puppeteer might have some anti-scraping mitigation code to reduce fingerprinting.

I already have Puppeteer working in SCM; it just requires some user testing.

Trial by fire
✅ 1. Investing.com

Finds classes

Pre-run validation

Run validation

❌ 2. Redfin
https://www.redfin.com/CA/Burbank/2031-N-Pass-Ave-91505/home/5284386

Failed because of Cloudflare (even though I completed the CAPTCHA).
Strong anti-bot protection here.
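
For cases like this one, it may help if a run could report "blocked by anti-bot protection" separately from an ordinary HTTP error. A rough TypeScript sketch, assuming a hypothetical `classifyResponse` helper (not an SCM function). The header checks are heuristics: `server: cloudflare` and `cf-mitigated: challenge` are headers Cloudflare is known to send, but a site can be protected without either.

```typescript
type Verdict = "ok" | "blocked" | "error";

// Hypothetical classifier: decides whether a response looks like an
// anti-bot block rather than a normal failure.
function classifyResponse(status: number, headers: Record<string, string>): Verdict {
  // Header names are case-insensitive, so normalize before lookup.
  const h: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    h[name.toLowerCase()] = headers[name].toLowerCase();
  }
  const fromCloudflare = (h["server"] ?? "").includes("cloudflare");
  const challenged = h["cf-mitigated"] === "challenge";

  if (challenged) return "blocked";
  // Assumption: these statuses from a Cloudflare edge usually mean a block.
  if (fromCloudflare && (status === 403 || status === 429 || status === 503)) {
    return "blocked";
  }
  if (status >= 400) return "error";
  return "ok";
}
```

With something like this, the Redfin 403 could surface as a "blocked" result instead of a generic failure.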

✅ 3. Yahoo Finance

Finds classes

Data

Validation

So that’s 2/3!

Much better than before.

The patch will be in the next update.

Thanks, Tim, for testing this and checking all those cases. I really appreciate the detailed breakdown and the improvements already made. 2/3 working on these modern sites is honestly a big step forward compared to before. Looking forward to trying the next update and seeing how the Redfin/Cloudflare handling improves as well.

Yes, the solution was to drop Puppeteer because it gets fingerprinted very easily and then banned.

Use Electron’s native Chrome browser instead.

We lose true thread separation though, i.e. if the scraping browser code runs out of memory (OOM) it will take down the entire app with it.

The final piece was to hack a way to get Chrome’s internal DevTools picker to work by calling the Chrome DevTools Protocol.
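
For anyone curious what driving the picker over the DevTools Protocol can look like, here is a rough TypeScript sketch. It is not the actual SCM code. It assumes an object shaped like Electron's `webContents.debugger` (modelled by the `DebuggerLike` interface so the example does not import Electron) and uses the protocol's `Overlay.setInspectMode`, `Overlay.inspectNodeRequested` and `DOM.describeNode`. The names `pickElement` and `selectorFromNode` are made up for illustration.

```typescript
// Structural stand-in for Electron's webContents.debugger.
interface DebuggerLike {
  attach(protocolVersion?: string): void;
  sendCommand(method: string, params?: Record<string, unknown>): Promise<any>;
  on(event: "message", listener: (event: unknown, method: string, params: any) => void): unknown;
}

// Subset of the node returned by DOM.describeNode; attributes is a flat
// [name1, value1, name2, value2, ...] array.
interface NodeInfo {
  localName: string;
  attributes?: string[];
}

// Build a simple selector: tag#id if there is an id, else tag.class1.class2.
// Note: no CSS escaping, so unusual class names would need extra handling.
function selectorFromNode(node: NodeInfo): string {
  const attrs = node.attributes ?? [];
  let id = "";
  let classes: string[] = [];
  for (let i = 0; i + 1 < attrs.length; i += 2) {
    if (attrs[i] === "id") id = attrs[i + 1];
    if (attrs[i] === "class") classes = attrs[i + 1].split(/\s+/).filter(Boolean);
  }
  if (id) return `${node.localName}#${id}`;
  return [node.localName, ...classes].join(".");
}

// Turn on the built-in element picker and resolve with a selector for the
// element the user clicks. The message listener is never removed here; a
// real implementation would clean it up and detach the debugger.
function pickElement(dbg: DebuggerLike): Promise<string> {
  return new Promise((resolve, reject) => {
    dbg.on("message", (_event, method, params) => {
      if (method !== "Overlay.inspectNodeRequested") return;
      dbg.sendCommand("Overlay.setInspectMode", { mode: "none" }) // picker off
        .then(() => dbg.sendCommand("DOM.describeNode", { backendNodeId: params.backendNodeId }))
        .then((result) => resolve(selectorFromNode(result.node)))
        .catch(reject);
    });
    dbg.attach(); // throws if another debugger is already attached
    dbg.sendCommand("DOM.enable")
      .then(() => dbg.sendCommand("Overlay.enable"))
      .then(() => dbg.sendCommand("Overlay.setInspectMode", {
        mode: "searchForNode",
        highlightConfig: { showInfo: true, contentColor: { r: 111, g: 168, b: 220, a: 0.4 } },
      }))
      .catch(reject);
  });
}
```

In an Electron app this might be called as `pickElement(win.webContents.debugger)`, where `win` is the scraping window.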

Let me know if you get it working or it needs more polishing.

IMHO it needs a few easier ways to automate things like login, button clicking, and maybe paging?
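
To make that last idea concrete, one possible shape is a short declarative step list that runs before and around the scrape. This is purely a sketch in TypeScript: the `Step` format, `PageLike` interface, `runSteps` function and every selector below are hypothetical and do not exist in SCM today.

```typescript
// Hypothetical step format for login, clicking and paging.
type Step =
  | { action: "fill"; selector: string; value: string }
  | { action: "click"; selector: string }
  | { action: "paginate"; nextSelector: string; maxPages: number };

// Minimal page surface the runner needs; a real one would wrap the browser.
interface PageLike {
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<boolean>; // false if the element is missing
  scrape(): Promise<void>;                   // run the configured selectors
}

// Run the steps in order and return how many pages were scraped.
async function runSteps(page: PageLike, steps: Step[]): Promise<number> {
  let pagesScraped = 0;
  for (const step of steps) {
    switch (step.action) {
      case "fill":
        await page.fill(step.selector, step.value);
        break;
      case "click":
        await page.click(step.selector);
        break;
      case "paginate":
        for (let i = 0; i < step.maxPages; i++) {
          await page.scrape();
          pagesScraped++;
          const isLast = i === step.maxPages - 1;
          // Stop at the page limit or when there is no "next" link to click.
          if (isLast || !(await page.click(step.nextSelector))) break;
        }
        break;
    }
  }
  return pagesScraped;
}

// Example: log in, then scrape up to 5 result pages.
const exampleSteps: Step[] = [
  { action: "fill", selector: "#user", value: "me@example.com" },
  { action: "fill", selector: "#pass", value: "secret" },
  { action: "click", selector: "#login" },
  { action: "paginate", nextSelector: "a.next", maxPages: 5 },
];
```

A list like `exampleSteps` could be edited in the UI without writing any script, which is the "easier way" part.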