I am new to SCM.
Is there a way to scrape content from individual URLs? Specifically, I want to:
1. Get the URL of each job page from the job list page.
2. Extract the content from each URL obtained in step 1.
I was able to do step 1, but I don’t know how to do step 2.
I would also like to know if it is possible to execute JavaScript when doing step 2.
Specifically, for step 2 I want to retrieve the text containing the @ symbol in the page text. If there is a way to do this without using JavaScript, I would like to know that as well.
Sorry for the lack of clarity.
In short, I want to retrieve both the list and the content of each individual item in the list in one run.
For example, scraping an Amazon product list page and the individual page for each of those products at once.
By looping, I mean going from the list page to an individual page, then back to the list page, and then on to the individual page of the next item.
Is this clear to you?
Unfortunately it will have to be a two-step process, split across two tasks:
1. Gather all the individual pages.
2. Process each page individually.
The dynamic scraper doesn’t have any flow control or looping; the two steps amount to something like the sketch below.
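For reference, here is a minimal Python sketch of those two steps done outside the tool. The list URL and the CSS selector are assumptions about the target site, and the regex at the end also answers the earlier @-symbol question: it is plain text matching, so no JavaScript is needed for it.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical list page; the real URL and selector depend on the target site.
LIST_URL = "https://example.com/jobs"

# Step 1: gather the URL of each job page from the list page.
list_html = requests.get(LIST_URL, timeout=10).text
soup = BeautifulSoup(list_html, "html.parser")
job_urls = [urljoin(LIST_URL, a["href"]) for a in soup.select("a.job-link")]

# Step 2: process each page individually. Pulling @-containing text is a
# plain regex over the page text, so no JavaScript is required for it.
for url in job_urls:
    page = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    text = page.get_text(" ", strip=True)
    print(url, re.findall(r"\S*@\S*", text))  # tokens containing an @ symbol
```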
The proposal would be to allow this via webhooks, e.g.:
1. A task runs and scrapes a list of URLs.
2. The output is set to a webhook → it posts the JSON data.
3. The webhook calls the SCM API to duplicate or edit an existing task, updating just its list of target URLs from that output.
4. The webhook runs the new task (a rough sketch of the receiver follows).
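To make the flow concrete, here is a rough sketch of the receiving end. The SCM API endpoints, payload shape, and task id below are all placeholders, since that API doesn’t exist yet:

```python
from flask import Flask, request
import requests

app = Flask(__name__)
SCM_API = "https://api.example-scm.test"  # placeholder base URL, not a real API

@app.post("/scm-webhook")
def handle_scrape_output():
    # JSON posted by the first task's webhook output, e.g. {"urls": [...]}.
    urls = request.get_json().get("urls", [])

    # Duplicate an existing task (id 123 is illustrative) and swap in the
    # scraped URLs as its new targets.
    task = requests.post(
        f"{SCM_API}/tasks/123/duplicate",
        json={"target_urls": urls},
        timeout=10,
    ).json()

    # Kick off the new task.
    requests.post(f"{SCM_API}/tasks/{task['id']}/run", timeout=10)
    return {"status": "started", "count": len(urls)}
```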
I mention webhooks because, after the Google Sheets integration is finished, I want SCM to start integrating with other external tools via webhooks,
e.g. Webhooks's triggers, queries, and actions - IFTTT
Of course, webhooks are complicated and programming-like, so I will need to find ways to make them as painless as possible.