How can I set a timeout on every request made by page.goto in Puppeteer?





I use Puppeteer to scrape the resources of a page, but one of the JS requests fails with a connection timeout and blocks page.goto('url') for a long time. I want to skip that request and continue with the next one, so I need a timeout on each individual request rather than a single overall timeout on page.goto.



Here is my code, test.js:


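const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Log every request the page makes.
  page.on('request', request => {
    console.log(request.url());
  });
  try {
    // Overall navigation timeout of 10 seconds.
    await page.goto(process.argv[2], {timeout: 10000});
  } catch (e) {
    console.log("timeout");
  }
  await browser.close();
})();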



node test.js http://ipv6ready.wanwuyunlian.com:8080/



http://ipv6ready.wanwuyunlian.com:8080/
http://ipv6ready.wanwuyunlian.com:8080/js/bootstrap.min.js
http://ipv6ready.wanwuyunlian.com:8080/js/echarts/echarts.min.js
https://www.google-analytics.com/analytics.js
http://ipv6ready.wanwuyunlian.com:8080/js/echarts/macarons.js
https://www.google-analytics.com/analytics.js
(The analytics.js request is very slow because of a connection timeout. It blocks page.goto for a long time, and the remaining resources are never requested. I want to abort this JS request and continue requesting the remaining resources.)




2 Answers



There are two ways you can tackle this. The first is to use networkidle2 ("consider navigation to be finished when there are no more than 2 network connections for at least 500 ms") as the waitUntil option instead of the default, load, so that up to two requests can be slow without holding up your code:




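const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // Resolve once at most two network connections remain open for 500 ms.
    await page.goto(process.argv[2], {waitUntil: "networkidle2"});
  } catch (e) {
    console.error("Error", e);
  }
  await browser.close();
})();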



Alternatively, to implement timeouts on individual page requests, I would suggest using a timeout module such as p-timeout:




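const puppeteer = require("puppeteer");
// Assumes an older CommonJS release of p-timeout whose signature is pTimeout(promise, milliseconds).
const pTimeout = require("p-timeout");

const shorterTimeout = 10000;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', async (request) => {
    if (!shouldImplementTimeout(request.url())) {
      // No special handling needed: let the request through and stop here.
      await request.continue();
      return;
    }

    try {
      // Note: continue() settles once the request has been handed back to the browser.
      await pTimeout(request.continue(), shorterTimeout);
    } catch (e) {
      console.error(request.url(), "failed:", e);
      // abort() rejects if the request was already handled, so swallow that error.
      await request.abort("timedout").catch(() => {});
    }
  });
  try {
    await page.goto(process.argv[2]);
  } catch (e) {
    console.error("Error", e);
  }
  await browser.close();
})();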
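If you would rather not add a dependency, the same "reject if this promise takes too long" behaviour can be sketched with Promise.race. This is only a minimal illustration: withTimeout and TimeoutError are hypothetical names made up for this example, not part of Puppeteer or p-timeout.

```javascript
// Hypothetical helper names; not part of puppeteer or p-timeout.
class TimeoutError extends Error {}

function withTimeout(promise, ms) {
  let timer;
  // A promise that rejects after `ms` milliseconds.
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a 200 ms task raced against a 50 ms limit rejects with TimeoutError.
withTimeout(new Promise((resolve) => setTimeout(resolve, 200, "done")), 50)
  .catch((e) => console.error(e.message));
```

Like p-timeout, this only stops waiting for the promise; it does not cancel the underlying work by itself.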



You would need to write shouldImplementTimeout, which should return true if the request needs a shorter timeout.
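As one possible starting point, here is a sketch that treats every third-party HTTP(S) request as needing the shorter timeout. The criterion itself (a different origin from the page) and the factory name createShouldImplementTimeout are assumptions for illustration; pick whatever rule fits your site.

```javascript
// Hypothetical factory: builds the predicate for a given page URL.
function createShouldImplementTimeout(pageUrl) {
  const pageOrigin = new URL(pageUrl).origin;
  return function shouldImplementTimeout(requestUrl) {
    try {
      const url = new URL(requestUrl);
      // Only http(s) requests to a different origin get the shorter timeout.
      return /^https?:$/.test(url.protocol) && url.origin !== pageOrigin;
    } catch (e) {
      // Not a parseable absolute URL: leave the request alone.
      return false;
    }
  };
}

const shouldImplementTimeout =
  createShouldImplementTimeout("http://ipv6ready.wanwuyunlian.com:8080/");
```

With the page from the question, the analytics.js request is flagged while the site's own scripts are not.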





If you want to just cancel requests based on their URL, there is a mode in puppeteer for that: page.setRequestInterception. A sample from the docs adapted to your use case:


const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();

  // turn on request interception and cancellation capability
  await page.setRequestInterception(true);

  page.on('request', interceptedRequest => {
    console.log(interceptedRequest.url());

    if (interceptedRequest.url().includes("google-analytics.com")) {
      console.log("cancelled!");
      interceptedRequest.abort();
    } else {
      interceptedRequest.continue();
    }
  });

  await page.goto('http://ipv6ready.wanwuyunlian.com:8080/');
  await browser.close();
});
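One thing to keep in mind is that includes() matches the string anywhere in the URL, including the query string. If that matters, the handler can compare hostnames instead. The following is a sketch only: createBlockingHandler is a hypothetical helper, and it relies on nothing beyond the url(), abort() and continue() methods already used above.

```javascript
// createBlockingHandler is a hypothetical helper, not a puppeteer API.
function createBlockingHandler(blockedHosts) {
  return (request) => {
    let hostname = "";
    try {
      hostname = new URL(request.url()).hostname;
    } catch (e) {
      // Unparseable URL: fall through and let the request continue.
    }
    // Match the host itself or any of its subdomains.
    const blocked = blockedHosts.some(
      (host) => hostname === host || hostname.endsWith("." + host)
    );
    return blocked ? request.abort() : request.continue();
  };
}

// Usage with request interception enabled:
// page.on('request', createBlockingHandler(['google-analytics.com']));
```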





