BS4/Python3 can't open other href while scraping Google





I work at a startup that calls businesses, and they currently buy the contact lists. So I had the idea of scraping the contacts from Google instead, for example for hotels.



I can already get the link that opens Google Maps with a list of companies, but I can't extract the information behind that link because the program crashes.


import json
from bs4 import BeautifulSoup as bs
from collections import namedtuple
from pprint import pprint
from requests import get
import requests

def remove_escape(s):
    return ' '.join(s.split())

def get_jobs(url):
    vagas = get(url, headers=headers)
    vagas_page = bs(vagas.text, 'html.parser')
    boxes = vagas_page.find_all('div', {'class': 'idQ6DBVUh1_8- ptqfrjbX76M'})
    for box in boxes:
        titulo = box.find('span', {'class': 'ellip'}).text
        empresa = box.find('span', {'class': 'rllt__details'}).text
        yield vaga(
            remove_escape(titulo),
            remove_escape(empresa)
        )

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

vaga = namedtuple('Vaga', 'Titulo Empresa')
base_url = 'https://www.google.com.br/'
base_url2 = 'https://www.google.com.br'
job = 'empresas+de+telemarketing'
jobs = '{}search?q={}'.format(base_url, job)
vagas = get(jobs, headers=headers)
vagas_page = bs(vagas.text, 'html.parser')
linke = vagas_page.select('.H93uF a')
esse = (linke[0]['href'])
urls = '{}{}'.format(base_url2, esse)

for url in urls:
    print(list(get_jobs(url)))
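
One likely cause of the crash, assuming the code runs exactly as posted: urls is a single string, so "for url in urls" walks through it one character at a time and get() is called with 'h', then 't', and so on, which Requests would normally reject as an invalid URL. The error screenshot is not reproduced here, so treat this as an assumption. The sketch below shows the loop behaviour without any network access; the href value is a hypothetical placeholder, not what Google actually returns.

```python
# Hypothetical stand-ins for the question's base_url2 and the scraped href.
base_url2 = 'https://www.google.com.br'
esse = '/search?q=empresas+de+telemarketing'  # assumed shape of the href

urls = '{}{}'.format(base_url2, esse)

# Iterating over a str yields one character per step, not one URL.
first_items = [url for url in urls][:5]
print(first_items)

# Wrapping the single URL in a list makes the loop body run once,
# with the complete URL.
url_list = [urls]
for url in url_list:
    print(url)
```

With a list in place, the loop in the question would call get_jobs once per full URL instead of once per character.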



I don't know if I was clear enough, but you can compare the base link with the target stored in the urls string.
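
On the base link and the href target: instead of keeping two base URLs (one with and one without the trailing slash), the standard library's urllib.parse.urljoin can combine a base with a scraped href. This is a minimal sketch, assuming the href is either root-relative or already absolute; both href values are made-up examples.

```python
from urllib.parse import urljoin

base_url = 'https://www.google.com.br/'

# Hypothetical hrefs, standing in for linke[0]['href'] from the question.
relative_href = '/search?q=empresas+de+telemarketing'
absolute_href = 'https://maps.google.com/maps?q=hotel'

# A root-relative href is resolved against the base's scheme and host.
joined_relative = urljoin(base_url, relative_href)

# An href that is already absolute is returned unchanged.
joined_absolute = urljoin(base_url, absolute_href)

print(joined_relative)
print(joined_absolute)
```

This avoids doubled or missing slashes whichever form the href takes.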



Please ignore the variable names; I just need help getting this to run.



EDIT 1: the link below shows the bug. Sorry, I forgot to include it earlier.
https://imgur.com/a/VrAMgVL





Can you post the error message?
– Winston Yang
Jun 29 at 17:50





See this video (youtube.com/watch?v=kktO7IOjpgs), this tutorial, and github.com/dunossauro/live-de-python/tree/master/codigo/Live21, then refactor your code.
– Regis da Silva
Jun 30 at 3:52








