Output for scraped data with text values in a CSV file



I am new to web scraping with Python and need help extracting the sub-category name (title) and the page title (main category header) along with the URLs being scraped by my code. I tried .text with BeautifulSoup, but I get an error and no output, so I think there must be a better way to do this.



Help would be appreciated. Please take a look at the code and help me store the output in a CSV file as URL \t Sub category title \t Main Category header (tab-separated).



Example (required: one row per subcategory URL):

http://www.medicalexpo.com/medical-manufacturer/neonatal-incubator-2963.html	Neonatal incubators	Pediatrics
http://www.medicalexpo.com/medical-manufacturer/infant-radiant-warmer-13522.html	Infant radiant warmers	Pediatrics
http://www.medicalexpo.com/medical-manufacturer/infant-phototherapy-lamp-44327.html	Infant phototherapy lamps	Pediatrics



Something like this



Code:


from bs4 import BeautifulSoup
import requests
import unicodecsv
import time
import random

def get_soup(url):
    return BeautifulSoup(requests.get(url).content, "lxml")

url = 'http://www.medicalexpo.com/'
soup = get_soup(url)
raw_categories = soup.select('div.univers-main li.category-group-item a')
print(raw_categories)
category_links = {}

for cat in raw_categories:
    time.sleep(random.randint(2, 5))    # random polite delay between pages
    t0 = time.time()
    soup = get_soup(cat['href'])
    response_delay = time.time() - t0   # back off ~10x the server's response time
    time.sleep(10 * response_delay)
    links = soup.select('#category-group li a')

    category_links[cat.text] = [link['href'] for link in links]
    print(category_links)
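To illustrate the kind of output I want, here is a minimal, self-contained sketch of the extraction step. The HTML snippet below is a stand-in for one category page (the real selectors and title format on medicalexpo.com may differ): the link text gives the sub-category title, the page `<title>` gives the main category header, and the rows are written tab-separated with the standard `csv` module.

```python
import csv
from bs4 import BeautifulSoup

# Stand-in HTML for one category page; the real page structure is assumed,
# not verified, so the selectors below may need adjusting.
html = """
<html><head><title>Pediatrics - MedicalExpo</title></head>
<body><ul id="category-group">
<li><a href="http://www.medicalexpo.com/medical-manufacturer/neonatal-incubator-2963.html">Neonatal incubators</a></li>
<li><a href="http://www.medicalexpo.com/medical-manufacturer/infant-radiant-warmer-13522.html">Infant radiant warmers</a></li>
</ul></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The page <title> carries the main category; strip the site-name suffix.
main_category = soup.title.get_text().split(" - ")[0].strip()

rows = []
for link in soup.select("#category-group li a"):
    # link.get_text() is the sub-category title; link["href"] is its URL.
    rows.append([link["href"], link.get_text(strip=True), main_category])

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["URL", "Sub category title", "Main Category header"])
    writer.writerows(rows)
```

In the real loop each `cat['href']` page would be fetched with `get_soup` instead of the inline snippet, and the rows accumulated across all categories before writing the file once at the end.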








