Output for scraped data with text values in a CSV file
I am new to web scraping with Python and need help extracting the subcategory name (title) and the page title (main category header) along with the URLs that my code already scrapes. I tried .text with BeautifulSoup, but I get an error and no output, so I suspect there is a better way to do this.
Help would be appreciated. Please take a look at the code and help me store the output in a CSV file as: URL \t subcategory title \t main category header.
Example: Subcategory URL
Required:
http://www.medicalexpo.com/medical-manufacturer/neonatal-incubator-2963.html Neonatal incubators Pediatrics
http://www.medicalexpo.com/medical-manufacturer/infant-radiant-warmer-13522.html Infant radiant warmers Pediatrics
http://www.medicalexpo.com/medical-manufacturer/infant-phototherapy-lamp-44327.html Infant phototherapy lamps Pediatrics
Something like this
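To make the required layout concrete, here is a minimal sketch of writing such rows as tab-separated lines. It assumes Python 3 and the standard csv module rather than unicodecsv; the helper name write_rows is made up for illustration, and the rows are copied by hand from the example above, not scraped.

```python
import csv
import io

def write_rows(rows, fileobj):
    """Write (url, subcategory_title, main_category) tuples as tab-separated lines."""
    writer = csv.writer(fileobj, delimiter="\t")
    writer.writerows(rows)

# Rows copied by hand from the example in the question (not scraped)
rows = [
    ("http://www.medicalexpo.com/medical-manufacturer/neonatal-incubator-2963.html",
     "Neonatal incubators", "Pediatrics"),
    ("http://www.medicalexpo.com/medical-manufacturer/infant-radiant-warmer-13522.html",
     "Infant radiant warmers", "Pediatrics"),
    ("http://www.medicalexpo.com/medical-manufacturer/infant-phototherapy-lamp-44327.html",
     "Infant phototherapy lamps", "Pediatrics"),
]

# An in-memory buffer stands in for the output file in this sketch
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

To write a real file, the same function could be called with a file opened as open(path, "w", newline="", encoding="utf-8"), where path is whatever output name you choose.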
Code:
from bs4 import BeautifulSoup
import requests
import unicodecsv
import time
import random

def get_soup(url):
    # Download the page and parse it with the lxml parser
    return BeautifulSoup(requests.get(url).content, "lxml")

url = 'http://www.medicalexpo.com/'
soup = get_soup(url)
raw_categories = soup.select('div.univers-main li.category-group-item a')
print(raw_categories)

category_links = {}
for cat in raw_categories:
    # Time the request itself so the pause scales with the server's response time
    t0 = time.time()
    soup = get_soup(cat['href'])
    response_delay = time.time() - t0
    time.sleep(10 * response_delay)
    time.sleep(random.randint(2, 5))
    links = soup.select('#category-group li a')
    # Key by the category's link text; cat.links looks up a child <links> tag and returns None
    category_links[cat.get_text(strip=True)] = [link['href'] for link in links]
print(category_links)
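One possible way to get from a per-category dictionary to the required rows is sketched below. It assumes the loop stores (link['href'], link.get_text(strip=True)) pairs instead of bare URLs, and that the category's link text is what should appear as the main category header; neither assumption has been checked against the live site. The function name flatten and the sample data are hypothetical.

```python
def flatten(category_links):
    """Turn {main_category: [(url, subcategory_title), ...]} into flat rows."""
    rows = []
    for main_category, subcategories in category_links.items():
        for url, title in subcategories:
            rows.append((url, title, main_category))
    return rows

# Hand-written stand-in for what the scraping loop might collect
category_links = {
    "Pediatrics": [
        ("http://www.medicalexpo.com/medical-manufacturer/neonatal-incubator-2963.html",
         "Neonatal incubators"),
        ("http://www.medicalexpo.com/medical-manufacturer/infant-radiant-warmer-13522.html",
         "Infant radiant warmers"),
    ],
}

# Print each row as URL, subcategory title, main category, separated by tabs
for row in flatten(category_links):
    print("\t".join(row))
```

Each resulting tuple is already in the URL, subcategory title, main category order, so it can be handed to a CSV writer configured with a tab delimiter.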