Remove characters in XPath / Scrapy
I use Scrapy to extract data, and the field typeFacture comes back with empty strings (''). I want to keep only the text and drop the '' entries before inserting the field into a database, and I would like to do this in XPath.
HTML code:
<td class="tNorm tSmall-xs">
<b>FACTURE</b>
<br>
''
Commission
''
</td>
my code:
item['typeFacture'] = [item.strip() for item in sel.xpath('//tbody/tr/td[5]/text()').extract()]
result:
'typeFacture': ['',
'',
'Commission',
'',
'',
'Commission',
'',
'',
'Commission',
'',
'',
'Commission',
'',
'',
'Abonnement']}
Please paste your html code, desired and actual result as text instead of links to images.
– running.t
Jun 26 at 10:42
Thank you for your advice, I have updated the question. Do you have any idea about this problem?
– user_1330
Jun 26 at 10:49
Something like this, maybe:
item['typeFacture'] = [item.strip() for item in sel.xpath('//tbody/tr/td[5]/text()').extract() if item]
– bobrobbob
Jun 26 at 12:16
Why is "Commission" repeating? Your HTML doesn't contain it multiple times.
– Tarun Lalwani
Jun 26 at 12:43
1 Answer
I found a solution, but not with XPath.
I apply it in plain Python, just before inserting the item into the database:
item['typeFacture'] = list(filter(None, item['typeFacture']))
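For reference, the strip-then-filter step can be folded into one small helper. This is only a sketch: the name `clean_text_nodes` and the sample list are made up for illustration, and the input is assumed to be the list of strings returned by `.extract()`.

```python
def clean_text_nodes(values):
    """Strip each extracted string and drop the ones that end up empty."""
    # Strip first, then filter: a whitespace-only node such as '\n  '
    # is truthy before stripping and would otherwise survive as ''.
    stripped = (value.strip() for value in values)
    return [value for value in stripped if value]


# Hypothetical sample shaped like the raw text nodes of one table column.
raw = ['\n', '\n  ', '\n  Commission\n', '\n', '\n', 'Abonnement ']
print(clean_text_nodes(raw))  # ['Commission', 'Abonnement']
```

In the spider this would presumably be called as `item['typeFacture'] = clean_text_nodes(sel.xpath('//tbody/tr/td[5]/text()').extract())`. If the filtering has to happen in XPath itself, a predicate such as `//tbody/tr/td[5]/text()[normalize-space()]` should select only the text nodes that contain non-whitespace characters. I have not tested it against the page in the question, and the surrounding whitespace still needs to be stripped afterwards.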
Don't put images, put the code.
– Mathieu
Jun 26 at 10:39