Extracting quantity elements from name in R
Extracting quantity elements from name in R
I have a Dataset in which one of the column name is "Name" which contains name of the product including the quantity (size) of the product as shown below.
Alkabeer Paratha Plain 400 GM
Almarai Fresh Laban Baladi 2 L
Americana Breaded Chicken Burger 1 KG
Dac Glass Cleaner 4 L
Duru Body Soap Fruity 125 GM - 4 Pcs
Lux Liquid Handwash Soft Touch 250 ML
Lux Liquid Handwash Magical Beauty 250 ML
Lusine Sliced Bread Multi Grain 600 GM
Orinex Containers Bowl 25 Oz - 4 Pcs
Betty Crocker Frosting Vanilla 400 GM
Freshly Microwave Popcorn 3.5 Oz
Gandour Potato Chips 145 Gm
Galaxy Chocolate Milk 40 GM
Nahool Jumbo Roll Strawberry 75 GM - 6 Pcs
Nestle Sweetened Condensed Milk 397 GM
Puck Cheese Triangle Value Pack 120 GM - 5 Pcs
Betty Crocker Super Moist Cake Mix Choco Fudge 500 GM
Some of the products are packed in a crate, for example "Duru Body Soap Fruity 125 GM - 4 Pcs"
I would like to extract both the quantity and the size of the crate ( 0 in case its not a crate).
Quantity is defined by GM, KG,ML,L,Oz and size of Crate by Pcs
Edit :
I would like add some more examples which are producing complication to the process as mentioned by Onyambu.
Signal Complete8 Actions White Toothpaste 120Ml
Fresh Plums Red Per KG
Blemil Plus Baby Milk #2 800 GM
7Up Drink Can 330 ML
Lipton Chai Latte 3 In 1 Classic 25.7 Gm - 7 Pcs
Lusine 6 Burger Buns Plain 400 GM
Farleys Baby Food 3 Fruits 120 GM
Clorox Regular + 40% Extra 3.7 L
Clorox 5 In 1 Disinfectant Cleaner Orange 3 L
Almarai Cheese 6 Portions 108 GM - 2+1 Pcs
3 Cow Feta Cheese Low Salt 200 GM
S-26 Pro Gold Baby Milk #1 900 GM
Additionally, it would be good to know, if it's always the case that the second number always is an indicator on how many products there are per package.
– hannes101
Jun 29 at 8:17
1 Answer
1
library(tidyverse)
dat%>%mutate(s=gsub(".*?(d+.*)","1",V1))%>%
separate(s,c("quantity","crate_size")," - ",fill="right")%>%
replace_na(list(crate_size=0))
V1 quantity crate_size
1 Alkabeer Paratha Plain 400 GM 400 GM 0
2 Almarai Fresh Laban Baladi 2 L 2 L 0
3 Americana Breaded Chicken Burger 1 KG 1 KG 0
4 Dac Glass Cleaner 4 L 4 L 0
5 Duru Body Soap Fruity 125 GM - 4 Pcs 125 GM 4 Pcs
6 Lux Liquid Handwash Soft Touch 250 ML 250 ML 0
7 Lux Liquid Handwash Magical Beauty 250 ML 250 ML 0
8 Lusine Sliced Bread Multi Grain 600 GM 600 GM 0
9 Orinex Containers Bowl 25 Oz - 4 Pcs 25 Oz 4 Pcs
10 Betty Crocker Frosting Vanilla 400 GM 400 GM 0
11 Freshly Microwave Popcorn 3.5 Oz 3.5 Oz 0
12 Gandour Potato Chips 145 Gm 145 Gm 0
13 Galaxy Chocolate Milk 40 GM 40 GM 0
14 Nahool Jumbo Roll Strawberry 75 GM - 6 Pcs 75 GM 6 Pcs
15 Nestle Sweetened Condensed Milk 397 GM 397 GM 0
16 Puck Cheese Triangle Value Pack 120 GM - 5 Pcs 120 GM 5 Pcs
17 Betty Crocker Super Moist Cake Mix Choco Fudge 500 GM 500 GM 0
doing this in Base R:
read.table(sep="-",text=gsub(".*?(d+.*)","1",dat$V1),fill=T,h=F,
col.names = c("Quantity","Crate_Size"),na.strings = "",strip.white = T)
Quantity Crate_Size
1 400 GM <NA>
2 2 L <NA>
3 1 KG <NA>
4 4 L <NA>
5 125 GM 4 Pcs
6 250 ML <NA>
7 250 ML <NA>
8 600 GM <NA>
9 25 Oz 4 Pcs
10 400 GM <NA>
11 3.5 Oz <NA>
12 145 Gm <NA>
13 40 GM <NA>
14 75 GM 6 Pcs
15 397 GM <NA>
16 120 GM 5 Pcs
17 500 GM <NA>
It worked as mentioned, Thanks a lot
– shreekant das
Jun 29 at 9:07
But it would be great if I can extract numeric data before Specific Strings (e.g GM) as it will make it more robust.
– shreekant das
Jun 29 at 9:30
@shreekantdas do you need the numbers only? That would have been much easier..
– Onyambu
2 days ago
Yes, I only need the numbers against the quantity
– shreekant das
4 hours ago
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
What did you try? What is the expected output? Please post a reproducible example
– Sotos
Jun 29 at 8:09