How To Web Scrape

Step 1

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

Step 2

url = 'https://en.wikipedia.org/wiki/The_Bachelorette_(season_16)'
results = requests.get(url)
results

Step 3

soup = BeautifulSoup(results.text, "html.parser")
print(soup.prettify())

Step 4

Step 5

Step 6

Step 7

soup.find_all('table', class_='wikitable')

Step 8

# From here on, soup refers to the first wikitable's <tbody>, not the whole page.
soup = soup.find_all('table', class_='wikitable')
soup = soup[0].find('tbody')
soup

Step 9

x = soup.find_all('tr')[1]
x

Step 10

x.find_all('td')[0].find('a')['title']

Step 11

soup.find_all('tr')[2].find_all('td')[0].find('b').text
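
To see what these lookups return without fetching the live page, the sketch below runs the same kinds of calls on a small inline table. The markup and the names in it are made up for illustration and only imitate the shape of a Wikipedia `wikitable`; the real page has more rows and columns.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a Wikipedia "wikitable"; not the real page.
SAMPLE_HTML = """
<table class="wikitable">
  <tbody>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td><a title="Sample Person">Sample Person</a></td><td>30</td></tr>
    <tr><td><b>Other Person</b></td><td>28</td></tr>
  </tbody>
</table>
<table class="infobox"><tr><td>ignored</td></tr></table>
"""

sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list of every matching <table>; index 0 is the first one.
tables = sample.find_all("table", class_="wikitable")
body = tables[0].find("tbody")

# Each <tr> is one table row; row 0 is the header row in this sample.
rows = body.find_all("tr")
first_cell = rows[1].find_all("td")[0]

# Square brackets on a tag read an attribute; .text reads the enclosed text.
link_title = first_cell.find("a")["title"]
bold_text = rows[2].find_all("td")[0].find("b").text
```

Indexing a tag with `['title']` raises a `KeyError` when the attribute is missing, so `tag.get('title')` may be a safer choice on rows whose layout is not known in advance.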

Step 12

Step 13

soup.find_all('tr')[3].find_all('td')[0].text.replace('[22]\n','')

Step 14

names = []
for n in range(3, 36):
    names.append(soup.find_all('tr')[n].find_all('td')[0].text.replace('\n', ''))
names
# Cut each name at its footnote marker, e.g. 'Name[22]' -> 'Name'.
for i in range(len(names)):
    names[i] = names[i].split('[')[0]
names
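
Footnote markers such as `[22]` can also be removed with a regular expression, which leaves names without a marker untouched. This is a minimal sketch; the helper name `strip_footnotes` and the sample strings are hypothetical and only shaped like the text a name cell returns.

```python
import re

# Matches bracketed numeric footnote markers such as "[22]".
FOOTNOTE = re.compile(r"\[\d+\]")


def strip_footnotes(text):
    """Return text without footnote markers or leading/trailing whitespace."""
    return FOOTNOTE.sub("", text).strip()


# Hypothetical cell texts, shaped like what .text returns for a name cell.
raw_cells = ["Sample Person[22]\n", "Other Person\n", "Third Person[5][6]\n"]
cleaned = [strip_footnotes(cell) for cell in raw_cells]
```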

Step 15

hometown = []
for x in range(3, 36):
    hometown.append(soup.find_all('tr')[x].find_all('td')[2].text.replace('\n', ''))
jobs = []
for n in range(3, 36):
    jobs.append(soup.find_all('tr')[n].find_all('td')[3].get_text().replace('\n', ''))
ages = []
for age in range(3, 36):
    ages.append(soup.find_all('tr')[age].find_all('td')[1].text.replace('\n', ''))
# Convert the age strings to integers; the result must be assigned to take effect.
ages = list(map(int, ages))
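
The loops above call `soup.find_all('tr')` again for every cell they read. One possible alternative, sketched here on made-up markup, visits each row once and collects all four fields together. The function name `extract_rows` and the sample rows are hypothetical; the column order (name, age, hometown, job) follows the indexes used in the steps above.

```python
from bs4 import BeautifulSoup

# Hypothetical rows; only the column order mirrors the real table.
SAMPLE_HTML = """
<table class="wikitable"><tbody>
<tr><th>Name</th><th>Age</th><th>Hometown</th><th>Job</th></tr>
<tr><td>Sample Person[1]</td><td>30</td><td>Sampletown, Ohio</td><td>Teacher</td></tr>
<tr><td>Other Person[2]</td><td>28</td><td>Otherville, Texas</td><td>Chef</td></tr>
</tbody></table>
"""


def extract_rows(table):
    """Collect one dict per data row, reading each row's cells only once."""
    records = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 4:
            continue  # header rows use <th>, so they yield no <td> cells
        records.append({
            "names": cells[0].split("[")[0],  # drop everything from the first "["
            "ages": int(cells[1]),
            "hometowns": cells[2],
            "jobs": cells[3],
        })
    return records


table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table", class_="wikitable")
records = extract_rows(table)
```

A list of dicts like `records` can be passed straight to `pd.DataFrame(records)`, which would replace the four separate lists.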

Step 16

import pandas as pd
df = pd.DataFrame(ages, columns=['ages'])
df['names'] = names
df['hometowns'] = hometown
df['jobs'] = jobs
df

Step 17

new_row1 = {'ages': [31, 36], 'names': ['Dale Moss', 'Zac Clark'], 'hometowns': ['Brandon, South Dakota', 'Haddonfield, New Jersey'], 'jobs': ['Former Pro Football Wide Receiver', 'Addiction Specialist']}
top_row = pd.DataFrame(new_row1)
df = pd.concat([top_row, df]).reset_index(drop=True)
df
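
`pd.concat` keeps each frame's original index labels, which is why the step above ends with `reset_index(drop=True)`. The sketch below shows the difference on a tiny frame. The "Sample Person" and "Other Person" rows are placeholders standing in for the scraped table, and `ignore_index=True` is shown as an equivalent alternative, not as the method used in this post.

```python
import pandas as pd

# Hypothetical scraped rows standing in for the real table.
df = pd.DataFrame({
    "ages": [30, 28],
    "names": ["Sample Person", "Other Person"],
})

# Rows to place at the top, built as their own DataFrame.
top_rows = pd.DataFrame({"ages": [31, 36], "names": ["Dale Moss", "Zac Clark"]})

# Without ignore_index, concat keeps each frame's own labels: 0, 1, 0, 1.
kept = pd.concat([top_rows, df])

# ignore_index=True renumbers the result 0..n-1, like reset_index(drop=True).
renumbered = pd.concat([top_rows, df], ignore_index=True)
```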
All names included

Raizel Bernstein
