Topic: Oriental Fortune (Eastmoney) Stock Bar Interface, Illustrated
Crawling every information report in a company's information section of the Oriental Fortune stock bar
Media reports play two roles in the capital market: they act as information intermediaries and as public-opinion watchdogs. Collecting the number and content of media reports helps us analyze the ins and outs of hot events in the capital market and the related public-opinion dynamics. Oriental Fortune.com is a professional Internet financial media outlet that aggregates a full range of financial news and market information. Today we show how to crawl all the information content of the Oriental Fortune stock bar section.
The recent "Xiaomi car-making" news has sparked heated discussion both in the capital market and on ordinary online platforms, so we take Xiaomi as our example.
Step 1: open the Xiaomi Group page on Oriental Fortune.com and click the "Hong Kong Stock Bar" option to enter the Xiaomi stock bar channel.
Step 2: in the "Information" channel of the Xiaomi stock bar, turn the pages and watch how the URL changes.
The pattern is easy to spot:
The URL of information page 1: ,hk01810,1,f_1.html
The URL of information page 2: ,hk01810,1,f_2.html
The URL of information page n: ,hk01810,1,f_n.html
Here, the part before ",hk01810" is fixed, hk01810 is Xiaomi's stock code, and the number after "f_" is the page number of the information list.
Using this rule, we can construct the list of URLs for every page of the information channel of any listed company.
Here is the code:
def get_url(code, pages):
    '''Build the list of page links for the Oriental Fortune stock bar.
    code: the company's stock code; pages: the number of pages to crawl'''
    url_list = []
    for page in range(1, pages + 1):
        # The fixed part of the URL goes in front of ",{code}"
        url = f",{code},1,f_{page}.html"
        url_list.append(url)
    return url_list
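The same URL rule also works in reverse, which helps when checking which company and page a saved link came from. The sketch below is a hypothetical helper, `parse_page_url`, and is not part of the original code. It assumes the URLs keep the `,<code>,1,f_<page>.html` ending shown above, and it ignores whatever fixed prefix comes before that ending.

```python
import re

# Hypothetical pattern for the ",<code>,1,f_<page>.html" ending of a list-page URL.
# The code is optional letters (e.g. "hk") followed by digits.
PAGE_PATTERN = re.compile(r",(?P<code>[a-z]*\d+),1,f_(?P<page>\d+)\.html$")


def parse_page_url(url):
    """Return (code, page) for a URL that matches the pattern, otherwise None."""
    match = PAGE_PATTERN.search(url)
    if match is None:
        return None
    return match.group("code"), int(match.group("page"))


if __name__ == "__main__":
    # Round trip: build page URLs the same way get_url does, then parse them back.
    urls = [f",hk01810,1,f_{page}.html" for page in range(1, 4)]
    for url in urls:
        print(url, "->", parse_page_url(url))
```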
Step 3: analyze the HTML structure of each information page.
The read count, comment count, title, author and last-update time of each item each sit in a span tag with a pair of classes: l1 a1 for the read count, and likewise l2 a2 through l5 a5 for the other fields. This makes them easy to grab with BeautifulSoup's CSS selectors.
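A selector written without spaces, such as `.l1.a1`, matches an element that carries both classes at once. A selector with a space, such as `.l4 .a4`, means something different: an `.a4` element nested inside an `.l4` element. The sketch below shows the compound-class rule without depending on BeautifulSoup, using only the standard library's `html.parser`. The `ClassTextCollector` class, the `select_text` helper and the HTML snippet are made up for illustration and only imitate the span-and-class structure described above.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of <span> elements whose class list contains ALL wanted classes.

    This mimics what a compound CSS selector such as ".l1.a1" matches.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0  # > 0 while inside a matching span (counts nested spans)
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a span nested inside a matching span
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted.issubset(classes):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results[-1] = self.results[-1].strip()

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data  # includes text of child tags such as <a>


def select_text(html, *classes):
    """Return the stripped text of every span that has all of the given classes."""
    collector = ClassTextCollector(classes)
    collector.feed(html)
    collector.close()
    return collector.results


# Made-up snippet: a header row, one post row, and a span with only one of the classes.
SAMPLE_HTML = """
<div class="row"><span class="l1 a1">read</span><span class="l3 a3">title</span></div>
<div class="row">
  <span class="l1 a1">1024</span>
  <span class="l3 a3"><a href="/sample.html" title="Sample post">Sample post</a></span>
</div>
<div class="row"><span class="l1">class l1 only</span></div>
"""

if __name__ == "__main__":
    print(select_text(SAMPLE_HTML, "l1", "a1"))
    print(select_text(SAMPLE_HTML, "l3", "a3"))
```

The span that has only `l1` is skipped, because a compound selector requires every listed class.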
Code implementation
import random
from time import sleep

import openpyxl
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


def get_news(url_list):
    '''Save the Oriental Fortune stock bar news list to a local Excel file.
    url_list: the list of page links'''
    headers = {
        'User-Agent': UserAgent().random,  # random browser User-Agent string
        'Cookie': 'Your cookie',  # paste your own browser cookie here
    }
    outwb = openpyxl.Workbook()  # workbook that stores the crawled content
    outws = outwb.active  # write into the workbook's default sheet
    # Header row
    for column, name in enumerate(["read", "comment", "title", "author", "renew", "link"], start=1):
        outws.cell(row=1, column=column, value=name)
    index = 2  # next empty row
    for url in url_list:
        res = requests.get(url, headers=headers)
        res.encoding = res.apparent_encoding
        soup = BeautifulSoup(res.text, "html.parser")
        # [1:] skips the first match on each page, presumably the list's column-header row
        read_list = soup.select(".l1.a1")[1:]
        comment_list = soup.select(".l2.a2")[1:]
        title_list = soup.select(".l3.a3")[1:]
        author_list = soup.select(".l4.a4")[1:]
        renew_list = soup.select(".l5.a5")[1:]
        for k in range(len(title_list)):
            link = title_list[k].select('a')[0]  # the <a> tag inside the title span
            outws.cell(row=index, column=1, value=read_list[k].text.strip())
            outws.cell(row=index, column=2, value=comment_list[k].text.strip())
            outws.cell(row=index, column=3, value=link["title"])
            outws.cell(row=index, column=4, value=author_list[k].text.strip())
            outws.cell(row=index, column=5, value=renew_list[k].text.strip())
            outws.cell(row=index, column=6, value=link["href"])
            index += 1
            print(link["title"], renew_list[k].text.strip())  # progress output
        sleep(random.uniform(3, 4))  # wait 3 to 4 seconds between pages
    outwb.save("Eastern Fortune Network Information.xlsx")
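The inner loop walks five parallel lists by position `k`. If one selector returns fewer matches than the title selector on some page, the loop stops with an `IndexError` partway through. Below is a minimal sketch of a stricter alternative, assuming the values have already been extracted as plain strings. The hypothetical `build_rows` function checks that all columns have the same length before pairing them row by row with `zip`. The sample values are invented.

```python
HEADER = ["read", "comment", "title", "author", "renew", "link"]


def build_rows(reads, comments, titles, authors, renews, links):
    """Combine parallel per-column lists into one row per post.

    Raises ValueError when the lists differ in length, instead of silently
    misaligning columns or failing part-way with IndexError.
    """
    columns = [reads, comments, titles, authors, renews, links]
    if len({len(column) for column in columns}) != 1:
        raise ValueError(f"column lengths differ: {[len(c) for c in columns]}")
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Invented sample values standing in for the stripped text of each span.
    rows = build_rows(["10", "25"], ["1", "0"], ["t1", "t2"],
                      ["a", "b"], ["04-14", "04-15"], ["/p1", "/p2"])
    print(HEADER)
    for row in rows:
        print(row)
```

Each returned row is a list in the same column order as the header. With openpyxl it could then be written using `Worksheet.append(row)`, which adds a list as the next row of the sheet.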
Step 4: run the main program.
if __name__ == "__main__":
    code = "hk01810"  # Xiaomi's stock code
    pages = 75  # number of information pages to crawl
    url_list = get_url(code, pages)
    get_news(url_list)
    print("Run complete")
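The main program hard-codes `code` and `pages`, so crawling another company means editing the source. Below is a hedged sketch of a command-line front end built with the standard `argparse` module. `parse_args` is a hypothetical name. The call into `get_url` and `get_news` is left as a comment so that the block runs on its own.

```python
import argparse


def parse_args(argv=None):
    """Parse the stock code and page count (hypothetical CLI; argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="Crawl a stock bar's information pages")
    parser.add_argument("code", help='stock code as it appears in the URL, e.g. "hk01810"')
    parser.add_argument("pages", type=int, help="number of list pages to crawl")
    args = parser.parse_args(argv)
    if args.pages < 1:
        parser.error("pages must be at least 1")  # prints usage and exits
    return args


if __name__ == "__main__":
    # Example with an explicit argument list; call parse_args() to read the real command line.
    args = parse_args(["hk01810", "75"])
    print(args.code, args.pages)
    # In the full script: get_news(get_url(args.code, args.pages))
```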
Output of the running program:
The saved Excel file:
With that, we have crawled all the information in the Xiaomi stock bar: 5,788 items in total.
The project is complete. You can get the code by sending the keyword "Stock Bar Information Crawler" to the official account.
Zhihu and official account: Notes of accounting programmers (ID: wylcfy2014)
Occasional posts on: Python + Stata | Text Analysis + Machine Learning | Finance + Accounting