Web Scraping With Golang



Scraping and Parsing HTML (goquery)

Go offers the golang.org/x/net/html package (net/html) for parsing HTML. In this chapter we will learn to parse HTML in an easier way: instead of using net/html directly, we will use goquery. While it is possible to parse HTML with the lower-level package, this involves writing a lot of code, so we are going to use the very popular Go library goquery, which supports jQuery-style selection of HTML elements.

Go is a programming language designed to resemble a simplified version of C. It compiles to native machine code. Go was created at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson.

What follows is a high-level analysis of screen scraping / web scraping strategies and frameworks for Go, current as of July 2019.

Today, we’re looking at how you can build your first web application in Go, so open up your IDEs and let’s get started.

GoLang Web App Basic Setup

We’ll have to import net/http and set up our main function with placeholders.

http.HandleFunc registers a handler function for a path in a URL. For example, the path of http://www.golangdocs.com/2020/08/23 is /2020/08/23.

  • Here, the index page handler is registered for the homepage of our site.
  • ListenAndServe listens on the address given in the quotes, which is port 8000. Once we run our web app, you can find it at localhost:8000.
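The setup described above can be sketched roughly as follows. This is a minimal sketch, not the original listing: the names indexPage, newMux and get are assumptions chosen for illustration, and the handler is registered on its own ServeMux rather than through the package-level http.HandleFunc. So that it runs without occupying a port, the sketch calls the handler in memory through net/http/httptest; the commented line shows the ListenAndServe call for port 8000.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// indexPage is a placeholder handler for the homepage; it writes nothing yet.
func indexPage(w http.ResponseWriter, r *http.Request) {}

// newMux registers indexPage for "/" on a fresh router.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", indexPage)
	return mux
}

// get sends an in-memory GET request to h and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := get(newMux(), "/")
	fmt.Printf("GET / -> %d %q\n", status, body)
	// To serve on localhost:8000 instead, replace the lines above with:
	//   log.Fatal(http.ListenAndServe(":8000", newMux()))
}
```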

Next, we need to configure the index page, so let’s create its handler function.

If you’ve ever worked with Django, this will look familiar: our function for the index page takes a request to a URL as input and then responds with something. Replace the body of the indexPage function with anything of your choice; the w parameter is the ResponseWriter that we write the response to. For example, it could write a short greeting with fmt.Fprint.
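Here is one possible handler body, as a hedged sketch. The greeting text and the bodyOf helper are made up for illustration; bodyOf only exists so the handler can be exercised without starting a server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// indexPage takes the request r and answers by writing to w.
func indexPage(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "Hello from my first Go web app!")
}

// bodyOf calls the handler in memory and returns what it wrote.
func bodyOf(h http.HandlerFunc, path string) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(bodyOf(indexPage, "/"))
}
```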

Save this file as “webApp.go”, and run it from the terminal with the command go run webApp.go.

The page then comes up at localhost:8000.

ResponseWriter Can Output HTML Too

With Golang ResponseWriter, it is possible to directly format the text using HTML tags.

That gives us the desired output in the browser.

This is still just one page, but say you wanted your site to return a sensible page, rather than an error, when you type localhost:8000/something_else.

Let’s code for that!
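One way to do this, sketched under assumptions: a handler registered for "/" on a ServeMux also receives every path that no other pattern matches, so it can inspect r.URL.Path and answer accordingly. The page wording, the page and get names, and the decision to echo the path back are illustrative choices, not the original code.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"strings"
)

// page answers the homepage and, as a catch-all, every other path.
func page(w http.ResponseWriter, r *http.Request) {
	if r.URL.Path == "/" {
		fmt.Fprint(w, "<h1>Home</h1>")
		return
	}
	name := strings.TrimPrefix(r.URL.Path, "/")
	// Escape the path before echoing it back into HTML.
	fmt.Fprintf(w, "<h1>You asked for %s</h1>", html.EscapeString(name))
}

// get sends an in-memory GET request and returns the status code and body.
func get(path string) (int, string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/", page) // "/" matches all paths without a more specific pattern
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(get("/"))
	fmt.Println(get("/something_else"))
}
```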

Voila!

Gorilla Mux for ease of web app development

Let me introduce you to a package named Gorilla Mux, which will make your web development much easier. So first, let’s install it by running go get github.com/gorilla/mux in the terminal.

We’ll make a few changes to the code above so that our handlers are registered on a Gorilla Mux router instance, created with mux.NewRouter, instead of directly through http.HandleFunc.

GoLang web application HTML templates

The hard-coded design is quite plain and unimpressive. There is a way to fix that: HTML templates. So let’s create another folder called “templates”, which will contain all the page designs.

We’ll also add a new file here called “index.html”, and add something simple:

Let’s switch back to our main .go file and import the “html/template” package. Since our templates must be accessible from all handlers, let’s declare them as a global variable.

Now we need to tell Go to parse our index.html for the template design and store the result in our templates variable.

Then modify the indexPage handler so that it executes the index.html template and writes the result to the ResponseWriter.

And now if we run it, we’ll have exactly what we wanted.

Using Redis with Go web app

As a brief introduction to Redis, which we’ll be using as our database, they describe themselves best:

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.

https://redis.io/

So first download and install Redis: https://redis.io/download

Import the go-redis package and declare a global client object. Instantiate the Redis client in the main function, grab the comments from the Redis server inside the handler, and then pass them to the index.html template for rendering.

We’re done configuring our HTML, which will take the elements of the comments list stored in Redis and place them in our web app.
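The template side of this can be sketched with html/template alone. In the sketch below, a hard-coded slice stands in for the comments that the app would fetch from Redis, and the markup and the renderComments name are assumptions for illustration; the go-redis calls themselves are left out.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// indexHTML stands in for templates/index.html; it lists every comment it is given.
const indexHTML = `<ul>{{range .}}<li>{{.}}</li>{{end}}</ul>`

var templates = template.Must(template.New("index.html").Parse(indexHTML))

// renderComments executes the template with the comments as its data.
func renderComments(comments []string) (string, error) {
	var sb strings.Builder
	err := templates.ExecuteTemplate(&sb, "index.html", comments)
	return sb.String(), err
}

func main() {
	// Placeholder data; in the app these would come from the Redis server.
	comments := []string{"Nice post", "Thanks"}
	out, err := renderComments(comments)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because html/template escapes the values it inserts, comment text containing markup is displayed as text and not interpreted as HTML.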

So now we can open our command line and type in redis-cli to enter the Redis shell, where we can push comments onto the empty list, for example with the LPUSH command.

Then, if we run our app, we can see that it now fetches the comments from the Redis server. It could do the same against a remote Redis instance, say one hosted on AWS, given that server’s address.

Ending Notes

Making a web application can take anywhere from a few days to a few months depending on the complexity of the application. For every button or functionality, there is help in the official Golang documentation, so definitely check that out.

In general, programming interfaces are contracts: a set of functions that must be implemented to fulfill the contract. Go is no different. Go has great support for interfaces, and they are implemented implicitly. They allow polymorphism in Go. In this post, we will talk about interfaces, what they are, and how they can be used.

What is an Interface?

An interface is an abstract type that enables polymorphism in Go. A variable of an interface type can hold any value whose type implements that interface. Type assertion is used to get the underlying concrete value, as we will see in this post.

Declaring an interface in GoLang

An interface is declared as a type. Here is the declaration that is used to declare an interface.

type interfaceName interface{}

Zero-value of an interface

The zero value of an interface is nil. That means it holds neither a value nor a type.
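A short sketch of this, using a hypothetical Shape interface and describe helper that are not part of the original post:

```go
package main

import "fmt"

// Shape is a hypothetical interface used only for this illustration.
type Shape interface {
	Area() float64
}

// describe formats the value and the dynamic type held by s.
func describe(s Shape) string {
	return fmt.Sprintf("(%v, %T)", s, s)
}

func main() {
	var s Shape // declared but never assigned

	fmt.Println(s == nil)    // true
	fmt.Println(describe(s)) // (<nil>, <nil>)
}
```

Calling a method such as s.Area() on this nil interface would panic at run time, since there is no concrete value to dispatch to.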

The empty interface in Go

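An interface is empty if it declares no methods at all. Because every type trivially satisfies it, a variable of the empty interface type can hold a value of any type. That’s why it is extremely useful in many cases. Below is the declaration of a variable of the empty interface type.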

var i interface{}
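As a quick illustration (the describe helper is an assumption of this sketch), the same variable can hold values of unrelated types one after another:

```go
package main

import "fmt"

// describe reports the value and dynamic type currently stored in i.
func describe(i interface{}) string {
	return fmt.Sprintf("%v is %T", i, i)
}

func main() {
	var i interface{}

	i = 42
	fmt.Println(describe(i)) // 42 is int

	i = "hello"
	fmt.Println(describe(i)) // hello is string

	i = 3.5
	fmt.Println(describe(i)) // 3.5 is float64
}
```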

Implementing an interface in GoLang

A type implements an interface when it defines all the methods the interface declares. There is no “implements” keyword; the relationship is established implicitly by the method set.
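A minimal sketch with made-up names (Vehicle, Car, Accelerate are assumptions for illustration only):

```go
package main

import "fmt"

// Vehicle is the contract: anything that can accelerate.
type Vehicle interface {
	Accelerate() string
}

// Car never mentions Vehicle; it only defines the method.
type Car struct {
	Model string
}

// Accelerate gives Car the method Vehicle requires.
func (c Car) Accelerate() string {
	return c.Model + " is accelerating"
}

func main() {
	// The assignment compiles because Car has every method of Vehicle.
	var v Vehicle = Car{Model: "Sedan"}
	fmt.Println(v.Accelerate()) // Sedan is accelerating
}
```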

Implementing multiple interfaces in Go

A type can implement multiple interfaces at the same time. If it defines all the methods of each interface, then it implements all of them. For example, a Bird type that can both fly and walk implements both a flying interface and a walking interface by defining both methods.
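Sketched in code, with Flyer, Walker and the method names chosen here for illustration:

```go
package main

import "fmt"

// Flyer and Walker are two independent interfaces.
type Flyer interface {
	Fly() string
}

type Walker interface {
	Walk() string
}

// Bird defines both methods, so it implements both interfaces.
type Bird struct {
	Name string
}

func (b Bird) Fly() string  { return b.Name + " is flying" }
func (b Bird) Walk() string { return b.Name + " is walking" }

func main() {
	b := Bird{Name: "Chirpir"}

	var f Flyer = b
	var w Walker = b

	fmt.Println(f.Fly())  // Chirpir is flying
	fmt.Println(w.Walk()) // Chirpir is walking
}
```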

Composing interfaces together

Interfaces can be composed together. Composition is one of the most important concepts in software development. In Go, an interface can embed other interfaces, and a type that implements all the embedded interfaces also implements the composed one. This is really helpful where polymorphism is needed.
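Here is a small sketch of interface embedding; FlyWalker and patrol are hypothetical names introduced only for this example:

```go
package main

import "fmt"

type Flyer interface {
	Fly() string
}

type Walker interface {
	Walk() string
}

// FlyWalker is composed by embedding the two smaller interfaces.
type FlyWalker interface {
	Flyer
	Walker
}

type Bird struct {
	Name string
}

func (b Bird) Fly() string  { return b.Name + " flies" }
func (b Bird) Walk() string { return b.Name + " walks" }

// patrol needs both behaviours, so it asks for the composed interface.
func patrol(fw FlyWalker) string {
	return fw.Fly() + ", then " + fw.Walk()
}

func main() {
	fmt.Println(patrol(Bird{Name: "Chirpir"})) // Chirpir flies, then Chirpir walks
}
```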

Values in an interface

Interface values have a concrete value and a dynamic type.

For example, if a variable chirper of an interface type is assigned Bird{"Chirpir"}, its dynamic type is Bird and its concrete value is {Chirpir}.
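The value and type pair can be printed with the %v and %T verbs. In this sketch the Flyer interface and the valueOf helper are assumptions; only the names chirper, Bird and Chirpir come from the text.

```go
package main

import "fmt"

type Flyer interface {
	Fly() string
}

type Bird struct {
	Name string
}

func (b Bird) Fly() string { return b.Name + " is flying" }

// valueOf formats the concrete value stored in the interface.
func valueOf(f Flyer) string {
	return fmt.Sprintf("%v", f)
}

func main() {
	var chirper Flyer = Bird{"Chirpir"}

	// %v prints the concrete value, %T the dynamic type.
	fmt.Printf("%v %T\n", chirper, chirper) // {Chirpir} main.Bird
	fmt.Println(valueOf(chirper))           // {Chirpir}
}
```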

Type assertion using the interface

Type assertion is a way to get the underlying value an interface holds. This means that if an interface variable is assigned a string, the underlying value it holds is that string. The expression i.(T) asserts that i holds a value of type T and panics if it does not; the two-value form v, ok := i.(T) reports the failure through ok instead.
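Both forms are shown in the sketch below; asString is a hypothetical helper added for illustration.

```go
package main

import "fmt"

// asString uses the two-value form, so a wrong type does not panic.
func asString(i interface{}) (string, bool) {
	s, ok := i.(string)
	return s, ok
}

func main() {
	var i interface{} = "hello"

	// Single-value form: safe here because i really holds a string.
	s := i.(string)
	fmt.Println(s) // hello

	// Two-value form with the wrong type: n is the zero value, ok is false.
	n, ok := i.(int)
	fmt.Println(n, ok) // 0 false

	fmt.Println(asString(42)) // prints an empty string followed by false
}
```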

Type switch using an interface

A type switch is a control structure very similar to an ordinary switch statement. The only difference is that the cases are types, and the case that runs is chosen by the dynamic type of the interface value.
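A minimal sketch, with classify as a made-up function name:

```go
package main

import "fmt"

// classify branches on the dynamic type held by i.
func classify(i interface{}) string {
	switch v := i.(type) {
	case int:
		return fmt.Sprintf("int %d", v)
	case string:
		return fmt.Sprintf("string %q", v)
	case bool:
		return fmt.Sprintf("bool %t", v)
	default:
		return fmt.Sprintf("unhandled type %T", v)
	}
}

func main() {
	fmt.Println(classify(7))    // int 7
	fmt.Println(classify("go")) // string "go"
	fmt.Println(classify(true)) // bool true
	fmt.Println(classify(1.5))  // unhandled type float64
}
```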

Equality of interface values

Two interface values are equal if either of the conditions shown below is true.

  • They both are nil.
  • They have the same underlying concrete values and the same dynamic type.
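The two conditions can be checked with the == operator. The equal helper below is an assumption of this sketch; note that comparing interface values whose dynamic type is not comparable (a slice, for example) panics at run time.

```go
package main

import "fmt"

// equal compares two interface values with ==.
func equal(a, b interface{}) bool {
	return a == b
}

func main() {
	var a, b interface{}
	fmt.Println(equal(a, b)) // true: both are nil

	fmt.Println(equal(10, 10))        // true: same dynamic type and value
	fmt.Println(equal(10, int64(10))) // false: int versus int64
	fmt.Println(equal("go", "gopher")) // false: same type, different values
}
```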

Using interfaces with functions

Interfaces can be passed to functions just like any other type. A great advantage of an interface parameter is that the function accepts an argument of any type that implements the interface.
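As an illustration with invented names (Speaker, Dog, Cat and announce), one function serves two unrelated types:

```go
package main

import "fmt"

type Speaker interface {
	Speak() string
}

type Dog struct{}
type Cat struct{}

func (Dog) Speak() string { return "Woof" }
func (Cat) Speak() string { return "Meow" }

// announce accepts any argument whose type implements Speaker.
func announce(s Speaker) string {
	return "It says: " + s.Speak()
}

func main() {
	fmt.Println(announce(Dog{})) // It says: Woof
	fmt.Println(announce(Cat{})) // It says: Meow
}
```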

Uses of an interface

Interfaces are used in Go wherever polymorphism is needed: when a function must accept values of several different types, it can take an interface that all of those types implement.

Interfaces are a great feature in Go and should be used wisely.