Smallpearl - Search Engine Optimization

These are some notes from my research on SEO techniques for a website: notes, not a definitive guide. They are based on the guidelines in Google's own documentation and are therefore primarily geared towards improving a site's ranking in Google Search.

Make sure your site is registered with Google. Although Google's crawler will eventually find your site on its own, it's better to register it formally so that indexing can begin right away.

To check, search for "site:yoursite.com" and see whether the results list the permanent pages of your site as separate entries. If they do, the site is already indexed. If not, go to Search Console and register your site. Registering a domain requires adding a DNS TXT record containing a unique string that Search Console provides. This step verifies that you are indeed the owner of the domain you are trying to register.

Provide a sitemap file with links to the important pages on your site.
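
A sitemap is an XML file following the sitemaps.org protocol, typically served from the site root (for example https://www.yoursite.com/sitemap.xml). The following is a minimal Python sketch, using only the standard library, that builds one from a list of pages. The page URLs and dates are hypothetical placeholders, and the optional `lastmod` element is the only tag it emits beyond the required `loc`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of (url, lastmod) pairs.

    `lastmod` is a W3C date string such as "2020-10-21", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes &, <, >
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; replace them with your site's important URLs.
    pages = [
        ("https://www.yoursite.com/", "2020-10-21"),
        ("https://www.yoursite.com/about/", None),
    ]
    print(build_sitemap(pages))
```

Save the output as sitemap.xml at the root of the site; you can also submit its URL in Search Console.
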

Make sure your server honors the If-Modified-Since request header and returns a 304 response code if the page has not been modified since the date supplied in that header. To test this, first make a regular request to your site:
```
$ curl -I https://www.yoursite.com
HTTP/1.1 200 OK
Server: nginx/1.19.0
Date: Thu, 22 Oct 2020 01:45:53 GMT
Content-Type: text/html
Content-Length: 19963
Last-Modified: Wed, 21 Oct 2020 08:12:22 GMT
Connection: keep-alive
Vary: Accept-Encoding
ETag: "5f8fed66-4dfb"
Accept-Ranges: bytes
```
The Last-Modified response header indicates when the site's files were last modified. Now make another request, supplying this timestamp in the If-Modified-Since request header:
```
$ curl -I https://www.yoursite.com -H "If-Modified-Since: Wed, 21 Oct 2020 08:12:22 GMT"
HTTP/1.1 304 Not Modified
Server: nginx/1.19.0
Date: Thu, 22 Oct 2020 01:47:27 GMT
Last-Modified: Wed, 21 Oct 2020 08:12:22 GMT
Connection: keep-alive
ETag: "5f8fed66-4dfb"
```
The server should now respond with status code 304, indicating that the page has not changed since the given date. When the crawler receives this response, it can reuse the copy it already has instead of downloading the page again, which saves needless bandwidth for both your server and the search engine's crawler.
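
Servers such as nginx typically handle this on their own for static files; for dynamically generated pages, the application may have to make the decision itself. Below is a minimal Python sketch of that decision, assuming the page's last-modification time is available as a timezone-aware datetime. The function names `conditional_status` and `last_modified_header` are hypothetical and not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(last_modified):
    """Format a timezone-aware datetime as an HTTP date for Last-Modified."""
    return format_datetime(last_modified.astimezone(timezone.utc), usegmt=True)


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the page is unchanged since the client's date, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # HTTP says to ignore an invalid date, so serve the full page.
        return 200
    if since.tzinfo is None:
        # A "-0000" zone parses as naive; treat it as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision
    # before comparing; otherwise 08:12:22.5 would never match 08:12:22.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

For example, with `lm = datetime(2020, 10, 21, 8, 12, 22, tzinfo=timezone.utc)`, `last_modified_header(lm)` returns `"Wed, 21 Oct 2020 08:12:22 GMT"` and `conditional_status(lm, "Wed, 21 Oct 2020 08:12:22 GMT")` returns 304, which matches the curl exchange above. A real server would send the 304 with no body and apply this only to GET and HEAD requests.
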

If your site has portions that should not be crawled by search engines, and you can't restrict access to them with user authentication, use the robots.txt file to ask crawlers to skip those pages. Typically this isn't required, but it's good to know. You can also use robots.txt to keep the crawler away from your site's static assets such as images and scripts, although Google advises against blocking the CSS and JavaScript files it needs to render your pages.
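
robots.txt is a plain-text file served at the root of the site (https://www.yoursite.com/robots.txt). The sketch below uses a hypothetical file that asks all crawlers to skip `/admin/` and `/drafts/` and advertises the sitemap, then checks with Python's standard `urllib.robotparser` module which URLs a well-behaved crawler would fetch. The paths are placeholders for whatever portions of your site you want skipped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: /admin/ and /drafts/ stand in for the
# portions of your site that crawlers should skip.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/

Sitemap: https://www.yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Report which paths a crawler honoring these rules would fetch.
for path in ["/", "/blog/seo-notes/", "/admin/login", "/drafts/post-1"]:
    url = "https://www.yoursite.com" + path
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{path:20} {'crawl' if allowed else 'skip'}")
```

Running it reports `/` and `/blog/seo-notes/` as crawlable and the other two as skipped. robots.txt is only advisory, though: crawlers that ignore it can still fetch those pages.
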

Avoid pages that require URL query parameters. Instead, pages should use simple, clear and well-terminated URLs.
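
One common way to get parameter-free URLs is to derive a readable "slug" from each page's title and make it part of the path, so a page is addressed as /blog/seo-notes/ rather than something like /post.php?id=42. The helpers below are a hypothetical Python sketch of that approach. The trailing slash is an arbitrary choice; whichever form you pick, use it consistently so each page has exactly one URL.

```python
import re
import unicodedata


def slugify(title):
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with a single hyphen.
    text = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")
    return text.lower()


def clean_url(base, section, title):
    """Build a parameter-free URL such as https://www.yoursite.com/blog/my-post/."""
    return f"{base.rstrip('/')}/{section}/{slugify(title)}/"
```

For example, `clean_url("https://www.yoursite.com", "blog", "SEO Notes")` produces `https://www.yoursite.com/blog/seo-notes/`.
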