Find opportunities for embedding machine learning into your processes
Lazarina says: "There are three main components to embedding machine learning into your processes. Firstly, the 'Why' - which is to increase your efficiency. Secondly, the 'What' - which is to focus on building systems that improve performance. And thirdly, the 'How' - which is finding opportunities to utilise machine learning.
Increasing your efficiency is something that every SEO needs to do, especially as the industry becomes more competitive and search engines continuously improve their offering and algorithms. There's a lot of hunger for good SEOs because businesses and individuals are becoming a lot more aware of the importance of developing their online brands and digital presence. They need allies to help them do this efficiently, and it makes sense to think about how to increase your efficiency as an SEO because it can save you a lot of time - which is a very valuable commodity. You need to build systems internally that can free up your time to focus on strategy, and develop scalable systems for your clients.
The most exciting way to become more efficient is to seek opportunities for embedding machine learning into your processes, and there are a few steps you can take to do this. First of all, become familiar with the different types of models, scripts, and tools available out there. Most of them don't even require any coding experience; you just have to get started and get your hands dirty. Then, when you encounter a new task or project, think about how you can break it down into more manageable pieces. Finally, assess the characteristics of the task, and identify which machine learning models, libraries, or scripts can become your allies in completing it. This will give you more time to focus on more scalable initiatives."
What specific SEO tasks are currently not particularly efficient, and can be aided with machine learning?
"Every task that requires you to pull exports from different systems and tools. Most of the time, we get these large chunks of data that need to be audited. The process of data science and analysis is the number one area where you can get a machine learning model or script to become your ally in identifying opportunities. Furthermore, you can easily measure the time you save after implementing a particular script model.
For things like internal linking or technical audits, you can create scripts that identify the top opportunities using machine learning libraries, or even cluster content based on its similarity. Obviously, it will be a lot easier for a Natural Language Processing (NLP) library, or model, to go through the content of your website and cluster it than for you to read the articles and try to make sense of them. These are great opportunities to rapidly scale your auditing processes."
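As a minimal sketch of the kind of script described here, the snippet below represents each page as a TF-IDF vector and uses cosine similarity to surface the most similar page as an internal-linking candidate. The file name and column names are hypothetical, and this is an illustration rather than a prescribed workflow.

```python
# Minimal sketch: surface internal-linking candidates by content similarity.
# Assumes a hypothetical pages.csv with 'url' and 'body_text' columns.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = pd.read_csv("pages.csv")  # columns: url, body_text (hypothetical)

# Represent each page as a TF-IDF vector of its body copy
vectors = TfidfVectorizer(stop_words="english").fit_transform(pages["body_text"])

# Pairwise similarity between every page on the site
similarity = cosine_similarity(vectors)

# For each page, print the most similar other page as a linking candidate
for i, url in enumerate(pages["url"]):
    scores = similarity[i].copy()
    scores[i] = 0  # ignore self-similarity
    best = scores.argmax()
    print(f"{url} -> consider linking to {pages['url'].iloc[best]} (score {scores[best]:.2f})")
```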
Is this something that existing SEO auditing tools can offer or does this have to be set up manually without a pre-existing platform?
"It's a mixture of both, and it depends on the access to the tools you have. For instance, if you are working in-house, you might have access to very advanced, expensive tooling, which allows you to get more insights than you'd normally get from doing simple analysis yourself with basic tooling. There are some great tools out there that provide very insightful comments. If you want to do a very in-depth analysis - especially on larger websites - it's always good for you to know how the tools created the insights, so you can replicate it in your processes as well."
Can you recommend any processes and machine learning tools?
"It's quite easy to find all of the Python stuff from the SEO community on Twitter - so I'd definitely recommend following them to get a lot of amazing resources.
In terms of processes, you can very quickly look into automating things like keyword clustering, extracting the main keywords for a particular topic, labelling search intent based on the content or the title of an article, and clustering different content using topic modelling algorithms. If you get a massive website audit and have a huge export from a tool like Screaming Frog, exploring this data in Python is a great starting point, before incorporating different models based on what the analysis shows you.
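For the exploratory pass mentioned above, a first look at a Screaming Frog export in pandas might resemble the sketch below. The file name and column names reflect a typical export but may differ between versions, and the 300-word threshold is an arbitrary illustration.

```python
# Minimal sketch: first-pass exploration of a Screaming Frog crawl export in pandas.
# File and column names are taken from a typical export and may differ between versions.
import pandas as pd

crawl = pd.read_csv("internal_all.csv")

print(crawl.shape)                          # how many URLs and columns we are working with
print(crawl["Status Code"].value_counts())  # distribution of response codes

# Pages that are likely thin or missing key on-page elements
thin = crawl[crawl["Word Count"] < 300]
no_meta = crawl[crawl["Meta Description 1"].isna()]

print(f"{len(thin)} pages under 300 words, {len(no_meta)} pages missing a meta description")
```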
A couple of libraries to get started with are pandas and NumPy. For visualisation, you can incorporate things like Matplotlib, and for natural language processing, libraries like NLTK, alongside fuzzy matching techniques. There are also different clustering algorithms; k-nearest neighbors (KNN) is one that works well for grouping similar texts.
When you have a particular task, the main thing is to break down what data you are trying to analyse. Is it numerical or is it text? Then, label the task you're trying to do. For instance, if you're analysing text data, are you trying to generate new text, cluster it, or maybe label or classify it? Once you have these two things, you can start searching for algorithms, libraries, or scripts that can help you achieve this task."
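Following that breakdown - text data, with a clustering task - the sketch below groups a small keyword set by similarity. It uses k-means rather than the nearest-neighbours approach mentioned above, simply because k-means is a common unsupervised choice for this kind of grouping; the keyword list and cluster count are purely illustrative.

```python
# Minimal sketch: text data + a clustering task -> group keywords by similarity.
# Uses k-means on TF-IDF vectors; keywords and n_clusters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

keywords = [
    "running shoes for women", "best trail running shoes",
    "how to clean white trainers", "trainer cleaning tips",
    "marathon training plan", "beginner marathon schedule",
]

vectors = TfidfVectorizer().fit_transform(keywords)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for cluster in range(3):
    group = [kw for kw, label in zip(keywords, labels) if label == cluster]
    print(f"Cluster {cluster}: {group}")
```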
Does this mean you could use machine learning to assist you with identifying content opportunities, and determining what you should be writing about next?
"Yes - but this is the second step of topic clustering. The first step is analysing your website and content using machine learning libraries to provide embeddings for all the words in the text. They work by considering the inter-exchangeability of words and topics. For instance, if you have a lot of keywords in your content, you might imagine this is a topic that you are trying to target as well. This is the same assumption most of the tools like Semrush make when they provide you with the list of parent, seed, and 'broad match' keywords.
This first-step analysis will show you the distinct clusters of content, and where the similarities between these clusters lie. This gives you a lot of opportunities to work out which clusters you can link together, and what the main keywords for each cluster are - so you can guide your users to discover new content.
After this, you can seek out where your topical authority is - based on the content you have. For instance, which topic is the most represented on your website? Does this align with your business proposition? If it doesn't, then you know where to expand, and invest in more content development."
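A minimal sketch of those two steps might look like the snippet below: embed the page content, cluster the embeddings into topics, then count cluster sizes to see which topic is most represented. It assumes the sentence-transformers package is installed; the model name and the toy page data are illustrative choices, not a prescribed setup.

```python
# Minimal sketch: embed content, cluster it into topics, then gauge which topic
# is most represented on the site. Model name and sample data are illustrative.
from collections import Counter
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

pages = {
    "/blog/interval-training": "A guide to interval training sessions...",
    "/blog/marathon-nutrition": "What to eat before and during a marathon...",
    "/blog/office-chairs": "Choosing an ergonomic office chair...",
}

# Step 1: embeddings capture which pages use similar, interchangeable language
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(list(pages.values()))

# Step 2: cluster the embeddings into topics and count cluster sizes
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

print(Counter(labels))  # the largest cluster suggests where topical coverage is strongest
for url, label in zip(pages, labels):
    print(label, url)
```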
Should SEOs be concerned that quality may deteriorate when so many different tasks are automated?
"This is definitely something I consider when trying to implement any sort of machine learning algorithm. People should consider these scripts, tools, and libraries as allies rather than replacements to a particular process. Imagine this is like someone on their first day in SEO, because these algorithms are not typically designed for our work. There would definitely be a quality checking step after any implementation.
Normally, you will be implementing pre-trained models; if you fine-tune one on your own data, that will take additional time. If you're just looking for a simple output, such as automating meta descriptions and generating them in bulk, then you'd need to quality check the output after using a machine learning tool. Similarly, you'd need to sense check things like automated image alt text and caption generation."
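To make the bulk generation plus quality check concrete, the sketch below drafts meta descriptions with a pre-trained summarisation model from the Hugging Face transformers library and flags anything that needs a human rewrite. The model choice, the sample text, and the 155-character limit are assumptions for illustration, not a recommended configuration.

```python
# Minimal sketch: draft meta descriptions in bulk with a pre-trained summarisation
# model, then flag drafts that need human review. Model and thresholds are assumptions.
from transformers import pipeline

summarise = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

page_texts = {
    "/blog/interval-training": "Interval training alternates short bursts of hard "
                               "effort with recovery periods, improving speed and endurance...",
}

for url, text in page_texts.items():
    draft = summarise(text, max_length=40, min_length=10)[0]["summary_text"]
    needs_review = len(draft) > 155 or draft.strip() == ""
    print(url, draft, "NEEDS REVIEW" if needs_review else "OK")
```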
Can you think of anything that shouldn't be automated?
"At this point, I really don't think we should be automating content generation. The creators of GPT-3 and other models have trained them on historical data that is not updated in real-time. Of course, this will probably change in the future. The other issue is that it has a lot of biases and makes a lot of assumptions. Furthermore, when you are generating text, there is no authoritativeness. There is no trust in the text, because it's not fact-checked or providing references.
As SEOs, we know that search engines and users want authoritativeness, trust, and expertise, and we cannot safely say that our automated models can provide this yet. Until they do, I'm not sure how they can be used as the main driver of a content strategy. Things like content and user experience really need the human touch."
How far away are we from a significant percentage of the content on a website being generated automatically?
"I've seen a lot of SEOs currently running experiments with their sites, and they never put the name of the site there. You never know, it might already be the case! I really don't know - but it's an interesting future to think about. I'm sure that Google still states in their guidelines that automatically generated content is not something that is aligned with best practice. Automated content requires editing, fact-checking, and sense-checking. Hopefully, we are far away from this - but you never know."
What's one thing SEOs should stop doing to spend more time investigating the potential opportunities of machine learning?
"If you imagine an on-page optimisation project, you'll have many different mini-workstreams - such as optimising the meta descriptions, optimising the titles, and writing image captions. Break it down into chunks and try to test out all of the different scripts and models that already exist for things like generating titles, meta descriptions, H1 headings, and alt text. See how much easier it is to work with this output. Then, just edit and sense check it, as opposed to trying to generate it yourself. You can now use the time you saved from doing this to create better strategies for scaling sites."
You can find Lazarina Stoy over at LazarinaStoy.com.