Self Join In SQL
Each table in a database typically holds a certain sort of data. As a result, in a database, there are sometimes hundreds of tables that are connected to one another. This means that joins are required. Using the Link keyword, you may join various tables based on their common columns. A table can also be joined to itself, which is known as a self join.
What is SQL?
Structured Query Language (SQL) is a programming language that is used to maintain relational databases and execute different operations on the data contained inside them. SQL, which was developed in the 1970s, is now widely used not just by database administrators but also by developers building data integration scripts and data analysts wanting to set up and run analytical queries.
In this blog, we will define a self-join, explain how it works, and when you should use it in your SQL queries.
Table of Contents:
If you are interested in SQL databases, Refer to this youtube video
What is Self Join in SQL
As the names indicate, the self join joins a table to itself. To utilize a self join, the table must include a column (let’s call it X) that operates as the primary key and another column (let’s call it Y) that holds data that can be matched up with the values in Column X. Columns X and Y do not have to have the same value for every given row, and Column Y may even be null.
Using table abbreviations, we can do Self Join. The table abbreviations prevent us from using the same table name again in a single sentence. It will produce an error if we use the same table name more than once in a single query without using table aliases.
Table abbreviations allow us to utilize the temporary name of the table that will be used in the query. The following explanation will help us know the table abbreviations.
Want to Ace the Job Interview of SQL, Here’s a Jackpot for you Top 72 SQL Interview Questions and Answers for 2022
Syntax and Query in Self Join SQL
You may link a table to itself using a self join, It is useful for querying hierarchical data or comparing rows inside the same database.
In a self join, the inner join or left join clause is applied. Because the self join query refers to the same table, the table alias is used to give multiple identities to the same table within the query.
The syntax of a self join is identical to that of connecting two separate tables. We utilize aliases names for tables here because the table names are the same.
To learn more about Self Join SQL, visit our SQL Tutorial!
Examples of Self Join SQL
- When evaluating a hierarchy, the self join is utilized generally. A hierarchy, as we saw previously, assigns one row in a table to another row inside the same table. You may consider it to have parent and child rows.
Here’s an example of the employees and their management. Let’s have look at the Employees table:
Each employee has his/her unique id, which we call “Column X.” The column Manager Id holds the Id of the employee or manager; this is our “Column Y.” If we use these columns to track the employee-manager relationships in this table:
Id | Full Name | Salary | Manager Id |
1 | Chirs Hemsworth | 200000 | 5 |
2 | Tom Holland | 250000 | 5 |
3 | Ben Affleck | 120000 | 1 |
4 | Christian Bale | 150000 | |
5 | Gal Gadot | 300000 | 4 |
- The manager of the employee Chirs Hemsworth is the employee with Id 5, i.e., Gal Gadot.
- The manager of the employee Tom Holland is the employee with Id 5, i.e., Gal Gadot.
- The manager of the employee Ben Affleck is the employee with Id 1, i.e., Chirs Hemsworth.
- The employee Christian Bale does not have a manager; his ManagerId is null.
- The manager of the employee Gal Gadot is the employee with Id 1, i.e., Christian Bale.
In hierarchies, this form of table arrangement is particularly common. Now, we can execute the following query to display the manager’s name for each employee in the same row:
Code:
SELECT employee.Id, employee.FullName, employee.ManagerId, manager.FullName as ManagerName FROM Employees employee JOIN Employees manager ON employee.ManagerId = manager.Id
which returns the following result:
Id | FullName | ManagerId | ManagerName |
1 | Chirs Hemsworth | 5 | Gal Galdot |
2 | Tom Holland | 5 | Gal Galdot |
3 | Ben Affleck | 1 | Chirs Hemsworth |
5 | Gal Galdot | 4 | Christian Bale |
The query collects the columns Id, FullName, and ManagerId from the employee table. It also chooses the FullName column of the table changes are taking place manager and labels it as ManagerName. As an outcome, the ID and name of every employee who has a manager are produced.
If you want to learn about SQL, do not miss the opportunity to enroll in Microsoft SQL Certification Training
The Employees table is connected with itself in this query and performs two functions:
- Role1: It stores employee information (alias employee).
- Role2: It saves management data (alias manager).
By doing so, we are treating the two copies of the Employees table as if they were two separate tables, one for the employees and one for the supervisors.
- A self join may be used to generate pairs of rows based on the criteria in the ON clause. Let’s Begin with a basic example that creates all possible pairings of coworkers. Consider the table below, Colleagues:
Id | FullName | Age |
1 | Gal Galdot | 25 |
2 | Chirs Evans | 70 |
3 | Tom Holland | 35 |
4 | Jon Snow | 38 |
Assume we need to find all possible pairings of coworkers so that everyone gets an opportunity to meet everyone else at the business introducing the event. The SQL code is as follows:
Code:
SELECT teammate1.FullName as Teammate1FullName, teammate1.Age as Teammate1Age, teammate2.FullName as Teammate2FullName, teammate2.Age as Teammate2Age FROM Colleagues teammate1 CROSS JOIN Colleagues teammate2 ON teammate1.FullName <> teammate2.FullName
The result:
Teammate1FullName | Teammate1Age | Teammate2FullName | Teammate2Age |
Chirs Evans | 70 | Gal Galdot | 25 |
Tom Holland | 35 | Gal Galdot | 25 |
Jon Snow | 38 | Gal Galdot | 25 |
Gal Galdot | 25 | Chirs Evans | 70 |
Tom Holland | 35 | Chirs Evans | 70 |
Jon Snow | 38 | Chirs Evans | 70 |
Gal Galdot | 25 | Tom Holland | 35 |
Chirs Evans | 70 | Tom Holland | 35 |
Jon Snow | 38 | Tom Holland | 35 |
Gal Galdot | 25 | Jon Snow | 38 |
Chirs Evans | 70 | Jon Snow | 38 |
Tom Holland | 35 | Jon Snow | 38 |
Every single individual in the table is matched with the outcome. We don’t want anyone partnered with himself or herself, therefore we have the ON clause condition teammate1.FullName <> teammate2.FullName.Because there are four colleagues at this event, each participant will be partnered with three other colleagues.
- In SQL, a self join can be used in connection with one or more separate tables. While this is not a clean self join, it is quite frequent in practice.
A real-world example is flight information in airports, which generates massive amounts of data each hour. Assume we want to look for a flight identification number as well as information about the departure and destination airports. Consider the tables below:
AirportId | Country | City |
1 | India | Delhi |
2 | Germany | Hamburg |
3 | Canada | ottawa |
4 | France | Paris |
5 | Bangladesh | Dhaka |
Table Flight:
FlightId | AirplaneId | StartTimesStamp | EndTimesStamp | StartAirportId | EndAirportId |
1 | 889191 | 2020-01-14 13:00:00 | 2020-01-14 15:00:00 | 3 | 4 |
2 | 868821 | 2020-02-04 01:00:00 | 2020-02-04 16:00:00 | 1 | 5 |
3 | 836755 | 2020-02-15 09:00:00 | 2020-02-15 12:00:00 | 5 | 4 |
4 | 897867 | 2020-02-24 03:00:00 | 2020-02-24 19:00:00 | 4 | 2 |
5 | 834567 | 2020-03-25 10:00:00 | 2020-03-25 12:00:00 | 2 | 1 |
6 | 899905 | 2020-04-01 00:00:00 | 2020-04-01 17:00:00 | 3 | 1 |
7 | 989898 | 2020-01-01 02:00:00 | 2020-01-01 04:00:00 | 1 | 2 |
Make mention that the Airport table’s AirportId column is the foreign key to the Flight table’s StartAirportId and EndAirportId columns. We will connect the Airport table to the Flight table twice, as follows:
- Airport serves as the table with the beginning airports in the first JOIN.
- Airport serves as the table with the destination airports in the second JOIN.
The inquiry is as follows:
SELECT flight.FlightId, flight.AirplaneId, flight.StartAirportId, startairport.Country as StartAirportCountry, startairport.City as StartAirportCity, flight.EndAirportId, endairport.Country as EndAirportCountry, endairport.City as EndAirportCity FROM Flight flight JOIN Airport startairport ON flight.StartAirportId = startairport.AirportId JOIN Airport endairport ON flight.EndAirportId = endairport.AirportId
And the result of the query looks like this:
FlightId | AirplaneId | StartAirportId | StartAirportCountry | StartAirportcity | EndAirportId | EndAirportCountry | EndAirportcity |
1 | 889191 | 3 | Canada | Ottawa | 4 | France | Paris |
2 | 868821 | 1 | India | Delhi | 5 | Bangladesh | Dhaka |
3 | 836755 | 5 | Bangladesh | Dhaka | 4 | France | Paris |
4 | 897867 | 4 | France | Paris | 2 | Germany | Hamburg |
5 | 834567 | 2 | Germany | Hamburg | 1 | India | Delhi |
6 | 899905 | 3 | Canada | Ottawa | 1 | India | Delhi |
7 | 989898 | 1 | India | Delhi | 2 | Germany | Hamburg |
Let us now review the outcome. The first three columns are from a simpleSELECT of the Flight table. The following two columns are from Airport, which serves as the starting airport table; The rows are matched using in AirportId and StartAirportId from the Airport and Flight tables, respectively.
Conclusion
As we’ve learned, the self join SQL is a significant subset of the join. We’ve seen examples of self join applications ranging from processing a hierarchy in a database to matching rows inside a table. We can join the same table numerous times, but each reference must have an alias that defines its purpose. These table abbreviations are used to retrieve columns from this single table based on the role that it is associated with.
You can visit the SQL Community page to get your doubts resolved!
The post Self Join In SQL appeared first on Intellipaat Blog.
Blog: Intellipaat - Blog
Leave a Comment
You must be logged in to post a comment.