banner



How To Find Duplicate Records In Oracle Using Rowid

I am quite certain I have addressed this earlier. But I could not notice it readily and I needed it today. Then here information technology goes:

I am loading information from some external source. Well actually, from several external sources. The loading of the data is done in a better prophylactic than pitiful way: it is very important that all records come in. At the risk of loading duplicates – we volition get them all!

However, once they are all in, there are more in than we want: I have loaded quite a few duplicate records. That will hurt the later on, and so the duplicates have to go. With Oracle SQL Analytical Functions, that is quite easy. In two steps:

1) identify the duplicates

2) remove the duplicates

The data I am dealing with is Olympic Medals – over the period 1984-2008. I accept found several sources on the internet. I take written a elementary Coffee programme that uses JSoup for screenscraping and later invokes a PL/SQL package to create database records.

The key tabular array – the one that is potentially loaded with the duplicate records – is chosen MEDAL_SCORE. It records for a country (cty_id) how many medals of a certain metallic (aureate, silver, bronze; column medal) it has won during a specific edition of the Summer Olympics (oly_id).

Image

The query to identify duplicates is fairly easy: when the partition defined past oly_id, cty_id and medal has more one record – we have a indistinguishable. We need to say simply one time how many medals of a medal type a state has i in an Olympiad. Therefore the duplicate counting query becomes:

select count(*) from   ( select id          ,      row_number() over ( partition by oly_id, evt_id, cty_id,medal                                     order past id) rn          from   medal_score        ) where  rn>1              

When I execute this query, I discover that I am currently at 1098 duplicates – out of a total of 2544 records. Well, as long as the important ones are all in that location, that is no issue. Peculiarly since removing the duplicates is equally simple equally:

delete from medal_score where  rowid in ( select rwid   from ( select rowid rwid          ,      row_number() over ( partition by oly_id, evt_id, cty_id,medal                                     lodge by id) rn          from   medal_score        )   where  rn>one )              

Image

Screenscraping from Coffee using jsoup - effective data gathering from websites

In a recent article I discussed screenscraping in a in hindsight fairly clumsy way (https://technology.amis.nl/weblog/12786/building-coffee-object-graph-with-tour-de-france-results-using-screen-scraping-java-util-parser-and-contrasted-facilities). While preparing for a series of articles on information visualizations, I had need of statistics regarding the Olympic Games – more specifically: the overall medal count per country during the 2008 Bejing Olympic Games. This […]

Source: https://technology.amis.nl/oracle/remove-duplicate-rows-with-oracle-sql-row_number-intelligent-and-efficient-data-cleansing/

Posted by: paulinoalonese.blogspot.com

0 Response to "How To Find Duplicate Records In Oracle Using Rowid"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel