Preserving ORDER BY in SELECT INTO

Question

I have a T-SQL query that takes data from one table and copies it into a new table, but only the rows meeting a certain condition:

SELECT VibeFGEvents.* 
INTO VibeFGEventsAfterStudyStart 
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON 
    CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
    AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
    AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.

In T-SQL, how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsAfterStudyStart?

@RoyiNamir shorter names = less meaningful, so will it be clearer. — Tony Hopkinson, Jan 20 '13 at 13:30
@TonyHopkinson searching for `VibeFGEventsStudyStart.MIN_TitleInstID` and checking whether all the others have the same name is painful. — Royi Namir, Jan 20 '13 at 13:31
What does it have to do with SQL Server? I'm talking about asking questions. We don't care about his actual names; we care about his problem. That's why he is here: to ask a question. (Clearer = for us, the SO users.) — Royi Namir, Jan 20 '13 at 13:39
Royi - I could have shortened the names for the post, sorry, though in my own code I prefer them long for the reason Tony gives. The data is an archive of study data so I can get away with inefficient queries as there's not that much data and it isn't changing. — dumbledad, Jan 20 '13 at 13:40
@RoyiNamir. I see what you are saying, though I (another SO user) didn't have the same problem. As far as I can see the column names are irrelevant to the underlying issue, so I ignored them. — Tony Hopkinson, Jan 20 '13 at 13:47
Since relational databases per se really don't have any concept of *order* - what's the point of *preserving* the order upon insert? In general, any relational table **is not ordered** by default; a result set can be ordered **if** you explicitly define an `ORDER BY` clause in your `SELECT` — marc_s, Jan 20 '13 at 13:49
Thanks marc_s. I'll put ORDER BY into the calling code and live without the order I expect in the DB itself. — dumbledad, Jan 20 '13 at 13:56
There is no other way as a database has no concept of order. Never had in SQL. — TomTom, Jan 20 '13 at 13:58
Yeah, I read this as the order being overwritten, not that it was never there in the first place. — Tony Hopkinson, Jan 20 '13 at 14:01

score 62 · Answer 1 · answered Apr 23 '14 at 07:17

I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:

insert Table2 (T2Col1, T2Col2)
  select top 10000 T1Col1, T1Col2
  from Table1
  order by newid()

Thank you for the actual answer — faddison, Jan 24 '20 at 19:37

score 22 · Accepted Answer · edited Jun 20 '19 at 16:35

What for?

Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.

The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.

As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE to put an ORDER BY clause on the select that gets it. Anything else is arbitrary, i.e. the order in which you get the data is not determined and may change.

So it makes no sense to have a specific order on the insert as you try to achieve.

SQL 101: sets have no order.
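
As a minimal sketch of that advice, assuming the copied table keeps the id column from the question, the consuming query would state the order itself instead of relying on how the rows were inserted:

```sql
-- Ask for the order explicitly whenever the rows are read back.
SELECT *
FROM VibeFGEventsAfterStudyStart
ORDER BY id;
```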

edited Jun 20 '19 at 16:35

Luke Girvin

answered Jan 20 '13 at 13:53

TomTom

It is a core concept in SQL: SQL is pretty much set based. The order HAS to be imposed when materializing a set (in a SELECT). Unless one defines an order in a SELECT, the results are arbitrary and can theoretically change between calls. The order of the data or inserts is lost the moment the data is in the table. There is no "hidden natural" order. This is the core of set-based operations. – TomTom Jan 20 '13 at 15:55
Is it more accurate to say that the order is not guaranteed without an order by? Say I bulk insert data from a file into Table1 and then run select * from Table1 - I get the data back in exactly the same order that it was inserted, no? When will that change? This is important for 3d party apps that migrate / import data via odbc. If the 3d party app doesn't let you apply an order clause, and if you need the data imported "in order" (to avoid record locks, etc.), then you would be well advised to make sure the data is inserted into the table in the desired order? – spioter Sep 09 '14 at 21:21
No, not guaranteed. It MAY happen, it may not. If there is a clustered index on table 1 with another order, it likely comes in that order. If another index is used because of a where clause, it comes in an arbitrary order depending on how SQL Server decides to search for things. In a more complex query you may find parallelism using different threads and then merging results. Not guaranteed means you rely on side effects which MAY break. Called super crappy programming. – TomTom Sep 10 '14 at 00:21
Sometimes it's just nice to see the data in a certain order by default. It's not needed, you're right, but some people prefer it. – David Wilson Jun 08 '16 at 20:39
How about tables without clustered indexes, like those with UUID primary keys ? – NielsK Jun 10 '16 at 10:56
Or dealing with code that uses the sequence of primary keys in its functionality. – Bon Jul 20 '16 at 18:26
Order seems to matter if the place you are inserting into has a constraint and you are using INSERT IGNORE – William Entriken Jul 01 '19 at 19:24
@NielsK the clustered index will only cause data to be returned in order if the *query plan* uses it. For a simple `SELECT * FROM JustThisOneTable` it will work as expected. It is still a code smell however, simply because of the complexities we have seen with it leading to nontransparent behavior if you omit an `ORDER BY` clause (in other words, write readable code by not relying on sophisticated assumptions). Also, a query hint to force use of the clustered index is another bad idea, because you should leverage the statistics SQL computes to estimate an optimal query plan. – Elaskanator Oct 11 '19 at 19:42
"What for" doesn't help anyone. There were many scenarios where I'd needed to save data into a temporary table so I can repair the data a few days later. It wasn't meant as a permanent storage but as a temporary solution for data repair. Anyway, the next answer should be vote up. – jjthebig1 Oct 27 '19 at 19:55
Literally ran into this writing inserts with expected ordering, such that the consumer of a temp table expects to iterate over it in a certain order. Ordering the insert means less intermediary and redundant code to complete the same operation. – Captain Prinny Nov 19 '19 at 16:56
Thing that is getting me is that, when using columnstore, insert order _does_ matter... it allows for rowgroup omission and much faster queries... in fact leveraging insert order is recommended by MS here for that very purpose: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance?view=sql-server-ver15#1-organize-data-to-eliminate-more-rowgroups-from-a-full-table-scan – Kram Jul 16 '20 at 23:34
This is not an answer, it's a comment. It's disappointing that it got 25 votes as such. – Juan Perez Oct 08 '21 at 19:52
I need the auto-generated primary keys to have the same order as what is being copied into the table, so that I can then select them and know what they mapped to, for further processing. So there's a reason for you. :-) – Brian Birtle Feb 23 '22 at 12:41
Sorry but pure logic-wise this makes no sense: "data in a table is not ordered...///... the intrinsic storage order of a table is that of the...." seems very comparable to "In X, Y==false, the essential value of Y==true." How can you say "data in a table is not ordered" then say "the intrinsic storage order of a table". Seems you are a true SQL expert, and definitely appreciate your contribution, am sure there is a LOT of (mainly) useful truth here you are trying to impart... but... – MemeDeveloper Jun 15 '23 at 20:02
I think the "take home" here is "not guaranteed" (GREAT ADVICE). Which is v. v. important, but am not sure that always: "not guaranteed" == "super crappy programming". The web is founded on "likely but not guaranteed"... packets get lost, arrive in different orders etc etc. Just try hand coding a complex HTML email template that renders perfectly in every modern email client... "not guaranteed" is basically what every developer intention comes up against. Maybe in the world of top level purely enterprise DBAs you just deal with "not guaranteed" vs "guaranteed" but... – MemeDeveloper Jun 15 '23 at 20:16
...in my world it's pretty much a mix moment to moment. We as developers endeavour to "guarantee" as much as possible, but end of the day how would your Unit/Integration/Functional/End-to-End/smoke tests perform when this happens: https://www.theguardian.com/world/2011/apr/06/georgian-woman-cuts-web-access. We live in a world of probability. Surely increasing the probability that your data is ("intrinsic"ally) ordered in a particular way is I think the "What for?" – MemeDeveloper Jun 15 '23 at 20:16
Or is your point that matching the "intrinsic storage order" in a table is actually not going to improve (the likelihood of improving) the optimizer's ability to quickly query by said same order. If that's really true - I am genuinely surprised. – MemeDeveloper Jun 15 '23 at 20:19
But I am certainly no SQL expert - just a user/dev. Genuinely interested in your post - guessing you know all sorts of things about SQL server I have no inkling of... sometimes things are truly surprising under the bonnet. Basically will you please expand on your thoroughly intriguing post. I am hooked... thanks – MemeDeveloper Jun 15 '23 at 20:24
N.B. reason I ended up here is super basic/simple/lame: in this case I have tables with Id bigint IDENTITY(1,1) NOT NULL, and timebased data where Timestamp is vital to most of the queries. I imported 200GB of said data from an AWS RDS Postgres instance into a "staging" table using multiple concurrent threads/DB connections. Now I would like to INSERT INTO ...//... SELECT FROM and end up with the data ordered in the final table so that Timestamp and incrementing ID are aligned. – MemeDeveloper Jun 15 '23 at 20:27
I guess your point is that my wish to do this is naïve and highlights a level of ignorance that is to you surprising. To me... seems logical to want these things to line up/these ducks to be in a(n ordered) row! ;) – MemeDeveloper Jun 15 '23 at 20:29
This "answer" shows a lack of understanding of the vast complexity of situations that developers need to deal with. – Matt Small Jun 20 '23 at 14:28

score 10 · Answer 3 · answered Dec 13 '20 at 12:57

Just add TOP to your SQL with a number that is greater than the actual number of rows:

SELECT TOP 25000 *
INTO spx_copy
FROM SPX
ORDER BY date

answered Dec 13 '20 at 12:57

Greg Gum

By adding TOP, I get the SQL to run. However, they are still loaded into the destination table as if the "ORDER BY" was not there .... I tested the command separately, and as a select it still works correctly, and orders it. Any ideas? – JosephDoggie Feb 01 '21 at 17:24
I could swear it worked for me, and then now it doesn't. Adding an identity column ("SELECT ..., _dummy = identity(int) INTO ...") fixed it for me. – bwperrin Apr 19 '22 at 23:21
1. This is an ABSOLUTELY STUPID PROBLEM TO HAVE MICROSOFT 2. Thank you Greg :) – Illegal Operator Nov 28 '22 at 07:27

score 5 · Answer 4 · answered May 06 '19 at 12:46

I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:

The number of rows is very large (200 to 2,000 million), so we use SELECT INTO instead of CREATE TABLE + INSERT because the table needs to be loaded as fast as possible (minimal logging). We tested trace flag 610 for loading an already created empty table with a clustered index, but that still takes longer than the following approach.
We need the data ordered by specific columns for query performance, so we create a CLUSTERED INDEX just after the table is loaded. We ruled out a non-clustered index because it would need another read for the data not included in the index's ordered columns, and we ruled out a fully covering non-clustered index because it would practically double the space needed to hold the table.

It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.

The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.

A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.

So you can rewrite this query:

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T
ORDER BY            -- ORDER BY is ignored!
    FirstColumn,
    SecondColumn

to

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T

UNION ALL

-- A "fake" row to be deleted
SELECT
    FirstColumn = 0,
    SecondColumn = 0

ORDER BY
    FirstColumn,
    SecondColumn

We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.
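
The follow-up steps this answer mentions (removing the fake row, then building the clustered index) might look like the sketch below. It assumes no real row has both columns equal to 0, and the index name is made up for illustration:

```sql
-- Remove the placeholder row added by the UNION ALL branch.
-- Assumption: no genuine row has FirstColumn = 0 AND SecondColumn = 0.
DELETE FROM #NewTable
WHERE FirstColumn = 0
  AND SecondColumn = 0;

-- CIX_NewTable is a hypothetical name, not from the original post.
CREATE CLUSTERED INDEX CIX_NewTable
    ON #NewTable (FirstColumn, SecondColumn);
```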

score 2 · Answer 5 · answered Oct 17 '18 at 18:28

You cannot do this with ORDER BY, but if you create a clustered index on the id column of the new table after your SELECT INTO, the table will be sorted on disk by the id values copied from VibeFGEvents.id.
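
A sketch of that step, assuming the new table from the question and a hypothetical index name. The index governs how the rows are stored; as other comments in this thread point out, a query that needs the rows in id order should still say so:

```sql
-- CIX_VibeFGEventsAfterStudyStart_id is an illustrative name, not from the original post.
CREATE CLUSTERED INDEX CIX_VibeFGEventsAfterStudyStart_id
    ON VibeFGEventsAfterStudyStart (id);

-- Reading back: still request the order explicitly.
SELECT *
FROM VibeFGEventsAfterStudyStart
ORDER BY id;
```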

answered Oct 17 '18 at 18:28

Cyndi Baker

score 2 · Answer 6 · answered Aug 25 '20 at 08:16

I've run a test on MS SQL 2012, and it clearly shows that insert into ... select ... order by makes sense. Here is what I did:

create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);

insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');

select * from tmp1 order by id;

And I got this list:

1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry

No surprises here. Then I made a copy from tmp1 to tmp2 this way:

insert into tmp2 (name)
select name
from tmp1
order by id;

select * from tmp2 order by id;

I got the exact response like before. Apple to Blackberry. Now reverse the order to test it:

delete from tmp2;

insert into tmp2 (name)
select name
from tmp1
order by id desc;

select * from tmp2 order by id;

9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple

So the order in tmp2 is reversed too, which means order by makes sense when there is an identity column in the target table!

score 1 · Answer 7 · answered Feb 07 '19 at 02:45

The reason one would want a specific order is that you cannot define the order in a subquery. The idea is that if you create a table variable and then query it, you would expect the order to be retained (say, to concatenate rows that must be in order, for XML or JSON), but it isn't. So what do you do? The answer is to force SQL Server to order the rows by using TOP in your SELECT (just pick a number high enough to cover all your rows).

answered Feb 07 '19 at 02:45

MC9000

[Another answer](https://stackoverflow.com/a/23237448/2799848) mentioned this already. Please elaborate why using `TOP` makes it work – Elaskanator Oct 11 '19 at 21:05
This is by design (and it makes intuitive sense since TOP would make no sense without an ordering of the return set). Like I clarified: "... force SQL to order it by using TOP ..." in my comment above (yes, this was mentioned already, but I hoped to clarify it for others). – MC9000 Oct 16 '19 at 02:30

score 1 · Answer 8 · answered Aug 15 '19 at 14:50

I have run into the same issue. One reason I have needed to preserve the order is when I use ROLLUP to get a weighted average based on the raw data rather than an average of what is in that column. For instance, say I want to see the average profit based on the number of units sold at four store locations. I can do this very easily with the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Then I think to myself, "This is good info, but I want to see it in order from best average to worst and keep the overall at the bottom (or top) of the list." ROLLUP will fail you here, so you take a different approach.

Why not create row numbers based on the sequence (order) you need to preserve?

    SELECT OrderBy = ROW_NUMBER() OVER(ORDER BY VibeFGEvents.id) -- add PARTITION BY <column> to restart the numbering per group
    , VibeFGEvents.*  
    FROM VibeFGEvents
    LEFT OUTER JOIN VibeFGEventsStudyStart
    ON 
        CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
        AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
        AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
    WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL

Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.
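
For instance, assuming the query above was loaded into the question's VibeFGEventsAfterStudyStart table (the answer does not show that step), reading the rows back in the preserved sequence could look like this:

```sql
-- Assumes the table was created with the OrderBy column described above.
SELECT *
FROM VibeFGEventsAfterStudyStart
ORDER BY OrderBy;
```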

score 1 · Answer 9 · answered Aug 29 '22 at 20:31

I found this approach helpful to solve this problem:

WITH ordered as
(
    SELECT TOP 1000
    [Month]
    FROM SourceTable
    GROUP BY [Month]
    ORDER BY [Month]
)

INSERT INTO DestinationTable (MonthStart)
(
    SELECT  * from ordered
)

answered Aug 29 '22 at 20:31

richardprocter

score -1 · Answer 10 · edited Jan 20 '13 at 15:50

Try using INSERT INTO instead of SELECT INTO

INSERT INTO VibeFGEventsAfterStudyStart 
SELECT VibeFGEvents.* 
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON 
    CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
    AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
    AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

edited Jan 20 '13 at 15:50

dumbledad

answered Jan 20 '13 at 14:03

Geo2013

A table has no order. See the other comments in this question. – usr Jan 20 '13 at 14:05
I do understand table has no order in this scenario. The point is rows can be inserted using INSERT INTO with a sub query that can be ordered. – Geo2013 Jan 20 '13 at 14:38
A simple observation: In this case, the `ORDER BY` clause is for `SELECT VibeFGEvents.* FROM ...` statement and not for `INSERT` statement. – Bogdan Sahlean Jan 20 '13 at 14:58
It does not matter what order you insert in. When selecting, that order is gone. You cannot get it "out" again. – usr Jan 20 '13 at 15:13
Except there is implicit ordering by the first column, which often can be a sequential integer identity primary key. – Bon Jul 20 '16 at 18:28
Using an `ORDER BY` with an `INSERT` statement [appears to only matter](https://stackoverflow.com/a/56005568/2799848) when populating the `IDENTITY` column. – Elaskanator Oct 11 '19 at 21:09
