Problem description
I am using MaxMind free databases to do IP lookups. I convert the data to the following table:
CREATE TABLE [dbo].[GeoBlocks](
    [StartIPNum] [varchar](50) NULL,
    [EndIPNumb] [varchar](50) NULL,
    [LocationNum] [varchar](50) NULL,
    [PostalCode] [varchar](50) NULL,
    [Latitude] [varchar](50) NULL,
    [Longitude] [varchar](50) NULL)
There are about 3.5M records in this lookup table.
My goal is to determine the LocationNum for an IP (in decimal form) by finding the record where the IP is between StartIPNum and EndIPNumb.
My stored procedure looks like this (parameter: @DecimalIP bigint):
select GeoBlocks.StartIPNum ,@DecimalIP as DecimalIp
    ,GeoBlocks.Postalcode ,GeoBlocks.Latitude as Latitude
    ,GeoBlocks.Longitude as Longitude
from GeoBlocks
where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb
I have created unique indexes on StartIPNum and EndIPNumb.
However, when I run this, SQL Server does a table scan for the WHERE portion of the query. This query takes 650-750 ms (most queries on my server take 0-2 ms).
How can I speed up this query?
Added sample data:
StartIPNum  EndIPNumb   LocationNum  PostalCode  Latitude  Longitude
1350218632  1350218639  2782113                  48.2000   16.3667
1350218640  1350218655  2782113                  48.2000   16.3667
1350218656  1350218687  2782113                  48.2000   16.3667
1350218688  1350218751  2782113                  48.2000   16.3667
1350218752  1350218783  2782113                  48.2000   16.3667
Recommended answer
Update:
To summarize information scattered among various comments:
The IP address columns are VarChar(50) strings containing decimal values without left padding. An index on those columns sorts them alphabetically, not numerically, i.e. "10" < "2". (With left padding, the alphabetical sort is numerically correct as well: "10" > "02".)
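The ordering problem is easy to see with two IP numbers of different lengths, here 16777216 (the integer form of 1.0.0.0) and 1350218632 from the sample data. This sketch compares them as plain strings, as numbers, and as left-padded strings; the variable names are made up for the illustration.

```sql
-- Illustrative values: 16777216 is 1.0.0.0, 1350218632 comes from the sample data.
DECLARE @Short varchar(50) = '16777216',
        @Long  varchar(50) = '1350218632';

SELECT
    -- Plain strings compare character by character: '6' > '3' at position 2.
    CASE WHEN @Short < @Long THEN 'less' ELSE 'greater' END AS AsStrings,   -- greater
    -- Numbers compare by value: 16777216 < 1350218632.
    CASE WHEN CAST(@Short AS bigint) < CAST(@Long AS bigint)
         THEN 'less' ELSE 'greater' END AS AsNumbers,                       -- less
    -- Left-padding to 10 digits ('0016777216') makes string order match numeric order.
    CASE WHEN RIGHT(REPLICATE('0', 10) + @Short, 10) < @Long
         THEN 'less' ELSE 'greater' END AS AsPaddedStrings;                 -- less
```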
The WHERE clause (where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb) uses mixed data types: @DecimalIP is a BIGINT while the two columns are VarChar(50). SQL Server handles operations among mixed data types by applying its data type precedence rules (see "Data Type Precedence" in the SQL Server documentation). Because BIGINT has higher precedence than VarChar, the IP addresses in every row are converted from strings to BIGINT values, so the comparison is done numerically and the "expected" results are returned, but at a considerable cost. The indexes are (all but) useless in this case.
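The effect of data type precedence shows up on a single pair of values. In this sketch (the variable names are made up), the same comparison gives opposite answers depending on whether one side is a BIGINT:

```sql
DECLARE @Number bigint      = 9,
        @Text   varchar(50) = '10';

SELECT
    -- bigint outranks varchar, so @Text is implicitly converted to bigint and 9 < 10.
    CASE WHEN @Number < @Text THEN 'true' ELSE 'false' END AS MixedTypes,   -- true
    -- With both sides as strings the comparison is alphabetical, and '9' > '10'.
    CASE WHEN CAST(@Number AS varchar(50)) < @Text
         THEN 'true' ELSE 'false' END AS BothStrings;                       -- false
```

In the question's query the same implicit conversion has to be applied to StartIPNum and EndIPNumb on every row before they can be compared, so the string-ordered indexes cannot be used to seek.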
Changing the columns to BIGINT allows an index to be used to improve performance and ensures that comparisons are done numerically rather than alphabetically. A single index containing both the StartIPNum and EndIPNumb columns will greatly improve performance. Note that if overlapping address ranges are not allowed, the index will effectively be unique on StartIPNum and could be replaced, for performance, with an index on StartIPNum that has EndIPNumb as an included column.
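A sketch of that change, run against a temporary copy of the sample rows so it is self-contained. It assumes the ranges never overlap; #GeoBlocks and IX_GeoBlocks_Start are illustrative names. On the real table the columns would be converted with ALTER TABLE dbo.GeoBlocks ALTER COLUMN StartIPNum bigint NULL (and likewise for EndIPNumb), which requires dropping the existing indexes on those columns first and assumes every stored value is a valid integer.

```sql
-- Temporary stand-in for dbo.GeoBlocks, with the IP columns as bigint.
CREATE TABLE #GeoBlocks (
    StartIPNum  bigint      NOT NULL,
    EndIPNumb   bigint      NOT NULL,
    LocationNum varchar(50) NULL,
    PostalCode  varchar(50) NULL,
    Latitude    varchar(50) NULL,
    Longitude   varchar(50) NULL
);

INSERT INTO #GeoBlocks (StartIPNum, EndIPNumb, LocationNum, PostalCode, Latitude, Longitude)
VALUES (1350218632, 1350218639, '2782113', NULL, '48.2000', '16.3667'),
       (1350218640, 1350218655, '2782113', NULL, '48.2000', '16.3667'),
       (1350218656, 1350218687, '2782113', NULL, '48.2000', '16.3667'),
       (1350218688, 1350218751, '2782113', NULL, '48.2000', '16.3667'),
       (1350218752, 1350218783, '2782113', NULL, '48.2000', '16.3667');

-- Unique because the ranges are assumed not to overlap; EndIPNumb is an included column.
CREATE UNIQUE INDEX IX_GeoBlocks_Start ON #GeoBlocks (StartIPNum) INCLUDE (EndIPNumb);

DECLARE @DecimalIP bigint = 1350218650;  -- inside the second sample range

-- Seek to the last range that starts at or before the address,
-- then keep it only if the address is not past that range's end.
SELECT g.LocationNum, g.PostalCode, g.Latitude, g.Longitude
FROM (
    SELECT TOP (1) *
    FROM #GeoBlocks
    WHERE StartIPNum <= @DecimalIP
    ORDER BY StartIPNum DESC
) AS g
WHERE g.EndIPNumb >= @DecimalIP;
```

With non-overlapping ranges the inner TOP (1) reads a single index row, whereas a plain BETWEEN on the two columns can still read every row whose StartIPNum is below the address. An address that falls in a gap between ranges returns no row.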
Original answer:
If you are using IPv4 addresses in dotted notation, e.g. "192.168.0.42", you can convert the strings into BIGINT values with this UDF:
create function [dbo].[IntegerIPV4Address]( @IPV4Address VarChar(16) )
returns BigInt
with SchemaBinding -- Deterministic function.
begin
    -- Positions of the three dots that separate the four octets.
    declare @Dot1 as Int = CharIndex( '.', @IPV4Address );
    declare @Dot2 as Int = CharIndex( '.', @IPV4Address, @Dot1 + 1 );
    declare @Dot3 as Int = CharIndex( '.', @IPV4Address, @Dot2 + 1 );
    -- Weight the octets by 0x1000000 (2^24), 0x10000 (2^16), 0x100 (2^8) and 1, then add them up.
    return Cast( Substring( @IPV4Address, 1, @Dot1 - 1 ) as BigInt ) * 0x1000000 +
        Cast( Substring( @IPV4Address, @Dot1 + 1, @Dot2 - @Dot1 - 1 ) as BigInt ) * 0x10000 +
        Cast( Substring( @IPV4Address, @Dot2 + 1, @Dot3 - @Dot2 - 1 ) as BigInt ) * 0x100 +
        Cast( Substring( @IPV4Address, @Dot3 + 1, Len( @IPV4Address ) - @Dot3 ) as BigInt );
end
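The arithmetic the function performs (first octet * 2^24 + second * 2^16 + third * 2^8 + fourth) can be checked without creating anything. This sketch swaps in the built-in PARSENAME, which splits a dotted name of up to four parts with part 1 being the rightmost, in place of the CharIndex/Substring logic above; it assumes a well-formed address.

```sql
DECLARE @IPV4Address varchar(15) = '192.168.0.42';

SELECT
      CAST(PARSENAME(@IPV4Address, 4) AS bigint) * 16777216  -- 192 * 2^24
    + CAST(PARSENAME(@IPV4Address, 3) AS bigint) * 65536     -- 168 * 2^16
    + CAST(PARSENAME(@IPV4Address, 2) AS bigint) * 256       --   0 * 2^8
    + CAST(PARSENAME(@IPV4Address, 1) AS bigint)             --  42
      AS IntegerAddress;                                      -- 3232235562
```

Once the function exists, SELECT dbo.IntegerIPV4Address('192.168.0.42') should return the same value.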
You can either store the integer values or create an index on a computed column based on the function's result. Note that you need to change your query to reference the integer column in the WHERE clause.
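A sketch of the computed-column option, assuming the table holds dotted addresses in StartIPNum and EndIPNumb and that dbo.IntegerIPV4Address has been created as above. The names StartIPInt, EndIPInt and IX_GeoBlocks_StartIPInt are hypothetical. Indexing a computed column that calls a UDF requires the function to be deterministic and schema-bound, which this one is; GO separates the batches so the new columns exist before the statements that use them are compiled.

```sql
-- Hypothetical computed columns exposing the integer form of each address.
ALTER TABLE dbo.GeoBlocks
    ADD StartIPInt AS dbo.IntegerIPV4Address(StartIPNum),
        EndIPInt   AS dbo.IntegerIPV4Address(EndIPNumb);
GO

-- Index the integer form instead of the string form.
CREATE INDEX IX_GeoBlocks_StartIPInt
    ON dbo.GeoBlocks (StartIPInt)
    INCLUDE (EndIPInt);
GO

-- The WHERE clause now compares bigint with bigint, and both values are stored in the index.
DECLARE @DecimalIP bigint = dbo.IntegerIPV4Address('192.168.0.42');

SELECT GeoBlocks.StartIPNum, GeoBlocks.PostalCode, GeoBlocks.Latitude, GeoBlocks.Longitude
FROM GeoBlocks
WHERE @DecimalIP BETWEEN GeoBlocks.StartIPInt AND GeoBlocks.EndIPInt;
```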
If you store the values as integers, the following function will convert them back to normalized strings in which each part of the address has three digits. These values can be used in comparisons because they sort the same way alphabetically and numerically.
create function [dbo].[NormalizedIPV4Address]( @IntegerIPV4Address as BigInt )
returns VarChar(16)
with SchemaBinding -- Deterministic function.
begin
    -- Converting BigInt to VarBinary(4) keeps the four low-order bytes, one byte per octet.
    declare @BinaryAddress as VarBinary(4) = Cast( @IntegerIPV4Address as VarBinary(4) );
    -- Turn each byte back into a number, left-pad it to three digits and join with dots.
    return Right( '00' + Cast( Cast( Substring( @BinaryAddress, 1, 1 ) as Int ) as VarChar(3) ), 3 ) +
        '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 2, 1 ) as Int ) as VarChar(3) ), 3 ) +
        '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 3, 1 ) as Int ) as VarChar(3) ), 3 ) +
        '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 4, 1 ) as Int ) as VarChar(3) ), 3 )
end
You could round-trip the string values in your table to get them all into "normalized" form so that they sort correctly by using both functions. Not an ideal solution since it requires that all future inserts and updates be normalized, but it may help for the moment.
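The round trip could be a single UPDATE, assuming both functions above exist and every stored value is a well-formed dotted IPv4 address. Lookups then pass the search value through the same two functions so that both sides of the comparison are normalized strings; @SearchIP is a hypothetical variable name.

```sql
-- Rewrite every address into the zero-padded form, e.g. '192.168.0.42' becomes '192.168.000.042'.
UPDATE dbo.GeoBlocks
SET StartIPNum = dbo.NormalizedIPV4Address(dbo.IntegerIPV4Address(StartIPNum)),
    EndIPNumb  = dbo.NormalizedIPV4Address(dbo.IntegerIPV4Address(EndIPNumb));

-- Normalize the search value the same way, so varchar is compared with varchar.
DECLARE @SearchIP varchar(16) = dbo.NormalizedIPV4Address(dbo.IntegerIPV4Address('192.168.0.42'));

SELECT GeoBlocks.StartIPNum, GeoBlocks.PostalCode, GeoBlocks.Latitude, GeoBlocks.Longitude
FROM GeoBlocks
WHERE @SearchIP BETWEEN GeoBlocks.StartIPNum AND GeoBlocks.EndIPNumb;
```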