Remove invisible null characters a string's ending
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
For an unknown reason, many strings in one of my VARCHAR(1000) columns have been terminated with invisible characters.
declare @BrokenString varbinary(max)=0x6D0079002000620075006700670065006400200073007400720069006E00670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003F003F00;
select cast(@BrokenString as nvarchar(max)) -- returns 'my bugged string'
select cast(@BrokenString as nvarchar(max)) + ' is bugged' -- still returns 'my bugged string' !
declare @BrokenStringTable table (Brokey nvarchar(max));
insert into @BrokenStringTable
select cast(@BrokenString as nvarchar(max));
select * from @BrokenStringTable for json auto;
The output from the select * from @BrokenStringTable for json auto;
statement looks like this :
["Brokey":"my bugged stringu0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000??"]
How can I detect which records in my table contains these characters ? It seems using charindex, where+like and any other normal solutions just don't work with these.
sql-server sql-server-2016 varchar
add a comment |
up vote
2
down vote
favorite
For an unknown reason, many strings in one of my VARCHAR(1000) columns have been terminated with invisible characters.
declare @BrokenString varbinary(max)=0x6D0079002000620075006700670065006400200073007400720069006E00670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003F003F00;
select cast(@BrokenString as nvarchar(max)) -- returns 'my bugged string'
select cast(@BrokenString as nvarchar(max)) + ' is bugged' -- still returns 'my bugged string' !
declare @BrokenStringTable table (Brokey nvarchar(max));
insert into @BrokenStringTable
select cast(@BrokenString as nvarchar(max));
select * from @BrokenStringTable for json auto;
The output from the select * from @BrokenStringTable for json auto;
statement looks like this :
["Brokey":"my bugged stringu0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000??"]
How can I detect which records in my table contains these characters ? It seems using charindex, where+like and any other normal solutions just don't work with these.
sql-server sql-server-2016 varchar
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
For an unknown reason, many strings in one of my VARCHAR(1000) columns have been terminated with invisible characters.
declare @BrokenString varbinary(max)=0x6D0079002000620075006700670065006400200073007400720069006E00670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003F003F00;
select cast(@BrokenString as nvarchar(max)) -- returns 'my bugged string'
select cast(@BrokenString as nvarchar(max)) + ' is bugged' -- still returns 'my bugged string' !
declare @BrokenStringTable table (Brokey nvarchar(max));
insert into @BrokenStringTable
select cast(@BrokenString as nvarchar(max));
select * from @BrokenStringTable for json auto;
The output from the select * from @BrokenStringTable for json auto;
statement looks like this :
["Brokey":"my bugged stringu0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000??"]
How can I detect which records in my table contains these characters ? It seems using charindex, where+like and any other normal solutions just don't work with these.
sql-server sql-server-2016 varchar
For an unknown reason, many strings in one of my VARCHAR(1000) columns have been terminated with invisible characters.
declare @BrokenString varbinary(max)=0x6D0079002000620075006700670065006400200073007400720069006E00670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003F003F00;
select cast(@BrokenString as nvarchar(max)) -- returns 'my bugged string'
select cast(@BrokenString as nvarchar(max)) + ' is bugged' -- still returns 'my bugged string' !
declare @BrokenStringTable table (Brokey nvarchar(max));
insert into @BrokenStringTable
select cast(@BrokenString as nvarchar(max));
select * from @BrokenStringTable for json auto;
The output from the select * from @BrokenStringTable for json auto;
statement looks like this :
["Brokey":"my bugged stringu0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000u0000??"]
How can I detect which records in my table contains these characters ? It seems using charindex, where+like and any other normal solutions just don't work with these.
sql-server sql-server-2016 varchar
sql-server sql-server-2016 varchar
asked 3 hours ago
A_V
642715
642715
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
3
down vote
One of the reasons I vastly prefer convert()
over cast()
is that convert()
is much more extensible. For example, you can use a style number to convert a binary value to a string as is. So if 3F00
is always the problematic character:
SELECT CASE
WHEN CONVERT(nvarchar(max), @BrokenString, 1) LIKE N'%3F00'
THEN 'borked' END;
Result:
borked
So you can find all the offending rows (this will not set any speed records) using:
SELECT ... FROM dbo.table
WHERE CONVERT(nvarchar(max), column) LIKE N'%3F00';
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
add a comment |
up vote
0
down vote
After looking at Aaron's answer, I found a way to remove all of the 0x000 u0000 null characters. This technique uses a bunch of converts as varchar(max) and as Aaron said won't set any speed records but it worked fine.
I wrapped it inside two functions so it ends up being used like this :
select BrokenStringColumn
,dbo.RemoveNullCharacters(BrokenStringColumn) 'FixedColumn'
from BrokenTable
where dbo.ContainsNullCharacters(BrokenStringColumn)=1
The FixedColumn
will contain the entire string without the null characters which previously broke the correct display of my string.
Here is an example
declare @BadString nvarchar(max)= convert(nvarchar(max),0x6d0000007900)
,@GoodString nvarchar(max)= convert(nvarchar(max),0x6d007900)
select @BadString 'badstring' -- result : 'm'
,@GoodString 'goodstring' -- result : 'my'
,@BadString+ 'test' 'failed_concat' -- result : 'm'
,dbo.ContainsNullCharacters(@BadString) 'cb' -- result : 1
,dbo.ContainsNullCharacters(@GoodString) 'cg' -- result : 0
,dbo.RemoveNullCharacters(@BadString)+' test' -- result : 'my test'
Here is the code for the functions
CREATE FUNCTION [dbo].RemoveNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS nvarchar(max)
AS
BEGIN
RETURN convert(nvarchar(max),convert(varbinary(max),replace(convert(varbinary(max),@String),0x0000,0x)))
END
GO
CREATE FUNCTION [dbo].ContainsNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS bit
AS
BEGIN
RETURN case when CHARINDEX(0x0000,convert(varbinary(max),@String))>0 then 1 else 0 end
END
1
Simply looking for and deleting0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.
– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
One of the reasons I vastly prefer convert()
over cast()
is that convert()
is much more extensible. For example, you can use a style number to convert a binary value to a string as is. So if 3F00
is always the problematic character:
SELECT CASE
WHEN CONVERT(nvarchar(max), @BrokenString, 1) LIKE N'%3F00'
THEN 'borked' END;
Result:
borked
So you can find all the offending rows (this will not set any speed records) using:
SELECT ... FROM dbo.table
WHERE CONVERT(nvarchar(max), column) LIKE N'%3F00';
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
add a comment |
up vote
3
down vote
One of the reasons I vastly prefer convert()
over cast()
is that convert()
is much more extensible. For example, you can use a style number to convert a binary value to a string as is. So if 3F00
is always the problematic character:
SELECT CASE
WHEN CONVERT(nvarchar(max), @BrokenString, 1) LIKE N'%3F00'
THEN 'borked' END;
Result:
borked
So you can find all the offending rows (this will not set any speed records) using:
SELECT ... FROM dbo.table
WHERE CONVERT(nvarchar(max), column) LIKE N'%3F00';
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
add a comment |
up vote
3
down vote
up vote
3
down vote
One of the reasons I vastly prefer convert()
over cast()
is that convert()
is much more extensible. For example, you can use a style number to convert a binary value to a string as is. So if 3F00
is always the problematic character:
SELECT CASE
WHEN CONVERT(nvarchar(max), @BrokenString, 1) LIKE N'%3F00'
THEN 'borked' END;
Result:
borked
So you can find all the offending rows (this will not set any speed records) using:
SELECT ... FROM dbo.table
WHERE CONVERT(nvarchar(max), column) LIKE N'%3F00';
One of the reasons I vastly prefer convert()
over cast()
is that convert()
is much more extensible. For example, you can use a style number to convert a binary value to a string as is. So if 3F00
is always the problematic character:
SELECT CASE
WHEN CONVERT(nvarchar(max), @BrokenString, 1) LIKE N'%3F00'
THEN 'borked' END;
Result:
borked
So you can find all the offending rows (this will not set any speed records) using:
SELECT ... FROM dbo.table
WHERE CONVERT(nvarchar(max), column) LIKE N'%3F00';
edited 1 hour ago
answered 2 hours ago
Aaron Bertrand♦
148k18279476
148k18279476
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
add a comment |
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
Thanks I managed to detect them, off to clean them
– A_V
2 hours ago
add a comment |
up vote
0
down vote
After looking at Aaron's answer, I found a way to remove all of the 0x000 u0000 null characters. This technique uses a bunch of converts as varchar(max) and as Aaron said won't set any speed records but it worked fine.
I wrapped it inside two functions so it ends up being used like this :
select BrokenStringColumn
,dbo.RemoveNullCharacters(BrokenStringColumn) 'FixedColumn'
from BrokenTable
where dbo.ContainsNullCharacters(BrokenStringColumn)=1
The FixedColumn
will contain the entire string without the null characters which previously broke the correct display of my string.
Here is an example
declare @BadString nvarchar(max)= convert(nvarchar(max),0x6d0000007900)
,@GoodString nvarchar(max)= convert(nvarchar(max),0x6d007900)
select @BadString 'badstring' -- result : 'm'
,@GoodString 'goodstring' -- result : 'my'
,@BadString+ 'test' 'failed_concat' -- result : 'm'
,dbo.ContainsNullCharacters(@BadString) 'cb' -- result : 1
,dbo.ContainsNullCharacters(@GoodString) 'cg' -- result : 0
,dbo.RemoveNullCharacters(@BadString)+' test' -- result : 'my test'
Here is the code for the functions
CREATE FUNCTION [dbo].RemoveNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS nvarchar(max)
AS
BEGIN
RETURN convert(nvarchar(max),convert(varbinary(max),replace(convert(varbinary(max),@String),0x0000,0x)))
END
GO
CREATE FUNCTION [dbo].ContainsNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS bit
AS
BEGIN
RETURN case when CHARINDEX(0x0000,convert(varbinary(max),@String))>0 then 1 else 0 end
END
1
Simply looking for and deleting0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.
– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
add a comment |
up vote
0
down vote
After looking at Aaron's answer, I found a way to remove all of the 0x000 u0000 null characters. This technique uses a bunch of converts as varchar(max) and as Aaron said won't set any speed records but it worked fine.
I wrapped it inside two functions so it ends up being used like this :
select BrokenStringColumn
,dbo.RemoveNullCharacters(BrokenStringColumn) 'FixedColumn'
from BrokenTable
where dbo.ContainsNullCharacters(BrokenStringColumn)=1
The FixedColumn
will contain the entire string without the null characters which previously broke the correct display of my string.
Here is an example
declare @BadString nvarchar(max)= convert(nvarchar(max),0x6d0000007900)
,@GoodString nvarchar(max)= convert(nvarchar(max),0x6d007900)
select @BadString 'badstring' -- result : 'm'
,@GoodString 'goodstring' -- result : 'my'
,@BadString+ 'test' 'failed_concat' -- result : 'm'
,dbo.ContainsNullCharacters(@BadString) 'cb' -- result : 1
,dbo.ContainsNullCharacters(@GoodString) 'cg' -- result : 0
,dbo.RemoveNullCharacters(@BadString)+' test' -- result : 'my test'
Here is the code for the functions
CREATE FUNCTION [dbo].RemoveNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS nvarchar(max)
AS
BEGIN
RETURN convert(nvarchar(max),convert(varbinary(max),replace(convert(varbinary(max),@String),0x0000,0x)))
END
GO
CREATE FUNCTION [dbo].ContainsNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS bit
AS
BEGIN
RETURN case when CHARINDEX(0x0000,convert(varbinary(max),@String))>0 then 1 else 0 end
END
1
Simply looking for and deleting0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.
– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
add a comment |
up vote
0
down vote
up vote
0
down vote
After looking at Aaron's answer, I found a way to remove all of the 0x000 u0000 null characters. This technique uses a bunch of converts as varchar(max) and as Aaron said won't set any speed records but it worked fine.
I wrapped it inside two functions so it ends up being used like this :
select BrokenStringColumn
,dbo.RemoveNullCharacters(BrokenStringColumn) 'FixedColumn'
from BrokenTable
where dbo.ContainsNullCharacters(BrokenStringColumn)=1
The FixedColumn
will contain the entire string without the null characters which previously broke the correct display of my string.
Here is an example
declare @BadString nvarchar(max)= convert(nvarchar(max),0x6d0000007900)
,@GoodString nvarchar(max)= convert(nvarchar(max),0x6d007900)
select @BadString 'badstring' -- result : 'm'
,@GoodString 'goodstring' -- result : 'my'
,@BadString+ 'test' 'failed_concat' -- result : 'm'
,dbo.ContainsNullCharacters(@BadString) 'cb' -- result : 1
,dbo.ContainsNullCharacters(@GoodString) 'cg' -- result : 0
,dbo.RemoveNullCharacters(@BadString)+' test' -- result : 'my test'
Here is the code for the functions
CREATE FUNCTION [dbo].RemoveNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS nvarchar(max)
AS
BEGIN
RETURN convert(nvarchar(max),convert(varbinary(max),replace(convert(varbinary(max),@String),0x0000,0x)))
END
GO
CREATE FUNCTION [dbo].ContainsNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS bit
AS
BEGIN
RETURN case when CHARINDEX(0x0000,convert(varbinary(max),@String))>0 then 1 else 0 end
END
After looking at Aaron's answer, I found a way to remove all of the 0x000 u0000 null characters. This technique uses a bunch of converts as varchar(max) and as Aaron said won't set any speed records but it worked fine.
I wrapped it inside two functions so it ends up being used like this :
select BrokenStringColumn
,dbo.RemoveNullCharacters(BrokenStringColumn) 'FixedColumn'
from BrokenTable
where dbo.ContainsNullCharacters(BrokenStringColumn)=1
The FixedColumn
will contain the entire string without the null characters which previously broke the correct display of my string.
Here is an example
declare @BadString nvarchar(max)= convert(nvarchar(max),0x6d0000007900)
,@GoodString nvarchar(max)= convert(nvarchar(max),0x6d007900)
select @BadString 'badstring' -- result : 'm'
,@GoodString 'goodstring' -- result : 'my'
,@BadString+ 'test' 'failed_concat' -- result : 'm'
,dbo.ContainsNullCharacters(@BadString) 'cb' -- result : 1
,dbo.ContainsNullCharacters(@GoodString) 'cg' -- result : 0
,dbo.RemoveNullCharacters(@BadString)+' test' -- result : 'my test'
Here is the code for the functions
CREATE FUNCTION [dbo].RemoveNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS nvarchar(max)
AS
BEGIN
RETURN convert(nvarchar(max),convert(varbinary(max),replace(convert(varbinary(max),@String),0x0000,0x)))
END
GO
CREATE FUNCTION [dbo].ContainsNullCharacters
(
@String NVARCHAR(MAX)
)
RETURNS bit
AS
BEGIN
RETURN case when CHARINDEX(0x0000,convert(varbinary(max),@String))>0 then 1 else 0 end
END
edited 1 hour ago
answered 1 hour ago
A_V
642715
642715
1
Simply looking for and deleting0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.
– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
add a comment |
1
Simply looking for and deleting0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.
– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
1
1
Simply looking for and deleting
0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.– Andriy M
1 hour ago
Simply looking for and deleting
0x0000
may result in data corruption. I think the search needs to be a little more sophisticated.– Andriy M
1 hour ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
Arggh. You're right i.imgur.com/2oIq2mm.png Now can I fix this
– A_V
50 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
A 0x0000 sequence is a null character only if the position it's found at is an odd number. So the check function will be easy to fix. Edit: actually no, it probably won't be, I was too optimistic there. The removal function, not so much.
– Andriy M
13 mins ago
add a comment |
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f222191%2fremove-invisible-null-characters-a-strings-ending%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password