Author Topic: Bugs and things to fix  (Read 149874 times)

JoB

  • Mage of the Great Restructuring
  • Admiral of a Sunken Ship
  • ******
  • Posts: 4100
Re: Bugs and things to fix
« Reply #180 on: December 30, 2014, 12:30:39 PM »
New datapoint: I had another post with a URL triggering the problem and tried truncating the URL at the front and at the end until the post showed again. These two truncated versions work:

24.de/fan-artikel-und-praemien/ 791/splat-chili-spezial-zahnpasta-75ml?c=82

http://www.chili-shop24.de/fan-artikel-und-praemien/ 791/splat-chili-sp

(Hmmmm ... leaving the part after the "?" off leads to two strings of the exact same length ...) One more character and the post turns empty.

P.S.: Putting both into this post leads to an empty post, but the text reappeared when I clicked "edit" ... !?

P.P.S.: Same effect after adding the P.S. I put an extra space into the two truncated links in hopes that the post will turn readable then ...

P.P.P.S.: Presto. ::) I'd say that there's some unintended (word) length limit in some string handling within SMF ...
« Last Edit: December 30, 2014, 12:37:55 PM by JoB »
native: :de: secondary: :us: :fr:
:artd: :book1+: :book2: :book3: :book4: etc.
PGP Key 0xBEF02A15, Fingerprint C12C 53DC BB92 2FE5 9725  C1AE 5E0F F1AF BEF0 2A15

hushpiper

  • Slayer of Silence
  • Scout
  • *
    • Tumblr
  • steam engenius, you see
  • Posts: 327
Re: Bugs and things to fix
« Reply #181 on: December 30, 2014, 01:09:03 PM »
And the post I kept editing in and out of existence turned out to have a link making the difference. Which would IMHO point to a (parsing?) problem in SMF, rather than a corrupt DB ...

Good DB (backup) with current SMF vs. "broken" DB under a downgraded SMF ... ? Or did the updates come with a DB schema change preventing that?

I did check both database dumps: the database structure and schema are identical except for the auto_increment values, which I think shouldn't affect anything. I already tested two side-by-side installations with the exact same files that we have on the live forum: the working database works, and the non-working one doesn't. The only difference is the database.

So we're down to a setting or something stored somewhere in the database. My question is, if there's some kind of character limit at work, why's it showing up now instead of earlier...?

hushpiper

Re: Bugs and things to fix
« Reply #182 on: December 30, 2014, 01:10:09 PM »
Okay: from what I can tell, a random alphanumeric string of 72 characters or more in length beginning with http:// will result in a blank post. If it does not begin with http://, it does not get blanked, regardless of length.

ETA: Been testing to reproduce on this thread in the broken mirror of the site, since just modifying a post over and over seems to be really buggy and unreliable. In fact, the whole thing is not as consistently reproducible as I would like. So far:

Long strings beginning with these do NOT result in a blank post: http, a:, a://, asdf://, hsdf://, ftp:/

Long strings beginning with these DO result in a blank post: http://, https://, ftp://, http:/, http:, ftp:
« Last Edit: December 30, 2014, 01:55:38 PM by hushpiper »
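
For anyone who wants to repeat the prefix test, here is a minimal Python sketch that builds the candidate strings. The helper name `make_test_string`, the fixed seed, and the tail length of 72 are illustrative choices, not anything taken from SMF; the two prefix lists simply restate the observations reported above. The script only generates the strings; pasting them into a post on a test forum is still a manual step.

```python
import random
import string

# Prefixes as reported in the post above. The split into the two lists is the
# poster's observation; this script cannot verify it.
BLANKING = ["http://", "https://", "ftp://", "http:/", "http:", "ftp:"]
NOT_BLANKING = ["http", "a:", "a://", "asdf://", "hsdf://", "ftp:/"]


def make_test_string(prefix, tail_length, seed=0):
    """Return prefix followed by a reproducible random alphanumeric tail."""
    rng = random.Random(seed)  # fixed seed so reruns give the same string
    alphabet = string.ascii_letters + string.digits
    tail = "".join(rng.choice(alphabet) for _ in range(tail_length))
    return prefix + tail


for prefix in BLANKING + NOT_BLANKING:
    candidate = make_test_string(prefix, 72)
    # Print the total length next to the string so it can be logged per test.
    print(len(candidate), candidate)
```

Keeping the tail identical across prefixes (same seed) means the prefix is the only thing that changes from one test post to the next.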

Sunflower

  • Saraswati
  • Admiral of a Sunken Ship
  • *
  • Preferred pronouns: She/her
  • Posts: 4158
Re: Bugs and things to fix
« Reply #183 on: December 30, 2014, 01:18:45 PM »


I followed your link and got a "This Page is Not Available" error message on my Chrome browser.

IIUC, JoB suspects that the character length of the link could trigger The Silence?


EDIT:  That's weird, it didn't quote any of hushpiper's message (as you can see from the BBCode above).
"The music of what happens," said great Fionn, "that is the finest music in the world."
:chap3:  :chap4:  :chap5:  :book2:  :chap12:  :chap13:  :chap14:   :chap15:  :chap16:

Speak some:  :france:  :mexico:  :vaticancity:  Ein bisschen: :germany:

hushpiper

Re: Bugs and things to fix
« Reply #184 on: December 30, 2014, 01:30:34 PM »
I followed your link and got a "This Page is Not Available" error message on my Chrome browser.

IIUC, JoB suspects that the character length of the link could trigger The Silence?


EDIT:  That's weird, it didn't quote any of hushpiper's message (as you can see from the BBCode above).

Ah, that's cuz it wasn't a real link, it was a random string of characters with http:// jammed onto the front for testing. ;) And yes, you understand correctly. What I'm wondering now is how and if this relates to the issues with quotes and PMs. I'm trawling through my examples of broken posts now, so far they all have 72-character or longer links in them. Will report as I get more info. Those of you with missing PMs, do you know if they had links in them?

Sunflower

Re: Bugs and things to fix
« Reply #185 on: December 30, 2014, 01:52:25 PM »
What I'm wondering now is how and if this relates to the issues with quotes and PMs. I'm trawling through my examples of broken posts now, so far they all have 72-character or longer links in them.

Those of you with missing PMs, do you know if they had links in them?

I'm pretty sure all mine had links.  To quote a PM I sent Eich re: the problem:

I now have about *20* sent PMs eaten!!  They run from about Dec. 18 to the present -- but oddly, not EVERY message in that time frame -- about 1 in 10 is preserved.

Most of those, esp. on Dec. 22 and 23, were fairly routine:  "Merry Christmas, _____.  Here's your Christmas music playlist, at Dropbox (long URL follows)."  But a few were fairly lengthy personal correspondence -- so it would be nice to get them back. 


EVERY PM I sent that contained a Dropbox (i.e. long) URL has disappeared.  All my sent PMs that have *no* links are still OK (as far as I can tell -- I haven't done a complete inventory).  Sent PMs with short(ish) links seem OK -- though again, I haven't done a complete inventory. 

Do you think there would be a difference between an embedded link and one where you just paste in the URL? 
« Last Edit: December 30, 2014, 02:03:36 PM by Sunflower »

mithrysc

  • Ranger
  • ****
  • indefinite hiatus
  • Posts: 806
Re: Bugs and things to fix
« Reply #186 on: December 30, 2014, 01:55:59 PM »
I'm fairly certain all my missing PMs had links in them, since all of them were people turning in gifts for the Secret Santa/ me sending out said gifts. I currently have three PMs-with-links remaining, and all those have urls less than 72 characters long.

Sunflower

Re: Bugs and things to fix
« Reply #187 on: December 30, 2014, 02:04:26 PM »
What I'm wondering now is how and if this relates to the issues with quotes and PMs. I'm trawling through my examples of broken posts now, so far they all have 72-character or longer links in them.

Those of you with missing PMs, do you know if they had links in them?

I'm pretty sure all mine had links.  To quote a PM I sent Eich re: the problem:

I now have about *20* sent PMs eaten!!  They run from about Dec. 18 to the present -- but oddly, not EVERY message in that time frame -- about 1 in 10 is preserved.

Most of those, esp. on Dec. 22 and 23, were fairly routine:  "Merry Christmas, _____.  Here's your Christmas music playlist, at Dropbox (long URL follows)."  But a few were fairly lengthy personal correspondence -- so it would be nice to get them back. 


EVERY PM I sent that contained a Dropbox (i.e. long) URL has disappeared.  All my sent PMs that have *no* links are still OK (as far as I can tell -- I haven't done a complete inventory).  Sent PMs with short(ish) links seem OK -- though again, I haven't done a complete inventory. 

Do you think there would be a difference between an embedded link and one where you just paste in the URL? 

EDIT:

Whoooooaaa, I tried pasting in a long URL just for chuckles, and that wiped out the previous message.

But short links like the one quoted below don't seem to trigger the problem:

...I'd welcome you all with a key to the city, cake, and coffee (like proper Swedish fika).
« Last Edit: December 30, 2014, 02:09:21 PM by Sunflower »

Sunflower

Re: Bugs and things to fix
« Reply #188 on: December 30, 2014, 02:11:41 PM »
I'm fairly certain all my missing PMs had links in them, since all of them were people turning in gifts for the Secret Santa/ me sending out said gifts. I currently have three PMs-with-links remaining, and all those have urls less than 72 characters long.

So it seems like you've narrowed the problem down to PMs/posts with long links in them.

Do we know if *72* characters in the URL is the maximum?

hushpiper

Re: Bugs and things to fix
« Reply #189 on: December 30, 2014, 02:30:06 PM »
I grabbed the number "72" from the link JoB tested with above, which blanked out at 72 characters but showed at 71, from what he said. Testing on the broken mirror, a 64 character string with http:// on the beginning (72 chars) goes blank; a 63 character string with http:// on the beginning (71) does not.

...Except for when it does. Because it is not at all consistent, which makes it really hard to test different variables. So no, I wouldn't say we're sure, but there does seem to be some kind of correlation there.

ETA: Looks like 64 is the magic number, actually. ftp:// followed by 64 characters seems to consistently blank out, ftp:// followed by 63 doesn't. Same with http://.
« Last Edit: December 30, 2014, 03:38:11 PM by hushpiper »
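
The boundary described in the ETA can be pinned down with one pair of strings per scheme. This is a minimal sketch, assuming the limit applies to the part after the scheme; the helper `boundary_cases` and the filler of repeated `x` characters are hypothetical. Note that `http://` is 7 characters and `ftp://` is 6, so the same 64-character tail gives different total lengths for the two schemes.

```python
def boundary_cases(scheme, threshold=64):
    """Return (just_below, at_threshold) test strings for one scheme.

    threshold counts only the characters after the scheme; 64 is the value
    suggested in the thread, not a confirmed limit.
    """
    below = scheme + "x" * (threshold - 1)
    at = scheme + "x" * threshold
    return below, at


for scheme in ("http://", "ftp://"):
    below, at = boundary_cases(scheme)
    # Show tail length and total length side by side for each candidate.
    print(scheme, "tail 63 -> total", len(below), "| tail 64 -> total", len(at))
```

If both schemes flip at the same tail length but at different totals, that would point to the tail, not the whole string, being what gets measured.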

JoB

Re: Bugs and things to fix
« Reply #190 on: December 30, 2014, 03:41:28 PM »
What I'm wondering now is how and if this relates to the issues with quotes and PMs.
I note that in a normal quote (with the "[ quote ]" having "author", "link", and "date" parameters), the "Quote from ..." caption above the actual quote is automatically turned into a link ... with a URL typically 60+ characters long, slowly increasing as the (decimal) board and message IDs get more digits ...
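
To get a feel for how such caption links creep toward a length limit, here is a rough sketch. The host name `forum.example.org`, the helper `quote_link`, and the `index.php?topic=N.msgM#msgM` layout are assumptions for illustration only; the real forum's quote links may be built differently.

```python
def quote_link(base_url, topic_id, msg_id):
    """Build a quote-caption style link (assumed layout, see note above)."""
    return f"{base_url}?topic={topic_id}.msg{msg_id}#msg{msg_id}"


BASE = "http://forum.example.org/index.php"  # hypothetical host

for topic_id, msg_id in [(9, 99), (99, 9999), (999, 999999)]:
    url = quote_link(BASE, topic_id, msg_id)
    tail = url[len("http://"):]  # the part after the scheme
    print(len(url), len(tail), url)
```

In this layout the message ID appears twice, so every extra digit in it adds two characters to the link.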

Sunflower

Re: Bugs and things to fix
« Reply #191 on: December 30, 2014, 04:01:28 PM »

ETA: Looks like 64 is the magic number, actually. ftp:// followed by 64 characters seems to consistently blank out, ftp:// followed by 63 doesn't. Same with http://.

Huh, interesting.  64 is 2 to the 6th power -- could it have something to do with how the binary code on the underlying database is structured?  (I have no idea if my speculations are helpful.  I just used "Inspect Element" on my Chrome browser for the first time and it started showing me all this bizarre Web-wizardly stuff -- the underlying HTML code for this Forum, I suppose.  I am full of awe of anyone who can actually understand it.... but I'm going to back away quietly before I break something.)

JoB

Re: Bugs and things to fix
« Reply #192 on: December 30, 2014, 05:40:48 PM »
Huh, interesting.  64 is 2 to the 6th power -- could it have something to do with how the binary code on the underlying database is structured?
That would imply that the database tries to parse words / links out of the stored content of the entire post (and actually bases the storage format on the result). I haven't dug into any forum software so far, but I don't think I've ever seen anything much different from the DB storing a VARCHAR and the parsing being done in the application on top in such cases, not least because it would take years until changes to the parsing (say, recognizing https:// URLs in addition to http://) were reliably available in common databases.

That said (and not even knowing what language SMF is programmed in), there is parsing going on (to change URLs entered as plain text into links, if nothing else) and powers of two may have meaning there ...

... wait a second! 64 characters in full-blown Unicode coding occupy 4*64 = 256 = 2^8 bytes, i.e. anything beyond that makes a one-byte index roll over. Eich, any chance that the problem started at the same time as one of your attempts to convert the forum to Unicode?
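
The arithmetic can be checked in a few lines of Python. This is only a sketch of the numbers; the helper names are made up, and whether SMF has any one-byte counter of this kind is pure speculation at this point.

```python
def utf32_byte_length(text):
    """Bytes needed for text in UTF-32 (little-endian, no byte order mark)."""
    return len(text.encode("utf-32-le"))


def as_one_byte_counter(value):
    """Value an unsigned 8-bit counter would hold after counting up to value."""
    return value % 256


for char_count in (63, 64, 65):
    size = utf32_byte_length("a" * char_count)
    # Columns: characters, bytes in UTF-32, what a one-byte counter would show.
    print(char_count, size, as_one_byte_counter(size))
```

For 63 characters the counter still reads 252; at 64 characters the 256 bytes wrap it around to 0.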

hushpiper

Re: Bugs and things to fix
« Reply #193 on: December 30, 2014, 05:44:58 PM »
Yeah, I came to the same conclusion, JoB: Unicode might be the issue. I trawled through the code for anything that looked likely to affect URL parsing or post parsing, and I did find a UTF-8 conversion script that had a couple of 63-character cutoffs that looked likely, so I tried updating the mirror site to use UTF-8. (Updated the table collations to use Unicode, installed the UTF-8 English language pack.) It worked in terms of making alternative character sets work, so that was nice! :P Buuuuuut it didn't affect the blank posts. Maybe I missed something?

JoB

Re: Bugs and things to fix
« Reply #194 on: December 30, 2014, 06:00:36 PM »
I did find a utf8 conversion script that had a couple 63 character cutoffs that looked likely, so I tried updating the mirror site to use utf8. (Updated the table collations to use unicode, installed the utf8 english language pack.) It worked in terms of making alternative character sets work, so that was nice! :P Buuuuuut it didn't affect the blank posts. Maybe I missed something?
Converting SMF installations to Unicode has apparently been done in various, more or less different ways at different times, sometimes with two (maybe more?) official methods at the same time. I'm afraid that we can't be sure short of retracing Eich's exact steps ... which is one reason why I'd guess that trying to identify the problem in the code-as-is is still more likely to succeed than reproducing the timeline.

FWIW, UTF-8 uses only one byte per character for plain ASCII and two to four bytes for everything else. The "full-blown Unicode coding" with four bytes per character I mentioned above would be UTF-32.
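
A quick way to see the difference between the two encodings, as a small Python sketch (the sample characters and the helper `byte_lengths` are arbitrary choices for illustration):

```python
def byte_lengths(text):
    """Return (UTF-8 bytes, UTF-32 bytes) for text, without byte order marks."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


samples = [
    "a",           # plain ASCII letter
    "\u00e4",      # a with diaeresis
    "\u20ac",      # euro sign
    "\U0001F600",  # an emoji outside the Basic Multilingual Plane
]

for ch in samples:
    print(repr(ch), byte_lengths(ch))

# A 64-character ASCII-only URL tail: 64 bytes in UTF-8, 256 bytes in UTF-32.
print(byte_lengths("x" * 64))
```

So a plain ASCII URL only reaches 256 bytes at 64 characters if something in the chain really works with four bytes per character.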