Opened 8 years ago

Closed 7 years ago

#4217 closed defect (fixed)

url in text handling problem

Reported by: rabio
Owned by:
Priority: normal
Milestone: 0.13
Component: usability
Version: hg
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
OS: All

Description

Not all URLs in messages are converted to clickable links (with libsexy). See the attached screenshot for an example.

Attachments (1)

screenshot.png (90.1 KB) - added by rabio 8 years ago.
screenshot for libsexy problem


Change History (9)

Changed 8 years ago by rabio

screenshot for libsexy problem

comment:1 Changed 8 years ago by asterix

libsexy is not used here, only in the chat window's banner. But all your URLs work for me with the latest svn. Could you test again? Things have changed recently.

comment:2 Changed 8 years ago by rabio

Unfortunately, the problem still exists. I tried to find out which kinds of URLs are not handled, but it does not seem to be tied to any particular kind of URL.

When I received this message (XML code at paste.pocoo.org/show/83401/, posted there because of the spam protection mechanism), the URLs (1.) h t t p://rss.slashdot.org/~r/Slashdot/slashdot/~3/375217335/article.pl and (2.) h t t p://rss.slashdot.org/~r/Slashdot/slashdot/~3/375253688/article.pl weren't clickable; all the others were. (I added spaces after '' here to work around the spam protection mechanism.)

When I sent this again, the same URLs weren't clickable. But when I resent only part of it (the second Slashdot news item), URL (1.) was clickable. XML code: paste.pocoo.org/show/83400/

The same happened when the third message was sent alone: all URLs were made clickable. With the second and third sent together, the first URL of the third news item (California's Wireless...) was not clickable.

NB: it does not recognize strings that do not start with 'www' or 'http://' as URLs at all.

Can you tell me what is used for URL handling/recognition in the Gajim code?

Thanks in advance

comment:3 Changed 8 years ago by asterix

There is indeed a problem with the URL regex: some / are treated as italic markers instead of as part of the URL in http://

erlehmann?

comment:4 Changed 8 years ago by erlehmann

if ascii formatting is off, stuff works.

comment:5 Changed 8 years ago by asterix

formatting regex has to be updated to detect "/test/ /test/" but not " /testtest/"

a space (or beginning of line) is needed before the first / and a space (or end of line) is needed after the second /, I think

this is the regex:

r'(?<!\w|\<)' r'/[^\s/]' r'([^/]*[^\s/])?' r'/(?!\w)|'

but I'm not familiar enough with regex
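A minimal sketch of how the quoted pattern behaves on one of the URLs from comment:2, assuming it is compiled on its own with Python's `re` module. The trailing `|` is dropped so that the fragment stands alone, and `OLD_ITALIC` and `italic_spans` are hypothetical names used only for this illustration, not Gajim code.

```python
import re

# The pattern quoted above, minus the trailing "|" so it can be compiled alone.
OLD_ITALIC = re.compile(r'(?<!\w|\<)' r'/[^\s/]' r'([^/]*[^\s/])?' r'/(?!\w)')


def italic_spans(text):
    """Return every substring the pattern would treat as /italic/ markup."""
    return [m.group(0) for m in OLD_ITALIC.finditer(text)]


# One of the URLs reported as not clickable (without the anti-spam spaces).
url = 'http://rss.slashdot.org/~r/Slashdot/slashdot/~3/375217335/article.pl'

print(italic_spans('this is /italic/ text'))  # ['/italic/']
print(italic_spans(url))                      # ['/rss.slashdot.org/']
```

The second `/` of `http://` is accepted as an opening marker because the character before it is another `/`, which is not a word character, and the closing `/` is accepted because it is followed by `~`, which is not a word character either. That would explain why only some URLs were affected.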

comment:6 Changed 7 years ago by mcepl

The first part is a lookbehind for \w or \< . If I recall correctly, then \< means word boundary, and it cannot appear before / . So the first part effectively means that a word-constituent character cannot appear before the first /. If we wanted to forbid / as well, we would change it to r'(?<![\w/])' . If we want to allow space only, we would use r'(?<!\S)' .

The last part contains a similar construct: a lookahead. Again, we can replace \w by [\w/] or \S . Then there is a vertical bar; I think that it means that if no other match is found, an empty match at the beginning of the string is returned instead. I would suggest removing the vertical bar, but that would mean fixing the code handling the return value...
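Two points in this comment can be checked directly, assuming the pattern is used with Python's `re` module. There, `\<` is an escaped literal `<` rather than a word boundary, and a trailing `|` adds an empty alternative that matches wherever the first alternative fails. The names `BODY`, `with_bar` and `without_bar` are hypothetical, and compiling the fragment on its own is an assumption: in the real code it may be joined with further alternatives.

```python
import re

# In Python's re module, "\<" is just an escaped "<", not a word boundary.
assert re.fullmatch(r'\<', '<') is not None
assert re.search(r'\<', 'word') is None

# The italic pattern from comment:5, split from its trailing "|".
BODY = r'(?<!\w|\<)' r'/[^\s/]' r'([^/]*[^\s/])?' r'/(?!\w)'
with_bar = re.compile(BODY + '|')   # trailing "|" adds an empty alternative
without_bar = re.compile(BODY)

text = 'an /italic/ word'
first_with_bar = with_bar.search(text)
first_without_bar = without_bar.search(text)

print(repr(first_with_bar.group(0)))  # '' (empty match at position 0)
print(first_without_bar.group(0))     # /italic/
```

When the fragment is compiled alone, the empty alternative wins at position 0 even though an italic span occurs later in the string, so code that consumes the match has to cope with empty results.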

To sum up, try this:

r'(?<!\S)' r'/[^\s/]' r'([^/]*[^\s/])?' r'/(?!\S)|'

Feel free to contact me if this approximation is not what you wanted.
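The proposed pattern can be tried against the same input. This is a sketch only, assuming Python's `re` module and again dropping the trailing `|`; `NEW_ITALIC` and `italic_spans` are hypothetical names for this illustration.

```python
import re

# The proposed pattern, minus the trailing "|" so it can be compiled alone.
NEW_ITALIC = re.compile(r'(?<!\S)' r'/[^\s/]' r'([^/]*[^\s/])?' r'/(?!\S)')


def italic_spans(text):
    """Return every substring the pattern would treat as /italic/ markup."""
    return [m.group(0) for m in NEW_ITALIC.finditer(text)]


url = 'http://rss.slashdot.org/~r/Slashdot/slashdot/~3/375217335/article.pl'

print(italic_spans(url))                   # []
print(italic_spans('/test/ /test/'))       # ['/test/', '/test/']
print(italic_spans('see /this/, please'))  # []
```

Requiring whitespace (or the start or end of the string) on both sides keeps slashes inside URLs from being read as markers. The last call shows the trade-off: a closing `/` followed directly by punctuation is no longer recognized as italic.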

About identity: me is not him. Stepan Kasal speaking, abusing Matej Cepl's login. I hope the Snake will forgive me.

comment:7 Changed 7 years ago by johnny

  • Milestone set to 0.13

comment:8 Changed 7 years ago by Yann Leboulanger <asterix@…>

  • Resolution set to fixed
  • Status changed from new to closed

(In [59d3d109ab43]) [Stepan Kasal] fix italic detection with . Fixes #4217
