utf8 bibtex file; should it be auto-recognized?

James Howison
Hi,

PLOS is publishing their bibtex in utf8 (as a downloaded .bib file).  
Which is fine, if one opens the file with utf8 encoding.  However when  
I double click it, BibDesk (1.3.18) gives the "Unable to parse string  
as BibTeX" error, which suggests editing, but not trying a different  
encoding.

I just wondered whether bibdesk ought to be able to assess the  
encoding of the file (TextMate seems to be able to), or whether this  
error message might suggest trying a different encoding?

The example bibtex downloads from here (download citation)

http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000108

Thanks,
James

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bibdesk-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bibdesk-users

Re: utf8 bibtex file; should it be auto-recognized?

James Howison-3
For the record, I hadn't read the previous thread while writing this :)




Re: utf8 bibtex file; should it be auto-recognized?

Maxwell, Adam R
In reply to this post by James Howison
On 08/21/08 12:34, "James Howison" <[hidden email]> wrote:

> PLOS is publishing their bibtex in utf8 (as a downloaded .bib file).
> Which is fine, if one opens the file with utf8 encoding.  However when
> I double click it, BibDesk (1.3.18) gives the "Unable to parse string
> as BibTeX" error, which suggests editing, but not trying a different
> encoding.

Try dropping the file on your document's main window, which I should have
suggested to JT as well.  That will force BibDesk to guess the encoding, and
UTF-8 will be tried if the file does not have a Unicode BOM (unless that's
changed in the last few months).  Double-clicking the file only uses your
default encoding.

> I just wondered whether bibdesk ought to be able to assess the
> encoding of the file (TextMate seems to be able to), or whether this
> error message might suggest trying a different encoding?

TextMate always tries UTF-8; since a file can't be misinterpreted as UTF-8,
this is safe (BibDesk does it as well, in the case I mentioned above).
Unfortunately, to try and guess encoding when opening a BibTeX document from
the Finder would be problematic with BibDesk's error display, among other
things, so it has to be specified by the user.

--
Adam
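
Adam's point about UTF-8 can be illustrated with a short sketch. It is written in Python, not taken from BibDesk's code, and the helper name `looks_like_utf8` is hypothetical. Strict UTF-8 decoding rejects most text written in a legacy 8-bit encoding, so a successful decode is strong (though not absolute) evidence that the file really is UTF-8. Pure ASCII always passes, because ASCII is a subset of UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "naïve".encode("utf-8")      # ï becomes the two bytes C3 AF
latin1_bytes = "naïve".encode("latin-1")  # ï becomes the single byte EF
ascii_bytes = b"plain ASCII"

print(looks_like_utf8(utf8_bytes))    # True
print(looks_like_utf8(latin1_bytes))  # False: EF must be followed by continuation bytes
print(looks_like_utf8(ascii_bytes))   # True: ASCII is valid UTF-8
```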



Re: utf8 bibtex file; should it be auto-recognized?

Christiaan Hofman
In reply to this post by James Howison
If you have set ASCII as the default encoding in the Files prefs, you  
can change that to UTF-8. If you get a warning when opening a file  
that was saved with ASCII encoding, you can safely ignore that.

BibDesk could try to guess the encoding of the file, but that would be  
wrong and lying to you. With lots of bad consequences, including files  
that may not save. Note that being able to open a file with a  
particular encoding is no guarantee that that's the right one. And if  
it isn't, you will have messed up text without knowing it, and you  
probably won't be able to save the file. That's why BibDesk always  
either fails or warns. Also note that, unlike TextMate, you don't  
really see the plain text that's downloaded.

Note that you can also use the Open... menu item to open a file with a  
particular encoding.

Christiaan
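
To make the "it opened, so it must be right" trap concrete, here is a small Python sketch. It is illustrative only and has nothing to do with BibDesk's implementation. The UTF-8 bytes for ï decode without any error under Mac Roman and Latin 1, but to different, wrong characters. Of the four encodings tried, only ASCII refuses.

```python
raw = "ï".encode("utf-8")  # the two bytes C3 AF

for encoding in ("utf-8", "mac_roman", "latin-1", "ascii"):
    try:
        print(encoding, "->", raw.decode(encoding))
    except UnicodeDecodeError:
        print(encoding, "-> cannot decode")

# utf-8     -> ï   (correct)
# mac_roman -> √Ø  (no error, wrong text)
# latin-1   -> Ã¯  (no error, wrong text)
# ascii     -> cannot decode
```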




Re: utf8 bibtex file; should it be auto-recognized?

James Howison-3
In reply to this post by Maxwell, Adam R

On Aug 21, 2008, at 4:28 PM, Maxwell, Adam R wrote:

> On 08/21/08 12:34, "James Howison" <[hidden email]> wrote:
>
>> PLOS is publishing their bibtex in utf8 (as a downloaded .bib file).
>> Which is fine, if one opens the file with utf8 encoding.  However  
>> when
>> I double click it, BibDesk (1.3.18) gives the "Unable to parse string
>> as BibTeX" error, which suggests editing, but not trying a different
>> encoding.
>
> Try dropping the file on your document's main window, which I should  
> have
> suggested to JT as well.  That will force BibDesk to guess the  
> encoding, and
> UTF-8 will be tried if the file does not have a Unicode BOM (unless  
> that's
> changed in the last few months).  Double-clicking the file only uses  
> your
> default encoding.

Dropping the file I linked to does import the entry, but it produces a  
different (wrong) result (the umlauted i char is messed up) than using  
the open-with encoding option.

>> I just wondered whether bibdesk ought to be able to assess the
>> encoding of the file (TextMate seems to be able to), or whether this
>> error message might suggest trying a different encoding?
>
> TextMate always tries UTF-8; since a file can't be misinterpreted as  
> UTF-8,
> this is safe (BibDesk does it as well, in the case I mentioned above).
> Unfortunately, to try and guess encoding when opening a BibTeX  
> document from
> the Finder would be problematic with BibDesk's error display, among  
> other
> things, so it has to be specified by the user.

Christiaan wrote:

> If you have set ASCII as the default encoding in the Files prefs, you
> can change that to UTF-8. If you get a warning when opening a file
> that was saved with ASCII encoding, you can safely ignore that.

Yes, that works, the file now opens with a double click (and the ï  
char shows up properly).

> BibDesk could try to guess the encoding of the file, but that would be
> wrong and lying to you. With lots of bad consequences, including files
> that may not save. Note that being able to open a file with a
> particular encoding is no guarantee that that's the right one. And if
> it isn't, you will have messed up text without knowing it, and you
> probably won't be able to save the file. That's why BibDesk always
> either fails or warns. Also note that, unlike TextMate, you don't
> really see the plain text that's downloaded.
>
> Note that you can also use the Open... menu item to open a file with a
> particular encoding.

Perhaps the error display dialog could simply suggest "You could try  
opening this file with a different encoding"?

The current message is:

"There was a problem reading the file.  Do you want to give up, edit  
the file to correct the errors, or keep going with everything that  
could be analyzed?"

I suggest:

"There was a problem reading the file.  Do you want to give up, edit  
the file to correct the errors, keep going with everything that could  
be analyzed, or try to open the file after specifying a different  
encoding?"

and adding an "Open With Encoding" button, which goes to the regular  
Open dialog box.

--J




Re: utf8 bibtex file; should it be auto-recognized?

Christiaan Hofman

On 21 Aug 2008, at 11:18 PM, James Howison wrote:

>
> On Aug 21, 2008, at 4:28 PM, Maxwell, Adam R wrote:
>
>> On 08/21/08 12:34, "James Howison" <[hidden email]> wrote:
>>
>>> PLOS is publishing their bibtex in utf8 (as a downloaded .bib file).
>>> Which is fine, if one opens the file with utf8 encoding.  However
>>> when
>>> I double click it, BibDesk (1.3.18) gives the "Unable to parse  
>>> string
>>> as BibTeX" error, which suggests editing, but not trying a different
>>> encoding.
>>
>> Try dropping the file on your document's main window, which I should
>> have
>> suggested to JT as well.  That will force BibDesk to guess the
>> encoding, and
>> UTF-8 will be tried if the file does not have a Unicode BOM (unless
>> that's
>> changed in the last few months).  Double-clicking the file only uses
>> your
>> default encoding.
>
> Dropping the file I linked to does import the entry, but it produces a
> different (wrong) result (the umlauted i char is messed up) than using
> the open-with encoding option.

It probably used Unicode, because that's tried before UTF-8. Shows my  
point that you can't just trust it only because it didn't fail.

>
> Perhaps the error display dialog could simply suggest "You could try
> opening this file with a different encoding"?
>
> The current message is:
>
> "There was a problem reading the file.  Do you want to give up, edit
> the file to correct the errors, or keep going with everything that
> could be analyzed?"
>
> I suggest:
>
> "There was a problem reading the file.  Do you want to give up, edit
> the file to correct the errors, keep going with everything that could
> be analyzed, or try to open the file after specifying a different
> encoding?"
>
> and adding an "Open With Encoding" button, which goes to the regular
> Open dialog box.
>
> --J

We can't offer that option, as the document has already failed at that
point. At that point there's no way back to try again. (Well, there
might be by completely rewriting the document-based architecture,
but that's not an option.)

Christiaan



Re: utf8 bibtex file; should it be auto-recognized?

James Howison-3

On Aug 21, 2008, at 5:38 PM, Christiaan Hofman wrote:

>
> We can't offer that option, as the document has already failed at that
> point. At that point there's no way back to try again. (Well, there
> might be by completely rewriting the document-based architecture,
> but that's not an option.)

It's not possible to open a file dialog with that file selected?  Fair  
enough, well, maybe just a note?

btw, PLOS is asking me (I reported a bug) whether they could write a BOM
or something similar to make the encoding detectable for double-click
opening?

--J

Re: utf8 bibtex file; should it be auto-recognized?

Christiaan Hofman

On 22 Aug 2008, at 12:20 AM, James Howison wrote:

>
> On Aug 21, 2008, at 5:38 PM, Christiaan Hofman wrote:
>
>>
>> We can't offer that option, as the document has already failed at that
>> point. At that point there's no way back to try again. (Well, there
>> might be by completely rewriting the document-based architecture,
>> but that's not an option.)
>
> It's not possible to open a file dialog with that file selected?  Fair
> enough, well, maybe just a note?
>

The message is already pretty long and complex.

> btw, PLOS is asking me (I reported a bug) whether they could write a BOM
> or something similar to make the encoding detectable for double-click
> opening?
>
> --J

BOM is only for Unicode. There is no general way to note an encoding.  
Encodings is a pretty stupid system, I'm sure if it were to be  
designed from scratch it would be very different.

Christiaan
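
For reference, a BOM is just a fixed byte prefix, so checking for one is mechanical. The Python sketch below is illustrative: `sniff_bom` is a hypothetical helper, not a BibDesk function. The Unicode standard also defines an optional UTF-8 signature (EF BB BF), but this thread does not establish whether BibDesk honours it. Legacy encodings such as Mac Roman or Latin 1 have no equivalent marker at all.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a Python codec name implied by a leading BOM, or None.

    UTF-32 BOMs are deliberately ignored in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the EF BB BF signature
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"@article{x}"))  # utf-8-sig
print(sniff_bom("@article{x}".encode("utf-16")))    # utf-16
print(sniff_bom(b"@article{x}"))                    # None
```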



Re: utf8 bibtex file; should it be auto-recognized?

Maxwell, Adam R
In reply to this post by Christiaan Hofman
On 08/21/08 14:38, "Christiaan Hofman" <[hidden email]> wrote:

>
> On 21 Aug 2008, at 11:18 PM, James Howison wrote:
>
>>
>> Dropping the file I linked to does import the entry, but it produces a
>> different (wrong) result (the umlauted i char is messed up) than using
>> the open-with encoding option.
>
> It probably used Unicode, because that's tried before UTF-8. Shows my
> point that you can't just trust it only because it didn't fail.

It only uses Unicode if the file has the appropriate BOM, and UTF-8 must not
have that.  James, what is the encoding of the document you dropped the file
on?  If it's Mac Roman or Latin 1, it's probably "succeeding" with that
encoding and never tries UTF-8.  Mac Roman is gapless so you'll always get
something out of it.

>> "There was a problem reading the file.  Do you want to give up, edit
>> the file to correct the errors, or keep going with everything that
>> could be analyzed?"
>>
>> I suggest:
>>
>> "There was a problem reading the file.  Do you want to give up, edit
>> the file to correct the errors, keep going with everything that could
>> be analyzed, or try to open the file after specifying a different
>> encoding?"
>>
>> and adding an "Open With Encoding" button, which goes to the regular
>> Open dialog box.
>>
>> --J
>
> We can't offer that option, as the document has already failed at that
> point. At that point there's no way back to try again. (Well, there
> might be by completely rewriting the document-based architecture,
> but that's not an option.)

That's a modal panel that runs before returning yes or no, so it hasn't
failed as far as the document is concerned.  I think the larger problem is
that it's a generic error message that may be shown for syntax and other
errors, not just encoding.

--
Adam



Re: utf8 bibtex file; should it be auto-recognized?

James Howison-3

On Aug 21, 2008, at 8:39 PM, Maxwell, Adam R wrote:

> On 08/21/08 14:38, "Christiaan Hofman" <[hidden email]> wrote:
>
>>
>> On 21 Aug 2008, at 11:18 PM, James Howison wrote:
>>
>>>
>>>
>>> Dropping the file I linked to does import the entry, but it  
>>> produces a
>>> different (wrong) result (the umlauted i char is messed up) than  
>>> using
>>> the open-with encoding option.
>>
>> It probably used Unicode, because that's tried before UTF-8. Shows my
>> point that you can't just trust it only because it didn't fail.
>
> It only uses Unicode if the file has the appropriate BOM, and UTF-8  
> must not
> have that.  James, what is the encoding of the document you dropped  
> the file
> on?  If it's Mac Roman or Latin 1, it's probably "succeeding" with  
> that
> encoding and never tries UTF-8.  Mac Roman is gapless so you'll  
> always get
> something out of it.
'ere tis:





AFAICS it's utf-8 (ok, that's what TextMate reports, and when opened
with that encoding the ï character (umlaut-i) shows up fine).

--J

10.1371%2Fjournal.pcbi.1000108.bib (3K) Download Attachment

Re: utf8 bibtex file; should it be auto-recognized?

Maxwell, Adam R
On 08/21/08 17:59, "James Howison" <[hidden email]> wrote:

>
> On Aug 21, 2008, at 8:39 PM, Maxwell, Adam R wrote:
>
>> On 08/21/08 14:38, "Christiaan Hofman" <[hidden email]> wrote:
>>  
>>> It probably used Unicode, because that's tried before UTF-8. Shows my
>>> point that you can't just trust it only because it didn't fail.
>>
>> It only uses Unicode if the file has the appropriate BOM, and UTF-8
>> must not have that.  James, what is the encoding of the document you
>> dropped the file on?  If it's Mac Roman or Latin 1, it's probably
>> "succeeding" with that encoding and never tries UTF-8.  Mac Roman is
>> gapless so you'll always get something out of it.
>
> AFAICS it's UTF-8 (OK, that's what TextMate reports), and when opened
> with that encoding the ï character (umlaut-i) shows up fine.

Okay, but what was the encoding of the .bib document you dropped this /on/?

When you drop a file, BibDesk guesses encoding in this order:

1) encoding of /destination/ document
2) extended attribute com.apple.TextEncoding
3) check for BOM; if present, use UTF-16 (big or little-endian, as appropriate)
4) try default C string encoding (typically Mac Roman)
5) try ISO Latin 1

So if your document uses Mac Roman, you'll never get past the first
condition.
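
To illustrate why a gapless encoding never "fails", here is a minimal sketch in Python (chosen only for illustration; it is not BibDesk code, and the codec names are Python's). It decodes the UTF-8 bytes for ï with Mac Roman and Latin 1, which both succeed with the wrong text, and shows that strict UTF-8 decoding rejects Latin 1 bytes outright.

```python
# Illustration only: a decode that "succeeds" is not proof of the right encoding.
utf8_bytes = "ï".encode("utf-8")  # two bytes: 0xC3 0xAF

# Mac Roman and Latin 1 map these byte values to characters,
# so decoding succeeds but produces the wrong text (mojibake).
as_mac_roman = utf8_bytes.decode("mac_roman")
as_latin1 = utf8_bytes.decode("latin-1")

# UTF-8 is strict: byte sequences that are not valid UTF-8 raise an error,
# which is why a failed UTF-8 decode is a meaningful signal.
latin1_bytes = "ï".encode("latin-1")  # single byte: 0xEF
try:
    latin1_bytes.decode("utf-8")
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True

print(repr(as_mac_roman), repr(as_latin1), utf8_rejected)
```

Both wrong decodes return two characters instead of one, which matches the "messed up" umlauted i described earlier in the thread.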

--
Adam



Re: utf8 bibtex file; should it be auto-recognized?

James Howison-3

On Aug 21, 2008, at 9:16 PM, Maxwell, Adam R wrote:

> On 08/21/08 17:59, "James Howison" <[hidden email]> wrote:
>>
>> On Aug 21, 2008, at 8:39 PM, Maxwell, Adam R wrote:
>>
>>> On 08/21/08 14:38, "Christiaan Hofman" <[hidden email]> wrote:
>>>
>>>> It probably used Unicode, because that's tried before UTF-8.  Shows my
>>>> point that you can't just trust it only because it didn't fail.
>>>
>>> It only uses Unicode if the file has the appropriate BOM, and UTF-8
>>> must not have that.  James, what is the encoding of the document you
>>> dropped the file on?  If it's Mac Roman or Latin 1, it's probably
>>> "succeeding" with that encoding and never tries UTF-8.  Mac Roman is
>>> gapless so you'll always get something out of it.
>>
>> AFAICS it's UTF-8 (OK, that's what TextMate reports), and when opened
>> with that encoding the ï character (umlaut-i) shows up fine.
>
> Okay, but what was the encoding of the .bib document you dropped this /on/?
>
> When you drop a file, BibDesk guesses encoding in this order:
>
> 1) encoding of /destination/ document
> 2) extended attribute com.apple.TextEncoding
> 3) check for BOM; if present, use UTF-16 (big or little-endian, as appropriate)
> 4) try default C string encoding (typically Mac Roman)
> 5) try ISO Latin 1
>
> So if your document uses Mac Roman, you'll never get past the first
> condition.

Sorry Adam, should have read more carefully :)

It was ASCII.  I confirmed that dropping it on a new bib, saved as  
UTF-8, meant that it was properly inserted.

It would be nice if com.apple.TextEncoding was set by Safari (at  
least) when it saved a document with UTF-8 headers.  But I checked and  
while it receives that header, it doesn't add the xattr:

xattr -l Downloads/10.1371%2Fjournal.pcbi.1000108.bib

com.apple.metadata:kMDItemWhereFroms:

com.apple.quarantine:

meh, encodings suck :)
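
For reference, when the attribute is present its value can be read with `xattr -p com.apple.TextEncoding <file>`. A rough Python sketch of interpreting such a value follows. It assumes the commonly seen form of an IANA charset name, a semicolon, and a numeric encoding ID (for example `utf-8;134217984`); the function name is hypothetical and this is not how BibDesk parses it.

```python
import codecs


def codec_from_text_encoding_xattr(value):
    """Map a com.apple.TextEncoding value to a Python codec name, or None.

    Assumes the value looks like "<IANA name>;<numeric id>"; only the
    name part before the semicolon is used here.
    """
    name = value.split(";", 1)[0].strip()
    if not name:
        return None
    try:
        # codecs.lookup normalizes aliases and case (e.g. "UTF-8" -> "utf-8").
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or unsupported charset name.
        return None


print(codec_from_text_encoding_xattr("utf-8;134217984"))
```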

--J



Re: utf8 bibtex file; should it be auto-recognized?

Adam R. Maxwell

On Aug 21, 2008, at 7:32 PM, James Howison wrote:

>
> On Aug 21, 2008, at 9:16 PM, Maxwell, Adam R wrote:
>
>> On 08/21/08 17:59, "James Howison" <[hidden email]> wrote:
>>>
>>> On Aug 21, 2008, at 8:39 PM, Maxwell, Adam R wrote:
>>>
>>>> On 08/21/08 14:38, "Christiaan Hofman" <[hidden email]> wrote:
>>>>
>>>>> It probably used Unicode, because that's tried before UTF-8.  Shows my
>>>>> point that you can't just trust it only because it didn't fail.
>>>>
>>>> It only uses Unicode if the file has the appropriate BOM, and UTF-8
>>>> must not have that.  James, what is the encoding of the document you
>>>> dropped the file on?  If it's Mac Roman or Latin 1, it's probably
>>>> "succeeding" with that encoding and never tries UTF-8.  Mac Roman is
>>>> gapless so you'll always get something out of it.
>>>
>>> AFAICS it's UTF-8 (OK, that's what TextMate reports), and when opened
>>> with that encoding the ï character (umlaut-i) shows up fine.
>>
>> Okay, but what was the encoding of the .bib document you dropped this /on/?
>>
>> When you drop a file, BibDesk guesses encoding in this order:
>>
>> 1) encoding of /destination/ document
>> 2) extended attribute com.apple.TextEncoding
>> 3) check for BOM; if present, use UTF-16 (big or little-endian, as appropriate)
>> 4) try default C string encoding (typically Mac Roman)
>> 5) try ISO Latin 1
>>
>> So if your document uses Mac Roman, you'll never get past the first
>> condition.
>
> Sorry Adam, should have read more carefully :)
>
> It was ASCII.  I confirmed that dropping it on a new bib, saved as
> UTF-8, meant that it was properly inserted.

Hmmm...yeah, that makes sense too, unfortunately, and I didn't think
of it at the time.  Sorry about that :(.  If you can compile from
source, try editing NSString_BDSKExtensions.m as follows:

- (NSString *)initWithContentsOfFile:(NSString *)path encoding:(NSStringEncoding)encoding guessEncoding:(BOOL)try
{
    if(self = [self init]){
        NSData *data = [[NSData alloc] initWithContentsOfFile:path options:NSMappedRead error:NULL];

        NSString *string = nil;

        // if we're guessing, try the reliable encodings first
        if(try && dataHasUnicodeByteOrderMark(data) && encoding != NSUnicodeStringEncoding)
            string = [[NSString alloc] initWithData:data encoding:NSUnicodeStringEncoding];
        if(try && nil == string && encoding != NSUTF8StringEncoding)
            string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

        // read com.apple.TextEncoding on Leopard, or when reading a Tiger file saved on Leopard
        if(try && nil == string) {
            // use a local variable so the encoding parameter is still intact for the checks below
            NSStringEncoding xattrEncoding = [[NSFileManager defaultManager] appleStringEncodingAtPath:path error:NULL];
            if (xattrEncoding > 0)
                string = [[NSString alloc] initWithData:data encoding:xattrEncoding];
        }

        // try the encoding passed as a parameter, if non-zero (zero encoding is never valid)
        if(nil == string && encoding > 0)
            string = [[NSString alloc] initWithData:data encoding:encoding];

        // now we just try a few wild guesses
        if(nil == string && try && encoding != [NSString defaultCStringEncoding])
            string = [[NSString alloc] initWithData:data encoding:[NSString defaultCStringEncoding]];
        if(nil == string && try && encoding != [BDSKStringEncodingManager defaultEncoding])
            string = [[NSString alloc] initWithData:data encoding:[BDSKStringEncodingManager defaultEncoding]];
        // final fallback is Mac Roman (gapless)
        if(nil == string && try && encoding != NSMacOSRomanStringEncoding)
            string = [[NSString alloc] initWithData:data encoding:NSMacOSRomanStringEncoding];

        [data release];
        [self release];
        self = string;
    }
    return self;
}

This changes the heuristic to check for UTF-16 (if there's a BOM), then
UTF-8, then com.apple.TextEncoding, then the supplied encoding parameter,
and finally to fall back on a wild guess (WAG).
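
The same "reliable encodings first" ordering can be sketched outside BibDesk. The following Python version is only an illustration under stated assumptions: the function name `guess_decode` and its `preferred` parameter are hypothetical, the com.apple.TextEncoding step is left out because it is macOS-specific, and the wild guesses are reduced to Mac Roman alone.

```python
import codecs


def guess_decode(data, preferred=None):
    """Return (text, encoding_name), trying the most reliable encodings first."""
    # 1) A UTF-16 byte order mark is a strong signal, so honor it first.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            return data.decode("utf-16"), "utf-16"
        except UnicodeDecodeError:
            pass
    # 2) Strict UTF-8 rejects most non-UTF-8 input, so a success is meaningful.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 3) Then the caller-supplied encoding, if any.
    if preferred is not None:
        try:
            return data.decode(preferred), preferred
        except (UnicodeDecodeError, LookupError):
            pass
    # 4) Last resort: Mac Roman, which decodes any byte sequence.
    return data.decode("mac_roman"), "mac_roman"


# A UTF-8 file is recognized even when the caller's preferred encoding is ASCII.
print(guess_decode("naïve".encode("utf-8"), preferred="ascii"))
```

Putting UTF-8 ahead of the supplied encoding is the design point here: a gapless or permissive preferred encoding would otherwise "succeed" on UTF-8 bytes and hide the problem.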

--
Adam