Upgrade from 2.8.3 to 2.10.0-latin1 to utf8

ismaelmagaw

I have the same problem. I went to phpmyadmin and changed everything paar utf8_general_ci on the left side and according to the database. In the database everything was utf8_general_ci ... but in the "general definitions" of phpmyadmin I tried to change utf8mb4_general_ci to utf8_general_ci but not fixed. How do I change and fix? I think it solved the problem. On the first attempt to update from 2.83 to 2.10 the WebsiteBaker only changed the pages, but in this new update I did changed the whole text on all pages. My language is Portuguese. Where words have accented has changed everything. I went to the administration of tools and in the module of the language I edited the PT and tried to change some errors that were there and were changing ... see:
/ Set the language information
$ language_code = 'PT';
$ language_name = 'English';
$ language_version = '2.8';
$ language_platform = '2.8.x';
$ language_author = 'Daniel Neto';
$ language_license = 'GNU General Public License';
$ MENU ['ACCESS'] = 'Access';
$ MENU ['ADDON'] = 'Add-on';
$ MENU ['ADDONS'] = 'Add-ons';
$ MENU ['ADMINTOOLS'] = 'Admin-Tools';
$ MENU ['BREADCRUMB'] = 'You are here:';
$ MENU ['FORGOT'] = 'Receive Login Details';
$ MENU ['GROUP'] = 'Group';
$ MENU ['GROUPS'] = 'Groups';
$ MENU ['HELP'] = 'Help';
$ MENU ['LANGUAGES'] = 'Languages';
$ MENU ['LOGIN'] = 'Login';
$ MENU ['LOGOUT'] = 'Log-out';
$ MENU ['MEDIA'] = 'Day';
$ MENU ['MODULES'] = 'Module';
$ MENU ['PAGES'] = 'Pages';
$ MENU ['PREFERENCES'] = 'Preferences';
$ MENU ['SETTINGS'] = 'Configuration & Conditions';
$ MENU ['START'] = 'In' ';
$ MENU ['TEMPLATES'] = 'Themes';
$ MENU ['USERS'] = 'User';
$ MENU ['VIEW'] = 'Preview';
$ TEXT ['ACCOUNT_SIGNUP'] = 'Account Signature';
$ TEXT ['ACTIONS'] = 'A & ccdil;';
$ TEXT ['ACTIVE'] = 'Active';
$ TEXT ['ADD'] = 'Add';
$ TEXT ['ADDON'] = 'Add-On';
$ TEXT ['ADD_SECTION'] = 'Add Sessions';
$ TEXT ['ADMIN'] = 'Admin';
$ TEXT ['ADMINISTRATION'] = 'Administration & ccdil;';
$ TEXT ['ADMINISTRATION_TOOL'] = 'Administration Tool';
$ TEXT ['ADMINISTRATOR'] = 'Administrator';
$ TEXT ['ADMINISTRATORS'] = 'Administrators';
$ TEXT ['ADVANCED'] = 'Advanced';
$ TEXT ['ALLOWED_FILETYPES_ON_UPLOAD'] = 'Allowed filetypes on upload';
$ TEXT ['ALLOWED_VIEWERS'] = 'Allowed Viewers';
$ TEXT ['ALLOW_MULTIPLE_SELECTIONS'] = 'Allow multiple selections';
$ TEXT ['ALL_WORDS'] = 'All Words';
$ TEXT ['ANCHOR'] = 'Anchor';
$ TEXT ['ANONYMOUS'] = 'Anomaly';
$ TEXT ['ANY_WORDS'] = 'Any Word';
$ TEXT ['APP_NAME'] = 'Application Name';
$ TEXT ['ARE_YOU_SURE'] = ' are you sure?';
$ TEXT ['AUTHOR'] = 'Author';
$ TEXT ['BACK'] = 'Return';
$ TEXT ['BACKUP'] = 'Backup';
$ TEXT ['BACKUP_ALL_TABLES'] = 'Backup of All Tables in the Database';
$ TEXT ['BACKUP_DATABASE'] = 'Database Backup';
$ TEXT ['BACKUP_MEDIA'] = 'Media Backup';
$ TEXT ['BACKUP_WB_SPECIFIC'] = 'Backup tables only';
$ TEXT ['BASIC'] = 'Basic';
$ TEXT ['BLOCK'] = 'Block';
$ TEXT ['CALENDAR'] = 'Calender';
$ TEXT ['CANCEL'] = 'Cancel';
$ TEXT ['CAN_DELETE_HIMSELF'] = 'Can delete himself';
$ TEXT ['CAPTCHA_VERIFICATION'] = 'Check & Capture';
$ TEXT ['CAP_EDIT_CSS'] = 'Edit CSS';
$ TEXT ['CHANGE'] = 'Change';
$ TEXT ['CHANGES'] = 'Change & ccdil;';
$ TEXT ['CHANGE_SETTINGS'] = 'Change Settings & Settings';
$ TEXT ['CHARSET'] = 'Charset';

I do not know what to do anymore, I think I'm going to return to 2.83 again. I've tried to do the update 3 times and this error always happens.

Gast

QuoteOnly thing that I found a problem with now is my email link

email-link has nothing to do with utf8. a mail-adress (not the maillink) is stored in the database as normal text, in wb the filter "email" search for a combination with mailto (or href), to discover links in the content. now, we have two way's
- you add a maillink in your content and want have a link with protected, encrypted mailadress like
<a href="javascript:mdcr('jiZngthfoFjbzf','')">My Linktext</a>

- you add a maillink in your content and want have a link without protected, encrypted mailadress like
<a href="mailto:uwe@jacobi22.de">My Linktext</a>

for solution 1, you have to load the file mdcr.js. in the original-package of WB 2.10.0, it was not possible to load this file automatically and only, if you need this encryption. the file was loaded everytime, when the filter was activated or generally loaded, also, if you dont need this option (i'm not sure)
but i remember, that Darkviper publish a new version, that load the mdcr.js only, if you need this to protect a mail. so, not sure, which version is active in your WB at the moment. for a test, you can use the file from the WB-Download-Package in modules/output_filter/filters/filterEmail.php. This have to work in every case

new version of this file are here -> http://project.websitebaker.org/projects/wb-2-10/repository/raw/branches/main/modules/output_filter/filters/filterEmail.php

contactjw

I found the problem with the email link. It wasn't related to the charset change. So as far as I can tell everything is working OK with all tables and fields changed to utf8_unicode_ci. Unless I come across anything down the road looks like everything is good to go. Thanks for your help with this jacobi.  (Y)

contactjw

OK got rid of the "?". All my tables and fields are now set to utf8. Only thing that I found a problem with now is my email link. Now this may have nothing to do with the character set changes, but I click on my email link and nothing happens. It was working before in 2.8.3 but I'm not sure if I checked it after upgrading.

Gast

do you mean the ? ? ? ? in the front of some languages?. i see it sometime's and in my case, it was a result of the converting to utf8mb4_general_ci. utf8mb4 use 4-byte-chars and utf8 only 2-byte-chars.
best solution is a correction directly in your database. copy the name of this addon (with ????) from the info.php, use a doubleclick on the field with the ???? in phpmyadmin and paste the addon name.

contactjw

OK. Changed addons table and fields. Everything I checked is OK except the 2 drop downs in the language section of the addons menu. Some entries have all or a few "?" for the actual letters. I can still select them and in "View Details" it shows the actual letters for the language. Should I be concerned about this, or will this not have any affect on anything else in the program?

Gast

the drag&drop- or better, the sortable-problem is reported on the WB-Project-Page as Ticket #50
it has nothing to do with a charset, its a ajax-problem

contactjw

OK. I changed all the news module tables and fields to utf8. Everything seems to work OK except I can't seem to get the drag and drop working for rearranging the order of posts. It seems to act unpredictably. Also one of the posts won't work even when I hit the up and down icon. It only works with the down icon if it's in the top position but otherwise it just goes up on either icon. Don't know if this has anything to do with the changing of character sets but wouldn't think so.  :?

Gast

QuoteEverything seems to be working OK.

I'm wondering about a few things still. Should I change the fields in the news_settings table to Utf8 as well? What about the database? Should I set that to Utf8 also or leave it as latin1_swedish_ci?

WB use the same Table Collation and Table Engine for all database tables. Since WB 2.8.3 SP7 only UTF8 as Charset is possible, no other charsets.
Maybe everything works now and you see no problems, but it comes, if you need your SQL-Backup the next time. Dont think about german or other special letters like ä, ö, ü, é, ó, í etc, also the special chars in your english text like apostroph, paragraph, %, & or the simple ? have to be encoded in the database.
maybe you have to change the provider, lower price, more service, combination with smart-tv etc. and you have to import your backup there. you have actual some tables with utf8-encoded chars and some tables with latin1. whatever you select in the charset box for the sql-importer, its only possible to use one charset and you start to correct the rest again. why not now and without pressure. step by step, every day a table?

okay, if you dont think about the future and dont think about the providers etc, its of course possible, to leave it at it is, it works, if nobody change the conditions
recommendation is: use only utf8 as default charset and utf8_general_ci or utf8_unicode_ci as table collation for all WebsiteBaker-tables

contactjw

Took a while for me to get back to working on this. So what I did is I changed the fields in the table news_posts to utf8_unicode_ci in phpMyadmin. Then I changed the table to Utf8_unicode_ci. I got a few places where the character didn't convert and I lost a little bit of text but I just retyped it in the ckeditor without too much effort. Everything seems to be working OK.

I'm wondering about a few things still. Should I change the fields in the news_settings table to Utf8 as well? What about the database? Should I set that to Utf8 also or leave it as latin1_swedish_ci?

One more thing. I'm not really sure I need to convert to Utf8 at all since I really don't have anything but English on my site. The instances of having to post a word or two in another language is rare and I can always get by with substituting any letters outside of English with an English letter. I figure if I can change to Utf8 I would be better off but don't know if it's really worth it. WDYT?

Gast

QuoteI tried using PHPMyadmin to change the collation in particular columns like the content_short, content_long columns of the news_posts table to utf8mb4_unicode_ci in the 2.8.3 installation. I could see the text when I opened the website in the browser, but the ckeditor was just blank without any text. Any idea why that would happen?

see my answer in the top :
Quote from: jacobi22P.S.: if the editor founds a not-readable char and its not possibe to convert it, the editor show nothing in the editor field and show also no toolbar. its a javascript error and its not possible for javascript to show not-existing-chars

the ckeditor and also a lot of functions inside of all the modules and the core doesnt support utf8mb4
utf8mb4 needs special multibyte-string-functions to work, see also
http://www.phptherightway.com/#php_and_utf8

contactjw

Quote from: jacobi22 on July 30, 2017, 10:20:25 AM
QuoteCould no longer connect with db

cannot be a problem with the charset
and if you mean, the charset is the problem, why does it work well for one week??   :-o :-o :-o

You're right. I found out later that it was a denial of service attack and somehow I guess my connection got messed up in the process.

Back to my charset. I tried using PHPMyadmin to change the collation in particular columns like the content_short, content_long columns of the news_posts table to utf8mb4_unicode_ci in the 2.8.3 installation. I could see the text when I opened the website in the browser, but the ckeditor was just blank without any text. Any idea why that would happen?

Gast

QuoteCould no longer connect with db

cannot be a problem with the charset
and if you mean, the charset is the problem, why does it work well for one week??   :-o :-o :-o

contactjw

Well, all was well for about a week until yesterday. Could no longer connect with db. So I ended up reverting back to 2.8.3. I'm wondering if maybe a better approach would be to identify the columns where utf8 is actually needed and using the drop downs in PHPMyadmin change just the ones that need it.

contactjw

From what I've read latin1_swedish_ci is the default for MySQL from it's beginning and which has never changed.

utf8mb4 seems to be the latest recommended setting as far as capabilities to deal with a wider range of characters.

utf8mb4_unicode_ci seems to be a more accurate collation setting than utf8mb4_general_ci and seems to be the recommended setting.

Thanks for the offer to try and convert my db for me. If it's not too much trouble for you I can send you my zipped sql-backup. Would you want the backup before I upgraded or after upgrading?

Also found this for converting from latin1 to utf8 in MySQL:

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Is this an option?

Gast

thanks for the infos, but i see, that the simple notepad doesnt show the real SQL   :x sorry for the trouble

here the real UTF8-Chars from the table wb_addons in real SQL without Converter, only visible in a editor, but not in your browser



and the same code in notepad, notepad++ or other IDE's with included converter



but back to your post

the editors like ckeditor, tiny_mc etc has a own converter included, if activated, the editor convert special chars via html_entities(), a php-function. i dont now, how much, but html_entities has a small range, not possible, to convert all chars from all over the world. thats why, utf8 is the better solution, there are no limits. because of that, the php-group make a recommendation to use utf8 as default charset on every server since PHP 5.5. the most providers follow this instruction. in older php-versions it was the choice of the provider. german providers prefer the latin-charsets in the past and today its nearly not possible, to change the default charset without trouble. if they change the server charset for all the customers at the same time, you have a big chaos on the service line on one day or one week, so its part of the customers, to choose a new charset (like your case), so you have maybe only 1 user with trouble every day and not all in the same time  :wink:

to your code-examples
thats okay in ckeditor source view and also the version with entities is okay in ckeditor source view
QuoteWhen the road you're trudging seems all uphill,</p> (ckeditor source view)

the editor and his own converter convert the entities from the database and show the real char (like your apostroph). it doesnt matter, what the script found in the database. if its a readable char, its possible to convert it
P.S.: if the editor founds a not-readable char and its not possibe to convert it, the editor show nothing in the editor field and show also no toolbar. its a javascript error and its not possible for javascript to show not-existing-chars

But.... whats happend, if you save this content??
QuoteWhen the road you&rsquo;re trudging seems all uphill,</p> (phpmyadmin content view)

after saving this content from your example (with the correct charset and correct table collation) the phpmyadmin must show the real chars like
QuoteWhen the road you're trudging seems all uphill,</p>

its okay for older CKeditor versions with activated entities-converter, but not for the newer version. if i remenber correct, we change the default settings for the ckeditor in WB 2.8.3 SP6, but its possible for every User, to use the old method, if he set config.entities to TRUE
Background for this was the view in tools like phpmyadmin. its difficult to read the content, when i use Entities like this
QuoteWhen the road you&rsquo;re trudging seems all uphil
Without entities, i've the same view in phpmyadmin  like my output in Front- or backend, the used charset convert the SQL-language to readable chars without entities

notepad++ has a good converter, but not possible to convert different methods at the same time, because of that, we use php-converters, in my case with a big array of chars to replace. Maybe you can send me your zipped sql-backup and i'll try to convert it.

for my locale development, i use xampp for windows with the default settings from xampp. if i look into phpmyadmin at the databases overview, i found in the last line latin1_swedish_ci, dont know, where it comes from. the same on the online-servers, dont know why, maybe a default setting somewhere.

in one project, i've a lot of trouble like yours, a long period of search&&try, but then i found the database charset settings in this overview and it was set to latin1 also, so i change it to utf8_general_ci and all the problems are gone after converting all the chars in the backup
to change it, go to phpadmin, startpage with the overview (show all databases as list in the big, right content-field, (like my last picture). Click on the link to your prefered database and after reload in the top-menu on the OPERATION-Link, there you can change the collation for this database


contactjw

#14
Version 4.6.2.

if (!$bWbConfigSetting ) { $ckeditor->config['entities'] = false; }

config.entities = false;

(This is from a news module post)

When the road you're trudging seems all uphill,</p> (ckeditor source view)

When the road you&rsquo;re trudging seems all uphill,</p> (phpmyadmin content view)

When the road you&rsquo;re trudging seems all uphill,</p> (notepad)

(This is from a normal page.)

<a href="https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet" target="_blank">François Cluzet</a><br /> (ckeditor source view)

<a href="https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet" target="_blank">François Cluzet</a><br /> (phpmyadmin content view)

<a href=\"https://en.wikipedia.org/wiki/Fran%C3%A7ois_Cluzet\" target=\"_blank\">François Cluzet</a><br />\n (notepad)

Also, while looking at phpmyadmin I noticed that my localhost db is set to latin1 swedish with all the tables the same as my wb db was before I changed it. Should I change my localhost db like I did my wb db?

Gast

what is the used ckeditor version and do you use the editor setting $ckeditor->config['entities'] = true ??

you found this setting (if avaiable) in include.php of the editor (modules/ckeditor/include.php), nearly in the middle of the file, in my version at line 183

or the same setting in wb_ckconfig.js
config.entities = false;

to explain it: this option converted (if TRUE) special chars to html-entities (for example: the apostroph ' is saved as &#39; )
is this option deactivated or false, the script use the charset from WB, save it in sql-language as ’ and in the browser readable as '

pls check the table collation of this table. open this section with the ckeditor, search there for the section-id in the head of this section and search in the module-table mod_wysiwyg (if its is a wysiwyg section). what do you see in the database tool like phpmyadmin?
export this table and open it with a simple editor like notepad (not notepad++) what do you see  there???


contactjw

I changed the db to utf8_unicode_ci. Then I changed all the latin1 entries to utf8 in notepad++. Now all my tables show utf8_general_ci except the ones that were already utf8_unicode_ci. They stayed the same. I also used the ckeditor to change the few remaining entries like ’ that I missed when I first did the ones that were not reading properly. Everything seems to be working ok.

Did I do anything wrong? Is there anything else I need to do?

contactjw

#6 examples are from pages in ckeditor in source code view.

My database is latin1_swedish_ci. All my tables are latin1_swedish_ci except for droplets, output filter, and settings which are all utf8_unicode_ci. Not sure why it's swedish?  :?

Here's how my db file looks:

(Addons table latin1)

CREATE TABLE `wbaddons` (
  `addon_id` int(11) NOT NULL AUTO_INCREMENT,
  `type` varchar(255) NOT NULL DEFAULT '',
  `directory` varchar(255) NOT NULL DEFAULT '',
  `name` varchar(255) NOT NULL DEFAULT '',
  `description` text NOT NULL,
  `function` varchar(255) NOT NULL DEFAULT '',
  `version` varchar(255) NOT NULL DEFAULT '',
  `platform` varchar(255) NOT NULL DEFAULT '',
  `author` varchar(255) NOT NULL DEFAULT '',
  `license` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`addon_id`),
  UNIQUE KEY `ident` (`directory`)
) ENGINE=MyISAM AUTO_INCREMENT=198 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

(Settings table utf8.)

CREATE TABLE `wbsettings` (
  `name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `value` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

So if I change all the default latin1 entries to utf 8 will that be enough?

Gast

#10
question: the examples in #6 ist from the SQL-Backup-file???

looks for me like a wrong encoding on the way browser-> used charset
’ is the UTF8-Char for the apostroph, it has to show in your browser as ', if the browser use charset utf8 and (if you read this with a tool like phpmyadmin from the database) for a wrong table collation.
But....
its also possible, that you use the wrong method in the import of the repaired backup

1. at first (see post from Darkviper): Never use the simple Notepad / (Windows-)Editor or also Wordpad for a job like this. better is a editor like Notepad++ or a good IDE like Netbeans etc with a included converter
2. look at first into the database with a tool like phpmyadmin or mysqldumper etc and search there for the table collation and there for the fields "collation" in the overview (see my example)





the whole database has a collation (picture 1) and also every table (see pic #2). use here utf8_general_ci or utf8_unicode_ci.
you can change this in your backup-file. at the end of every table structure for found the settings like
QuoteENGINE=MyISAM AUTO_INCREMENT=28 DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
DEFAULT CHARSET has to be change to utf8
COLLATE has to be change to  utf8_general_ci or utf8_unicode_ci

3. look into every table for the collations of the fields. see the next picture from my backup. in the install-process, every field get the collation from the settings in the wb-installer. you say, your old wb runs with latin1, so i'm sure, every single field has now the wrong collation like my screenshot



you have to change every field collation in the backup file here also from (here in the example) latin1_german1_ci to utf8_general_ci or utf8_unicode_ci   :roll: if you dont do this, you write the wrong collation for this tables, when you import the repaired backup and you see the problem in the output later. best method for a job like this is a php-based-converter, works with the method search&&replace. a lot of users has a own converter, but its very complexe, too many possibilitys. for example: if i add in my converter-code the job: search for latin1_german1_ci and i use in my database latin1_german2_ci, the script found nothing and replace nothing. in the next job, i use latin1_swedish_ci instead of latin1_german1_ci. in simple words: if you use the search&&replace method, its not possible to build a converter for everybody, that works in every case  :| my converter use a long list with ~ 20 different collations and writing forms and everytime, when i found a new combination in a backup-file, i add a new line in my converter and start the job again.
its also possible, to replace the wrong collations with notepad++, but you have to check every table structure (same problem like the php-converter, if you use different collations)


back to your code and to notepad++
open the backup-sql-file with notepad++, go in the top-menu to "ENCODING". set notepad++ here to UTF8 without BOM (important!!). your code show's  and thats a sign for the simple UTF8 (with BOM)
if you open the file, the position of the black point show's the actual status, here UTF8 ohne BOM



if the actual status show not UTF8 without BOM, go in the same menu to the point "Convert to UTF8 without BOM"
now make a check for some special chars
- replace all  with nothing
- search for other combinations, the most UTF8-chars start with   or  à or  Å . these are chars from spain or swedish language, not used in english. Search for this letters and check every result. best method is a single edit for every result or a single search for every utf8-char. See also a simple list from here Search for the blue marked letters in your file, try it for lower case letters and upper case letters and replace it with the real chars ( ’ == ', â€" == - etc)

at last step, add this line somewhere in the top of the sql-file
SET NAMES 'utf8' COLLATE 'utf8_general_ci';

save the repaired file with a new file name, maybe with date + time at the end of the filename. now you can import the file as UTF8-formated File

 

hgs

That makes courage! I also have a web page that I urgently need the current version. :-D
LG Harald

"Fange nie an, aufzuhören - höre nie auf, anzufangen." Marcus Tullius Cicero (106-43 v.Chr.)

contactjw

OK. I looked it over and there weren't that many entries that needed to be corrected so I went ahead and did them manually. Funny thing is there wasn't any consistency to the problem entries. Some places had them and others didn't.  :?

contactjw

OK. Changed all DB entries from latin1 to utf8. However, that doesn't seem to have resolved the problem. Seems like the problem is only in the news module pages and one page I use with a sortable list. The other pages are OK. When I access one of the affected pages from the editor and check the source code I find ’ for all of the ' (single quote) throughout a page.  (See first example of code below)
For the second problem in the sortable table list I get  only in some places on one side or both sides of an entry but not in others. (See bottom portion of the second example below.)
Seems this is the editor changing or adding these characters. Is there a way to resolve this without manually fixing each entry?

1.
<div class="alignCenter">When things go wrong, as they sometimes will,<br />
When the road you’re trudging seems all uphill,<br />
When the funds are low and the debts are high,<br />
And you want to smile, but you have to sigh,<br />
When care is pressing you down a bit,<br />
Rest, if you mustâ€"but don’t you quit.<br />
<br />

2.
<tr>
      <td width="135">
      <a href="http://en.wikipedia.org/wiki/Marathon_(2005_film)" target="_blank">Marathon</a></td>
      <td width="175">
      <a href="http://en.wikipedia.org/wiki/Jo_Seung-woo" target="_blank" title="Jo Seung-woo">Jo Seung-woo</a><br />
      <a href="http://en.wikipedia.org/wiki/Kim_Mi-sook" target="_blank" title="Kim Mi-sook">Kim Mi-sook</a><br />
      <a href="http://en.wikipedia.org/wiki/Lee_Ki-young" target="_blank" title="Lee Ki-young">Lee Ki-young</a></td>
      <td width="137">&nbsp;based on (Korean w/ Engish subtitles)</td>
      <td width="111">courage</td>
      <td width="52">2005</td>
      <td width="85">Amazon</td>
      <td width="72">Netflix</td>
   </tr>