-rw-r--r--  LICENSE            340
-rw-r--r--  Makefile            28
-rw-r--r--  jmdict.cpp         186
-rw-r--r--  jmdict_import.cpp  149
-rw-r--r--  kana2romaji.cpp    362
-rw-r--r--  kana2romaji.h       22
-rw-r--r--  sqlite.h           121
-rw-r--r--  xmlparser.h        107
8 files changed, 1315 insertions, 0 deletions
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..5b6e7c6
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,340 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) year name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..c810ef6
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,28 @@
+PREFIX ?= /usr
+
+OPTS=-Wall -Wextra -ansi -pedantic-errors $(CXXFLAGS)
+DICTIONARY_PATH=$(PREFIX)/share/jmdict
+DICTIONARY_NAME="\"$(DICTIONARY_PATH)/database\""
+BINDIR=${DESTDIR}${PREFIX}/bin
+
+all: jmdict jmdict_import
+clean:
+	@echo cleaning up...
+	@rm -f jmdict jmdict_import jmdict.o jmdict_import.o kana2romaji.o
+kana2romaji.o: kana2romaji.cpp kana2romaji.h
+	$(CXX) $(OPTS) -c -o kana2romaji.o kana2romaji.cpp
+jmdict: jmdict.o kana2romaji.o
+	$(CXX) $(OPTS) -o jmdict jmdict.o kana2romaji.o -lsqlite3
+jmdict.o: jmdict.cpp sqlite.h
+	$(CXX) $(OPTS) -c -o jmdict.o jmdict.cpp -DDICTIONARY_PATH=$(DICTIONARY_NAME)
+jmdict_import: jmdict_import.o kana2romaji.o
+	$(CXX) $(OPTS) -o jmdict_import jmdict_import.o kana2romaji.o -lsqlite3 -lexpat
+
+jmdict_import.o: jmdict_import.cpp sqlite.h xmlparser.h kana2romaji.h
+	$(CXX) $(OPTS) -c -o jmdict_import.o jmdict_import.cpp -DDICTIONARY_PATH=$(DICTIONARY_NAME)
+
+install:
+	install -d ${DESTDIR}$(DICTIONARY_PATH)
+	install -d $(BINDIR)
+	install jmdict $(BINDIR)
+	install jmdict_import $(BINDIR)
diff --git a/jmdict.cpp b/jmdict.cpp
new file mode 100644
index 0000000..2d76110
--- /dev/null
+++ b/jmdict.cpp
@@ -0,0 +1,186 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+#include <cstdlib>
+#include <iostream>
+#include <ostream>
+#include <iomanip>
+#include <string>
+#include <stdexcept>
+#include <exception>
+#include <memory>
+#include "sqlite.h"
+#include "kana2romaji.h"
+using namespace std;
+
+void usage() {
+ cout << "jmdict [options] subject\n"
+ " -b search for entries beginning with <subject>\n"
+ " -f perform a fulltext search\n"
+ " -i case-insensitive search (implied by -b or -f)\n"
+ "\n"
+ " -j translate from japanese\n"
+ " -J translate to japanese\n"
+ " if neither -j nor -J is given, source language will be guessed\n"
+ "\n"
+ " -l lang target language is lang, where lang is a three-letter language code\n"
+ " default: eng\n";
+}
+
+namespace options {
+ enum Language { UNKNOWN, JAPANESE, JAPANESE_ROMAJI, NOT_JAPANESE };
+
+ Language source = UNKNOWN;
+ string target("eng");
+ bool fulltext = false;
+ bool beginning = false;
+ bool ci_search = false;
+
+ struct invalid_option : public std::runtime_error {
+ invalid_option(const string& s) : std::runtime_error(s) {}
+ };
+
+ void getFrom(int argc, char** argv) {
+ int opt;
+ while ((opt = getopt(argc, argv, "bfijJl:")) != -1)
+ switch (opt) {
+ case 'b': beginning = true; break;
+ case 'f': fulltext = true; break;
+ case 'i': ci_search = true; break;
+ case 'j': source = JAPANESE; break;
+ case 'J': source = NOT_JAPANESE; break;
+ case 'l': target = optarg; break;
+ case '?': throw invalid_option("unrecognized option");
+ }
+ }
+}
+
+auto_ptr<sql::db> db;
+unsigned entries(0);
+
+int accumulate(void* to, int, char** what, char**) {
+ string& app = *static_cast<string*>(to);
+ if (app.size())
+ app += ", ";
+ app += *what;
+ return 0;
+}
+
+int showGloss(void* s, int, char** value, char**) {
+ string& sense = *static_cast<string*>(s);
+ if (sense != value[0]) {
+ sense = value[0];
+ cout << " " << setw(2) << sense << ") ";
+ }
+ else
+ cout << " ";
+ cout << value[1] << endl;
+ return 0;
+}
+
+int showEntry(void*, int, char** value, char**) {
+ string kanji, kana;
+ db->exec(
+ sql::query("SELECT kanji FROM kanji WHERE entry=%s") % *value,
+ accumulate, &kanji);
+ db->exec(
+ sql::query("SELECT kana FROM reading WHERE entry=%s") % *value,
+ accumulate, &kana);
+
+ string rom;
+ kana2romaji(kana,rom);
+ if (kanji.size())
+ cout << kanji << " (" << kana << ") (" << rom << ')' << endl;
+ else
+ cout << kana << " (" << rom << ')' << endl;
+
+ string sense;
+ db->exec(
+ sql::query("SELECT sense, gloss FROM gloss WHERE lang=%Q AND entry=%s "
+ "ORDER BY sense") % options::target % *value,
+ showGloss, &sense);
+ ++entries;
+ return 0;
+}
+
+string compare() {
+ if (options::fulltext)
+ return " LIKE '%%%q%%'";
+ if (options::beginning)
+ return " LIKE '%q%%'";
+ if (options::ci_search)
+ return " LIKE %Q";
+ return "=%Q";
+}
+
+void fromRomaji(const string& r) {
+ db->exec(
+ sql::query("SELECT DISTINCT entry FROM reading WHERE romaji" + compare()) % r,
+ showEntry);
+}
+
+void fromJapanese(const string& j) {
+ db->exec(
+ sql::query("SELECT DISTINCT entry FROM reading WHERE kana" + compare()) % j,
+ showEntry);
+ db->exec(
+ sql::query("SELECT DISTINCT entry FROM kanji WHERE kanji" + compare()) % j,
+ showEntry);
+}
+
+void toJapanese(const string& e) {
+ sql::query q;
+ q = "SELECT DISTINCT entry FROM gloss WHERE lang=%Q AND gloss" + compare();
+ db->exec(q % options::target % e, showEntry);
+}
+
+void guessLanguage(const std::string& subject) {
+ bool isUTF8 = subject[0] & 0x80;
+ if (options::source == options::JAPANESE && !isUTF8)
+ options::source = options::JAPANESE_ROMAJI;
+ else if (options::source == options::UNKNOWN)
+ options::source = isUTF8 ? options::JAPANESE : options::NOT_JAPANESE;
+}
+
+int main(int argc, char** argv)
+try {
+ initRomaji();
+ options::getFrom(argc, argv);
+ if (optind == argc) {
+ usage();
+ return EXIT_FAILURE;
+ }
+ string subject = argv[optind];
+ db.reset(new sql::db(DICTIONARY_PATH));
+
+ guessLanguage(subject);
+ if (options::source == options::JAPANESE)
+ fromJapanese(subject);
+ else if (options::source == options::JAPANESE_ROMAJI)
+ fromRomaji(subject);
+ else
+ toJapanese(subject);
+ cout << entries << " match(es) found." << endl;
+
+ return EXIT_SUCCESS;
+}
+catch(const std::exception& e)
+{
+ cerr << e.what() << '\n';
+ return EXIT_FAILURE;
+}
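
Note on `guessLanguage()` above: it decides between Japanese and non-Japanese input by testing the high bit of the first byte of the subject, and `utfchar()` in kana2romaji.cpp relies on the same UTF-8 property when it counts the leading 1 bits of a lead byte. The following is a minimal standalone sketch of that idea, not code from this commit; the names `utf8_length` and `looks_non_ascii` are hypothetical, and the sketch assumes the input is UTF-8.

```cpp
#include <string>

// Hypothetical helper, not part of the commit: number of bytes in the UTF-8
// sequence introduced by `lead` (1 for ASCII, 0 if `lead` cannot start one).
std::string::size_type utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx (kana live here)
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid lead
}

// Hypothetical helper mirroring the heuristic in guessLanguage(): a subject
// whose first byte has the high bit set is treated as non-ASCII input.
// Unlike the original, an empty subject is handled explicitly.
bool looks_non_ascii(const std::string& subject) {
    return !subject.empty() &&
           (static_cast<unsigned char>(subject[0]) & 0x80) != 0;
}
```

Casting to `unsigned char` before masking or shifting avoids depending on whether plain `char` is signed on the target platform.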
diff --git a/jmdict_import.cpp b/jmdict_import.cpp
new file mode 100644
index 0000000..4e16deb
--- /dev/null
+++ b/jmdict_import.cpp
@@ -0,0 +1,149 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+#include <cstdlib>
+#include <exception>
+#include <ostream>
+#include <iostream>
+#include <fstream>
+#include <ctime>
+#include <string>
+#include <stack>
+#include "sqlite.h"
+#include "xmlparser.h"
+#include "kana2romaji.h"
+using namespace std;
+
+class Dictionary {
+public:
+ Dictionary(const string& name) : db(name) {
+ db.exec("DROP TABLE kanji");
+ db.exec("DROP TABLE reading");
+ db.exec("DROP TABLE gloss");
+ db.exec("CREATE TABLE kanji (entry INT NOT NULL, kanji TINYTEXT NOT NULL)");
+ db.exec("CREATE TABLE reading (entry INT NOT NULL, kana TINYTEXT NOT NULL, romaji TINYTEXT NOT NULL)");
+ db.exec("CREATE TABLE gloss (entry INT NOT NULL, sense INT NOT NULL, lang TINYTEXT NOT NULL, gloss TEXT NOT NULL)");
+ db.exec("BEGIN");
+ }
+
+ ~Dictionary() {
+ db.exec("COMMIT");
+ }
+
+ void createIndices() {
+ db.exec("CREATE INDEX k_entry ON kanji (entry)");
+ db.exec("CREATE INDEX r_entry ON reading (entry)");
+ db.exec("CREATE INDEX r_kana ON reading (kana)");
+ db.exec("CREATE INDEX r_romaji ON reading (romaji)");
+ db.exec("CREATE INDEX g_entry ON gloss (entry)");
+ db.exec("CREATE INDEX g_gloss ON gloss (gloss)");
+ }
+
+ void push(const xml::Tag& tag) {
+ tags.push(tag);
+ }
+
+ xml::Tag& top() {
+ return tags.top();
+ }
+
+ void pop() {
+ xml::Tag& tag = top();
+ if (tag.name() == "ent_seq") {
+ entry_seq = atoi(tag.text().c_str());
+ sense_seq = 1;
+ }
+ else if (tag.name() == "keb")
+ insert_kanji(tag.text());
+ else if (tag.name() == "reb")
+ insert_reading(tag.text());
+ else if (tag.name() == "sense")
+ ++sense_seq;
+ else if (tag.name() == "gloss")
+ insert_gloss(tag.attribute("xml:lang"), tag.text());
+ tags.pop();
+ }
+
+private:
+ void insert_kanji(const string& kanji) {
+ db.exec(sql::query("INSERT INTO kanji (entry, kanji) VALUES (%u, %Q)") % entry_seq % kanji);
+ }
+
+ void insert_reading(const string& reading) {
+ string romaji;
+ kana2romaji(reading, romaji);
+ db.exec(sql::query("INSERT INTO reading (entry, kana, romaji) VALUES (%u, %Q, %Q)") % entry_seq % reading % romaji);
+ }
+
+ void insert_gloss(string lang, const string& text) {
+ if (lang == "")
+ lang = "eng";
+ db.exec(
+ sql::query("INSERT INTO gloss (entry, sense, lang, gloss) "
+ "VALUES (%u, %u, %Q, %Q)") % entry_seq % sense_seq % lang % text);
+
+ static unsigned seq = 0;
+ if (++seq % 50000 == 0) {
+ db.exec("COMMIT");
+ db.exec("BEGIN");
+ }
+ }
+
+ stack<xml::Tag> tags;
+ unsigned entry_seq;
+ unsigned sense_seq;
+ sql::db db;
+};
+
+int main(int argc, char** argv)
+try {
+ if(argc < 2 || argc > 3) {
+ cerr << "Usage: jmdict_import <dictfile> [dest_dir]\n";
+ return EXIT_FAILURE;
+ }
+
+ const string dict_file = argv[1],
+ database_name = argc == 2 ? DICTIONARY_PATH : string(argv[2]) + DICTIONARY_PATH;
+
+ initRomaji();
+ if (std::remove(database_name.c_str()) == 0)
+ std::cout << "removed old dictionary database\n";
+
+ Dictionary dict(database_name);
+ xml::Parser<Dictionary> parser(dict);
+
+ ifstream in(dict_file.c_str());
+ if (!in.is_open()) {
+ cerr << "could not open dictionary file '" << dict_file << "'\n";
+ return EXIT_FAILURE;
+ }
+ cout << "filling database... " << flush;
+ time_t start = time(0);
+ parser.parse(in);
+ cout << time(0) - start << "s" << endl;
+ cout << "creating indices... " << flush;
+ start = time(0);
+ dict.createIndices();
+ cout << time(0) - start << "s" << endl;
+
+ return EXIT_SUCCESS;
+}
+catch (const std::exception& e) {
+ cerr << e.what() << '\n';
+ return EXIT_FAILURE;
+}
diff --git a/kana2romaji.cpp b/kana2romaji.cpp
new file mode 100644
index 0000000..2ba77fd
--- /dev/null
+++ b/kana2romaji.cpp
@@ -0,0 +1,362 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+// encoding: utf-8
+#include "kana2romaji.h"
+#include <map>
+#include <iostream>
+#include <ostream>
+#include <string>
+
+using namespace std;
+
+namespace {
+void utfchar(const string& from, string::size_type pos, string& to) {
+ string::value_type first = from[pos];
+ if ((first & 0x80) == 0)
+ to = from[pos];
+ else {
+ string::size_type len = 0;
+ while (first & 0x80) {
+ ++len;
+ first <<= 1;
+ }
+ to = from.substr(pos, len);
+ }
+}
+}
+
+typedef map<string, string> romaji_map;
+romaji_map romaji;
+
+void initRomaji() {
+ // -- hiragana -----
+ romaji["あ"] = "a";
+ romaji["い"] = "i";
+ romaji["う"] = "u";
+ romaji["え"] = "e";
+ romaji["お"] = "o";
+ romaji["か"] = "ka";
+ romaji["き"] = "ki";
+ romaji["く"] = "ku";
+ romaji["け"] = "ke";
+ romaji["こ"] = "ko";
+ romaji["さ"] = "sa";
+ romaji["し"] = "shi";
+ romaji["す"] = "su";
+ romaji["せ"] = "se";
+ romaji["そ"] = "so";
+ romaji["た"] = "ta";
+ romaji["ち"] = "chi";
+ romaji["つ"] = "tsu";
+ romaji["て"] = "te";
+ romaji["と"] = "to";
+ romaji["な"] = "na";
+ romaji["に"] = "ni";
+ romaji["ぬ"] = "nu";
+ romaji["ね"] = "ne";
+ romaji["の"] = "no";
+ romaji["は"] = "ha";
+ romaji["ひ"] = "hi";
+ romaji["ふ"] = "fu";
+ romaji["へ"] = "he";
+ romaji["ほ"] = "ho";
+ romaji["ま"] = "ma";
+ romaji["み"] = "mi";
+ romaji["む"] = "mu";
+ romaji["め"] = "me";
+ romaji["も"] = "mo";
+ romaji["や"] = "ya";
+ romaji["ゆ"] = "yu";
+ romaji["よ"] = "yo";
+ romaji["ら"] = "ra";
+ romaji["り"] = "ri";
+ romaji["る"] = "ru";
+ romaji["れ"] = "re";
+ romaji["ろ"] = "ro";
+ romaji["わ"] = "wa";
+ romaji["ゐ"] = "wi";
+ romaji["ゑ"] = "we";
+ romaji["を"] = "wo";
+ romaji["ん"] = "n";
+
+ romaji["ぁ"] = "\1a";
+ romaji["ぃ"] = "\1i";
+ romaji["ぇ"] = "\1e";
+ romaji["ぉ"] = "\1o";
+ romaji["ゃ"] = "\1ya";
+ romaji["ゅ"] = "\1yu";
+ romaji["ょ"] = "\1yo";
+ romaji["っ"] = "\2";
+
+ romaji["ゔ"] = "vu";
+ romaji["が"] = "ga";
+ romaji["ぎ"] = "gi";
+ romaji["ぐ"] = "gu";
+ romaji["げ"] = "ge";
+ romaji["ご"] = "go";
+ romaji["ざ"] = "za";
+ romaji["じ"] = "ji";
+ romaji["ず"] = "zu";
+ romaji["ぜ"] = "ze";
+ romaji["ぞ"] = "zo";
+ romaji["だ"] = "da";
+ romaji["ぢ"] = "dzi";
+ romaji["づ"] = "dzu";
+ romaji["で"] = "de";
+ romaji["ど"] = "do";
+ romaji["ば"] = "ba";
+ romaji["び"] = "bi";
+ romaji["ぶ"] = "bu";
+ romaji["べ"] = "be";
+ romaji["ぼ"] = "bo";
+ romaji["ぱ"] = "pa";
+ romaji["ぴ"] = "pi";
+ romaji["ぷ"] = "pu";
+ romaji["ぺ"] = "pe";
+ romaji["ぽ"] = "po";
+
+ // -- katakana -----
+ romaji["ア"] = "a";
+ romaji["イ"] = "i";
+ romaji["ウ"] = "u";
+ romaji["エ"] = "e";
+ romaji["オ"] = "o";
+ romaji["カ"] = "ka";
+ romaji["キ"] = "ki";
+ romaji["ク"] = "ku";
+ romaji["ケ"] = "ke";
+ romaji["コ"] = "ko";
+ romaji["サ"] = "sa";
+ romaji["シ"] = "shi";
+ romaji["ス"] = "su";
+ romaji["セ"] = "se";
+ romaji["ソ"] = "so";
+ romaji["タ"] = "ta";
+ romaji["チ"] = "chi";
+ romaji["ツ"] = "tsu";
+ romaji["テ"] = "te";
+ romaji["ト"] = "to";
+ romaji["ナ"] = "na";
+ romaji["ニ"] = "ni";
+ romaji["ヌ"] = "nu";
+ romaji["ネ"] = "ne";
+ romaji["ノ"] = "no";
+ romaji["ハ"] = "ha";
+ romaji["ヒ"] = "hi";
+ romaji["フ"] = "fu";
+ romaji["ヘ"] = "he";
+ romaji["ホ"] = "ho";
+ romaji["マ"] = "ma";
+ romaji["ミ"] = "mi";
+ romaji["ム"] = "mu";
+ romaji["メ"] = "me";
+ romaji["モ"] = "mo";
+ romaji["ヤ"] = "ya";
+ romaji["ユ"] = "yu";
+ romaji["ヨ"] = "yo";
+ romaji["ラ"] = "ra";
+ romaji["リ"] = "ri";
+ romaji["ル"] = "ru";
+ romaji["レ"] = "re";
+ romaji["ロ"] = "ro";
+ romaji["ワ"] = "wa";
+ romaji["ヰ"] = "wi";
+ romaji["ヱ"] = "we";
+ romaji["ヲ"] = "wo";
+ romaji["ン"] = "n";
+
+ romaji["ァ"] = "\1a";
+ romaji["ィ"] = "\1i";
+ romaji["ゥ"] = "\1u";
+ romaji["ェ"] = "\1e";
+ romaji["ォ"] = "\1o";
+ romaji["ヮ"] = "\1wa";
+ romaji["ャ"] = "\1ya";
+ romaji["ュ"] = "\1yu";
+ romaji["ョ"] = "\1yo";
+ romaji["ッ"] = "\2";
+
+ romaji["ヴ"] = "vu";
+ romaji["ガ"] = "ga";
+ romaji["ギ"] = "gi";
+ romaji["グ"] = "gu";
+ romaji["ゲ"] = "ge";
+ romaji["ゴ"] = "go";
+ romaji["ザ"] = "za";
+ romaji["ジ"] = "ji";
+ romaji["ズ"] = "zu";
+ romaji["ゼ"] = "ze";
+ romaji["ゾ"] = "zo";
+ romaji["ダ"] = "da";
+ romaji["ヂ"] = "dzi";
+ romaji["ヅ"] = "dzu";
+ romaji["デ"] = "de";
+ romaji["ド"] = "do";
+ romaji["バ"] = "ba";
+ romaji["ビ"] = "bi";
+ romaji["ブ"] = "bu";
+ romaji["ベ"] = "be";
+ romaji["ボ"] = "bo";
+ romaji["パ"] = "pa";
+ romaji["ピ"] = "pi";
+ romaji["プ"] = "pu";
+ romaji["ペ"] = "pe";
+ romaji["ポ"] = "po";
+ romaji["ー"] = "";
+
+ // -- double width letters ------
+ romaji["Ａ"] = "A";
+ romaji["Ｂ"] = "B";
+ romaji["Ｃ"] = "C";
+ romaji["Ｄ"] = "D";
+ romaji["Ｅ"] = "E";
+ romaji["Ｆ"] = "F";
+ romaji["Ｇ"] = "G";
+ romaji["Ｈ"] = "H";
+ romaji["Ｉ"] = "I";
+ romaji["Ｊ"] = "J";
+ romaji["Ｋ"] = "K";
+ romaji["Ｌ"] = "L";
+ romaji["Ｍ"] = "M";
+ romaji["Ｎ"] = "N";
+ romaji["Ｏ"] = "O";
+ romaji["Ｐ"] = "P";
+ romaji["Ｑ"] = "Q";
+ romaji["Ｒ"] = "R";
+ romaji["Ｓ"] = "S";
+ romaji["Ｔ"] = "T";
+ romaji["Ｕ"] = "U";
+ romaji["Ｖ"] = "V";
+ romaji["Ｗ"] = "W";
+ romaji["Ｘ"] = "X";
+ romaji["Ｙ"] = "Y";
+ romaji["Ｚ"] = "Z";
+
+ romaji["ａ"] = "a";
+ romaji["ｂ"] = "b";
+ romaji["ｃ"] = "c";
+ romaji["ｄ"] = "d";
+ romaji["ｅ"] = "e";
+ romaji["ｆ"] = "f";
+ romaji["ｇ"] = "g";
+ romaji["ｈ"] = "h";
+ romaji["ｉ"] = "i";
+ romaji["ｊ"] = "j";
+ romaji["ｋ"] = "k";
+ romaji["ｌ"] = "l";
+ romaji["ｍ"] = "m";
+ romaji["ｎ"] = "n";
+ romaji["ｏ"] = "o";
+ romaji["ｐ"] = "p";
+ romaji["ｑ"] = "q";
+ romaji["ｒ"] = "r";
+ romaji["ｓ"] = "s";
+ romaji["ｔ"] = "t";
+ romaji["ｕ"] = "u";
+ romaji["ｖ"] = "v";
+ romaji["ｗ"] = "w";
+ romaji["ｘ"] = "x";
+ romaji["ｙ"] = "y";
+ romaji["ｚ"] = "z";
+
+ romaji["０"] = "0";
+ romaji["１"] = "1";
+ romaji["２"] = "2";
+ romaji["３"] = "3";
+ romaji["４"] = "4";
+ romaji["５"] = "5";
+ romaji["６"] = "6";
+ romaji["７"] = "7";
+ romaji["８"] = "8";
+ romaji["９"] = "9";
+
+ romaji["！"] = "!";
+ romaji["＂"] = "\"";
+ romaji["＃"] = "#";
+ romaji["＄"] = "$";
+ romaji["％"] = "%";
+ romaji["＆"] = "&";
+ romaji["＇"] = "'"; // TODO:
+ romaji["（"] = "(";
+ romaji["）"] = ")";
+ romaji["＊"] = "*";
+ romaji["＋"] = "+";
+ romaji["，"] = ",";
+ romaji["－"] = "-";
+ romaji["．"] = ".";
+ romaji["／"] = "/";
+
+ romaji["："] = ":";
+ romaji["；"] = ";";
+ romaji["＜"] = "<";
+ romaji["＝"] = "=";
+ romaji["＞"] = ">";
+ romaji["？"] = "?";
+ romaji["＠"] = "@";
+
+ romaji["［"] = "[";
+ romaji["＼"] = "\\";
+ romaji["］"] = "]";
+ romaji["＾"] = "^";
+ romaji["＿"] = "_";
+ romaji["｀"] = "`";
+
+ romaji["｛"] = "{";
+ romaji["｜"] = "|";
+ romaji["｝"] = "}";
+ romaji["～"] = "~";
+
+
+ // don't know where those belong to
+ romaji["〜"] = "~";
+ romaji["、"] = ","; // TODO:
+ romaji["−"] = "-";
+
+ romaji["　"] = " ";
+ romaji["―"] = "-";
+ romaji["・"] = "-"; // FIXME
+}
+
+void kana2romaji(const string& kana, string& rom) {
+ rom.clear();
+ for (string::size_type pos = 0; pos < kana.size(); ) {
+ string ch;
+ utfchar(kana, pos, ch);
+ romaji_map::const_iterator trans = romaji.find(ch);
+ if (trans == romaji.end()) {
+ rom += ch;
+ if (ch.size() > 1)
+ cout << "Don't know how to translate '" << ch << "' in '" << kana << "' to romaji." << endl;
+ }
+ else
+ rom += trans->second;
+ pos += ch.size();
+ }
+ for (string::size_type pos = 0; pos < rom.size(); ++pos)
+ if (rom[pos] == '\1') {
+ string::size_type from = pos, count = 1;
+ if (pos > 1 && (rom[pos - 2] == 'h' || rom[pos - 2] == 'j')) {
+ --from;
+ count = (pos + 1 < rom.size() && rom[pos + 1] == 'y') ? 3 : 2;
+ }
+ rom.erase(from, count);
+ }
+ else if (rom[pos] == '\2' && pos + 1 < rom.size())
+ rom[pos] = rom[pos + 1];
+}
diff --git a/kana2romaji.h b/kana2romaji.h
new file mode 100644
index 0000000..bf5ebca
--- /dev/null
+++ b/kana2romaji.h
@@ -0,0 +1,22 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+#include <string>
+
+void initRomaji();
+void kana2romaji(const std::string& kana, std::string& romaji);
diff --git a/sqlite.h b/sqlite.h
new file mode 100644
index 0000000..0b73ea8
--- /dev/null
+++ b/sqlite.h
@@ -0,0 +1,121 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+#include <cstring>
+#include <string>
+#include <cctype>
+#include <stdexcept>
+#include <sqlite3.h>
+
+namespace sql {
+ template<typename T> struct query_traits;
+ template<> struct query_traits<const char* const> { static std::string chars() { return "qQs"; } };
+ template<> struct query_traits<char* const> { static std::string chars() { return "qQs"; } };
+ template<> struct query_traits<const int> { static std::string chars() { return "cdi"; } };
+ template<> struct query_traits<const unsigned> { static std::string chars() { return "ouxX"; } };
+ typedef int (*callback)(void*, int, char**, char**);
+
+ class query {
+ class mem_guard {
+ public:
+ mem_guard(char* p) : p(p) {}
+ ~mem_guard() { sqlite3_free(p); }
+ operator const char*() const { return p; }
+ private:
+ char* p;
+ };
+
+ public:
+ explicit query(const std::string& q = "") : q(q), pos(0) {
+ ff();
+ }
+
+ query& operator=(const std::string& s) {
+ q = s;
+ pos = 0;
+ ff();
+ return *this;
+ }
+
+ class wrong_format : public std::logic_error {
+ public:
+ wrong_format(char type, const std::string& format)
+ : std::logic_error(std::string("wrong format string in sql query, expected one of the following: ")
+ + format + ", got " + type + ".") {}
+ };
+
+ template<typename T> query& operator%(const T& t)
+ {
+ if(type >= q.size() || query_traits<const T>::chars().find(q[type]) == std::string::npos)
+ throw wrong_format(type < q.size() ? q[type] : '?', query_traits<const T>::chars());
+ mem_guard repl(sqlite3_mprintf(q.substr(pos, type - pos + 1).c_str(), t));
+ put(repl);
+ return *this;
+ }
+
+ query& operator%(const std::string& s) {
+ return *this % s.c_str();
+ }
+
+ const std::string& str() const {
+ return q;
+ }
+
+ private:
+ void put(const char* repl) {
+ q.replace(pos, type - pos + 1, repl);
+ pos += strlen(repl);
+ ff();
+ }
+
+ void ff() {
+ static const std::string CONV("diouxXeEfFgGaAcsCSPnqQ");
+ pos = q.find('%', pos);
+ while (pos < q.size() - 1 && q[pos + 1] == '%')
+ pos = q.find('%', pos + 2);
+ type = pos;
+ while (type < q.size() && CONV.find(q[type]) == std::string::npos)
+ ++type;
+ }
+
+ std::string q;
+ std::string::size_type pos;
+ std::string::size_type type;
+ };
+
+ struct db {
+ explicit db(const std::string& name) {
+ if (sqlite3_open(name.c_str(), &raw) != SQLITE_OK)
+ throw std::runtime_error("Could not connect to sqlite database '" + name + "'.");
+ }
+ ~db() {
+ sqlite3_close(raw);
+ }
+
+ void exec(const std::string& query, callback cb = 0, void* arg = 0) {
+ sqlite3_exec(raw, query.c_str(), cb, arg, 0);
+ }
+
+ void exec(const query& query, callback cb = 0, void* arg = 0) {
+ exec(query.str(), cb, arg);
+ }
+
+ private:
+ sqlite3* raw;
+ };
+} // namespace sql
diff --git a/xmlparser.h b/xmlparser.h
new file mode 100644
index 0000000..29745eb
--- /dev/null
+++ b/xmlparser.h
@@ -0,0 +1,107 @@
+/*
+jmdict, a frontend to the JMdict file. http://mandrill.fuxx0r.net/jmdict.php
+Copyright (C) 2004 Florian Bluemel (florian.bluemel@uni-dortmund.de)
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+*/
+#include <expat.h>
+#include <istream>
+#include <string>
+#include <map>
+
+namespace xml {
+
+class Tag {
+public:
+ Tag(const std::string& name, const char** attrs) : m_name(name) {
+ while (*attrs) {
+ m_attributes[*attrs] = *(attrs + 1);
+ attrs += 2;
+ }
+ }
+
+ const std::string& name() const {
+ return m_name;
+ }
+
+ std::string attribute(const std::string& name) const {
+ std::map<std::string, std::string>::const_iterator val = m_attributes.find(name);
+ if (val == m_attributes.end())
+ return "";
+ return val->second;
+ }
+
+ const std::string& text() const {
+ return m_text;
+ }
+
+ void append(const std::string& t) {
+ m_text += t;
+ }
+
+ void append(const char* t, int len) {
+ m_text.append(t, len);
+ }
+
+private:
+ std::string m_name;
+ std::string m_text;
+ std::map<std::string, std::string> m_attributes;
+};
+
+template<class Stack>
+class Parser {
+public:
+ Parser(Stack& stack) : m_stack(stack) {
+ m_parser = XML_ParserCreate(0);
+ XML_SetUserData(m_parser, this);
+ XML_SetElementHandler(m_parser, &Parser::start, &Parser::end);
+ XML_SetCharacterDataHandler(m_parser, Parser::chardata);
+ }
+
+ ~Parser() {
+ XML_ParserFree(m_parser);
+ }
+
+ void parse(std::istream& in) {
+ const size_t BLOCK_SIZE = 1 << 15;
+ while (in) {
+ char* buffer = static_cast<char*>(XML_GetBuffer(m_parser, BLOCK_SIZE));
+ in.read(buffer, BLOCK_SIZE);
+ XML_ParseBuffer(m_parser, in.gcount(), in.eof());
+ }
+ }
+
+private:
+ static void start(void* data, const char* e, const char** a) {
+ Parser& parser = *static_cast<Parser*>(data);
+ parser.m_stack.push(Tag(e, a));
+ }
+
+ static void chardata(void* data, const XML_Char* text, int len) {
+ Parser& parser = *static_cast<Parser*>(data);
+ parser.m_stack.top().append(text, len);
+ }
+
+ static void end(void* data, const char*) {
+ Parser& parser = *static_cast<Parser*>(data);
+ parser.m_stack.pop();
+ }
+
+ XML_Parser m_parser;
+ Stack& m_stack;
+};
+
+} // namespace xml