Parser for pure LaTeX

Is it possible to create a parser that supports all of pure LaTeX (no plain TeX, no TeX primitives) without using any TeX engine?
I know of an iOS app that creates and typesets LaTeX documents, and I don't think it uses a TeX engine.
Does such a parser exist?










parsing






edited 2 hours ago
asked 3 hours ago
John webner

  • Probably yes. Maybe no. What exactly do you want/need?
    – Johannes_B
    3 hours ago

  • Could you elaborate on how this differs from your other question? At the least, that one had much more detail, and several worthwhile comments that could help you phrase your question better.
    – Teepeemm
    3 hours ago

  • Although this question is about parsing TeX code at its lowest level (the way I interpreted it, at least), I think the tex-core tag is for the inner workings of TeX itself, and you specifically ruled that out, so I would remove it. Either way, I think it's a good question, but it needs more detail. Do you want to parse only proper LaTeX code, or does plain TeX need to be included? Are packages considered? Other formats? Please explain your question better or it will probably be closed as "too broad".
    – Phelype Oleinik
    3 hours ago

  • A clearly written Turing complete LaTeX parser in a high level language would be lovely to see.
    – Anush
    2 hours ago

  • What is pure LaTeX in your view? Are commands like \def (a TeX command, but used in many LaTeX documents) out of scope? And what should the output of the parser be?
    – TeXnician
    2 hours ago












2 Answers
If you write a parser, you can define the subset of LaTeX that you support. (There isn't really a useful definition of "pure LaTeX with no primitives".)

For instance, MathJax has a parser for a subset of LaTeX math markup, written in JavaScript, and LaTeXML has a parser for almost complete TeX, written in Perl, which does not involve running TeX itself. LaTeXML's parser is perhaps the closest to what you ask, as far as I understand the question. https://github.com/brucemiller/LaTeXML

Here is an example that uses only commands defined in core LaTeX. (The shortvrb package is part of the base LaTeX2e release, so it is as fundamental a part of LaTeX as, say, \section, which is defined in the article class from the same base release files.)



    \documentclass{article}
    \usepackage{shortvrb}

    \begin{document}

    \MakeShortVerb{\*}

    {\bfseries *}{* some text}

    \DeleteShortVerb{\*}

    {\bfseries *}{* some text}

    \end{document}

Note that it is not possible to statically assign any tokenisation to *}{* here: in the first case it produces the two character tokens }{ and in the second case it produces the two character tokens ** (the first one being bold).
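To make the point concrete, here is a rough Python sketch of a scanner whose reading of the same source line depends on declarations seen earlier. It is a toy under stated assumptions, not TeX's real algorithm and not how LaTeXML works: the names `tokenize`, `TOGGLE`, `CONTROL_WORD` and the token kinds are made up for illustration, and it understands nothing beyond the two shortvrb declarations.

```python
import re

# Hypothetical toy scanner, not TeX's real algorithm: it knows only
# \MakeShortVerb{\c} and \DeleteShortVerb{\c} and a handful of token kinds.
TOGGLE = re.compile(r"\\(MakeShortVerb|DeleteShortVerb)\{\\(.)\}")
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")


def tokenize(source):
    """Return a list of (kind, text) tokens for `source`."""
    tokens = []
    verb_chars = set()  # characters that currently delimit verbatim text
    i = 0
    while i < len(source):
        toggle = TOGGLE.match(source, i)
        if toggle:
            # The declaration changes how later characters are read.
            name, char = toggle.groups()
            if name == "MakeShortVerb":
                verb_chars.add(char)
            else:
                verb_chars.discard(char)
            i = toggle.end()
            continue
        ch = source[i]
        if ch in verb_chars:
            # Everything up to the matching delimiter is literal text.
            end = source.index(ch, i + 1)  # ValueError if unterminated
            tokens.append(("verbatim", source[i + 1:end]))
            i = end + 1
            continue
        word = CONTROL_WORD.match(source, i)
        if word:
            tokens.append(("command", word.group()))
            i = word.end()
            continue
        if ch == "{":
            tokens.append(("begin_group", ch))
        elif ch == "}":
            tokens.append(("end_group", ch))
        elif not ch.isspace():
            tokens.append(("char", ch))
        i += 1
    return tokens


LINE = r"{\bfseries *}{* some text}"
with_verb = tokenize(r"\MakeShortVerb{\*}" + LINE)
without_verb = tokenize(r"\DeleteShortVerb{\*}" + LINE)
```

With the short-verb declaration active, the scanner sees one group containing a verbatim `}{`; without it, the same characters form two groups, each starting with a `*`. A parser that does not track this kind of state cannot choose between the two readings.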



It would be reasonable to produce a LaTeX parser for a subset of the language that does not include this kind of construct, but you need to define the subset. It isn't enough to say "not plain TeX or primitives": there are plain TeX constructs that can easily be parsed, and there are LaTeX constructions that cannot be parsed in general without access to a full TeX typesetting system.
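One way to read "define the subset" is as an explicit whitelist that the parser enforces. The Python sketch below is only an illustration under that assumption: the `SUPPORTED` table, the `UnsupportedLaTeX` exception and the `parse` function are hypothetical names, and the subset (a few commands with mandatory braced arguments, no bare groups, no optional arguments, no comments) is chosen for brevity rather than usefulness.

```python
# Hypothetical whitelist: command name -> number of mandatory {...} arguments.
SUPPORTED = {"section": 1, "emph": 1, "textbf": 1, "frac": 2}


class UnsupportedLaTeX(ValueError):
    """Raised when the input leaves the declared subset."""


def parse(source):
    """Parse `source` into a list of strings and (name, args) tuples."""
    nodes, _ = _parse_until(source, 0, closing=False)
    return nodes


def _parse_until(source, pos, closing):
    nodes, text = [], []

    def flush():
        # Move any pending plain text into the node list.
        if text:
            nodes.append("".join(text))
            text.clear()

    while pos < len(source):
        ch = source[pos]
        if ch == "}":
            if not closing:
                raise UnsupportedLaTeX(f"unbalanced }} at offset {pos}")
            flush()
            return nodes, pos + 1
        if ch == "{":
            raise UnsupportedLaTeX(f"bare group at offset {pos}")
        if ch == "\\":
            flush()
            end = pos + 1
            while end < len(source) and source[end].isalpha():
                end += 1
            name = source[pos + 1:end]
            if name not in SUPPORTED:
                raise UnsupportedLaTeX(f"\\{name} is outside the subset")
            pos, args = end, []
            for _ in range(SUPPORTED[name]):
                if pos >= len(source) or source[pos] != "{":
                    raise UnsupportedLaTeX(f"\\{name} expects a braced argument")
                arg, pos = _parse_until(source, pos + 1, closing=True)
                args.append(arg)
            nodes.append((name, args))
            continue
        text.append(ch)
        pos += 1
    if closing:
        raise UnsupportedLaTeX("missing closing brace")
    flush()
    return nodes, pos
```

Anything outside the table, such as `\MakeShortVerb` or `\def`, is rejected up front instead of being guessed at, which keeps the parser honest about what it supports.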







edited 17 mins ago
answered 2 hours ago
David Carlisle

I think this is already done by document-conversion software such as pandoc, and by other tools on the internet. Generally speaking, these converters parse only a subset of the commands. In addition, regular expressions can be used to extract certain tags of interest.
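As a small illustration of the regex approach, here is a Python sketch that pulls sectioning titles out of a source string. The pattern and the `extract_headings` helper are assumptions made for this example; the pattern deliberately skips titles that contain nested braces and knows nothing about comments or verbatim text, which is exactly where regex extraction stops being reliable.

```python
import re

# Matches \section{...}, \subsection{...}, \subsubsection{...} and their
# starred forms. Titles containing nested braces are skipped on purpose.
HEADING = re.compile(r"\\(section|subsection|subsubsection)\*?\{([^{}]*)\}")


def extract_headings(source):
    """Return (level, title) pairs for the simple headings in `source`."""
    return [(m.group(1), m.group(2)) for m in HEADING.finditer(source)]


sample = r"\section{Intro} text \subsection*{Details} \section{A \emph{b}}"
headings = extract_headings(sample)
```

Here `headings` contains the first two titles only; the third heading is missed because its title contains a nested group.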






answered 3 hours ago
GrandFleet

  • I know that parsers like pandoc can only parse a subset of the commands. E.g. pandoc cannot parse \emph{text\section{text}text}. I am looking for a full pure LaTeX parser.
    – John webner
    2 hours ago

  • @Johnwebner Tbh, you shouldn't use such markup, but of course a parser might want to spit out something…
    – TeXnician
    2 hours ago

  • I think you are better off using TeX then. I doubt there can be a full parser without implementing TeX or a subset of it.
    – GrandFleet
    2 hours ago















