Class: Opal::Lexer

Inherits: Object

Defined in: opal/lib/opal/parser/lexer.rb

Overview

Lexer is used by Parser to step through Ruby code, returning a token for each chunk of Ruby code.

Tokens are in the form:

[token, [value, location]]

where location is in the form [line_number, column_number]. The location data can be used to produce source maps in the compiler. Tokens are generally Ruby symbols, and the value is usually a String, although numeric literals carry a Float or Integer value.

The main method used by the parser is #next_token, which is called repeatedly until it returns a token of false, indicating that the end of input has been reached.

Generally this class is only used by Parser directly.
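The calling convention above can be illustrated with a self-contained sketch. `ToyLexer` and `tokenize` are hypothetical names, and the toy only recognizes integers and identifiers; it mirrors the `[token, [value, [line, column]]]` shape and the `false` end-of-input token, not Opal's actual lexing rules.

```ruby
require 'strscan'

# Hypothetical stand-in for Opal::Lexer: it only recognizes integers and
# identifiers, but returns tokens in the same [token, [value, location]]
# shape and signals end of input with a false token.
class ToyLexer
  def initialize(source)
    @scanner = StringScanner.new(source)
    @line = 1
    @column = 0
  end

  def next_token
    # Skip spaces, tabs and newlines, keeping line/column in sync.
    while (ws = @scanner.scan(/[ \t]+|\n/))
      if ws == "\n"
        @line += 1
        @column = 0
      else
        @column += ws.length
      end
    end

    location = [@line, @column]
    return [false, [false, location]] if @scanner.eos?

    token = if (text = @scanner.scan(/\d+/))
              :tINTEGER
            elsif (text = @scanner.scan(/\w+/))
              :tIDENTIFIER
            else
              raise "Unexpected content: #{@scanner.peek(5).inspect}"
            end
    @column += text.length
    [token, [text, location]]
  end
end

# Drive the lexer the way a parser would: call next_token until it returns false.
def tokenize(source)
  lexer = ToyLexer.new(source)
  tokens = []
  loop do
    tok = lexer.next_token
    break if tok[0] == false
    tokens << tok
  end
  tokens
end
```

Calling `tokenize("foo 42\nbar")` yields `[:tIDENTIFIER, ["foo", [1, 0]]]`, `[:tINTEGER, ["42", [1, 4]]]` and `[:tIDENTIFIER, ["bar", [2, 0]]]`; the loop stops on the `false` token, as described for #next_token.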

Constant Summary

STR_FUNC_ESCAPE = 0x01
STR_FUNC_EXPAND = 0x02
STR_FUNC_REGEXP = 0x04
STR_FUNC_QWORDS = 0x08
STR_FUNC_SYMBOL = 0x10
STR_FUNC_INDENT = 0x20
STR_FUNC_XQUOTE = 0x40

STR_SQUOTE = 0x00
STR_DQUOTE = STR_FUNC_EXPAND
STR_XQUOTE = STR_FUNC_EXPAND | STR_FUNC_XQUOTE
STR_REGEXP = STR_FUNC_REGEXP | STR_FUNC_ESCAPE | STR_FUNC_EXPAND
STR_SWORD = STR_FUNC_QWORDS
STR_DWORD = STR_FUNC_QWORDS | STR_FUNC_EXPAND
STR_SSYM = STR_FUNC_SYMBOL
STR_DSYM = STR_FUNC_SYMBOL | STR_FUNC_EXPAND
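The composite constants are bitmasks, and #add_string_content and #parse_string test individual features with `(func & FLAG) != 0`. A small sketch of that decoding, reusing the constant values listed above (`describe_func` is a hypothetical helper, not part of Opal):

```ruby
# Flag values copied from the constant summary.
STR_FUNC_ESCAPE = 0x01
STR_FUNC_EXPAND = 0x02
STR_FUNC_REGEXP = 0x04
STR_FUNC_QWORDS = 0x08

STR_SQUOTE = 0x00
STR_REGEXP = STR_FUNC_REGEXP | STR_FUNC_ESCAPE | STR_FUNC_EXPAND
STR_SWORD  = STR_FUNC_QWORDS
STR_DWORD  = STR_FUNC_QWORDS | STR_FUNC_EXPAND

# Hypothetical helper: decode a func bitmask the same way the lexer does,
# i.e. a feature is enabled when (func & FLAG) != 0.
def describe_func(func)
  {
    qwords: (func & STR_FUNC_QWORDS) != 0,
    expand: (func & STR_FUNC_EXPAND) != 0,
    regexp: (func & STR_FUNC_REGEXP) != 0,
    escape: (func & STR_FUNC_ESCAPE) != 0
  }
end
```

For example, `describe_func(STR_DWORD)` (the `%W` word-list kind) reports `qwords` and `expand` as true, while `STR_SQUOTE` (0x00) enables none of them.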

Instance Attribute Summary

#column ⇒ Object readonly
Returns the value of attribute column.
#eof_content ⇒ Object readonly
Returns the value of attribute eof_content.
#lex_state ⇒ Object
Returns the value of attribute lex_state.
#line ⇒ Object
Returns the value of attribute line.
#parser ⇒ Object
Returns the value of attribute parser.
#scanner ⇒ Object
Returns the value of attribute scanner.
#scope ⇒ Object readonly
Returns the value of attribute scope.
#strterm ⇒ Object
Returns the value of attribute strterm.
#yylval ⇒ Object
Returns the value of attribute yylval.

Instance Method Summary

#add_string_content(str_buffer, str_parse) ⇒ Object
#after_operator? ⇒ Boolean
#arg? ⇒ Boolean
#beg? ⇒ Boolean
#check(regexp) ⇒ Object
#cmdarg? ⇒ Boolean
#cmdarg_lexpop ⇒ Object
#cmdarg_pop ⇒ Object
#cmdarg_push(n) ⇒ Object
#cond? ⇒ Boolean
#cond_lexpop ⇒ Object
#cond_pop ⇒ Object
#cond_push(n) ⇒ Object
#end? ⇒ Boolean
#has_local?(local) ⇒ Boolean
#here_document(str_parse) ⇒ Object
#heredoc_identifier ⇒ Object
#initialize(source, file) ⇒ Lexer constructor
Create a new instance using the given ruby code and filename for reference.
#label_state? ⇒ Boolean
#matched ⇒ Object
#new_op_asgn(value) ⇒ Object
#new_strterm(func, term, paren) ⇒ Object
#new_strterm2(func, term, paren) ⇒ Object
#next_token ⇒ Array
Returns the next token from the source input stream.
#parse_string ⇒ Object
#peek_variable_name ⇒ Object
#process_identifier(matched, cmd_start) ⇒ Object
#process_numeric ⇒ Object
#pushback(n) ⇒ Object
#read_escape ⇒ Object
#scan(regexp) ⇒ Object
#set_arg_state ⇒ Object
#skip(regexp) ⇒ Object
#space? ⇒ Boolean
#spcarg? ⇒ Boolean
#yylex ⇒ Object
Does the heavy lifting for next_token.

Constructor Details

#initialize(source, file) ⇒ `Lexer`

Create a new instance using the given ruby code and filename for reference.

Opal::Lexer.new("ruby code", "my_file.rb")

Parameters:

source (String) —
ruby code to lex
file (String) —
filename of given ruby code

# File 'opal/lib/opal/parser/lexer.rb', line 59

def initialize(source, file)
  @lex_state  = :expr_beg
  @cond       = 0
  @cmdarg     = 0
  @line       = 1
  @tok_line   = 1
  @column     = 0
  @tok_column = 0
  @file       = file

  @scanner = StringScanner.new(source)
  @scanner_stack = [@scanner]

  @case_stmt = nil
  @start_of_lambda = nil
end

Instance Attribute Details

#column ⇒ `Object` (readonly)

Returns the value of attribute column.

# File 'opal/lib/opal/parser/lexer.rb', line 42

def column
  @column
end

#eof_content ⇒ `Object` (readonly)

Returns the value of attribute eof_content.

# File 'opal/lib/opal/parser/lexer.rb', line 44

def eof_content
  @eof_content
end

#lex_state ⇒ `Object`

Returns the value of attribute lex_state.

# File 'opal/lib/opal/parser/lexer.rb', line 46

def lex_state
  @lex_state
end

#line ⇒ `Object`

Returns the value of attribute line.

# File 'opal/lib/opal/parser/lexer.rb', line 42

def line
  @line
end

#parser ⇒ `Object`

Returns the value of attribute parser.

# File 'opal/lib/opal/parser/lexer.rb', line 50

def parser
  @parser
end

#scanner ⇒ `Object`

Returns the value of attribute scanner.

# File 'opal/lib/opal/parser/lexer.rb', line 48

def scanner
  @scanner
end

#scope ⇒ `Object` (readonly)

Returns the value of attribute scope.

# File 'opal/lib/opal/parser/lexer.rb', line 43

def scope
  @scope
end

#strterm ⇒ `Object`

Returns the value of attribute strterm.

# File 'opal/lib/opal/parser/lexer.rb', line 47

def strterm
  @strterm
end

#yylval ⇒ `Object`

Returns the value of attribute yylval.

# File 'opal/lib/opal/parser/lexer.rb', line 49

def yylval
  @yylval
end

Instance Method Details

#add_string_content(str_buffer, str_parse) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 420

def add_string_content(str_buffer, str_parse)
  func = str_parse[:func]

  end_str_re = Regexp.new(Regexp.escape(str_parse[:term]))

  qwords = (func & STR_FUNC_QWORDS) != 0
  expand = (func & STR_FUNC_EXPAND) != 0
  regexp = (func & STR_FUNC_REGEXP) != 0
  escape = (func & STR_FUNC_ESCAPE) != 0
  xquote = (func == STR_XQUOTE)

  until scanner.eos?
    c = nil
    handled = true

    if check end_str_re
      # eos
      # if it's just balancing, add it as normal content..
      if str_parse[:balance] && (str_parse[:nesting] != 0)
        # we only checked above, so actually scan it
        scan end_str_re
        c = scanner.matched
        str_parse[:nesting] -= 1
      else
        # not balancing, so break (eos!)
        break
      end

    elsif str_parse[:balance] and scan Regexp.new(Regexp.escape(str_parse[:paren]))
      str_parse[:nesting] += 1
      c = scanner.matched

    elsif qwords && scan(/\s/)
      pushback(1)
      break
    elsif expand && check(/#(?=[\$\@\{])/)
      break
    elsif qwords and scan(/\s/)
      pushback(1)
      break
    elsif scan(/\\/)
      if xquote # opal - treat xstrings as dquotes? forces us to double escape
        c = "\\" + scan(/./)
      elsif qwords and scan(/\n/)
        str_buffer << "\n"
        next
      elsif expand and scan(/\n/)
        next
      elsif qwords and scan(/\s/)
        c = ' '
      elsif regexp
        if scan(/(.)/)
          c = "\\" + scanner.matched
        end
      elsif expand
        c = self.read_escape
      elsif scan(/\n/)
        # nothing..
      elsif scan(/\\/)
        if escape
          c = "\\\\"
        else
          c = scanner.matched
        end
      else # \\
        unless scan(end_str_re)
          str_buffer << "\\"
        else
          #c = scanner.matched
        end
      end
    else
      handled = false
    end

    unless handled
      reg = if qwords
              Regexp.new("[^#{Regexp.escape str_parse[:term]}\#\0\n\ \\\\]+|.")
            elsif str_parse[:balance]
              Regexp.new("[^#{Regexp.escape str_parse[:term]}#{Regexp.escape str_parse[:paren]}\#\0\\\\]+|.")
            else
              Regexp.new("[^#{Regexp.escape str_parse[:term]}\#\0\\\\]+|.")
            end

      scan reg
      c = scanner.matched
    end

    c ||= scanner.matched
    str_buffer << c
  end

  raise "reached EOF while in string" if scanner.eos?
end
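When a literal uses a bracket delimiter, such as `%Q(...)`, #new_strterm2 sets `:balance` and a `:nesting` counter. #add_string_content then increments `:nesting` on each opening `paren` and treats a closing `term` as ordinary content (decrementing `:nesting`) until the count is back at zero. The counting alone can be sketched as follows; `read_balanced` is hypothetical and ignores escapes, interpolation and word splitting:

```ruby
require 'strscan'

# Hypothetical, simplified version of the balancing logic in
# add_string_content: read content up to the closing delimiter, keeping
# nested paren/term pairs as ordinary content.
def read_balanced(scanner, paren, term)
  term_re  = Regexp.new(Regexp.escape(term))
  paren_re = Regexp.new(Regexp.escape(paren))
  nesting  = 0
  buffer   = +''

  until scanner.eos?
    if scanner.check(term_re)
      break if nesting.zero?        # the real terminator: leave it unconsumed
      buffer << scanner.scan(term_re)
      nesting -= 1
    elsif (c = scanner.scan(paren_re))
      buffer << c
      nesting += 1
    else
      buffer << scanner.getch
    end
  end

  raise 'reached EOF while in string' if scanner.eos?
  buffer
end
```

With the input `a (b (c)) d) rest`, the sketch returns `"a (b (c)) d"` and leaves `") rest"` unconsumed, so the caller can emit the string-end token on the final `)`.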

#after_operator? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 143

def after_operator?
  [:expr_fname, :expr_dot].include? @lex_state
end

#arg? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 131

def arg?
  [:expr_arg, :expr_cmdarg].include? @lex_state
end

#beg? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 139

def beg?
  [:expr_beg, :expr_value, :expr_mid, :expr_class].include? @lex_state
end

#check(regexp) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 181

def check(regexp)
  @scanner.check regexp
end

#cmdarg? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 127

def cmdarg?
  (@cmdarg & 1) != 0
end

#cmdarg_lexpop ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 123

def cmdarg_lexpop
  @cmdarg = (@cmdarg >> 1) | (@cmdarg & 1)
end

#cmdarg_pop ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 119

def cmdarg_pop
  @cmdarg = @cmdarg >> 1
end

#cmdarg_push(n) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 115

def cmdarg_push(n)
  @cmdarg = (@cmdarg << 1) | (n & 1)
end

#cond? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 111

def cond?
  (@cond & 1) != 0
end

#cond_lexpop ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 107

def cond_lexpop
  @cond = (@cond >> 1) | (@cond & 1)
end

#cond_pop ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 103

def cond_pop
  @cond = @cond >> 1
end

#cond_push(n) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 99

def cond_push(n)
  @cond = (@cond << 1) | (n & 1)
end
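`@cond` (and, identically, `@cmdarg`) is a stack of one-bit flags packed into an Integer: a push shifts left and stores the new flag in bit 0, a pop shifts right, and #cond? reads bit 0. #cond_lexpop also pops, but ORs the popped bit into the new top, so a set flag survives. The hypothetical `BitStack` class below repackages that arithmetic for experimentation:

```ruby
# Hypothetical wrapper reproducing the @cond / @cmdarg bit-stack arithmetic
# used by cond_push, cond_pop, cond_lexpop and cond?.
class BitStack
  attr_reader :bits

  def initialize
    @bits = 0
  end

  def push(n)
    @bits = (@bits << 1) | (n & 1)      # new flag goes into bit 0
  end

  def pop
    @bits >>= 1                         # discard bit 0
  end

  def lexpop
    @bits = (@bits >> 1) | (@bits & 1)  # pop, but keep a set bit 0
  end

  def set?
    (@bits & 1) != 0
  end
end

stack = BitStack.new
stack.push(1)   # push a set flag
stack.push(0)   # push a clear flag, as the lexer does on '('
stack.set?      # => false
stack.pop
stack.set?      # => true
```

In the lexer, the openers `(`, `[` and `{` push a 0 flag onto both stacks, and the closers `)`, `]` and `}` call the `lexpop` variants.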

#end? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 135

def end?
  [:expr_end, :expr_endarg, :expr_endfn].include? @lex_state
end

#has_local?(local) ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 95

def has_local?(local)
  parser.scope.has_local?(local.to_sym)
end

#here_document(str_parse) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 279

def here_document(str_parse)
  eos_regx = /[ \t]*#{Regexp.escape(str_parse[:term])}(\r*\n|$)/
  expand = true

  # Don't escape single-quoted heredoc identifiers
  escape = str_parse[:func] != STR_SQUOTE

  if check(eos_regx)
    scan(/[ \t]*#{Regexp.escape(str_parse[:term])}/)

    if str_parse[:scanner]
      @scanner_stack << str_parse[:scanner]
      @scanner = str_parse[:scanner]
    end

    return :tSTRING_END
  end

  str_buffer = []

  if scan(/#/)
    if tok = peek_variable_name
      return tok
    end

    str_buffer << '#'
  end

  until check(eos_regx) && scanner.bol?
    if scanner.eos?
      raise "reached EOF while in heredoc"
    end

    if scan(/\n/)
      str_buffer << scanner.matched
    elsif expand && check(/#(?=[\$\@\{])/)
      break
    elsif scan(/\\/)
      str_buffer << (escape ? self.read_escape : scanner.matched)
    else
      reg = Regexp.new("[^\#\0\\\\\n]+|.")

      scan reg
      str_buffer << scanner.matched
    end
  end

  complete_str = str_buffer.join ''
  @line += complete_str.count("\n")

  self.yylval = complete_str
  return :tSTRING_CONTENT
end
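The terminator handling in #here_document can be sketched on its own. The hypothetical `read_heredoc_body` below uses the same `eos_regx` pattern and the same `check(...) && bol?` test, so a terminator only counts at the start of a line (after optional spaces or tabs). Unlike the real method, it ignores escapes and `#{...}` interpolation and does not switch to a saved scanner:

```ruby
require 'strscan'

# Hypothetical, simplified version of the loop in here_document: collect
# lines until one that, after optional spaces/tabs, is exactly the terminator.
def read_heredoc_body(scanner, term)
  eos_re = /[ \t]*#{Regexp.escape(term)}(\r*\n|$)/
  body = +''
  until scanner.check(eos_re) && scanner.bol?
    raise 'reached EOF while in heredoc' if scanner.eos?
    body << scanner.scan(/[^\n]*\n?/)   # take the rest of the current line
  end
  scanner.scan(eos_re)                  # consume the terminator line
  body
end

src = StringScanner.new("  one\n  two\n  EOS\nputs 1\n")
read_heredoc_body(src, 'EOS')   # => "  one\n  two\n"
src.rest                        # => "puts 1\n"
```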

#heredoc_identifier ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 515

def heredoc_identifier
  if scan(/(-?)(['"])?(\w+)\2?/)
    escape_method = (@scanner[2] == "'") ? STR_SQUOTE : STR_DQUOTE
    heredoc = @scanner[3]

    self.strterm = new_strterm(escape_method, heredoc, heredoc)
    self.strterm[:type] = :heredoc

    # if ruby code at end of line after heredoc, we have to store it to
    # parse after heredoc is finished parsing
    end_of_line = scan(/.*\n/)
    self.strterm[:scanner] = StringScanner.new(end_of_line) if end_of_line != "\n"

    self.line += 1
    self.yylval = heredoc
    return :tSTRING_BEG
  end
end
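The identifier pattern in #heredoc_identifier captures an optional `-`, an optional quote, and the terminator word; a single quote selects STR_SQUOTE, and whatever follows on the same line is saved in `strterm[:scanner]` to be lexed after the heredoc body. The hypothetical `heredoc_info` helper below applies the same pattern to the text that follows `<<`:

```ruby
require 'strscan'

# The pattern heredoc_identifier uses once `<<` has been consumed:
#   group 1: optional "-"
#   group 2: optional quote character (the closing quote must match it)
#   group 3: the terminator word
HEREDOC_ID = /(-?)(['"])?(\w+)\2?/

# Hypothetical helper returning what the lexer would derive from the match.
def heredoc_info(text)
  s = StringScanner.new(text)
  return nil unless s.scan(HEREDOC_ID)
  { term: s[3], single_quoted: s[2] == "'", rest: s.rest }
end
```

For `-'SQL'` followed by a newline, the terminator is `SQL` and the literal is single-quoted; for `EOS.strip`, the trailing `.strip` is the kind of same-line code that the lexer stores for later.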

#label_state? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 147

def label_state?
  [:expr_beg, :expr_endfn].include?(@lex_state) or arg?
end

#matched ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 189

def matched
  @scanner.matched
end

#new_op_asgn(value) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 207

def new_op_asgn(value)
  self.yylval = value
  :tOP_ASGN
end

#new_strterm(func, term, paren) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 198

def new_strterm(func, term, paren)
  { :type => :string, :func => func, :term => term, :paren => paren }
end

#new_strterm2(func, term, paren) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 202

def new_strterm2(func, term, paren)
  term = new_strterm(func, term, paren)
  term.merge({ :balance => true, :nesting => 0 })
end

#next_token ⇒ `Array`

Returns the next token from the source input stream.

Tokens are in the form:

[token, [value, [source_line, source_column]]]

Returns:

(Array)

# File 'opal/lib/opal/parser/lexer.rb', line 83

def next_token
  token     = self.yylex
  value     = self.yylval
  location  = [@tok_line, @tok_column]

  # once location is stored, ensure next token starts in correct place
  @tok_column = @column
  @tok_line = @line

  [token, [value, location]]
end

#parse_string ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 333

def parse_string
  str_parse = self.strterm
  func = str_parse[:func]

  space = false

  qwords = (func & STR_FUNC_QWORDS) != 0
  expand = (func & STR_FUNC_EXPAND) != 0
  regexp = (func & STR_FUNC_REGEXP) != 0

  space = true if qwords and scan(/\s+/)

  # if not end of string, so we must be parsing contents
  str_buffer = []

  if scan Regexp.new(Regexp.escape(str_parse[:term]))
    if qwords && !str_parse[:done_last_space]#&& space
      str_parse[:done_last_space] = true
      pushback(1)
      self.yylval = ' '
      return :tSPACE
    end

    if str_parse[:balance]
      if str_parse[:nesting] == 0

        if regexp
          self.yylval = scan(/\w+/)
          return :tREGEXP_END
        end
        return :tSTRING_END
      else
        str_buffer << scanner.matched
        str_parse[:nesting] -= 1
        self.strterm = str_parse
      end
    elsif regexp
      @lex_state = :expr_end
      self.yylval = scan(/\w+/)
      return :tREGEXP_END
    else
      if str_parse[:scanner]
        @scanner_stack << str_parse[:scanner]
        @scanner = str_parse[:scanner]
      end

      return :tSTRING_END
    end
  end

  if space
    self.yylval = ' '
    return :tSPACE
  end

  if str_parse[:balance] and scan Regexp.new(Regexp.escape(str_parse[:paren]))
    str_buffer << scanner.matched
    str_parse[:nesting] += 1
  elsif check(/#[@$]/)
    scan(/#/)
    if expand
      return :tSTRING_DVAR
    else
      str_buffer << scanner.matched
    end

  elsif scan(/#\{/)
    if expand
      return :tSTRING_DBEG
    else
      str_buffer << scanner.matched
    end

  # causes error, so we will just collect it later on with other text
  elsif scan(/\#/)
    str_buffer << '#'
  end

  add_string_content str_buffer, str_parse

  complete_str = str_buffer.join ''
  @line += complete_str.count("\n")

  self.yylval = complete_str
  return :tSTRING_CONTENT
end

#peek_variable_name ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 271

def peek_variable_name
  if check(/[@$]/)
    :tSTRING_DVAR
  elsif scan(/\{/)
    :tSTRING_DBEG
  end
end

#process_identifier(matched, cmd_start) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 534

def process_identifier(matched, cmd_start)
  last_state = @lex_state

  if label_state? and !check(/::/) and scan(/:/)
    @lex_state = :expr_beg
    self.yylval = matched
    return :tLABEL
  end

  if matched == 'defined?'
    if after_operator?
      @lex_state = :expr_end
      return :tIDENTIFIER
    end

    @lex_state = :expr_arg
    return :kDEFINED
  end

  if matched.end_with? '?', '!'
    result = :tIDENTIFIER
  else
    if @lex_state == :expr_fname
      if !check(/\=\>/) and scan(/\=/)
        result = :tIDENTIFIER
        matched += scanner.matched
      end

    elsif matched =~ /#{REGEXP_START}[A-Z]/
      result = :tCONSTANT
    else
      result = :tIDENTIFIER
    end
  end

  if @lex_state != :expr_dot and kw = Keywords.keyword(matched)
    old_state = @lex_state
    @lex_state = kw.state

    if old_state == :expr_fname
      self.yylval = kw.name
      return kw.id[0]
    end

    if @lex_state == :expr_beg
      cmd_start = true
    end

    if matched == "do"
      if after_operator?
        @lex_state = :expr_end
        return :tIDENTIFIER
      end

      if @start_of_lambda
        @start_of_lambda = false
        @lex_state = :expr_beg
        return :kDO_LAMBDA
      elsif cond?
        @lex_state = :expr_beg
        return :kDO_COND
      elsif cmdarg? && @lex_state != :expr_cmdarg
        @lex_state = :expr_beg
        return :kDO_BLOCK
      elsif @lex_state == :expr_endarg
        return :kDO_BLOCK
      else
        @lex_state = :expr_beg
        return :kDO
      end
    else
      if old_state == :expr_beg or old_state == :expr_value
        self.yylval = matched
        return kw.id[0]
      else
        if kw.id[0] != kw.id[1]
          @lex_state = :expr_beg
        end

        self.yylval = matched
        return kw.id[1]
      end
    end
  end

  if [:expr_beg, :expr_dot, :expr_mid, :expr_arg, :expr_cmdarg].include? @lex_state
    @lex_state = cmd_start ? :expr_cmdarg : :expr_arg
  elsif @lex_state == :expr_fname
    @lex_state = :expr_endfn
  else
    @lex_state = :expr_end
  end

  if ![:expr_dot, :expr_fname].include?(last_state) and has_local?(matched)
    @lex_state = :expr_end
  end

  return matched =~ /#{REGEXP_START}[A-Z]/ ? :tCONSTANT : :tIDENTIFIER
end

#process_numeric ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 212

def process_numeric
  @lex_state = :expr_end

  if scan(/[\d_]+\.[\d_]+\b|[\d_]+(\.[\d_]+)?[eE][-+]?[\d_]+\b/) # FLOATS
    self.yylval = scanner.matched.gsub(/_/, '').to_f
    return :tFLOAT
  elsif scan(/([^0][\d_]*|0)\b/)                                 # BASE 10
    self.yylval = scanner.matched.gsub(/_/, '').to_i
    return :tINTEGER
  elsif scan(/0[bB](0|1|_)+/)                                    # BASE 2
    self.yylval = scanner.matched.to_i(2)
    return :tINTEGER
  elsif scan(/0[xX](\d|[a-f]|[A-F]|_)+/)                         # BASE 16
    self.yylval = scanner.matched.to_i(16)
    return :tINTEGER
  elsif scan(/0[oO]?([0-7]|_)+/)                                 # BASE 8
    self.yylval = scanner.matched.to_i(8)
    return :tINTEGER
  elsif scan(/0[dD]([0-9]|_)+/)                                  # BASE 10
    self.yylval = scanner.matched.gsub(/_/, '').to_i
    return :tINTEGER
  else
    raise "Lexing error on numeric type: `#{scanner.peek 5}`"
  end
end
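Because the branches of #process_numeric are tried in order, the exact regular expressions matter: `0\b` needs a word boundary, so `0x1F` falls through to the hexadecimal branch and `017` reaches the octal one. The sketch below copies those patterns into a standalone, hypothetical `lex_number` function that returns `[token, value]` instead of setting `yylval`:

```ruby
require 'strscan'

# Hypothetical standalone copy of the dispatch in process_numeric.
# Branch order matters: floats first, then plain base 10, then prefixed forms.
def lex_number(text)
  s = StringScanner.new(text)
  if s.scan(/[\d_]+\.[\d_]+\b|[\d_]+(\.[\d_]+)?[eE][-+]?[\d_]+\b/) # floats
    [:tFLOAT, s.matched.gsub(/_/, '').to_f]
  elsif s.scan(/([^0][\d_]*|0)\b/)                                 # base 10
    [:tINTEGER, s.matched.gsub(/_/, '').to_i]
  elsif s.scan(/0[bB](0|1|_)+/)                                    # base 2
    [:tINTEGER, s.matched.to_i(2)]
  elsif s.scan(/0[xX](\d|[a-f]|[A-F]|_)+/)                         # base 16
    [:tINTEGER, s.matched.to_i(16)]
  elsif s.scan(/0[oO]?([0-7]|_)+/)                                 # base 8
    [:tINTEGER, s.matched.to_i(8)]
  elsif s.scan(/0[dD]([0-9]|_)+/)                                  # base 10, 0d prefix
    [:tINTEGER, s.matched.gsub(/_/, '').to_i]
  else
    raise "Lexing error on numeric type: `#{s.peek 5}`"
  end
end
```

The prefixed forms are passed to `String#to_i` with their prefix intact, which relies on `to_i` skipping a prefix that matches the requested base.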

#pushback(n) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 185

def pushback(n)
  @scanner.pos -= n
end

#read_escape ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 238

def read_escape
  if scan(/\\/)
    "\\"
  elsif scan(/n/)
    "\n"
  elsif scan(/t/)
    "\t"
  elsif scan(/r/)
    "\r"
  elsif scan(/f/)
    "\f"
  elsif scan(/v/)
    "\v"
  elsif scan(/a/)
    "\a"
  elsif scan(/b/)
    "\b"
  elsif scan(/e/)
    "\e"
  elsif scan(/s/)
    " "
  elsif scan(/[0-7]{1,3}/)
    (matched.to_i(8) % 0x100).chr
  elsif scan(/x([0-9a-fA-F]{1,2})/)
    scanner[1].to_i(16).chr
  elsif scan(/u([0-9a-fA-F]{1,4})/)
    scanner[1].to_i(16).chr(Encoding::UTF_8)
  else
    # escaped char doesn't need escaping, so just return it
    scan(/./)
  end
end
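#read_escape runs after the backslash has been consumed and maps the following character or digits to the character the escape denotes. The same logic can be written table-driven; the standalone `read_escape(scanner)` function and the `SIMPLE_ESCAPES` table below are hypothetical, and this sketch accepts only hexadecimal digits after `\u`:

```ruby
require 'strscan'

# Hypothetical table of single-character escapes handled by read_escape.
SIMPLE_ESCAPES = {
  '\\' => "\\", 'n' => "\n", 't' => "\t", 'r' => "\r", 'f' => "\f",
  'v' => "\v", 'a' => "\a", 'b' => "\b", 'e' => "\e", 's' => ' '
}.freeze

# Called with the scanner positioned just after the backslash.
def read_escape(scanner)
  if (c = scanner.scan(/[\\ntrfvabes]/))
    SIMPLE_ESCAPES.fetch(c)
  elsif scanner.scan(/[0-7]{1,3}/)                 # octal, e.g. \101
    (scanner.matched.to_i(8) % 0x100).chr
  elsif scanner.scan(/x([0-9a-fA-F]{1,2})/)        # hex byte, e.g. \x41
    scanner[1].to_i(16).chr
  elsif scanner.scan(/u([0-9a-fA-F]{1,4})/)        # code point, e.g. \u00e9
    scanner[1].to_i(16).chr(Encoding::UTF_8)
  else
    scanner.scan(/./)  # unknown escape: the character stands for itself
  end
end
```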

#scan(regexp) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 163

def scan(regexp)
  if result = @scanner.scan(regexp)
    @column += result.length
    @yylval += @scanner.matched
  end

  result
end

#set_arg_state ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 159

def set_arg_state
  @lex_state = after_operator? ? :expr_arg : :expr_beg
end

#skip(regexp) ⇒ `Object`

# File 'opal/lib/opal/parser/lexer.rb', line 172

def skip(regexp)
  if result = @scanner.scan(regexp)
    @column += result.length
    @tok_column = @column
  end

  result
end
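#scan and #skip differ only in their bookkeeping. Both advance `@column`, but #scan also appends the matched text to `@yylval`, which is how tokens such as `:tIVAR` and `:tGVAR` get their value without assigning `yylval` explicitly, whereas #skip discards the text and moves `@tok_column` so the next token's location starts after it. A hypothetical `MiniScanner` with just those fields:

```ruby
require 'strscan'

# Hypothetical miniature of the Lexer's scan/skip bookkeeping.
class MiniScanner
  attr_reader :column, :tok_column, :yylval

  def initialize(source)
    @scanner = StringScanner.new(source)
    @column = 0
    @tok_column = 0
    @yylval = +''
  end

  # Consume input that belongs to the current token.
  def scan(regexp)
    if (result = @scanner.scan(regexp))
      @column += result.length
      @yylval << result
    end
    result
  end

  # Consume input that is not part of any token (e.g. whitespace).
  def skip(regexp)
    if (result = @scanner.scan(regexp))
      @column += result.length
      @tok_column = @column
    end
    result
  end
end

m = MiniScanner.new('   @foo')
m.skip(/ +/)
m.scan(/@\w+/)
[m.tok_column, m.column, m.yylval]  # => [3, 7, "@foo"]
```

The sketch appends with `<<` where the original uses `+=`; the resulting value is the same.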

#space? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 155

def space?
  @scanner.check(/\s/)
end

#spcarg? ⇒ `Boolean`

Returns:

(Boolean)

# File 'opal/lib/opal/parser/lexer.rb', line 151

def spcarg?
  arg? and @space_seen and !space?
end

#yylex ⇒ `Object`

Does the heavy lifting for next_token.

# File 'opal/lib/opal/parser/lexer.rb', line 635

def yylex
  @yylval = ''
  @space_seen = false
  cmd_start = false
  c = ''

  if self.strterm
    if self.strterm[:type] == :heredoc
      token = here_document(self.strterm)
    else
      token = parse_string
    end

    if token == :tSTRING_END or token == :tREGEXP_END
      self.strterm = nil
      @lex_state = :expr_end
    end

    return token
  end

  while true
    if skip(/\ |\t|\r/)
      @space_seen = true
      next

    elsif skip(/(\n|#)/)
      c = scanner.matched
      if c == '#'
        skip(/(.*)/)
      else
        self.line += 1
      end

      skip(/(\n+)/)

      if scanner.matched
        self.line += scanner.matched.length
      end

      next if [:expr_beg, :expr_dot].include? @lex_state

      if skip(/([\ \t\r\f\v]*)\./)
        @space_seen = true unless scanner[1].empty?
        pushback(1)

        next unless check(/\.\./)
      end

      cmd_start = true
      @lex_state = :expr_beg
      self.yylval = '\\n'
      return :tNL

    elsif scan(/\;/)
      @lex_state = :expr_beg
      return :tSEMI

    elsif check(/\*/)
      if scan(/\*\*\=/)
        @lex_state = :expr_beg
        return new_op_asgn('**')
      elsif scan(/\*\*/)
        self.set_arg_state
        return :tPOW
      elsif scan(/\*\=/)
        @lex_state = :expr_beg
        return new_op_asgn('*')
      else
        scan(/\*/)

        if after_operator?
          @lex_state = :expr_arg
          return :tSTAR2
        elsif @space_seen && check(/\S/)
          @lex_state = :expr_beg
          return :tSTAR
        elsif [:expr_beg, :expr_mid].include? @lex_state
          @lex_state = :expr_beg
          return :tSTAR
        else
          @lex_state = :expr_beg
          return :tSTAR2
        end
      end

    elsif scan(/\!/)
      if after_operator?
        @lex_state = :expr_arg
        if scan(/@/)
          return :tBANG, '!'
        end
      else
        @lex_state = :expr_beg
      end

      if scan(/\=/)
        return :tNEQ
      elsif scan(/\~/)
        return :tNMATCH
      end

      return :tBANG

    elsif scan(/\=/)
      if @lex_state == :expr_beg and !@space_seen
        if scan(/begin/) and space?
          scan(/(.*)/) # end of line
          line_count = 0

          while true
            if scanner.eos?
              raise "embedded document meets end of file"
            end

            if scan(/\=end/) and space?
              @line += line_count
              return yylex
            end

            if scan(/\n/)
              line_count += 1
              next
            end

            scan(/(.*)/)
          end
        end
      end

      self.set_arg_state

      if scan(/\=/)
        if scan(/\=/)
          return :tEQQ
        end

        return :tEQ
      end

      if scan(/\~/)
        return :tMATCH
      elsif scan(/\>/)
        return :tASSOC
      end

      return :tEQL

    elsif scan(/\"/)
      self.strterm = new_strterm(STR_DQUOTE, '"', "\0")
      return :tSTRING_BEG

    elsif scan(/\'/)
      self.strterm = new_strterm(STR_SQUOTE, "'", "\0")
      return :tSTRING_BEG

    elsif scan(/\`/)
      self.strterm = new_strterm(STR_XQUOTE, "`", "\0")
      return :tXSTRING_BEG

    elsif scan(/\&/)
      if scan(/\&/)
        @lex_state = :expr_beg

        if scan(/\=/)
          return new_op_asgn('&&')
        end

        return :tANDOP

      elsif scan(/\=/)
        @lex_state = :expr_beg
        return new_op_asgn('&')
      end

      if spcarg?
        #puts "warning: `&' interpreted as argument prefix"
        result = :tAMPER
      elsif beg?
        result = :tAMPER
      else
        #puts "warn_balanced: & argument prefix"
        result = :tAMPER2
      end

      self.set_arg_state
      return result

    elsif scan(/\|/)
      if scan(/\|/)
        @lex_state = :expr_beg
        if scan(/\=/)
          return new_op_asgn('||')
        end

        return :tOROP

      elsif scan(/\=/)
        return new_op_asgn('|')
      end

      self.set_arg_state
      return :tPIPE

    elsif scan(/\%[QqWwixrs]/)
      str_type = scanner.matched[1, 1]
      paren = term = scan(/./)

      case term
      when '(' then term = ')'
      when '[' then term = ']'
      when '{' then term = '}'
      when '<' then term = '>'
      else paren = "\0"
      end

      token, func = case str_type
                    when 'Q'
                      [:tSTRING_BEG, STR_DQUOTE]
                    when 'q'
                      [:tSTRING_BEG, STR_SQUOTE]
                    when 'W'
                      skip(/\s*/)
                      [:tWORDS_BEG, STR_DWORD]
                    when 'w', 'i'
                      skip(/\s*/)
                      [:tAWORDS_BEG, STR_SWORD]
                    when 'x'
                      [:tXSTRING_BEG, STR_XQUOTE]
                    when 'r'
                      [:tREGEXP_BEG, STR_REGEXP]
                    when 's'
                      [:tSTRING_BEG, STR_SQUOTE]
                    end

      self.strterm = new_strterm2(func, term, paren)
      return token

    elsif scan(/\//)
      if beg?
        self.strterm = new_strterm(STR_REGEXP, '/', '/')
        return :tREGEXP_BEG
      elsif scan(/\=/)
        @lex_state = :expr_beg
        return new_op_asgn('/')
      end

      if arg?
        if !check(/\s/) && @space_seen
          self.strterm = new_strterm(STR_REGEXP, '/', '/')
          return :tREGEXP_BEG
        end
      end

      if after_operator?
        @lex_state = :expr_arg
      else
        @lex_state = :expr_beg
      end

      return :tDIVIDE

    elsif scan(/\%/)
      if scan(/\=/)
        @lex_state = :expr_beg
        return new_op_asgn('%')
      elsif check(/[^\s]/)
        if @lex_state == :expr_beg or (@lex_state == :expr_arg && @space_seen)
          start_word  = scan(/./)
          end_word    = { '(' => ')', '[' => ']', '{' => '}' }[start_word] || start_word
          self.strterm = new_strterm2(STR_DQUOTE, end_word, start_word)
          return :tSTRING_BEG
        end
      end

      self.set_arg_state

      return :tPERCENT

    elsif scan(/\\/)
      if scan(/\r?\n/)
        @space_seen = true
        next
      end

      raise SyntaxError, "backslash must appear before newline :#{@file}:#{@line}"

    elsif scan(/\(/)
      result = scanner.matched
      if beg?
        result = :tLPAREN
      elsif @space_seen && arg?
        result = :tLPAREN_ARG
      else
        result = :tLPAREN2
      end

      @lex_state = :expr_beg
      cond_push 0
      cmdarg_push 0

      return result

    elsif scan(/\)/)
      cond_lexpop
      cmdarg_lexpop
      @lex_state = :expr_end
      return :tRPAREN

    elsif scan(/\[/)
      result = scanner.matched

      if after_operator?
        @lex_state = :expr_arg
        if scan(/\]=/)
          return :tASET
        elsif scan(/\]/)
          return :tAREF
        else
          raise "Unexpected '[' token"
        end
      elsif beg?
        result = :tLBRACK
      elsif arg? && @space_seen
        result =  :tLBRACK
      else
        result = :tLBRACK2
      end

      @lex_state = :expr_beg
      cond_push 0
      cmdarg_push 0
      return result

    elsif scan(/\]/)
      cond_lexpop
      cmdarg_lexpop
      @lex_state = :expr_end
      return :tRBRACK

    elsif scan(/\}/)
      cond_lexpop
      cmdarg_lexpop
      @lex_state = :expr_end

      return :tRCURLY

    elsif scan(/\.\.\./)
      @lex_state = :expr_beg
      return :tDOT3

    elsif scan(/\.\./)
      @lex_state = :expr_beg
      return :tDOT2

    elsif scan(/\./)
      @lex_state = :expr_dot unless @lex_state == :expr_fname
      return :tDOT

    elsif scan(/\:\:/)
      if beg?
        @lex_state = :expr_beg
        return :tCOLON3
      elsif spcarg?
        @lex_state = :expr_beg
        return :tCOLON3
      end

      @lex_state = :expr_dot
      return :tCOLON2

    elsif scan(/\:/)
      if end? || check(/\s/)
        unless check(/\w/)
          @lex_state = :expr_beg
          return :tCOLON
        end

        @lex_state = :expr_fname
        return :tSYMBEG
      end

      if scan(/\'/)
        self.strterm = new_strterm(STR_SSYM, "'", "\0")
      elsif scan(/\"/)
        self.strterm = new_strterm(STR_DSYM, '"', "\0")
      end

      @lex_state = :expr_fname
      return :tSYMBEG

    elsif scan(/\^\=/)
      @lex_state = :expr_beg
      return new_op_asgn('^')

    elsif scan(/\^/)
      self.set_arg_state
      return :tCARET

    elsif check(/\</)
      if scan(/\<\<\=/)
        @lex_state = :expr_beg
        return new_op_asgn('<<')

      elsif scan(/\<\</)
        if after_operator?
          @lex_state = :expr_arg
          return :tLSHFT
        elsif !after_operator? && !end? && (!arg? || @space_seen)
          if token = heredoc_identifier
            return token
          end

          @lex_state = :expr_beg
          return :tLSHFT
        end
        @lex_state = :expr_beg
        return :tLSHFT
      elsif scan(/\<\=\>/)
        if after_operator?
          @lex_state = :expr_arg
        else
          if @lex_state == :expr_class
            cmd_start = true
          end

          @lex_state = :expr_beg
        end

        return :tCMP
      elsif scan(/\<\=/)
        self.set_arg_state
        return :tLEQ

      elsif scan(/\</)
        self.set_arg_state
        return :tLT
      end

    elsif check(/\>/)
      if scan(/\>\>\=/)
        return new_op_asgn('>>')

      elsif scan(/\>\>/)
        self.set_arg_state
        return :tRSHFT

      elsif scan(/\>\=/)
        self.set_arg_state
        return :tGEQ

      elsif scan(/\>/)
        self.set_arg_state
        return :tGT
      end

    elsif scan(/->/)
      # FIXME: should be :expr_arg, but '(' breaks it...
      @lex_state = :expr_end
      @start_of_lambda = true
      return :tLAMBDA

    elsif scan(/[+-]/)
      matched = scanner.matched
      sign, utype = if matched == '+'
                      [:tPLUS, :tUPLUS]
                    else
                      [:tMINUS, :tUMINUS]
                    end

      if beg?
        @lex_state = :expr_mid
        self.yylval = matched
        if scanner.peek(1) =~ /\d/
          return utype == :tUMINUS ? '-@NUM' : '+@NUM'
        else
          return utype
        end
      elsif after_operator?
        @lex_state = :expr_arg
        if scan(/@/)
          self.yylval = matched + '@'
          return :tIDENTIFIER
        end

        self.yylval = matched
        return sign
      end

      if scan(/\=/)
        @lex_state = :expr_beg
        return new_op_asgn(matched)
      end

      if spcarg?
        @lex_state = :expr_mid
        self.yylval = matched
        return utype
      end

      @lex_state = :expr_beg
      self.yylval = matched
      return sign

    elsif scan(/\?/)
      if end?
        @lex_state = :expr_beg
        return :tEH
      end

      if check(/\ |\t|\r|\s/)
        @lex_state = :expr_beg
        return :tEH
      elsif scan(/\\/)
        @lex_state = :expr_end
        self.yylval = self.read_escape
        return :tSTRING
      end

      @lex_state = :expr_end
      self.yylval = scan(/./)
      return :tSTRING

    elsif scan(/\~/)
      self.set_arg_state
      return :tTILDE

    elsif check(/\$/)
      if scan(/\$([1-9]\d*)/)
        @lex_state = :expr_end
        self.yylval = scanner.matched.sub('$', '')
        return :tNTH_REF

      elsif scan(/(\$_)(\w+)/)
        @lex_state = :expr_end
        return :tGVAR

      elsif scan(/\$[\+\'\`\&!@\"~*$?\/\\:;=.,<>_]/)
        @lex_state = :expr_end
        return :tGVAR
      elsif scan(/\$\w+/)
        @lex_state = :expr_end
        return :tGVAR
      else
        raise "Bad gvar name: #{scanner.peek(5).inspect}"
      end

    elsif scan(/\$\w+/)
      @lex_state = :expr_end
      return :tGVAR

    elsif scan(/\@\@\w*/)
      @lex_state = :expr_end
      return :tCVAR

    elsif scan(/\@\w*/)
      @lex_state = :expr_end
      return :tIVAR

    elsif scan(/\,/)
      @lex_state = :expr_beg
      return :tCOMMA

    elsif scan(/\{/)
      if @start_of_lambda
        @start_of_lambda = false
        @lex_state = :expr_beg
        return :tLAMBEG

      elsif arg? or @lex_state == :expr_end
        result = :tLCURLY
      elsif @lex_state == :expr_endarg
        result = :LBRACE_ARG
      else
        result = :tLBRACE
      end

      @lex_state = :expr_beg
      cond_push 0
      cmdarg_push 0
      return result

    elsif scanner.bol? and skip(/\__END__(\n|$)/)
      while true
        if scanner.eos?
          @eof_content = self.yylval
          return false
        end

        scan(/(.*)/)
        scan(/\n/)
      end

    elsif check(/[0-9]/)
      return process_numeric

    elsif scan(/(\w)+[\?\!]?/)
      return process_identifier scanner.matched, cmd_start
    end

    if scanner.eos?
      if @scanner_stack.size == 1 # the main scanner; it can't be popped
        self.yylval = false
        return false
      else # probably finished a heredoc body: pop its scanner and resume the outer one
        @scanner_stack.pop
        @scanner = @scanner_stack.last
        return yylex
      end
    end

    raise "Unexpected content in parsing stream `#{scanner.peek 5}` :#{@file}:#{@line}"
  end
end
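
The `+`/`-` branch of yylex above chooses between the binary and unary tokens purely from the lexer state and the next character. A minimal sketch of that decision, assuming the state predicates are supplied as booleans. `classify_sign` is a hypothetical helper, not part of Opal::Lexer, and `:op_asgn` stands in for the real `new_op_asgn(matched)` call:

```ruby
# Hypothetical, simplified model of the `+`/`-` branch of Opal::Lexer#yylex.
# beg:, after_operator: and spcarg: stand in for beg?, after_operator? and
# spcarg?; :op_asgn is a placeholder for new_op_asgn(matched).
def classify_sign(sign, rest, beg:, after_operator:, spcarg:)
  binary, unary = sign == '+' ? [:tPLUS, :tUPLUS] : [:tMINUS, :tUMINUS]

  if beg
    # Start of an expression: always unary. A digit right after the sign
    # marks a signed numeric literal such as -1.
    return unary unless rest =~ /\A\d/
    return unary == :tUMINUS ? '-@NUM' : '+@NUM'
  elsif after_operator
    # After `def` or `.`: `+@` / `-@` names the unary operator method.
    return rest.start_with?('@') ? :tIDENTIFIER : binary
  end

  return :op_asgn if rest.start_with?('=') # a += 1
  return unary if spcarg                   # foo -1  (signed first argument)
  binary                                   # a - 1
end
```

For example, `classify_sign('-', '1', beg: true, after_operator: false, spcarg: false)` yields `'-@NUM'`, while the same sign with `spcarg: true` and `beg: false` yields `:tUMINUS`. The order of the checks matters: the `=` test runs before the space-argument test, so `foo -= 1` is still an operator assignment.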

Constructor Details

#initialize(source, file) ⇒ Lexer

Instance Attribute Details

#column ⇒ Object (readonly)

#eof_content ⇒ Object (readonly)

#lex_state ⇒ Object

#line ⇒ Object

#parser ⇒ Object

#scanner ⇒ Object

#scope ⇒ Object (readonly)

#strterm ⇒ Object

#yylval ⇒ Object

Instance Method Details

#add_string_content(str_buffer, str_parse) ⇒ Object

#after_operator? ⇒ Boolean

#arg? ⇒ Boolean
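
In yylex, arg? together with the raw lex_state decides what an opening brace means. Here is a minimal sketch of that decision, assuming the relevant facts are passed in as plain values. `classify_lbrace` is a hypothetical name, and the real branch also resets the condition and command-argument stacks with `cond_push 0` and `cmdarg_push 0`, which is omitted here:

```ruby
# Hypothetical, simplified model of the `{` branch of Opal::Lexer#yylex.
# start_of_lambda: mirrors @start_of_lambda, arg: mirrors arg?, and
# lex_state: mirrors @lex_state. Stack bookkeeping is left out.
def classify_lbrace(start_of_lambda:, arg:, lex_state:)
  return :tLAMBEG if start_of_lambda                # body of a -> lambda
  return :tLCURLY if arg || lex_state == :expr_end  # block brace
  return :LBRACE_ARG if lex_state == :expr_endarg   # block brace in :expr_endarg
  :tLBRACE                                          # hash literal
end
```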

#beg? ⇒ Boolean
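
after_operator?, arg?, beg? and end? are predicates over the current lex_state. The state sets used below are an assumption, modelled on the IS_BEG, IS_END and IS_ARG macros in MRI's parse.y. Opal's own definitions may list different states. `LexStateSketch` is a hypothetical class:

```ruby
# Hypothetical sketch of the lexer-state predicates. The state groupings
# follow MRI's parse.y macros; Opal's actual sets may differ.
class LexStateSketch
  BEG_STATES = %i[expr_beg expr_mid expr_value expr_class].freeze
  END_STATES = %i[expr_end expr_endarg expr_endfn].freeze
  ARG_STATES = %i[expr_arg expr_cmdarg].freeze

  attr_accessor :lex_state

  def initialize(lex_state = :expr_beg)
    @lex_state = lex_state
  end

  def beg?; BEG_STATES.include?(@lex_state); end
  def end?; END_STATES.include?(@lex_state); end
  def arg?; ARG_STATES.include?(@lex_state); end

  # After `def` or `.`, an operator character is a method name.
  def after_operator?; %i[expr_fname expr_dot].include?(@lex_state); end
end
```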

#check(regexp) ⇒ Object

#cmdarg? ⇒ Boolean

#cmdarg_lexpop ⇒ Object

#cmdarg_pop ⇒ Object

#cmdarg_push(n) ⇒ Object

#cond? ⇒ Boolean

#cond_lexpop ⇒ Object

#cond_pop ⇒ Object

#cond_push(n) ⇒ Object
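
cond?, cond_lexpop, cond_pop and cond_push read like the bit-stack macros in MRI's parse.y (COND_P, COND_LEXPOP, COND_POP, COND_PUSH). The sketch below assumes that design. `BitStack` is a hypothetical class, and the cmdarg_* methods are assumed to follow the same pattern on a second stack. yylex pushes a 0 onto both stacks when it opens a brace, so the flags start clear inside the braces:

```ruby
# Hypothetical bit stack, modelled on MRI's COND_* / CMDARG_* macros.
class BitStack
  def initialize
    @bits = 0
  end

  # Push one flag; only the low bit of n is kept.
  def push(n)
    @bits = (@bits << 1) | (n & 1)
  end

  # Discard the top flag.
  def pop
    @bits >>= 1
  end

  # Pop, but OR the popped flag into the new top so it is not lost.
  def lexpop
    @bits = (@bits >> 1) | (@bits & 1)
  end

  # Is the top flag set?
  def set?
    (@bits & 1) != 0
  end
end
```

The only difference between `pop` and `lexpop` is the OR: after pushing 0 then 1, `pop` leaves the top flag clear, while `lexpop` leaves it set.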

#end? ⇒ Boolean
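
In yylex, end? is what separates the ternary operator from a character literal such as `?a`. A minimal sketch of the `?` branch, assuming the result of end? is passed in as a boolean. `lex_question` is hypothetical, and escapes are returned undecoded here, whereas the real lexer decodes them with read_escape:

```ruby
require 'strscan'

# Hypothetical, simplified model of the `?` branch of Opal::Lexer#yylex.
# Returns [token, value], or nil if the scanner is not at a `?`.
def lex_question(scanner, end_state)
  return nil unless scanner.scan(/\?/)
  return [:tEH, nil] if end_state            # `x ? a : b` after a full value
  return [:tEH, nil] if scanner.check(/\s/)  # `?` followed by whitespace
  return [:tSTRING, scanner.matched] if scanner.scan(/\\./) # raw escape, e.g. "\\n"
  [:tSTRING, scanner.scan(/./)]              # ?a  =>  "a"
end
```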

#has_local?(local) ⇒ Boolean

#here_document(str_parse) ⇒ Object

#heredoc_identifier ⇒ Object
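
The end of the yylex listing keeps a stack of scanners (`@scanner_stack`). It pops a finished inner scanner, probably one created for a heredoc, and reports EOF only when the main scanner is exhausted. A minimal sketch of that idea, assuming each nested source gets its own StringScanner. `ScannerStack` and `next_word` are hypothetical, and words stand in for real tokens:

```ruby
require 'strscan'

# Hypothetical sketch of the scanner-stack idea: nested text is lexed by a
# scanner pushed on top of the main one. When it runs out it is popped and
# lexing resumes in the outer scanner. The main scanner is never popped.
class ScannerStack
  def initialize(source)
    @stack = [StringScanner.new(source)]
  end

  def push(text)
    @stack.push(StringScanner.new(text))
  end

  def current
    @stack.last
  end

  # Next word from the innermost non-exhausted scanner, or false at EOF.
  def next_word
    loop do
      current.skip(/\s+/)
      return current.scan(/\S+/) unless current.eos?
      return false if @stack.size == 1 # main scanner exhausted: EOF
      @stack.pop                       # inner scanner done: resume outer
    end
  end
end
```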

#label_state? ⇒ Boolean

#matched ⇒ Object

#new_op_asgn(value) ⇒ Object

#new_strterm(func, term, paren) ⇒ Object

#new_strterm2(func, term, paren) ⇒ Object

#next_token ⇒ Array
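
According to the class overview, next_token is called repeatedly. Each call returns `[token, [value, location]]`, where location is `[line_number, column_number]`, and a token of false marks the end of input. A minimal driver sketch for that protocol follows. `each_token` is hypothetical and accepts any object that responds to `next_token`. In practice Parser is what drives the lexer:

```ruby
# Hypothetical driver for the next_token protocol: yields each token and its
# [value, [line, column]] payload, and stops at the first false token.
def each_token(lexer)
  return enum_for(:each_token, lexer) unless block_given?

  loop do
    token, payload = lexer.next_token
    break if token == false
    yield token, payload
  end
end
```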

#parse_string ⇒ Object

#peek_variable_name ⇒ Object

#process_identifier(matched, cmd_start) ⇒ Object

#process_numeric ⇒ Object

#pushback(n) ⇒ Object

#read_escape ⇒ Object

#scan(regexp) ⇒ Object

#set_arg_state ⇒ Object

#skip(regexp) ⇒ Object
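
yylex relies on check, scan, skip and scanner.matched throughout. The example below shows the underlying behaviour directly on Ruby's StringScanner. It assumes Opal's #check, #scan and #skip delegate to `@scanner`, possibly with extra bookkeeping such as #column, which is worth confirming in the source:

```ruby
require 'strscan'

s = StringScanner.new('$1 + x')

s.check(/\$/)            # => "$"   lookahead only
s.pos                    # => 0
s.scan(/\$([1-9]\d*)/)   # => "$1"  consumes input
s.matched.sub('$', '')   # => "1"   as the tNTH_REF branch does
s[1]                     # => "1"
s.skip(/\s+/)            # => 1     consumes, returns the match length
s.peek(1)                # => "+"   lookahead without a regexp
s.scan(/\d/)             # => nil   no match, position unchanged
s.pos                    # => 3
```

This is why yylex can test with `check(/\$/)` first and then try several `scan` patterns: a failed `check` or `scan` never moves the position.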

#space? ⇒ Boolean

#spcarg? ⇒ Boolean

#yylex ⇒ Object
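
The `$` branch of the yylex listing classifies global-variable references. A stand-alone sketch of the same pattern order follows. `lex_gvar` is hypothetical, and returning the matched text as the tGVAR value is an assumption, since the listing does not show how yylval is set for gvars:

```ruby
require 'strscan'

# Hypothetical, stand-alone version of the `$` branch of yylex.
# Returns [token, value], or nil if the scanner is not at a `$`.
def lex_gvar(scanner)
  return nil unless scanner.check(/\$/)

  if scanner.scan(/\$([1-9]\d*)/)
    [:tNTH_REF, scanner[1]]              # $1, $2, ... regexp group refs
  elsif scanner.scan(/\$_\w+/) ||
        scanner.scan(/\$[+'`&!@"~*$?\/\\:;=.,<>_]/) ||
        scanner.scan(/\$\w+/)
    [:tGVAR, scanner.matched]            # $_foo, $~, $stdout, ...
  else
    raise "Bad gvar name: #{scanner.peek(5).inspect}"
  end
end
```

Because `check(/\$/)` already claims every `$`, any text that none of the inner patterns accepts ends in the "Bad gvar name" error rather than falling through to another branch.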
